Architecture Documentation

21 UML diagrams created with StarUML, synced from source code

Core Modules

Package Overview

Layered architecture with 8 core modules and 5 utility modules. Core: CLICore, Scrapers, Adaptors, Analysis, Enhancement, Packaging, MCP, Sync.

CLI Core

CLIDispatcher maps subcommands via COMMAND_MODULES. CreateCommand auto-detects source type via SourceDetector, initializes ExecutionContext singleton, then calls get_converter() → converter.run().
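
The subcommand lookup described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the contents of COMMAND_MODULES, the helper name resolve_command, and the lazy import are all assumptions, and standard-library modules stand in for the real command modules.

```python
import importlib
from types import ModuleType

# Hypothetical registry: subcommand name -> importable module path.
# Standard-library modules stand in for the real command modules,
# whose names are not listed in this document.
COMMAND_MODULES = {
    "create": "json",
    "package": "zipfile",
}


def resolve_command(subcommand: str) -> ModuleType:
    """Look up a subcommand and import its module on demand."""
    try:
        module_path = COMMAND_MODULES[subcommand]
    except KeyError:
        known = ", ".join(sorted(COMMAND_MODULES))
        raise ValueError(
            f"unknown subcommand {subcommand!r}; expected one of: {known}"
        ) from None
    return importlib.import_module(module_path)
```

Keeping the mapping as plain data means a new subcommand is one dictionary entry, and nothing is imported until the subcommand is actually requested.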

Scrapers

18 converter classes inheriting SkillConverter base class (Template Method: run() → extract() → build_skill()). Factory: get_converter(source_type, config) via CONVERTER_REGISTRY.
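
A minimal sketch of the Template Method plus registry factory described above. The names SkillConverter, run(), extract(), build_skill(), get_converter(), and CONVERTER_REGISTRY come from the text; EchoConverter, the config keys, and the return shapes are hypothetical and exist only to make the sketch runnable.

```python
from abc import ABC, abstractmethod


class SkillConverter(ABC):
    """Base class: run() fixes the order of steps, subclasses fill them in."""

    def __init__(self, config: dict):
        self.config = config

    def run(self) -> dict:
        # Template Method: the sequence is defined once, here.
        data = self.extract()
        return self.build_skill(data)

    @abstractmethod
    def extract(self) -> dict:
        """Collect raw content from the source."""

    @abstractmethod
    def build_skill(self, data: dict) -> dict:
        """Turn extracted content into a skill description."""


class EchoConverter(SkillConverter):
    """Hypothetical converter used only for this illustration."""

    def extract(self) -> dict:
        return {"pages": [self.config["source"]]}

    def build_skill(self, data: dict) -> dict:
        return {
            "name": self.config.get("name", "demo"),
            "page_count": len(data["pages"]),
        }


# Assumed registry shape: source type -> converter class.
CONVERTER_REGISTRY = {"echo": EchoConverter}


def get_converter(source_type: str, config: dict) -> SkillConverter:
    """Factory: pick the converter class registered for a source type."""
    try:
        converter_cls = CONVERTER_REGISTRY[source_type]
    except KeyError:
        raise ValueError(f"no converter registered for {source_type!r}") from None
    return converter_cls(config)
```

A caller only needs get_converter("echo", {"source": "intro.md"}).run(); it never references a concrete converter class.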

Adaptors

SkillAdaptor ABC with 3 abstract methods. Two-level hierarchy: direct subclasses (Claude, Gemini, OpenAI, etc.) and OpenAICompatibleAdaptor intermediate (MiniMax, Kimi, DeepSeek, Qwen, etc.).
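
The two-level hierarchy might look roughly like this. The document does not name the three abstract methods, so format_skill, package, and upload are hypothetical, as are the PLATFORM attribute and the return values; upload is a stub that performs no network call.

```python
from abc import ABC, abstractmethod


class SkillAdaptor(ABC):
    """Base contract. The three method names are assumptions."""

    @abstractmethod
    def format_skill(self, skill: dict) -> str: ...

    @abstractmethod
    def package(self, skill: dict) -> dict: ...

    @abstractmethod
    def upload(self, package: dict) -> str: ...


class ClaudeAdaptor(SkillAdaptor):
    """Level 1: a direct subclass implements all three methods itself."""

    def format_skill(self, skill: dict) -> str:
        return f"# {skill['name']}"

    def package(self, skill: dict) -> dict:
        return {"platform": "claude", "body": self.format_skill(skill)}

    def upload(self, package: dict) -> str:
        return f"uploaded to {package['platform']}"  # stub, no network


class OpenAICompatibleAdaptor(SkillAdaptor):
    """Intermediate class: shared logic for OpenAI-compatible platforms."""

    PLATFORM = "openai-compatible"  # overridden by each concrete platform

    def format_skill(self, skill: dict) -> str:
        return f"# {skill['name']}"

    def package(self, skill: dict) -> dict:
        return {"platform": self.PLATFORM, "body": self.format_skill(skill)}

    def upload(self, package: dict) -> str:
        return f"uploaded to {package['platform']}"  # stub, no network


class DeepSeekAdaptor(OpenAICompatibleAdaptor):
    """Level 2: only the platform-specific detail changes."""

    PLATFORM = "deepseek"
```

The intermediate class is what keeps each OpenAI-compatible platform down to a few lines of overrides.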

C3.x Analysis Pipeline

UnifiedCodebaseAnalyzer orchestrates: CodeAnalyzer (AST, 9 languages), PatternRecognizer (10 GoF detectors), TestExampleExtractor, HowToGuideBuilder, ConfigExtractor, and more.

Enhancement

Two enhancement hierarchies: AIEnhancer (API mode, multi-provider via AgentClient) and UnifiedEnhancer (C3.x pipeline enhancers). WorkflowEngine orchestrates multi-stage enhancement workflows.

Packaging

PackageSkill delegates to adaptors for format-specific packaging. UploadSkill handles platform API uploads. InstallSkill/InstallAgent install to AI agent directories.

MCP Server

SkillSeekerMCPServer (FastMCP) with 40 tools in 10 categories. Supporting: SourceManager, AgentDetector, GitConfigRepo, MarketplacePublisher, ConfigPublisher.

Sync

SyncMonitor schedules periodic checks via ChangeDetector (SHA-256 hashing, HTTP headers, content diffing). Notifier sends alerts when changes are found.
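
As an illustration of two of the checks named above (HTTP headers and SHA-256 hashing), here is a minimal sketch. The function names, the choice of ETag and Last-Modified as validators, and the order of the checks are assumptions; the real ChangeDetector may combine them differently. Header lookup here is case-sensitive for simplicity.

```python
import hashlib
from typing import Mapping, Optional


def content_hash(content: bytes) -> str:
    """SHA-256 hex digest of the page body."""
    return hashlib.sha256(content).hexdigest()


def headers_changed(
    old: Mapping[str, str], new: Mapping[str, str]
) -> Optional[bool]:
    """Compare HTTP validators; None means the headers cannot decide."""
    for name in ("ETag", "Last-Modified"):
        if name in old and name in new:
            return old[name] != new[name]
    return None


def has_changed(
    old_hash: str,
    old_headers: Mapping[str, str],
    new_headers: Mapping[str, str],
    new_content: bytes,
) -> bool:
    """Cheap header check first, then fall back to hashing the body."""
    if headers_changed(old_headers, new_headers) is False:
        return False  # matching validator: treat the page as unchanged
    # Headers differ or are missing: confirm with the content hash.
    return content_hash(new_content) != old_hash
```

Checking headers first avoids hashing bodies that the server already reports as unchanged.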

Utility Modules

Parsers

SubcommandParser ABC with 18 subclasses. All source types route through CreateParser.

Storage

BaseStorageAdaptor ABC with S3, GCS, Azure implementations.

Embedding

EmbeddingGenerator (multi-provider: OpenAI, Sentence Transformers, Voyage AI). EmbeddingPipeline coordinates provider, caching, and cost tracking.
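
One way to picture the coordination of provider, caching, and cost tracking is the sketch below. It is an assumption-heavy illustration: the constructor signature, the counters, and fake_provider are hypothetical, and character counts stand in for whatever unit the real pipeline bills by.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingPipeline:
    """Wraps a provider call with a content-addressed cache and usage counters."""

    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self._embed_fn = embed_fn  # provider call, injected
        self._cache: Dict[str, List[float]] = {}
        self.provider_calls = 0  # simple stand-ins for cost tracking
        self.chars_sent = 0

    def embed(self, text: str) -> List[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            # Cache miss: this is the only path that costs anything.
            self._cache[key] = self._embed_fn(text)
            self.provider_calls += 1
            self.chars_sent += len(text)
        return self._cache[key]


def fake_provider(text: str) -> List[float]:
    """Hypothetical provider: a deterministic two-number 'embedding'."""
    return [float(len(text)), float(text.count(" "))]
```

Because the provider is injected as a callable, swapping providers does not touch the caching or accounting logic.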

Benchmark

BenchmarkRunner orchestrates Benchmark instances. BenchmarkResult collects timings/memory/metrics and produces reports.
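
The relationship between the three classes can be sketched with the standard library's time.perf_counter and tracemalloc. The class names come from the text; every field, method, and the report format are assumptions made for illustration.

```python
import time
import tracemalloc
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class Benchmark:
    """One named unit of work to measure (fields are hypothetical)."""
    name: str
    func: Callable[[], object]


@dataclass
class BenchmarkResult:
    """Collects per-benchmark timings and peak memory, and renders a report."""
    timings: Dict[str, float] = field(default_factory=dict)
    peak_memory: Dict[str, int] = field(default_factory=dict)

    def report(self) -> str:
        return "\n".join(
            f"{name}: {self.timings[name]:.6f}s, peak {self.peak_memory[name]} B"
            for name in self.timings
        )


class BenchmarkRunner:
    """Runs each Benchmark in order and records its measurements."""

    def __init__(self, benchmarks: Iterable[Benchmark]):
        self.benchmarks = list(benchmarks)

    def run(self) -> BenchmarkResult:
        result = BenchmarkResult()
        for bench in self.benchmarks:
            tracemalloc.start()
            start = time.perf_counter()
            try:
                bench.func()
            finally:
                elapsed = time.perf_counter() - start
                _, peak = tracemalloc.get_traced_memory()
                tracemalloc.stop()
            result.timings[bench.name] = elapsed
            result.peak_memory[bench.name] = peak
        return result
```

A run such as BenchmarkRunner([Benchmark("sum", lambda: sum(range(1000)))]).run().report() yields one line per benchmark; the exact numbers vary by machine.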

Utilities

16 shared helper classes: LanguageDetector, MarkdownCleaner, RAGChunker, RateLimitHandler, ConfigValidator, and more.

Behavioral Diagrams

Create Pipeline Sequence

CreateCommand pipeline: SourceDetector.detect() → ExecutionContext.initialize() → get_converter() → converter.run() → _run_enhancement() → _run_workflows(). Enhancement centralized in CreateCommand.

GitHub Unified Flow + C3.x

UnifiedScraper orchestrates GitHub scraping (3-stream fetch) then delegates to analyze_codebase() for all 5 C3.x stages.

Source Auto-Detection

Activity diagram showing source_detector.py decision tree: file extension → video URL → directory (Codebase) → GitHub pattern → HTTP URL → bare domain inference.
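
The decision order above can be sketched as a chain of checks. Only the order is taken from the text; the extension table, the video host list, both regular expressions, and the returned type names are assumptions, and the real source_detector.py may differ in every detail.

```python
import os
import re
from urllib.parse import urlparse

# Assumed examples only; the real tables are not listed in this document.
FILE_TYPES = {".pdf": "pdf", ".docx": "word", ".epub": "epub"}
VIDEO_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be", "vimeo.com"}
# github.com/owner/repo (scheme optional) or the owner/repo shorthand.
GITHUB_RE = re.compile(r"^(?:(?:https?://)?github\.com/)?[\w-]+/[\w.-]+/?$")
BARE_DOMAIN_RE = re.compile(r"^[\w-]+(?:\.[\w-]+)+(?:/\S*)?$")


def detect_source_type(source: str) -> str:
    """Apply the checks in the documented order; first match wins."""
    ext = os.path.splitext(source.lower())[1]
    if ext in FILE_TYPES:                          # 1. file extension
        return FILE_TYPES[ext]
    if urlparse(source).netloc.lower() in VIDEO_HOSTS:  # 2. video URL
        return "video"
    if os.path.isdir(source):                      # 3. local directory
        return "codebase"
    if GITHUB_RE.match(source):                    # 4. GitHub pattern
        return "github"
    if source.startswith(("http://", "https://")):  # 5. generic HTTP URL
        return "web"
    if BARE_DOMAIN_RE.match(source):               # 6. bare domain inference
        return "web"
    raise ValueError(f"cannot detect source type for {source!r}")
```

The order matters: checking for a directory before the GitHub pattern means an existing local path shaped like owner/repo is treated as a codebase rather than a repository.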

MCP Tool Invocation

MCP Client → FastMCPServer (stdio/HTTP) with two paths: Path A (scraping) uses get_converter() in-process, Path B (packaging/config) uses direct Python imports.

Enhancement Pipeline

--enhance-level decision flow: Level 0 skips AI, Level 1+ selects API/local mode via AgentClient, Level 2+ enables architecture enhancement, Level 3 adds patterns and tests.
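
The level thresholds above can be written down as a small planning function. The thresholds follow the text; EnhancementPlan, plan_enhancement, and the rule that API mode is chosen when an API key is available are assumptions, since the document does not say how AgentClient picks a mode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnhancementPlan:
    """Hypothetical summary of what a given --enhance-level turns on."""
    use_ai: bool
    mode: Optional[str]          # "api" or "local"; None when AI is skipped
    architecture: bool
    patterns_and_tests: bool


def plan_enhancement(level: int, api_key_available: bool) -> EnhancementPlan:
    if not 0 <= level <= 3:
        raise ValueError(f"enhance level must be 0-3, got {level}")
    if level == 0:
        # Level 0: skip AI entirely.
        return EnhancementPlan(False, None, False, False)
    # Level 1+: pick a mode (assumed rule: API if a key exists, else local).
    mode = "api" if api_key_available else "local"
    return EnhancementPlan(
        use_ai=True,
        mode=mode,
        architecture=level >= 2,        # Level 2+: architecture enhancement
        patterns_and_tests=level >= 3,  # Level 3: patterns and tests
    )
```

Each level is a superset of the one below it, so the flags are plain threshold comparisons rather than a per-level lookup.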

Runtime Components

Component diagram with runtime dependencies. CLI Core → Scrapers → Codebase Analysis → Enhancement. MCP Server reaches Scrapers via get_converter(). Optional Browser Renderer (Playwright) for SPA sites.

Browser Rendering Flow

When the --browser flag is set, DocScraper delegates to BrowserRenderer.render_page() instead of requests.get(). The renderer auto-installs Chromium, navigates to the page, and waits for JavaScript to execute.

Key Design Patterns

Strategy + Factory: Adaptors, Storage, Embedding

Template Method + Factory: Scrapers (18 converters)

Singleton: Configuration (ExecutionContext)

Command: CLI Dispatch

Template Method: Pattern Detection, Parsers