Architecture Documentation
21 UML diagrams created with StarUML, synced from source code
Table of Contents
Core Modules (9)
Core Modules
Package Overview
Layered architecture with 8 core modules and 5 utility modules. Core: CLICore, Scrapers, Adaptors, Analysis, Enhancement, Packaging, MCP, Sync.
CLI Core
CLIDispatcher maps subcommands via COMMAND_MODULES. CreateCommand auto-detects source type via SourceDetector, initializes ExecutionContext singleton, then calls get_converter() → converter.run().
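A minimal sketch of table-driven dispatch, assuming COMMAND_MODULES maps a subcommand name to a module path and a callable name. The real shape of the mapping is not shown in this document; the `dispatch` helper and the entries below are hypothetical, with standard-library modules standing in for command modules so the sketch runs as-is.

```python
import importlib

# Hypothetical table: subcommand -> (module path, callable name).
# Standard-library modules stand in for the project's command modules.
COMMAND_MODULES = {
    "sqrt": ("math", "sqrt"),
    "dumps": ("json", "dumps"),
}


def dispatch(subcommand, *args):
    """Look up the subcommand, import its module lazily, and call the entry point."""
    try:
        module_name, attr = COMMAND_MODULES[subcommand]
    except KeyError:
        raise ValueError(f"unknown subcommand: {subcommand}") from None
    module = importlib.import_module(module_name)  # imported only when needed
    return getattr(module, attr)(*args)
```

Importing lazily keeps startup cheap: only the module for the requested subcommand is loaded.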
Scrapers
18 converter classes inherit from the SkillConverter base class (Template Method: run() → extract() → build_skill()). Factory: get_converter(source_type, config) looks up the class in CONVERTER_REGISTRY.
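The Template Method and registry factory described above can be sketched as follows. This is an illustration under assumptions, not the project's code: the `MarkdownConverter` subclass, the config keys, and the idea that build_skill() receives the output of extract() are all hypothetical.

```python
from abc import ABC, abstractmethod


class SkillConverter(ABC):
    """Base class: run() fixes the order of steps; subclasses fill in extract()."""

    def __init__(self, config):
        self.config = config

    def run(self):
        # Template Method: the sequence is fixed here, not in subclasses.
        data = self.extract()
        return self.build_skill(data)

    @abstractmethod
    def extract(self):
        """Return the raw content for this source type."""

    def build_skill(self, data):
        # Assumed shared default; the real method may differ.
        return {"name": self.config["name"], "pages": data}


class MarkdownConverter(SkillConverter):  # hypothetical subclass
    def extract(self):
        return [ln for ln in self.config["text"].splitlines() if ln.strip()]


CONVERTER_REGISTRY = {"markdown": MarkdownConverter}


def get_converter(source_type, config):
    """Factory: resolve the converter class by source type."""
    try:
        cls = CONVERTER_REGISTRY[source_type]
    except KeyError:
        raise ValueError(f"no converter for source type: {source_type}") from None
    return cls(config)
```

Usage: `get_converter("markdown", {"name": "demo", "text": "a\n\nb"}).run()` walks extract() and then build_skill() without the caller knowing the concrete class.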
Adaptors
SkillAdaptor ABC with 3 abstract methods. Two-level hierarchy: direct subclasses (Claude, Gemini, OpenAI, etc.) and OpenAICompatibleAdaptor intermediate (MiniMax, Kimi, DeepSeek, Qwen, etc.).
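A sketch of the two-level hierarchy, assuming the intermediate class implements the abstract methods once and provider subclasses only override class attributes. The three method names (`format_skill`, `package`, `upload`), the `platform` and `base_url` attributes, and the placeholder URLs are hypothetical; the document does not name the actual abstract methods.

```python
from abc import ABC, abstractmethod


class SkillAdaptor(ABC):
    """Three abstract methods; the names here are illustrative only."""

    @abstractmethod
    def format_skill(self, skill): ...

    @abstractmethod
    def package(self, skill): ...

    @abstractmethod
    def upload(self, payload): ...


class OpenAICompatibleAdaptor(SkillAdaptor):
    """Intermediate level: shared behavior for providers with a compatible API."""

    platform = "openai-compatible"
    base_url = "https://example.invalid/v1"  # placeholder, not a real endpoint

    def format_skill(self, skill):
        return f"# {skill['name']}\n\n{skill['body']}"

    def package(self, skill):
        return self.format_skill(skill).encode("utf-8")

    def upload(self, payload):
        # No network call in this sketch; it only describes what would be sent.
        return f"{self.platform}: {len(payload)} bytes -> {self.base_url}"


class DeepSeekAdaptor(OpenAICompatibleAdaptor):
    """Provider subclass: only the attributes differ."""

    platform = "deepseek"
    base_url = "https://deepseek.example.invalid/v1"  # placeholder
```

The intermediate class is what keeps each provider subclass down to a few lines of configuration.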
C3.x Analysis Pipeline
UnifiedCodebaseAnalyzer orchestrates: CodeAnalyzer (AST, 9 languages), PatternRecognizer (10 GoF detectors), TestExampleExtractor, HowToGuideBuilder, ConfigExtractor, and more.
Enhancement
Two enhancement hierarchies: AIEnhancer (API mode, multi-provider via AgentClient) and UnifiedEnhancer (C3.x pipeline enhancers). WorkflowEngine orchestrates multi-stage enhancement workflows.
Packaging
PackageSkill delegates to adaptors for format-specific packaging. UploadSkill handles platform API uploads. InstallSkill/InstallAgent install to AI agent directories.
MCP Server
SkillSeekerMCPServer (FastMCP) with 40 tools in 10 categories. Supporting: SourceManager, AgentDetector, GitConfigRepo, MarketplacePublisher, ConfigPublisher.
Sync
SyncMonitor schedules periodic checks via ChangeDetector (SHA-256 hashing, HTTP headers, content diffing). Notifier sends alerts when changes are found.
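A minimal sketch of the hash-then-diff part of change detection, using only the standard library. The function names are hypothetical and the HTTP header checks are left out; how ChangeDetector combines its three signals is not specified here.

```python
import difflib
import hashlib


def content_hash(text):
    """SHA-256 digest of the page content, as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_change(old, new):
    """Return None when the hashes match, otherwise a unified diff as a list of lines."""
    if content_hash(old) == content_hash(new):
        return None  # cheap path: identical content, no diff needed
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )
```

Comparing digests first means the more expensive diff only runs when something actually changed, and only the digest needs to be stored between checks.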
Utility Modules
Parsers
SubcommandParser ABC with 18 subclasses. All source types route through CreateParser.
Storage
BaseStorageAdaptor ABC with S3, GCS, Azure implementations.
Embedding
EmbeddingGenerator (multi-provider: OpenAI, Sentence Transformers, Voyage AI). EmbeddingPipeline coordinates provider, caching, and cost tracking.
Benchmark
BenchmarkRunner orchestrates Benchmark instances. BenchmarkResult collects timings/memory/metrics and produces reports.
Utilities
16 shared helper classes: LanguageDetector, MarkdownCleaner, RAGChunker, RateLimitHandler, ConfigValidator, and more.
Behavioral Diagrams
Create Pipeline Sequence
CreateCommand pipeline: SourceDetector.detect() → ExecutionContext.initialize() → get_converter() → converter.run() → _run_enhancement() → _run_workflows(). Enhancement centralized in CreateCommand.
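The call order above can be sketched as follows. Everything beyond the step names is an assumption: the `execute` and `_detect` methods, the `EchoConverter` stand-in, the settings keys, and the "first call wins" behavior of ExecutionContext.initialize() are hypothetical.

```python
class ExecutionContext:
    """Process-wide settings holder (singleton)."""

    _instance = None

    def __init__(self, settings):
        self.settings = settings

    @classmethod
    def initialize(cls, **settings):
        # Assumption: the first call creates the instance; later calls return it.
        if cls._instance is None:
            cls._instance = cls(dict(settings))
        return cls._instance


class EchoConverter:
    """Hypothetical stand-in for whatever get_converter() returns."""

    def __init__(self, source):
        self.source = source

    def run(self):
        return {"source": self.source, "stages": ["convert"]}


class CreateCommand:
    def __init__(self, source, enhance_level=0):
        self.source = source
        self.enhance_level = enhance_level

    def _detect(self):
        # Stand-in for SourceDetector.detect().
        return "web" if self.source.startswith("http") else "local"

    def execute(self):
        source_type = self._detect()
        ExecutionContext.initialize(
            source_type=source_type, enhance_level=self.enhance_level
        )
        skill = EchoConverter(self.source).run()  # get_converter() -> converter.run()
        self._run_enhancement(skill)  # enhancement lives here, not in converters
        self._run_workflows(skill)
        return skill

    def _run_enhancement(self, skill):
        ctx = ExecutionContext.initialize()  # returns the existing instance
        if ctx.settings["enhance_level"] > 0:
            skill["stages"].append("enhance")

    def _run_workflows(self, skill):
        skill["stages"].append("workflows")
```

Because enhancement runs in CreateCommand after converter.run(), a converter only has to produce the skill; it never needs to know the enhancement level.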
GitHub Unified Flow + C3.x
UnifiedScraper orchestrates GitHub scraping (3-stream fetch) then delegates to analyze_codebase() for all 5 C3.x stages.
Source Auto-Detection
Activity diagram showing source_detector.py decision tree: file extension → video URL → directory (Codebase) → GitHub pattern → HTTP URL → bare domain inference.
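The decision tree can be sketched as an ordered chain of checks. This is not the contents of source_detector.py: the extension table, the video host list, the regular expressions, and the returned type labels are assumptions chosen to illustrate the ordering.

```python
import os
import re
from urllib.parse import urlparse

# Hypothetical lookup tables; the real detector's lists are not shown here.
FILE_TYPES = {".pdf": "pdf", ".docx": "word", ".epub": "epub"}
VIDEO_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be", "vimeo.com"}
GITHUB_RE = re.compile(r"^(?:https?://github\.com/)?[A-Za-z0-9-]+/[\w.-]+/?$")
DOMAIN_RE = re.compile(r"^[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+(?:/\S*)?$")


def detect(source):
    """Return (source_type, location); the first matching check wins."""
    ext = os.path.splitext(source)[1].lower()
    if ext in FILE_TYPES:  # 1. file extension
        return FILE_TYPES[ext], source
    if urlparse(source).hostname in VIDEO_HOSTS:  # 2. video URL
        return "video", source
    if os.path.isdir(source):  # 3. local directory -> codebase
        return "codebase", source
    if GITHUB_RE.match(source):  # 4. owner/repo or github.com URL
        return "github", source
    if source.startswith(("http://", "https://")):  # 5. any other HTTP URL
        return "web", source
    if DOMAIN_RE.match(source):  # 6. bare domain: infer the scheme
        return "web", "https://" + source
    raise ValueError(f"cannot detect source type: {source!r}")
```

The order matters: checking the directory before the GitHub pattern keeps an existing local path such as `src/lib` from being read as an `owner/repo` shorthand.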
MCP Tool Invocation
MCP Client → FastMCPServer (stdio/HTTP) with two paths: Path A (scraping) uses get_converter() in-process, Path B (packaging/config) uses direct Python imports.
Enhancement Pipeline
--enhance-level decision flow: Level 0 skips AI, Level 1+ selects API/local mode via AgentClient, Level 2+ enables architecture enhancement, Level 3 adds patterns and tests.
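The level thresholds above can be sketched as a small planning function. The function name, the stage labels, the 0 to 3 range check, and the rule that an API key selects API mode are assumptions; the document only states that AgentClient chooses between API and local mode.

```python
def plan_enhancement(level, api_key=None):
    """Map an --enhance-level value to a mode and an ordered list of stages."""
    if not 0 <= level <= 3:
        raise ValueError(f"enhance level must be 0-3, got {level}")
    if level == 0:
        return {"mode": None, "stages": []}  # Level 0: skip AI entirely

    # Assumption: an API key selects API mode, otherwise local mode.
    mode = "api" if api_key else "local"
    stages = ["base"]  # hypothetical label for the Level 1 work
    if level >= 2:
        stages.append("architecture")
    if level >= 3:
        stages += ["patterns", "tests"]
    return {"mode": mode, "stages": stages}
```

Each level is a superset of the one below it, so the thresholds read as cumulative `>=` checks rather than a per-level table.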
Runtime Components
Component diagram with runtime dependencies. CLI Core → Scrapers → Codebase Analysis → Enhancement. MCP Server reaches Scrapers via get_converter(). Optional Browser Renderer (Playwright) for SPA sites.
Browser Rendering Flow
When the --browser flag is set, DocScraper delegates to BrowserRenderer.render_page() instead of requests.get(). The renderer auto-installs Chromium, navigates to the page, and waits for JavaScript execution.
Key Design Patterns
Adaptors, Storage, Embedding
Scrapers (18 converters)
Configuration (ExecutionContext)
CLI Dispatch
Pattern Detection, Parsers