github - Repository Scraping

Scrape GitHub repositories and analyze code with deep AST parsing.

Basic Usage

skill-seekers github [OPTIONS]

Quick Examples

# Basic repository scraping
skill-seekers github --repo facebook/react

# Using a config file
skill-seekers github --config configs/react_github.json

# With authentication (higher rate limits)
export GITHUB_TOKEN=ghp_your_token_here
skill-seekers github --repo facebook/react

# Include issues and changelog
skill-seekers github --repo django/django \
    --include-issues \
    --max-issues 100 \
    --include-changelog \
    --include-releases

Options

Required (one of)

  • --repo OWNER/REPO - GitHub repository to scrape (e.g., facebook/react)
  • --config CONFIG - Path to a JSON config file, used instead of --repo

Optional

  • --include-issues - Extract GitHub Issues
  • --max-issues N - Maximum number of issues to extract (default: 50)
  • --include-changelog - Extract CHANGELOG.md
  • --include-releases - Extract GitHub Releases
  • --code-analysis-depth DEPTH - Analysis depth: surface, medium, deep
  • --output DIR - Output directory

Authentication

# Set GitHub token for higher rate limits
export GITHUB_TOKEN=ghp_your_token_here
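
Because the token is read from the environment, a wrapper script can check for it before starting a long run. The sketch below is a convenience built on assumptions, not part of skill-seekers: `check_github_token` is a hypothetical helper that only tests whether `GITHUB_TOKEN` is non-empty and does not validate the token against GitHub.

```shell
# Hypothetical helper (not part of skill-seekers): warn when GITHUB_TOKEN
# is missing so a long scrape does not start on unauthenticated rate limits.
check_github_token() {
    if [ -z "${GITHUB_TOKEN:-}" ]; then
        echo "GITHUB_TOKEN is not set; requests will use the lower unauthenticated rate limit" >&2
        return 1
    fi
    return 0
}

# Report the result; the scrape itself can still run without a token.
if check_github_token; then
    echo "token found"
else
    echo "no token"
fi
```

Calling the check at the top of a script makes a missing token visible immediately rather than partway through a scrape.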

Features

Code Analysis

  • Deep AST Parsing - Python, JavaScript, TypeScript, Java, C++, Go
  • API Extraction - Functions, classes, methods with parameters
  • Type Detection - Automatic type inference
  • Conflict Detection - Compares documentation against the code implementation

Repository Metadata

  • README - Automatic extraction
  • File Tree - Complete directory structure
  • Language Breakdown - By file count and bytes
  • Stars/Forks - Repository statistics

GitHub Features

  • Issues & PRs - Open/closed with labels
  • CHANGELOG - Version history
  • Releases - GitHub releases with notes
  • Milestones - Project planning

Analysis Depths

Surface (Fast - 1-2 min)

skill-seekers github --repo facebook/react --code-analysis-depth surface

Includes:
  • File structure
  • Import relationships
  • Entry points
  • Basic metadata

Medium (Standard - 5-10 min)

skill-seekers github --repo facebook/react --code-analysis-depth medium

Includes:
  • Everything from surface
  • Function/class signatures
  • API documentation
  • Common patterns

Deep (Comprehensive - 20-60 min)

skill-seekers github --repo facebook/react --code-analysis-depth deep

Includes:
  • Everything from medium
  • Design pattern detection (C3.1)
  • Test example extraction (C3.2)
  • How-to guide generation (C3.3)
  • Configuration analysis (C3.4)
  • Architectural patterns (C3.7)

Output Structure

output/
└── {repo-name}/
    ├── SKILL.md
    ├── references/
    │   ├── index.md
    │   ├── api_reference.md
    │   ├── code_examples.md
    │   └── github_issues.md
    └── c3_analysis_temp/    # C3.x analysis data
        ├── patterns/
        ├── test_examples/
        └── config_patterns/
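
After a run, a short script can confirm that the expected files exist. This is a minimal sketch: `check_skill_output` is a hypothetical helper, not a skill-seekers command. It checks only `SKILL.md` and three reference files from the tree above. It skips `github_issues.md` and `c3_analysis_temp/` on the assumption that they appear only when issues or deep analysis are enabled.

```shell
# Hypothetical helper (not part of skill-seekers): confirm that a finished
# run produced the core files shown in the output tree.
# Usage: check_skill_output output/react
check_skill_output() {
    dir="$1"
    missing=0
    for rel in SKILL.md \
               references/index.md \
               references/api_reference.md \
               references/code_examples.md; do
        if [ ! -f "$dir/$rel" ]; then
            # Report every missing file instead of stopping at the first one.
            echo "missing: $dir/$rel" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_skill_output output/react && echo "output looks complete"
```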

Advanced Examples

With All Features

export GITHUB_TOKEN=ghp_...

skill-seekers github --repo fastapi/fastapi \
    --code-analysis-depth deep \
    --include-issues \
    --max-issues 200 \
    --include-changelog \
    --include-releases \
    --output output/fastapi
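
The same invocation extends to several repositories with a loop. This is a sketch under assumptions: `run` and `DRY_RUN` are hypothetical names introduced here, the repository list is illustrative, and by default the script only prints each command so it can be reviewed before anything is scraped.

```shell
# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

for repo in fastapi/fastapi django/django; do
    name="${repo#*/}"    # strip the owner: fastapi/fastapi -> fastapi
    run skill-seekers github --repo "$repo" \
        --code-analysis-depth surface \
        --output "output/$name"
done
```

Setting `DRY_RUN=0` in the environment makes the wrapper execute each command instead of printing it.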

Config File

{
  "name": "fastapi",
  "type": "github",
  "repo": "fastapi/fastapi",
  "include_code": true,
  "code_analysis_depth": "deep",
  "include_issues": true,
  "max_issues": 100,
  "include_changelog": true,
  "include_releases": true
}
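
A config like the one above can also be generated from a script, which helps when several repositories share the same settings. The sketch below is illustrative: `write_github_config` and the `CONFIG_DIR` variable are hypothetical, the `configs/<name>_github.json` file name simply mirrors the `configs/react_github.json` path used earlier, and only fields that appear in the example config are written.

```shell
# Hypothetical helper (not part of skill-seekers): write a config file with
# the fields from the example config.
# Usage: write_github_config OWNER/REPO [DEPTH]
write_github_config() {
    repo="$1"
    depth="${2:-deep}"                     # this helper's own default
    name="${repo#*/}"                      # fastapi/fastapi -> fastapi
    path="${CONFIG_DIR:-configs}/${name}_github.json"
    mkdir -p "$(dirname "$path")"
    cat > "$path" <<EOF
{
  "name": "$name",
  "type": "github",
  "repo": "$repo",
  "include_code": true,
  "code_analysis_depth": "$depth",
  "include_issues": true,
  "max_issues": 100,
  "include_changelog": true,
  "include_releases": true
}
EOF
    echo "$path"
}

config_path="$(write_github_config fastapi/fastapi deep)"
echo "wrote $config_path"
# Then run: skill-seekers github --config "$config_path"
```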

Time Estimates

  • Surface analysis: 1-2 minutes
  • Medium analysis: 5-10 minutes
  • Deep analysis (C3.x): 20-60 minutes
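
These estimates can guide the choice of depth when time is limited. As a rough sketch, the hypothetical helper `depth_for_budget` below picks the deepest level whose upper-bound estimate from the list above fits a budget given in minutes. The thresholds come from this page only, and actual run time will vary by repository.

```shell
# Hypothetical helper (not part of skill-seekers): map a time budget in
# minutes to the deepest analysis level whose upper estimate fits.
depth_for_budget() {
    minutes="$1"
    if [ "$minutes" -ge 60 ]; then
        echo deep        # upper estimate: 60 minutes
    elif [ "$minutes" -ge 10 ]; then
        echo medium      # upper estimate: 10 minutes
    else
        echo surface     # fastest level
    fi
}

depth="$(depth_for_budget 15)"
# Print the resulting command for review rather than running it.
echo "skill-seekers github --repo facebook/react --code-analysis-depth $depth"
```

With a 15-minute budget the helper selects `medium`, since a deep run may take up to 60 minutes.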

Next Steps