Configuration Schema Reference

Complete reference for the Skill Seekers unified configuration format introduced in v2.6.0.

Overview

The unified config format allows you to combine multiple sources (documentation, GitHub, PDFs) into a single AI skill with intelligent content merging.

Schema Version: v2.6.0 Format: JSON Backward Compatible: Yes (legacy configs still supported)

Top-Level Fields

Required Fields

`name` (string)

Required - Unique identifier for the config.

Must be lowercase alphanumeric with hyphens or underscores
Used as the skill filename
Pattern: ^[a-z0-9-_]+$

{
  "name": "react"
}

`description` (string)

Required - Human-readable description of what the skill covers.

Should explain when to use this skill
Appears in skill metadata and gallery
Recommended length: 1-2 sentences

{
  "description": "Complete React knowledge combining official documentation and React codebase. Use when building React applications or understanding React internals."
}

`sources` (array)

Required - Array of source configurations (at least 1 required).

Supports 3 source types: documentation, github, pdf
Multiple sources are merged intelligently
Order matters for merge conflict resolution

{
  "sources": [
    { "type": "documentation", "base_url": "..." },
    { "type": "github", "repo": "..." }
  ]
}

Optional Fields

`merge_mode` (string)

Optional - Content merging strategy when using multiple sources.

Values: "rule-based" (default) | "claude-enhanced"
rule-based: Deterministic merging based on categories
claude-enhanced: AI-powered intelligent merging

{
  "merge_mode": "rule-based"
}

Source Types

Documentation Source

Extract content from documentation websites.

Required Fields

type (string) - Must be "documentation"
base_url (string) - Base URL of documentation site (with http:// or https://)

Optional Fields

`extract_api` (boolean)

Default: false - Whether to extract API reference sections separately.

{
  "extract_api": true
}

`start_urls` (array of strings)

Specific URLs to start scraping from (bypasses automatic discovery).

{
  "start_urls": [
    "https://docs.example.com/getting-started/",
    "https://docs.example.com/api/"
  ]
}

`selectors` (object)

CSS selectors for extracting content.

Fields:

main_content (string) - Main documentation content
title (string) - Page title
code_blocks (string) - Code examples

{
  "selectors": {
    "main_content": "article.docs",
    "title": "h1",
    "code_blocks": "pre code"
  }
}

`url_patterns` (object)

Control which URLs to include/exclude.

Fields:

include (array of strings) - URL patterns to include
exclude (array of strings) - URL patterns to exclude

{
  "url_patterns": {
    "include": ["/docs/", "/api/", "/guide/"],
    "exclude": ["/blog/", "/changelog/"]
  }
}

`categories` (object)

Categorize pages for better organization and merging.

Keys: category names
Values: arrays of keywords/patterns

{
  "categories": {
    "getting_started": ["intro", "installation", "quickstart"],
    "api": ["api", "reference", "methods"],
    "guides": ["tutorial", "guide", "how-to"]
  }
}

`rate_limit` (number)

Delay between requests in seconds (prevents rate limiting).

{
  "rate_limit": 0.5
}

`max_pages` (number or null)

Optional - Maximum pages to scrape. Defaults to unlimited if not specified.

Omit field: Unlimited scraping (recommended)
null: Explicitly unlimited
-1: Explicitly unlimited
Number: Limit to specific page count

{
  // Option 1: Omit for unlimited (recommended)
  // "max_pages": not specified

  // Option 2: Explicit unlimited
  "max_pages": null

  // Option 3: Limit to specific count
  // "max_pages": 300
}

Note: Since v2.6.0, unlimited scraping is the default. Only specify max_pages if you need to limit pages for testing or rate-limit concerns.

Complete Documentation Source Example

{
  "type": "documentation",
  "base_url": "https://docs.astro.build/en/",
  "extract_api": true,
  "start_urls": [
    "https://docs.astro.build/en/getting-started/",
    "https://docs.astro.build/en/core-concepts/"
  ],
  "selectors": {
    "main_content": "article",
    "title": "h1",
    "code_blocks": "pre code"
  },
  "url_patterns": {
    "include": ["/en/getting-started/", "/en/core-concepts/", "/en/guides/"],
    "exclude": ["/en/community/", "/en/blog/"]
  },
  "categories": {
    "getting_started": ["getting-started", "install"],
    "core_concepts": ["project-structure", "components"],
    "integrations": ["integrations", "framework"]
  },
  "rate_limit": 0.5
}

GitHub Source

Extract code, issues, and repository metadata from GitHub.

Required Fields

type (string) - Must be "github"
repo (string) - Repository in format owner/repo

Optional Fields

`enable_codebase_analysis` (boolean)

Default: false - Enable C3.x codebase analysis with AST parsing.

{
  "enable_codebase_analysis": true
}

`code_analysis_depth` (string)

Depth of code analysis when C3.x is enabled.

Values: "surface" | "deep" | "full"
surface: File structure, exports, imports
deep: Function signatures, class definitions
full: Complete AST analysis

{
  "code_analysis_depth": "deep"
}

`fetch_issues` (boolean)

Default: false - Include GitHub issues in the skill.

{
  "fetch_issues": true
}

`max_issues` (number)

Maximum number of issues to fetch (requires fetch_issues: true).

{
  "max_issues": 100
}

`fetch_changelog` (boolean)

Default: false - Extract CHANGELOG.md if it exists.

{
  "fetch_changelog": true
}

`fetch_releases` (boolean)

Default: false - Include GitHub releases.

{
  "fetch_releases": true
}

`file_patterns` (array of strings)

Glob patterns for files to analyze (requires codebase analysis).

{
  "file_patterns": [
    "src/**/*.ts",
    "packages/**/*.ts",
    "core/**/*.js"
  ]
}

`ai_mode` (string)

AI enhancement mode for C3.x analysis.

Values: "auto" | "api" | "local" | "none"
auto: Detect best mode automatically
api: Use Claude API for enhancement
local: Use local models
none: Skip AI enhancement

{
  "ai_mode": "auto"
}

Complete GitHub Source Example

{
  "type": "github",
  "repo": "withastro/astro",
  "enable_codebase_analysis": true,
  "code_analysis_depth": "deep",
  "fetch_issues": true,
  "max_issues": 100,
  "fetch_changelog": true,
  "fetch_releases": true,
  "file_patterns": [
    "packages/astro/src/**/*.ts",
    "packages/integrations/**/*.ts"
  ],
  "ai_mode": "auto"
}

PDF Source

Extract content from PDF documents.

Required Fields

type (string) - Must be "pdf"
path (string) - Path to PDF file (local or remote URL)

Optional Fields

`ocr` (boolean)

Default: false - Enable OCR for scanned PDFs.

{
  "ocr": true
}

`password` (string)

Password for encrypted PDFs.

{
  "password": "secret123"
}

`extract_tables` (boolean)

Default: false - Extract tables as structured data.

{
  "extract_tables": true
}

`parallel` (boolean)

Default: false - Process pages in parallel for faster extraction.

{
  "parallel": true
}

Complete PDF Source Example

{
  "type": "pdf",
  "path": "/path/to/manual.pdf",
  "ocr": true,
  "extract_tables": true,
  "parallel": true
}

Complete Examples

Single Source (Documentation Only)

{
  "name": "vue",
  "description": "Vue.js 3 documentation for building reactive web applications.",
  "sources": [
    {
      "type": "documentation",
      "base_url": "https://vuejs.org/guide/",
      "selectors": {
        "main_content": "article",
        "title": "h1",
        "code_blocks": "pre code"
      },
      "rate_limit": 0.5
    }
  ]
}

Multi-Source (Docs + GitHub)

{
  "name": "fastapi",
  "description": "Complete FastAPI knowledge combining documentation and codebase analysis. Use for building high-performance Python APIs.",
  "merge_mode": "rule-based",
  "sources": [
    {
      "type": "documentation",
      "base_url": "https://fastapi.tiangolo.com/",
      "extract_api": true,
      "selectors": {
        "main_content": ".md-content",
        "title": "h1",
        "code_blocks": "pre code"
      },
      "categories": {
        "getting_started": ["tutorial", "first-steps"],
        "advanced": ["advanced", "security", "database"],
        "deployment": ["deployment", "docker", "server"]
      },
      "rate_limit": 0.5
    },
    {
      "type": "github",
      "repo": "tiangolo/fastapi",
      "enable_codebase_analysis": true,
      "code_analysis_depth": "deep",
      "fetch_issues": true,
      "max_issues": 50,
      "file_patterns": [
        "fastapi/**/*.py"
      ]
    }
  ]
}

Multi-Source (Docs + GitHub + PDF)

{
  "name": "enterprise-system",
  "description": "Complete enterprise system documentation from multiple sources.",
  "merge_mode": "claude-enhanced",
  "sources": [
    {
      "type": "documentation",
      "base_url": "https://docs.enterprise.com/",
      "rate_limit": 1.0
    },
    {
      "type": "github",
      "repo": "company/enterprise-system",
      "enable_codebase_analysis": true,
      "code_analysis_depth": "full",
      "file_patterns": ["src/**/*.ts", "api/**/*.ts"]
    },
    {
      "type": "pdf",
      "path": "/docs/enterprise-manual-v2.pdf",
      "extract_tables": true,
      "parallel": true
    }
  ]
}

Validation

Using the CLI

# Validate a config file
skill-seekers validate configs/my-config.json

# Convert legacy config to unified format
skill-seekers convert configs/legacy-config.json

Using the Web Validator

Visit skillseekersweb.com/configs and scroll to the “Validate Your Config” section:

Paste your JSON config
Click “Validate Config”
Fix any errors
Submit to GitHub when valid

Legacy Config Support

Skill Seekers v2.6.0+ still supports legacy configs (single-source format).

Legacy format:

{
  "name": "example",
  "description": "Example docs",
  "base_url": "https://docs.example.com",
  "selectors": { "main_content": "article" }
}

Automatically converts to:

{
  "name": "example",
  "description": "Example docs",
  "sources": [
    {
      "type": "documentation",
      "base_url": "https://docs.example.com",
      "selectors": { "main_content": "article" }
    }
  ]
}

Best Practices

1. Naming Conventions

Use lowercase with hyphens: fast-api, react-router
Match the official framework name when possible
Be descriptive: godot-game-engine not just godot

2. Description Guidelines

Explain what knowledge the skill covers
Include when to use the skill
Keep it concise (1-2 sentences)
Use action verbs: “Use when building…”, “Helps with…“

3. Multi-Source Configs

Order sources by authority (official docs first, then GitHub)
Use merge_mode: "rule-based" for deterministic results
Use merge_mode: "claude-enhanced" for complex merging
Define clear categories for better merging

4. Rate Limiting

Start with rate_limit: 0.5 (500ms delay)
Increase if you get rate-limited
Official docs: 0.5-1.0 seconds
Community sites: 1.0-2.0 seconds

5. Page Limits

Default: Unlimited scraping (recommended for complete documentation)
When to limit: Testing configs, respecting aggressive rate limits
How to limit: Set "max_pages": 100 for specific page count
Unlimited modes: Omit field, use null, or use -1

6. GitHub Codebase Analysis

Use code_analysis_depth: "deep" for most cases
Use "full" only for critical framework analysis
Limit file_patterns to relevant directories
Set max_issues to avoid overwhelming content

Migration from Legacy Configs

If you have legacy configs (pre-v2.6.0), you can:

Option 1: Automatic Conversion

skill-seekers convert configs/legacy-config.json > configs/new-config.json

Option 2: Manual Migration

Old (legacy):

{
  "name": "django",
  "base_url": "https://docs.djangoproject.com/en/stable/",
  "selectors": { "main_content": "div.document" }
}

New (unified):

{
  "name": "django",
  "description": "Django web framework documentation",
  "sources": [
    {
      "type": "documentation",
      "base_url": "https://docs.djangoproject.com/en/stable/",
      "selectors": { "main_content": "div.document" }
    }
  ]
}

Configuration Schema Reference

Overview

Top-Level Fields

Required Fields

name (string)

description (string)

sources (array)

Optional Fields

merge_mode (string)

Source Types

Documentation Source

Required Fields

Optional Fields

extract_api (boolean)

start_urls (array of strings)

selectors (object)

url_patterns (object)

categories (object)

rate_limit (number)

max_pages (number or null)

Complete Documentation Source Example

GitHub Source

Required Fields

Optional Fields

enable_codebase_analysis (boolean)

code_analysis_depth (string)

fetch_issues (boolean)

max_issues (number)

fetch_changelog (boolean)

fetch_releases (boolean)

file_patterns (array of strings)

ai_mode (string)

Complete GitHub Source Example

PDF Source

Required Fields

Optional Fields

ocr (boolean)

password (string)

extract_tables (boolean)

parallel (boolean)

Complete PDF Source Example

Complete Examples

Single Source (Documentation Only)

Multi-Source (Docs + GitHub)

Multi-Source (Docs + GitHub + PDF)

Validation

Using the CLI

Using the Web Validator

Legacy Config Support

Best Practices

1. Naming Conventions

2. Description Guidelines

3. Multi-Source Configs

4. Rate Limiting

5. Page Limits

6. GitHub Codebase Analysis

Migration from Legacy Configs

Option 1: Automatic Conversion

Option 2: Manual Migration

See Also

`name` (string)

`description` (string)

`sources` (array)

`merge_mode` (string)

`extract_api` (boolean)

`start_urls` (array of strings)

`selectors` (object)

`url_patterns` (object)

`categories` (object)

`rate_limit` (number)

`max_pages` (number or null)

`enable_codebase_analysis` (boolean)

`code_analysis_depth` (string)

`fetch_issues` (boolean)

`max_issues` (number)

`fetch_changelog` (boolean)

`fetch_releases` (boolean)

`file_patterns` (array of strings)

`ai_mode` (string)

`ocr` (boolean)

`password` (string)

`extract_tables` (boolean)

`parallel` (boolean)