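Git-Based Config Sources
Store and share scraping configs in a private or team git repository.
Overview
Git-based config sources let you:
- Store configs in a git repository (private or team repository)
- Version-control configs (track changes, roll back, branch)
- Share configs with your team (centralized config management)
- Authenticate (HTTPS + token, SSH keys)
- Fetch updates automatically (pull the latest configs before scraping)
Version: v2.2.0+ (Git config source feature)
Quick Start
1. Add a Git Source
# Add a git repository as a config source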
skill-seekers add-git-source \
https://github.com/your-org/scraping-configs.git \
--name company-configs \
--branch main
# With authentication (private repository)
skill-seekers add-git-source \
https://github.com/your-org/private-configs.git \
--name private-configs \
--token ghp_yourPersonalAccessToken
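2. Use Configs from a Git Source
# Reference a config by source name + path
skill-seekers scrape \
--config git:company-configs:configs/react.json
# Or use the shorthand (auto-detected)
skill-seekers scrape --config company-configs:react.json
3. List and Manage Sources
# List all configured sources
skill-seekers list-git-sources
# Fetch the latest updates
skill-seekers fetch-git-sources
# Remove a source
skill-seekers remove-git-source company-configs
Adding Git Sources
HTTPS with a Token (recommended for private repositories)
# GitHub personal access token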
skill-seekers add-git-source \
https://github.com/your-org/configs.git \
--name my-configs \
--token ghp_abc123... \
--branch main
# GitLab personal access token
skill-seekers add-git-source \
https://gitlab.com/your-org/configs.git \
--name gitlab-configs \
--token glpat-abc123... \
--branch main
# Bitbucket app password
skill-seekers add-git-source \
https://bitbucket.org/your-org/configs.git \
--name bitbucket-configs \
--token ATBB...abc123 \
--branch main
SSH Keys (alternative)
# Use an SSH URL (requires SSH key setup)
skill-seekers add-git-source \
git@github.com:your-org/configs.git \
--name ssh-configs \
--branch main
# The SSH key is read automatically from ~/.ssh/id_rsa
Public Repositories (no authentication required)
# Public repository (no token needed)
skill-seekers add-git-source \
https://github.com/public-org/public-configs.git \
--name public-configs \
--branch main
Config Repository Structure
Recommended Layout
scraping-configs/
├── README.md
├── configs/
│ ├── frontend/
│ │ ├── react.json
│ │ ├── vue.json
│ │ └── angular.json
│ ├── backend/
│ │ ├── django.json
│ │ ├── fastapi.json
│ │ └── flask.json
│ ├── game-engines/
│ │ ├── godot.json
│ │ └── unity.json
│ └── internal/
│ ├── company-docs.json
│ └── api-docs.json
├── presets/
│ └── company-preset.json
└── .gitignore
Example Config File
configs/frontend/react.json:
{
"name": "react",
"description": "React framework documentation",
"base_url": "https://react.dev/",
"extract_api": true,
"max_pages": 200,
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code"
},
"categories": {
"getting_started": ["learn", "tutorial"],
"api": ["reference", "api"]
}
}
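Before committing a config like the one above to a shared repository, a quick sanity check can catch obvious mistakes. The following is a minimal sketch in Python, not the tool's own validation: the function name `check_config` and the choice of `name` and `base_url` as required keys are assumptions made for illustration.

```python
import json

# Assumption: these two keys are treated as the minimum a config needs.
REQUIRED_KEYS = ("name", "base_url")


def check_config(text: str) -> list[str]:
    """Return a list of problems found in a config; an empty list means it looks sane."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    # Only check the URL scheme when a base_url is actually present.
    if "base_url" in config and not str(config["base_url"]).startswith(("http://", "https://")):
        problems.append("base_url should start with http:// or https://")

    # max_pages is optional; when given it should be a positive integer (bool excluded).
    max_pages = config.get("max_pages")
    if max_pages is not None:
        if isinstance(max_pages, bool) or not isinstance(max_pages, int) or max_pages <= 0:
            problems.append("max_pages should be a positive integer")
    return problems
```

Running a check of this kind in CI on the config repository would keep broken JSON from reaching teammates, whatever rules you decide to enforce.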
Using Git Configs
Full Path Syntax
# Explicit syntax
skill-seekers scrape --config git:SOURCE_NAME:PATH/TO/CONFIG.json
Examples:
# React config from the company-configs source
skill-seekers scrape --config git:company-configs:configs/frontend/react.json
# Internal docs config
skill-seekers scrape --config git:company-configs:configs/internal/company-docs.json
Shorthand Syntax
# Auto-detects the git source
skill-seekers scrape --config SOURCE_NAME:PATH/TO/CONFIG.json
Examples:
# Same as git:company-configs:configs/frontend/react.json
skill-seekers scrape --config company-configs:configs/frontend/react.json
# Even shorter if the config is in the repository root
skill-seekers scrape --config company-configs:react.json
Relative Paths
# Relative to the configs/ directory
skill-seekers scrape --config company-configs:frontend/react.json
# Relative to the repository root
skill-seekers scrape --config company-configs:configs/frontend/react.json
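To make the two reference forms concrete, here is a small Python sketch that splits a `--config` value into a source name and a path. It is only an illustration of the syntax described above: `ConfigRef` and `parse_config_ref` are hypothetical names, and the real CLI may parse references differently (for example, to tell them apart from local file paths).

```python
from typing import NamedTuple


class ConfigRef(NamedTuple):
    source: str  # name given with --name when the source was added
    path: str    # path of the config inside the repository


def parse_config_ref(ref: str) -> ConfigRef:
    """Split 'git:SOURCE_NAME:PATH' or 'SOURCE_NAME:PATH' into its two parts."""
    # The explicit form only adds a 'git:' prefix to the shorthand form.
    body = ref[len("git:"):] if ref.startswith("git:") else ref
    source, sep, path = body.partition(":")
    if not sep or not source or not path:
        raise ValueError(f"expected [git:]SOURCE_NAME:PATH, got {ref!r}")
    return ConfigRef(source, path)
```

Both `git:company-configs:configs/frontend/react.json` and `company-configs:configs/frontend/react.json` yield the same `ConfigRef` under this sketch, which is the equivalence the shorthand relies on.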
Managing Git Sources
Listing Sources
# Show all configured sources
skill-seekers list-git-sources
# Output:
# Name: company-configs
# URL: https://github.com/your-org/scraping-configs.git
# Branch: main
# Status: ✅ Cloned, up-to-date
# Path: ~/.skill-seekers/git-sources/company-configs
#
# Name: gitlab-configs
# URL: https://gitlab.com/your-org/configs.git
# Branch: production
# Status: ⚠️ Behind remote by 3 commits
# Path: ~/.skill-seekers/git-sources/gitlab-configs
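Fetching Updates
# Fetch all sources
skill-seekers fetch-git-sources
# Fetch a specific source
skill-seekers fetch-git-sources company-configs
# Fetch automatically before scraping
skill-seekers scrape --config company-configs:react.json --fetch-sources
Removing Sources
# Remove a git source (keeps the local cache)
skill-seekers remove-git-source company-configs
# Remove the source and delete the local cache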
skill-seekers remove-git-source company-configs --delete-cache
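Authentication
GitHub Personal Access Token
Create a token:
- Go to https://github.com/settings/tokens
- Click Generate new token (classic)
- Select scopes: repo (for private repositories) or public_repo (for public repositories)
- Copy the token (starts with ghp_)
Add the source: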
skill-seekers add-git-source \
https://github.com/your-org/configs.git \
--name github-configs \
--token ghp_abc123...
GitLab Personal Access Token
Create a token:
- Go to https://gitlab.com/-/profile/personal_access_tokens
- Create a token with the read_repository scope
- Copy the token (starts with glpat-)
Add the source:
skill-seekers add-git-source \
https://gitlab.com/your-org/configs.git \
--name gitlab-configs \
--token glpat-abc123...
Bitbucket App Password
Create an app password:
- Go to https://bitbucket.org/account/settings/app-passwords/
- Create a password with the Repositories: Read permission
- Copy the password (starts with ATBB)
Add the source:
skill-seekers add-git-source \
https://bitbucket.org/your-org/configs.git \
--name bitbucket-configs \
--token ATBB...abc123
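Because each provider's credential has a recognizable prefix (`ghp_`, `glpat-`, `ATBB`, as listed above), pasting a token for the wrong host is easy to detect before running `add-git-source`. The Python sketch below illustrates that check; the function names are hypothetical, it is not part of Skill Seekers, and any credential with a prefix not listed above is simply reported as unknown.

```python
from typing import Optional
from urllib.parse import urlparse

# Prefixes and hosts taken from the three provider sections above.
TOKEN_PREFIXES = {"ghp_": "github", "glpat-": "gitlab", "ATBB": "bitbucket"}
KNOWN_HOSTS = {"github.com": "github", "gitlab.com": "gitlab", "bitbucket.org": "bitbucket"}


def provider_for_token(token: str) -> Optional[str]:
    """Guess the provider from the token prefix, or None if the prefix is not recognized."""
    for prefix, provider in TOKEN_PREFIXES.items():
        if token.startswith(prefix):
            return provider
    return None


def provider_for_url(url: str) -> Optional[str]:
    """Guess the provider from the host of an HTTPS repository URL."""
    return KNOWN_HOSTS.get(urlparse(url).hostname or "")


def token_matches_url(url: str, token: str) -> Optional[bool]:
    """True or False when both sides are recognized, None when either is unknown."""
    from_url, from_token = provider_for_url(url), provider_for_token(token)
    if from_url is None or from_token is None:
        return None
    return from_url == from_token
```

A `False` result is a strong hint that the wrong token was copied; `None` only means the sketch cannot tell, as with self-hosted instances.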
SSH Keys
Set up an SSH key:
# Generate an SSH key (if you don't have one)
ssh-keygen -t ed25519 -C "your_email@example.com"
# Add the public key to GitHub/GitLab/Bitbucket
cat ~/.ssh/id_ed25519.pub
Add the source:
skill-seekers add-git-source \
git@github.com:your-org/configs.git \
--name ssh-configs
Branches and Versioning
Using Different Branches
# Production configs
skill-seekers add-git-source \
https://github.com/your-org/configs.git \
--name prod-configs \
--branch production
# Development configs
skill-seekers add-git-source \
https://github.com/your-org/configs.git \
--name dev-configs \
--branch development
# Use the production configs
skill-seekers scrape --config prod-configs:react.json
# Use the development configs
skill-seekers scrape --config dev-configs:react.json
Pinning to a Specific Commit or Tag
# Use a specific commit SHA
skill-seekers add-git-source \
https://github.com/your-org/configs.git \
--name pinned-configs \
--commit abc123def456
# Use a specific tag
skill-seekers add-git-source \
https://github.com/your-org/configs.git \
--name tagged-configs \
--tag v1.2.0
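Team Collaboration
Shared Team Repository
Setup (once per team):
# 1. Create a git repository for the team configs
mkdir scraping-configs
cd scraping-configs
git init
mkdir -p configs/{frontend,backend,internal}
# 2. Add configs
# (create JSON files under configs/)
# 3. Push to the team repository
git add .
git commit -m "Initial team configs"
# Rename the branch to main in case git's default branch name differs
git branch -M main
git remote add origin https://github.com/your-org/scraping-configs.git
git push -u origin main
Team members (each person):
# Add the team source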
skill-seekers add-git-source \
https://github.com/your-org/scraping-configs.git \
--name team-configs \
--token ghp_teamToken...
# Use the team configs
skill-seekers scrape --config team-configs:frontend/react.json
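Config Updates
When someone updates a config:
# Option 1: fetch manually
skill-seekers fetch-git-sources team-configs
# Option 2: fetch automatically before scraping
skill-seekers scrape --config team-configs:react.json --fetch-sources
Contributing a new config:
# 1. Clone the team repository
git clone https://github.com/your-org/scraping-configs.git
cd scraping-configs
# 2. Create the new config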
cat > configs/backend/new-framework.json <<EOF
{
"name": "new-framework",
"base_url": "https://new-framework.dev/",
...
}
EOF
# 3. Commit and push
git add configs/backend/new-framework.json
git commit -m "Add new-framework config"
git push origin main
# 4. Team members fetch the update
skill-seekers fetch-git-sources team-configs
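Environment-Specific Configs
Development, Staging, Production
Repository structure:
scraping-configs/
├── envs/
│ ├── dev/
│ │ └── company-docs.json # Development docs URL
│ ├── staging/
│ │ └── company-docs.json # Staging docs URL
│ └── production/
│ └── company-docs.json # Production docs URL
Set up the sources:
# Development environment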
skill-seekers add-git-source \
https://github.com/company/configs.git \
--name dev-configs \
--branch development
# Staging environment
skill-seekers add-git-source \
https://github.com/company/configs.git \
--name staging-configs \
--branch staging
# Production environment
skill-seekers add-git-source \
https://github.com/company/configs.git \
--name prod-configs \
--branch production
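Usage:
# In the development environment
skill-seekers scrape --config dev-configs:envs/dev/company-docs.json
# In the production environment
skill-seekers scrape --config prod-configs:envs/production/company-docs.json
MCP Integration
MCP Tools for Git Sources
Available tools:
- add_git_source - Add a git repository as a config source
- list_git_sources - List all configured sources
- remove_git_source - Remove a source
- fetch_git_sources - Fetch updates from the remote
Using in Claude Desktop
Example conversation:
You: Add our company's scraping config repository
Claude: I'll add the git source.
[Claude calls the add_git_source MCP tool]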
{
"url": "https://github.com/company/scraping-configs.git",
"name": "company-configs",
"token": "ghp_...",
"branch": "main"
}
Done! You can now use configs with:
skill-seekers scrape --config company-configs:PATH/TO/CONFIG.json
Listing sources:
You: Which git sources do I have configured?
Claude: [Claude calls list_git_sources]
You have 2 git sources:
1. company-configs (https://github.com/company/configs.git)
2. team-configs (https://github.com/team/configs.git)
Storage and Caching
Local Storage
Git sources are cloned to:
~/.skill-seekers/git-sources/SOURCE_NAME/
Example:
~/.skill-seekers/git-sources/
├── company-configs/
│ ├── .git/
│ ├── configs/
│ └── README.md
└── team-configs/
├── .git/
└── configs/
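Since each source is an ordinary clone on disk, you can check where a referenced config would live without running a scrape. The Python sketch below looks for a config first relative to the repository root and then under `configs/`, mirroring the two path styles shown under Relative Paths. The lookup order and the name `find_cached_config` are assumptions for illustration; the tool's actual resolution logic may differ.

```python
from pathlib import Path
from typing import Optional

# Clone location described above; pass a different root to inspect another directory.
DEFAULT_CACHE_ROOT = Path.home() / ".skill-seekers" / "git-sources"


def find_cached_config(
    source: str, path: str, cache_root: Path = DEFAULT_CACHE_ROOT
) -> Optional[Path]:
    """Return the local file for SOURCE:PATH, or None if it is not in the clone."""
    source_dir = cache_root / source
    # Assumed lookup order: repository root first, then the configs/ directory.
    for candidate in (source_dir / path, source_dir / "configs" / path):
        if candidate.is_file():
            return candidate
    return None
```

A `None` result for a path you expect to exist usually means the clone is stale or the path is misspelled, which matches the "config not found" case in the troubleshooting section.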
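Caching Behavior
Automatic fetch behavior:
- Default: a git source is fetched once, when it is added
- Manual fetch: skill-seekers fetch-git-sources
- Automatic fetch: skill-seekers scrape --config X --fetch-sources
- Cache expiry: updates are fetched every 24 hours (configurable)
Configuration:
# Set the automatic fetch interval (hours)
skill-seekers config set git_fetch_interval 6 # Fetch every 6 hours
# Disable automatic fetching
skill-seekers config set git_auto_fetch false
# Always fetch before scraping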
skill-seekers config set git_always_fetch true
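The interval-based behavior above comes down to comparing the time of the last fetch with the configured interval. Here is a minimal Python sketch of that comparison, assuming timestamps are stored in UTC in the ISO 8601 form used by the `last_fetch` field of git-sources.json (for example `2025-01-14T10:30:00Z`). The function name `fetch_is_due` is hypothetical and this is not the tool's implementation.

```python
from datetime import datetime, timedelta


def fetch_is_due(last_fetch: str, interval_hours: float, now: datetime) -> bool:
    """Return True when at least interval_hours have passed since last_fetch.

    Assumes last_fetch is a UTC timestamp such as '2025-01-14T10:30:00Z'
    and that now is a timezone-aware datetime.
    """
    # Older Python versions do not accept a trailing 'Z', so normalize it to an offset.
    fetched_at = datetime.fromisoformat(last_fetch.replace("Z", "+00:00"))
    return now - fetched_at >= timedelta(hours=interval_hours)
```

With the default 24-hour interval, a source last fetched at 10:30 UTC becomes due again at 10:30 UTC the next day; lowering the interval to 6 makes the same check fire four times as often.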
Best Practices
1. Use Descriptive Source Names
# ✅ Good
skill-seekers add-git-source URL --name company-internal-configs
skill-seekers add-git-source URL --name team-frontend-configs
# ❌ Bad
skill-seekers add-git-source URL --name configs1
skill-seekers add-git-source URL --name source
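2. Organize Configs Hierarchically
configs/
├── internal/ # Internal company docs
├── external/ # External/open-source docs
├── production/ # Production configs
└── experimental/ # Experimental/test configs
3. Version-Control Everything
# Add a .gitignore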
cat > .gitignore <<EOF
*.log
*.tmp
.DS_Store
EOF
# Track changes
git add configs/
git commit -m "Update React config: increase max_pages to 300"
4. Use Branches for Environments
# main - production configs
# staging - staging configs
# development - development configs
# feature/* - experimental configs
5. Document Your Configs
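# README.md
## Config Repository Structure
- `configs/frontend/` - Frontend framework configs
- `configs/backend/` - Backend framework configs
- `configs/internal/` - Internal company docs
## Usage
```bash
skill-seekers scrape --config team-configs:frontend/react.json
```
## Contributing
- Create a feature branch
- Add or update configs
- Test with `skill-seekers validate`
- Open a PR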
Troubleshooting
Issue: Authentication Failed
Symptom:
Error: Failed to clone repository
Authentication failed for 'https://github.com/org/configs.git'
Solutions:
- Verify that the token is valid:
  - GitHub: https://github.com/settings/tokens
  - GitLab: https://gitlab.com/-/profile/personal_access_tokens
- Check the token's permissions:
  - GitHub: requires the repo or public_repo scope
  - GitLab: requires the read_repository scope
- Re-add the source with the correct token:
skill-seekers remove-git-source SOURCE_NAME
skill-seekers add-git-source URL --name SOURCE_NAME --token CORRECT_TOKEN
Issue: Config Not Found
Symptom:
Error: Config file not found: git:source:path/to/config.json
Solutions:
- List the source contents:
ls ~/.skill-seekers/git-sources/SOURCE_NAME/
- Fetch the latest updates:
skill-seekers fetch-git-sources SOURCE_NAME
- Use the correct path:
# If the config is at configs/frontend/react.json
skill-seekers scrape --config SOURCE_NAME:configs/frontend/react.json
Issue: Source Is Behind Remote
Symptom:
⚠️ Source 'company-configs' is behind remote by 5 commits
Solutions:
# Fetch updates
skill-seekers fetch-git-sources company-configs
# Or fetch automatically before scraping
skill-seekers scrape --config company-configs:react.json --fetch-sources
Issue: SSH Key Not Found
Symptom:
Error: Could not read from remote repository
Permission denied (publickey)
Solutions:
- Generate an SSH key:
ssh-keygen -t ed25519 -C "your_email@example.com"
- Add the public key to GitHub:
cat ~/.ssh/id_ed25519.pub
# Copy the output and add it at https://github.com/settings/keys
- Test the SSH connection:
ssh -T git@github.com
Config File
~/.skill-seekers/git-sources.json
{
"sources": [
{
"name": "company-configs",
"url": "https://github.com/company/configs.git",
"branch": "main",
"auth_method": "token",
"local_path": "~/.skill-seekers/git-sources/company-configs",
"last_fetch": "2025-01-14T10:30:00Z",
"status": "up-to-date"
},
{
"name": "team-configs",
"url": "git@github.com:team/configs.git",
"branch": "production",
"auth_method": "ssh",
"local_path": "~/.skill-seekers/git-sources/team-configs",
"last_fetch": "2025-01-14T09:15:00Z",
"status": "behind"
}
],
"settings": {
"auto_fetch": true,
"fetch_interval_hours": 24,
"always_fetch": false
}
}
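Because the source registry is plain JSON, it can be inspected from a script, for example to report which sources are stale before a batch of scrapes. The Python sketch below assumes the file layout shown above, in particular the `sources` list and the `status` values `up-to-date` and `behind`. The function name `sources_needing_fetch` is hypothetical, and the sketch only reads the data, leaving all writes to the CLI.

```python
import json


def sources_needing_fetch(registry_text: str) -> list[str]:
    """Return the names of sources whose status is anything other than 'up-to-date'."""
    registry = json.loads(registry_text)
    return [
        source["name"]
        for source in registry.get("sources", [])
        if source.get("status") != "up-to-date"
    ]
```

To use it against the real file, read the text with `pathlib.Path("~/.skill-seekers/git-sources.json").expanduser().read_text()` and pass each returned name to `skill-seekers fetch-git-sources`.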
Next Steps
Status: ✅ Production-ready (v2.2.0+)
Found a problem or have a suggestion? Open an issue