llms.txt Automatic Detection
Skill Seekers automatically detects llms.txt files and uses them in place of page-by-page scraping, which is at least 10x faster and yields AI-optimized content.
Overview
llms.txt is an emerging standard for providing AI-optimized documentation in a single file. When a website offers llms.txt, Skill Seekers automatically detects and prioritizes it over traditional web scraping.
Benefits:
- ⚡ 10x faster - Single file download vs. scraping 100+ pages
- 🎯 AI-optimized - Content already formatted for LLMs
- 📦 Complete - Usually contains entire documentation
- 🔄 Maintained - Site owners keep it updated
Version: v2.5.0+
How It Works
Automatic Detection Order
Skill Seekers checks for llms.txt variants in this order:
1. llms-full.txt - Complete documentation (preferred)
2. llms.txt - Standard documentation
3. llms-small.txt - Condensed version
4. Fallback to web scraping - If no llms.txt found
Detection happens automatically - no configuration needed!
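The detection order above can be sketched in a few lines of Python. This is an illustrative sketch, not Skill Seekers' actual implementation: the names `candidate_urls`, `detect_llms_txt`, and the `exists` probe are hypothetical, and it assumes the variants are looked up relative to the base URL passed to the scraper.

```python
from typing import Callable, Optional
from urllib.parse import urljoin

# Variant filenames in the priority order listed above.
LLMS_TXT_VARIANTS = ("llms-full.txt", "llms.txt", "llms-small.txt")


def candidate_urls(base_url: str) -> list[str]:
    """Build the URLs to probe, most preferred first."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, name) for name in LLMS_TXT_VARIANTS]


def detect_llms_txt(base_url: str, exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first variant URL that exists, or None to signal the scraping fallback.

    `exists` is a hypothetical probe supplied by the caller, for example an
    HTTP HEAD request that returns True on a 200 response.
    """
    for url in candidate_urls(base_url):
        if exists(url):
            return url
    return None


# Usage with a fake probe, so the sketch runs without network access.
available = {"https://docs.example.com/llms.txt"}
print(detect_llms_txt("https://docs.example.com/", available.__contains__))
print(detect_llms_txt("https://other.example.com", available.__contains__))
```

Passing the probe in as a function keeps the ordering logic separate from the HTTP client, which makes the fallback path easy to exercise offline.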
Example Workflow
# Standard scraping command
skill-seekers scrape https://example.com/ --output output/example/
# Behind the scenes:
# 1. Check https://example.com/llms-full.txt ✅ Found!
# 2. Download llms-full.txt (2 seconds)
# 3. Parse and convert to skill format
# 4. Done! (vs. 5 minutes to scrape 200 pages)
llms.txt Format
Standard Structure
# Example.com Documentation
> AI-optimized documentation for Example.com
# Getting Started
## Installation
```bash
npm install example
```
## Quick Start
- Create a new project
- Configure settings
- Run the application
# API Reference
## Core Functions
### doSomething(param)
Description of the function…
# Examples
## Basic Example
```javascript
const result = doSomething('value');
```
Key Features:
- Plain markdown format
- Hierarchical structure
- Code examples included
- Comprehensive and complete
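The hierarchical structure noted above is what lets a tool split the file into sections. The following is a minimal sketch of such a splitter, assuming plain ATX headings (`#` through `######`) and a simple open/close toggle for code fences; `split_sections` is a hypothetical helper, not part of Skill Seekers, and it does not implement the full CommonMark fence rules.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = re.compile(r"^\s*(`{3,}|~{3,})")


def split_sections(markdown: str) -> list[dict]:
    """Split markdown into one section per heading, ignoring '#' lines inside code fences."""
    sections = []
    current = {"level": 0, "title": "", "body": []}
    in_fence = False
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            sections.append(current)
            current = {"level": len(match.group(1)), "title": match.group(2), "body": []}
        else:
            current["body"].append(line)
    sections.append(current)
    # Drop the untitled preamble when it holds no text (file starts with a heading).
    return [s for s in sections if s["title"] or any(l.strip() for l in s["body"])]


fence = "`" * 3  # built indirectly so this example can itself sit inside a fenced block
sample = "\n".join([
    "# Getting Started",
    "## Installation",
    fence + "bash",
    "# not a heading: this is a shell comment",
    "npm install example",
    fence,
    "## Quick Start",
    "- Create a new project",
])
for section in split_sections(sample):
    print(section["level"], section["title"])
```

Tracking the fence state matters because shell comments inside code examples start with `#` and would otherwise be mistaken for headings.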
Detection and Usage
Automatic Detection (Default)
No configuration needed:
# Automatically uses llms.txt if available
skill-seekers scrape https://docs.example.com/ --output output/example/
Detection log:
🔍 Checking for llms.txt...
✅ Found llms-full.txt at https://docs.example.com/llms-full.txt
📥 Downloading (2.3 MB)...
✅ Downloaded in 1.8 seconds
📝 Parsing content...
✅ Skill created: example (4,231 tokens)
⚡ Time saved: 4m 32s (llms.txt vs. traditional scraping)
Force llms.txt
Explicitly prefer llms.txt even when web scraping would otherwise be chosen:
skill-seekers scrape https://docs.example.com/ \
--prefer-llms-txt \
--output output/example/
Disable llms.txt
Force traditional web scraping:
skill-seekers scrape https://docs.example.com/ \
--no-llms-txt \
--output output/example/
Comparison: llms.txt vs. Web Scraping
Speed
| Documentation Size | llms.txt | Web Scraping | Speed-up |
|---|---|---|---|
| Small (50 pages) | 1-2 sec | 30-60 sec | 30x |
| Medium (200 pages) | 2-3 sec | 3-5 min | 60x |
| Large (1000 pages) | 3-5 sec | 15-20 min | 180x |
Quality
| Aspect | llms.txt | Web Scraping |
|---|---|---|
| Content Completeness | ✅ Curated by maintainers | ⚠️ Depends on scraping config |
| AI Optimization | ✅ Formatted for LLMs | ❌ May include non-essential content |
| Code Examples | ✅ Usually included | ⚠️ Depends on selectors |
| Up-to-Date | ⚠️ Depends on maintainers | ✅ Always latest |
| Structure | ✅ Hierarchical markdown | ⚠️ Depends on site structure |
When to Use Each
Use llms.txt (automatic detection) when:
- ✅ Site offers llms.txt (detected automatically)
- ✅ Speed is important
- ✅ You trust site maintainers
Force web scraping when:
- ❌ llms.txt is outdated (check last modified date)
- ❌ You need specific selectors/categories
- ❌ You want more control over content extraction
Sites with llms.txt Support
Known Sites (as of 2025)
Framework Documentation:
- Next.js: https://nextjs.org/llms-full.txt
- Astro: https://docs.astro.build/llms.txt
- Remix: https://remix.run/llms.txt
Tools & Libraries:
- Supabase: https://supabase.com/docs/llms.txt
- Vercel: https://vercel.com/docs/llms-full.txt
- Railway: https://docs.railway.app/llms.txt
Check for llms.txt:
# Test if site has llms.txt
curl -I https://docs.example.com/llms-full.txt
curl -I https://docs.example.com/llms.txt
curl -I https://docs.example.com/llms-small.txt
Advanced Usage
Inspect llms.txt Before Using
# Download and inspect
curl https://docs.example.com/llms-full.txt -o llms-full.txt
head -n 50 llms-full.txt
# Check file size and last modified
curl -sI https://docs.example.com/llms-full.txt | grep -iE 'content-length|last-modified'
Combine llms.txt with Additional Sources
# Use llms.txt as base, scrape additional pages
skill-seekers scrape https://docs.example.com/ \
--use-llms-txt \
--additional-pages "changelog,releases,roadmap" \
--output output/example/
Manual Download and Conversion
# 1. Download manually
curl https://docs.example.com/llms-full.txt -o llms-full.txt
# 2. Convert to skill
skill-seekers convert llms-full.txt \
--format llms-txt \
--output output/example/
llms.txt Standard
Specification
The llms.txt format is a community-driven standard for AI-optimized documentation:
Key Principles:
- Plain markdown - No HTML, no fancy formatting
- Complete - All essential documentation in one file
- Hierarchical - Clear heading structure
- Optimized - Removes navigation, sidebars, footers
- Updated - Maintained by project owners
Learn more: see the llms.txt specification.
Creating Your Own llms.txt
For documentation site owners:
# Your Project Documentation
> Complete documentation for Your Project - optimized for LLMs
# Overview
Brief description of your project...
# Installation
Step-by-step installation guide...
# API Reference
Complete API documentation...
# Examples
Practical code examples...
# FAQ
Common questions and answers...
Best Practices:
- ✅ Include all essential content (no links to external pages)
- ✅ Use clear hierarchical headings (H1, H2, H3)
- ✅ Include code examples inline
- ✅ Keep updated with documentation changes
- ✅ Offer variants: llms-full.txt (complete), llms.txt (standard), llms-small.txt (condensed)
- ❌ Don’t include navigation, sidebars, or UI elements
- ❌ Don’t use HTML or complex formatting
- ❌ Don’t include non-essential content (changelog, blog posts)
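Some of the practices above can be checked mechanically before publishing. The sketch below is a hypothetical lint helper (`lint_llms_txt` is not a Skill Seekers command) that covers three of them: an H1 title at the top, no HTML outside code blocks, and at least one inline code example. The HTML check is a rough heuristic and will also flag autolinks written as `<https://...>`.

```python
import re

# Heuristic: anything shaped like <tag> or </tag>. Autolinks such as <https://...> also match.
HTML_TAG = re.compile(r"</?[a-zA-Z][^>]*>")
FENCE_MARKS = ("`" * 3, "~~~")


def lint_llms_txt(text: str) -> list[str]:
    """Return human-readable warnings for a draft llms.txt; an empty list means no findings."""
    warnings = []
    lines = text.splitlines()
    first = next((line for line in lines if line.strip()), "")
    if not first.startswith("# "):
        warnings.append("file should start with an H1 title")
    in_fence = False
    saw_code = False
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith(FENCE_MARKS):
            in_fence = not in_fence
            saw_code = True
            continue
        if not in_fence and HTML_TAG.search(line):
            warnings.append(f"line {number}: looks like HTML markup")
    if not saw_code:
        warnings.append("no fenced code examples found")
    return warnings


fence = FENCE_MARKS[0]
draft = "\n".join([
    "# Your Project Documentation",
    "> Complete documentation for Your Project",
    "# Installation",
    fence + "bash",
    "npm install example",
    fence,
])
print(lint_llms_txt(draft))
print(lint_llms_txt("<div>nav</div>\n# Title\n"))
```

Checks such as completeness or freshness still need a human review; this only catches formatting problems.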
Configuration Options
Config File Support
{
  "name": "example",
  "base_url": "https://docs.example.com/",
  "llms_txt": {
    "enabled": true,
    "prefer": "full",
    "fallback_to_scraping": true,
    "max_age_days": 30
  }
}
Options:
- enabled: Auto-detect llms.txt (default: true)
- prefer: Which variant to prefer (full | standard | small)
- fallback_to_scraping: Use web scraping if llms.txt not found (default: true)
- max_age_days: Skip llms.txt if older than N days (default: null)
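To make the options concrete, here is a hedged sketch of how a config like the one above could be merged with defaults and turned into a probe order. The function names are hypothetical and this is not Skill Seekers' actual code. Two assumptions are not stated in the documentation: that `prefer` defaults to `full`, and that the non-preferred variants keep their default relative order.

```python
import json

# Defaults as documented above; the "prefer" default is an assumption.
DEFAULTS = {"enabled": True, "prefer": "full", "fallback_to_scraping": True, "max_age_days": None}
VARIANT_FILES = {"full": "llms-full.txt", "standard": "llms.txt", "small": "llms-small.txt"}


def resolve_llms_txt_options(config: dict) -> dict:
    """Overlay the config's llms_txt block on the defaults and validate 'prefer'."""
    options = {**DEFAULTS, **config.get("llms_txt", {})}
    if options["prefer"] not in VARIANT_FILES:
        raise ValueError(f"unknown prefer value: {options['prefer']!r}")
    return options


def variant_order(prefer: str) -> list[str]:
    """Preferred variant first, the rest in their default order (assumed behavior)."""
    names = sorted(VARIANT_FILES, key=lambda name: name != prefer)  # stable sort
    return [VARIANT_FILES[name] for name in names]


config = json.loads('{"name": "example", "llms_txt": {"prefer": "small", "max_age_days": 30}}')
options = resolve_llms_txt_options(config)
print(options)
print(variant_order(options["prefer"]))
```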
Performance Metrics
Real-World Examples
Next.js Documentation:
- Pages: 300+
- llms-full.txt size: 3.2 MB
- Web scraping time: 6 minutes
- llms.txt download time: 2 seconds
- Speed-up: 180x faster
Supabase Documentation:
- Pages: 500+
- llms.txt size: 4.8 MB
- Web scraping time: 9 minutes
- llms.txt download time: 3 seconds
- Speed-up: 180x faster
Astro Documentation:
- Pages: 200+
- llms.txt size: 2.1 MB
- Web scraping time: 4 minutes
- llms.txt download time: 1.5 seconds
- Speed-up: 160x faster
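The speed-up figures above are the ratio of scraping time to download time. A short check of the arithmetic, using only the numbers listed in this section:

```python
def speed_up(scrape_seconds: float, download_seconds: float) -> float:
    """Ratio of web scraping time to llms.txt download time."""
    return scrape_seconds / download_seconds


# (web scraping time, llms.txt download time) in seconds, from the examples above.
examples = {
    "Next.js": (6 * 60, 2),
    "Supabase": (9 * 60, 3),
    "Astro": (4 * 60, 1.5),
}
for name, (scrape, download) in examples.items():
    print(f"{name}: {speed_up(scrape, download):.0f}x")
# Next.js: 180x
# Supabase: 180x
# Astro: 160x
```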
Troubleshooting
Issue: llms.txt is outdated
Symptoms:
⚠️ llms.txt last modified: 45 days ago
⚠️ Using web scraping instead
Solutions:
- Force use anyway:
  skill-seekers scrape URL --force-llms-txt
- Contact site maintainers to update llms.txt
- Use web scraping:
  skill-seekers scrape URL --no-llms-txt
Issue: llms.txt not found
Symptoms:
🔍 Checking for llms.txt...
❌ Not found: llms-full.txt
❌ Not found: llms.txt
❌ Not found: llms-small.txt
ℹ️ Falling back to web scraping
Solutions:
- Check manually:
  curl -I https://docs.example.com/llms.txt
- Use web scraping (automatic fallback)
- Request llms.txt from site owner
Issue: llms.txt incomplete
Symptoms: Skill missing expected sections
Solutions:
- Supplement with web scraping:
  skill-seekers scrape URL --use-llms-txt --additional-pages "missing-section"
- Use web scraping only:
  skill-seekers scrape URL --no-llms-txt
Best Practices
1. Trust Automatic Detection
✅ Skill Seekers checks for llms.txt on every scrape and uses it when one is found, so the default behavior needs no extra flags
2. Verify Content Completeness
✅ After using llms.txt, spot-check the generated skill:
head -n 100 output/example/SKILL.md
3. Check Last Modified Date
✅ If llms.txt is more than 60 days old, consider web scraping:
curl -sI https://docs.example.com/llms.txt | grep -i last-modified
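The 60-day rule of thumb above can also be checked in code once the `Last-Modified` header value is in hand. This is a minimal sketch using only the Python standard library; `age_in_days` and `is_stale` are hypothetical helper names, and the header string is assumed to be in the usual HTTP date format.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def age_in_days(last_modified: str, now: Optional[datetime] = None) -> float:
    """Age in days of a resource, given its HTTP Last-Modified header value."""
    modified = parsedate_to_datetime(last_modified)
    if modified.tzinfo is None:
        # Treat a date without an offset as UTC so the subtraction below is valid.
        modified = modified.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - modified).total_seconds() / 86400


def is_stale(last_modified: str, max_age_days: float = 60, now: Optional[datetime] = None) -> bool:
    """True when the resource is older than max_age_days."""
    return age_in_days(last_modified, now) > max_age_days


# A fixed "now" keeps the example deterministic.
now = datetime(2025, 3, 15, tzinfo=timezone.utc)
header = "Wed, 01 Jan 2025 00:00:00 GMT"
print(age_in_days(header, now))  # 73.0
print(is_stale(header, now=now))  # True
```

A missing `Last-Modified` header is common on static hosts, so a real check would need to handle that case rather than assume the header is present.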
4. Combine with Other Sources
✅ Use llms.txt as base, add GitHub issues/changelog:
skill-seekers unified --config unified-config.json
# Where unified-config uses llms.txt + GitHub scraping
Next Steps
- Documentation Scraping - Traditional web scraping options
- Unified Scraping - Combine llms.txt with other sources
- Large Documentation - Handling 10K+ page sites
Status: ✅ Production Ready (v2.5.0+)
Found an issue or have suggestions? Open an issue