llms.txt Automatic Detection
Skill Seekers automatically detects llms.txt files and downloads them instead of scraping page by page, which is about 10x faster and yields content already optimized for AI.
Overview
llms.txt is an emerging standard for providing AI-optimized documentation in a single file. When a website offers llms.txt, Skill Seekers automatically detects and prioritizes it over traditional web scraping.
Benefits:
- ⚡ 10x faster - Single file download vs. scraping 100+ pages
- 🎯 AI-optimized - Content already formatted for LLMs
- 📦 Complete - Usually contains entire documentation
- 🔄 Maintained - Site owners keep it updated
Version: v2.5.0+
How It Works
Automatic Detection Order
Skill Seekers checks for llms.txt variants in this order:
1. llms-full.txt - Complete documentation (preferred)
2. llms.txt - Standard documentation
3. llms-small.txt - Condensed version
4. Fallback to web scraping - If no llms.txt is found
Detection happens automatically - no configuration needed!
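The detection order above can be sketched in a few lines of Python. This is an illustration only, not Skill Seekers' actual implementation: `candidate_urls`, `detect_llms_txt`, and the injected `exists` probe are hypothetical names, and resolving the variants relative to the base URL is an assumption.

```python
from typing import Callable, List, Optional
from urllib.parse import urljoin

# Variants in the order this section lists them (most complete first).
LLMS_TXT_VARIANTS = ("llms-full.txt", "llms.txt", "llms-small.txt")


def candidate_urls(base_url: str) -> List[str]:
    """Build the URLs to probe, in priority order (hypothetical helper)."""
    # A trailing slash makes urljoin treat the base as a directory.
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, name) for name in LLMS_TXT_VARIANTS]


def detect_llms_txt(base_url: str, exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate URL that exists, or None.

    None means no variant was found, so the caller falls back to scraping.
    """
    for url in candidate_urls(base_url):
        if exists(url):
            return url
    return None


# Stand-in probe for the example; a real probe would send an HTTP HEAD request.
available = {"https://docs.example.com/llms.txt"}
print(detect_llms_txt("https://docs.example.com/", available.__contains__))
# https://docs.example.com/llms.txt
```

Passing the probe in as a function keeps the ordering logic separate from the network call, so the sketch runs offline.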
Example Workflow
# Standard scraping command
skill-seekers scrape https://example.com/ --output output/example/
# Behind the scenes:
# 1. Check https://example.com/llms-full.txt ✅ Found!
# 2. Download llms-full.txt (2 seconds)
# 3. Parse and convert to skill format
# 4. Done! (vs. 5 minutes to scrape 200 pages)
llms.txt Format
Standard Structure
````markdown
# Example.com Documentation
> AI-optimized documentation for Example.com
# Getting Started
## Installation
```bash
npm install example
```
## Quick Start
- Create a new project
- Configure settings
- Run the application
# API Reference
## Core Functions
### doSomething(param)
Description of the function...
# Examples
## Basic Example
```javascript
const result = doSomething('value');
```
````
**Key Features:**
- Plain markdown format
- Hierarchical structure
- Code examples included
- Comprehensive and complete
---
## Detection and Usage
### Automatic Detection (Default)
**No configuration needed:**
```bash
# Automatically uses llms.txt if available
skill-seekers scrape https://docs.example.com/ --output output/example/
```
Detection log:
🔍 Checking for llms.txt...
✅ Found llms-full.txt at https://docs.example.com/llms-full.txt
📥 Downloading (2.3 MB)...
✅ Downloaded in 1.8 seconds
📄 Parsing content...
✅ Skill created: example (4,231 tokens)
⚡ Time saved: 4m 32s (llms.txt vs. traditional scraping)
Force llms.txt
Explicitly prefer llms.txt, even where web scraping would otherwise be used:
skill-seekers scrape https://docs.example.com/ \
--prefer-llms-txt \
--output output/example/
Disable llms.txt
Force traditional web scraping:
skill-seekers scrape https://docs.example.com/ \
--no-llms-txt \
--output output/example/
Comparison: llms.txt vs. Web Scraping
Speed
| Documentation Size | llms.txt | Web Scraping | Speed-up |
|---|---|---|---|
| Small (50 pages) | 1-2 sec | 30-60 sec | 30x |
| Medium (200 pages) | 2-3 sec | 3-5 min | 60x |
| Large (1000 pages) | 3-5 sec | 15-20 min | 180x |
Quality
| Aspect | llms.txt | Web Scraping |
|---|---|---|
| Content Completeness | ✅ Curated by maintainers | ⚠️ Depends on scraping config |
| AI Optimization | ✅ Formatted for LLMs | ❌ May include non-essential content |
| Code Examples | ✅ Usually included | ⚠️ Depends on selectors |
| Up-to-Date | ⚠️ Depends on maintainers | ✅ Always latest |
| Structure | ✅ Hierarchical markdown | ⚠️ Depends on site structure |
When to Use Each
Use llms.txt (automatic detection) when:
- Site offers llms.txt (detected automatically)
- Speed is important
- You trust site maintainers
Force web scraping when:
- llms.txt is outdated (check last modified date)
- You need specific selectors/categories
- You want more control over content extraction
Sites with llms.txt Support
Known Sites (as of 2025)
Framework Documentation:
- Next.js: https://nextjs.org/llms-full.txt
- Astro: https://docs.astro.build/llms.txt
- Remix: https://remix.run/llms.txt
Tools & Libraries:
- Supabase: https://supabase.com/docs/llms.txt
- Vercel: https://vercel.com/docs/llms-full.txt
- Railway: https://docs.railway.app/llms.txt
Check for llms.txt:
# Test if site has llms.txt
curl -I https://docs.example.com/llms-full.txt
curl -I https://docs.example.com/llms.txt
curl -I https://docs.example.com/llms-small.txt
Advanced Usage
Inspect llms.txt Before Using
# Download and inspect
curl https://docs.example.com/llms-full.txt -o llms-full.txt
head -n 50 llms-full.txt
# Check file size and last modified (-s hides the progress meter; -i matches lowercase HTTP/2 header names)
curl -sI https://docs.example.com/llms-full.txt | grep -iE 'content-length|last-modified'
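The same two headers can be turned into a file size and an age in days, which is the information needed to judge whether a file is stale. The sketch below is a hedged illustration using only the Python standard library; `header`, `size_mb`, and `age_days` are hypothetical helpers, and the header values are made up for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Look up a header case-insensitively (HTTP/2 servers send lowercase names)."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def size_mb(headers: Mapping[str, str]) -> Optional[float]:
    """Content-Length in megabytes, or None if the server did not send it."""
    raw = header(headers, "Content-Length")
    return int(raw) / 1_000_000 if raw is not None else None


def age_days(headers: Mapping[str, str], now: datetime) -> Optional[float]:
    """Days elapsed since Last-Modified, or None if the header is missing."""
    raw = header(headers, "Last-Modified")
    if raw is None:
        return None
    modified = parsedate_to_datetime(raw)
    if modified.tzinfo is None:
        # Treat a date without an offset as UTC so the subtraction is valid.
        modified = modified.replace(tzinfo=timezone.utc)
    return (now - modified).total_seconds() / 86_400


# Example values only; a real check would read them from the HEAD response.
headers = {
    "content-length": "2300000",
    "last-modified": "Wed, 01 Jan 2025 00:00:00 GMT",
}
now = datetime(2025, 2, 15, tzinfo=timezone.utc)
print(size_mb(headers), age_days(headers, now))  # 2.3 45.0
```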
Combine llms.txt with Additional Sources
# Use llms.txt as base, scrape additional pages
skill-seekers scrape https://docs.example.com/ \
--use-llms-txt \
--additional-pages "changelog,releases,roadmap" \
--output output/example/
Manual Download and Conversion
# 1. Download manually
curl https://docs.example.com/llms-full.txt -o llms-full.txt
# 2. Convert to skill
skill-seekers convert llms-full.txt \
--format llms-txt \
--output output/example/
llms.txt Standard
Specification
The llms.txt format is a community-driven standard for AI-optimized documentation:
Key Principles:
- Plain markdown - No HTML, no fancy formatting
- Complete - All essential documentation in one file
- Hierarchical - Clear heading structure
- Optimized - Removes navigation, sidebars, footers
- Updated - Maintained by project owners
Learn more: llms.txt specification (https://llmstxt.org/)
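A clear heading hierarchy is what lets a consumer cut the single file back into sections. The following is a rough sketch of that idea, not Skill Seekers' actual parser: `split_sections` is a hypothetical helper, and it ignores any text that appears before the first heading.

```python
import re
from typing import List, Tuple

FENCE = "`" * 3  # marker that opens or closes a fenced code block
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_sections(markdown: str) -> List[Tuple[int, str, str]]:
    """Split markdown into (heading level, title, body) tuples.

    Lines inside fenced code blocks are never treated as headings, so a
    shell comment such as "# install" stays part of the body.
    """
    sections = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            sections.append((len(match.group(1)), match.group(2), []))
        elif sections:
            sections[-1][2].append(line)
    return [(level, title, "\n".join(body).strip()) for level, title, body in sections]


sample = "# Example\n> Summary line\n## Installation\nRun the installer.\n"
for level, title, body in split_sections(sample):
    print(level, title, "|", body)
```

Tracking the fence state matters because llms.txt files usually embed code examples whose comments start with `#`.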
Creating Your Own llms.txt
For documentation site owners:
# Your Project Documentation
> Complete documentation for Your Project - optimized for LLMs
# Overview
Brief description of your project...
# Installation
Step-by-step installation guide...
# API Reference
Complete API documentation...
# Examples
Practical code examples...
# FAQ
Common questions and answers...
Best Practices:
- ✅ Include all essential content (no links to external pages)
- ✅ Use clear hierarchical headings (H1, H2, H3)
- ✅ Include code examples inline
- ✅ Keep updated with documentation changes
- ✅ Offer variants: llms-full.txt (complete), llms.txt (standard), llms-small.txt (condensed)
- ❌ Don't include navigation, sidebars, or UI elements
- ❌ Don't use HTML or complex formatting
- ❌ Don't include non-essential content (changelog, blog posts)
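Two of these practices can be checked mechanically before publishing. The sketch below is a heuristic illustration and not part of Skill Seekers: `lint_llms_txt` is a hypothetical name, it only checks for a leading H1 title and for raw HTML tags outside code fences, and the tag pattern can misfire on angle-bracket text such as generics in prose.

```python
import re
from typing import List

FENCE = "`" * 3  # marker that opens or closes a fenced code block
# Matches things like <div>, </nav>, <br/>, <a href="...">, but not <https://...>.
HTML_TAG = re.compile(r"</?[a-zA-Z][a-zA-Z0-9]*(\s[^>]*)?/?>")


def lint_llms_txt(text: str) -> List[str]:
    """Return human-readable warnings for a draft llms.txt (heuristic)."""
    warnings: List[str] = []
    lines = text.splitlines()
    first = next((ln for ln in lines if ln.strip()), "")
    if not first.startswith("# "):
        warnings.append("file should start with an H1 title")
    in_fence = False
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        # HTML inside code examples is fine; only flag it in regular text.
        if not in_fence and HTML_TAG.search(line):
            warnings.append(f"line {number}: raw HTML tag")
    return warnings


draft = "Intro\n<div class=\"nav\">Menu</div>\n"
for warning in lint_llms_txt(draft):
    print(warning)
```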
Configuration Options
Config File Support
{
"name": "example",
"base_url": "https://docs.example.com/",
"llms_txt": {
"enabled": true,
"prefer": "full",
"fallback_to_scraping": true,
"max_age_days": 30
}
}
Options:
- enabled: Auto-detect llms.txt (default: true)
- prefer: Which variant to prefer (full | standard | small)
- fallback_to_scraping: Use web scraping if llms.txt is not found (default: true)
- max_age_days: Skip llms.txt if older than N days (default: null)
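To make the defaults concrete, here is a minimal sketch of reading the `llms_txt` block and filling in missing options. It is an assumption-laden illustration rather than Skill Seekers' own loader: `load_llms_txt_options` and `variant_order` are hypothetical names, the default of `"full"` for `prefer` is assumed (the section does not state one), and how `prefer` reorders the variants is a guess.

```python
import json
from typing import Any, Dict, List

DEFAULTS: Dict[str, Any] = {
    "enabled": True,
    "prefer": "full",  # assumed default; not stated in this section
    "fallback_to_scraping": True,
    "max_age_days": None,
}
VARIANT_FILES = {
    "full": "llms-full.txt",
    "standard": "llms.txt",
    "small": "llms-small.txt",
}


def load_llms_txt_options(config_text: str) -> Dict[str, Any]:
    """Merge the config's llms_txt block over the defaults and validate it."""
    config = json.loads(config_text)
    options = {**DEFAULTS, **config.get("llms_txt", {})}
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown llms_txt options: {sorted(unknown)}")
    if options["prefer"] not in VARIANT_FILES:
        raise ValueError(f"prefer must be one of {sorted(VARIANT_FILES)}")
    return options


def variant_order(prefer: str) -> List[str]:
    """Preferred variant first, the rest in the default detection order."""
    preferred = VARIANT_FILES[prefer]
    return [preferred] + [name for name in VARIANT_FILES.values() if name != preferred]


options = load_llms_txt_options('{"name": "example", "llms_txt": {"prefer": "small"}}')
print(options["max_age_days"], variant_order(options["prefer"]))
```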
Performance Metrics
Real-World Examples
Next.js Documentation:
- Pages: 300+
- llms-full.txt size: 3.2 MB
- Web scraping time: 6 minutes
- llms.txt download time: 2 seconds
- Speed-up: 180x faster
Supabase Documentation:
- Pages: 500+
- llms.txt size: 4.8 MB
- Web scraping time: 9 minutes
- llms.txt download time: 3 seconds
- Speed-up: 180x faster
Astro Documentation:
- Pages: 200+
- llms.txt size: 2.1 MB
- Web scraping time: 4 minutes
- llms.txt download time: 1.5 seconds
- Speed-up: 160x faster
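The speed-up figures follow directly from dividing the scraping time by the download time. A short check of the arithmetic, using only the numbers listed above (`speed_up` is just an illustrative helper):

```python
def speed_up(scrape_seconds: float, download_seconds: float) -> float:
    """How many times faster the download is than scraping."""
    return scrape_seconds / download_seconds


# (web scraping time in seconds, llms.txt download time in seconds)
examples = {
    "Next.js": (6 * 60, 2),
    "Supabase": (9 * 60, 3),
    "Astro": (4 * 60, 1.5),
}
for name, (scrape, download) in examples.items():
    print(f"{name}: {speed_up(scrape, download):.0f}x")
# Next.js: 180x
# Supabase: 180x
# Astro: 160x
```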
Troubleshooting
Issue: llms.txt is outdated
Symptoms:
⚠️ llms.txt last modified: 45 days ago
⚠️ Using web scraping instead
Solutions:
- Force use anyway: skill-seekers scrape URL --force-llms-txt
- Contact site maintainers to update llms.txt
- Use web scraping: skill-seekers scrape URL --no-llms-txt
Issue: llms.txt not found
Symptoms:
🔍 Checking for llms.txt...
❌ Not found: llms-full.txt
❌ Not found: llms.txt
❌ Not found: llms-small.txt
ℹ️ Falling back to web scraping
Solutions:
- Check manually: curl -I https://docs.example.com/llms.txt
- Use web scraping (automatic fallback)
- Request llms.txt from site owner
Issue: llms.txt incomplete
Symptoms: Skill missing expected sections
Solutions:
- Supplement with web scraping: skill-seekers scrape URL --use-llms-txt --additional-pages "missing-section"
- Use web scraping only: skill-seekers scrape URL --no-llms-txt
Best Practices
1. Trust Automatic Detection
✅ Skill Seekers detects llms.txt on its own and falls back to web scraping when no usable file is found
2. Verify Content Completeness
✅ After using llms.txt, spot-check the generated skill:
head -n 100 output/example/SKILL.md
3. Check Last Modified Date
✅ If llms.txt is more than 60 days old, consider web scraping:
curl -sI https://docs.example.com/llms.txt | grep -i last-modified
4. Combine with Other Sources
✅ Use llms.txt as the base, then add GitHub issues or the changelog:
skill-seekers unified --config unified-config.json
# Where unified-config uses llms.txt + GitHub scraping
Next Steps
- Documentation Scraping - Traditional web scraping options
- Unified Scraping - Combine llms.txt with other sources
- Large Documentation - Handling 10K+ page sites
Status: ✅ Production Ready (v2.5.0+)
Found an issue or have suggestions? Open an issue