# Test Example Extraction (C3.2)
Transform test files into documentation assets by extracting real API usage patterns
## Overview
The Test Example Extractor analyzes test files to automatically extract meaningful usage examples showing:
- Object Instantiation: Real parameter values and configuration
- Method Calls: Expected behaviors and return values
- Configuration Examples: Valid configuration dictionaries
- Setup Patterns: Initialization from setUp() methods and pytest fixtures
- Multi-Step Workflows: Integration test sequences
## Supported Languages (9)
| Language | Extraction Method | Supported Features |
|---|---|---|
| Python | AST-based (deep) | All categories, high accuracy |
| JavaScript | Regex patterns | Instantiation, assertions, configs |
| TypeScript | Regex patterns | Instantiation, assertions, configs |
| Go | Regex patterns | Table tests, assertions |
| Rust | Regex patterns | Test macros, assertions |
| Java | Regex patterns | JUnit patterns |
| C# | Regex patterns | xUnit patterns |
| PHP | Regex patterns | PHPUnit patterns |
| Ruby | Regex patterns | RSpec patterns |
## Quick Start

### CLI Usage

```bash
# Extract from a directory
skill-seekers extract-test-examples tests/ --language python

# Extract from a single file
skill-seekers extract-test-examples --file tests/test_scraper.py

# JSON output
skill-seekers extract-test-examples tests/ --json > examples.json

# Markdown output
skill-seekers extract-test-examples tests/ --markdown > examples.md

# Filter by confidence
skill-seekers extract-test-examples tests/ --min-confidence 0.7

# Limit examples per file
skill-seekers extract-test-examples tests/ --max-per-file 5
```
### MCP Tool Usage

```python
# From Claude Code
extract_test_examples(directory="tests/", language="python")

# Single file with JSON output
extract_test_examples(file="tests/test_api.py", json=True)

# High confidence only
extract_test_examples(directory="tests/", min_confidence=0.7)
```
### Codebase Integration

```bash
# Combine with codebase analysis
skill-seekers analyze --directory . --extract-test-examples
```
## Output Formats

### JSON Format
```json
{
  "total_examples": 42,
  "examples_by_category": {
    "instantiation": 15,
    "method_call": 12,
    "config": 8,
    "setup": 4,
    "workflow": 3
  },
  "examples_by_language": {
    "Python": 42
  },
  "avg_complexity": 0.65,
  "high_value_count": 28,
  "examples": [
    {
      "example_id": "a3f2b1c0",
      "test_name": "test_database_connection",
      "category": "instantiation",
      "code": "db = Database(host=\"localhost\", port=5432)",
      "language": "Python",
      "description": "Instantiate Database: Test database connection",
      "expected_behavior": "self.assertTrue(db.connect())",
      "setup_code": null,
      "file_path": "tests/test_db.py",
      "line_start": 15,
      "line_end": 15,
      "complexity_score": 0.6,
      "confidence": 0.85,
      "tags": ["unittest"],
      "dependencies": ["unittest", "database"]
    }
  ]
}
```
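Because the JSON report is plain data, it can be post-processed with the standard library alone. A minimal sketch, assuming the field names shown above: the inline sample is trimmed, its second entry is invented for illustration, and `high_confidence_by_category` is a hypothetical helper. The `> 0.7` cut-off mirrors the "High Value Examples" definition in the Markdown report.

```python
import json
from collections import defaultdict

# Trimmed report in the shape shown above; the second entry is illustrative.
# In practice, load the file written by `--json`:
#     with open("examples.json", encoding="utf-8") as fh:
#         report = json.load(fh)
SAMPLE = r"""
{
  "total_examples": 2,
  "examples": [
    {"test_name": "test_database_connection", "category": "instantiation",
     "confidence": 0.85, "code": "db = Database(host=\"localhost\", port=5432)"},
    {"test_name": "test_defaults", "category": "config",
     "confidence": 0.6, "code": "config = {\"debug\": True, \"cache_enabled\": False}"}
  ]
}
"""


def high_confidence_by_category(report: dict, threshold: float = 0.7) -> dict[str, list[str]]:
    """Group example code by category, keeping only examples above `threshold`."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for example in report["examples"]:
        if example["confidence"] > threshold:
            grouped[example["category"]].append(example["code"])
    return dict(grouped)


report = json.loads(SAMPLE)
print(high_confidence_by_category(report))
# {'instantiation': ['db = Database(host="localhost", port=5432)']}
```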
### Markdown Format

````markdown
# Test Example Extraction Report

**Total Examples**: 42
**High Value Examples** (confidence > 0.7): 28
**Average Complexity**: 0.65

## Examples by Category

- **instantiation**: 15
- **method_call**: 12
- **config**: 8
- **setup**: 4
- **workflow**: 3

## Extracted Examples

### test_database_connection

**Category**: instantiation
**Description**: Instantiate Database: Test database connection
**Expected**: self.assertTrue(db.connect())
**Confidence**: 0.85
**Tags**: unittest

```python
db = Database(host="localhost", port=5432)
```

**Source**: tests/test_db.py:15
````
## Extraction Categories
### 1. Instantiation
**Extracts**: Object creation with real parameters
```python
# Example from test
db = Database(
    host="localhost",
    port=5432,
    user="admin",
    password="secret"
)
```

**Use Case**: Shows valid initialization parameters
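As a rough illustration of AST-based instantiation detection, the sketch below uses only Python's standard `ast` module to collect assignments whose right-hand side calls a capitalized name with at least one argument, which also skips the empty constructors that the quality filter removes. The function name `find_instantiations` and the capitalized-name heuristic are assumptions for illustration; this is not the extractor's `_find_instantiations()` code.

```python
import ast

TEST_SOURCE = '''
def test_connection():
    db = Database(host="localhost", port=5432, user="admin", password="secret")
    empty = MyClass()
    total = sum([1, 2])
'''


def find_instantiations(source: str) -> list[str]:
    """Return the source of `name = ClassName(...)` assignments that pass arguments."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign) or not isinstance(node.value, ast.Call):
            continue
        call = node.value
        # Heuristic (assumption): a capitalized callee name is treated as a class.
        is_class = isinstance(call.func, ast.Name) and call.func.id[:1].isupper()
        # Empty constructors such as MyClass() carry no useful parameters.
        has_args = bool(call.args or call.keywords)
        if is_class and has_args:
            found.append(ast.get_source_segment(source, node))
    return found


print(find_instantiations(TEST_SOURCE))
# ['db = Database(host="localhost", port=5432, user="admin", password="secret")']
```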
### 2. Method Call

**Extracts**: Method calls followed by assertions

```python
# Example from test
response = api.get("/users/1")
assert response.status_code == 200
```

**Use Case**: Demonstrates expected behavior
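The pairing of a call with its assertion can be approximated with `ast` as well. This hypothetical `find_calls_with_assertions` sketch only looks at an `x = obj.method(...)` statement immediately followed by a bare `assert` that mentions `x`; unittest-style `self.assert*()` calls are not covered here, and the tool's own `_find_method_calls_with_assertions()` may work differently.

```python
import ast

TEST_SOURCE = '''
def test_get_user():
    response = api.get("/users/1")
    assert response.status_code == 200
    other = api.delete("/users/2")
'''


def find_calls_with_assertions(source: str) -> list[tuple[str, str]]:
    """Pair `x = obj.method(...)` with an immediately following `assert` that uses `x`."""
    pairs = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Walk consecutive statement pairs in the test body.
        for stmt, nxt in zip(func.body, func.body[1:]):
            is_method_call = (
                isinstance(stmt, ast.Assign)
                and isinstance(stmt.value, ast.Call)
                and isinstance(stmt.value.func, ast.Attribute)
            )
            if not is_method_call or not isinstance(nxt, ast.Assert):
                continue
            targets = {t.id for t in stmt.targets if isinstance(t, ast.Name)}
            used = {n.id for n in ast.walk(nxt.test) if isinstance(n, ast.Name)}
            if targets & used:
                pairs.append((ast.get_source_segment(source, stmt),
                              ast.get_source_segment(source, nxt)))
    return pairs


print(find_calls_with_assertions(TEST_SOURCE))
# [('response = api.get("/users/1")', 'assert response.status_code == 200')]
```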
### 3. Config

**Extracts**: Configuration dictionaries (2+ keys)

```python
# Example from test
config = {
    "debug": True,
    "database_url": "postgresql://localhost/test",
    "cache_enabled": False
}
```

**Use Case**: Shows valid configuration examples
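A sketch of the "2+ keys" rule, assuming a config example is a dict literal assigned to a plain name whose keys and values are all literals (checked with `ast.literal_eval`). The name `find_config_dicts` and the literal-only restriction are illustrative assumptions, not the tool's `_find_config_dicts()`.

```python
import ast

TEST_SOURCE = '''
config = {
    "debug": True,
    "database_url": "postgresql://localhost/test",
    "cache_enabled": False,
}
headers = {"Accept": "application/json"}
'''


def find_config_dicts(source: str, min_keys: int = 2) -> dict[str, dict]:
    """Map variable names to dict literals that have at least `min_keys` keys."""
    configs = {}
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.Dict)
            and len(node.value.keys) >= min_keys
            and isinstance(node.targets[0], ast.Name)
        ):
            try:
                # literal_eval succeeds only when every key and value is a literal.
                configs[node.targets[0].id] = ast.literal_eval(node.value)
            except (ValueError, TypeError):
                continue
    return configs


print(find_config_dicts(TEST_SOURCE))
```

The single-key `headers` dict is skipped, and a dict that references a variable (for example `{"url": base_url, "retries": 3}`) is skipped too, because its values cannot be evaluated statically.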
### 4. Setup

**Extracts**: `setUp()` methods and pytest fixtures

```python
# Example from setUp
self.client = APIClient(api_key="test-key")
self.client.connect()
```

**Use Case**: Demonstrates initialization sequences
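Locating setup code mostly comes down to spotting `setUp` methods and functions decorated with `@pytest.fixture`. The sketch below is a hypothetical approximation (`find_setup_code` and `_is_fixture` are illustrative names); it recognizes only the `pytest.fixture` spelling, not `from pytest import fixture`.

```python
import ast

TEST_SOURCE = '''
import unittest
import pytest

class TestClient(unittest.TestCase):
    def setUp(self):
        self.client = APIClient(api_key="test-key")
        self.client.connect()

@pytest.fixture
def client():
    return APIClient(base_url="https://api.test.com")
'''


def _is_fixture(decorator: ast.expr) -> bool:
    """True for `@pytest.fixture` and `@pytest.fixture(...)`."""
    target = decorator.func if isinstance(decorator, ast.Call) else decorator
    return (
        isinstance(target, ast.Attribute)
        and target.attr == "fixture"
        and isinstance(target.value, ast.Name)
        and target.value.id == "pytest"
    )


def find_setup_code(source: str) -> dict[str, list[str]]:
    """Map setUp methods and pytest fixtures to the source of their body statements."""
    setups = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        if node.name == "setUp" or any(_is_fixture(d) for d in node.decorator_list):
            setups[node.name] = [ast.get_source_segment(source, s) for s in node.body]
    return setups


for name, body in find_setup_code(TEST_SOURCE).items():
    print(name, body)
```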
### 5. Workflow

**Extracts**: Multi-step integration tests (3+ steps)

```python
# Example workflow
user = User(name="John", email="john@example.com")
user.save()
user.verify()
session = user.login(password="secret")
assert session.is_active
```

**Use Case**: Shows complete usage patterns
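This document does not define how the "3+ steps" are counted. The sketch below assumes a step is any non-assert statement in a test function that makes a call; `find_workflows` is a hypothetical helper, not the tool's `_find_workflows()`.

```python
import ast

TEST_SOURCE = '''
def test_user_lifecycle():
    user = User(name="John", email="john@example.com")
    user.save()
    user.verify()
    session = user.login(password="secret")
    assert session.is_active

def test_simple():
    value = compute(2)
    assert value == 4
'''


def _contains_call(stmt: ast.stmt) -> bool:
    """True if the statement makes at least one call anywhere inside it."""
    return any(isinstance(n, ast.Call) for n in ast.walk(stmt))


def find_workflows(source: str, min_steps: int = 3) -> dict[str, int]:
    """Map test functions to their step count when they have >= `min_steps` steps."""
    workflows = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test"):
            # Assumed definition: a step is a non-assert statement that calls something.
            steps = [s for s in node.body
                     if not isinstance(s, ast.Assert) and _contains_call(s)]
            if len(steps) >= min_steps:
                workflows[node.name] = len(steps)
    return workflows


print(find_workflows(TEST_SOURCE))  # {'test_user_lifecycle': 4}
```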
## Quality Filtering

### Confidence Scoring (0.0 to 1.0)

- Instantiation: 0.8 (high: clear object creation)
- Method Call + Assertion: 0.85 (very high: behavior proven)
- Config Dict: 0.75 (good: clear configuration)
- Workflow: 0.9 (excellent: complete pattern)
### Automatic Filtering

**Removes**:

- Trivial patterns: `assertTrue(True)`, `assertEqual(1, 1)`
- Mock-only code: `Mock()`, `MagicMock()`
- Too short: fewer than 20 characters
- Empty constructors: `MyClass()` with no parameters
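The removal rules above can be approximated with a few regular expressions. This `is_trivial` sketch is hypothetical: the patterns are assumptions chosen to match the listed examples, and only the 20-character minimum comes from this document.

```python
import re

MIN_LENGTH = 20  # "Too short: fewer than 20 characters"

# Assumed patterns approximating the rules listed above.
TRIVIAL_ASSERTS = re.compile(r"assertTrue\(True\)|assertEqual\((\w+), \1\)")
MOCK_ONLY = re.compile(r"^\s*[\w.]+\s*=\s*(Magic)?Mock\([^)]*\)\s*$")
EMPTY_CONSTRUCTOR = re.compile(r"^\s*\w+\s*=\s*[A-Z]\w*\(\)\s*$")


def is_trivial(code: str) -> bool:
    """Return True if a snippet should be dropped by the quality filter."""
    stripped = code.strip()
    return (
        len(stripped) < MIN_LENGTH
        or bool(TRIVIAL_ASSERTS.search(stripped))
        or bool(MOCK_ONLY.match(stripped))
        or bool(EMPTY_CONSTRUCTOR.match(stripped))
    )


snippets = [
    "self.assertEqual(1, 1)",
    "api = MagicMock(spec=APIClient)",
    "service = MyService()",
    'db = Database(host="localhost", port=5432)',
]
print([s for s in snippets if not is_trivial(s)])
# ['db = Database(host="localhost", port=5432)']
```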
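**Adjustable Thresholds**:

```bash
# High confidence only (0.7+)
skill-seekers extract-test-examples tests/ --min-confidence 0.7

# Allow lower confidence for discovery
skill-seekers extract-test-examples tests/ --min-confidence 0.4
```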
## Use Cases

### 1. Enhanced Documentation

**Problem**: Documentation often lacks real usage examples.

**Solution**: Extract examples from working tests.

```bash
# Append generated examples to SKILL.md
skill-seekers extract-test-examples tests/ --markdown >> SKILL.md
```

### 2. API Understanding

**Problem**: New developers struggle with API usage.

**Solution**: Show how APIs are actually tested.

### 3. Tutorial Generation

**Problem**: Creating step-by-step guides is time-consuming.

**Solution**: Use workflow examples as tutorial steps.

### 4. Configuration Examples

**Problem**: Valid configuration is unclear.

**Solution**: Extract config dictionaries from tests.
## Architecture

### Core Components

```text
TestExampleExtractor (Orchestrator)
├── PythonTestAnalyzer (AST-based)
│   ├── extract_from_test_class()
│   ├── extract_from_test_function()
│   ├── _find_instantiations()
│   ├── _find_method_calls_with_assertions()
│   ├── _find_config_dicts()
│   └── _find_workflows()
├── GenericTestAnalyzer (Regex-based)
│   └── PATTERNS (per-language regex)
└── ExampleQualityFilter
    ├── filter()
    └── _is_trivial()
```
### Data Flow

1. **Find Test Files**: glob patterns (`test_*.py`, `*_test.go`, etc.)
2. **Detect Language**: file extension mapping
3. **Extract Examples**:
   - Python → `PythonTestAnalyzer` (AST)
   - Others → `GenericTestAnalyzer` (regex)
4. **Apply Quality Filter**: remove trivial patterns
5. **Limit Per File**: keep the top N examples by confidence
6. **Generate Report**: JSON or Markdown
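The file discovery, language detection, and per-file limiting stages can be sketched with the standard library. Only `test_*.py` and `*_test.go` come from this document; the JavaScript/TypeScript patterns, the extension map, and all function names are assumptions for illustration.

```python
import fnmatch
import heapq
from pathlib import PurePath
from typing import Optional

# `test_*.py` and `*_test.go` are from the docs; the JS/TS patterns are assumed.
TEST_PATTERNS = ["test_*.py", "*_test.go", "*.test.js", "*.test.ts"]
# Assumed extension-to-language map (a subset of the supported languages).
EXTENSION_TO_LANGUAGE = {".py": "Python", ".go": "Go", ".js": "JavaScript", ".ts": "TypeScript"}


def is_test_file(path: str) -> bool:
    """Find Test Files: match the file name against the glob patterns."""
    name = PurePath(path).name
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in TEST_PATTERNS)


def detect_language(path: str) -> Optional[str]:
    """Detect Language: map the extension to a language, or None if unsupported."""
    return EXTENSION_TO_LANGUAGE.get(PurePath(path).suffix)


def top_n_by_confidence(examples: list, n: int) -> list:
    """Limit Per File: keep the N most confident examples (cf. --max-per-file)."""
    return heapq.nlargest(n, examples, key=lambda ex: ex["confidence"])


files = ["tests/test_db.py", "tests/conftest.py", "pkg/add_test.go", "src/app.py"]
print([(f, detect_language(f)) for f in files if is_test_file(f)])
# [('tests/test_db.py', 'Python'), ('pkg/add_test.go', 'Go')]
```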
## Limitations

### Current Scope
- Python: Full AST-based extraction (all categories)
- Other Languages: Regex-based (limited to common patterns)
- Focus: Test files only (not production code)
- Complexity: Simple to moderate test patterns
### Not Extracted
- Complex mocking setups
- Parameterized tests (partial support)
- Nested helper functions
- Dynamically generated tests
### Future Enhancements (Roadmap C3.3-C3.5)

- C3.3: Build "how-to" guides from workflow examples
- C3.4: Extract configuration patterns
- C3.5: Architectural overview from test coverage
## Troubleshooting

### No Examples Extracted

**Symptom**: `total_examples: 0`

**Causes**:

- Test files not found (check the patterns `test_*.py`, `*_test.go`)
- Confidence threshold too high
- Language not supported

**Solutions**:

```bash
# Lower the confidence threshold
skill-seekers extract-test-examples tests/ --min-confidence 0.3

# Check that test files are detected
ls tests/test_*.py

# Specify a supported language explicitly
skill-seekers extract-test-examples tests/ --language python
```
### Low Quality Examples

**Symptom**: Many trivial or incomplete examples

**Causes**:

- Tests use heavy mocking
- Tests are too simple
- Confidence threshold too low

**Solutions**:

```bash
# Increase the confidence threshold
skill-seekers extract-test-examples tests/ --min-confidence 0.7

# Keep only the best examples per file
skill-seekers extract-test-examples tests/ --max-per-file 3
```
### Parsing Errors

**Symptom**: "Failed to parse" warnings

**Causes**:

- Syntax errors in test files
- Test files use syntax newer than the Python version running the extractor
- Dynamic code generation

**Solutions**:

- Fix syntax errors in test files
- Ensure tests are valid Python/JS/Go code
- Note that parse errors are logged but do not stop extraction
## Examples

### Python unittest

```python
# tests/test_database.py
import unittest


class TestDatabase(unittest.TestCase):
    def test_connection(self):
        """Test database connection with real params"""
        db = Database(
            host="localhost",
            port=5432,
            user="admin",
            timeout=30
        )
        self.assertTrue(db.connect())
```
**Extracts**:

- **Category**: `instantiation`
- **Code**: `db = Database(host="localhost", port=5432, user="admin", timeout=30)`
- **Confidence**: 0.8
- **Expected**: `self.assertTrue(db.connect())`
### Python pytest

```python
# tests/test_api.py
import pytest


@pytest.fixture
def client():
    return APIClient(base_url="https://api.test.com")


def test_get_user(client):
    """Test fetching user data"""
    response = client.get("/users/123")
    assert response.status_code == 200
    assert response.json()["id"] == 123
```
**Extracts**:

- **Category**: `method_call`
- **Setup**: `# Fixtures: client`
- **Code**: `response = client.get("/users/123")\nassert response.status_code == 200`
- **Confidence**: 0.85
### Go Test
```go
// add_test.go
func TestAdd(t *testing.T) {
	calc := Calculator{mode: "basic"}
	result := calc.Add(2, 3)
	if result != 5 {
		t.Errorf("Add(2, 3) = %d; want 5", result)
	}
}
```
**Extracts**:

- **Category**: `instantiation`
- **Code**: `calc := Calculator{mode: "basic"}`
- **Confidence**: 0.6
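Regex-based extraction for languages other than Python could look roughly like the sketch below, which matches Go struct literals of the form `name := TypeName{...}` with a non-empty body. The pattern and `find_go_instantiations` are hypothetical and do not reproduce `GenericTestAnalyzer.PATTERNS`; the sketch only handles exported (capitalized) type names and misses literals with nested braces.

```python
import re

GO_TEST = '''
func TestAdd(t *testing.T) {
\tcalc := Calculator{mode: "basic"}
\tresult := calc.Add(2, 3)
\tempty := Calculator{}
}
'''

# Hypothetical pattern: `name := TypeName{...}` (optionally `&TypeName{...}`)
# with at least one character between the braces and no nested braces.
GO_STRUCT_LITERAL = re.compile(r"^\s*(\w+)\s*:=\s*&?([A-Z]\w*)\{([^{}]+)\}", re.MULTILINE)


def find_go_instantiations(source: str) -> list[tuple[str, str, str]]:
    """Return (variable, type, fields) for each matching struct literal."""
    return [m.groups() for m in GO_STRUCT_LITERAL.finditer(source)]


print(find_go_instantiations(GO_TEST))
# [('calc', 'Calculator', 'mode: "basic"')]
```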
## Performance
| Metric | Value |
|---|---|
| Processing Speed | ~100 files/second (Python AST) |
| Memory Usage | ~50MB for 1000 test files |
| Example Quality | 80%+ high-confidence (>0.7) |
| False Positives | <5% (with default filtering) |
## Integration Points

### 1. Standalone CLI

```bash
skill-seekers extract-test-examples tests/
```

### 2. Codebase Analysis

```bash
codebase-scraper --directory . --extract-test-examples
```

### 3. MCP Server

```python
# Via Claude Code
extract_test_examples(directory="tests/")
```

### 4. Python API

```python
from skill_seekers.cli.test_example_extractor import TestExampleExtractor

extractor = TestExampleExtractor(min_confidence=0.6)
report = extractor.extract_from_directory("tests/")

print(f"Found {report.total_examples} examples")
for example in report.examples:
    print(f"- {example.test_name}: {example.code[:50]}...")
```
## Next Steps
- Pattern Detection (C3.1) - Detect design patterns
- How-To Guide Generation (C3.3) - Build guides from workflows
- C3.x Codebase Analysis - Complete analysis suite
**Status**: ✅ Implemented in v2.6.0

**Issue**: #TBD (C3.2)

**Related Tasks**: C3.1 (Pattern Detection), C3.3-C3.5 (Future enhancements)