Tutorial: Extracting PDFs

Learn how to extract technical documentation from PDFs and create searchable AI skills.

Time: 10 minutes | Level: Beginner | Result: PDF-based skill


Basic PDF Extraction

skill-seekers pdf \
  --input /path/to/manual.pdf \
  --output output/manual/
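
To run the basic command over several PDFs, one option is a small wrapper loop. The sketch below is an illustration, not part of skill-seekers: `output_dir_for` and `extract_all` are hypothetical helper names, and the `output/<name>/` layout simply mirrors the `manual.pdf` to `output/manual/` naming used above. Only the `--input` and `--output` flags shown in this tutorial are used.

```shell
# Hypothetical helper: derive output/<name>/ from a PDF path,
# mirroring the manual.pdf -> output/manual/ naming used above.
output_dir_for() {
  name=$(basename "$1" .pdf)
  printf 'output/%s/\n' "$name"
}

# Hypothetical wrapper: run the basic extraction for every PDF in a directory.
extract_all() {
  for pdf in "$1"/*.pdf; do
    [ -e "$pdf" ] || continue   # skip when the glob matches no files
    skill-seekers pdf --input "$pdf" --output "$(output_dir_for "$pdf")"
  done
}
```

Calling `extract_all /path/to/manuals` would then write one output directory per PDF.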

OCR for Scanned PDFs

# Install Tesseract first
# Ubuntu: sudo apt-get install tesseract-ocr
# macOS: brew install tesseract

skill-seekers pdf \
  --input /path/to/scanned.pdf \
  --output output/scanned/ \
  --ocr
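
Before running with `--ocr`, it can help to confirm that Tesseract is actually installed. The check below is a minimal sketch that assumes skill-seekers looks for the `tesseract` binary on `PATH`, which this tutorial implies but does not state; `ocr_ready` is a hypothetical variable name used only for illustration.

```shell
# Look for the tesseract binary on PATH before running with --ocr.
if command -v tesseract >/dev/null 2>&1; then
  ocr_ready=yes
  tesseract --version 2>&1 | head -n 1   # print the installed version line
else
  ocr_ready=no
  echo "tesseract not found; install it before using --ocr" >&2
fi
```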

Password-Protected PDFs

skill-seekers pdf \
  --input /path/to/encrypted.pdf \
  --output output/encrypted/ \
  --password "your-password"

Extract Tables

skill-seekers pdf \
  --input /path/to/spec.pdf \
  --output output/spec/ \
  --extract-tables

Parallel Processing (up to 3x Faster)

skill-seekers pdf \
  --input /path/to/large.pdf \
  --output output/large/ \
  --parallel \
  --workers 8
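
The worker count of 8 is only an example. A rough way to pick a value for your machine is to match it to the number of CPU cores, assuming the workers are mostly CPU-bound (the tutorial does not say whether they are). In this sketch `extract_parallel` is a hypothetical wrapper name, and the cap of 8 is taken from the example above rather than from any documented limit.

```shell
# Count online CPU cores; getconf is available on both Linux and macOS.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Use one worker per core, capped at the 8 used in the example above.
if [ "$cores" -gt 8 ]; then workers=8; else workers=$cores; fi

# Hypothetical wrapper around the parallel extraction command.
extract_parallel() {
  skill-seekers pdf --input "$1" --output "$2" --parallel --workers "$workers"
}
```

For example, `extract_parallel /path/to/large.pdf output/large/` runs the same command as above with the detected worker count.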

See the PDF Scraping Manual for the complete guide.