From Any Source to RAG Pipeline in 5 Minutes
Learn how to transform documentation, GitHub repos, PDFs, or codebases into a LangChain + Chroma RAG pipeline with Skill Seekers v3.0.0
Skill Seekers Team
In this tutorial, you’ll learn how to transform any source into a RAG pipeline:
- Scrape React documentation (or any source)
- Convert it to LangChain Documents
- Store in Chroma vector database
- Query with natural language
Prerequisites
pip install skill-seekers langchain chromadb openai
Step 1: Scrape Your Source
From Documentation
skill-seekers scrape --format langchain --config configs/react.json
From GitHub Repository
skill-seekers scrape --format langchain --github https://github.com/facebook/react
From PDF
skill-seekers scrape --format langchain --pdf ./react-docs.pdf
From Local Codebase
skill-seekers analyze --directory ./my-react-project --format langchain
This will:
- Extract content from your chosen source
- Convert pages to LangChain Documents
- Save to output/react-langchain/
Step 2: Load Documents
from skill_seekers.cli.adaptors import get_adaptor
adaptor = get_adaptor('langchain')
documents = adaptor.load_documents("output/react-langchain/")
print(f"Loaded {len(documents)} documents")
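Before embedding anything, it helps to peek at what you loaded. A LangChain Document pairs `page_content` with a `metadata` dict. The sketch below uses a stand-in `Document` class and made-up metadata keys (`source`, `category`) purely for illustration; the actual keys depend on what Skill Seekers emits for your source:

```python
from dataclasses import dataclass, field

# Stand-in for LangChain's Document, just for illustration:
# real loaded documents expose the same two attributes.
@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)

documents = [
    Document("useState lets you add state to function components.",
             {"source": "hooks/useState", "category": "hooks"}),
    Document("React renders components into the DOM.",
             {"source": "learn/rendering", "category": "basics"}),
]

# Peek at content plus per-page metadata
for doc in documents:
    print(doc.metadata["source"], "->", doc.page_content[:40])

# Metadata is what powers filtered search later on
hooks_docs = [d for d in documents if d.metadata.get("category") == "hooks"]
print(f"{len(hooks_docs)} hook-related document(s)")
```

Checking metadata here pays off in Step 4, where it becomes the `filter` argument to similarity search.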
Step 3: Store in Chroma
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
# Initialize embeddings
embeddings = OpenAIEmbeddings()
# Create vector store
vectorstore = Chroma.from_documents(
    documents,
    embeddings,
    collection_name="react-docs",
    persist_directory="./chroma_db"
)
print("Documents stored in Chroma!")
Step 4: Query
# Search for React hooks information
results = vectorstore.similarity_search("How do I use useState?")
print(results[0].page_content)
# Search with filter
results = vectorstore.similarity_search(
    "useEffect cleanup",
    filter={"category": "hooks"}
)
Complete Script
#!/usr/bin/env python3
"""RAG pipeline from any source."""
from skill_seekers.cli.adaptors import get_adaptor
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
# Load documents
adaptor = get_adaptor('langchain')
documents = adaptor.load_documents("output/react-langchain/")
# Store in Chroma
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    documents,
    embeddings,
    collection_name="react-docs",
    persist_directory="./chroma_db"
)
# Query
query = "How do I use useState?"
results = vectorstore.similarity_search(query, k=3)
print(f"Query: {query}\n")
for i, doc in enumerate(results, 1):
    print(f"Result {i}:")
    print(doc.page_content[:500] + "...")
    print()
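Retrieval is only half of RAG: the retrieved chunks are then stuffed into a prompt for an LLM (LangChain chains such as RetrievalQA automate this step). A minimal hand-rolled version, where both the helper name and the prompt template are illustrative rather than anything Skill Seekers provides:

```python
# Hypothetical helper: turn retrieved chunks into an LLM prompt.
def build_prompt(query: str, chunks: list[str]) -> str:
    # Number the chunks so the model (and you) can trace answers back.
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, 1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_prompt(
    "How do I use useState?",
    ["useState returns a stateful value and an updater function.",
     "Call hooks only at the top level of a component."],
)
print(prompt)
```

You would pass `prompt` to your LLM of choice; grounding the answer in the retrieved context is what makes the pipeline a RAG system rather than plain search.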
Advanced: Other Vector Databases
Skill Seekers v3.0.0 supports 6 vector databases:
# Weaviate
skill-seekers scrape --format weaviate --config react.json
# Qdrant
skill-seekers scrape --format qdrant --config react.json
# FAISS
skill-seekers scrape --format faiss --config react.json
# Pinecone (via Markdown export)
skill-seekers scrape --target markdown --config react.json
Next Steps
- Try with GitHub repositories
- Process PDF manuals
- Analyze your own codebase
- Experiment with different chunk sizes
- Add metadata filtering
- Deploy to production with cloud storage
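Chunk size is the knob most worth experimenting with: smaller chunks give sharper matches, larger ones give more context per result. In practice LangChain's RecursiveCharacterTextSplitter handles this; below is a stdlib-only sliding-window sketch of the idea, with overlap so sentences that straddle a boundary still appear intact in some chunk:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    # Sliding window: each chunk repeats the last `overlap` characters
    # of the previous one, so boundary content is never lost.
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

text = "x" * 500
chunks = chunk_text(text)
print(len(chunks), [len(c) for c in chunks])
```

Re-chunk, re-embed, and re-run your test queries to see how retrieval quality shifts; character counts here are a crude proxy, and token-aware splitters do better on real documentation.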
That’s it! You now have a RAG pipeline that works with any source.