Weaviate

使用 Weaviate 进行企业级向量搜索——GraphQL 接口和模块化 AI。

快速开始

skill-seekers scrape --format weaviate --config configs/react.json

设置

pip install weaviate-client>=4.0.0

Python 示例(v4 API)

import weaviate
import json

# 连接本地 Weaviate
client = weaviate.connect_to_local()

# 加载数据
with open("output/react-weaviate.json") as f:
    data = json.load(f)

# 如果集合不存在则创建
from weaviate.classes.config import Configure, Property, DataType

if not client.collections.exists("ReactDoc"):
    client.collections.create(
        name="ReactDoc",
        vectorizer_config=Configure.Vectorizer.none(),  # 我们将提供向量
        properties=[
            Property(name="content", data_type=DataType.TEXT),
            Property(name="category", data_type=DataType.TEXT),
            Property(name="source", data_type=DataType.TEXT),
        ]
    )

# 获取集合
collection = client.collections.get("ReactDoc")

# 导入带有嵌入的数据
with collection.batch.dynamic() as batch:
    for item in data:
        batch.add_object(
            properties={
                "content": item["content"],
                "category": item.get("category", ""),
                "source": item.get("source", "")
            },
            vector=item["embedding"]
        )

client.close()

使用 v4 API 查询

import weaviate

client = weaviate.connect_to_local()
collection = client.collections.get("ReactDoc")

# 向量搜索
response = collection.query.near_text(
    query="React Hooks",
    limit=3,
    return_properties=["content", "category", "source"]
)

for obj in response.objects:
    print(f"内容: {obj.properties['content'][:200]}...")
    print(f"类别: {obj.properties['category']}")
    print("---")

client.close()

混合搜索

# 结合向量和关键词搜索
response = collection.query.hybrid(
    query="React Hooks useState",
    alpha=0.5,  # 向量与关键词之间的平衡
    limit=5,
    return_properties=["content", "category"]
)

功能特性

  • GraphQL 接口 - 灵活的查询
  • 模块化 AI - 选择您的向量化器
  • 多租户 - 企业级安全
  • 实时更新 - 即时更新
  • 混合搜索 - 向量 + 关键词

Weaviate Cloud

用于生产环境,使用 Weaviate Cloud:

import weaviate

client = weaviate.connect_to_weaviate_cloud(
    cluster_url="https://your-cluster.weaviate.cloud",
    auth_credentials=weaviate.auth.AuthApiKey("your-api-key")
)

下一步