diff --git a/README.md b/README.md index f0dadeaaf..c7e994bbb 100644 --- a/README.md +++ b/README.md @@ -3,16 +3,15 @@ - Diagram + seekdb -### **🔷 The AI-Native Search Database** +### The AI-Native Search Database -**Unifies vector, text, structured and semi-structured data in a single engine, enabling hybrid search and in-database AI workflows.** +Vector, full-text, and relational in one engine. Embedding, rerank, and LLM are built in. ----

@@ -41,450 +40,87 @@

-**English** | [中文版](README_CN.md) - ---- +**English** | [中文](README_CN.md)
-## 🚀 What is OceanBase seekdb? - -**OceanBase seekdb** is an AI-native search database that unifies relational, vector, text, JSON and GIS in a single engine, enabling hybrid search and in-database AI workflows. - ---- - -## 🔥 Why OceanBase seekdb? - -| **Feature** | **seekdb** | **OceanBase** | **Chroma** | **Milvus** | **MySQL 9.0** | **PostgreSQL
+pgvector** | **DuckDB** | **Elasticsearch** | -| ------------------------ |:--------------------:|:-------------:|:----------:|:----------:|:-----------------------:|:----------------------------:|:----------:|:-----------------------------------:| -| **Embedded** | ✅ | ❌ | ✅ | ✅ | ❌[1] | ❌ | ✅ | ❌ | -| **Single-Node** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -| **Distributed** | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | -| **MySQL Compatible** | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | -| **Vector Search** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | -| **Full-Text Search** | ✅ | ✅ | ✅ | ⚠️ | ✅ | ✅ | ✅ | ✅ | -| **Hybrid Search** | ✅ | ✅ | ✅ | ✅ | ❌ | ⚠️ | ❌ | ✅ | -| **OLTP** | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | -| **OLAP** | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⚠️ | -| **License** | Apache 2.0 | MulanPubL 2.0 | Apache 2.0 | Apache 2.0 | GPL 2.0 | PostgreSQL License | MIT | AGPLv3
+SSPLv1
+Elastic 2.0 | -> [1] Embedded capability is removed in MySQL 8.0 -> - ✅ Supported -> - ❌ Not Supported -> - ⚠️ Limited - -## ✨ Key Features - -### Build fast + Hybrid search + Multi model -1. **Build fast:** From prototype to production in minutes: create AI apps using Python, run VectorDBBench on 1C2G. -2. **Hybrid Search:** Combine vector search, full-text search and relational query in a single statement. -3. **Multi-Model:** Support relational, vector, text, JSON and GIS in a single engine. +## What is OceanBase seekdb? +OceanBase seekdb is an open-source database that does vector search, full-text search, and SQL in one engine. Embedding, reranking, and LLM inference are built in, so a RAG pipeline can stay in one place instead of calling out to separate services. It uses the OceanBase engine and speaks MySQL protocol; existing MySQL clients and migrations work as-is. -### AI inside + SQL inside -1. **AI Inside:** Run embedding, reranking, LLM inference and prompt management inside the database, supporting a complete document-in/data-out RAG workflow. -2. **SQL Inside:** Powered by the proven OceanBase engine, delivering real-time writes and queries with full ACID compliance, and seamless MySQL ecosystem compatibility. +## Quick Start - - ---- - -## 🎬 Quick Start - -### Installation - -Choose your platform: - -
-🐍 Python (Recommended for AI/ML) +Install the Python client: ```bash pip install -U pyseekdb ``` -
- -
-🐳 Docker (Quick Testing) - -```bash -docker run -d \ - --name seekdb \ - -p 2881:2881 \ - -p 2886:2886 \ - -v ./data:/var/lib/oceanbase \ - oceanbase/seekdb:latest -``` -Please refer to the [document](https://github.com/oceanbase/docker-images/blob/main/seekdb/README.md) of this docker image for details. - -
- -
-📦 Binary (Standalone) - -```bash -# Linux -rpm -ivh seekdb-1.x.x.x-xxxxxxx.el8.x86_64.rpm -``` -Please replace the version number with the actual RPM package version. - -
- - -### 🎯 AI Search Example - -Build a semantic search system in 5 minutes: - -
-🗄️ 🐍 Python SDK - -```bash -# install sdk first -pip install -U pyseekdb -``` +Example: create a collection, add documents, and query by text. Embeddings are generated by the database or the SDK. ```python -""" -this example demonstrates the most common operations with embedding functions: -1. Create a client connection -2. Create a collection with embedding function -3. Add data using documents (embeddings auto-generated) -4. Query using query texts (embeddings auto-generated) -5. Print query results - -This is a minimal example to get you started quickly with embedding functions. -""" - import pyseekdb -from pyseekdb import DefaultEmbeddingFunction - -# ==================== Step 1: Create Client Connection ==================== -# You can use embedded mode, server mode, or OceanBase mode -# For this example, we'll use server mode (you can change to embedded or OceanBase) - -# Embedded mode (local SeekDB) -client = pyseekdb.Client( - path="./seekdb.db", - database="test" -) -# Alternative: Server mode (connecting to remote SeekDB server) -# client = pyseekdb.Client( -# host="127.0.0.1", -# port=2881, -# database="test", -# user="root", -# password="" -# ) - -# Alternative: Remote server mode (OceanBase Server) -# client = pyseekdb.Client( -# host="127.0.0.1", -# port=2881, -# tenant="test", # OceanBase default tenant -# database="test", -# user="root", -# password="" -# ) - -# ==================== Step 2: Create a Collection with Embedding Function ==================== -# A collection is like a table that stores documents with vector embeddings -collection_name = "my_simple_collection" - -# Create collection with default embedding function -# The embedding function will automatically convert documents to embeddings -collection = client.create_collection( - name=collection_name, - #embedding_function=DefaultEmbeddingFunction() # Uses default model (384 dimensions) -) - -print(f"Created collection '{collection_name}' with dimension: {collection.dimension}") 
-print(f"Embedding function: {collection.embedding_function}") - -# ==================== Step 3: Add Data to Collection ==================== -# With embedding function, you can add documents directly without providing embeddings -# The embedding function will automatically generate embeddings from documents -documents = [ - "Machine learning is a subset of artificial intelligence", - "Python is a popular programming language", - "Vector databases enable semantic search", - "Neural networks are inspired by the human brain", - "Natural language processing helps computers understand text" -] +client = pyseekdb.Client(path="./seekdb.db", database="test") +collection = client.create_collection("docs") -ids = ["id1", "id2", "id3", "id4", "id5"] - -# Add data with documents only - embeddings will be auto-generated by embedding function collection.add( - ids=ids, - documents=documents, # embeddings will be automatically generated - metadatas=[ - {"category": "AI", "index": 0}, - {"category": "Programming", "index": 1}, - {"category": "Database", "index": 2}, - {"category": "AI", "index": 3}, - {"category": "NLP", "index": 4} - ] + ids=["1", "2", "3"], + documents=[ + "Machine learning is a subset of AI.", + "Vector databases enable semantic search.", + "Natural language processing helps computers understand text.", + ], ) -print(f"\nAdded {len(documents)} documents to collection") -print("Note: Embeddings were automatically generated from documents using the embedding function") - -# ==================== Step 4: Query the Collection ==================== -# With embedding function, you can query using text directly -# The embedding function will automatically convert query text to query vector - -# Query using text - query vector will be auto-generated by embedding function -query_text = "artificial intelligence and machine learning" - -results = collection.query( - query_texts=query_text, # Query text - will be embedded automatically - n_results=3 # Return top 3 most 
similar documents -) - -print(f"\nQuery: '{query_text}'") -print(f"Query results: {len(results['ids'][0])} items found") - -# ==================== Step 5: Print Query Results ==================== -for i in range(len(results['ids'][0])): - print(f"\nResult {i+1}:") - print(f" ID: {results['ids'][0][i]}") - print(f" Distance: {results['distances'][0][i]:.4f}") - if results.get('documents'): - print(f" Document: {results['documents'][0][i]}") - if results.get('metadatas'): - print(f" Metadata: {results['metadatas'][0][i]}") - -# ==================== Step 6: Cleanup ==================== -# Delete the collection -client.delete_collection(collection_name) -print(f"\nDeleted collection '{collection_name}'") - -``` -Please refer to the [User Guide](https://github.com/oceanbase/pyseekdb) for more details. -
- -
-🗄️ SQL - -```sql --- Create table with vector column -CREATE TABLE articles ( - id INT PRIMARY KEY, - title TEXT, - content TEXT, - embedding VECTOR(384), - FULLTEXT INDEX idx_fts(content) WITH PARSER ik, - VECTOR INDEX idx_vec (embedding) WITH(DISTANCE=l2, TYPE=hnsw, LIB=vsag) - ) ORGANIZATION = HEAP; - --- Insert documents with embeddings --- Note: Embeddings should be pre-computed using your embedding model -INSERT INTO articles (id, title, content, embedding) -VALUES - (1, 'AI and Machine Learning', 'Artificial intelligence is transforming...', '[0.1, 0.2, ...]'), - (2, 'Database Systems', 'Modern databases provide high performance...', '[0.3, 0.4, ...]'), - (3, 'Vector Search', 'Vector databases enable semantic search...', '[0.5, 0.6, ...]'); - --- Example: Hybrid search combining vector and full-text --- Replace '[query_embedding]' with your actual query embedding vector -SELECT - title, - content, - l2_distance(embedding, '[query_embedding]') AS vector_distance, - MATCH(content) AGAINST('your keywords' IN NATURAL LANGUAGE MODE) AS text_score -FROM articles -WHERE MATCH(content) AGAINST('your keywords' IN NATURAL LANGUAGE MODE) -ORDER BY vector_distance APPROXIMATE -LIMIT 10; +results = collection.query(query_texts="what is AI?", n_results=2) +print(results["documents"]) ``` -We suggest developers use sqlalchemy to access data by SQL for python developers. -
- - -## 📚 Use Cases - -
- 📖 RAG & Knowledge Retrieval - -Large language models are limited by their training data. RAG introduces timely and trusted external knowledge to improve answer quality and reduce hallucination. seekdb enhances search accuracy through vector search, full-text search, hybrid search, built-in AI functions, and efficient indexing, while multi-level access control safeguards data privacy across heterogeneous knowledge sources. -1. Enterprise QA -2. Customer support -3. Industry insights -4. Personal knowledge - -
- -
- 🔍 Semantic Search Engine -Traditional keyword search struggles to capture intent. Semantic search leverages embeddings and vector search to understand meaning and connect text, images, and other modalities. seekdb's hybrid search and multi-model querying deliver more precise, context-aware results across complex search scenarios. -1. Product search -2. Text-to-image -3. Image-to-product +Other options: [Docker](https://github.com/oceanbase/docker-images/blob/main/seekdb/README.md) or [binary (RPM)](https://www.oceanbase.ai/docs/seekdb-overview/). Documentation: [seekdb](https://www.oceanbase.ai/docs/seekdb-overview/) and [pyseekdb](https://github.com/oceanbase/pyseekdb). -
+## Capabilities -
- 🎯 Agentic AI Applications +| Area | Description | +|------|-------------| +| Built-in AI | `AI_EMBED`, `AI_RERANK`, `AI_COMPLETE`, and prompt templates; run embedding and LLM from SQL. | +| Hybrid search | One query can combine vector, full-text, and relational filters; RRF or LLM rerank; filters pushed to storage. | +| Vector and full-text | Dense and sparse vectors (L2, IP, cosine), HNSW/IVF indexes, BM25 full-text with phrase and boolean. | +| SQL | OceanBase engine, ACID, MySQL protocol; works with standard MySQL clients and migration tools. | +| Deployment | Embedded (e.g. pyseekdb), single-node server, or [OceanBase](https://www.oceanbase.com/) for scale-out. | -Agentic AI requires memory, planning, perception, and reasoning. seekdb provides a unified foundation for agents through metadata management, vector/text/mixed queries, multimodal data processing, RAG, built-in AI functions and inference, and robust privacy controls—enabling scalable, production-grade agent systems. -1. Personal assistants -2. Enterprise automation -3. Vertical agents -4. Agent platforms +## Use Cases -
+- **RAG and knowledge bases** — Enterprise QA, customer support, internal docs; hybrid search with built-in embedding, rerank, and LLM. +- **Semantic search** — Product search, text-to-image, image-to-product; vector and full-text in one engine. +- **AI agents** — One store for memory, tool results, and RAG; metadata and vector/text search with built-in AI functions. +- **AI-assisted coding** — Code and docs in the same DB; semantic and keyword search, multi-project isolation. +- **Edge and embedded** — Lightweight embedded or micro-server mode; optional sync to OceanBase cloud. +- **Enterprise apps** — Add vector/search/AI to existing MySQL-compatible apps with minimal migration. -
- 💻 AI-Assisted Coding & Development +More: [Use cases on oceanbase.ai](https://www.oceanbase.ai/). -AI-powered coding combines natural-language understanding and code semantic analysis to enable generation, completion, debugging, testing, and refactoring. seekdb enhances code intelligence with semantic search, multi-model storage for code and documents, isolated multi-project management, and time-travel queries—supporting both local and cloud IDE environments. -1. IDE plugins -2. Design-to-web -3. Local IDEs -4. Web IDEs +## Ecosystem -
+Supports LangChain, LangGraph, LlamaIndex, Dify, Coze, FastGPT, DB-GPT, Hugging Face, Firecrawl, Spring AI Alibaba, Jina, Ragas, Instructor, Baseten, Cloudflare Workers AI, and others. See the [documentation](https://www.oceanbase.ai/docs/seekdb-overview/) for integration guides. -
- ⬆️ Enterprise Application Intelligence +## Community & Support -AI transforms enterprise systems from passive tools into proactive collaborators. seekdb provides a unified AI-ready storage layer, fully compatible with MySQL syntax and views, and accelerates mixed workloads with parallel execution and hybrid row-column storage. Legacy applications gain intelligent capabilities with minimal migration across office, workflow, and business analytics scenarios. -1. Document intelligence -2. Business insights -3. Finance systems +- [Discord](https://discord.gg/74cF8vbNEs) +- [GitHub Discussions](https://github.com/oceanbase/seekdb/discussions) +- [Forum (中文)](https://ask.oceanbase.com/) -
+## Development - -
- 📱 On-Device & Edge AI Applications - -Edge devices—from mobile to vehicle and industrial terminals—operate with constrained compute and storage. seekdb's lightweight architecture supports embedded and micro-server modes, delivering full SQL, JSON, and hybrid search under low resource usage. It integrates seamlessly with OceanBase cloud services to enable unified edge-to-cloud intelligent systems. -1. Personal assistants -2. In-vehicle systems -3. AI education -4. Companion robots -5. Healthcare devices - -
- ---- - -## 🌟 Ecosystem & Integrations - -
- -

- - HuggingFace - - - LangChain - - - LangGraph - - - Dify - - - Coze - - - LlamaIndex - - - Firecrawl - - - FastGPT - - - DB-GPT - - - Camel-AI - - - spring-ai-alibaba - - - Cloudflare Workers AI - - - Jina AI - - - Ragas - - - Instructor - - - Baseten - -

- -
- -Please refer to the [User Guide](https://www.oceanbase.ai/docs/seekdb-overview/) for more details. - - -
- ---- - - -## 🤝 Community & Support - -
- -

- - Discord - - - GitHub Discussion - - - Forum - -

- -
- ---- - -## 🛠️ Development - -### Build from Source - -Before building, please install the required toolchain and dependencies for your operating system. See [Install Toolchain](docs/developer-guide/en/toolchain.md) for detailed instructions. +To build from source, install the toolchain first. See [toolchain](docs/developer-guide/en/toolchain.md) and [developer guide](docs/developer-guide/en/README.md): ```bash -# Clone the repository -git clone https://github.com/oceanbase/seekdb.git -cd seekdb +git clone https://github.com/oceanbase/seekdb.git && cd seekdb bash build.sh debug --init --make -mkdir ~/seekdb -mkdir ~/seekdb/bin -cp build_debug/src/observer/seekdb ~/seekdb/bin -cd ~/seekdb -./bin/seekdb ``` -In this example, the working director is $HOME/seekdb, please use a fresh director for testing, Please see the [Developer Guide](docs/developer-guide/en/README.md) for detailed instructions. - -### Contributing - -We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md) to get started. - ---- - - -## 📄 License - -OceanBase seekdb is licensed under the [Apache License, Version 2.0](LICENSE). +The developer guide describes how to run and where build outputs go. To contribute: [CONTRIBUTING.md](CONTRIBUTING.md). +## License +[Apache License, Version 2.0](LICENSE). diff --git a/README_CN.md b/README_CN.md index ebd12ba46..29db5609d 100644 --- a/README_CN.md +++ b/README_CN.md @@ -2,9 +2,9 @@ # -### **🔷 AI-Native Hybrid Search Database** +### AI-Native Search Database -**Combines vector, text, structured, and semi-structured data capabilities in one database, with built-in AI Functions for multi-model hybrid search and intelligent inference.** +A unified engine for vector, full-text, and relational queries, with embedding, reranking, LLM, and other AI capabilities built in.
@@ -16,14 +16,14 @@ Documentation - Static Badge + Static Badge zread - + Forum - DingTalk group 33254054 + DingTalk group 33254054 Downloads @@ -36,408 +36,89 @@
-[English](README.md) | **中文版** - ---- +[English](README.md) | **中文**
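-## 🚀 What is OceanBase seekdb? - -**OceanBase seekdb** is a developer-friendly, AI-native database from OceanBase, focused on efficient hybrid search for AI applications. It stores and retrieves vector, text, structured, and semi-structured data in one place, and its built-in AI Functions support data embedding, reranking, and real-time in-database inference. seekdb inherits the high performance of the OceanBase core engine and its full MySQL compatibility, and adds a deeply optimized search architecture to give developers a solution better suited to the data-processing needs of AI applications. - ---- - -## 🔥 Why OceanBase seekdb? - -| **Feature** | **seekdb** | **OceanBase** | **Chroma** | **Milvus** | **MySQL 9.0** | **PostgreSQL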
+pgvector** | **DuckDB** | **Elasticsearch** | -| ------------------------ |:--------------------:|:-------------:|:----------:|:----------:|:-----------------------:|:----------------------------:|:----------:|:-----------------------------------:| -| **Embedded** | ✅ | ❌ | ✅ | ✅ | ❌[1] | ❌ | ✅ | ❌ | -| **Single-Node** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -| **Distributed** | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | -| **MySQL Compatible** | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | -| **Vector Search** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | -| **Full-Text Search** | ✅ | ✅ | ✅ | ⚠️ | ✅ | ✅ | ✅ | ✅ | -| **Hybrid Search** | ✅ | ✅ | ✅ | ✅ | ❌ | ⚠️ | ❌ | ✅ | -| **OLTP** | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | -| **OLAP** | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⚠️ | -| **License** | Apache 2.0 | MulanPubL 2.0 | Apache 2.0 | Apache 2.0 | GPL 2.0 | PostgreSQL License | MIT | AGPLv3
+SSPLv1
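+Elastic 2.0 | -> [1] Embedded capability was removed in MySQL 8.0 -> - ✅ Supported -> - ❌ Not supported -> - ⚠️ Limited support - -## ✨ Key Features - -### Works out of the box, fast to develop with, easy to learn -seekdb uses a single-node architecture, so installation and configuration are quick; it has no dependencies on other components and runs as a single process, which suits agile development of AI workloads. Deployment is flexible, with server and embedded modes: in server mode it can be deployed via yum install, docker, or the Windows/macOS desktop edition; in embedded mode it integrates natively with Python and can serve as the in-process database of an AI application. It is already integrated with a range of AI application frameworks, so an AI application can be built in minutes. - -### Runs on a small 1C2G instance, with elastic vertical scaling -1 CPU core and 2 GB of memory are enough to run the VectorDBBench Performance1536D50K benchmark. When concurrency, data volume, or query complexity demand more, resources can be scaled up vertically. - -### High-performance vector and full-text indexes, with hybrid vector, full-text, and scalar search -* Vector search: stores and efficiently retrieves vectors of up to 16,000 dimensions, with L2, inner product, cosine similarity, and other distance metrics. HNSW/IVF indexes and related quantization algorithms are provided, and both exact and approximate nearest-neighbor search are supported, covering the varied vector-retrieval needs of AI scenarios. -* Full-text search: high-performance full-text indexes ranked with the BM25 relevance algorithm for precise keyword search. Tokenizers include Space, Beng, Ngram, IK, and Jieba; query modes include Natural Language Mode, Boolean Mode, Phrase Query, and Multi Match, so relevant text matching the filter rules can be retrieved efficiently from large data sets. -* Hybrid search: combines vector, full-text, scalar, spatial, and other data types; a single SQL statement performs multi-path retrieval and reranking, which substantially improves the accuracy of query results in RAG applications. - -### Upgraded vector search: semantic search from plain text via Semantic Index -seekdb provides a Semantic Index feature: write text data and the system automatically embeds it and builds the vector index; at query time, a text search condition is enough to run a semantic search. The feature hides the complexity of vector embedding and result reranking from the user, which significantly simplifies how AI applications use the database. +## What is OceanBase seekdb? -### Seamless model integration, with built-in AI Functions for real-time in-database inference -seekdb can connect to large language models and embedding models, which are registered and managed through the DBMS_AI_SERVICE system package. Built-in AI Functions such as AI_COMPLETE, AI_PROMPT, AI_EMBED, and AI_RERANK support data embedding and real-time in-database inference in standard SQL syntax. +OceanBase seekdb is an open-source database that provides vector search, full-text search, and SQL in a single engine, with embedding, reranking, LLM, and other AI features built in. When building RAG, there is no need to integrate a separate embedding service or API; the application talks to seekdb directly. It runs on the OceanBase engine and is compatible with the MySQL protocol, so existing MySQL clients and migration paths work as they are. -### JSON-based dynamic schema for dynamic storage of and efficient access to document metadata -seekdb supports the JSON data type and dynamic schemas. Partial JSON updates reduce the cost of updates, and JSON functional indexes and multi-valued indexes optimize query performance. Semi-structured encoding lowers storage cost. In AI applications, JSON can store document metadata and be combined with full-text and vector search in hybrid queries. +## Quick Start -### Real-time writes, immediately queryable -Built on an LSM-Tree storage architecture, seekdb supports high-frequency real-time writes. Full-text, vector, scalar, and other indexes are built synchronously as DML executes, so data can be queried as soon as it is written. - -### MySQL-compatible and more, with support for HTAP mixed workloads -Deep compatibility with MySQL syntax, protocol, data dictionary, and more lets MySQL applications migrate seamlessly. Its architecture also goes beyond the scenarios MySQL covers: with hybrid row-column storage and vectorized execution, a single instance can serve both online transactions and real-time analytics, removing data-synchronization latency and the cost of maintaining a sync pipeline. - ---- - -## 🎬 Quick Start - -### Installation - -Choose your platform: - -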
-🐍 Python (Recommended for AI/ML) +Install the Python client: ```bash pip install -U pyseekdb - -``` -
- -
-🐳 Docker (Quick Testing) - -```bash -docker run -d \ - --name seekdb \ - -p 2881:2881 \ - -p 2886:2886 \ - -v ./data:/var/lib/oceanbase \ - oceanbase/seekdb:latest ``` -Please refer to the [documentation](https://github.com/oceanbase/docker-images/blob/main/seekdb/README_CN.md) of this docker image for details. - -
-
-📦 Binary (Standalone) - -```bash -# Linux -rpm -ivh seekdb-1.x.x.x-xxxxxxx.el8.x86_64.rpm -``` -Please replace the version number with the actual RPM package version. - -
- - -### 🎯 AI Search Example - -Build a semantic search system in 5 minutes: - -
-🗄️ 🐍 Python SDK - -```bash -# install sdk first -pip install -U pyseekdb -``` +Example: create a collection, insert documents, and query in natural language. Embeddings are generated by the database or the SDK. ```python -""" -this example demonstrates the most common operations with embedding functions: -1. Create a client connection -2. Create a collection with embedding function -3. Add data using documents (embeddings auto-generated) -4. Query using query texts (embeddings auto-generated) -5. Print query results - -This is a minimal example to get you started quickly with embedding functions. -""" - import pyseekdb -from pyseekdb import DefaultEmbeddingFunction - -# ==================== Step 1: Create Client Connection ==================== -# You can use embedded mode, server mode, or OceanBase mode -# For this example, we'll use server mode (you can change to embedded or OceanBase) - -# Embedded mode (local SeekDB) -client = pyseekdb.Client( - path="./seekdb.db", - database="test" -) -# Alternative: Server mode (connecting to remote SeekDB server) -# client = pyseekdb.Client( -# host="127.0.0.1", -# port=2881, -# database="test", -# user="root", -# password="" -# ) - -# Alternative: Remote server mode (OceanBase Server) -# client = pyseekdb.Client( -# host="127.0.0.1", -# port=2881, -# tenant="test", # OceanBase default tenant -# database="test", -# user="root", -# password="" -# ) - -# ==================== Step 2: Create a Collection with Embedding Function ==================== -# A collection is like a table that stores documents with vector embeddings -collection_name = "my_simple_collection" - -# Create collection with default embedding function -# The embedding function will automatically convert documents to embeddings -collection = client.create_collection( - name=collection_name, - #embedding_function=DefaultEmbeddingFunction() # Uses default model (384 dimensions) -) - -print(f"Created collection '{collection_name}' with dimension: {collection.dimension}") -print(f"Embedding function: {collection.embedding_function}") -# ==================== 
Step 3: Add Data to Collection ==================== -# With embedding function, you can add documents directly without providing embeddings -# The embedding function will automatically generate embeddings from documents +client = pyseekdb.Client(path="./seekdb.db", database="test") +collection = client.create_collection("docs") -documents = [ - "Machine learning is a subset of artificial intelligence", - "Python is a popular programming language", - "Vector databases enable semantic search", - "Neural networks are inspired by the human brain", - "Natural language processing helps computers understand text" -] - -ids = ["id1", "id2", "id3", "id4", "id5"] - -# Add data with documents only - embeddings will be auto-generated by embedding function collection.add( - ids=ids, - documents=documents, # embeddings will be automatically generated - metadatas=[ - {"category": "AI", "index": 0}, - {"category": "Programming", "index": 1}, - {"category": "Database", "index": 2}, - {"category": "AI", "index": 3}, - {"category": "NLP", "index": 4} - ] -) - -print(f"\nAdded {len(documents)} documents to collection") -print("Note: Embeddings were automatically generated from documents using the embedding function") - -# ==================== Step 4: Query the Collection ==================== -# With embedding function, you can query using text directly -# The embedding function will automatically convert query text to query vector - -# Query using text - query vector will be auto-generated by embedding function -query_text = "artificial intelligence and machine learning" - -results = collection.query( - query_texts=query_text, # Query text - will be embedded automatically - n_results=3 # Return top 3 most similar documents + ids=["1", "2", "3"], + documents=[ + "Machine learning is a subset of AI.", + "Vector databases enable semantic search.", + "Natural language processing helps computers understand text.", + ], ) -print(f"\nQuery: '{query_text}'") -print(f"Query results: 
{len(results['ids'][0])} items found") - -# ==================== Step 5: Print Query Results ==================== -for i in range(len(results['ids'][0])): - print(f"\nResult {i+1}:") - print(f" ID: {results['ids'][0][i]}") - print(f" Distance: {results['distances'][0][i]:.4f}") - if results.get('documents'): - print(f" Document: {results['documents'][0][i]}") - if results.get('metadatas'): - print(f" Metadata: {results['metadatas'][0][i]}") - -# ==================== Step 6: Cleanup ==================== -# Delete the collection -client.delete_collection(collection_name) -print(f"\nDeleted collection '{collection_name}'") -``` -Please refer to the [User Guide](https://github.com/oceanbase/pyseekdb) for more details. -
- -
-🗄️ SQL - -```sql --- Create table with vector column -CREATE TABLE articles ( - id INT PRIMARY KEY, - title TEXT, - content TEXT, - embedding VECTOR(384), - FULLTEXT INDEX idx_fts(content) WITH PARSER ik, - VECTOR INDEX idx_vec (embedding) WITH(DISTANCE=l2, TYPE=hnsw, LIB=vsag) - ) ORGANIZATION = HEAP; - --- Insert documents with embeddings --- Note: Embeddings should be pre-computed using your embedding model -INSERT INTO articles (id, title, content, embedding) -VALUES - (1, 'AI and Machine Learning', 'Artificial intelligence is transforming...', '[0.1, 0.2, ...]'), - (2, 'Database Systems', 'Modern databases provide high performance...', '[0.3, 0.4, ...]'), - (3, 'Vector Search', 'Vector databases enable semantic search...', '[0.5, 0.6, ...]'); - --- Example: Hybrid search combining vector and full-text --- Replace '[query_embedding]' with your actual query embedding vector -SELECT - title, - content, - l2_distance(embedding, '[query_embedding]') AS vector_distance, - MATCH(content) AGAINST('your keywords' IN NATURAL LANGUAGE MODE) AS text_score -FROM articles -WHERE MATCH(content) AGAINST('your keywords' IN NATURAL LANGUAGE MODE) -ORDER BY vector_distance APPROXIMATE -LIMIT 10; +results = collection.query(query_texts="what is AI?", n_results=2) +print(results["documents"]) ``` -For Python developers, we recommend using SQLAlchemy to access the data. -
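- - -## 📚 Use Cases +It can also be run via [Docker](https://github.com/oceanbase/docker-images/blob/main/seekdb/README_CN.md) or a [binary (RPM)](https://www.oceanbase.ai/docs/seekdb-overview/). Documentation: [seekdb](https://www.oceanbase.ai/docs/seekdb-overview/) and [pyseekdb](https://github.com/oceanbase/pyseekdb). -### 📖 RAG Applications -For RAG (retrieval-augmented generation) scenarios such as intelligent chatbots, knowledge bases, and domain expert systems, seekdb provides a complete RAG pipeline. It combines document parsing, vector embedding, result reranking, and large language model (LLM) interaction, supports hybrid vector, full-text, and scalar search, and handles the end-to-end flow from document input to data output (Doc In Data Out) within a single database instance. In a knowledge-base scenario, for example, seekdb efficiently retrieves factual information from the knowledge base and gives the LLM precise, up-to-date data, which improves both the accuracy of generated content and the explainability of the generation process. -### 💻 AI-Assisted Coding -For AI-assisted coding, seekdb can build vector and full-text indexes over code repositories, enabling efficient code search and generation/completion based on code keywords or code semantics. seekdb also organizes data efficiently: it supports structured storage of code snippets (such as syntax trees and dependency graphs) and unstructured storage (such as raw code text), and uses dynamic metadata management to flexibly extend and efficiently query code attributes (such as language, function name, and parameter list). +## Capabilities -### 🎯 AI Agent Platforms -seekdb offers a one-stop data solution for AI Agent development. It starts quickly and supports embedded deployment, so services can be brought up promptly for agile development. Its high-performance engine sustains high-frequency inserts, updates, and deletes alongside real-time queries, keeping database bottlenecks from slowing AI development. Built-in vector, full-text, and hybrid search, together with flexible metadata and session management and an integrated memory store, make it possible to build a complete AI Agent without adding other databases, significantly reducing system complexity and the barrier to development. +| Area | Description | +|------|-------------| +| Built-in AI | `AI_EMBED`, `AI_RERANK`, `AI_COMPLETE`, and prompt templates; embedding and LLM inference run from SQL. | +| Hybrid search | One query can combine vector, full-text, and relational conditions; RRF or LLM reranking; filters are pushed down to storage. | +| Vector and full-text | Dense and sparse vectors (L2, inner product, cosine), HNSW/IVF indexes, BM25 full-text with phrase and boolean queries. | +| SQL | OceanBase engine, ACID, MySQL protocol; works with common MySQL clients and migration tools. | +| Deployment | Embedded (e.g. pyseekdb), single-node server, or migration to [OceanBase](https://www.oceanbase.com/) for scale-out. | -### 🔍 Semantic Search Engine -For semantic search scenarios such as e-commerce product search and recommendation, multimedia retrieval, image search, and face recognition, seekdb provides a complete vector search solution. It connects to mainstream embedding models, stores text or image features in seekdb as vectors, and uses high-performance indexes for efficient similarity computation, quickly returning the results that best match the query. In addition, the Semantic Index feature further simplifies development: the user submits a text query and vector embedding and result reranking happen automatically, with no need to deal with the underlying implementation. This significantly lowers the barrier to integrating AI applications with the database and makes semantic search easier to use and more efficient. +## Use Cases -### ⬆️ Modernizing MySQL Applications and Adding AI -seekdb inherits the full capabilities of OceanBase's single-node storage engine, execution engine, transaction engine, and advanced query optimizer, is highly compatible with MySQL, and extends this foundation with AI capabilities. Small configurations suit IoT edge devices, small application development, and teaching labs; medium and large configurations suit OLTP, HTAP, or AI workloads across industries. +- **RAG and knowledge bases**: enterprise QA, customer support, internal docs; hybrid search with built-in embedding, reranking, and LLM. +- **Semantic search**: product search, text-to-image, image-to-product; vector and full-text in one engine. +- **AI agents**: one store for memory, tool results, and RAG; metadata and vector/text search with built-in AI functions. +- **AI-assisted coding**: code and docs in the same database; semantic and keyword search, multi-project isolation. +- 
**Edge and embedded**: lightweight embedded or micro-server mode; optional sync with OceanBase cloud. +- **Enterprise apps**: add vector/search/AI to existing MySQL-compatible applications with little migration effort. -## 🌟 Ecosystem & Integrations +More: [use cases on oceanbase.ai](https://www.oceanbase.ai/). -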
- -

- - HuggingFace - - - LangChain - - - LangGraph - - - Dify - - - Coze - - - LlamaIndex - - - Firecrawl - - - FastGPT - - - DB-GPT - - - Camel-AI - - - spring-ai-alibaba - - - Cloudflare Workers AI - - - Jina AI - - - Ragas - - - Instructor - - - Baseten - -

+## Ecosystem -
+Integrates with LangChain, LangGraph, LlamaIndex, Dify, Coze, FastGPT, DB-GPT, Hugging Face, Firecrawl, Spring AI Alibaba, Jina, Ragas, Instructor, Baseten, Cloudflare Workers AI, and others. See the [documentation](https://www.oceanbase.ai/docs/seekdb-overview/) for integration guides. -Please refer to the [User Guide](https://www.oceanbase.ai/docs/seekdb-overview/) for more details. +## Community & Support +- [Discord](https://discord.gg/74cF8vbNEs) +- [GitHub Discussions](https://github.com/oceanbase/seekdb/discussions) +- [Q&A forum (Chinese)](https://ask.oceanbase.com/) +- [DingTalk group 33254054](https://h5.dingtalk.com/circle/joinCircle.html?corpId=ding320493024256007024f2f5cc6abecb85&token=be84625101d2c2b2b675e1835e5b7988&groupCode=v1,k1,EoWBexMbnAnivFZPFszVivlsxkpAYNcvXRdF071nRRY=&from=group&ext=%7B%22channel%22%3A%22QR_GROUP_NORMAL%22%2C%22extension%22%3A%7B%22groupCode%22%3A%22v1%2Ck1%2CEoWBexMbnAnivFZPFszVivlsxkpAYNcvXRdF071nRRY%3D%22%2C%22groupFrom%22%3A%22group%22%7D%2C%22inviteId%22%3A1057855%2C%22orgId%22%3A313467091%2C%22shareType%22%3A%22GROUP%22%7D&origin=11?#/) -
+## Development ---- - - -## 🤝 Community & Support - -
- -

- - DingTalk group 33254054 - - - Forum - -

- -
- ---- - -## 🛠️ 开发 - -### 从源码构建 - -在构建之前,请先根据你的操作系统安装所需的工具链和依赖。详见 [安装工具链](docs/developer-guide/zh/toolchain.md)。 +从源码构建前需先安装工具链,参见 [安装工具链](docs/developer-guide/zh/toolchain.md) 与 [开发者指南](docs/developer-guide/zh/README.md): ```bash -# Clone the repository -git clone https://github.com/oceanbase/seekdb.git -cd seekdb +git clone https://github.com/oceanbase/seekdb.git && cd seekdb bash build.sh debug --init --make -mkdir ~/seekdb -mkdir ~/seekdb/bin -cp build_debug/src/observer/observer ~/seekdb/bin -cd ~/seekdb -./bin/observer ``` -本例中, 使用 $HOME/seekdb 作为测试目录, 开发者可以酌情使用一个空目录作为测试工作目录, 更多详细说明请参见[开发者指南](docs/developer-guide/zh/README.md)。 - -### 贡献 - -我们欢迎贡献!请查看我们的[贡献指南](CONTRIBUTING.md)开始。 - ---- - - -## 📄 许可证 - -OceanBase seekdb 采用 [Apache License, Version 2.0](LICENSE) 许可证。 +开发者指南中说明了如何运行以及构建产物的位置。贡献指南见 [CONTRIBUTING.md](CONTRIBUTING.md)。 +## 许可证 +[Apache License, Version 2.0](LICENSE)。