README.md
# MCP-VectorSQL

## Overview

MCP-VectorSQL is a powerful vector SQL generation tool that converts natural language questions into high-quality SQL queries, specifically designed for vector databases. It enables users to interact with vector databases using natural language, simplifying complex vector search operations.

## Architecture

![Text2VectorSQL Evaluation Process](./benchmark/figures/mcp_vector_sql.png)

The architecture consists of three main components:

1. **Text2VectorSql**: Handles natural language input and generates unified SQL output
2. **LLM**: Processes natural language questions and generates vector queries
3. **VecDB (MyScale)**: Performs vector similarity searches and stores vector data

The workflow includes:
- Step 1: LLM lists database tables and schemas from the vector database
- Step 2: Text2VectorSql gets vector queries based on natural language questions
- Step 3: VecDB executes vector queries and returns results
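The three steps above can be sketched as a small pipeline. This is a hedged illustration only: the function names, their signatures, and the SQL shape are hypothetical, not the project's actual API.

```python
# Illustrative sketch of the Text2VectorSQL workflow; every name here is a
# placeholder, not the real MCP-VectorSQL interface.
def run_pipeline(question, list_schemas, generate_sql, execute_sql):
    schemas = list_schemas()               # Step 1: list tables and schemas
    sql = generate_sql(question, schemas)  # Step 2: build the vector query
    return execute_sql(sql)                # Step 3: run it against VecDB
```

Injecting the three stages as callables keeps the sketch independent of any particular LLM or database client.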

## Core Features

### Natural Language Processing
- Accepts direct natural language questions from users
- Converts natural language into structured vector queries
- Supports complex questions with multiple conditions

### Vector Similarity Search
- Performs efficient similarity searches on vector databases
- Supports various similarity metrics (cosine similarity, Euclidean distance, etc.)
- Optimized for large-scale vector datasets
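For reference, the two similarity metrics named above can be written in a few lines of plain Python (the database computes these internally; this is only to show what they measure):

```python
import math

def cosine_similarity(a, b):
    """Angle-based similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```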

### Answer Integration
- Processes and integrates results from vector searches
- Combines information from multiple sources if needed
- Generates coherent and comprehensive answers

### Response Generation
- Returns natural language answers based on search results
- Provides relevant and accurate information to users
- Maintains context and relevance throughout the conversation

## Quick Start

### 1. Configure Environment

```bash
# Copy environment variable example file
cp .env.example .env
```


Modify the `.env` file with your configuration:
- API settings (API_KEY, API_URL, etc.)
- Database settings (MYSCALE_HOST, MYSCALE_PORT, MYSCALE_USER, etc.)
- Server settings (MCP_SERVER_TRANSPORT, MCP_BIND_HOST, etc.)
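The `.env` file is plain `KEY=VALUE` lines. A minimal parser sketch, assuming comment lines start with `#` (the real code presumably uses a library such as `python-dotenv`, so this is illustrative only):

```python
# Illustrative .env parser; real deployments typically use python-dotenv.
def parse_env(text):
    """Return a dict of KEY=VALUE pairs, skipping blanks and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        pairs[key.strip()] = value.strip()
    return pairs
```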

### 2. Initialize and Run MCP Server

```bash
# Initialize runtime environment
uv sync --all-extras --dev

# Run MCP server
uv run python -m mcp_server.main
```

### 3. Register MCP Tools in Dify

Register the MCP server with the Dify platform to use its SQL generation capabilities.

## License

Please refer to the [LICENSE](LICENSE) file for license information.

## Contact

For questions or suggestions, please contact the development team.
benchmark/.env.example
# Dify API
API_KEY=your-api-key-here
DIFY_URL=https://api.dify.ai/v1/chat-messages

# MyScale config or other vector database configuration
MYSCALE_HOST=your-myscale-host-here
MYSCALE_PORT=8123
MYSCALE_USER=your-myscale-username-here
MYSCALE_PASSWORD=your-myscale-password-here
MYSCALE_DATABASE=your-database-name-here

# LLM API
LLM_API_URL=https://your-llm-api-url-here/v1/chat/completions
LLM_API_KEY=your-llm-api-key-here
LLM_MODEL=your-llm-model-name-here
LLM_EVALUATION_ENABLED=True
benchmark/READMD.md
# MCP SQLVectorDB Benchmark

This benchmark is designed to evaluate the performance of MCP SQLVectorDB models on Text2VectorSQL tasks. It provides comprehensive metrics to assess the accuracy, recall, and overall quality of SQL generation from natural language questions.

## How to Run this Benchmark

### Prerequisites

Before running the benchmark, ensure you have:
- Python 3.8+
- Dify API access with a valid API key
- LLM API access with a valid API key
- MyScale database access
- Required Python packages (install via `pip install -r requirements.txt`)

### Configuration

The benchmark requires the following configuration, which can be modified in the `.env` file:

```
# Dify API configuration
API_KEY=your-api-key-here
DIFY_URL=https://api.dify.ai/v1/chat-messages

# MyScale database configuration
MYSCALE_HOST=your-myscale-host
MYSCALE_PORT=8123
MYSCALE_USER=your-myscale-username
MYSCALE_PASSWORD=your-myscale-password
MYSCALE_DATABASE=your-database-name

# LLM API configuration
LLM_API_URL=your-llm-api-url
LLM_API_KEY=your-llm-api-key
LLM_MODEL=your-llm-model
LLM_EVALUATION_ENABLED=True
```

### Running the Benchmark

You can run the benchmark using the following command:

```bash
cd benchmark
python benchmark.py [options]
```

#### Command Line Options

- `--dataset`: Path to the dataset file (default: `./data/results/test/olympics/olympics_qs.json`)
- `--output`: Path to save the results (default: `./results`)
- `--text-num`: Number of samples to test (default: all)
- `--no-llm`: Disable LLM evaluation (default: enabled)

#### Examples

1. Run with default settings:
```bash
python benchmark.py
```

2. Run with custom dataset and output path:
```bash
python benchmark.py --dataset ./custom_dataset.json --output ./custom_results
```

3. Run with only 50 samples and without LLM evaluation:
```bash
python benchmark.py --text-num 50 --no-llm
```

### Output

The benchmark generates a timestamped JSON file in the output directory (e.g., `benchmark_results-20260114-182456.json`). The output includes:

- Summary statistics (total samples, success rate, average metrics)
- Detailed results for each sample (question, standard SQL, predicted SQL, evaluation metrics)
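The timestamped filename shown above follows a `benchmark_results-YYYYMMDD-HHMMSS.json` pattern; a sketch of how such a path could be built (the helper name is hypothetical, not the benchmark's actual code):

```python
# Illustrative only: builds an output path matching the observed filename
# pattern; the benchmark's real implementation may differ.
import datetime
import os

def results_path(output_dir):
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    return os.path.join(output_dir, f"benchmark_results-{stamp}.json")
```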

## Evaluation

The benchmark uses a comprehensive set of metrics to evaluate Text2SQL performance:

### 1. Exact Match Metrics

- **Exact Match**: Whether the predicted SQL exactly matches any of the ground truth SQL statements

### 2. Set Metrics

- **Precision**: The proportion of correctly predicted results among all predicted results
- **Recall**: The proportion of correctly predicted results among all ground truth results
- **F1 Score**: The harmonic mean of precision and recall
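Treating each query's execution results as sets, the three metrics above reduce to a few lines (a sketch of the standard definitions, not the benchmark's exact code):

```python
def set_metrics(predicted, truth):
    """Precision, recall, and F1 over execution-result rows treated as sets."""
    pred, gold = set(predicted), set(truth)
    tp = len(pred & gold)  # rows present in both result sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1
```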

### 3. Ranking Metrics

- **MAP (Mean Average Precision)**: Average precision across all queries, considering the order of results
- **MRR (Mean Reciprocal Rank)**: Average of the reciprocals of the ranks of the first relevant result
- **NDCG (Normalized Discounted Cumulative Gain)**: Measures the ranking quality by discounting results further down the list

### 4. LLM-Based Evaluation

- **ACC_SQL**: Binary score (0/1) for SQL skeleton correctness evaluated by LLM
- **ACC_Vec**: Binary score (0/1) for vector component correctness evaluated by LLM
- **LLM Overall**: Average of ACC_SQL and ACC_Vec scores

### Evaluation Process

1. **SQL Extraction**: Extract SQL statements from MCP SQLVectorDB's natural language responses
2. **SQL Execution**: Execute both standard and predicted SQL on the MyScale database
3. **Result Comparison**: Compare execution results using set and ranking metrics
4. **LLM Evaluation**: (Optional) Use GPT-4o to evaluate SQL semantic correctness
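Step 1 above (SQL extraction) might look like the following sketch; the regex is an assumption about the response format, not the benchmark's actual extraction logic:

```python
import re

# Hypothetical extractor: assumes the response embeds a bare
# "SELECT ... ;" statement somewhere in the natural-language answer.
def extract_sql(response_text):
    match = re.search(r"SELECT\b.*?;", response_text, re.IGNORECASE | re.DOTALL)
    return match.group(0) if match else None
```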

## Environment

### System Requirements

- **Operating System**: Linux/macOS/Windows
- **Architecture**: x86-64 (recommended)
- **Memory**: 8GB+ RAM
- **Storage**: 1GB+ free disk space

### Python Dependencies

- `requests`: for API calls (third-party)
- `clickhouse_connect`: for connecting to the MyScale database (third-party)
- `argparse`: for command-line argument parsing (standard library)
- `json`: for data handling (standard library)
- `os`: for file system operations (standard library)
- `datetime`: for timestamp generation (standard library)

### Database Requirements

- **Database**: MyScale
- **Vector Index**: Pre-built vector indexes for efficient similarity search
- **Tables**: Database schema should match the test dataset requirements

### API Requirements

- **API**: Access to MCP SQLVectorDB's API with Text2SQL capabilities
- **LLM API**: Access to LLM model API with a valid API key
- **OpenAI API**: (Optional) For LLM-based evaluation using GPT-4o

## Troubleshooting

### Common Issues

1. **API Connection Errors**: Verify your API key and network connectivity
2. **LLM API Errors**: Verify your LLM API key and network connectivity
3. **Database Errors**: Check MyScale connection parameters and database permissions
4. **SQL Execution Failures**: Ensure the database schema matches the expected structure
5. **LLM Evaluation Failures**: Verify OpenAI API access if using LLM evaluation

### Logging

Detailed logs are generated during benchmark execution, including:
- SQL execution results
- Evaluation metrics
- Error messages

Logs can be found in the `log/` directory for debugging purposes.

## License

This benchmark is provided for evaluation purposes only. Please contact the maintainers for licensing information.

## Contact

For questions or issues, please contact the development team.