🎯 Multi-attempt can improve robustness, but gains vary — see the real benchmark results below.
🌐 Bilingual (English/Chinese) prompts supported — accuracy depends on your data and schema quality.
English | 中文文档
┌─────────────┐ ┌──────────────────────┐ ┌─────────────┐
│ User Input │────▶│ Send to Cloud API │────▶│ GPT-4/Claude│
│ "Query..." │ │ Data Leaves Local │ │ Cloud API │
└─────────────┘ └──────────────────────┘ └─────────────┘
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Data Privacy Risk │ │ $200-2000/month │
└──────────────────┘ └──────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ 🏠 100% Local Environment │
│ │
│ ┌────────────┐ ┌─────────────────┐ ┌──────────────────┐ │
│ │ User Input │────▶│ Tokligence │────▶│ Ollama Models │ │
│ │ "Query..." │ │ LocalSQLAgent │ │ Qwen2.5-Coder:7B │ │
│ └────────────┘ │ Intelligent │ └──────────────────┘ │
│ │ Agent │ │
│ └─────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────┐ │
│ │ Local Databases (Data Never Leaves) │ │
│ │ PostgreSQL│MySQL│MongoDB│ClickHouse │ │
│ └──────────────────────────────────────┐ │
│ │
│ Advantages: 💰 $0 Cost 🔒 100% Privacy ⚡ 1-3s/attempt 📈 Multi-attempt gains│
└──────────────────────────────────────────────────────────────────────┘
Using the IntelligentSQLAgent with enhanced error learning and semantic understanding.
Spider Dataset (50 samples) - qwen2.5-coder:7b
| Version | Exec Accuracy | Exact Match | Avg Latency | Avg Attempts | Improvements |
|---|---|---|---|---|---|
| Original | 82.00% | 0.00% | 9.60s | 2.74 | Baseline |
| Phase 2 (Current) | 86.00% | 14.00% | 5.41s | 2.50 | ✅ Error Learning |
| Phase 3 | 86.00% | 14.00% | 5.37s | 2.52 | + Semantic Analysis |
Key Improvements:
- +4% accuracy: Enhanced error learning mechanism allows the agent to learn from SQL execution errors
- 44% faster: Optimized prompts reduced average latency from 9.6s to 5.4s
- Smart error recovery: Automatically classifies and fixes 7 types of SQL errors
- Better column understanding: Distinguishes between column names and aggregate functions
Model Comparison on Spider (50 samples)
| Model | Type | Params | Disk Size | RAM Usage | Exec Accuracy | Avg Latency | Notes |
|---|---|---|---|---|---|---|---|
| qwen2.5-coder:7b | Domain-specific (code) | 7B | 4.7 GB | ~6 GB | 86.00% | 5.41s | ✅ Best overall |
| gpt-oss:20b | General purpose | 20B | 13 GB | ~16 GB | 90.00% | 20.83s | |
| qwen2.5:14b | General purpose | 14B | 9.0 GB | ~12 GB | 82.00% | 10.02s | ❌ Worse accuracy, 2x slower |
| sqlcoder:7b | SQL-specific | 7B | 4.1 GB | ~5 GB | 2.00% | 2.92s | ❌ Failed - JSON/prompt issues |
| sqlcoder:15b | SQL-specific | 15B | 9.0 GB | ~11 GB | 6.00% | 0.01s* | ❌ Failed - not compatible |
| deepseek-coder-v2:16b | Domain-specific (code) | 16B | 8.9 GB | ~11 GB | 68.00% | 4.04s | ✅ Good accuracy, slower than 7B |
| codestral:22b-v0.1-q4_0 | Domain-specific (code) | 22B | 12 GB | ~15 GB | 82.00% | 30.59s | |
| mistral:7b-instruct | General purpose | 7B | 4.1 GB | ~5 GB | Failed | 31-39s* | ❌ JSON errors, extremely slow |
| mixtral:8x7b | MoE Architecture | 8x7B | 26 GB | ~30 GB | Pending | - | 📦 Downloaded, test pending |
*Failed tests show artificially low latency as models weren't running properly
Key Lesson Learned:
-
Domain-specific models (like qwen2.5-coder) outperform larger general models for SQL tasks
-
Test Results Ranking:
- gpt-oss:20b - Highest accuracy (90%) but 4x slower (20.8s) with JSON errors
- qwen2.5-coder:7b - Best balance: 86% accuracy, 5.41s latency, no errors
- codestral:22b - Code-optimized: 82% accuracy but very slow (30.59s)
- qwen2.5:14b - General model: 82% accuracy, 10.02s latency
- deepseek-coder-v2:16b - Good performance: 68% accuracy, 4.04s latency
-
Why larger models failed:
- gpt-oss:20b (20B params): 4x slower, JSON compliance issues, memory intensive
- qwen2.5:14b (14B params): Lower accuracy than 7B coder model, 2x slower
- codestral:22b (22B params): 6x slower than qwen2.5-coder, no accuracy improvement
- mistral:7b-instruct: JSON generation failures, extremely slow (31-39s/query)
- Larger model size ≠ better performance for specialized tasks
-
The specialized training of qwen2.5-coder on code/SQL beats raw model size
Why gpt-oss:20b Failed Despite Being Larger:
- Training Data Mismatch: Trained on conversational data, not code/SQL
- JSON Generation: Unable to reliably generate structured JSON responses required by the agent
- Inference Overhead: Larger model = slower inference without proportional accuracy gains
- Context Understanding: Struggles with database schema context compared to code-specific models
Why SQL-Specific Models (sqlcoder) Failed:
- Prompt Format Incompatibility: sqlcoder models expect different prompt formats than our agent system
- JSON Generation Issues: Unable to generate structured JSON required by IntelligentSQLAgent
- Outdated Training: Older models may lack modern instruction-following capabilities
- Agent Integration: These models were designed for direct SQL generation, not agent-based systems
Recommendation:
- ✅ Use modern code-focused models (qwen2.5-coder, deepseek-coder) for SQL tasks
⚠️ Be cautious with SQL-specific models (sqlcoder) - they may not work with agent frameworks- ❌ Avoid general conversational models (gpt-oss, llama-chat) even if they have more parameters
- 📊 Prioritize training domain match AND instruction-following capability
Dataset: Spider dev (first 100 samples). Model: qwen2.5-coder:7b (Ollama). Stop-on-success enabled. Temperature: 0.0 (1 attempt), 0.2 (5/7 attempts).
Host (venv)
| Max Attempts | Exec Accuracy | Exact Match | Avg Latency | Avg Attempts |
|---|---|---|---|---|
| 1 | 84% | 3% | 2.43s | 1.00 |
| 5 | 85% | 4% | 3.97s | 1.66 |
| 7 | 85% | 4% | 4.79s | 1.94 |
Docker (local image tag localsqlagent-api)
| Max Attempts | Exec Accuracy | Exact Match | Avg Latency | Avg Attempts |
|---|---|---|---|---|
| 1 | 84% | 3% | 2.56s | 1.00 |
| 5 | 84% | 2% | 4.22s | 1.66 |
| 7 | 84% | 3% | 4.77s | 1.96 |
Key Finding: On this subset, extra attempts improve execution accuracy marginally (84% → 85%) and increase latency.
BIRD dev: The official dev.zip download from bird-bench.oss-cn-beijing.aliyuncs.com failed from this environment, and the Hugging Face mini-dev lacks SQLite DBs/schema, so BIRD execution accuracy was not run yet. Provide the full BIRD dev dataset under data/bird to enable --benchmark bird.
Reproduce (Spider dev subset):
python benchmarks/sql_benchmark.py --model ollama --model-name qwen2.5-coder:7b \
--benchmark spider --limit 100 --max-attempts 5 --temperature 0.2Run:
pytest tests/integration/test_schema_discovery_mysql_clickhouse.py \
tests/integration/test_multi_statement_execution.py \
tests/integration/test_multi_schema_postgres.py \
tests/integration/test_live_services_smoke.py -qLatest run (macOS, local Docker, 2026-01-16): 7 passed.
For multi-db baselines (PostgreSQL/MySQL/ClickHouse/MongoDB), run the live integration tests and your own benchmark suite. Results vary significantly by schema quality, data distribution, and model choice.
| Solution | API/License Cost | Infrastructure | Total Cost | Data Privacy |
|---|---|---|---|---|
| LocalSQLAgent | $0 | $0 | $0 ✅ | 100% Local 🔒 |
| GPT-4 API* | ~$50-500+ | $0 | ~$50-500+ | Data sent to cloud |
| Claude API* | ~$40-400+ | $0 | ~$40-400+ | Data sent to cloud |
| Self-hosted LLM | $0 | $500+ (GPU rental) | $500+ | Requires expertise |
*Costs vary significantly based on usage volume and model selection
Hardware: Regular laptop (8GB RAM)
Model Size: 7B parameters (4GB disk space)
Response Time: 1-3 seconds per attempt
Accuracy (Spider dev first 100 samples):
• 1 Attempt: 84% execution accuracy
• 5 Attempts: 85% execution accuracy
• 7 Attempts: 85% execution accuracy
• Time Trade-off: ~2.4s → ~4.8s average latency
Concurrent Support: 10+ QPS
- Multi-Attempt Lift: 84% → 85% on Spider dev subset (first 100)
- Zero API Costs: No recurring fees (vs potentially hundreds/month for cloud APIs)
- 100% Privacy: Your data never leaves your machine
- Bilingual Native: Full support for English and Chinese queries
- 5-Second Results: Complex queries solved in 5-15 seconds total
Query in English: "Find recent popular products"
查询用中文: "查找最近的热门产品"
Both work perfectly! Ambiguity detection in both languages:
• English accuracy: 81.8%
• Chinese accuracy: 83.3%
• Automatic language detection
# 1. Clone the repository
git clone https://github.com/tokligence/LocalSQLAgent.git
cd LocalSQLAgent
# 2. Use the default docker-compose.yml (with host network mode)
docker-compose up -d # Start all services
# 3. Launch the new ChatGPT-style Web UI
make web-ui # Start chat interface at http://localhost:8501
# 4. (Optional) Start API Server
make api-server # Start API server at http://localhost:8711# 1. Clone the repository
git clone https://github.com/tokligence/LocalSQLAgent.git
cd LocalSQLAgent
# 2. Use the macOS-specific configuration (with port mappings)
docker-compose -f docker-compose.macos.yml up -d # Start all services
# 3. Launch the new ChatGPT-style Web UI
make web-ui # Start chat interface at http://localhost:8501
# 4. (Optional) Start API Server
make api-server # Start API server at http://localhost:8711- 💬 ChatGPT-style conversation - Natural chat interface like OpenAI
- 🤔 Interactive clarifications - Agent asks questions when needed
- 📊 In-chat results - SQL and data displayed directly in conversation
- 📝 Conversation memory - Maintains context across messages
- 💾 Export chat history - Save conversations as JSON
make help # Show all available commands
make benchmark # Run full benchmarks
make clean # Clean up containers and dataAll services use host network mode for optimal performance and simplicity:
# Start all services (databases + web UI + API)
docker-compose up -d
# View running services
docker-compose ps
# View logs
docker-compose logs -f
# Stop all services
docker-compose down# Create and setup virtual environment
make venv-setup
# Activate virtual environment
source venv/bin/activate
# Start databases and run application
make start
make web-ui # or make api-server# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 2. Download model (4GB, one-time download)
ollama pull qwen2.5-coder:7b
# 3. Start databases (optional)
docker-compose up -d
# 4. Install dependencies and run
pip install -r requirements.txt
python quick_start.pyThat's it! No API keys, no cloud services, no credit cards 🎉
Example Conversation:
👤 User: Show me the top 5 customers by revenue from last month
🤖 Assistant: I need some clarification:
- The term 'last month' is ambiguous. Did you mean:
• December 2025
• The last 30 days
• Since the beginning of December
Please provide more specific details.
👤 User: December 2025
🤖 Assistant: ✅ Query executed successfully!
Attempts: 2 | Execution Time: 1.23s | Rows: 5
Generated SQL:
SELECT c.customer_name, SUM(o.total_amount) as revenue
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= '2025-12-01' AND o.order_date < '2026-01-01'
GROUP BY c.customer_name
ORDER BY revenue DESC
LIMIT 5
Results:
┌─────────────────┬──────────┐
│ Customer Name │ Revenue │
├─────────────────┼──────────┤
│ Acme Corp │ $45,230 │
│ Tech Solutions │ $38,150 │
│ Global Trade │ $31,890 │
│ Prime Services │ $28,750 │
│ Star Industries │ $24,320 │
└─────────────────┴──────────┘
- 🎭 Natural Conversation - Chat naturally like with ChatGPT
- 🤔 Smart Clarifications - Agent asks for specifics when queries are ambiguous
- 📊 Inline Results - SQL and data displayed directly in chat
- 💬 Context Memory - Maintains conversation context
- 📥 Export Chat - Download conversation history as JSON
- 🔄 Real-time Updates - See SQL generation progress
- 🧭 Schema Overview - "Explore the database" shows live schema cards
- 🧩 Multi-DB Routing - Select multiple databases and compare answers
- 🛡️ Safety Rails - Read-only default, configurable DDL/DML, LIMIT guardrails
- 🧭 Schema Controls - Toggle samples/row counts and filter schemas
Launch with: make web-ui or streamlit run web/app.py
web/api_server.py— OpenAI-compatible API server (production)web/app.py— Streamlit UI for interactive use
# Use with OpenAI Python SDK
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8711/v1",
api_key="not-needed" # No API key required!
)
response = client.chat.completions.create(
model="localsqlagent",
messages=[
{"role": "user", "content": "Find top customers by revenue"}
]
)
print(response.choices[0].message.content)Launch with: make api-server or python web/api_server.py
Integration testing guide: docs/testing.md
Optional: pass database config and execution policy:
import requests
payload = {
"model": "localsqlagent",
"query_mode": "explore", # return schema overview instead of executing SQL
"db_config": {
"type": "postgresql",
"host": "localhost",
"port": 5432,
"database": "benchmark",
"user": "text2sql",
"password": "text2sql123"
},
"execution_policy": {
"read_only": True,
"default_limit": 10000
},
"schema_options": {
"schemas": "public",
"include_samples": False,
"include_row_counts": False
},
"messages": [
{"role": "user", "content": "Find top customers by revenue"}
]
}
response = requests.post("http://localhost:8711/v1/chat/completions", json=payload)
print(response.json()["choices"][0]["message"]["content"])LLM settings can be set in ~/.tokligence/llm_config.json (or env vars like OLLAMA_TEMPERATURE) and optionally overridden per request via db_config keys like temperature and max_tokens.
- Automatic database structure discovery - No hardcoded schemas
- Field meaning inference - Based on field names and sample data
- Relationship detection - Automatically identifies table relationships
- Intelligent ambiguous expression detection - "recent", "popular", etc.
- False positive control - Multi-layer validation; tune thresholds for your data
- Interactive clarification - Proactively asks for user intent
- Adaptive strategy selection - Chooses optimal strategy based on query complexity
- Error recovery - Multiple attempts, learns from errors
- Cache optimization - Intelligent caching for faster responses
- Execution guardrails - Read-only default, optional DDL/DML/Admin, LIMIT injection
All services are configured to use host network mode for optimal performance:
# docker-compose.yml configuration
services:
webui:
network_mode: host # Direct host network access
api:
network_mode: host # No port mapping needed
postgres:
network_mode: host # Runs on localhost:5432
mysql:
network_mode: host # Runs on localhost:3306Benefits of Host Network Mode:
- ✅ Better Performance - No network translation overhead
- ✅ Simpler Configuration - No complex port mappings
- ✅ Direct Access - Services accessible on localhost
- ✅ Database Compatibility - Works seamlessly with local Ollama
For development and testing, use Python virtual environment:
make venv-setup # Creates isolated Python environment
source venv/bin/activate # Activate the environment
make web-ui # All commands use venv automaticallyLocalSQLAgent/
├── src/ # Core source code
│ ├── core/ # Core modules
│ │ ├── ambiguity_detection.py # Ambiguity detection
│ │ ├── intelligent_agent.py # Intelligent agent
│ │ └── schema_discovery.py # Schema discovery
│ ├── agents/ # Agent implementations
│ └── mongodb/ # MongoDB specific
├── benchmarks/ # Benchmark tests
├── examples/ # Example code
├── tests/ # Test suite
├── docs/ # Documentation
├── docker-compose.yml # Database containers
└── Makefile # Easy setup commands
from src.core.intelligent_agent import IntelligentSQLAgent
# Initialize agent
agent = IntelligentSQLAgent(
model_name="qwen2.5-coder:7b",
db_config={"type": "postgresql", ...},
mcp_server="http://localhost:8080" # Optional
)
# Execute query
result = agent.execute_query("Find VIP customers with recent purchases")# Adjust ambiguity detection threshold
detector = AmbiguityDetector(confidence_threshold=0.8)
# Use more retries for hard queries
agent = IntelligentSQLAgent(
model_name="qwen2.5-coder:7b",
db_config={"type": "postgresql", ...},
max_attempts=7
)LocalSQLAgent is a native Ollama ecosystem application:
# Supports all Ollama models
ollama pull qwen2.5-coder:7b # Recommended: Best results
ollama pull deepseek-coder:6.7b # Alternative: Also good
ollama pull sqlcoder:7b # Specialized: SQL-specific model
# Switch models with one line
python benchmarks/sql_benchmark.py --model ollama:deepseek-coder:6.7b- 🧠 Intelligent Agent Strategy - Not just single generation, but multiple attempts and learning like humans
- 🔍 Ambiguity Detection - First Text2SQL system with integrated ambiguity detection
- 📊 Dynamic Schema - Real-time database structure analysis, no manual configuration
- 🏠 Pure Local Execution - Fully localized deployment using Ollama
- 💰 Zero Operating Cost - No API fees, one-time deployment for permanent use
Issues and Pull Requests are welcome! We especially welcome:
- New database adapters
- More Ollama model testing
- Enterprise feature requests
- Performance optimizations
- Documentation improvements
MIT License - See LICENSE for details
- Ollama Team - Excellent local model deployment solution
- Qwen Team - Outstanding Qwen2.5-Coder model
- Open Source Community - Thanks to all contributors
🏢 Developed by Tokligence
Building intelligent tools for the local AI ecosystem
🌟 If this project helps you, please give us a Star!
Tags: #text-to-sql #ollama #local-llm #qwen-coder #zero-cost #privacy-first #sql-agent #mongodb