An intelligent conversational AI system that enables natural language querying of SQL databases with multi-format output generation. Built using LangGraph for workflow orchestration and Azure OpenAI for natural language processing.
- Natural Language to SQL: Convert plain English queries to optimized SQL
- Multi-format Output: Automatic generation of summaries, tables, and visualizations
- Intelligent Routing: Smart detection of query intent and optimal response format
- Production Ready: Robust error handling, logging, and performance optimization
- Natural Language Processing: Convert plain English to optimized SQL queries
- Multi-format Output: Automatic generation of summaries, tables, and visualizations
- Intelligent Routing: Smart detection of query intent and optimal output format
- Database Optimization: Single connection reuse for improved performance
- Comprehensive Analysis: Support for complex multi-part analytical queries
- Data Visualization: Automatic chart generation with matplotlib/seaborn
- Robust Error Handling: Timeout management and query validation
- Extensive Logging: Complete execution tracking and debugging support
- Your own CSV or Excel data file with business/sales data
- Recommended columns: date, customer_id, order_value, category, product_name, etc.
- Azure OpenAI API access
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Or install in development mode:

pip install -e .

Copy the example environment file and configure your settings:

cp .env.example .env

Edit .env with your Azure OpenAI credentials:
AZURE_OPENAI_API_KEY=your-azure-openai-api-key-here
AZURE_OPENAI_BASE_URL=https://YOUR-RESOURCE-NAME.openai.azure.com/
AZURE_OPENAI_MODEL=gpt-5-preview

Run the setup validation script:

python validate_setup.py

This will check your environment configuration and API connectivity.
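As a rough illustration of the kind of check a validation script performs, the sketch below verifies that the required Azure OpenAI settings are present (the variable names match the .env example above; the function name is illustrative, not the actual validate_setup.py API):

```python
import os

# The three settings the .env example above requires.
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_BASE_URL",
    "AZURE_OPENAI_MODEL",
]

def missing_settings(env=os.environ):
    """Return the names of required settings that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("Environment looks complete.")
```

The real script additionally tests API connectivity, which requires a live call to the configured endpoint.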
Prepare Your Data:
- Place your CSV or Excel file in the project root directory
- Ensure your data has columns like: date, customer_id, order_value, category, etc.
- Update the file path in import_data.py if needed
Import Your Data:
python import_data.py

This script will:
- Read your CSV/Excel file
- Create a SQLite database with appropriate schema
- Import and structure your data for querying
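The steps above can be sketched with just the standard library (the real import_data.py may use pandas and infer richer column types; here every column is stored as TEXT and the table name `sales` is an assumption):

```python
import csv
import sqlite3

def import_csv(csv_file, conn, table="sales"):
    """Load CSV rows into a SQLite table, deriving the schema from the header row.

    `csv_file` is any open text file object; `conn` is a sqlite3 connection.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    # Create one TEXT column per CSV header field.
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    # Insert all remaining rows with positional placeholders.
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)
```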
├── langgraph_sql_agent/           # Core system implementation
│   ├── core/                      # Workflow orchestration
│   ├── database/                  # Database management
│   ├── llm/                       # AI model integration
│   ├── nodes/                     # Processing components
│   ├── output/                    # Generated visualizations
│   └── utils/                     # Configuration and utilities
├── main.py                        # Testing and demonstration script
├── interactive_query.py           # Interactive CLI interface
├── import_data.py                 # Database setup script
├── validate_setup.py              # Environment validation
├── requirements.txt               # Python dependencies
├── your_data.csv                  # Your CSV/Excel data (user provided)
├── database.db                    # Generated SQLite database
├── README.md                      # Project documentation
├── PORTFOLIO.md                   # Portfolio overview
└── PROJECT_REFERENCE_GUIDE.md     # Technical reference
Run the main application:
python main.py

For interactive querying:

python interactive_query.py

Execute the comprehensive test suite:

python test_optimized_6_prompts.py

The system supports various types of natural language queries (adapt to your data):
Simple Summaries:
"What was the total sales for this year?"
"How many records are in the database?"
Data Tables:
"Show me the top 10 customers by value"
"List all items by category"
Visualizations:
"Generate a trend plot over time"
"Create a bar chart by category"
Complex Multi-format Analysis:
"Generate a comprehensive analysis with charts and tables"
"Analyze patterns including visualizations and summaries"
The system automatically detects the best output format:
- Summary: Text-based analysis and insights
- Table: Structured data in tabular format
- Plot: Visual charts and graphs (PNG files saved to langgraph_sql_agent/output/)
- Multi: Combination of summary, table, and visualization
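As a simplified stand-in for the system's actual intent-based router (which uses the LLM), format detection can be approximated with keyword heuristics; the cue words below are illustrative:

```python
def route_output(query: str) -> str:
    """Pick an output format from simple keyword cues in the user's question."""
    q = query.lower()
    wants_plot = any(w in q for w in ("plot", "chart", "graph", "visualiz", "trend"))
    wants_table = any(w in q for w in ("table", "list", "top", "show me"))
    wants_summary = any(w in q for w in ("total", "how many", "summary"))
    # More than one signal means a combined, multi-format response.
    if sum([wants_summary, wants_table, wants_plot]) > 1:
        return "multi"
    if wants_plot:
        return "plot"
    if wants_table:
        return "table"
    return "summary"
```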
- Python: 3.9+ (tested with 3.10)
- Database: SQLite (default) or PostgreSQL
- API Access: Azure OpenAI GPT-5 (configured in .env)
- Memory: Minimum 4GB RAM recommended
- Storage: ~100MB for dependencies + data
Key configuration options in .env:
# Azure OpenAI (Required)
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_BASE_URL=https://your-resource.openai.azure.com/
AZURE_OPENAI_MODEL=gpt-5-preview
# Performance Tuning
MAX_QUERY_TIMEOUT=300 # Increased for complex queries
MAX_RESULT_ROWS=10000
# Database
DATABASE_URL=sqlite:///database.db

The system includes several optimizations:
- Connection Reuse: Single database connection for all queries
- Timeout Management: Extended timeouts for complex analysis
- Quote Normalization: Handles Unicode smart quotes from GPT-5
- SQL Validation: Prevents ORDER BY syntax errors in UNION queries
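The quote-normalization step can be sketched as a simple character translation (a minimal illustration, not the system's exact implementation): curly Unicode quotes occasionally emitted by the model are mapped back to the ASCII quotes SQLite expects.

```python
# Map Unicode smart quotes to their ASCII equivalents, since SQLite only
# recognizes plain ' and " as string/identifier delimiters.
SMART_QUOTES = {
    0x2018: "'", 0x2019: "'",   # left/right single quotes
    0x201C: '"', 0x201D: '"',   # left/right double quotes
}

def normalize_sql(sql: str) -> str:
    """Replace smart quotes in generated SQL with plain ASCII quotes."""
    return sql.translate(SMART_QUOTES)
```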
Run the comprehensive test suite:
python main.py

This will execute various test queries and generate:
- Performance metrics and success rates
- Sample outputs in multiple formats (text, tables, charts)
- Detailed execution logs and metadata
All test outputs are saved to the test_results/ directory.
The system uses a modular LangGraph workflow:
- Intent Parser: Analyzes query intent and requirements
- SQL Generator: Creates optimized SQL queries
- Database Executor: Executes queries with connection reuse
- Output Router: Determines optimal output format(s)
- Format Generators: Creates summaries, tables, and visualizations
- Multi-output Coordinator: Manages complex multi-format responses
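The real graph is wired with LangGraph, but the node contract can be illustrated with plain functions: each node reads the shared state and returns a dict of updates to merge in. The node bodies and SQL below are simplified placeholders, not the system's actual logic:

```python
def parse_intent(state):
    """Toy intent parser: classify the query as summary vs. table."""
    wants_summary = "total" in state["query"].lower()
    return {"intent": "summary" if wants_summary else "table"}

def generate_sql(state):
    """Toy SQL generator: emit a query matching the detected intent."""
    if state["intent"] == "summary":
        return {"sql": "SELECT SUM(order_value) FROM sales"}
    return {"sql": "SELECT * FROM sales LIMIT 10"}

def run_pipeline(query, nodes):
    """Thread a shared state dict through each node in order."""
    state = {"query": query}
    for node in nodes:
        state.update(node(state))
    return state
```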
To extend functionality:
- Add new nodes in langgraph_sql_agent/nodes/
- Update the workflow in langgraph_sql_agent/core/workflow.py
- Add tests in test_optimized_6_prompts.py
- Update configuration in langgraph_sql_agent/utils/config.py
The system is optimized for production use with:
- Database Connection Reuse: 90%+ performance improvement
- Query Optimization: Intelligent SQL generation and validation
- Memory Efficient: Streaming results for large datasets
- Error Recovery: Graceful handling of edge cases and timeouts
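Streaming results for large datasets can be sketched with sqlite3's `fetchmany` (a minimal illustration of the idea, not the system's exact executor): rows are yielded in batches rather than materialized all at once.

```python
import sqlite3

def stream_rows(conn, sql, batch_size=1000):
    """Yield query results in fixed-size batches to bound memory use."""
    cur = conn.execute(sql)
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield from batch
```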