Agentic RAG System with LangGraph

An intelligent conversational AI system that enables natural language querying of SQL databases with multi-format output generation. Built using LangGraph for workflow orchestration and Azure OpenAI for natural language processing.

✨ Features

  • Natural Language Processing: Convert plain English to optimized SQL queries
  • Multi-format Output: Automatic generation of summaries, tables, and visualizations
  • Intelligent Routing: Smart detection of query intent and optimal output format
  • Database Optimization: Single connection reuse for improved performance
  • Comprehensive Analysis: Support for complex multi-part analytical queries
  • Data Visualization: Automatic chart generation with matplotlib/seaborn
  • Robust Error Handling: Timeout management and query validation
  • Extensive Logging: Complete execution tracking and debugging support

Setup

Prerequisites

  • Your own CSV or Excel data file with business/sales data
  • Recommended columns: date, customer_id, order_value, category, product_name, etc.
  • Azure OpenAI API access

1. Create Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

2. Install Dependencies

pip install -r requirements.txt

Or install in development mode:

pip install -e .

3. Environment Configuration

Copy the example environment file and configure your settings:

cp .env.example .env

Edit .env with your Azure OpenAI credentials:

AZURE_OPENAI_API_KEY=your-azure-openai-api-key-here
AZURE_OPENAI_BASE_URL=https://YOUR-RESOURCE-NAME.openai.azure.com/
AZURE_OPENAI_MODEL=gpt-5-preview

4. Verify Installation

Run the setup validation script:

python validate_setup.py

This will check your environment configuration and API connectivity.
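
For reference, a validation script of this kind typically verifies the required variables and makes one minimal API call. The sketch below illustrates the idea with the openai package's AzureOpenAI client; the api_version value is an assumption, and this is not the repository's actual validate_setup.py.

# Hedged sketch of environment validation (not the actual validate_setup.py)
import os
import sys

from dotenv import load_dotenv
from openai import AzureOpenAI

load_dotenv()

# Fail fast if any required variable is missing
required = ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_BASE_URL", "AZURE_OPENAI_MODEL"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    sys.exit(f"Missing environment variables: {', '.join(missing)}")

# One minimal chat call to confirm API connectivity
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_BASE_URL"],
    api_version="2024-02-01",  # assumed; match your Azure deployment
)
response = client.chat.completions.create(
    model=os.environ["AZURE_OPENAI_MODEL"],
    messages=[{"role": "user", "content": "ping"}],
)
print("Environment OK, model replied:", response.choices[0].message.content)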

5. Data Setup

Prepare Your Data:

  1. Place your CSV or Excel file in the project root directory
  2. Ensure your data has columns like: date, customer_id, order_value, category, etc.
  3. Update the file path in import_data.py if needed

Import Your Data:

python import_data.py

This script will:

  • Read your CSV/Excel file
  • Create a SQLite database with appropriate schema
  • Import and structure your data for querying (a minimal sketch follows)
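
For orientation, the core of such an import is only a few lines of pandas. The sketch below is illustrative; the table name "sales" and the date column are assumptions, so adapt it to import_data.py and your own schema.

# Illustrative CSV-to-SQLite import (table name "sales" is an assumption)
import sqlite3

import pandas as pd

df = pd.read_csv("your_data.csv", parse_dates=["date"])  # or pd.read_excel(...)
conn = sqlite3.connect("database.db")
df.to_sql("sales", conn, if_exists="replace", index=False)
conn.close()
print(f"Imported {len(df)} rows into database.db")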

6. Project Structure

├── langgraph_sql_agent/           # Core system implementation
│   ├── core/                      # Workflow orchestration
│   ├── database/                  # Database management
│   ├── llm/                       # AI model integration
│   ├── nodes/                     # Processing components
│   ├── output/                    # Generated visualizations
│   └── utils/                     # Configuration and utilities
├── main.py                        # Testing and demonstration script
├── interactive_query.py           # Interactive CLI interface
├── import_data.py                 # Database setup script
├── validate_setup.py              # Environment validation
├── requirements.txt               # Python dependencies
├── your_data.csv                  # Your CSV/Excel data (user provided)
├── database.db                    # Generated SQLite database
├── README.md                      # Project documentation
├── PORTFOLIO.md                   # Portfolio overview
└── PROJECT_REFERENCE_GUIDE.md     # Technical reference

🚀 Usage

Quick Start

Run the main application:

python main.py

Interactive Query Mode

For interactive querying:

python interactive_query.py

Running Tests

Execute the comprehensive test suite:

python test_optimized_6_prompts.py

Example Queries

The system supports various types of natural language queries (adapt these examples to your own data):

Simple Summaries:

"What was the total sales for this year?"
"How many records are in the database?"

Data Tables:

"Show me the top 10 customers by value"
"List all items by category"

Visualizations:

"Generate a trend plot over time"
"Create a bar chart by category"

Complex Multi-format Analysis:

"Generate a comprehensive analysis with charts and tables"
"Analyze patterns including visualizations and summaries"

Output Formats

The system automatically detects the best output format (a toy routing sketch follows the list):

  • Summary: Text-based analysis and insights
  • Table: Structured data in tabular format
  • Plot: Visual charts and graphs (PNG files saved to langgraph_sql_agent/output/)
  • Multi: Combination of summary, table, and visualization
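
For intuition, format detection can be pictured as a routing function over the query. The toy heuristic below is a sketch only; the actual system routes via LLM-based intent parsing, and these keywords are assumptions.

# Toy output router (illustrative; the real system uses LLM intent parsing)
def route_output(query: str) -> str:
    q = query.lower()
    wants_plot = any(w in q for w in ("plot", "chart", "graph", "trend", "visualiz"))
    wants_table = any(w in q for w in ("table", "list", "top", "show"))
    if wants_plot and wants_table:
        return "multi"
    if wants_plot:
        return "plot"
    if wants_table:
        return "table"
    return "summary"

print(route_output("Generate a trend plot over time"))  # -> "plot"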

📋 Requirements

  • Python: 3.9+ (tested with 3.10)
  • Database: SQLite (default) or PostgreSQL
  • API Access: Azure OpenAI GPT-5 (configured in .env)
  • Memory: Minimum 4GB RAM recommended
  • Storage: ~100MB for dependencies + data

🔧 Configuration

Environment Variables

Key configuration options in .env (a loading sketch follows the block):

# Azure OpenAI (Required)
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_BASE_URL=https://your-resource.openai.azure.com/
AZURE_OPENAI_MODEL=gpt-5-preview

# Performance Tuning
MAX_QUERY_TIMEOUT=300  # Increased for complex queries
MAX_RESULT_ROWS=10000

# Database
DATABASE_URL=sqlite:///database.db
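
A configuration module typically reads these values with python-dotenv and sensible defaults. The sketch below shows one plausible shape; the actual handling in langgraph_sql_agent/utils/config.py may differ.

# Plausible sketch of config loading (the real config.py may differ)
import os

from dotenv import load_dotenv

load_dotenv()  # copy .env values into the process environment

AZURE_OPENAI_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]  # required, no default
MAX_QUERY_TIMEOUT = int(os.getenv("MAX_QUERY_TIMEOUT", "300"))
MAX_RESULT_ROWS = int(os.getenv("MAX_RESULT_ROWS", "10000"))
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///database.db")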

Performance Optimizations

The system includes several optimizations:

  • Connection Reuse: Single database connection for all queries
  • Timeout Management: Extended timeouts for complex analysis
  • Quote Normalization: Handles Unicode smart quotes from GPT-5 (sketched after this list)
  • SQL Validation: Prevents ORDER BY syntax errors in UNION queries
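
Quote normalization, for example, amounts to mapping the curly quote characters that models sometimes emit back to ASCII before the SQL is executed. A minimal sketch, not the project's exact implementation:

# Minimal smart-quote normalization (illustrative)
SMART_QUOTES = {
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
}

def normalize_quotes(sql: str) -> str:
    for smart, plain in SMART_QUOTES.items():
        sql = sql.replace(smart, plain)
    return sql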

🧪 Testing

Run the demonstration script for an end-to-end check:

python main.py

This will execute various test queries and generate:

  • Performance metrics and success rates
  • Sample outputs in multiple formats (text, tables, charts)
  • Detailed execution logs and metadata

All test outputs are saved to test_results/ directory.

πŸ› οΈ Development

Workflow Architecture

The system uses a modular LangGraph workflow (a wiring sketch follows the list):

  1. Intent Parser: Analyzes query intent and requirements
  2. SQL Generator: Creates optimized SQL queries
  3. Database Executor: Executes queries with connection reuse
  4. Output Router: Determines optimal output format(s)
  5. Format Generators: Creates summaries, tables, and visualizations
  6. Multi-output Coordinator: Manages complex multi-format responses
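
In LangGraph terms, this pipeline is a StateGraph whose nodes are state-in, update-out functions connected by edges, with the output router expressed as a conditional edge. The sketch below shows the general wiring with stubbed nodes and assumed names; it is not the repository's workflow.py.

# Hedged wiring sketch with stub nodes (names assumed; not the repo's workflow.py)
from typing import TypedDict

from langgraph.graph import END, StateGraph

class AgentState(TypedDict, total=False):
    query: str
    sql: str
    output_format: str
    result: str

# Each node takes the current state and returns a partial state update
def parse_intent(state: AgentState) -> dict:
    return {"output_format": "summary"}

def generate_sql(state: AgentState) -> dict:
    return {"sql": "SELECT COUNT(*) FROM sales"}  # assumed table name

def execute_sql(state: AgentState) -> dict:
    return {"result": "[query results]"}

def summarize(state: AgentState) -> dict:
    return {"result": f"Summary for: {state['query']}"}

graph = StateGraph(AgentState)
graph.add_node("intent_parser", parse_intent)
graph.add_node("sql_generator", generate_sql)
graph.add_node("db_executor", execute_sql)
graph.add_node("summarizer", summarize)
graph.set_entry_point("intent_parser")
graph.add_edge("intent_parser", "sql_generator")
graph.add_edge("sql_generator", "db_executor")
# The output router becomes a conditional edge keyed on the detected format
graph.add_conditional_edges(
    "db_executor",
    lambda state: state.get("output_format", "summary"),
    {"summary": "summarizer"},  # table/plot/multi branches omitted for brevity
)
graph.add_edge("summarizer", END)

app = graph.compile()
print(app.invoke({"query": "How many records are in the database?"}))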

Adding New Features

To extend functionality:

  1. Add new nodes in langgraph_sql_agent/nodes/
  2. Update the workflow in langgraph_sql_agent/core/workflow.py
  3. Add tests in test_optimized_6_prompts.py
  4. Update configuration in langgraph_sql_agent/utils/config.py

📊 Performance

The system is optimized for production use with:

  • Database Connection Reuse: 90%+ performance improvement
  • Query Optimization: Intelligent SQL generation and validation
  • Memory Efficient: Streaming results for large datasets
  • Error Recovery: Graceful handling of edge cases and timeouts
