This project extends the TAP AI Frappe application with a powerful, conversational AI layer. It provides a small set of robust API endpoints that understand user questions and intelligently route each request to the appropriate execution engine.
The system is designed for multi-turn conversations, automatically managing chat history to understand follow-up questions. It features asynchronous processing via RabbitMQ workers, voice input/output support, and robust fallback mechanisms.
- Project Overview
- Core Architecture
- System Workflow
- Complete Codebase Structure
- Dependencies
- Installation
- Configuration
- One-Time Setup
- Testing
- API Documentation
- Worker System
- Core File Descriptions
- Telegram Bot Demo
- Deployment Guide
- Troubleshooting
TAP AI is a conversational AI engine built on top of the Frappe framework. It intelligently routes user queries to specialized execution engines:
- Text-to-SQL Engine: For factual, database-specific queries
- Vector RAG Engine: For conceptual, semantic, and summarization queries
- RabbitMQ Worker Architecture: Asynchronous processing for scalability
- Voice Processing: STT -> LLM -> TTS pipeline for voice queries
Key Features:
- Intelligent routing using LLMs
- Multi-turn conversation support with history management
- Hybrid query execution (SQL + Vector Search)
- Automatic fallback mechanisms
- Telegram bot integration
- Rate limiting and authentication built-in
- Voice input/output support via Telegram
- Asynchronous processing with RabbitMQ
- Dynamic configuration for TAP LMS integration
- Admin-controlled DocType exclusion system
Technology Stack:
- Backend: Python 3.10+
- Framework: Frappe (ERPNext)
- LLM: OpenAI GPT models
- Vector DB: Pinecone
- Database: MariaDB/MySQL
- Message Queue: RabbitMQ (Pika)
- Caching: Redis
- Web Framework: Flask (for Telegram webhooks)
- ORM: SQLAlchemy
Language Composition:
- Python: 99.6%
- JavaScript: 0.4%
The system's intelligence lies in its central router, which acts as a decision-making brain. When a query is received, it follows this flow:
- Intelligent Routing: An LLM analyzes the user's query to determine its intent.
- Tool Selection:
- For factual, specific questions (e.g., "list all...", "how many..."), it selects the Text-to-SQL Engine.
- For conceptual, open-ended, or summarization questions (e.g., "summarize...", "explain..."), it selects the Vector RAG Engine.
- Execution & Fallback: The chosen engine executes the query. If it fails to produce a satisfactory answer, the system automatically falls back to the Vector RAG engine as a safety net.
- Answer Synthesis: The retrieved data is passed to an LLM, which generates a final, human-readable answer.
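To make the routing decision concrete, here is a keyword-heuristic stand-in. The real router (`services/router.py`) asks an LLM to classify intent; this sketch only mirrors the shape of the decision, including the fallback to RAG:

```python
# Hypothetical stand-in for the routing decision. The production router
# classifies intent with an LLM; these keyword lists are illustrative only.
FACTUAL_MARKERS = ("list all", "how many", "count", "which", "top")
CONCEPTUAL_MARKERS = ("summarize", "explain", "describe", "why")

def pick_engine(query: str) -> str:
    """Return 'sql' for factual queries, 'rag' for conceptual ones."""
    q = query.lower()
    if any(m in q for m in CONCEPTUAL_MARKERS):
        return "rag"
    if any(m in q for m in FACTUAL_MARKERS):
        return "sql"
    # Unclassified queries fall through to the RAG engine (the safety net).
    return "rag"
```

Either way the chosen engine's output is then synthesized into a final answer by the LLM, as described above.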
```mermaid
graph TD
    subgraph "User Input"
        User[User Query]
    end
    subgraph "API Layer"
        QueryAPI["api/query.py<br><b>Text Query API</b>"]
        VoiceQueryAPI["api/voice_query.py<br><b>Voice Query API</b>"]
    end
    subgraph "Message Queue"
        RabbitMQ["RabbitMQ<br>Message Broker"]
    end
    subgraph "Worker Processes"
        STTWorker["workers/stt_worker.py<br><b>Speech-to-Text</b>"]
        LLMWorker["workers/llm_worker.py<br><b>LLM Router</b>"]
        TTSWorker["workers/tts_worker.py<br><b>Text-to-Speech</b>"]
    end
    subgraph "Services"
        Router["services/router.py<br><b>Intelligent Router</b>"]
        SQL["services/sql_answerer.py<br><b>SQL Engine</b>"]
        RAG["services/rag_answerer.py<br><b>RAG Engine</b>"]
    end
    subgraph "Data Layer"
        MariaDB[(Frappe<br>MariaDB)]
        PineconeDB[(Pinecone<br>Vector DB)]
    end
    User -->|Text| QueryAPI
    User -->|Voice| VoiceQueryAPI
    QueryAPI -->|Request| RabbitMQ
    VoiceQueryAPI -->|Request| RabbitMQ
    RabbitMQ -->|audio_stt_queue| STTWorker
    RabbitMQ -->|text_query_queue| LLMWorker
    RabbitMQ -->|audio_tts_queue| TTSWorker
    STTWorker -->|Transcribed Text| RabbitMQ
    LLMWorker -->|Route Query| Router
    Router -->|Factual| SQL
    Router -->|Conceptual| RAG
    SQL -->|SQL Query| MariaDB
    RAG -->|Vector Search| PineconeDB
    LLMWorker -->|Answer| TTSWorker
    TTSWorker -->|Audio File| MariaDB
```
The robustness of the system comes from the specialized design of each engine.
The Text-to-SQL Engine excels at factual queries because it builds an "intelligent schema" before prompting the LLM.
```mermaid
graph TD
    A[User Query] --> B["1. Inspect Live Frappe Metadata"]
    B --> C["2. Create Rich Schema Prompt"]
    C --> D{LLM: Generate SQL}
    D --> E[MariaDB]
    E --> F[Structured Data Rows]
```
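Steps 1-2 can be illustrated with a toy schema renderer. The exact prompt format here is an assumption (the real logic lives in `services/sql_answerer.py` and `tap_ai/infra/schema.py`); the `tab<DocType>` naming, however, is how Frappe actually stores DocType tables:

```python
# Illustrative sketch of the "intelligent schema" prompt step.
# Field names and the prompt layout are assumptions for demonstration.
def build_schema_prompt(schema: dict) -> str:
    """Render DocType metadata as a compact schema block for the LLM."""
    lines = []
    for doctype, fields in schema.items():
        cols = ", ".join(f"{name} {ftype}" for name, ftype in fields.items())
        # Frappe stores each DocType in a table named `tab<DocType>`.
        lines.append(f"TABLE `tab{doctype}` ({cols})")
    return "\n".join(lines)
```

The LLM then generates SQL against this schema text rather than guessing table names.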
The Vector RAG Engine excels at conceptual queries by retrieving semantically relevant documents.
```mermaid
graph TD
    A[User Query + Chat History] --> B{LLM: Refine Query}
    B --> C["1. Select DocTypes"]
    C --> D["2. Semantic Search"]
    D --> E["3. Fetch Full Text"]
    E --> F[Rich Context Chunks]
```
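Step 2 boils down to a nearest-neighbor lookup over embeddings. Pinecone performs it at scale (`services/pinecone_store.py`); this in-memory cosine-similarity sketch shows the underlying idea:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, docs, k=5):
    """Return the ids of the k documents most similar to the query vector."""
    scored = sorted(docs, key=lambda d: cosine(query_vec, d["vector"]),
                    reverse=True)
    return [d["id"] for d in scored[:k]]
```

In production the vectors come from the configured `embedding_model` and live in the Pinecone index, but the ranking principle is the same.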
.
├── .editorconfig
├── .eslintrc
├── .gitignore
├── .pre-commit-config.yaml
├── README.md
├── __init__.py
├── license.txt
├── pyproject.toml
├── requirements.txt
├── telegram_webhook.py
├── test_remote_connection.py
└── tap_ai/
├── __init__.py
├── hooks.py
├── modules.txt
├── patches.txt
├── test_remote_db.py
├── api/
│ ├── __init__.py
│ ├── query.py
│ ├── result.py
│ ├── voice_query.py
│ └── voice_result.py
├── config/
│ └── __init__.py
├── infra/
│ ├── config.py
│ ├── llm_client.py
│ ├── schema.py
│ └── sql_catalog.py
├── public/
│ └── .gitkeep
├── schema/
│ ├── __init__.py
│ ├── generate_schema.py
│ ├── list_system_doctypes.py
│ └── tap_ai_schema.json
├── services/
│ ├── __init__.py
│ ├── doctype_selector.py
│ ├── pinecone_index.py
│ ├── pinecone_store.py
│ ├── rag_answerer.py
│ ├── ratelimit.py
│ ├── router.py
│ └── sql_answerer.py
├── tap_ai/
│ ├── __init__.py
│ └── doctype/
│ ├── __init__.py
│ ├── ai_integration_config/
│ │ ├── __init__.py
│ │ ├── ai_integration_config.js
│ │ ├── ai_integration_config.json
│ │ ├── ai_integration_config.py
│ │ └── test_ai_integration_config.py
│ ├── ai_knowledge_base/
│ │ ├── __init__.py
│ │ ├── ai_knowledge_base.js
│ │ ├── ai_knowledge_base.json
│ │ ├── ai_knowledge_base.py
│ │ └── test_ai_knowledge_base.py
│ ├── doctype_list/
│ │ ├── __init__.py
│ │ ├── doctype_list.json
│ │ └── doctype_list.py
│ └── excludeddoctypes/
│ ├── __init__.py
│ ├── excludeddoctypes.js
│ ├── excludeddoctypes.json
│ ├── excludeddoctypes.py
│ └── test_excludeddoctypes.py
├── templates/
│ ├── __init__.py
│ └── pages/
│ └── __init__.py
├── utils/
│ ├── __init__.py
│ ├── dynamic_config.py
│ ├── mq.py
│ └── remote_db.py
└── workers/
├── llm_worker.py
├── stt_worker.py
└── tts_worker.py
- `pymysql>=1.1.1` - MySQL database driver
- `sqlalchemy>=2.0.32` - SQL toolkit and ORM
- `sqlalchemy-utils>=0.41.2` - SQLAlchemy utility functions
- `openai>=1.40.0` - OpenAI API client (GPT, Whisper, TTS)
- `langchain>=0.3.0` - LLM framework
- `langchain-community>=0.3.0` - LangChain integrations
- `langchain-openai>=0.1.17` - LangChain OpenAI integration
- `tiktoken>=0.7.0` - Token counting for OpenAI
- `pinecone` - Pinecone vector database client
- `pika` - RabbitMQ client for async processing
- `numpy>=1.26.4` - Numerical computing
- `redis>=5.0.8` - Redis client for caching and rate limiting
- `python-dotenv>=1.0.1` - Environment variable loading
- `pydantic>=2.8.2` - Data validation
- `loguru>=0.7.2` - Enhanced logging
- `tenacity>=9.0.0` - Retry library
- `Flask` - Web framework for webhooks
- `python-telegram-bot` - Telegram bot library
- `requests` - HTTP client library
- `pytest>=8.3.2` - Testing framework
- `httpx>=0.27.2` - Async HTTP client
- `Frappe~=15.0` - Installed via bench (not in requirements.txt)
- Python 3.10+
- Frappe bench installed
- MariaDB/MySQL server running
- RabbitMQ broker running
- Redis server running
- Pinecone account (for Vector RAG)
- OpenAI API key
```shell
# Get the app
bench get-app tap_ai https://github.com/theapprenticeproject/Ai.git

# Install on site
bench --site <site-name> install-app tap_ai

# Install all required packages
bench pip install -r apps/tap_ai/requirements.txt

# Or install key packages individually
bench pip install langchain-openai pinecone pymysql pika redis

# RabbitMQ (macOS)
brew install rabbitmq

# RabbitMQ (Ubuntu)
sudo apt-get install rabbitmq-server

# Redis (macOS)
brew install redis

# Redis (Ubuntu)
sudo apt-get install redis-server

# Start services (macOS; the Homebrew service names are rabbitmq and redis)
brew services start rabbitmq
brew services start redis

# Set up pre-commit hooks
cd apps/tap_ai
pre-commit install
```

Edit your site's `site_config.json` file and add:
```json
{
  "openai_api_key": "sk-your-openai-key-here",
  "primary_llm_model": "gpt-4o-mini",
  "embedding_model": "text-embedding-3-small",
  "pinecone_api_key": "pcn-your-pinecone-key-here",
  "pinecone_index": "tap-ai-byo",
  "rabbitmq_url": "amqp://guest:guest@localhost:5672/",
  "redis_host": "localhost",
  "redis_port": 6379,
  "redis_db": 0,
  "max_context_length": 2048,
  "vector_search_k": 5,
  "max_response_tokens": 500
}
```

| Key | Type | Purpose | Default |
|---|---|---|---|
| `openai_api_key` | string | OpenAI API authentication | Required |
| `primary_llm_model` | string | Primary LLM for routing | `gpt-4o-mini` |
| `embedding_model` | string | Model for embeddings | `text-embedding-3-small` |
| `pinecone_api_key` | string | Pinecone authentication | Required |
| `pinecone_index` | string | Pinecone index name | `tap-ai-byo` |
| `rabbitmq_url` | string | RabbitMQ connection URL | `amqp://guest:guest@localhost:5672/` |
| `redis_host` | string | Redis hostname | `localhost` |
| `redis_port` | int | Redis port | `6379` |
| `redis_db` | int | Redis database number | `0` |
| `max_context_length` | int | Max LLM context tokens | `2048` |
| `vector_search_k` | int | Top-K vectors for RAG | `5` |
| `max_response_tokens` | int | Max response tokens | `500` |
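A plausible sketch of how one of these keys is resolved, assuming the precedence site_config.json, then environment variable, then default (`get_conf` is a hypothetical helper; the real loader is `tap_ai/infra/config.py`):

```python
import os

# Hypothetical config lookup mirroring the assumed precedence:
# site_config.json value -> environment variable -> default.
def get_conf(site_config: dict, key: str, default=None):
    if key in site_config:
        return site_config[key]
    env_val = os.environ.get(key.upper())
    if env_val is not None:
        return env_val
    return default
```

This lets the same code run inside Frappe (where `site_config.json` is loaded) and standalone (where only environment variables exist).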
Create a `.env` file in `frappe-bench`:

```shell
OPENAI_API_KEY=sk-your-key
PINECONE_API_KEY=pcn-your-key
RABBITMQ_URL=amqp://guest:guest@localhost:5672/
```

Populate the excluded DocTypes and generate the schema:

```shell
bench execute tap_ai.schema.generate_schema.cli_populate_excluded
bench execute tap_ai.schema.generate_schema.cli
```

This creates `tap_ai_schema.json`, which the SQL and RAG engines need.
Create the Pinecone index and upsert all documents:

```shell
bench execute tap_ai.services.pinecone_index.cli_ensure_index
bench execute tap_ai.services.pinecone_store.cli_upsert_all
```

```shell
# Basic text query
curl -X POST "http://localhost:8000/api/method/tap_ai.api.query.query" \
  -H "Content-Type: application/json" \
  -d '{"q": "List all courses", "user_id": "test_user"}'

# Response
# {"request_id": "REQ_a1b2c3d4"}

# Poll for result
curl "http://localhost:8000/api/method/tap_ai.api.result.result?request_id=REQ_a1b2c3d4"
```
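The same submit-then-poll flow, wrapped in a small stdlib-only Python client. The response shapes follow the examples in this README (a plain `{"request_id": ...}` body); if your Frappe site wraps API returns in a `message` envelope, adjust accordingly:

```python
import json
import time
from urllib import parse, request

BASE = "http://localhost:8000/api/method"

def is_terminal(body: dict) -> bool:
    """A result is final once its status is no longer 'pending'."""
    return body.get("status") != "pending"

def post_json(url: str, payload: dict) -> dict:
    req = request.Request(url, data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)

def ask(question: str, user_id: str, timeout: float = 60.0) -> dict:
    """Submit a text query, then poll the result endpoint until done."""
    submitted = post_json(f"{BASE}/tap_ai.api.query.query",
                          {"q": question, "user_id": user_id})
    request_id = submitted["request_id"]
    result_url = (f"{BASE}/tap_ai.api.result.result?"
                  + parse.urlencode({"request_id": request_id}))
    deadline = time.time() + timeout
    while time.time() < deadline:
        with request.urlopen(result_url) as resp:
            result = json.load(resp)
        if is_terminal(result):
            return result
        time.sleep(1)  # back off between polls
    raise TimeoutError(f"no result for {request_id}")
```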
```shell
# Initiate voice query
curl -X POST "http://localhost:8000/api/method/tap_ai.api.voice_query.voice_query" \
  -H "Content-Type: application/json" \
  -d '{"audio_url": "https://example.com/audio.mp3", "user_id": "test_user"}'

# Response
# {"request_id": "VREQ_x1y2z3w4"}

# Poll voice result
curl "http://localhost:8000/api/method/tap_ai.api.voice_result.voice_result?request_id=VREQ_x1y2z3w4"
```

In separate terminal windows, start the workers:
```shell
# Worker 1: LLM Worker
cd frappe-bench
bench execute tap_ai.workers.llm_worker.start

# Worker 2: STT Worker
bench execute tap_ai.workers.stt_worker.start

# Worker 3: TTS Worker
bench execute tap_ai.workers.tts_worker.start
```

POST `/api/method/tap_ai.api.query.query`
Request body:

```json
{
  "q": "Your question here",
  "user_id": "unique_user_identifier"
}
```

Response:

```json
{
  "request_id": "REQ_abc12345"
}
```

GET `/api/method/tap_ai.api.result.result?request_id=REQ_abc12345`
Response (pending):

```json
{
  "status": "pending",
  "query": "Your question"
}
```

Response (success):

```json
{
  "status": "success",
  "answer": "The answer to your question...",
  "query": "Your question",
  "history": [...],
  "metadata": {...}
}
```

POST `/api/method/tap_ai.api.voice_query.voice_query`
Request body:

```json
{
  "audio_url": "https://example.com/audio.mp3",
  "user_id": "unique_user_identifier"
}
```

Response:

```json
{
  "request_id": "VREQ_xyz98765"
}
```

GET `/api/method/tap_ai.api.voice_result.voice_result?request_id=VREQ_xyz98765`
Response (success):

```json
{
  "status": "success",
  "transcribed_text": "What is the first course?",
  "answer_text": "The first course is...",
  "audio_url": "/files/output_file.mp3",
  "language": "en"
}
```

The system uses RabbitMQ for asynchronous processing. Three workers handle different tasks:
LLM Worker (`workers/llm_worker.py`):

- Pulls text queries from `text_query_queue`
- Runs the router to choose between SQL and RAG
- Manages conversation history
- Routes voice queries to the TTS worker
- Updates request status in the Redis cache

Start with:

```shell
bench execute tap_ai.workers.llm_worker.start
```

STT Worker (`workers/stt_worker.py`):

- Pulls voice requests from `audio_stt_queue`
- Downloads audio from the provided URL
- Transcribes the audio with the Whisper API
- Detects the language of the transcription
- Routes the transcribed text to the LLM worker

Start with:

```shell
bench execute tap_ai.workers.stt_worker.start
```

TTS Worker (`workers/tts_worker.py`):

- Pulls synthesis jobs from `audio_tts_queue`
- Uses OpenAI TTS to generate speech
- Saves the audio file to the Frappe File Manager
- Returns the audio URL and marks the request as complete

Start with:

```shell
bench execute tap_ai.workers.tts_worker.start
```

tap_ai/api/query.py
- Text query entry point
- Rate limiting check
- Publishes to the RabbitMQ `text_query_queue`
- Returns a `request_id` for polling
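The publish step can be sketched as follows. The payload field names are assumptions inferred from the API documentation above (the real shape lives in `tap_ai/api/query.py` and `tap_ai/utils/mq.py`), while the `REQ_`-prefixed id matches the examples in this README:

```python
import json
import uuid

# Hypothetical message builder for text_query_queue; field names are
# assumptions based on the documented request/response bodies.
def build_text_query_message(q: str, user_id: str) -> tuple[str, str]:
    """Return (request_id, JSON payload) for the text query queue."""
    request_id = f"REQ_{uuid.uuid4().hex[:8]}"
    payload = json.dumps({"request_id": request_id,
                          "q": q,
                          "user_id": user_id})
    return request_id, payload
```

The endpoint returns the `request_id` immediately; the worker later stores the answer under that id for the result endpoint to serve.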
tap_ai/api/result.py
- Polls for text query result
- Retrieves from Redis cache
tap_ai/api/voice_query.py
- Voice query entry point
- Publishes to the RabbitMQ `audio_stt_queue`
- Returns a `request_id` for polling
tap_ai/api/voice_result.py
- Polls voice results
- Handles TTS generation on-demand
- Returns audio URL when ready
tap_ai/services/router.py
- Central query routing logic
- Chooses between SQL and RAG engines
- Manages fallback logic
- Handles chat history
tap_ai/services/sql_answerer.py
- Generates SQL from natural language
- Builds intelligent schema for LLM
- Executes queries against MariaDB
- Returns structured data
tap_ai/services/rag_answerer.py
- Retrieves semantically similar documents
- Refines queries with chat history
- Synthesizes answers from context
- Handles multi-turn conversations
tap_ai/services/doctype_selector.py
- Selects relevant DocTypes for RAG
- Reduces search space
- Improves retrieval accuracy
tap_ai/services/pinecone_store.py
- Manages Pinecone interactions
- Upserts documents with embeddings
- Performs semantic search
tap_ai/services/ratelimit.py
- Enforces API rate limits
- Uses Redis for distributed counting
- Tracks requests per user
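To illustrate the idea, here is an in-memory fixed-window limiter. The real `ratelimit.py` counts in Redis (typically `INCR` plus `EXPIRE`) so limits hold across processes; the limit and window values below are arbitrary:

```python
# Single-process sketch of a fixed-window rate limiter; the production
# version uses Redis for distributed counting.
class FixedWindowLimiter:
    def __init__(self, limit: int, window_s: int):
        self.limit = limit
        self.window_s = window_s
        self.counts = {}  # (user_id, window index) -> request count

    def allow(self, user_id: str, now: float) -> bool:
        """Record one request and report whether it is within the limit."""
        window = int(now // self.window_s)
        key = (user_id, window)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit
```

A new window index starts a fresh count, which is exactly what `EXPIRE` achieves in the Redis version.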
tap_ai/workers/llm_worker.py
- Main processing worker
- Routes queries through the dual-engine system
- Manages conversation context
- Bridges text and voice pipelines
tap_ai/workers/stt_worker.py
- Speech-to-Text processing
- Audio download and handling
- Language detection
- Whisper API integration
tap_ai/workers/tts_worker.py
- Text-to-Speech synthesis
- OpenAI TTS integration
- Frappe File Manager integration
- Audio file management
tap_ai/utils/dynamic_config.py
- Decouples TAP AI from TAP LMS schema changes
- Handles dynamic DocType mapping
- Manages user profiles with enrollment data
- Singleton pattern for configuration caching
- Validation and context resolution rules
tap_ai/utils/mq.py
- RabbitMQ publisher
- Queue declaration and management
- Persistent message delivery
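The persistent-delivery pattern can be sketched as below. The function accepts any object with pika's channel interface; with a real pika channel you would pass `properties=pika.BasicProperties(delivery_mode=2)` to mark messages persistent (the queue name comes from the architecture diagram):

```python
# Sketch of the publisher pattern assumed in utils/mq.py: declare the
# queue as durable, then publish with persistent properties so the
# message survives a broker restart.
def publish_persistent(channel, queue: str, body: bytes, properties=None):
    channel.queue_declare(queue=queue, durable=True)
    channel.basic_publish(exchange="",       # default exchange
                          routing_key=queue,  # routes to the named queue
                          body=body,
                          properties=properties)
```

Durable queue plus persistent messages is the standard RabbitMQ recipe for not losing in-flight work when the broker bounces.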
tap_ai/utils/remote_db.py
- Remote MariaDB/MySQL connection helper for running queries outside the local Frappe DB
tap_ai/infra/config.py
- Centralized configuration loader
- Frappe integration with fallbacks
- Works both inside Frappe and standalone
- Service status validation
tap_ai/infra/llm_client.py
- Centralized LLM client wrapper used by the services layer
tap_ai/infra/schema.py
- Schema loading helpers used by the SQL/Text-to-SQL pipeline
tap_ai/schema/list_system_doctypes.py
- Lists DocTypes classified as system DocTypes
tap_ai/schema/generate_schema.py
- Fetches DocType metadata from a remote database
- Uses the ExcludedDoctypes DocType to exclude DocTypes from the generated schema
- Writes the generated schema to `tap_ai/schema/tap_ai_schema.json`
tap_ai/test_remote_db.py
- Tests for the remote database connection utilities
test_remote_connection.py
- Root-level script for testing remote DB connectivity
tap_ai/tap_ai/doctype/ai_integration_config/
- `ai_integration_config.json`: DocType definition
- `ai_integration_config.py`: DocType server logic
- `ai_integration_config.js`: DocType client script
- `test_ai_integration_config.py`: DocType tests
tap_ai/tap_ai/doctype/ai_knowledge_base/
- `ai_knowledge_base.json`: DocType definition
- `ai_knowledge_base.py`: DocType server logic
- `ai_knowledge_base.js`: DocType client script
- `test_ai_knowledge_base.py`: DocType tests
tap_ai/tap_ai/doctype/doctype_list/
- `doctype_list.json`: DocType definition
- `doctype_list.py`: DocType server logic
tap_ai/tap_ai/doctype/excludeddoctypes/
- `excludeddoctypes.json`: DocType definition
- `excludeddoctypes.py`: DocType server logic
- `excludeddoctypes.js`: DocType client script
- `test_excludeddoctypes.py`: DocType tests
User -> Telegram -> Ngrok -> telegram_webhook.py -> Frappe API -> AI Engine
- Telegram account
- Ngrok installed and authenticated
- Frappe bench running
- Search for `@BotFather` on Telegram
- Send `/newbot`
- Follow the instructions
- Copy the bot token (e.g., `123456:ABC-DEF1234`)
```shell
ngrok config add-authtoken <your-ngrok-token>
ngrok http 5000
```

Copy the HTTPS forwarding URL (e.g., `https://random-string.ngrok-free.app`).
```shell
# Install dependencies
bench pip install Flask python-telegram-bot requests

# Edit telegram_webhook.py and set:
# - TELEGRAM_BOT_TOKEN
# - FRAPPE_API_URL
# - FRAPPE_API_KEY
# - FRAPPE_API_SECRET
# - OPENAI_API_KEY

# Run the bridge
python apps/tap_ai/telegram_webhook.py
```

Register the webhook with Telegram:

```shell
curl -F "url=https://<NGROK_URL>/webhook" \
  "https://api.telegram.org/bot<BOT_TOKEN>/setWebhook"
```

Open Telegram and start a conversation with your bot.
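Inside the bridge, the webhook handler's first job is to pull the chat id and message text out of each Telegram update before forwarding the text to the Frappe query API. A minimal sketch of that parsing step (field names follow the Telegram Bot API; the forwarding call itself is omitted):

```python
# Extract (chat_id, text) from a Telegram update payload. Voice
# messages arrive under message["voice"] instead of "text" and are
# handled separately by the voice pipeline.
def parse_update(update: dict):
    """Return (chat_id, text) from a Telegram update, or (None, None)."""
    message = update.get("message") or {}
    chat_id = (message.get("chat") or {}).get("id")
    text = message.get("text")
    return chat_id, text
```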
```shell
# Terminal 1: Frappe
bench start

# Terminal 2: LLM Worker
bench execute tap_ai.workers.llm_worker.start

# Terminal 3: STT Worker
bench execute tap_ai.workers.stt_worker.start

# Terminal 4: TTS Worker
bench execute tap_ai.workers.tts_worker.start

# Terminal 5: Ngrok (optional, for Telegram)
ngrok http 5000
```

Use Supervisor or systemd for worker management:
```ini
# /etc/supervisor/conf.d/tap-ai-workers.conf
[program:tap-ai-llm]
command=bench execute tap_ai.workers.llm_worker.start
directory=/opt/frappe-bench
autostart=true
autorestart=true

[program:tap-ai-stt]
command=bench execute tap_ai.workers.stt_worker.start
directory=/opt/frappe-bench
autostart=true
autorestart=true

[program:tap-ai-tts]
command=bench execute tap_ai.workers.tts_worker.start
directory=/opt/frappe-bench
autostart=true
autorestart=true
```
```shell
# Check site_config.json
cat sites/<site-name>/site_config.json | grep openai_api_key

# Or check env vars
echo $OPENAI_API_KEY
```

```shell
# Check if RabbitMQ is running
brew services list | grep rabbitmq

# Or check status
rabbitmqctl status

# Start if not running
brew services start rabbitmq
```

```shell
# Recreate index
bench execute tap_ai.services.pinecone_index.cli_ensure_index

# Upsert data
bench execute tap_ai.services.pinecone_store.cli_upsert_all
```

```shell
# Check RabbitMQ queues
rabbitmqctl list_queues

# Check Redis connection
redis-cli PING

# Check Frappe logs
tail -f frappe-bench/logs/frappe.log
```

This project is licensed under the terms specified in `license.txt`.
Last Updated: 2026-04-12 15:29:33
Version: 2.0.0
Author: Anish Aman
Repository: theapprenticeproject/Ai