Live Demo: https://kvbgkvw4mehwhhdjt7crrg.streamlit.app/
LexTransition AI is an open-source, offline-first legal assistant. It helps users navigate the transition from the old Indian laws (IPC/CrPC/IEA) to the new BNS/BNSS/BSA frameworks. Using local machine learning and OCR, it analyzes legal documents and maps law sections, grounding every answer in official citations.
- 📜 Intelligent Law Mapper: Maps old IPC sections to their new BNS equivalents. Uses an LLM to highlight specific changes in wording, penalties, and scope.
- 🖼️ Multimodal OCR Analysis: Upload photos of legal notices or FIRs. The system extracts text using local OCR and generates actionable summaries.
- 🔍 Grounded Fact-Checking (RAG): Ask legal questions and get answers backed by official citations. The AI identifies the exact section and page from uploaded law PDFs to prevent hallucinations.
- 🎙️ Environment-Aware Voice Agent: Features high-fidelity offline TTS (Piper) with an automatic, lightweight cloud fallback (gTTS) to ensure seamless audio playback on headless platforms like Streamlit Cloud.
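The voice agent's offline-first-with-fallback behavior can be sketched as a simple engine chain. The sketch below is illustrative only: the function names are hypothetical, and the Piper/gTTS engines are stubbed so the selection logic itself is visible without either dependency installed.

```python
# Hypothetical sketch of the offline-first TTS fallback pattern.
# Neither piper nor gTTS is imported; both engines are stubbed here.

def synthesize(text, engines):
    """Try each TTS engine in order; return (engine_name, audio) from the
    first engine that succeeds, or raise if all of them fail."""
    errors = []
    for name, engine in engines:
        try:
            return name, engine(text)
        except Exception as exc:  # engine unavailable (e.g. headless host)
            errors.append((name, exc))
    raise RuntimeError(f"all TTS engines failed: {errors}")

def piper_tts(text):
    # Stub: pretend the local Piper binary is missing on this host.
    raise FileNotFoundError("piper binary not found")

def gtts_fallback(text):
    # Stub: a lightweight cloud fallback that always succeeds.
    return b"FAKE_MP3_" + text.encode()

engine_used, audio = synthesize(
    "Section 302 IPC maps to Section 103 BNS",
    [("piper", piper_tts), ("gtts", gtts_fallback)],
)
print(engine_used)  # prints "gtts": the Piper stub failed, so the fallback ran
```

The real agent would replace the stubs with actual Piper and gTTS calls; the ordering guarantees the offline engine is always preferred when available.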
To ensure privacy and offline accessibility, this project can be configured to run without external APIs:
- Frontend: Streamlit
- Backend: Python, LangChain/LlamaIndex
- Local LLM Engine: Ollama (Llama 3 / Mistral)
- Voice / TTS: Piper TTS (ONNX models)
- OCR Engine: EasyOCR / PyTesseract
- Vector Database (RAG): FAISS + Sentence-Transformers
```
LexTransition-AI/
├── .github/
│   └── workflows/
│       └── lextransition-ci.yml   # GitHub Actions CI/CD pipeline
├── engine/
│   ├── comparator.py              # AI logic for comparing IPC & BNS texts
│   ├── llm.py                     # Fallback logic and LLM summarization
│   ├── mapping_logic.py           # Core IPC-to-BNS transition logic
│   ├── ocr_processor.py           # Local OCR extraction and processing
│   ├── rag_engine.py              # Local vector search logic (FAISS)
│   └── db.py                      # Database connection and queries
├── utils/
│   └── timeout_handler.py         # Resiliency and API timeout handlers
├── tests/
│   └── test_embeddings.py         # Pytest suite for automated testing
├── scripts/
│   └── ocr_benchmark.py           # OCR character error rate testing
├── models/
│   └── tts/                       # Local storage for Piper ONNX voice models
├── law_pdfs/                      # Upload directory for Grounded Fact-Checking
├── app.py                         # Main Streamlit UI application
├── Dockerfile                     # Production container configuration
├── requirements.txt               # Python dependencies & OS-specific markers
├── setup_agent.py                 # Manual setup script for downloading TTS binaries
└── README.md                      # Master project documentation
```

The easiest way to run LexTransition-AI is with Docker. This handles all dependencies (including Tesseract OCR and system libraries) automatically.
1. Clone the repository:

   ```
   git clone https://github.com/[username]/LexTransition-AI.git
   cd LexTransition-AI
   ```

2. Build the Docker image:

   ```
   docker build -t lextransition-ai .
   ```

3. Run the application:

   ```
   docker run -p 8501:8501 -e LTA_OLLAMA_URL="http://host.docker.internal:11434" lextransition-ai
   ```

4. Open the app at http://localhost:8501
If you prefer to run the app directly in your local Python environment:
1. Install dependencies (requires Python 3.10):

   ```
   python -m venv venv
   source venv/bin/activate   # On Windows use: venv\Scripts\activate
   pip install -r requirements.txt
   ```

2. Download the voice agent models:

   ```
   python setup_agent.py
   ```

3. Start the local LLM:

   ```
   ollama serve
   ollama pull llama3
   ```

4. Launch the app:

   ```
   export LTA_OLLAMA_URL="http://localhost:11434"   # On Windows use: set LTA_OLLAMA_URL=http://localhost:11434
   streamlit run app.py
   ```
All core modules, offline LLM integrations, and containerization features are fully implemented and production-ready.
```
=========================================================================
               🏛️ LEXTRANSITION-AI: SYSTEM ARCHITECTURE
=========================================================================

              [ 🖥️ Streamlit Frontend (app.py) ]
                             |
      -------------------------------------------------------
      |                      |                              |
[ 📜 IPC → BNS Mapper ] [ 🖼️ Document OCR ]   [ 🔍 Fact-Checker (RAG) ]
      |                      |                              |
(SQLite Mapping DB)  (EasyOCR / PyTesseract)  (FAISS + sentence-transformers)
      |                      |                              |
      -------------------------------------------------------
                             |
                             v
               [ 🧠 Local LLM Engine (Ollama) ]
       (Semantic Analysis, Action Items, Summarization)
                             |
                             v
             [ 🎙️ Offline Voice Agent (Piper TTS) ]
          (High-fidelity vocal dictation of AI outputs)

=========================================================================
                    ⚙️ INFRASTRUCTURE GUARANTEES
=========================================================================
✔️ 100% Offline Capable (No external API keys required)
✔️ Dockerized Deployment (Verified networking & TTS dependencies)
✔️ CI/CD Pipeline Active (GitHub Actions + Pytest)
```
- Local Data Storage (Privacy-First): To maintain our strict offline-first architecture, no user data or legal documents ever leave your machine:
  - Relational data: Mappings and system configuration are persisted securely in a local SQLite database (replacing the legacy `mapping_db.json`).
  - Vector store: Law PDFs uploaded for Grounded Fact-Checking are processed and stored locally in a FAISS vector index (`./vector_store`).
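The relational side of this storage can be sketched with Python's built-in `sqlite3` module. The table name and columns below are illustrative, not the project's actual schema (which lives in `engine/db.py`):

```python
import sqlite3

# Minimal sketch of a local SQLite mapping store; the schema is hypothetical.
conn = sqlite3.connect(":memory:")  # the app would use a file path instead

conn.execute("""
    CREATE TABLE section_map (
        ipc_section TEXT PRIMARY KEY,
        bns_section TEXT NOT NULL,
        notes       TEXT
    )
""")
# Parameterized inserts keep the queries safe against injection.
conn.execute(
    "INSERT INTO section_map VALUES (?, ?, ?)",
    ("302", "103", "Murder: renumbered, substance largely unchanged"),
)
conn.commit()

row = conn.execute(
    "SELECT bns_section FROM section_map WHERE ipc_section = ?", ("302",)
).fetchone()
print(row[0])  # -> 103
```

Because SQLite ships with Python and stores everything in a single local file, it fits the privacy-first constraint with no extra services to run.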
- Automated Testing & CI/CD: LexTransition-AI maintains high reliability through local testing and GitHub Actions.

To run the test suite locally, ensure your virtual environment is active (Python 3.10) and execute:

```
pip install -r requirements.txt
pytest -q
```
Every pull request automatically triggers our `.github/workflows/lextransition-ci.yml` pipeline.
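A test in the suite might look like the following. This is a hypothetical sketch, not the real contents of `tests/test_embeddings.py`: it checks a pure-Python cosine-similarity helper of the kind a RAG engine relies on for ranking retrieved passages.

```python
import math

# Hypothetical helper of the kind engine/rag_engine.py might expose;
# illustrative only, not the project's actual API.
def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def test_identical_vectors_are_maximally_similar():
    v = [0.1, 0.5, 0.2]
    assert abs(cosine_similarity(v, v) - 1.0) < 1e-9

def test_orthogonal_vectors_have_zero_similarity():
    assert abs(cosine_similarity([1.0, 0.0], [0.0, 1.0])) < 1e-9
```

Pytest discovers any `test_*` function in files matching `test_*.py`, so tests like these run with a plain `pytest -q`.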
To evaluate the local OCR engine's Character Error Rate (CER) and keyword recall against custom scanned datasets, run:

```
python scripts/ocr_benchmark.py --dataset data/ocr_dataset.csv --report ocr_report.md
```

LexTransition-AI is designed to be plug-and-play, but power users can customize the engine behavior using environment variables. If you are using Docker, these are passed via the `-e` flag.
| Variable | Default | Description |
|---|---|---|
| `LTA_OLLAMA_URL` | `http://localhost:11434` | The endpoint for the local LLM. When running in Docker, use `http://host.docker.internal:11434` to route traffic to your host machine. |
| `LTA_OLLAMA_MODEL` | `llama3` | Specifies which local model to use for analysis and summarization. |
| `LTA_USE_EMBEDDINGS` | `1` | Toggles the FAISS/Sentence-Transformers RAG engine. Set to `0` to fall back to legacy keyword search. |
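The engine might read this configuration as follows. This is an illustrative sketch: the variable names and defaults match the table above, but the project's actual config-loading code may differ.

```python
import os

# Sketch of reading the documented environment variables with their defaults.
OLLAMA_URL = os.environ.get("LTA_OLLAMA_URL", "http://localhost:11434")
OLLAMA_MODEL = os.environ.get("LTA_OLLAMA_MODEL", "llama3")
# "1" enables the FAISS/Sentence-Transformers RAG engine; anything else
# falls back to legacy keyword search.
USE_EMBEDDINGS = os.environ.get("LTA_USE_EMBEDDINGS", "1") == "1"

print(OLLAMA_URL, OLLAMA_MODEL, USE_EMBEDDINGS)
```

Reading each variable through `os.environ.get` with an explicit default keeps the app runnable with zero configuration while still honoring any `-e` overrides passed to `docker run`.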
All foundational features (Local LLM, OCR, Vector DB, and CI/CD) are fully operational. The next phase of development focuses on expanding accessibility and enterprise utility:
- Speech-to-Text (STT) Integration: Implement local Whisper models to allow users to verbally query the Fact-Checker without typing.
- Multilingual Support (Indic Languages): Translate BNS mappings and OCR summaries into Hindi, Bengali, and other regional languages for broader accessibility.
- Precedent & Case Law Expansion: Expand the RAG Vector Database beyond standard Bare Acts to include landmark judicial precedents.
- Automated Legal Briefs: Add a reporting engine to export OCR analysis and IPC-to-BNS comparisons into cleanly formatted PDF/Docx files.
This project exists thanks to the amazing people who contribute their time, ideas, and improvements.
We truly appreciate every contribution 🙌