High-performance Sanskrit NLP toolkit for the LLM era
Vedyut combines Rust performance with Python ease-of-use to provide blazing-fast Sanskrit text processing with first-class multi-script support.
- ⚡ Rust Performance: 100-180x faster than pure Python implementations
- 🌏 First-Class Script Support: Write Sanskrit in any script - Devanagari, IAST, Tamil, Telugu, Malayalam, Kannada, Bengali, and 15+ more
- 🎯 Script-First API Design: Script selection is a required, explicit parameter - not buried in options
- 📝 Full Sanskrit NLP: Transliteration, segmentation, morphological analysis, word generation
- 🤖 LLM-Ready: Built-in integrations for RAG, agents, and modern ML workflows
- 🐍 Python API: Clean, type-safe Python interface powered by Rust core
Sanskrit can be written in any script. Vedyut treats script selection as a first-class feature:
| Category | Scripts |
|---|---|
| Romanization | IAST, SLP1, Harvard-Kyoto, ITRANS, ISO 15919, Velthuis, WX |
| Indian Scripts | Devanagari, Telugu, Tamil, Kannada, Malayalam, Bengali, Gujarati, Gurmukhi, Odia, Assamese |
| Other Scripts | Tibetan, Sinhala, Burmese, Thai, Grantha |
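Any scheme in the table can serve as source or target. A minimal sketch (Script.SLP1 and Script.GRANTHA are assumed member names following the scheme list above; only IAST, Devanagari, Tamil, and Telugu appear verbatim elsewhere in this README):

```python
from vedyut import transliterate, Script

# Romanization → romanization (Script.SLP1 is an assumed member name)
slp1 = transliterate("dharmakṣetre", Script.IAST, Script.SLP1)

# Romanization → a non-Devanagari Indic script (Script.GRANTHA assumed likewise)
grantha = transliterate("dharmakṣetre", Script.IAST, Script.GRANTHA)
```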
Install from PyPI:

```bash
pip install vedyut
```

Or install from source:

```bash
# Clone repository
git clone https://github.com/VedantMadane/vedyut.git
cd vedyut
# Install with uv (recommended)
uv sync
# Or with pip
pip install -e .
```

Quick start in Python:

```python
from vedyut import transliterate, segment, analyze, Script
# Transliterate between any scripts
# Script is a FIRST-CLASS parameter - explicit and required!
devanagari = transliterate("namaste", Script.IAST, Script.DEVANAGARI)
# → "नमस्ते"
tamil = transliterate("namaste", Script.IAST, Script.TAMIL)
# → "நமஸ்தே"
telugu = transliterate("namaste", Script.IAST, Script.TELUGU)
# → "నమస్తే"
# Segment text into words
segments = segment("धर्मक्षेत्रे कुरुक्षेत्रे", Script.DEVANAGARI)
# → [["धर्मक्षेत्रे", "कुरुक्षेत्रे"]]
# Morphological analysis
analysis = analyze("रामः", Script.DEVANAGARI)
# → [{"stem": "राम", "case": "nominative", ...}]use vedyut_lipi::{transliterate, Scheme};
fn main() {
    // Script as first-class parameter
    let result = transliterate(
        "dharmakṣetre",
        Scheme::Iast,
        Scheme::Devanagari,
    );
    println!("{}", result); // धर्मक्षेत्रे
}
```

Run the REST API server:

```bash
# Start the API server
uv run uvicorn vedyut.api.main:app --reload
# Or with Python
python -m vedyut.api.main
```

Call the API with curl:

```bash
# Transliterate
curl -X POST http://localhost:8000/v1/transliterate \
-H "Content-Type: application/json" \
-d '{
"text": "namaste",
"from_scheme": "iast",
"to_scheme": "devanagari"
}'
```
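The same endpoint can be called from Python. A minimal sketch using the requests library (the response schema is not documented here, so the printed shape is whatever the server returns):

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/transliterate",
    json={"text": "namaste", "from_scheme": "iast", "to_scheme": "devanagari"},
)
resp.raise_for_status()
print(resp.json())  # exact response fields depend on the API
```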
Vedyut makes script selection explicit and unavoidable - it's a core design principle:

```python
# Script is a required, explicit parameter
transliterate(text, from_script, to_script)
segment(text, script=Script.DEVANAGARI)
analyze(word, script=Script.TAMIL)
```

```python
# Don't do this - script hidden in options
transliterate(text, options={"from": "iast", "to": "deva"})
process(text, config=Config(script="devanagari"))
```

Project layout:

```text
vedyut/
├── rust/ # Rust core (performance-critical)
│ ├── vedyut-lipi/ # Transliteration engine
│ ├── vedyut-sandhi/ # Sandhi rules & splitting
│ ├── vedyut-prakriya/ # Word generation (Pāṇinian)
│ ├── vedyut-kosha/ # High-speed lexicon
│ └── vedyut-cheda/ # Segmentation & analysis
├── python/ # Python API (user-friendly)
│ └── vedyut/
│ ├── __init__.py # Clean Python interface
│ ├── api/ # FastAPI web service
│ └── llm/ # LLM integrations
└── tests/ # Integration tests
```

Build and test the Rust core:

```bash
cd rust
cargo build --release
cargo test
```

Run the Python tests:

```bash
uv run pytest tests/ -v
```

Format and lint:

```bash
# Rust
cd rust
cargo fmt
cargo clippy -- -D warnings
# Python
uv run ruff format .
uv run ruff check .
```

Vedyut achieves a 100-180x speedup over pure Python:
| Operation | Pure Python | Vedyut (Rust) | Speedup |
|---|---|---|---|
| Transliteration | ~1ms | <10μs | ~100x |
| Word lookup | ~10μs | 820ns | ~12x |
| Verse segmentation | 1.8s | 10ms | ~180x |
| Word generation | 10s/word | 20μs/word | ~500,000x |
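These figures depend on hardware and input; a minimal sketch for checking the transliteration number on your own machine with the standard-library timeit:

```python
import timeit

from vedyut import transliterate, Script

n = 10_000
total = timeit.timeit(
    lambda: transliterate("dharmakṣetre kurukṣetre", Script.IAST, Script.DEVANAGARI),
    number=n,
)
print(f"{total / n * 1e6:.2f} µs per call")  # compare against the table above
```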
Vedyut is designed for the LLM era with built-in support for:
- RAG (Retrieval-Augmented Generation): Semantic chunking respecting sandhi boundaries
- Agent Frameworks: LangChain/CrewAI tool definitions
- Embeddings: Batch processing for vector databases
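As one example of the agent-framework integration, the core functions can be exposed as tools. A hedged sketch wrapping transliterate with LangChain's @tool decorator (this wrapper is illustrative, assumes langchain-core is installed, and is not a shipped vedyut.llm API; Script lookup by scheme name is also an assumption):

```python
from langchain_core.tools import tool

from vedyut import transliterate, Script

@tool
def transliterate_sanskrit(text: str, from_script: str, to_script: str) -> str:
    """Transliterate Sanskrit text between scripts, e.g. 'iast' -> 'devanagari'."""
    # Assumes Script members are named after the schemes in the table above.
    return transliterate(text, Script[from_script.upper()], Script[to_script.upper()])
```

The built-in SanskritRAG helper handles semantic chunking: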
```python
from vedyut import Script
from vedyut.llm import SanskritRAG
# Semantic chunking with script support
rag = SanskritRAG(
texts=["bhagavad_gita.txt"],
script=Script.DEVANAGARI
)
results = rag.query("What does Krishna say about dharma?")
```

Roadmap:

- Multi-script transliteration (25+ scripts)
- Script as first-class API parameter
- Rust core skeleton with CI
- Production transliteration implementation
- Complete sandhi rules (Aṣṭādhyāyī)
- Lexicon with 29M+ forms
- Python bindings (PyO3)
- WebAssembly support
- ML-based scoring for segmentation
- Neural + rule-based hybrid models
Contributions welcome! See CONTRIBUTING.md for guidelines.
Key areas:
- Implementing transliteration mappings
- Adding sandhi rules
- Building lexicon data
- LLM integrations
- Documentation & examples
This project is licensed under the MIT License - see LICENSE file for details.
- Inspired by vidyut (Ambuda project)
- sanskrit_parser for Python foundations
- The Sanskrit NLP community for research and data
- vidyut - Reliable Sanskrit infrastructure (upstream inspiration)
- sanskrit_parser - Python Sanskrit parser
- indic-transliteration - Python transliteration
- GitHub: @VedantMadane
- Issues: GitHub Issues
Made with ❤️ for the Sanskrit and Indic language communities
Key Feature: Sanskrit in ANY script - script selection is first-class! 🌏