A state-of-the-art, production-ready LLM-powered system for generating human-readable explanations of Python code with enhanced retrieval, security, and monitoring capabilities.
🚀 Quick Start • 📖 Documentation • Tutorial • 🔧 Installation • 💡 Examples • 🤝 Contributing • 💬 Discussions
Device portability and intelligent explanations:
- Unified DeviceManager selects the best device automatically (CUDA > MPS > CPU) with safe fallbacks
- Precision control via CODE_EXPLAINER_PRECISION (fp32, fp16, bf16, 8bit)
- Optional IntelligentExplanationGenerator for adaptive, audience-aware explanations
- Advanced AI Models: Fine-tuned CodeT5, CodeBERT, and GPT models for accurate explanations
- Enhanced RAG: Retrieval-Augmented Generation with FAISS, BM25, and hybrid search
- Cross-Encoder Reranking: Improved relevance with sentence-transformers rerankers
- MMR Diversity: Maximal Marginal Relevance for diverse code examples
- Multi-Agent Analysis: Collaborative explanations from specialized agents
- Symbolic Analysis: Property-based testing and complexity analysis
- Batch Processing: Efficient batch explanation with memory optimization and progress tracking
- Async Processing: Non-blocking explanation generation for better responsiveness
- Performance Monitoring: Real-time memory usage, GPU stats, and performance metrics
- Model Optimization: Quantization support (4-bit/8-bit), gradient checkpointing, and inference optimizations
- Security Features: Input validation, rate limiting, and security auditing
- API v2 Endpoints: Enhanced REST API with performance monitoring, security validation, and model optimization
- Multiple Strategies: vanilla, ast_augmented, retrieval_augmented, execution_trace, and enhanced_rag
- Code Understanding: Support for functions, classes, algorithms, and data structures
- Complexity Analysis: Automatic time/space complexity detection
- Error Pattern Recognition: Common bug identification and debugging suggestions
- Intelligent Augmentation: Automatic function name and recursion hints for robustness
- REST API: FastAPI with Prometheus metrics, rate limiting, and health checks
- Web UI: Streamlit and Gradio interfaces for interactive exploration
- CLI Tools: Comprehensive command-line interface with rich output
- Python SDK: Direct integration for developers
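The MMR Diversity bullet above does not say how candidates are scored. Here is a minimal sketch. It assumes code examples are already embedded as plain float vectors and compared by cosine similarity. Maximal Marginal Relevance then greedily picks the candidate that best balances relevance to the query against similarity to the examples already chosen. The names `mmr_select` and `lambda_mult` are hypothetical, not the package's API.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen by Maximal Marginal Relevance.

    Each step picks the candidate maximizing
    lambda * sim(query, d) - (1 - lambda) * max over selected s of sim(d, s).
    """
    remaining = list(range(len(candidates)))
    selected: List[int] = []

    def mmr_score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        # Redundancy is the similarity to the closest already-selected example.
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1.0 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lambda_mult=1.0` the function reduces to a pure relevance ranking. Lower values trade some relevance for diversity, so a near-duplicate of an already chosen example is skipped in favour of a different one.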
Small compatibility shims have been added to improve testability and backwards compatibility with prior consumer code. Notable shims include:
- `CodeExplainer.explain_code_with_symbolic(...)`: convenience method that returns combined symbolic + textual explanations.
- `CodeExplainerTrainer` accepts a `config_path` parameter for older callers.
- `clear_model_cache` and `get_model_cache_info` are exported at the package level for convenience when managing cached model artifacts.
- Code Redaction: Automatic PII and credential detection and redaction
- Security Validation: AST-based dangerous pattern detection
- Safe Execution: Sandboxed code execution with resource limits
- Input Validation: Comprehensive request validation and sanitization
- Prometheus Metrics: API performance, error rates, and P95/P99 latencies
- Grafana Dashboard: Pre-built monitoring dashboards
- Structured Logging: JSON logging with request IDs and tracing
- Health Checks: Comprehensive service health monitoring
- Performance Monitoring: Memory usage, GPU utilization, and cache statistics
- Multi-Level Caching: Explanation cache, embedding cache, and advanced cache with strategies
- Cache Strategies: LRU, LFU, FIFO, Size-based, and Adaptive eviction policies
- Persistence: Disk-backed caching with TTL and compression
- Invalidation: Tag-based, time-based, version-based, and content-based invalidation
- Cache Metrics: Hit rates, access times, and eviction statistics
- Traditional Metrics: BLEU, ROUGE-L, BERTScore, CodeBLEU for quantitative assessment
- LLM-as-a-Judge: Multi-judge consensus evaluation with GPT-4 and Claude
- Preference Learning: Pairwise comparisons and Bradley-Terry ranking
- Contamination Detection: Comprehensive data leakage detection (exact, n-gram, semantic)
- Robustness Testing: Adversarial testing with 7 transformation types
- Comprehensive CLI: Full evaluation pipeline with detailed reporting
- Quality Assurance: Automated testing with pytest, coverage, and type checking
- Release Automation: Automated releases with changelogs and semantic versioning
- Pre-commit Hooks: Code formatting, linting, and security checks
- Multi-environment Testing: Testing across Python 3.9, 3.10, 3.11, 3.12
- Setup Validation: Automated configuration and environment validation
- mkdocs Documentation: Comprehensive documentation site with examples
- Development Containers: VS Code devcontainer for instant setup
- Makefile Automation: Common tasks simplified with make commands
- nbstripout: Clean notebook commits without outputs
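The Contamination Detection bullet lists exact, n-gram, and semantic checks. Below is a minimal sketch of only the n-gram check. It assumes whitespace tokenization, word trigrams, and an arbitrary 0.5 overlap threshold: a test sample is flagged when at least half of its trigrams also occur somewhere in the training data. All function names are hypothetical, and this is not the implementation behind `code-explainer eval-contamination`.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Set[NGram]:
    """Whitespace-tokenize text and return its set of word n-grams."""
    tokens = text.split()
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


def build_ngram_index(train_texts: Iterable[str], n: int) -> Set[NGram]:
    """Union of all training n-grams, built once and reused for every test sample."""
    index: Set[NGram] = set()
    for text in train_texts:
        index |= ngrams(text, n)
    return index


def overlap_ratio(test_text: str, index: Set[NGram], n: int) -> float:
    """Fraction of the test sample's n-grams that also occur in the training index."""
    grams = ngrams(test_text, n)
    if not grams:  # sample shorter than n tokens
        return 0.0
    return len(grams & index) / len(grams)


def flag_contaminated(
    test_texts: List[str], train_texts: List[str], n: int = 3, threshold: float = 0.5
) -> List[int]:
    """Indices of test samples whose n-gram overlap meets the threshold."""
    index = build_ngram_index(train_texts, n)
    return [i for i, text in enumerate(test_texts) if overlap_ratio(text, index, n) >= threshold]
```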
# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -
# Install from source with Poetry
git clone https://github.com/rajatsainju2025/code-explainer.git
cd code-explainer
poetry install
# For RAG features (optional)
poetry install --with rag
# For development
poetry install --with dev
# For all optional dependencies
poetry install --with rag,metrics,monitoring,dev

# Basic installation
pip install code-explainer
# With RAG features
pip install "code-explainer[rag]"
# With all optional features
pip install "code-explainer[all]"

from code_explainer import CodeExplainer
# Initialize the explainer
explainer = CodeExplainer()
# Explain some code
code = """
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
"""
explanation = explainer.explain_code(code)
print(explanation)

# Start the FastAPI server
python -m code_explainer.cli_commands.main serve
# Or use Streamlit (if available)
# streamlit run streamlit_app.py
# Or use Gradio (if available)
# python -c "import gradio as gr; gr.Interface(...)"

# Explain a file
python -m code_explainer.cli_commands.main explain --file examples/fibonacci.py
# Use different strategies
python -m code_explainer.cli_commands.main explain --file mycode.py --prompt-strategy vanilla
python -m code_explainer.cli_commands.main explain --file mycode.py --prompt-strategy ast_augmented
python -m code_explainer.cli_commands.main explain --file mycode.py --prompt-strategy retrieval_augmented
# Run evaluations
python -m code_explainer.cli_commands.main eval --dataset humaneval --model codet5-small
# Research-driven evaluation (contamination, dynamic, multi-agent, adversarial)
python -c "
from code_explainer.research_evaluation_orchestrator import ResearchEvaluationOrchestrator
orchestrator = ResearchEvaluationOrchestrator()
# model and dataset are placeholders: create or load them before this call
results = orchestrator.run_evaluation(model, dataset)
print(results)
"
# Check security
python -c "
from code_explainer.security import CodeSecurityValidator
validator = CodeSecurityValidator()
user_code = open('mycode.py').read()  # the code to check
is_safe, issues = validator.validate_code(user_code)
print('Safe:', is_safe, 'Issues:', issues)
"
# Run golden tests
python -m code_explainer.cli_commands.main test
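The security check above reports `(is_safe, issues)`. The AST-based dangerous-pattern detection mentioned under Security Validation can be sketched with the standard `ast` module. The code is parsed without being executed, every node is walked, and calls and imports on a deny-list are flagged. The deny-lists and the `find_dangerous_patterns` name are illustrative assumptions. This is not the `CodeSecurityValidator` implementation. A static deny-list is also easy to evade (for example via `getattr`), so it complements sandboxed execution rather than replacing it.

```python
import ast
from typing import List, Tuple

# Illustrative deny-lists (assumptions); a real validator would be far broader.
DANGEROUS_CALLS = frozenset({"eval", "exec", "compile", "__import__", "open"})
DANGEROUS_MODULES = frozenset({"os", "subprocess", "shutil", "socket"})


def find_dangerous_patterns(source: str) -> Tuple[bool, List[str]]:
    """Return (is_safe, issues) by inspecting the AST; the code is never run."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return False, [f"syntax error: {exc.msg} (line {exc.lineno})"]

    issues: List[str] = []
    for node in ast.walk(tree):
        # Direct calls such as eval("...") or open(path)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DANGEROUS_CALLS:
                issues.append(f"line {node.lineno}: call to {node.func.id}()")
        # import os, import subprocess as sp, ...
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in DANGEROUS_MODULES:
                    issues.append(f"line {node.lineno}: import {alias.name}")
        # from os import path (relative imports have module=None and are skipped)
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] in DANGEROUS_MODULES:
                issues.append(f"line {node.lineno}: from {node.module} import ...")
    return not issues, issues
```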
For a 15-minute walkthrough, see the Zero to Results tutorial: docs/tutorials/zero_to_results.md

| Metric | CodeT5-Small | CodeT5-Base | GPT-3.5-Turbo | Our Enhanced RAG |
|---|---|---|---|---|
| BLEU-4 | 0.42 | 0.48 | 0.55 | 0.61 |
| ROUGE-L | 0.38 | 0.44 | 0.52 | 0.58 |
| BERTScore | 0.71 | 0.76 | 0.82 | 0.85 |
| CodeBLEU | 0.35 | 0.41 | 0.48 | 0.54 |
| Human Rating | 3.2/5 | 3.6/5 | 4.1/5 | 4.4/5 |
Benchmarked on HumanEval and MBPP datasets with human evaluators.
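ROUGE-L, one of the metrics in the table, scores a candidate explanation by the longest common subsequence (LCS) it shares with a reference explanation. The project's tokenization and parameters are not documented here, so this sketch assumes lowercased whitespace tokens and the common F1 form, F = 2PR / (P + R), where P = LCS / len(candidate) and R = LCS / len(reference). The function names are hypothetical.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, O(len(a) * len(b)) dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over lowercased whitespace tokens (no stemming)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, the candidate "the function returns the sum" against the reference "the function returns the sum of two numbers" has an LCS of 5 tokens, so P = 1.0, R = 5/8, and F1 = 10/13 ≈ 0.77.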
Our evaluation system implements state-of-the-art assessment methods following open evaluation best practices:
# Comprehensive traditional metrics
code-explainer evaluate \
--test-data test.jsonl \
--predictions predictions.jsonl \
--metrics bleu rouge bertscore codebleu

# Multi-judge consensus evaluation
code-explainer eval-llm-judge \
--test-data test.jsonl \
--predictions predictions.jsonl \
--judges gpt-4 claude-3-sonnet \
--criteria accuracy clarity completeness

# Detect data leakage between train/test
code-explainer eval-contamination \
--train-data train.jsonl \
--test-data test.jsonl \
--methods exact ngram substring semantic

# Test model robustness under adversarial conditions
code-explainer eval-robustness \
--test-data test.jsonl \
--model-path ./results \
--test-types typo case whitespace punctuation \
--severity-levels 0.05 0.1 0.2

# Compare models using pairwise preferences
code-explainer eval-preference \
--test-data test.jsonl \
--predictions-a model_a.jsonl \
--predictions-b model_b.jsonl \
--use-bradley-terry

📖 See our Advanced Evaluation Tutorial for comprehensive examples and best practices.
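The `--use-bradley-terry` option above ranks models from pairwise preferences. Under the Bradley-Terry model, model i beats model j with probability p_i / (p_i + p_j). Below is a minimal sketch that fits the strengths p with the classic minorization-maximization (Zermelo/Hunter) update. It assumes preferences arrive as `(winner, loser)` pairs and that ties have been dropped. The function name is hypothetical, and the CLI's actual fitting procedure may differ.

```python
from typing import Dict, List, Tuple


def bradley_terry(comparisons: List[Tuple[str, str]], iterations: int = 200) -> Dict[str, float]:
    """Fit Bradley-Terry strengths from (winner, loser) pairs via the MM update.

    Model: P(i beats j) = p_i / (p_i + p_j). Strengths are rescaled to sum to 1.
    """
    items = sorted({name for pair in comparisons for name in pair})
    wins = {name: 0 for name in items}
    games: Dict[Tuple[str, str], int] = {}  # unordered pair -> number of comparisons
    for winner, loser in comparisons:
        wins[winner] += 1
        pair = (winner, loser) if winner < loser else (loser, winner)
        games[pair] = games.get(pair, 0) + 1

    strength = {name: 1.0 / len(items) for name in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for (a, b), n in games.items():
                if i in (a, b):
                    j = b if a == i else a
                    denom += n / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom else strength[i]
        total = sum(updated.values())
        strength = {name: s / total for name, s in updated.items()}
    return strength
```

If model A wins 3 of 4 comparisons against model B, the fit gives A = 0.75 and B = 0.25. A model that never wins is driven to a strength of 0, so sparse comparison data usually needs smoothing or a prior.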
graph TB
A[Code Input] --> B[Security Validation]
B --> C[AST Analysis]
C --> D[Strategy Selection]
D --> E1[Vanilla LLM]
D --> E2[AST-Augmented]
D --> E3[Enhanced RAG]
D --> E4[Multi-Agent]
E3 --> F[Vector Store]
E3 --> G[BM25 Index]
E3 --> H[Cross-Encoder Reranker]
E1 --> I[Response Synthesis]
E2 --> I
E3 --> I
E4 --> I
I --> J[Quality Validation]
J --> K[Security Redaction]
K --> L[Final Explanation]
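In the diagram, Enhanced RAG queries both a vector store and a BM25 index before the cross-encoder reranker. The README does not state how the two ranked lists are merged. Reciprocal rank fusion (RRF) is one common choice and is swapped in here as an assumption. Each document scores the sum of 1 / (k + rank) over every list it appears in, with k = 60 as the conventional default. The fused list would then be passed to the reranker. `reciprocal_rank_fusion` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["a", "b", "c"]  # e.g. from the vector store
bm25_hits = ["b", "c", "d"]   # e.g. from the BM25 index
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

RRF only uses ranks, so it does not need the dense-similarity and BM25 scores to be on comparable scales.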
The system is highly configurable through YAML files:
# configs/custom.yaml
model:
  name: "microsoft/CodeGPT-small-py"
  max_length: 512
  temperature: 0.7
training:
  num_train_epochs: 100
  per_device_train_batch_size: 8
  learning_rate: 5.0e-5
prompt:
  template: "Explain this Python code:\n```python\n{code}\n```\nExplanation:"

Use ready-made presets to switch models quickly:
| Preset | Arch | Base Model | Config | Train | Evaluate |
|---|---|---|---|---|---|
| DistilGPT-2 (default) | causal | distilgpt2 | configs/default.yaml | cx-train -c configs/default.yaml | code-explainer eval -c configs/default.yaml |
| CodeT5 Small | seq2seq | Salesforce/codet5-small | configs/codet5-small.yaml | cx-train -c configs/codet5-small.yaml | code-explainer eval -c configs/codet5-small.yaml |
| CodeT5 Base | seq2seq | Salesforce/codet5-base | configs/codet5-base.yaml | cx-train -c configs/codet5-base.yaml | code-explainer eval -c configs/codet5-base.yaml |
| CodeGPT Small (CodeBERT family) | causal | microsoft/CodeGPT-small-py | configs/codebert-base.yaml | cx-train -c configs/codebert-base.yaml | code-explainer eval -c configs/codebert-base.yaml |
| StarCoderBase 1B | causal | bigcode/starcoderbase-1b | configs/starcoderbase-1b.yaml | cx-train -c configs/starcoderbase-1b.yaml | code-explainer eval -c configs/starcoderbase-1b.yaml |
| StarCoder2 Instruct | causal | bigcode/starcoder2-3b | configs/starcoder2-instruct.yaml | cx-train -c configs/starcoder2-instruct.yaml | code-explainer eval -c configs/starcoder2-instruct.yaml |
| CodeLlama Instruct | causal | codellama/CodeLlama-7b-Instruct-hf | configs/codellama-instruct.yaml | cx-train -c configs/codellama-instruct.yaml | code-explainer eval -c configs/codellama-instruct.yaml |
Data paths in each config default to the tiny examples in data/. Override any path via CLI flags (e.g., --data for training or --test-file for eval).
from code_explainer import CodeExplainerTrainer
# Initialize trainer with custom config
trainer = CodeExplainerTrainer("configs/custom.yaml")
# Train on custom dataset
trainer.train(data_path="data/my_dataset.json")

# Batch processing
from code_explainer import CodeExplainer
explainer = CodeExplainer()
codes = ["print('hello')", "x = [1,2,3]", "def add(a,b): return a+b"]
explanations = explainer.explain_code_batch(codes)
# Prompt strategy (CLI)
# From API
# POST /explain {"code": "...", "strategy": "ast_augmented"}
# A/B compare strategies
python scripts/ab_compare_strategies.py --config configs/default.yaml --max-samples 5 \
--strategies vanilla ast_augmented retrieval_augmented

See docs/strategies.md for details on the vanilla, ast_augmented, retrieval_augmented, and execution_trace strategies, including safety notes and examples.
See quick-start examples in examples/ (training, evaluation, and serving with presets). Start here:
- examples/README.md
- examples/preset_switching.md
- examples/eval_report_template.md
Contribute examples/data: see the discussion “Call for community samples (tiny datasets)” in the Discussions tab.
📝 Example Explanations
Input:
class BankAccount:
    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance

Output:
This code defines a `BankAccount` class that represents a simple bank account. The `__init__` method initializes the account with an optional starting balance (defaulting to 0). The `deposit` method adds money to the account and returns the new balance.
git clone https://github.com/rajatsainju2025/code-explainer.git
cd code-explainer
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install

Additional tools:
- Makefile targets: install, format, lint, type, precommit, test, clean
- Devcontainer: `.devcontainer/devcontainer.json` for a ready-made VS Code container
# Run all tests
pytest
# Run with coverage
pytest --cov=code_explainer --cov-report=html
# Run specific test
pytest tests/test_model.py::test_explain_code

For scope, speed, and coverage goals, see the testing strategy discussion: .github/DISCUSSIONS.md.
Planning & Roadmap
- Plan review: docs/plan_review.md
- Roadmap: NEXT_PHASE_ROADMAP.md
- Reimagination: REIMAGINE.md
# Format code
black src/ tests/
# Sort imports
isort src/ tests/
# Type checking
mypy src/
```bash
# Build image
docker build -t code-explainer .
# Run web interface
# Run training
docker run -v $(pwd)/data:/app/data code-explainer train --data /app/data/train.json
```

- Multi-language Support: JavaScript, Java, C++, etc.
- Advanced Models: Integration with CodeT5, CodeBERT, StarCoder
- VS Code Extension: Direct integration with development environment
- API Service: RESTful API for integration with other tools
- Performance Optimization: Model quantization and optimization
- Enterprise Features: Authentication, usage tracking, custom deployments
Current Status: Push 15/20 Complete ✅
- Push 1-5: Initial setup and core improvements
- Push 6-10: Advanced caching and batch processing
- Push 11: Logging enhancements
- Push 12: Performance optimizations (quantization, gradient checkpointing, memory monitoring)
- Push 13: Security enhancements (rate limiting, input validation, security auditing)
- Push 14: API improvements (v2 endpoints for health, performance, security validation)
- Push 15: Testing expansions (comprehensive integration tests for all new features)
- Push 16: Documentation updates
- Push 17: CI/CD enhancements
- Push 18: Performance benchmarking
- Push 19: Production deployment
- Push 20: Final integration and release
Track progress in IMPLEMENTATION_STATUS.md
We welcome contributions! Please see our Contributing Guide for details.
4. Push to the branch (git push origin feature/amazing-feature)
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face for the amazing Transformers library
- OpenAI for GPT model architecture inspiration
- The open-source community for various tools and libraries

- Author: Rajat Sainju
- Email: your.email@example.com
- GitHub: @rajatsainju2025
- Project Link: https://github.com/rajatsainju2025/code-explainer
- Start here: #4
- General Q&A and ideas: Discussions tab
Run the FastAPI server (example):
uvicorn code_explainer.api.server:app --host 0.0.0.0 --port 8000

Endpoints:
- GET /health → {"status": "ok"}
- GET /version → {"version": }
- GET /strategies → list of supported strategies
- POST /explain {code: str, strategy?: str} → {explanation: str}
More: see docs/api.md and docs/strategies.md.
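Here is a minimal client sketch for `POST /explain` that uses only the standard library. It assumes the server was started with the uvicorn command above on port 8000. It also assumes the request and response bodies match the endpoint list: a `code` field, an optional `strategy` field, and an `explanation` field in the reply. `build_explain_request` and `explain` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: local server from the uvicorn example


def build_explain_request(
    code: str, strategy: Optional[str] = None, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST /explain request with a JSON body {"code": ..., "strategy": ...}."""
    payload = {"code": code}
    if strategy is not None:  # strategy is optional per the endpoint list
        payload["strategy"] = strategy
    return urllib.request.Request(
        f"{base_url}/explain",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def explain(code: str, strategy: Optional[str] = None) -> str:
    """Send the request and return the 'explanation' field (needs a running server)."""
    with urllib.request.urlopen(build_explain_request(code, strategy), timeout=60) as resp:
        return json.load(resp)["explanation"]
```

For example, `explain("def add(a, b): return a + b", strategy="ast_augmented")` would return the explanation text once the server is up.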
Code Explainer includes comprehensive performance optimizations for production deployments:
- orjson Integration: 3-10x faster JSON serialization across Redis caching, security, retrieval, config loading, and data governance modules
- Shared Utilities: Centralized `json_loads`/`json_dumps` in `utils/hashing.py` for consistency
- `__slots__` Everywhere: 20-30% memory reduction on `CacheStats`, `CacheConfig`, `RetrievalConfig`, `CodeExplainerException`, `DatabaseConfig`, `DataGovernanceConfig`, `StructuredLogger`
- `frozenset` Lookups: O(1) validation for strategies, languages, and retrieval methods
- `perf_counter()` Timing: Sub-millisecond precision for API latency measurements
- `@lru_cache` Confidence: Cached multi-agent confidence computations
- AST Caching: Bounded cache for parsed syntax trees in symbolic analyzer
- Precompiled Regex: Input sanitization patterns compiled once at module load
- Named Constants: `ONE_HOUR`, `TWO_HOURS`, `ONE_DAY` for TTL; `_MIN_CODE_LENGTH_FOR_CACHE`, etc. for symbolic analyzer
- Type Safety: `Optional[T]` annotations throughout error handling and logging
- `__all__` Exports: Explicit public API in validation and cache modules
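Two of the optimizations listed above, `frozenset` lookups and a bounded AST cache, can be sketched with the standard library. The strategy set below is copied from the strategies named earlier in this README. `validate_strategy`, `parse_cached`, and `maxsize=256` are illustrative assumptions rather than the symbolic analyzer's actual code. Because `lru_cache` hands every caller the same tree object, cached trees must be treated as read-only.

```python
import ast
from functools import lru_cache

# Strategy names taken from this README; the constant name is hypothetical.
SUPPORTED_STRATEGIES = frozenset(
    {"vanilla", "ast_augmented", "retrieval_augmented", "execution_trace", "enhanced_rag"}
)


def validate_strategy(name: str) -> str:
    """Average-case O(1) membership check against an immutable set."""
    if name not in SUPPORTED_STRATEGIES:
        raise ValueError(
            f"unknown strategy {name!r}; expected one of {sorted(SUPPORTED_STRATEGIES)}"
        )
    return name


@lru_cache(maxsize=256)
def parse_cached(source: str) -> ast.Module:
    """Parse each distinct source string once; keep at most 256 trees (LRU eviction).

    The returned tree is shared between callers, so do not mutate it.
    """
    return ast.parse(source)
```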
Run benchmarks to validate:
python scripts/benchmark_hashing.py
python benchmarks/benchmark_inference.py