─────────────────────────────────────────────────────
                 AMMAN HUSSAIN ANSARI
─────────────────────────────────────────────────────
I don't follow tutorials. I derive equations.
I don't ship demos. I ship systems that survive production.
I don't guess. I benchmark, instrument, and iterate.
I'm an AI engineer who builds from first principles → scratch implementation → hardened deployment. Every system I create is mathematically grounded, rigorously tested, and engineered for failure resilience.
class Amman:
    stack = ["LLMs", "RAG", "MLOps", "Transformers", "Evaluation"]
    languages = ["Python", "SQL", "Bash"]
    approach = "derive → implement from scratch → harden → benchmark → ship"
    based_in = "India"
    available = "remote, globally"
    building = True  # always

┌─────────────────────────────────────────┐
│ 🚀 PRODUCTION SYSTEMS · ACTIVE · PUBLIC │
└─────────────────────────────────────────┘
⚡ fast-gpt-lab · active
GPT architecture implemented twice — once for clarity, once for performance. BPE tokenizer from scratch. Benchmarked against nanoGPT.
Bridges the gap between theoretical deep learning and hardware-level optimization. Two complete implementations in one repo:

legacy/     → clean, annotated, readable
              every operation mapped to the Attention Is All You Need paper
              for understanding the architecture deeply
optimized/  → FlashAttention v2
              Rotary Position Embeddings (RoPE)
              PagedAttention-style KV cache
              SwiGLU MLP
              torch.compile
              FP8 quantization stubs
              FSDP distributed training wrapper

BPE tokenizer built from scratch — merge rules, vocabulary, encode/decode — before touching HuggingFace tokenizers.
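The merge loop at the heart of that tokenizer is small enough to sketch in full. A byte-level BPE trainer following the textbook recipe (function names here are illustrative, not the repo's actual API):

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(tokens, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn merge rules over raw UTF-8 bytes; returns (merges, tokens)."""
    tokens = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new token id
    for new_id in range(256, 256 + num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        merges[pair] = new_id
        tokens = merge_pair(tokens, pair, new_id)
    return merges, tokens
```

Encoding replays the learned merges in order; decoding inverts the vocabulary table back to bytes.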
benchmarks vs nanoGPT:
  perplexity  — WikiText-2, measured at every checkpoint
  throughput  — tokens/sec at batch sizes 1, 8, 32, 128
  memory      — peak GPU memory per optimization added
  compilation — torch.compile speedup measured independently
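For reference, the perplexity column reduces to exponentiating the mean per-token cross-entropy; lower is better, and a model guessing uniformly over V tokens scores exactly V. A minimal sketch:

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Sanity check: a uniform model over a 4-token vocabulary has perplexity 4.
uniform_nll = [math.log(4)] * 10
```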
PyTorch · CUDA · FlashAttention · FSDP · FP8 · torch.compile · RoPE · BPE
🔥 cost-aware-llm · active
A high-performance LLM Gateway that dynamically routes requests across multiple providers using cost, latency, and reliability signals.
The problem: calling OpenAI directly means paying full price on cacheable queries, accepting whatever latency that provider serves, and going down with its next outage. This gateway addresses all three: cost, latency, and reliability.
# Route by strategy — the gateway picks the optimal provider automatically
response = gateway.complete(prompt, strategy="cost")   # → cheapest model available
response = gateway.complete(prompt, strategy="speed")  # → lowest p99 latency
response = gateway.complete(prompt, strategy="safe")   # → circuit-broken fallback chain

# Semantic cache — similar queries return the cached response
# "what is gradient descent?" and "explain gradient descent" → same cache hit
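The semantic cache reduces to nearest-neighbor search over query embeddings: if a stored query is close enough in cosine similarity, its response is reused. A toy in-memory sketch — the real gateway presumably backs this with Redis and a proper embedding model, and `toy_embed` below is a hypothetical stand-in that must return unit-norm vectors:

```python
import numpy as np

class SemanticCache:
    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # callable: query -> unit-norm vector
        self.threshold = threshold  # minimum cosine similarity for a hit
        self.keys, self.values = [], []

    def get(self, query):
        """Return a cached response if some stored query is similar enough."""
        if not self.keys:
            return None
        q = self.embed(query)
        sims = np.array(self.keys) @ q   # cosine similarity (unit vectors)
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, query, response):
        self.keys.append(self.embed(query))
        self.values.append(response)

# Toy demo: a fake 3-dimensional "embedding" keyed on topic words.
def toy_embed(text):
    v = np.array([float("gradient" in text),
                  float("descent" in text),
                  float("pizza" in text)])
    return v / np.linalg.norm(v)

cache = SemanticCache(toy_embed, threshold=0.9)
cache.put("what is gradient descent?", "cached answer")
```

With a real embedding model, paraphrases land near each other in the vector space, which is what turns "explain gradient descent" into a hit for "what is gradient descent?".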
Routing architecture:

Incoming Request
      │
      ▼
Auth + Rate Limit (token bucket per API key)
      │
      ▼
Semantic Cache ──── HIT ──────────────────▶ Return cached response
      │
     MISS
      │
      ▼
Router (cost / speed / safe signal scoring)
      │
      ├──▶ OpenAI
      ├──▶ Anthropic
      ├──▶ Together AI
      └──▶ Local vLLM
      │
      ▼
Circuit Breaker ──── OPEN ──▶ Fallback chain
      │
    CLOSED
      │
      ▼
Response + Prometheus metrics + OpenTelemetry traces
Chaos engineering included — a test suite that randomly kills providers mid-run, verifies circuit breakers open, and confirms fallback activates within SLA.
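The breaker logic itself is compact: count consecutive failures, open after a threshold, and allow a probe request once a cool-down expires. A minimal sketch (thresholds and names are illustrative, not the repo's API):

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after   # seconds before a retry probe is allowed
        self.failures = 0
        self.opened_at = None            # None means CLOSED

    def allow(self):
        """CLOSED -> allow; OPEN -> allow one probe only after the cool-down."""
        if self.opened_at is None:
            return True
        return time.monotonic() - self.opened_at >= self.reset_after

    def record_success(self):
        # Any success closes the breaker and resets the failure count.
        self.failures, self.opened_at = 0, None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()
```

The fallback chain then just walks the provider list, skipping any provider whose breaker refuses the call.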
FastAPI · Redis · OpenTelemetry · Grafana · Locust · Terraform · Kubernetes · Multi-tenant
🤖 agentic-ai-production-system · active
A multi-agent orchestration system built for production — LLMs, tool-use, and workflow orchestration for autonomous reasoning and execution.
Most "agentic AI" projects are chains wrapped in Streamlit. This is different — a full production system with instrumentation, safety gates, evaluation, and a feedback loop that fine-tunes the model on real user interactions.
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  Request ──▶ FastAPI ──▶ LangGraph Orchestrator           │
│                                    │                      │
│                   ┌────────────────┼────────────────┐     │
│                   ▼                ▼                ▼     │
│                Planner          Executor        Reflector │
│                   │                │                │     │
│                   └────────────────┼────────────────┘     │
│                                    │                      │
│        ┌───────────────────────────┼──────────────────┐   │
│        ▼                           ▼                  ▼   │
│   RAG Pipeline               Tool Sandbox          Safety │
│  (hybrid search)               (Docker)            Guards │
│        │                           │                  │   │
│        └───────────────────────────┼──────────────────┘   │
│                                    │                      │
│        Prometheus ◀─ Langfuse ◀─ Audit Logs (S3)          │
│                                    │                      │
│                         Human Approval Gate               │
│                                    │                      │
│                  LoRA Fine-tuning on Feedback             │
│                                                           │
└───────────────────────────────────────────────────────────┘
- ✅ Circuit breakers on every external call — no silent failures
- ✅ PII scrubbing before any data touches the LLM
- ✅ RAGAS evaluation runs on every PR — merge blocked on faithfulness regression
- ✅ Human-in-the-loop approval gate before irreversible tool actions
- ✅ Every interaction logged to S3 for compliance and replay
- ✅ LoRA fine-tuning loop trained on collected thumbs-up/down feedback
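The merge-blocking gate is ultimately a comparison against a committed baseline: the metric scores themselves come from RAGAS, but the CI logic around them looks roughly like this (metric names and numbers below are illustrative, not the repo's config):

```python
def check_regression(current, baseline, tolerance=0.02):
    """Return a list of failure messages; empty list means the gate passes."""
    failures = []
    for metric, base in baseline.items():
        score = current.get(metric, 0.0)
        if score < base - tolerance:
            failures.append(f"{metric}: {score:.3f} < {base:.3f} - {tolerance}")
    return failures

# Hypothetical scores: faithfulness held, answer relevancy regressed.
baseline = {"faithfulness": 0.90, "answer_relevancy": 0.85}
current = {"faithfulness": 0.91, "answer_relevancy": 0.80}
```

In CI, a non-empty failure list exits non-zero, which is what actually blocks the merge.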
LangGraph · FastAPI · Qdrant · Docker · Kubernetes · RAGAS · Prometheus · Langfuse · LoRA · Redis
📐 pure-ml · completed
Mathematical Foundations → Algorithms → Neural Networks → Research Engineering. Machine Learning implemented from scratch using NumPy.
Before touching any framework, I sat down with the mathematics and built everything from scratch. Each algorithm comes with a full derivation document, visual comparisons against sklearn, and benchmarks proving identical outputs.
algorithms → Linear Regression (OLS + gradient descent + Ridge + Lasso)
             Logistic Regression (binary + multiclass + regularized)
             K-Nearest Neighbors (classification + regression)
             K-Means Clustering (elbow method + silhouette analysis)
             Naive Bayes (Gaussian + Multinomial + Bernoulli)
             Decision Trees (CART + pruning)
             Random Forests (bagging + feature importance)
             Support Vector Machines (linear + kernel)
             Principal Component Analysis
             Gradient Boosting
testing    → 100% unit tested against sklearn — identical outputs verified
docs       → every algorithm: derivation → intuition → code → result
This repo exists to prove one thing: I understand the math, not just the API.
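As a taste of what "identical outputs verified" means in practice: closed-form OLS via the normal equations and plain gradient descent on the same least-squares objective converge to the same weights. A self-contained check in NumPy (the data here is synthetic, not from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 3.0 + rng.normal(scale=0.1, size=200)

# Append a bias column, then solve the normal equations: w = (X'X)^-1 X'y.
Xb = np.hstack([X, np.ones((200, 1))])
w_ols = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Gradient descent on the same objective converges to the same solution.
w_gd = np.zeros(4)
for _ in range(5000):
    grad = 2 / len(y) * Xb.T @ (Xb @ w_gd - y)
    w_gd -= 0.05 * grad
```

The repo's actual tests do the same kind of comparison against sklearn's fitted coefficients instead of the normal-equation solution.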
Python · NumPy · Matplotlib · Math-first · Unit tested
────────────────────────────────────────────────────────────
CORE
────────────────────────────────────────────────────────────
Python · PyTorch · NumPy · HuggingFace Transformers
LangChain · LangGraph · FastAPI · Pydantic
────────────────────────────────────────────────────────────
LLM ENGINEERING
────────────────────────────────────────────────────────────
Fine-tuning (LoRA · QLoRA) · RLHF · RAG Pipelines
Prompt Engineering · LLM-as-judge · Speculative Decoding
FlashAttention · RoPE · KV Cache · FP8 Quantization · BPE
────────────────────────────────────────────────────────────
EVALUATION
────────────────────────────────────────────────────────────
RAGAS · DeepEval · Custom Metrics · Langfuse · Prometheus
────────────────────────────────────────────────────────────
VECTOR SEARCH
────────────────────────────────────────────────────────────
FAISS · Qdrant · Pinecone · Weaviate
Dense + Sparse + Hybrid Retrieval · Cross-encoder Reranking
────────────────────────────────────────────────────────────
MLOPS & INFRA
────────────────────────────────────────────────────────────
Docker · Kubernetes · Helm · GitHub Actions · Terraform
Prometheus · Grafana · OpenTelemetry · Locust
────────────────────────────────────────────────────────────
CLOUD
────────────────────────────────────────────────────────────
AWS — EC2 · S3 · Lambda · SageMaker · EKS
────────────────────────────────────────────────────────────
DATA
────────────────────────────────────────────────────────────
PostgreSQL · MongoDB · Redis · Celery · Kafka
────────────────────────────────────────────────────────────
MATHEMATICS
────────────────────────────────────────────────────────────
Calculus · Linear Algebra · Probability Theory · Statistics
Every repo I ship clears five gates before merge:
┌─────────────────────────────────────────────────────┐
│                                                     │
│  01  WHY DOES THIS WORK?                            │
│      Mathematical derivation lives in docs/         │
│      No black boxes. No "trust the framework."      │
│                                                     │
│  02  HOW DOES IT WORK?                              │
│      Scratch implementation before any library      │
│      If I can't write it in NumPy, I don't use it   │
│                                                     │
│  03  DOES IT ACTUALLY WORK?                         │
│      Benchmarks with real numbers, not vibes        │
│      Tested against reference implementations       │
│                                                     │
│  04  WHAT BROKE?                                    │
│      Post-mortems documented in docs/failures.md    │
│      Failures are first-class content, not hidden   │
│                                                     │
│  05  CAN IT HANDLE PRODUCTION?                      │
│      Failure modes mapped. Fallbacks implemented.   │
│      Load tested. Circuit breakers in place.        │
│                                                     │
└─────────────────────────────────────────────────────┘
I contribute to the ecosystem, not just consume it. Every repo I build is designed to be forked, extended, and built on — with derivations others can follow, benchmarks others can reproduce, and post-mortems others can learn from.
Actively looking to contribute to:
HuggingFace Transformers — evaluation, documentation, reproducibility
RAGAS — custom metrics, edge case coverage
DeepEval — metric implementations, CI integrations
vLLM — inference optimization experiments
LangGraph — production patterns, reliability improvements
Some projects live in private repos. Some are in closed beta, being stress-tested with real users before the world sees them.
┌─────────────────────────────────────────────────────────┐
│  🎯 CURRENT FOCUS: PRODUCTION-GRADE AI PLATFORM         │
│                                                         │
│  • Multi-tenant architecture with usage metering        │
│  • Real-user feedback loops driving model iteration     │
│  • End-to-end observability: logs, traces, metrics      │
│  • Auth, billing, and rate-limiting baked in from day 1 │
│                                                         │
│  Status: 🚧 Private beta · Invite-only · Real traffic   │
└─────────────────────────────────────────────────────────┘
🔒 Research Prototypes (not public yet)
🧪 multimodal-data-interpreter
   ├─ PDF + Excel + images + audio → unified query interface
   ├─ Natural language → SQL / Python / charts
   ├─ Auto-dashboard generation with live data refresh
   └─ Scalable backend: DuckDB/Spark for larger-than-RAM datasets
🧪 autonomous-code-reviewer
   ├─ Agentic PR analysis: bugs, perf, security, style
   ├─ Test generation + sandboxed execution
   ├─ Human-in-loop approval gates (reuse production patterns)
   └─ GitHub API integration + CI/CD hooks
🧪 real-time-meeting-copilot
   ├─ Live transcription + action item extraction
   ├─ Post-meeting RAG: "What did John say about the deadline?"
   └─ Privacy-first: local inference + on-prem LLM fallback
These are research prototypes. If they survive benchmarking, hardening, and real-user testing, they'll graduate to production repos.
─────────────────────────────────────────────────────
AMMAN HUSSAIN ANSARI
AI Engineer · MLOps · Open Source Contributor
India · Remote · Globally Available
─────────────────────────────────────────────────────

