β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–‘β–ˆβ–ˆβ–ˆβ•—β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ•—β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ•—β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–‘β–ˆβ–ˆβ–ˆβ•—β–‘β–‘β–ˆβ–ˆβ•—
β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ•—β–‘β–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ•—β–‘β–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ•—β–‘β–ˆβ–ˆβ•‘
β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β–ˆβ–ˆβ–ˆβ–ˆβ•”β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β–ˆβ–ˆβ–ˆβ–ˆβ•”β–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘
β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ–ˆβ–ˆβ•‘
β–ˆβ–ˆβ•‘β–‘β–‘β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β–‘β•šβ•β•β–‘β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β–‘β•šβ•β•β–‘β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β–‘β–‘β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β–‘β•šβ–ˆβ–ˆβ–ˆβ•‘
β•šβ•β•β–‘β–‘β•šβ•β•β•šβ•β•β–‘β–‘β–‘β–‘β•šβ•β•β•šβ•β•β–‘β–‘β–‘β–‘β•šβ•β•β•šβ•β•β–‘β–‘β•šβ•β•β•šβ•β•β–‘β–‘β•šβ•β•β•

LinkedIn Twitter Email GitHub


Β  Β 


// who I am

I don't follow tutorials. I derive equations.
I don't ship demos. I ship systems that survive production.
I don't guess. I benchmark, instrument, and iterate.

I'm an AI engineer who builds from first principles β†’ scratch implementation β†’ hardened deployment. Every system I create is mathematically grounded, rigorously tested, and engineered for failure resilience.

class Amman:
    stack      = ["LLMs", "RAG", "MLOps", "Transformers", "Evaluation"]
    languages  = ["Python", "SQL", "Bash"]
    approach   = "derive β†’ implement from scratch β†’ harden β†’ benchmark β†’ ship"
    based_in   = "India"
    available  = "remote, globally"
    building   = True  # always

// what I've shipped


╔═══════════════════════════════════════════════════════════╗
β•‘         πŸš€  PRODUCTION SYSTEMS  Β·  ACTIVE  Β·  PUBLIC      β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

⚑   fast-gpt-lab  ·  active

GPT architecture implemented twice β€” once for clarity, once for performance. BPE tokenizer from scratch. Benchmarked against nanoGPT.

Bridges the gap between theoretical deep learning and hardware-level optimization. Two complete implementations in one repo:

legacy/        β†’  clean, annotated, readable
                  every operation mapped to the Attention Is All You Need paper
                  for understanding the architecture deeply

optimized/     β†’  FlashAttention v2
                  Rotary Position Embeddings (RoPE)
                  PagedAttention-style KV cache
                  SwiGLU MLP
                  torch.compile
                  FP8 quantization stubs
                  FSDP distributed training wrapper

BPE tokenizer built from scratch β€” merge rules, vocabulary, encode/decode β€” before touching HuggingFace tokenizers.
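The core of BPE training fits in a few lines β€” a toy sketch of the merge-learning loop (illustrative only, not the repo's actual implementation, which also handles vocabulary construction and encode/decode):

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Learn BPE merge rules: repeatedly merge the most frequent adjacent pair."""
    vocab = Counter(tuple(w) for w in words)   # symbol-tuple word -> corpus frequency
    merges = []
    for _ in range(num_merges):
        # count adjacent symbol pairs across the whole corpus
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # rewrite every word with the chosen pair fused into one symbol
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

bpe_train(["low", "lower", "lowest", "low"], num_merges=3)
# -> [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```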

benchmarks vs nanoGPT:

  perplexity   β†’  WikiText-2, measured at every checkpoint
  throughput   β†’  tokens/sec at batch sizes 1, 8, 32, 128
  memory       β†’  peak GPU memory per optimization added
  compilation  β†’  torch.compile speedup measured independently
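The perplexity number in those benchmarks is just the exponential of the mean per-token negative log-likelihood β€” a minimal sketch of the metric:

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# a model assigning probability 1/4 to every token has perplexity 4
perplexity([math.log(4)] * 10)   # -> 4.0 (up to float rounding)
```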

PyTorch CUDA FlashAttention FSDP FP8 torch.compile RoPE BPE



πŸ”₯ Β  cost-aware-llm Β Β·Β  active

A high-performance LLM Gateway that dynamically routes requests across multiple providers using cost, latency, and reliability signals.

The problem: calling a single provider directly means paying full price on cacheable queries, accepting whatever latency that provider delivers, and going down whenever it does. This gateway addresses all three β€” semantic caching, signal-based routing, and circuit-broken fallbacks.

# Route by strategy β€” gateway picks the optimal provider automatically
response = gateway.complete(prompt, strategy="cost")    # β†’ cheapest model available
response = gateway.complete(prompt, strategy="speed")   # β†’ lowest p99 latency
response = gateway.complete(prompt, strategy="safe")    # β†’ circuit-broken fallback chain

# Semantic cache β€” similar queries return cached response
# "what is gradient descent?" and "explain gradient descent" β†’ same cache hit
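The caching idea in a toy sketch β€” `SemanticCache` and `toy_embed` are hypothetical names for illustration; a real deployment would use a sentence-embedding model and an ANN index rather than a brute-force scan:

```python
import numpy as np

class SemanticCache:
    """Toy semantic cache: return a stored response when a new query's
    embedding is close enough (cosine similarity) to a cached one."""

    def __init__(self, embed_fn, threshold=0.9):
        self.embed = embed_fn            # any text -> vector function
        self.threshold = threshold
        self.keys, self.values = [], []

    def get(self, query):
        if not self.keys:
            return None
        q = self.embed(query)
        norm = np.linalg.norm(q)
        if norm == 0:
            return None                  # query shares nothing with the vocab
        q = q / norm
        sims = np.stack(self.keys) @ q   # cosine similarity to every cached key
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, query, response):
        v = self.embed(query)
        self.keys.append(v / np.linalg.norm(v))
        self.values.append(response)

def toy_embed(text):
    # hypothetical stand-in for a sentence-embedding model:
    # an indicator vector over a tiny fixed vocabulary
    vocab = ["what", "is", "explain", "gradient", "descent"]
    words = text.lower().replace("?", "").split()
    return np.array([float(w in words) for w in vocab])

cache = SemanticCache(toy_embed, threshold=0.5)
cache.put("what is gradient descent?", "cached answer")
hit = cache.get("explain gradient descent")    # similar enough -> cache hit
miss = cache.get("how do transformers work")   # unrelated -> None
```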

Routing architecture:

Incoming Request
      β”‚
      β–Ό
  Auth + Rate Limit (token bucket per API key)
      β”‚
      β–Ό
  Semantic Cache ──── HIT ──────────────────▢ Return cached response
      β”‚
     MISS
      β”‚
      β–Ό
  Router (cost / speed / safe signal scoring)
      β”‚
      β”œβ”€β”€β–Ά  OpenAI
      β”œβ”€β”€β–Ά  Anthropic
      β”œβ”€β”€β–Ά  Together AI
      └──▢  Local vLLM
      β”‚
      β–Ό
  Circuit Breaker ──── OPEN ──▢ Fallback chain
      β”‚
     CLOSED
      β”‚
      β–Ό
  Response + Prometheus metrics + OpenTelemetry traces
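The strategy scoring step above can be sketched as follows β€” the provider numbers are made up for illustration; a real gateway would derive them from live pricing tables and rolling latency/error windows:

```python
# hypothetical provider stats β€” illustrative values only
PROVIDERS = {
    "openai":     {"cost_per_1k": 0.010, "p99_ms": 900,  "error_rate": 0.01},
    "anthropic":  {"cost_per_1k": 0.008, "p99_ms": 1100, "error_rate": 0.02},
    "together":   {"cost_per_1k": 0.002, "p99_ms": 1400, "error_rate": 0.05},
    "local_vllm": {"cost_per_1k": 0.000, "p99_ms": 2500, "error_rate": 0.10},
}

# each strategy minimizes one signal
SIGNALS = {"cost": "cost_per_1k", "speed": "p99_ms", "safe": "error_rate"}

def pick_provider(strategy, providers=PROVIDERS):
    """Pick the provider that minimizes the signal the strategy cares about."""
    signal = SIGNALS[strategy]   # KeyError on unknown strategy
    return min(providers, key=lambda name: providers[name][signal])

pick_provider("cost")    # -> "local_vllm" (free local inference)
pick_provider("speed")   # -> "openai"     (lowest p99)
pick_provider("safe")    # -> "openai"     (lowest error rate)
```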

Chaos engineering included β€” a test suite that randomly kills providers mid-run, verifies circuit breakers open, and confirms fallback activates within SLA.
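A minimal circuit breaker of the kind those chaos tests exercise β€” a toy sketch, not the gateway's actual code; the injectable `clock` is there so the open/half-open transition can be tested deterministically:

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: opens after `max_failures` consecutive failures,
    fails fast while open, and half-opens after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock            # injectable for deterministic tests
        self.failures = 0
        self.opened_at = None

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after:
            return "half-open"        # let one probe call through
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # any success closes the circuit and clears the failure count
        self.failures = 0
        self.opened_at = None
        return result
```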

FastAPI Redis OpenTelemetry Grafana Locust Terraform Kubernetes Multi-tenant



πŸ€– Β  agentic-ai-production-system Β Β·Β  active

A multi-agent orchestration system built for production β€” LLMs, tool-use, and workflow orchestration for autonomous reasoning and execution.

Most "agentic AI" projects are chains wrapped in Streamlit. This is different β€” a full production system with instrumentation, safety gates, evaluation, and a feedback loop that fine-tunes the model on real user interactions.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                                                         β”‚
β”‚   Request  ──▢  FastAPI  ──▢  LangGraph Orchestrator   β”‚
β”‚                                    β”‚                    β”‚
β”‚                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚                    β–Ό               β–Ό               β–Ό   β”‚
β”‚                 Planner        Executor        Reflectorβ”‚
β”‚                    β”‚               β”‚               β”‚   β”‚
β”‚                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                    β”‚                    β”‚
β”‚              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚              β–Ό                     β–Ό              β–Ό    β”‚
β”‚         RAG Pipeline         Tool Sandbox    Safety    β”‚
β”‚         (hybrid search)      (Docker)       Guards    β”‚
β”‚              β”‚                                    β”‚    β”‚
β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚                                    β”‚                    β”‚
β”‚        Prometheus ── Langfuse ── Audit Logs (S3)       β”‚
β”‚                                    β”‚                    β”‚
β”‚                        Human Approval Gate             β”‚
β”‚                                    β”‚                    β”‚
β”‚                     LoRA Fine-tuning on Feedback       β”‚
β”‚                                                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  • βœ… Circuit breakers on every external call β€” no silent failures
  • βœ… PII scrubbing before any data touches the LLM
  • βœ… RAGAS evaluation runs on every PR β€” merge blocked on faithfulness regression
  • βœ… Human-in-the-loop approval gate before irreversible tool actions
  • βœ… Every interaction logged to S3 for compliance and replay
  • βœ… LoRA fine-tuning loop trained on collected thumbs-up/down feedback

LangGraph FastAPI Qdrant Docker Kubernetes RAGAS Prometheus Langfuse LoRA Redis



πŸ“ Β  pure-ml Β Β·Β  completed

Mathematical Foundations β†’ Algorithms β†’ Neural Networks β†’ Research Engineering. Machine Learning implemented from scratch using NumPy.

Before touching any framework, I sat down with the mathematics and built everything from scratch. Each algorithm comes with a full derivation document, visual comparisons against sklearn, and benchmarks proving identical outputs.

algorithms   β†’   Linear Regression (OLS + gradient descent + Ridge + Lasso)
                 Logistic Regression (binary + multiclass + regularized)
                 K-Nearest Neighbors (classification + regression)
                 K-Means Clustering (elbow method + silhouette analysis)
                 Naive Bayes (Gaussian + Multinomial + Bernoulli)
                 Decision Trees (CART + pruning)
                 Random Forests (bagging + feature importance)
                 Support Vector Machines (linear + kernel)
                 Principal Component Analysis
                 Gradient Boosting

testing      β†’   100% unit tested against sklearn β€” outputs match within numerical tolerance
docs         β†’   every algorithm: derivation β†’ intuition β†’ code β†’ result

This repo exists to prove one thing: I understand the math, not just the API.
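That derive-implement-verify pattern in miniature β€” a toy sketch (not the repo's code) that keeps the check self-contained by deriving OLS two independent ways and confirming they agree:

```python
import numpy as np

def ols_closed_form(X, y):
    """Normal equations: solve (X^T X) w = X^T y, from setting the MSE gradient to zero."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ols_gradient_descent(X, y, lr=0.1, steps=5000):
    """Minimize mean squared error by stepping along the negative gradient."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        w -= lr * (2 / n) * X.T @ (X @ w - y)
    return w

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])   # intercept + one feature
y = 3.0 + 2.0 * X[:, 1] + 0.01 * rng.normal(size=50)      # true weights (3, 2) + noise

w_exact = ols_closed_form(X, y)
w_gd = ols_gradient_descent(X, y)
assert np.allclose(w_exact, w_gd, atol=1e-6)   # two derivations, one answer
```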

Python NumPy Matplotlib Math-first Unit tested



// tech arsenal

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  CORE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Python Β· PyTorch Β· NumPy Β· HuggingFace Transformers
  LangChain Β· LangGraph Β· FastAPI Β· Pydantic

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  LLM ENGINEERING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Fine-tuning (LoRA Β· QLoRA) Β· RLHF Β· RAG Pipelines
  Prompt Engineering Β· LLM-as-judge Β· Speculative Decoding
  FlashAttention Β· RoPE Β· KV Cache Β· FP8 Quantization Β· BPE

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  EVALUATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  RAGAS Β· DeepEval Β· Custom Metrics Β· Langfuse Β· Prometheus

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  VECTOR SEARCH
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  FAISS Β· Qdrant Β· Pinecone Β· Weaviate
  Dense + Sparse + Hybrid Retrieval Β· Cross-encoder Reranking

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  MLOPS & INFRA
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Docker Β· Kubernetes Β· Helm Β· GitHub Actions Β· Terraform
  Prometheus Β· Grafana Β· OpenTelemetry Β· Locust

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  CLOUD
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  AWS β€” EC2 Β· S3 Β· Lambda Β· SageMaker Β· EKS

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  DATA
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  PostgreSQL Β· MongoDB Β· Redis Β· Celery Β· Kafka

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  MATHEMATICS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Calculus Β· Linear Algebra Β· Probability Theory Β· Statistics

// how I build

  Every repo I ship clears five gates before merge:

  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚                                                     β”‚
  β”‚  01  WHY DOES THIS WORK?                            β”‚
  β”‚      Mathematical derivation lives in docs/         β”‚
  β”‚      No black boxes. No "trust the framework."      β”‚
  β”‚                                                     β”‚
  β”‚  02  HOW DOES IT WORK?                              β”‚
  β”‚      Scratch implementation before any library      β”‚
  β”‚      If I can't write it in NumPy, I don't use it  β”‚
  β”‚                                                     β”‚
  β”‚  03  DOES IT ACTUALLY WORK?                         β”‚
  β”‚      Benchmarks with real numbers, not vibes        β”‚
  β”‚      Tested against reference implementations       β”‚
  β”‚                                                     β”‚
  β”‚  04  WHAT BROKE?                                    β”‚
  β”‚      Post-mortems documented in docs/failures.md    β”‚
  β”‚      Failures are first-class content, not hidden   β”‚
  β”‚                                                     β”‚
  β”‚  05  CAN IT HANDLE PRODUCTION?                      β”‚
  β”‚      Failure modes mapped. Fallbacks implemented.   β”‚
  β”‚      Load tested. Circuit breakers in place.        β”‚
  β”‚                                                     β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

// open source

I contribute to the ecosystem, not just consume it. Every repo I build is designed to be forked, extended, and built on β€” with derivations others can follow, benchmarks others can reproduce, and post-mortems others can learn from.

Actively looking to contribute to:

  HuggingFace Transformers   β†’   evaluation, documentation, reproducibility
  RAGAS                      β†’   custom metrics, edge case coverage
  DeepEval                   β†’   metric implementations, CI integrations
  vLLM                       β†’   inference optimization experiments
  LangGraph                  β†’   production patterns, reliability improvements

// in the lab πŸ”

Some projects live in private repos; some are in closed beta, being stress-tested with real users before the world sees them.

╔═══════════════════════════════════════════════════════════╗
β•‘  🎯  CURRENT FOCUS: PRODUCTION-GRADE AI PLATFORM          β•‘
β•‘                                                           β•‘
β•‘  β€’ Multi-tenant architecture with usage metering          β•‘
β•‘  β€’ Real-user feedback loops driving model iteration       β•‘
β•‘  β€’ End-to-end observability: logs, traces, metrics        β•‘
β•‘  β€’ Auth, billing, and rate-limiting baked in from day 1   β•‘
β•‘                                                           β•‘
β•‘  Status: 🚧 Private beta Β· Invite-only Β· Real traffic    β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•
πŸ” Β  Research Prototypes (not public yet)
πŸ§ͺ multimodal-data-interpreter
   β”œβ”€ PDF + Excel + images + audio β†’ unified query interface
   β”œβ”€ Natural language β†’ SQL / Python / charts
   β”œβ”€ Auto-dashboard generation with live data refresh
   └─ Scalable backend: DuckDB/Spark for >RAM datasets

πŸ§ͺ autonomous-code-reviewer
   β”œβ”€ Agentic PR analysis: bugs, perf, security, style
   β”œβ”€ Test generation + sandboxed execution
   β”œβ”€ Human-in-loop approval gates (reuse production patterns)
   └─ GitHub API integration + CI/CD hooks

πŸ§ͺ real-time-meeting-copilot
   β”œβ”€ Live transcription + action item extraction
   β”œβ”€ Post-meeting RAG: "What did John say about the deadline?"
   └─ Privacy-first: local inference + on-prem LLM fallback

These are research prototypes. If they survive benchmarking, hardening, and real-user testing β€” they'll graduate to production repos.



━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  AMMAN HUSSAIN ANSARI
  AI Engineer  Β·  MLOps  Β·  Open Source Contributor
  India  Β·  Remote  Β·  Globally Available
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Email Β  LinkedIn Β  Twitter

