β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–‘β–ˆβ–ˆβ–ˆβ•—β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ•—β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ•—β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–‘β–ˆβ–ˆβ–ˆβ•—β–‘β–‘β–ˆβ–ˆβ•—
β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ•—β–‘β–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ•—β–‘β–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ•—β–‘β–ˆβ–ˆβ•‘
β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β–ˆβ–ˆβ–ˆβ–ˆβ•”β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β–ˆβ–ˆβ–ˆβ–ˆβ•”β–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘
β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ–ˆβ–ˆβ•‘
β–ˆβ–ˆβ•‘β–‘β–‘β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β–‘β•šβ•β•β–‘β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β–‘β•šβ•β•β–‘β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β–‘β–‘β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β–‘β•šβ–ˆβ–ˆβ–ˆβ•‘
β•šβ•β•β–‘β–‘β•šβ•β•β•šβ•β•β–‘β–‘β–‘β–‘β•šβ•β•β•šβ•β•β–‘β–‘β–‘β–‘β•šβ•β•β•šβ•β•β–‘β–‘β•šβ•β•β•šβ•β•β–‘β–‘β•šβ•β•β•

LinkedIn Twitter Email GitHub


Β  Β 


// who I am

I don't follow tutorials. I derive equations.
I don't ship demos. I ship systems that survive production.
I don't guess. I benchmark, instrument, and iterate.

I'm an AI engineer who builds from first principles β†’ scratch implementation β†’ hardened deployment. Every system I create is mathematically grounded, rigorously tested, and engineered for failure resilience.

class Amman:
    stack      = ["LLMs", "RAG", "MLOps", "Transformers", "Evaluation"]
    languages  = ["Python", "SQL", "Bash"]
    approach   = "derive β†’ implement from scratch β†’ harden β†’ benchmark β†’ ship"
    based_in   = "India"
    available  = "remote, globally"
    building   = True  # always

// what I've shipped


╔═══════════════════════════════════════════════════════════╗
β•‘         πŸš€  PRODUCTION SYSTEMS  Β·  ACTIVE  Β·  PUBLIC      β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

⚑   fast-gpt-lab  ·  active

GPT architecture implemented twice β€” once for clarity, once for performance. BPE tokenizer from scratch. Benchmarked against nanoGPT.

Bridges the gap between theoretical deep learning and hardware-level optimization. Two complete implementations in one repo:

legacy/        β†’  clean, annotated, readable
                  every operation mapped to the Attention Is All You Need paper
                  for understanding the architecture deeply

optimized/     β†’  FlashAttention v2
                  Rotary Position Embeddings (RoPE)
                  PagedAttention-style KV cache
                  SwiGLU MLP
                  torch.compile
                  FP8 quantization stubs
                  FSDP distributed training wrapper

BPE tokenizer built from scratch β€” merge rules, vocabulary, encode/decode β€” before touching HuggingFace tokenizers.
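The core of BPE training fits in a few lines β€” a toy sketch of the merge-learning loop (illustrative only, not the repo's actual implementation, which also handles vocabulary construction and encode/decode):

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Learn BPE merge rules: repeatedly merge the most frequent adjacent pair."""
    vocab = Counter(tuple(w) for w in words)   # symbol-tuple word -> corpus frequency
    merges = []
    for _ in range(num_merges):
        # count adjacent symbol pairs across the whole corpus
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # rewrite every word with the chosen pair fused into one symbol
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

bpe_train(["low", "lower", "lowest", "low"], num_merges=3)
# -> [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```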

benchmarks vs nanoGPT:

  perplexity   β†’  WikiText-2, measured at every checkpoint
  throughput   β†’  tokens/sec at batch sizes 1, 8, 32, 128
  memory       β†’  peak GPU memory per optimization added
  compilation  β†’  torch.compile speedup measured independently
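The perplexity number in those benchmarks is just the exponential of the mean per-token negative log-likelihood β€” a minimal sketch of the metric:

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# a model assigning probability 1/4 to every token has perplexity 4
perplexity([math.log(4)] * 10)   # -> 4.0 (up to float rounding)
```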

PyTorch CUDA FlashAttention FSDP FP8 torch.compile RoPE BPE



πŸ”₯ Β  cost-aware-llm Β Β·Β  active

A high-performance LLM Gateway that dynamically routes requests across multiple providers using cost, latency, and reliability signals.

The problem: calling a single provider directly means paying full price on cacheable queries, accepting whatever latency that provider delivers, and going down whenever it does. This gateway addresses all three β€” semantic caching, signal-based routing, and circuit-broken fallbacks.

# Route by strategy β€” gateway picks the optimal provider automatically
response = gateway.complete(prompt, strategy="cost")    # β†’ cheapest model available
response = gateway.complete(prompt, strategy="speed")   # β†’ lowest p99 latency
response = gateway.complete(prompt, strategy="safe")    # β†’ circuit-broken fallback chain

# Semantic cache β€” similar queries return cached response
# "what is gradient descent?" and "explain gradient descent" β†’ same cache hit
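The caching idea in a toy sketch β€” `SemanticCache` and `toy_embed` are hypothetical names for illustration; a real deployment would use a sentence-embedding model and an ANN index rather than a brute-force scan:

```python
import numpy as np

class SemanticCache:
    """Toy semantic cache: return a stored response when a new query's
    embedding is close enough (cosine similarity) to a cached one."""

    def __init__(self, embed_fn, threshold=0.9):
        self.embed = embed_fn            # any text -> vector function
        self.threshold = threshold
        self.keys, self.values = [], []

    def get(self, query):
        if not self.keys:
            return None
        q = self.embed(query)
        norm = np.linalg.norm(q)
        if norm == 0:
            return None                  # query shares nothing with the vocab
        q = q / norm
        sims = np.stack(self.keys) @ q   # cosine similarity to every cached key
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, query, response):
        v = self.embed(query)
        self.keys.append(v / np.linalg.norm(v))
        self.values.append(response)

def toy_embed(text):
    # hypothetical stand-in for a sentence-embedding model:
    # an indicator vector over a tiny fixed vocabulary
    vocab = ["what", "is", "explain", "gradient", "descent"]
    words = text.lower().replace("?", "").split()
    return np.array([float(w in words) for w in vocab])

cache = SemanticCache(toy_embed, threshold=0.5)
cache.put("what is gradient descent?", "cached answer")
hit = cache.get("explain gradient descent")    # similar enough -> cache hit
miss = cache.get("how do transformers work")   # unrelated -> None
```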

Routing architecture:

Incoming Request
      β”‚
      β–Ό
  Auth + Rate Limit (token bucket per API key)
      β”‚
      β–Ό
  Semantic Cache ──── HIT ──────────────────▢ Return cached response
      β”‚
     MISS
      β”‚
      β–Ό
  Router (cost / speed / safe signal scoring)
      β”‚
      β”œβ”€β”€β–Ά  OpenAI
      β”œβ”€β”€β–Ά  Anthropic
      β”œβ”€β”€β–Ά  Together AI
      └──▢  Local vLLM
      β”‚
      β–Ό
  Circuit Breaker ──── OPEN ──▢ Fallback chain
      β”‚
     CLOSED
      β”‚
      β–Ό
  Response + Prometheus metrics + OpenTelemetry traces
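The strategy scoring step above can be sketched as follows β€” the provider numbers are made up for illustration; a real gateway would derive them from live pricing tables and rolling latency/error windows:

```python
# hypothetical provider stats β€” illustrative values only
PROVIDERS = {
    "openai":     {"cost_per_1k": 0.010, "p99_ms": 900,  "error_rate": 0.01},
    "anthropic":  {"cost_per_1k": 0.008, "p99_ms": 1100, "error_rate": 0.02},
    "together":   {"cost_per_1k": 0.002, "p99_ms": 1400, "error_rate": 0.05},
    "local_vllm": {"cost_per_1k": 0.000, "p99_ms": 2500, "error_rate": 0.10},
}

# each strategy minimizes one signal
SIGNALS = {"cost": "cost_per_1k", "speed": "p99_ms", "safe": "error_rate"}

def pick_provider(strategy, providers=PROVIDERS):
    """Pick the provider that minimizes the signal the strategy cares about."""
    signal = SIGNALS[strategy]   # KeyError on unknown strategy
    return min(providers, key=lambda name: providers[name][signal])

pick_provider("cost")    # -> "local_vllm" (free local inference)
pick_provider("speed")   # -> "openai"     (lowest p99)
pick_provider("safe")    # -> "openai"     (lowest error rate)
```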

Chaos engineering included β€” a test suite that randomly kills providers mid-run, verifies circuit breakers open, and confirms fallback activates within SLA.
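A minimal circuit breaker of the kind those chaos tests exercise β€” a toy sketch, not the gateway's actual code; the injectable `clock` is there so the open/half-open transition can be tested deterministically:

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: opens after `max_failures` consecutive failures,
    fails fast while open, and half-opens after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock            # injectable for deterministic tests
        self.failures = 0
        self.opened_at = None

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after:
            return "half-open"        # let one probe call through
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # any success closes the circuit and clears the failure count
        self.failures = 0
        self.opened_at = None
        return result
```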

FastAPI Redis OpenTelemetry Grafana Locust Terraform Kubernetes Multi-tenant



πŸ€– Β  agentic-ai-production-system Β Β·Β  active

A multi-agent orchestration system built for production β€” LLMs, tool-use, and workflow orchestration for autonomous reasoning and execution.

Most "agentic AI" projects are chains wrapped in Streamlit. This is different β€” a full production system with instrumentation, safety gates, evaluation, and a feedback loop that fine-tunes the model on real user interactions.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                                                         β”‚
β”‚   Request  ──▢  FastAPI  ──▢  LangGraph Orchestrator   β”‚
β”‚                                    β”‚                    β”‚
β”‚                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚                    β–Ό               β–Ό               β–Ό   β”‚
β”‚                 Planner        Executor        Reflectorβ”‚
β”‚                    β”‚               β”‚               β”‚   β”‚
β”‚                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                    β”‚                    β”‚
β”‚              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚              β–Ό                     β–Ό              β–Ό    β”‚
β”‚         RAG Pipeline         Tool Sandbox    Safety    β”‚
β”‚         (hybrid search)      (Docker)       Guards    β”‚
β”‚              β”‚                                    β”‚    β”‚
β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚                                    β”‚                    β”‚
β”‚        Prometheus ── Langfuse ── Audit Logs (S3)       β”‚
β”‚                                    β”‚                    β”‚
β”‚                        Human Approval Gate             β”‚
β”‚                                    β”‚                    β”‚
β”‚                     LoRA Fine-tuning on Feedback       β”‚
β”‚                                                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  • βœ… Circuit breakers on every external call β€” no silent failures
  • βœ… PII scrubbing before any data touches the LLM
  • βœ… RAGAS evaluation runs on every PR β€” merge blocked on faithfulness regression
  • βœ… Human-in-the-loop approval gate before irreversible tool actions
  • βœ… Every interaction logged to S3 for compliance and replay
  • βœ… LoRA fine-tuning loop trained on collected thumbs-up/down feedback

LangGraph FastAPI Qdrant Docker Kubernetes RAGAS Prometheus Langfuse LoRA Redis



πŸ“ Β  pure-ml Β Β·Β  completed

Mathematical Foundations β†’ Algorithms β†’ Neural Networks β†’ Research Engineering. Machine Learning implemented from scratch using NumPy.

Before touching any framework, I sat down with the mathematics and built everything from scratch. Each algorithm comes with a full derivation document, visual comparisons against sklearn, and benchmarks proving identical outputs.

algorithms   β†’   Linear Regression (OLS + gradient descent + Ridge + Lasso)
                 Logistic Regression (binary + multiclass + regularized)
                 K-Nearest Neighbors (classification + regression)
                 K-Means Clustering (elbow method + silhouette analysis)
                 Naive Bayes (Gaussian + Multinomial + Bernoulli)
                 Decision Trees (CART + pruning)
                 Random Forests (bagging + feature importance)
                 Support Vector Machines (linear + kernel)
                 Principal Component Analysis
                 Gradient Boosting

testing      β†’   100% unit tested against sklearn β€” outputs match within numerical tolerance
docs         β†’   every algorithm: derivation β†’ intuition β†’ code β†’ result

This repo exists to prove one thing: I understand the math, not just the API.
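That derive-implement-verify pattern in miniature β€” a toy sketch (not the repo's code) that keeps the check self-contained by deriving OLS two independent ways and confirming they agree:

```python
import numpy as np

def ols_closed_form(X, y):
    """Normal equations: solve (X^T X) w = X^T y, from setting the MSE gradient to zero."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ols_gradient_descent(X, y, lr=0.1, steps=5000):
    """Minimize mean squared error by stepping along the negative gradient."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        w -= lr * (2 / n) * X.T @ (X @ w - y)
    return w

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])   # intercept + one feature
y = 3.0 + 2.0 * X[:, 1] + 0.01 * rng.normal(size=50)      # true weights (3, 2) + noise

w_exact = ols_closed_form(X, y)
w_gd = ols_gradient_descent(X, y)
assert np.allclose(w_exact, w_gd, atol=1e-6)   # two derivations, one answer
```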

Python NumPy Matplotlib Math-first Unit tested



// tech arsenal

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  CORE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Python Β· PyTorch Β· NumPy Β· HuggingFace Transformers
  LangChain Β· LangGraph Β· FastAPI Β· Pydantic

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  LLM ENGINEERING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Fine-tuning (LoRA Β· QLoRA) Β· RLHF Β· RAG Pipelines
  Prompt Engineering Β· LLM-as-judge Β· Speculative Decoding
  FlashAttention Β· RoPE Β· KV Cache Β· FP8 Quantization Β· BPE

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  EVALUATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  RAGAS Β· DeepEval Β· Custom Metrics Β· Langfuse Β· Prometheus

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  VECTOR SEARCH
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  FAISS Β· Qdrant Β· Pinecone Β· Weaviate
  Dense + Sparse + Hybrid Retrieval Β· Cross-encoder Reranking

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  MLOPS & INFRA
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Docker Β· Kubernetes Β· Helm Β· GitHub Actions Β· Terraform
  Prometheus Β· Grafana Β· OpenTelemetry Β· Locust

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  CLOUD
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  AWS β€” EC2 Β· S3 Β· Lambda Β· SageMaker Β· EKS

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  DATA
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  PostgreSQL Β· MongoDB Β· Redis Β· Celery Β· Kafka

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  MATHEMATICS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Calculus Β· Linear Algebra Β· Probability Theory Β· Statistics

// how I build

  Every repo I ship clears five gates before merge:

  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚                                                     β”‚
  β”‚  01  WHY DOES THIS WORK?                            β”‚
  β”‚      Mathematical derivation lives in docs/         β”‚
  β”‚      No black boxes. No "trust the framework."      β”‚
  β”‚                                                     β”‚
  β”‚  02  HOW DOES IT WORK?                              β”‚
  β”‚      Scratch implementation before any library      β”‚
  β”‚      If I can't write it in NumPy, I don't use it  β”‚
  β”‚                                                     β”‚
  β”‚  03  DOES IT ACTUALLY WORK?                         β”‚
  β”‚      Benchmarks with real numbers, not vibes        β”‚
  β”‚      Tested against reference implementations       β”‚
  β”‚                                                     β”‚
  β”‚  04  WHAT BROKE?                                    β”‚
  β”‚      Post-mortems documented in docs/failures.md    β”‚
  β”‚      Failures are first-class content, not hidden   β”‚
  β”‚                                                     β”‚
  β”‚  05  CAN IT HANDLE PRODUCTION?                      β”‚
  β”‚      Failure modes mapped. Fallbacks implemented.   β”‚
  β”‚      Load tested. Circuit breakers in place.        β”‚
  β”‚                                                     β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

// open source

I contribute to the ecosystem, not just consume it. Every repo I build is designed to be forked, extended, and built on β€” with derivations others can follow, benchmarks others can reproduce, and post-mortems others can learn from.

Actively looking to contribute to:

  HuggingFace Transformers   β†’   evaluation, documentation, reproducibility
  RAGAS                      β†’   custom metrics, edge case coverage
  DeepEval                   β†’   metric implementations, CI integrations
  vLLM                       β†’   inference optimization experiments
  LangGraph                  β†’   production patterns, reliability improvements

// in the lab πŸ”

Some projects live in private repos; some are in closed beta, being stress-tested with real users before the world sees them.

╔═══════════════════════════════════════════════════════════╗
β•‘  🎯  CURRENT FOCUS: PRODUCTION-GRADE AI PLATFORM          β•‘
β•‘                                                           β•‘
β•‘  β€’ Multi-tenant architecture with usage metering          β•‘
β•‘  β€’ Real-user feedback loops driving model iteration       β•‘
β•‘  β€’ End-to-end observability: logs, traces, metrics        β•‘
β•‘  β€’ Auth, billing, and rate-limiting baked in from day 1   β•‘
β•‘                                                           β•‘
β•‘  Status: 🚧 Private beta Β· Invite-only Β· Real traffic    β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•
πŸ” Β  Research Prototypes (not public yet)
πŸ§ͺ multimodal-data-interpreter
   β”œβ”€ PDF + Excel + images + audio β†’ unified query interface
   β”œβ”€ Natural language β†’ SQL / Python / charts
   β”œβ”€ Auto-dashboard generation with live data refresh
   └─ Scalable backend: DuckDB/Spark for >RAM datasets

πŸ§ͺ autonomous-code-reviewer
   β”œβ”€ Agentic PR analysis: bugs, perf, security, style
   β”œβ”€ Test generation + sandboxed execution
   β”œβ”€ Human-in-loop approval gates (reuse production patterns)
   └─ GitHub API integration + CI/CD hooks

πŸ§ͺ real-time-meeting-copilot
   β”œβ”€ Live transcription + action item extraction
   β”œβ”€ Post-meeting RAG: "What did John say about the deadline?"
   └─ Privacy-first: local inference + on-prem LLM fallback

These are research prototypes. If they survive benchmarking, hardening, and real-user testing β€” they'll graduate to production repos.



━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  AMMAN HUSSAIN ANSARI
  AI Engineer  Β·  MLOps  Β·  Open Source Contributor
  India  Β·  Remote  Β·  Globally Available
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Email Β  LinkedIn Β  Twitter

