ML Engineer focused on production GenAI, LLM applications, and MLOps.
I build reliable AI systems that move from experimentation to real-world deployment, with an emphasis on retrieval quality, agent reliability, and observable ML infrastructure.
- Build and deploy end-to-end LLM systems: ingestion, retrieval, orchestration, serving, monitoring
- Design RAG pipelines with hybrid retrieval, reranking, and evaluation-driven optimization
- Develop agentic workflows with tool use, memory, planning, and guardrails
- Productionize ML with CI/CD, containerized deployment, and cloud-native operations
- Apply strong engineering standards: testing, observability, reproducibility, and security
- Prompt engineering, model adaptation, and response-quality optimization
- RAG architecture design with chunking, indexing, reranking, and query routing
- Agent frameworks for multi-step task execution and tool orchestration
- LLM evaluation pipelines for quality, latency, and cost trade-offs
- Multi-modal pipelines for text, image, and audio use cases
- NLP: Transformers, embedding systems, semantic search, information extraction
- Computer vision: detection, segmentation, and classification workflows
- Model lifecycle: training, validation, packaging, and deployment
- Inference optimization: batching, quantization, caching, and distillation
- Containerized deployment with autoscaling and service reliability practices
- CI/CD for ML and backend systems with automated checks and rollout safety
- Observability: logs, metrics, traces, and model/data drift monitoring
- Infrastructure as code with reproducible environments and versioned pipelines
- Data and experiment management for auditable model development
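One concrete form the drift monitoring above can take is the Population Stability Index (PSI) between a training-time feature distribution and a live one. A minimal sketch, assuming both inputs are already per-bin proportions that each sum to 1:

```python
import math

def population_stability_index(expected: list[float], observed: list[float]) -> float:
    """PSI over matching histogram-bin proportions.

    Values above ~0.2 are commonly treated as meaningful drift
    (a rule of thumb, not a hard threshold).
    """
    psi = 0.0
    for e, o in zip(expected, observed):
        e = max(e, 1e-6)  # clamp to avoid log(0) on empty bins
        o = max(o, 1e-6)
        psi += (o - e) * math.log(o / e)
    return psi
```

Identical distributions score 0; the further live traffic shifts from the training baseline, the higher the index, which makes it an easy metric to emit and alert on.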
- Multi-agent orchestration with dynamic tool selection and task planning
- Context-aware agents with conversation memory and stateful workflows
- Tool-augmented LLM systems for automation, analytics, and developer productivity
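The tool-orchestration idea above reduces to a registry plus a guarded dispatcher. A toy sketch with hypothetical tools (a real agent framework would pass tool schemas to the model and parse its tool-call output; the registry-as-allowlist guardrail is the part that carries over):

```python
from typing import Callable

# Hypothetical tool registry mapping tool names to callables.
TOOLS: dict[str, Callable[[str], str]] = {
    # eval with empty builtins keeps this toy calculator to bare arithmetic.
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "echo": lambda text: text,
}

def dispatch(tool_name: str, argument: str) -> str:
    # Guardrail: only registered tools may run, whatever the model asks for.
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](argument)
```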
- End-to-end pipelines for ingestion, indexing, retrieval, reranking, and response synthesis
- Hybrid retrieval stacks combining dense and sparse strategies
- Evaluation loops to improve answer quality, grounding, and hallucination resistance
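A common way to combine the dense and sparse rankings mentioned above is reciprocal rank fusion (RRF). A minimal sketch over lists of document IDs (the `k = 60` smoothing constant is the value commonly used in the RRF literature):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked document-ID lists (e.g. one dense, one sparse) into one.

    Standard RRF: score(d) = sum over lists of 1 / (k + rank of d).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only uses ranks, not raw scores, it sidesteps the score-calibration mismatch between embedding similarity and BM25, which is why it is a popular default before a learned reranker.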
- Real-time inference APIs for latency-sensitive workloads
- Batch prediction pipelines for high-volume offline processing
- Vision and NLP applications deployed as reliable services
- Automated training, validation, and release workflows
- Model and data monitoring with drift and performance tracking
- Scalable serving systems with reliability, rollback, and cost controls
- Reliability: strong testing, robust error handling, graceful degradation
- Scalability: horizontal scaling, caching, efficient resource utilization
- Observability: actionable metrics, structured logging, distributed tracing
- Reproducibility: versioned code/data/models and deterministic workflows
- Security: authentication, rate limiting, and secure secret management
- Cost awareness: right-sized infrastructure and efficient inference patterns
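The rate-limiting principle above is usually implemented as a token bucket. A minimal in-process sketch (production systems would typically back this with a shared store and key it per client):

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter.

    `rate` tokens refill per second up to `capacity`; each request
    spends one token or is rejected.
    """

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The bucket tolerates short bursts up to `capacity` while holding the long-run average to `rate`, which is the usual trade-off wanted in front of a paid inference endpoint.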
- Email: [email protected]
- Email: [email protected]
- LinkedIn: linkedin.com/in/reeth-jain-rj777
Building AI systems that scale from prototype to production.