- Ph.D. University of Illinois at Urbana-Champaign, Theoretical Physics
- B.S. Beijing Normal University, Physics
Amazon Special Projects
- Spearheaded a cross-functional team of scientists and engineers, delivering a comprehensive protein foundation model evaluation framework within one month; orchestrated the flawless execution of thousands of AWS SageMaker jobs, ensuring reliable, on-time delivery of evaluation results on an exceptionally tight schedule
- Designed and maintained a large-scale LLM training framework built on Megatron-LM, enabling the ML team to train protein language models up to 30B parameters, achieving MFU > 30%
- Developed an efficient nd-parallel distributed training framework (DP, TP, FSDP2 with SAC) in native PyTorch and trained a family of protein encoder models where the 1B model achieved MFU > 50% and exceeded the performance of ESM2-3B and ESM2-15B on more than half of BioMap benchmarks
- Applied the PyTorch training framework to Llama models and performed an ablation study that demonstrated the critical role of protein-sequence diversity in training data
- Created a latent-space diffusion model that generated the top-performing molecule among millions of candidates, propelling the compound into clinical trials
- Invented a protein design optimization method leveraging fitness model gradients and Hamiltonian dynamics, achieving top-ranking performance across 2024 ML-guided protein engineering cycles, including 95% fitness gain across all 122 targets, 331% increase in average target coverage, and 100% of targets showing statistically significant fitness gains across three distinct design regions, consistently outperforming baseline and competing optimizers
- Engineered a design aggregation and budget allocation algorithm that deduplicated submissions across all design algorithms and assigned fair, balanced budgets to each contributor, reducing runtime from 3 days to under 1 hour while correcting previously unbalanced allocations
- Led ML-to-lab handoff for ML-guided protein engineering cycles, overseeing reverse translation and codon optimization to ensure smooth, successful transitions from computational design to the experimental build phase
- Led an internship project culminating in the publication AffinityFlow: Guided Flows for Antibody Affinity Maturation
Clarifai Inc.
- Detection and tracking
- Image embedding
- Fine-grained classification
- Semantic segmentation
- Token classification
- Named Entity Recognition (NER)
- Docker
- Kubernetes and Kubeflow
- CI/CD
Insight Data Science
- Satellite image processing
- Partial convolution inpainting model
- Repo and project page
University of Illinois at Urbana-Champaign
- Exact solution of heat equation in a high-dimensional sphere
- Applied to SVM document classification
- Publication and additional proof
- Effective Dissimilarity Transformation
- Applied to cell line clustering
- Publication
- Reinterpreted spectral clustering through the lens of quantum tunneling
- Publication
-
- Native PyTorch distributed training toolkit
-
- Repo-free experiment containerization
- Build, test, and push Docker images all in Python
- Reuse experiment images through Python class inheritance
-
- Kubeflow is very much like TensorFlow (of course they share the last name)
- Mario is the Keras or PyTorch of Kubeflow
- Users can intuitively build pipelines without lengthy and confusing API bureaucracy
-
Machine Learning Blog
Theoretical Physics | Mathematics | Probability | Statistical Learning
Protein Engineering | ML-Driven Drug Discovery | Protein Foundation Model
PyTorch | Distributed Training | Diffusion Model | LLM
Classification | Detection | Tracking | ReID | Segmentation
Token Classification | Named Entity Recognition | Transformers
Nvidia Triton | Apple CoreML | Google Protobuf
Docker | Kubeflow | Kubernetes | CI | CD

