
Chenchao Zhao

Education

  • Ph.D. University of Illinois at Urbana-Champaign, Theoretical Physics
  • B.S. Beijing Normal University, Physics

Experience

Applied Scientist

Amazon Special Projects

Foundation Models

  • Spearheaded a cross-functional team of scientists and engineers, delivering a comprehensive protein foundation model evaluation framework within one month; orchestrated the flawless execution of thousands of AWS SageMaker jobs, ensuring reliable, on-time delivery of evaluation results on an exceptionally tight schedule

  • Designed and maintained a large-scale LLM training framework built on Megatron-LM, enabling the ML team to train protein language models with up to 30B parameters at MFU > 30%

  • Developed an efficient nd-parallel distributed training framework (DP, TP, FSDP2 with SAC) in native PyTorch and trained a family of protein encoder models where the 1B model achieved MFU > 50% and exceeded the performance of ESM2-3B and ESM2-15B on more than half of BioMap benchmarks

  • Applied the PyTorch training framework to Llama models and performed an ablation study that demonstrated the critical role of protein‑sequence diversity in training data

ML-Guided Protein Engineering

  • Developed a latent-space diffusion model that generated the top-performing molecule among millions of candidates, propelling the compound into clinical trials

  • Invented a protein design optimization method leveraging fitness model gradients and Hamiltonian dynamics that achieved top-ranking performance across the 2024 ML-guided protein engineering cycles, including a 95% fitness gain across all 122 targets, a 331% increase in average target coverage, and statistically significant fitness gains for 100% of targets across three distinct design regions, consistently outperforming baseline and competing optimizers

  • Engineered a design aggregation and budget allocation algorithm that deduplicated submissions across all design algorithms and assigned fair, balanced budgets to each contributor, reducing runtime from 3 days to under 1 hour while correcting previously unbalanced allocations

  • Led the ML-to-lab handoff for ML-guided protein engineering cycles, overseeing reverse translation and codon optimization to ensure smooth, successful transitions from computational design to the experimental build phase

  • Led an internship project culminating in the publication AffinityFlow: Guided Flows for Antibody Affinity Maturation

Senior Deep Learning Research Scientist

Clarifai Inc.

Computer Vision

  • Detection and tracking
  • Image embedding
  • Fine-grained classification
  • Semantic segmentation

Natural Language Processing

DevOps

  • Docker
  • Kubernetes and Kubeflow
  • CI/CD

AI Fellowship

Insight Data Science

  • Satellite image processing
  • Partial convolution inpainting model
  • Repo and project page

Graduate Researcher

University of Illinois at Urbana-Champaign

Heat Kernel

Data Transformation

  • Effective Dissimilarity Transformation
  • Applied to cell line clustering
  • Publication

Spectral Clustering

  • Reinterpreted spectral clustering through the lens of quantum tunneling
  • Publication

Project Highlights

  • Torch Forward and Backward

    • Native PyTorch distributed training toolkit
  • MasonJar

    • Repo-free experiment containerization
    • Build, test, and push Docker images all in Python
    • Reuse experiment images through Python class inheritance
  • Mario

    • Kubeflow is very much like TensorFlow (of course they share the last name)
    • Mario is the Keras or PyTorch of Kubeflow
    • Users can intuitively build pipelines without lengthy and confusing API bureaucracy
  • Machine Learning Blog

Skills

Math

Theoretical Physics | Mathematics | Probability | Statistical Learning

Biology

Protein Engineering | ML-Driven Drug Discovery | Protein Foundation Model

Deep Learning

PyTorch | Distributed Training | Diffusion Model | LLM

Computer Vision

Classification | Detection | Tracking | ReID | Segmentation

NLP

Token Classification | Named Entity Recognition | Transformers

Deployment

Nvidia Triton | Apple CoreML | Google Protobuf

DevOps

Docker | Kubeflow | Kubernetes | CI | CD
