Shiqing Fan fanshiqing

🎯 Focusing

NVIDIA, Senior Performance Architect, Full-stack LLM Training Optimization.

135 followers · 51 following

NVIDIA
Hangzhou, Zhejiang
https://fanshiqing.github.io/

Pinned

Megatron-LM (Public)

Forked from NVIDIA/Megatron-LM

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 1

DAPPLE (Public)

Forked from AlibabaPAI/DAPPLE

An Efficient Pipelined Data Parallel Approach for Training Large Models

Python 3

grouped_gemm (Public)

Forked from tgale96/grouped_gemm

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 113 34

TransformerEngine (Public)

Forked from NVIDIA/TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python