collective_matmul

This unit test composes two back-to-back GEMM layers (FC1 and FC2 of an LLM MLP). FC1 performs an all-gather followed by a GEMM (AG+GEMM), and FC2 performs a GEMM followed by a reduce-scatter (GEMM+RS).
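The data flow of the two layers can be sketched in plain Python, with no devices or real collectives involved. This is only an illustration under assumptions not stated in this README: activations are sharded by rows across the TP ranks, the FC1 weight is split by columns, the FC2 weight is split by rows, and the activation function between the layers is omitted. All function names are hypothetical and are not taken from collective_matmul.py.

```python
# Pure-Python simulation of FC1 (AG+GEMM) followed by FC2 (GEMM+RS).
# Each "rank" is just a list entry; nothing here runs on real devices.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_rows(m, parts):
    """Split a matrix into `parts` equal row blocks."""
    n = len(m) // parts
    return [m[i * n:(i + 1) * n] for i in range(parts)]

def split_cols(m, parts):
    """Split a matrix into `parts` equal column blocks."""
    n = len(m[0]) // parts
    return [[row[i * n:(i + 1) * n] for row in m] for i in range(parts)]

def add(a, b):
    """Elementwise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mlp_tensor_parallel(x, w1, w2, tp):
    x_shards = split_rows(x, tp)    # assumption: activations sharded by rows
    w1_shards = split_cols(w1, tp)  # assumption: FC1 weight split by columns
    w2_shards = split_rows(w2, tp)  # assumption: FC2 weight split by rows

    # FC1 (AG+GEMM): all-gather the row shards, then each rank multiplies
    # the full activation by its own column block of w1.
    x_full = [row for shard in x_shards for row in shard]
    hidden = [matmul(x_full, w1_shards[r]) for r in range(tp)]

    # FC2 (GEMM+RS): each rank produces a partial sum; reduce-scatter adds
    # the partial sums and leaves one row block of the result on each rank.
    partial = [matmul(hidden[r], w2_shards[r]) for r in range(tp)]
    total = partial[0]
    for p in partial[1:]:
        total = add(total, p)
    return split_rows(total, tp)
```

Concatenating the returned row blocks gives the same matrix as the unsharded product of x, w1, and w2, which is the property a test of this kind would check.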

Running examples

python collective_matmul.py --dp 2 --tp 4

You can change dp (Data Parallel) and tp (Tensor Model Parallel) by passing different numbers on the command line above.
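As a rough illustration of how these flags relate to the number of devices, the sketch below parses --dp, --tp, and --no_tp_overlap and multiplies the two parallel sizes. It assumes the script arranges devices in a dp x tp grid, which this README does not state. The parser, its defaults, and the helper name are hypothetical and are not copied from collective_matmul.py.

```python
import argparse

def build_parser():
    # Hypothetical parser; the defaults are placeholders, not the script's real ones.
    parser = argparse.ArgumentParser(description="collective matmul flags (sketch)")
    parser.add_argument("--dp", type=int, default=1, help="data parallel size")
    parser.add_argument("--tp", type=int, default=1, help="tensor model parallel size")
    parser.add_argument("--no_tp_overlap", action="store_true",
                        help="run the baseline without overlap")
    return parser

def required_devices(args):
    # Assumption: one device per (dp, tp) grid position.
    return args.dp * args.tp

args = build_parser().parse_args(["--dp", "2", "--tp", "4"])
print(required_devices(args))  # 8
```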

To run the baseline (i.e., without overlap), add --no_tp_overlap to the command line.
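One common way to overlap an all-gather with a GEMM is to break it into a ring: at each step every rank multiplies the activation chunk it currently holds while that chunk is passed on to a neighbor. The pure-Python simulation below shows only the chunk schedule and checks nothing about timing. It is a sketch of the general technique, not necessarily what ag_matmul implements, and the function names are hypothetical.

```python
# Ring-style decomposition of AG+GEMM, simulated step by step on lists.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def ring_ag_matmul(x_shards, w_shards):
    """Return, for each rank r, the product (all x rows) @ w_shards[r]."""
    tp = len(x_shards)
    rows = len(x_shards[0])
    held = list(x_shards)  # held[r] is the chunk currently on rank r
    out = [[None] * (rows * tp) for _ in range(tp)]
    for step in range(tp):
        for r in range(tp):
            src = (r + step) % tp  # rank that originally owned held[r]
            block = matmul(held[r], w_shards[r])
            # Place the block at the row offset of the chunk's original owner.
            out[r][src * rows:(src + 1) * rows] = block
        # Ring shift: rank r receives the chunk held by rank r + 1.
        # On real hardware this transfer would run concurrently with the GEMM above.
        held = [held[(r + 1) % tp] for r in range(tp)]
    return out
```

After tp steps every rank has seen every chunk, so each rank's output matches what a full all-gather followed by a single GEMM would produce.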

python collective_matmul.py --batch_size 4 --hidden_size 4096

The DP, TP, and overlap arguments are configured in the same way as for 175B.
