Update README.md
xrennvidia authored Oct 24, 2023
1 parent d13d93e commit e9f4332
Showing 1 changed file with 18 additions and 0 deletions.
18 changes: 18 additions & 0 deletions README.md
@@ -1 +1,19 @@
# collective_matmul

This unit test composes two back-to-back GEMM layers (FC1 and FC2 of an LLM MLP). FC1 does AG+GEMM (all-gather followed by GEMM), and FC2 does GEMM+RS (GEMM followed by reduce-scatter).
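
As a rough illustration of the data flow only, here is a minimal single-process sketch in plain Python. It assumes the input is sharded by rows across TP ranks, FC1's weight is sharded by columns, and FC2's weight is sharded by rows; that sharding layout and all helper names (`all_gather`, `reduce_scatter`, `tp_mlp`, and so on) are hypothetical and not taken from `collective_matmul.py`. It omits any activation between the two layers and does not model overlapping communication with compute.

```python
def matmul(a, b):
    """Plain nested-list GEMM: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def row_shard(m, tp):
    """Split a matrix into tp equal chunks of rows."""
    n = len(m) // tp
    return [m[r * n:(r + 1) * n] for r in range(tp)]

def col_shard(m, tp):
    """Split a matrix into tp equal chunks of columns."""
    n = len(m[0]) // tp
    return [[row[r * n:(r + 1) * n] for row in m] for r in range(tp)]

def all_gather(shards):
    """AG: every rank ends up with the concatenation of all row shards."""
    full = [row for shard in shards for row in shard]
    return [full for _ in shards]

def reduce_scatter(partials):
    """RS: sum the per-rank partial results, then hand rank r its r-th row chunk."""
    tp = len(partials)
    rows, cols = len(partials[0]), len(partials[0][0])
    total = [[sum(p[i][j] for p in partials) for j in range(cols)] for i in range(rows)]
    return row_shard(total, tp)

def tp_mlp(x_shards, w1, w2):
    """FC1 as AG+GEMM, then FC2 as GEMM+RS, simulated over a list of ranks."""
    tp = len(x_shards)
    w1_shards = col_shard(w1, tp)   # assumed: FC1 weight split by columns
    w2_shards = row_shard(w2, tp)   # assumed: FC2 weight split by rows
    gathered = all_gather(x_shards)
    hidden = [matmul(gathered[r], w1_shards[r]) for r in range(tp)]
    partials = [matmul(hidden[r], w2_shards[r]) for r in range(tp)]
    return reduce_scatter(partials)

# Toy sizes chosen for illustration only.
tp = 2
x = [[1, 2], [3, 4], [5, 6], [7, 8]]
w1 = [[1, 0, 2, 1], [0, 1, 1, 2]]
w2 = [[1, 2], [0, 1], [2, 0], [1, 1]]

out_shards = tp_mlp(row_shard(x, tp), w1, w2)
reference = matmul(matmul(x, w1), w2)
gathered_out = [row for shard in out_shards for row in shard]
print(gathered_out == reference)
```

Concatenating the per-rank output shards reproduces the unsharded product of the two GEMMs, which is the property the sharded pipeline has to preserve.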

## Running examples

### 175B config

`python collective_matmul.py --dp 2 --tp 4`

You can change dp (data parallel) and tp (tensor model parallel) by passing different numbers in the command line above.

To run the baseline (i.e., no overlapping), add `--no_tp_overlap` to the command line.

### 5B config

`python collective_matmul.py --batch_size 4 --hidden_size 4096`

DP, TP, and overlapping arguments are configured in the same way as for the 175B config.
