feat(examples): add FSDP #29

tpoisonooo · 2025-04-28T10:07:01Z

Content

Add an FSDP example for the Muon optimizer.

Here are the differences:

Add DistributedSampler

    sampler = DistributedSampler(dataset=train_dataset,
                                 rank=rank,
                                 num_replicas=world_size,
                                 shuffle=True)
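To make the effect of the sampler concrete, here is a minimal CPU-only sketch of how `DistributedSampler` splits a dataset between ranks. The 10-element `TensorDataset`, the `world_size = 2` value, the fixed `seed=0`, and the helper `indices_for` are assumptions made for illustration; they are not taken from this PR. Passing `rank` and `num_replicas` explicitly means no process group has to be initialized.

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Hypothetical stand-in for train_dataset: 10 samples holding the values 0..9.
train_dataset = TensorDataset(torch.arange(10))
world_size = 2  # assumed number of processes (one per GPU)


def indices_for(rank, epoch=0):
    """Return the dataset indices that a given rank draws in one epoch."""
    sampler = DistributedSampler(dataset=train_dataset,
                                 rank=rank,
                                 num_replicas=world_size,
                                 shuffle=True,
                                 seed=0)
    # set_epoch changes the shuffle seed so each epoch gets a new ordering.
    sampler.set_epoch(epoch)
    return list(sampler)


shard0 = indices_for(rank=0)
shard1 = indices_for(rank=1)
print(len(shard0), len(shard1))       # 5 5
print(sorted(shard0 + shard1))        # every index 0..9 appears exactly once
```

In a training loop the sampler is passed to `DataLoader(..., sampler=sampler)` without also setting `shuffle=True` on the loader, and `sampler.set_epoch(epoch)` is called at the start of every epoch so that the ordering changes between epochs.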

Add all-reduce

    dist.all_reduce(loss, op=dist.ReduceOp.AVG)
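The all-reduce averages the per-rank loss so that every process logs the same value. Below is a hedged sketch of a wrapper around that call. The helper name `average_loss_across_ranks` and the SUM-then-divide fallback are assumptions for illustration, not code from this PR. The fallback is there because `ReduceOp.AVG` is only available with the NCCL backend.

```python
import torch
import torch.distributed as dist


def average_loss_across_ranks(loss):
    """Return a detached copy of `loss` averaged over all ranks (for logging)."""
    # Work on a detached copy so the autograd graph of `loss` is left alone.
    avg = loss.detach().clone()
    # Without an initialized process group (plain single-process run) there is
    # nothing to reduce: the local loss already is the average.
    if not (dist.is_available() and dist.is_initialized()):
        return avg
    if dist.get_backend() == "nccl":
        # Same call as in the PR; AVG is supported by the NCCL backend.
        dist.all_reduce(avg, op=dist.ReduceOp.AVG)
    else:
        # Assumed fallback for other backends such as gloo: sum, then divide.
        dist.all_reduce(avg, op=dist.ReduceOp.SUM)
        avg /= dist.get_world_size()
    return avg


local_loss = torch.tensor(2.5, requires_grad=True)
logged = average_loss_across_ranks(local_loss)
print(logged.item())  # 2.5 when run as a single process
```

Reducing a detached copy keeps the collective out of the backward pass, so it only affects what is printed, not the gradients.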

Test

loss:

GPU usage:

I am also an FSDP beginner; if there are any questions or problems, I will try to fix them. #25

tpoisonooo · 2025-04-28T10:08:30Z

cc @toothacher17

tpoisonooo · 2025-04-28T10:10:50Z

Furthermore, toy_train_fsdp.py is compatible with toy_train.py. For a single GPU, just run:

    export CUDA_VISIBLE_DEVICES="0"
    python3 toy_train_fsdp.py

The two scripts then behave the same.
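One common way for a script to work both under a multi-process launcher and as a plain `python3` run is to read the launcher's environment variables with single-process defaults. The sketch below shows that pattern. It is an assumption about how such compatibility can be achieved, not a description of what toy_train_fsdp.py actually does, and the helper name `read_dist_env` is hypothetical. `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` are the variables that torchrun sets.

```python
import os


def read_dist_env(env=None):
    """Read torchrun-style variables, defaulting to a one-process setup."""
    env = os.environ if env is None else env
    rank = int(env.get("RANK", 0))
    world_size = int(env.get("WORLD_SIZE", 1))
    local_rank = int(env.get("LOCAL_RANK", 0))
    return rank, world_size, local_rank


# Plain `python3 script.py`: none of the variables are set.
print(read_dist_env({}))  # (0, 1, 0)
# Launched as the second of two processes.
print(read_dist_env({"RANK": "1", "WORLD_SIZE": "2", "LOCAL_RANK": "1"}))  # (1, 2, 1)
```

With `world_size == 1` the sampler hands the whole dataset to the single rank, which is consistent with the two scripts behaving the same on one GPU.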

tpoisonooo added 5 commits April 27, 2025 21:55

feat(example): add fsdp_main.py

abaa96e

feat(examples): add toy fsdp train

5c1d98f

feat(examples): update

051ccc5

update(examples): update

07930a6

typo(examples): update

cf66dc3
