[MoE][PoC] model code #730
base: gh/tianyu-l/24/base
Conversation
[ghstack-poisoned]
            multiple_of=model_args.multiple_of,
            ffn_dim_multiplier=model_args.ffn_dim_multiplier,
        )
        self.enable_moe = model_args.enable_moe
Minor nit: I think this is expressed better as self.is_dense_model and self.is_moe_model. 'Enable' is not the same thing as 'is', so it's cleaner to express the state of being via 'is'.
In addition, the later checks read more cleanly as self.is_dense_model / is_dense_layer than as 'not self.enable_moe', which may not really mean "dense model/layer" later on if we start enabling other architectures.
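As a concrete illustration, a minimal sketch of how the constructor could expose the proposed flags. The is_dense_model / is_moe_model names are the reviewer's suggestion, not attributes that exist in the PR; model_args.enable_moe is the flag the PR adds.

    import torch.nn as nn


    class TransformerBlock(nn.Module):
        # Toy fragment, not the PR's actual class; only the flag handling is shown.
        def __init__(self, model_args):
            super().__init__()
            # Derive explicit "is_*" attributes from the raw enable flag so that
            # later checks read as statements about what the block is.
            self.is_moe_model = model_args.enable_moe
            self.is_dense_model = not model_args.enable_moe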
@@ -321,14 +371,20 @@ def forward(
        """
        h = x + self.attention(self.attention_norm(x), freqs_cis)
        out = h + self.feed_forward(self.ffn_norm(h))
        if not self.enable_moe:
W.r.t. the above, this is what I mean about implying it's dense without really making it clear: is_dense_model is more precise.
I modified it to:

    if self.is_dense_model:
        out = h + self.feed_forward(self.ffn_norm(h))
    elif self.is_moe_model:
        out = h + self.moe(self.ffn_norm(h))
    else:
        raise NotImplementedError("unknown model type")

in my copy of your PR, mostly to show how it reads with the is_blah_model approach.
        return out

    def init_weights(self):
        for norm in (self.attention_norm, self.ffn_norm):
            norm.reset_parameters()
        self.attention.init_weights(self.weight_init_std)
        self.feed_forward.init_weights(self.weight_init_std)
        if not self.enable_moe:
same as above
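Spelling that out, a sketch of how the init_weights check could read with the suggested flag. This is illustrative only: whether the PR guards feed_forward.init_weights or adds a separate moe.init_weights call is not visible in the truncated diff above, so the moe branch here is an assumption.

    def init_weights(self):
        # Sketch of the reviewer's naming suggestion applied to init_weights.
        # Everything except is_moe_model and the assumed moe.init_weights call
        # comes from the diff above.
        for norm in (self.attention_norm, self.ffn_norm):
            norm.reset_parameters()
        self.attention.init_weights(self.weight_init_std)
        if self.is_moe_model:
            self.moe.init_weights(self.weight_init_std)  # assumed MoE counterpart
        else:
            self.feed_forward.init_weights(self.weight_init_std)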
Stack from ghstack (oldest at bottom):
The expert-choice MoE layer is inspired by torchtune: pytorch/torchtune#1902
Not including token-choice MoE for now.
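For context on what "expert choice" means here: instead of each token picking its top experts (token choice), each expert picks a fixed number of tokens from the router scores, which balances expert load by construction. Below is a minimal, self-contained sketch of that routing pattern in PyTorch; it is not the PR's or torchtune's actual implementation, and all names (ExpertChoiceMoE, capacity_factor, w1/w2, etc.) are illustrative assumptions.

    import torch
    import torch.nn.functional as F
    from torch import nn


    class ExpertChoiceMoE(nn.Module):
        """Illustrative expert-choice MoE layer; names and shapes are assumptions."""

        def __init__(self, dim: int, hidden_dim: int, num_experts: int,
                     capacity_factor: float = 1.0):
            super().__init__()
            self.num_experts = num_experts
            self.capacity_factor = capacity_factor
            self.router = nn.Linear(dim, num_experts, bias=False)
            # Batched expert weights: one two-layer FFN per expert.
            self.w1 = nn.Parameter(torch.empty(num_experts, dim, hidden_dim))
            self.w2 = nn.Parameter(torch.empty(num_experts, hidden_dim, dim))
            nn.init.trunc_normal_(self.w1, std=0.02)
            nn.init.trunc_normal_(self.w2, std=0.02)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            bsz, seqlen, dim = x.shape
            tokens = x.reshape(-1, dim)                        # (T, dim)
            scores = F.softmax(self.router(tokens), dim=-1)    # (T, E) token-expert affinities
            # Each expert selects its own top-`capacity` tokens (expert choice),
            # so per-expert load is balanced by construction.
            capacity = max(1, int(self.capacity_factor * tokens.shape[0] / self.num_experts))
            expert_weights, token_idx = scores.t().topk(capacity, dim=-1)  # both (E, C)
            selected = tokens[token_idx]                        # (E, C, dim)
            hidden = F.silu(torch.bmm(selected, self.w1))       # (E, C, hidden_dim)
            expert_out = torch.bmm(hidden, self.w2)             # (E, C, dim)
            # Scatter expert outputs back to token positions, weighted by affinity.
            out = torch.zeros_like(tokens)
            out.index_add_(
                0,
                token_idx.reshape(-1),
                (expert_weights.unsqueeze(-1) * expert_out).reshape(-1, dim),
            )
            return out.reshape(bsz, seqlen, dim)

Because every expert takes a fixed number of tokens, there is no need for an auxiliary load-balancing loss; the trade-off is that some tokens may be selected by no expert and then pass through only via the residual connection.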