
Causal mask ignored in DotProductAttention #1524

Open
anthony-Neo opened this issue Feb 28, 2025 · 1 comment


anthony-Neo commented Feb 28, 2025

For transformer_engine version 1.14.0+87fbe812f, in transformer_engine.jax.flax.module, the Softmax class's __call__ method (lines 191-198):

# For the case that self.softmax == SoftmaxType.SCALED_UPPER_TRIANG_MASKED
# and kernel is unavailable, then try on pure scaled softmax custom calls.
if is_softmax_kernel_available(
    SoftmaxType.SCALED, batch, heads, q_seqlen, k_seqlen, dtype
):
    outputs = softmax(logits, None, self.scale_factor, SoftmaxType.SCALED)
else:
    outputs = jax_nn.softmax(logits * self.scale_factor)  # <- self.softmax_type ignored

In the else branch, self.softmax_type is ignored, so no causal masking is performed when, for example, self.softmax_type == SoftmaxType.SCALED_UPPER_TRIANG_MASKED.
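
A minimal sketch of how the fallback branch could honor the causal case, assuming logits has shape (..., q_seqlen, k_seqlen), that jax.numpy is available as jnp in the module, and that masking with a large negative value before the softmax is acceptable (this is an illustration, not an upstream patch):

import jax.numpy as jnp
from jax import nn as jax_nn

if self.softmax_type == SoftmaxType.SCALED_UPPER_TRIANG_MASKED:
    # Lower-triangular boolean mask: True where a query position may
    # attend, i.e. keys at or before that position.
    q_seqlen, k_seqlen = logits.shape[-2], logits.shape[-1]
    causal_mask = jnp.tril(jnp.ones((q_seqlen, k_seqlen), dtype=bool))
    # Push masked-out logits to a large negative value so their
    # softmax weight is effectively zero.
    masked_logits = jnp.where(causal_mask, logits * self.scale_factor, -1e10)
    outputs = jax_nn.softmax(masked_logits)
else:
    outputs = jax_nn.softmax(logits * self.scale_factor)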

philip-essential commented:

I just ran into this as well. At least for unfused attention with no softmax kernel available (the only configuration I've tried), it doesn't apply any kind of causal mask.
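
As a quick pure-JAX illustration of the symptom (not the library's API), the unmasked fallback gives nonzero weights to future key positions that a causal mask should zero out:

import jax.numpy as jnp
from jax import nn as jax_nn

# (batch, heads, q_seqlen, k_seqlen) logits, all zeros for clarity.
logits = jnp.zeros((1, 1, 4, 4))
weights = jax_nn.softmax(logits * 1.0)  # fallback path: no mask applied
# Every row is uniform (0.25), including upper-triangular "future"
# positions that causal attention should never attend to.
print(weights[0, 0])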

phu0ngng added the good first issue (Good for newcomers) label on Mar 4, 2025