
thd format is not supported with hierarchical CP implementation yet #2208

@stormchasingg

Description

Is your feature request related to a problem? Please describe.
Ulysses SP + ring attention gives good performance in SFT/RL training; that combination is what is called hierarchical CP here. However, it currently doesn't support qkv_format 'thd' for sequence packing. Packing sequences is also a way to get good throughput. (A rough repro sketch follows the environment details below.)

[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformer_engine/pytorch/attention/dot_product_attention/backends.py", line 659, in forward
[rank0]:     output = attn_forward_func_with_cp(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformer_engine/pytorch/attention/dot_product_attention/context_parallel.py", line 3619, in attn_forward_func_with_cp
[rank0]:     out = AttnFuncWithCPAndKVP2P.apply(*args)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 575, in apply
[rank0]:     return super().apply(*args, **kwargs)  # type: ignore[misc]
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformer_engine/pytorch/attention/dot_product_attention/context_parallel.py", line 469, in forward
[rank0]:     qkv_format != "thd"
[rank0]: AssertionError: thd format is not supported with hierarchical CP implementation yet!

Platform: H800
PyTorch: 2.7
Megatron-LM branch: core_r0.13.0
Transformer Engine: 2.4.0
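
For reference, here is a rough repro sketch of the configuration that hits this assertion. It assumes that `cp_comm_type="a2a+p2p"` with a list of two process groups is how TE selects the hierarchical (Ulysses all-to-all + ring p2p) CP path; the group layout, shapes, and sequence lengths are illustrative only, not the exact Megatron-LM call path.

```python
# Hedged sketch: assumes 4 GPUs on one node, launched with
#   torchrun --nproc_per_node=4 repro_thd_hierarchical_cp.py
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te

dist.init_process_group(backend="nccl")
rank, world = dist.get_rank(), dist.get_world_size()
torch.cuda.set_device(rank)

# Split the 4 CP ranks into a 2-way inner all-to-all (Ulysses) dimension and a
# 2-way outer ring (p2p) dimension. new_group must be created on every rank.
a2a_size = 2
a2a_group = p2p_group = None
for i in range(world // a2a_size):
    ranks = list(range(i * a2a_size, (i + 1) * a2a_size))
    g = dist.new_group(ranks)
    if rank in ranks:
        a2a_group = g
for i in range(a2a_size):
    ranks = list(range(i, world, a2a_size))
    g = dist.new_group(ranks)
    if rank in ranks:
        p2p_group = g

num_heads, head_dim = 16, 128
attn = te.DotProductAttention(
    num_attention_heads=num_heads,
    kv_channels=head_dim,
    qkv_format="thd",                # packed sequences
    attn_mask_type="padding_causal",
)
attn.set_context_parallel_group(
    [a2a_group, p2p_group],          # list of two groups -> hierarchical CP
    list(range(world)),
    torch.cuda.Stream(),
    cp_comm_type="a2a+p2p",
)

# Per-rank shard of a packed batch: three sequences, no padding tokens.
tokens = 1024
q = torch.randn(tokens, num_heads, head_dim, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)
cu_seqlens = torch.tensor([0, 256, 640, 1024], dtype=torch.int32, device="cuda")
max_seqlen = int((cu_seqlens[1:] - cu_seqlens[:-1]).max())

# Fails inside AttnFuncWithCPAndKVP2P.forward with:
#   AssertionError: thd format is not supported with hierarchical CP implementation yet!
out = attn(q, k, v, cu_seqlens_q=cu_seqlens, cu_seqlens_kv=cu_seqlens,
           max_seqlen_q=max_seqlen, max_seqlen_kv=max_seqlen)
```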
Describe the solution you'd like

I don't have a clear solution in mind for now.

Describe alternatives you've considered

Disabling sequence packing might avoid the error, but it would affect loss convergence.
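
To make the trade-off concrete, a small illustration of the difference between the packed 'thd' layout and a padded 'bshd' layout (shapes are illustrative only, not tied to the actual Megatron-LM dataloader):

```python
import torch

num_heads, head_dim = 16, 128
seq_lens = [5, 3, 8]

# Packed ("thd"): sequences concatenated along one token dimension,
# boundaries carried in cu_seqlens; no padding tokens are computed.
total_tokens = sum(seq_lens)                                   # 16
cu_seqlens = torch.tensor([0, 5, 8, 16], dtype=torch.int32)
q_thd = torch.randn(total_tokens, num_heads, head_dim)         # [t, h, d]

# Padded ("bshd"): every sequence padded to the longest in the batch,
# so 3 * 8 = 24 token slots are processed instead of 16.
q_bshd = torch.zeros(len(seq_lens), max(seq_lens), num_heads, head_dim)  # [b, s, h, d]
```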

