[ONNX] Support for grouped query attention #151762

@cyanic-selkie

🚀 The feature, motivation and pitch

Hi, when using enable_gqa=True with scaled_dot_product_attention, the ONNX export fails. This is a documented limitation.

However, since GQA is currently very popular and the ONNX Attention op already supports it, I was wondering whether there is any plan to add support for it in the exporter and, if so, how soon. Thanks.
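For context, here is a minimal sketch of what the request amounts to. It emulates grouped query attention without the enable_gqa flag by repeating each key/value head across its group of query heads, which is the equivalence described in the PyTorch documentation for scaled_dot_product_attention. The helper name sdpa_gqa_expanded and the tensor shapes are hypothetical, chosen only for illustration. Whether this form exports cleanly depends on the exporter version and has not been verified here.

```python
import torch
import torch.nn.functional as F


def sdpa_gqa_expanded(query, key, value, is_causal=False):
    """Hypothetical helper: emulate enable_gqa=True by repeating K/V heads."""
    n_q_heads, n_kv_heads = query.size(-3), key.size(-3)
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("query heads must be a multiple of key/value heads")
    group = n_q_heads // n_kv_heads
    # Each K/V head is shared by `group` consecutive query heads.
    key = key.repeat_interleave(group, dim=-3)
    value = value.repeat_interleave(group, dim=-3)
    # Plain SDPA call: the enable_gqa flag is not used at all.
    return F.scaled_dot_product_attention(query, key, value, is_causal=is_causal)


torch.manual_seed(0)
q = torch.randn(2, 8, 16, 32)  # (batch, query heads, seq len, head dim)
k = torch.randn(2, 2, 16, 32)  # only 2 key/value heads shared by 8 query heads
v = torch.randn(2, 2, 16, 32)
out = sdpa_gqa_expanded(q, k, v)
```

The trade-off of this workaround is that the repeated key/value tensors are materialized, so the memory savings of GQA are lost in the resulting graph. Native exporter support targeting the ONNX Attention op would avoid that.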

Alternatives

No response

Additional context

No response

Labels

module: onnx (Related to torch.onnx)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
