
Comparing changes

base repository: NVIDIA/cudnn-frontend
base: f6266a9e2a4f699ca7714b99aa76bd9fea7862c3
head repository: NVIDIA/cudnn-frontend
compare: 91b7532f3386768bba4f444ee7672b497f34da8a
  • 1 commit
  • 112 files changed
  • 1 contributor

Commits on Jan 28, 2025

  1. # cudnn frontend v1.10 release notes (#126)

    cudnn frontend v1.10 is the preferred cudnn frontend for cudnn backend
    9.7.0 and later, as it adds Blackwell-specific features.
    ## New API
    - cudnn frontend v1.10 introduces two new operators,
    `block_scale_quantize` and `block_scale_dequantize`, to specify the
    scaling and de-scaling of low-precision datatypes supported on
    Blackwell GPUs and later.
    
    - `create_execution_plan(int64_t const engine_id,
    std::unordered_map<KnobType_t, int64_t> const &knobs)` allows creation
    of a custom execution plan with a hardcoded engine and knobs. Added a
    sample in `samples/cpp/misc/custom_plan.cpp` to showcase how to work
    with different `Engine` and `Knobs`.
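
To make the idea behind `block_scale_quantize` and `block_scale_dequantize` concrete, here is a minimal CPU sketch of block scaling in general. It is not the cudnn implementation or its API: the `BlockScaled` struct, the `*_ref` function names, the use of `int8_t` as a stand-in for a low-precision type, and the `amax / qmax` scale rule are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for this sketch (not a cudnn frontend type):
// the quantized payload plus one float scale per block.
struct BlockScaled {
    std::vector<int8_t> values;  // quantized elements
    std::vector<float> scales;   // scales[b] applies to block b
    std::size_t block_size;
};

// Quantize each block independently. Assumed rule: scale = max|x| / qmax,
// and each element is stored as round(x / scale).
BlockScaled block_scale_quantize_ref(const std::vector<float>& x,
                                     std::size_t block_size,
                                     int qmax = 127) {
    assert(block_size > 0);
    BlockScaled out{std::vector<int8_t>(x.size()), {}, block_size};
    for (std::size_t start = 0; start < x.size(); start += block_size) {
        const std::size_t end = std::min(start + block_size, x.size());
        float amax = 0.0f;
        for (std::size_t i = start; i < end; ++i) {
            amax = std::max(amax, std::fabs(x[i]));
        }
        // An all-zero block gets scale 1 to avoid dividing by zero.
        const float scale = (amax > 0.0f) ? amax / static_cast<float>(qmax) : 1.0f;
        out.scales.push_back(scale);
        for (std::size_t i = start; i < end; ++i) {
            out.values[i] = static_cast<int8_t>(std::lround(x[i] / scale));
        }
    }
    return out;
}

// Dequantize by multiplying each element by the scale of its block.
std::vector<float> block_scale_dequantize_ref(const BlockScaled& q) {
    std::vector<float> x(q.values.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        x[i] = static_cast<float>(q.values[i]) * q.scales[i / q.block_size];
    }
    return x;
}
```

Because every block carries its own scale, a block of small values keeps its resolution even when another block in the same tensor holds much larger values. In this sketch the round-trip error of each element is bounded by half of its block's scale.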
    
    ## Improvements
    - Users can now query behavior notes of a particular execution plan
    using `get_behavior_notes(std::vector<BehaviorNote_t> &notes) const` and
    `get_behavior_notes_for_plan_at_index(int64_t const index,
    std::vector<BehaviorNote_t> &notes) const` functions.
    
    - SDPA operations now accept both a left and a right window size,
    measured relative to the diagonal. See Attention.md for more details.
    
    - SDPA operations now accept a diagonal alignment for the attention
    score matrix, which is used to describe the window above. When
    `s_q != s_kv` and the causal mask is on, this can be used to specify
    whether the diagonal is top left or bottom right.
    
    - Bottom right causal masking can now be enabled on the sdpa_fp8
    operation.
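
The interaction between the left and right window sizes and the diagonal alignment is easiest to see on a small mask. The following is a self-contained sketch, not the cudnn frontend API: the `DiagonalAlignment` enum and both functions are hypothetical stand-ins, and the inclusive bound convention (`diag - left <= j <= diag + right`) is an assumption of this sketch. Attention.md remains the reference for the exact semantics.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the alignment choice described above.
enum class DiagonalAlignment { TopLeft, BottomRight };

// Returns true when key position j is visible to query position i.
// Assumed convention for this sketch: the diagonal of row i sits at column i
// (top left) or at column i + (s_kv - s_q) (bottom right), and a key is kept
// when diag - left <= j <= diag + right.
bool is_visible(int64_t i, int64_t j, int64_t s_q, int64_t s_kv,
                int64_t left, int64_t right, DiagonalAlignment align) {
    const int64_t diag =
        (align == DiagonalAlignment::TopLeft) ? i : i + (s_kv - s_q);
    return j >= diag - left && j <= diag + right;
}

// Renders the s_q x s_kv mask as text: 'x' = attended, '.' = masked out.
std::vector<std::string> render_mask(int64_t s_q, int64_t s_kv,
                                     int64_t left, int64_t right,
                                     DiagonalAlignment align) {
    std::vector<std::string> rows;
    for (int64_t i = 0; i < s_q; ++i) {
        std::string row;
        for (int64_t j = 0; j < s_kv; ++j) {
            row += is_visible(i, j, s_q, s_kv, left, right, align) ? 'x' : '.';
        }
        rows.push_back(row);
    }
    return rows;
}
```

In this sketch a causal mask is the special case `right = 0` with `left` at least `s_kv`. With `s_q = 2` and `s_kv = 4`, top-left alignment yields the rows `x...` and `xx..`, while bottom-right alignment yields `xxx.` and `xxxx`, so the last query row always sees the last key.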
    
    ## Bug fixes
    - Fixed a regression in cudnn frontend v1.9.0 where the softmax node
    would override user-set dims and strides for `softmax_stats` and
    `m_zinv`. This also affected the `sdpa_forward` and `sdpa_fp8_forward`
    nodes.
    
    ## New samples
    - Added an example to showcase how native CUDA graphs can be constructed
    from the SDPA operation graph.
    Anerudhan authored Jan 28, 2025
    91b7532