CANN: optimize the rope ops #15335

YangShuai52 · 2025-08-15T01:50:32Z

Optimize the performance of the rope operator by reusing sin_tensor and cos_tensor across different layers for each token.
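As an illustration of the idea only, here is a minimal host-side sketch of computing the sin/cos tables once per set of token positions and reusing them for every layer. It is not the PR's code: the `RopeCache` struct, the `ensure_rope_tables` function, and the choice to key the cache on the position list are all assumptions made for this example, and the real change keeps device buffers in the CANN backend context.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for the backend context. The PR itself keeps
// device pointers (sin_ptr, cos_ptr) in the CANN context instead.
struct RopeCache {
    std::vector<int32_t> positions;  // token positions the tables were built for
    int   n_dims    = 0;             // rotary dimensions the tables were built for
    float freq_base = 0.0f;          // frequency base the tables were built for
    std::vector<float> sin_table;    // positions.size() * (n_dims / 2) entries
    std::vector<float> cos_table;    // same layout as sin_table
    int builds = 0;                  // how many times the tables were recomputed
};

// Make sure the tables match (pos, n_dims, freq_base); recompute only on a miss.
void ensure_rope_tables(RopeCache & cache, const std::vector<int32_t> & pos,
                        int n_dims, float freq_base) {
    assert(n_dims > 0 && n_dims % 2 == 0);
    const bool hit = cache.builds > 0 && cache.n_dims == n_dims &&
                     cache.freq_base == freq_base && cache.positions == pos;
    if (hit) {
        return;  // later layers for the same tokens reuse the existing tables
    }
    const std::size_t half = static_cast<std::size_t>(n_dims) / 2;
    cache.sin_table.resize(pos.size() * half);
    cache.cos_table.resize(pos.size() * half);
    for (std::size_t p = 0; p < pos.size(); ++p) {
        for (std::size_t i = 0; i < half; ++i) {
            // theta = position * freq_base^(-2i / n_dims)
            const float exponent = -2.0f * static_cast<float>(i) / static_cast<float>(n_dims);
            const float theta = static_cast<float>(pos[p]) * std::pow(freq_base, exponent);
            cache.sin_table[p * half + i] = std::sin(theta);
            cache.cos_table[p * half + i] = std::cos(theta);
        }
    }
    cache.positions = pos;
    cache.n_dims    = n_dims;
    cache.freq_base = freq_base;
    ++cache.builds;
}
```

Calling `ensure_rope_tables` once per layer with the same positions leaves `builds` at 1, which is the saving the description refers to: the trigonometric work is paid once per token rather than once per layer.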
Before Optimization
root@worker-33-138:/home/y00939322/rope_test/llama.cpp-master# ./build/bin/llama-bench -m /home/y00939322/qwen2.5-0.5b-instruct-fp16.gguf -p 5 -n 5 -b 1 -sm none -mg 0 -t 8 -fa 1

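| model | size | params | backend | ngl | threads | n_batch | sm | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen2 1B F16 | 1.17 GiB | 630.17 M | CANN | 99 | 8 | 1 | none | 1 | pp5 | 174.29 ± 0.36 |
| qwen2 1B F16 | 1.17 GiB | 630.17 M | CANN | 99 | 8 | 1 | none | 1 | tg5 | 173.14 ± 0.53 |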

Optimized
root@worker-33-138:/home/y00939322/rope_test/llama.cpp-rope_ops# ./build/bin/llama-bench -m /home/y00939322/qwen2.5-0.5b-instruct-fp16.gguf -p 5 -n 5 -b 1 -sm none -mg 0 -t 8 -fa 1

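| model | size | params | backend | ngl | threads | n_batch | sm | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen2 1B F16 | 1.17 GiB | 630.17 M | CANN | 99 | 8 | 1 | none | 1 | pp5 | 195.08 ± 0.83 |
| qwen2 1B F16 | 1.17 GiB | 630.17 M | CANN | 99 | 8 | 1 | none | 1 | tg5 | 195.94 ± 0.71 |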

Verifying the Operator Precision:
ROPE(type=f32,ne_a=[128,32,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
ROPE(type=f32,ne_a=[128,40,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
ROPE(type=f32,ne_a=[128,52,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
ROPE(type=f32,ne_a=[128,64,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
ROPE(type=f32,ne_a=[64,1,2,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
ROPE(type=f32,ne_a=[64,71,2,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
ROPE(type=f32,ne_a=[64,8,2,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK

noemotiovon

Thank you for your contribution! Here are some suggestions, and I’m happy to discuss them together.

noemotiovon · 2025-08-15T02:05:53Z

ggml/src/ggml-cann/aclnn_ops.cpp


+    if(ctx.init_ptr == nullptr || !is_attention) {


Add a comment indicating that is_attention is a flag used for accuracy testing.

Thank you for your suggestion; it has been revised.

noemotiovon · 2025-08-15T02:06:58Z

ggml/src/ggml-cann/aclnn_ops.cpp

+        if(ctx.init_ptr != nullptr){
+            ACL_CHECK(aclrtFree(ctx.init_ptr));
+        }
+        ACL_CHECK(aclrtMalloc(&ctx.init_ptr,theta_scale_length * sizeof(float_t), ACL_MEM_MALLOC_HUGE_FIRST));


A space is missing after the comma in &ctx.init_ptr,theta_scale_length.

Thank you for your suggestion; it has been revised.
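For readers following the thread, the free-then-allocate pattern in the snippet above can be sketched on the host as follows. This is an illustration under stated assumptions, not the PR's code: `std::malloc` and `std::free` stand in for `aclrtMalloc` and `aclrtFree`, and the `ScratchBuffer` struct, its capacity tracking, and the `ensure_capacity` and `release` helpers are hypothetical additions that skip reallocation when the existing buffer is already large enough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical host-side buffer; the PR stores a raw device pointer (init_ptr)
// in the backend context and does not necessarily track a capacity.
struct ScratchBuffer {
    void *      ptr      = nullptr;
    std::size_t capacity = 0;  // size of the current allocation in bytes
};

// Make sure the buffer holds at least `bytes` bytes.
// Returns true if a new allocation was made, false if the old one was reused.
bool ensure_capacity(ScratchBuffer & buf, std::size_t bytes) {
    if (buf.ptr != nullptr && bytes <= buf.capacity) {
        return false;  // existing allocation is large enough
    }
    std::free(buf.ptr);  // freeing a null pointer is a no-op
    buf.ptr = std::malloc(bytes);
    assert(buf.ptr != nullptr || bytes == 0);
    buf.capacity = bytes;
    return true;
}

// Release the allocation and reset the bookkeeping.
void release(ScratchBuffer & buf) {
    std::free(buf.ptr);
    buf.ptr      = nullptr;
    buf.capacity = 0;
}
```

Tracking the capacity is a design choice of this sketch: it avoids a free and a fresh allocation on every call when the requested length has not grown.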

noemotiovon · 2025-08-15T02:40:19Z

ggml/src/ggml-cann/common.h

+    void* init_ptr = nullptr;
+    void* sin_ptr = nullptr;
+    void* cos_ptr = nullptr;
+    int64_t max_position_length = 200000;


A maximum prompt length of 200,000 is a bit excessive; let's initialize it to 65,536 here and rename it to max_prompt_length.

Thank you for your suggestion; it has been revised.
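To put the two limits in perspective, the arithmetic below estimates the size of a preallocated sin/cos cache. The layout is an assumption made for illustration (one `float` per position per rotary pair, for one sin table and one cos table), the `rope_cache_bytes` helper is hypothetical, and `n_dims = 128` is taken from the ROPE test cases in the PR description; the buffers actually allocated by the PR may be sized differently.

```cpp
#include <cstdint>

// Bytes needed for a sin table plus a cos table, ASSUMING one float per
// position per rotary pair (n_dims / 2 pairs). Hypothetical helper.
constexpr int64_t rope_cache_bytes(int64_t max_positions, int64_t n_dims) {
    return 2 * max_positions * (n_dims / 2) * static_cast<int64_t>(sizeof(float));
}

// With 4-byte floats and n_dims = 128:
static_assert(rope_cache_bytes(65536, 128) == 33554432, "exactly 32 MiB");
static_assert(rope_cache_bytes(200000, 128) == 102400000, "roughly 97.7 MiB");
```

Under these assumptions the smaller limit cuts the reserved memory to about a third of what a 200,000-position cache would need.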

YangShuai52 · 2025-08-18T00:39:16Z

@hipudding

optimize rope ops

0ff18e0

github-actions bot added the ggml and Ascend NPU labels Aug 15, 2025

noemotiovon reviewed Aug 15, 2025

YangShuai52 added 2 commits August 15, 2025 11:09

amendment

930ee57

delete trailing whitespace

682600c

Green-Sky changed the title ~~optimize the rope ops~~ CANN: optimize the rope ops Aug 15, 2025

change the variable name

170d40e

hipudding self-requested a review August 19, 2025 13:26

hipudding approved these changes Aug 19, 2025

hipudding merged commit a6d3cfe into ggml-org:master Aug 19, 2025
48 checks passed

hipudding mentioned this pull request Aug 20, 2025

[CANN] Optimize RMS_NORM using cache #15419

Open
