Benchmark HF optimum-executorch #11450
base: main
Conversation
Force-pushed: e4718b0 → fff15c6 → 00149f2 → 112eb2b → a38a694 → a0f636f → 5d6dd04
@huydhn Okay, it turns out that I need to run install with […]
Force-pushed: 5d6dd04 → 01ce07b; df785ca → 8aa9c02
This is fixed. It turns out the artifacts must be moved from the ET root dir into artifacts-to-be-uploaded; an mv from any other dir under the ET root will end up with the above issue.
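For posterity, a minimal sketch of the working flow. The paths here are placeholders, not the exact ones from the CI job:

```bash
# Sketch of the fix, with hypothetical paths: artifacts are staged at the
# ExecuTorch repo root first, then moved into the upload dir.
cd /path/to/executorch                      # ET root dir (placeholder)
mkdir -p artifacts-to-be-uploaded

# Works: mv from the ET root dir.
mv model.pte artifacts-to-be-uploaded/

# Hits the issue above: mv from another dir under the ET root.
# mv cmake-out/model.pte artifacts-to-be-uploaded/
```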
Force-pushed: 8aa9c02 → b0d829a
"hf_xnnpack_custom_spda_kv_cache_8da4w", | ||
"et_xnnpack_custom_spda_kv_cache_8da4w", |
What's the difference between these two?
Here is the explanation in the PR summary: `hf_xnnpack_custom_spda_kv_cache_8da4w` represents the recipe used by optimum-et; `et_xnnpack_custom_spda_kv_cache_8da4w` is the counterpart for etLLM.
-X \
--xnnpack-extended-ops \
-qmode 8da4w -G 32 -E 8,0 \
--metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
Are these for llama_3_2?
@kimishpatel Yeah, for llama_3_2.
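Assembled, the etLLM export for llama_3_2 would look roughly like this. A sketch only: the checkpoint/params paths and output name are placeholders, only the flags quoted in the diff above come from this PR, and the module path may differ across branches:

```bash
# Export Llama 3.2 via etLLM (export_llama) with the 8da4w XNNPACK recipe.
python -m examples.models.llama.export_llama \
  -c /path/to/consolidated.00.pth \
  -p /path/to/params.json \
  -kv --use_sdpa_with_kv_cache \
  -X \
  --xnnpack-extended-ops \
  -qmode 8da4w -G 32 -E 8,0 \
  --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
  --output_name llama3_2_8da4w.pte
```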
I'm seeing jobs hitting API limits in AWS Device Farm. We lifted the limit for public AWS devices; @huydhn, do we need to do the same, separately, for the new devices in the private pools? https://github.com/pytorch/executorch/actions/runs/15504512047
Benchmark LLMs from `optimum-executorch`. With all the work recently happening in `optimum-executorch`, we are able to boost the out-of-the-box performance. Putting these models on the benchmark infra lets us gather perf numbers and understand the remaining perf gap against the in-house models generated via `export_llama`.

We are able to do an apples-to-apples comparison for the CPU backend by introducing quantization, custom SDPA, and a custom KV cache to native Hugging Face models in `optimum-executorch`: `hf_xnnpack_custom_spda_kv_cache_8da4w` represents the recipe used by optimum-et; `et_xnnpack_custom_spda_kv_cache_8da4w` is the counterpart for etLLM.

Here are the benchmark jobs in our infra:
Note there may be failures when running optimum-et models on-device due to the lack of support for HF tokenizers in the llama runner. I will stop packing tokenizer.json into the .zip shortly so that the benchmark apps treat optimum-et LLMs as non-GenAI models.