npu support #342
referearn89-lang started this conversation in Ideas
Replies: 2 comments
-
Qualcomm's NPU LLM support is still quite outdated: it lacks top-tier edge-side models such as Gemma4 and Qwen3.5. However, I found that Google's official LiteRT already supports running Gemma4 E2B on the SM8750 (the 8850 should in theory be forward compatible). On my Poco F7 Ultra it runs at 15 tokens per second, and heat generation is very low. Memory usage is significant, though: a 4096-token context consumes 5GB, and maxing out the 128K context takes 9GB. I am trying to create the model files for Qwen3.5 4B.
-
OK, thanks for your service. I have one more thing to say about the RAG system: it doesn't work the way I imagined it would. I know that with limited hardware I can't expect much, so I attached multiple files to a RAG project, until I realised that only a few paragraphs were being fetched; I mean, it uses only a few paragraphs. Meanwhile the system prompt works just fine, but the RAG system seems not to be working. So my only question is: since your app is built exclusively around offline RAG, will you be improving it?
Also, can the image generation size be increased beyond 512x512?
Thanks again
In reply to maoist2009's comment of Mon, Jun 29, 2026, 3:00 AM.
-
Sir, thank you for making this application. I want to suggest something you probably already know: many devices nowadays come with an NPU for AI, and Snapdragon devices are among them. If I'm right, you have even implemented NPU support for image generation already. If we could also use the NPU for text generation, we would get faster replies without heating the device too much.
That's all; I just wanted to suggest NPU support for text generation. Thank you! Sir, the PocketPal app supports the NPU for text generation, if you want inspiration.