npu support #342

referearn89-lang · 2026-05-08T16:02:39Z

Sir, thank you for making this application. I would like to suggest something you probably already know: many devices nowadays come with an NPU for AI, and Snapdragon is one of them. If I'm right, you have already implemented NPU support for image generation. If we could also use the NPU for text generation, we would get faster replies without heating the device too much.
That is all I wanted to suggest: NPU support for text generation. Thank you! Sir, the PocketPal app supports the NPU for text generation, if you want inspiration.

maoist2009 · 2026-06-28T21:29:50Z

Qualcomm's NPU LLM support is still quite outdated: it lacks state-of-the-art edge-side models such as Gemma4 and Qwen3.5.

However, I found that Google's official LiteRT already supports running Gemma4 E2B on the SM8750 (the 8850 should, in theory, be forward compatible). On my Poco F7 Ultra it runs at 15 tokens per second with very little heat. Memory usage is significant, though: a 4096-token context consumes 5GB, and maxing out the context at 128K takes 9GB.
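As a rough back-of-the-envelope sketch, the two memory figures above can be turned into a per-token estimate. This assumes memory grows linearly with context length, that "GB" means GiB, and that "128K" means 131072 tokens; none of that is confirmed in this thread, and the helper names `fit_linear` and `estimate_bytes` are made up for illustration.

```python
# Linear estimate of memory vs. context length from two reported data points.
# Assumptions (not confirmed by the thread): linear growth, GB == GiB,
# and "128K" == 128 * 1024 tokens.
GIB = 1024 ** 3


def fit_linear(tokens_a, bytes_a, tokens_b, bytes_b):
    """Return (fixed_bytes, bytes_per_token) for the line through two points."""
    per_token = (bytes_b - bytes_a) / (tokens_b - tokens_a)
    fixed = bytes_a - per_token * tokens_a
    return fixed, per_token


def estimate_bytes(tokens, fixed, per_token):
    """Estimated total memory for a given context length, under the linear model."""
    return fixed + per_token * tokens


# Reported figures: 4096 tokens -> 5GB, 128K tokens -> 9GB.
fixed, per_token = fit_linear(4096, 5 * GIB, 128 * 1024, 9 * GIB)

print(f"per-token cost: {per_token / 1024:.1f} KiB")
print(f"fixed cost:     {fixed / GIB:.2f} GiB")
print(f"32K context:    {estimate_bytes(32 * 1024, fixed, per_token) / GIB:.2f} GiB")
```

Under these assumptions most of the 5GB at 4096 tokens is a fixed cost (weights and runtime) rather than context, so intermediate context sizes should land much closer to 5GB than to 9GB.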

I am trying to create the model files for Qwen3.5 4B.

referearn89-lang (Author) · 2026-06-29T00:18:43Z

OK, thanks for your work. I have one more thing to say, about the RAG system: it doesn't work the way I imagined, though I know I can't expect much from limited hardware. I attached multiple files to a RAG project before I realised that only a few paragraphs were being fetched; that is, it uses only a few paragraphs. The system prompt works just fine, but the RAG system seems not to be working. So my question is: since your app is built exclusively around an offline RAG system, will you be improving it? Also, can we increase the image generation size beyond 512x512? Thanks again.
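For context, the "only a few paragraphs" behaviour is what a typical top-k retriever does by design. Below is a minimal sketch of that general technique, not this app's actual code (which is not shown in this thread). The word-count `embed` function is a toy stand-in for a real embedding model, and every name here (`embed`, `cosine`, `retrieve`, `top_k`) is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts. A real RAG system would use a neural model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=3):
    """Rank all chunks by similarity to the query and keep only the best top_k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


# Paragraphs from any number of attached files end up in one pool of chunks.
chunks = [
    "Battery care: avoid charging above 80 percent overnight.",
    "Warranty claims need the original receipt.",
    "The camera supports night mode and portrait mode.",
    "To reset the phone, hold the power button for ten seconds.",
    "Screen brightness adapts to ambient light.",
]

context = retrieve("How do I reset the phone?", chunks, top_k=2)
for paragraph in context:
    print(paragraph)
```

However many files are attached, only `top_k` chunks reach the model's prompt in a design like this, so raising `top_k` (at the cost of more context memory) is the usual knob for fetching more paragraphs.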
