Welcome to Awesome On-device AI

A curated list of awesome projects and papers for AI on Mobile/IoT/Edge devices. The list is continuously updated. Contributions are welcome! Feel free to add not only papers but also new sections, or to adjust the existing content.

Papers/Tutorial

1. Training on Devices

1.1 Memory Efficient Training

[ICML'22] POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging. by Patil et al. [paper]
[NeurIPS'22] On-Device Training Under 256KB Memory. by Ji Lin, Song Han et al. [paper]
[MobiSys'22] Melon: Breaking the Memory Wall for Resource-Efficient On-Device Machine Learning. by Qipeng Wang et al. [paper]
[MobiSys'22] Sage: Memory-efficient DNN Training on Mobile Devices. by In Gim et al. [paper]

1.2 Training Acceleration

[NeurIPS'25] LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades.
[SenSys'24] PieBridge: Fast and Parameter-Efficient On-Device Training via Proxy Networks. by Wangsong Yin.
[MobiCom'22] Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading. by Daliang Xu et al. [paper]

1.3 Training on Mobile Cluster

[TPDS'26] Resource-Efficient Personal Large Language Models Fine-Tuning with Collaborative Edge Computing. by Shengyuan Ye et al. [paper]
[ASPLOS'24] SoCFlow: Efficient and Scalable DNN Training on SoC-Clustered Edge Servers. by Daliang Xu et al.
[ATC'24] More is Different: Prototyping and Analyzing a New Form of Edge Server with Massive Mobile SoCs. by Li Zhang et al.
[ATC'24] High-density Mobile Cloud Gaming on Edge SoC Clusters. by Li Zhang et al.
[ATC'24] FwdLLM: Efficient Federated Finetuning of Large Language Models with Perturbed Inferences. by Mengwei Xu et al.
[WWW'24] Towards Energy-efficient Federated Learning via INT8-based Training on Mobile DSPs. by Jinliang Yuan et al.
[ICPP'24] Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning. by Bei Ouyang et al. [paper]
[MobiCom'24] Asteroid: Resource-Efficient Hybrid Pipeline Parallelism for Collaborative DNN Training on Heterogeneous Edge Devices. by Shengyuan Ye et al. [paper]
[MobiCom'23] Federated Few-shot Learning for Mobile NLP. by Dongqi Cai et al.
[MobiCom'23] Efficient Federated Learning for Modern NLP. by Dongqi Cai et al.
[ICPP'22] Eco-FL: Adaptive Federated Learning with Efficient Edge Collaborative Pipeline Training. by Shengyuan Ye et al. [paper] [code]
[SEC'21] EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment. by Pengzhan Hao et al. [paper]
[MobiSys'21 Workshop] Towards Ubiquitous Learning: A First Measurement of On-Device Training Performance. by Dongqi Cai, Mengwei Xu et al. [paper]

2. Inference on Devices

2.1 Collaborative Inference

[TMC'25] Resource-Efficient Collaborative Edge Transformer Inference with Hybrid Model Parallelism. by Shengyuan Ye et al. [paper]
[INFOCOM'25] Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices. by Shengyuan Ye et al. [paper]
[ICCAD'25] Mitigating Resource Contention for Responsive On-device Machine Learning Inferences. by Minsung Kim et al. [paper]
[INFOCOM'24] Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference. by Shengyuan Ye et al. [paper]
[ICSOC'23] Niagara: Scheduling DNN Inference Services on Heterogeneous Edge Processors. by Daliang Xu et al.
[MobiSys'23] NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors. by USTC & Microsoft. [paper]
[MobiSys'22] CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices. by Fucheng Jia et al. [paper]
[INFOCOM'22] Distributed Inference with Deep Learning Models across Heterogeneous Edge Devices. by Chenghao Hu et al. [paper]
[TON'20] CoEdge: Cooperative DNN Inference with Adaptive Workload Partitioning over Heterogeneous Edge Devices. by Liekang Zeng et al. [paper]
[ICCD'20] A Distributed In-Situ CNN Inference System for IoT Applications. by Jiangsu Du et al. [paper]
[TPDS'20] Model Parallelism Optimization for Distributed Inference via Decoupled CNN Structure. by Jiangsu Du et al. [paper]
[EuroSys'19] μLayer: Low Latency On-Device Inference Using Cooperative Single-Layer Acceleration and Processor-Friendly Quantization. by Youngsok Kim et al. [paper]
[TCAD'18] DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters. by Zhuoran Zhao et al. [paper]
[DATE'17] MoDNN: Local Distributed Mobile Computing System for Deep Neural Network. by Jiachen Mao et al. [paper]

2.2 Inference Acceleration

[ASPLOS'25] Fast On-device LLM Inference with NPUs. by Daliang Xu et al.
[TMC'24] EdgeLLM: Fast On-Device LLM Inference With Speculative Decoding. by Daliang Xu et al.
[MobiCom'24] FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices. by Xiangyu Li et al.
[MobiSys'24] Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization.
[MobiSys'23] Boosting DNN Cold Inference on Edge Devices. by Rongjie Yi et al.
[MobiSys'23] ConvReLU++: Reference-based Lossless Acceleration of Conv-ReLU Operations on Mobile CPU. by Shanghai Jiao Tong University [paper]
[MobiSys'22] Band: Coordinated Multi-DNN Inference on Heterogeneous Mobile Processors. by Seoul National University et al. [paper]
[MobiSys'21] nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices. by Li Lyna Zhang et al. [paper]

3. Mobile AI Applications

3.1 Mobile GUI Agent

[ICLR'26] UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning.
[MobiSys'25] AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation. by Hao Wen et al.
[UIST'24] LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation.
[Arxiv'24] MobileViews: A Large-Scale Mobile GUI Dataset.
[MobiCom'24] AutoDroid: LLM-powered Task Automation in Android. by Hao Wen et al.

3.2 Mobile Visual and Multimodal Tasks

[INFOCOM'26] Venus: An Efficient Edge Memory-and-Retrieval System for VLM-based Online Video Understanding. by Shengyuan Ye et al. [paper]
[NC'25] Ubiquitous Memory Augmentation via Mobile Multimodal Embedding System. by Dongqi Cai et al.
[NSDI'25] Region-based Content Enhancement for Efficient Video Analytics at the Edge.
[Arxiv'24] InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.
[Arxiv'23] MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices.
[MobiSys'22] Approximate Query Service on Autonomous IoT Cameras. [paper]
[MobiCom'18] DeepCache: Principled Cache for Mobile Deep Vision. by Mengwei Xu et al. [paper]

3.3 Mobile NLP/Speech

[NeurIPS'24] SILENCE: Protecting Privacy in Offloaded Speech Understanding on Wimpy Devices. by Dongqi Cai et al.
[UbiComp'18] DeepType: On-Device Deep Learning for Input Personalization Service with Minimal Privacy Concern. by Mengwei Xu et al. [paper]
[Arxiv'18] Federated Learning for Mobile Keyboard Prediction. by Google [paper]

4. Survey and Tutorial

[Arxiv'24] Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security. by Yuanchun Li et al.
[CSUR'24] A Survey of Resource-efficient LLM and Multimodal Foundation Models. by Mengwei Xu et al. [paper]
[CVPR'23 Tutorial] Efficient Neural Networks: From Algorithm Design to Practical Mobile Deployments. by Snap Research [paper]

Open Source Projects

1. DL Framework on Mobile

mllm: Fast Multimodal LLM on Mobile Devices. by BUPT Team. [code]
TensorFlow Lite: Deploy machine learning models on mobile and edge devices. by Google. [code]
TensorFlow.js: A WebGL-accelerated JavaScript library for training and deploying ML models. by Google. [code]
MNN: A Universal and Efficient Inference Engine. by Alibaba. [code]
TensorRT: A C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators. by Nvidia. [code]
TVM: Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators. by Tianqi Chen et al. [code]
MACE: A deep learning inference framework optimized for mobile heterogeneous computing platforms. by Xiaomi. [code]
NCNN: a high-performance neural network inference framework optimized for the mobile platform. by Tencent. [code]

2. Audio

FluidAudio: Local audio AI SDK for Apple platforms with ASR, speaker diarization, VAD, and TTS. Optimized for Apple Neural Engine. by FluidInference. [code]

3. Mobile LLM Apps

Airgap: Open-source React Native framework for on-device, offline-first customer support chatbots. Runs Gemma 4 E2B locally via llama.rn (the React Native binding of llama.cpp). Seven industry templates (telco, retail, healthcare, banking, education, insurance, airlines) ship in the repo. by Xavier Puspus. [code]

Contribute

All contributions to this repository are welcome. Open an issue or send a pull request.
