
Ganeshship/awesome-on-device-AI

Welcome to Awesome On-device AI

A curated list of awesome projects and papers for AI on Mobile/IoT/Edge devices. The list is continuously updated, and contributions are welcome! Feel free to add not just papers but also new sections, or to adjust the existing content.

Contents

Papers/Tutorial

1. Training on Devices

1.1 Memory Efficient Training

  • [ICML'22] POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging. by Patil et al. [paper]
  • [NeurIPS'22] On-Device Training Under 256KB Memory. by Ji Lin, Song Han et al. [paper]
  • [MobiSys'22] Melon: Breaking the Memory Wall for Resource-Efficient On-Device Machine Learning. by Qipeng Wang et al. [paper]
  • [MobiSys'22] Sage: Memory-efficient DNN Training on Mobile Devices. by In Gim et al. [paper]

1.2 Training Acceleration

  • [NeurIPS'25] LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades.
  • [SenSys'24] PieBridge: Fast and Parameter-Efficient On-Device Training via Proxy Networks. by Wangsong Yin.
  • [MobiCom'22] Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading. by Daliang Xu et al. [paper]

1.3 Training on Mobile Cluster

  • [TPDS'26] Resource-Efficient Personal Large Language Models Fine-Tuning with Collaborative Edge Computing. by Shengyuan Ye et al. [paper]
  • [ASPLOS'24] SoCFlow: Efficient and Scalable DNN Training on SoC-Clustered Edge Servers. by Daliang Xu et al.
  • [ATC'24] More is Different: Prototyping and Analyzing a New Form of Edge Server with Massive Mobile SoCs. by Li Zhang et al.
  • [ATC'24] High-density Mobile Cloud Gaming on Edge SoC Clusters. by Li Zhang et al.
  • [ATC'24] FwdLLM: Efficient Federated Finetuning of Large Language Models with Perturbed Inferences. by Mengwei Xu et al.
  • [WWW'24] Towards Energy-efficient Federated Learning via INT8-based Training on Mobile DSPs. by Jinliang Yuan et al.
  • [ICPP'24] Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning. by Bei Ouyang et al. [paper]
  • [MobiCom'24] Asteroid: Resource-Efficient Hybrid Pipeline Parallelism for Collaborative DNN Training on Heterogeneous Edge Devices. by Shengyuan Ye et al. [paper]
  • [MobiCom'23] Federated Few-shot Learning for Mobile NLP. by Dongqi Cai et al.
  • [MobiCom'23] Efficient Federated Learning for Modern NLP. by Dongqi Cai et al.
  • [ICPP'22] Eco-FL: Adaptive Federated Learning with Efficient Edge Collaborative Pipeline Training. by Shengyuan Ye et al. [paper] [code]
  • [SEC'21] EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment. by Pengzhan Hao et al. [paper]
  • [MobiSys'21 Workshop] Towards Ubiquitous Learning: A First Measurement of On-Device Training Performance. by Dongqi Cai, Mengwei Xu et al. [paper]

2. Inference on Devices

2.1 Collaborative Inference

  • [TMC'25] Resource-Efficient Collaborative Edge Transformer Inference with Hybrid Model Parallelism. by Shengyuan Ye et al. [paper]
  • [INFOCOM'25] Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices. by Shengyuan Ye et al. [paper]
  • [ICCAD'25] Mitigating Resource Contention for Responsive On-device Machine Learning Inferences. by Minsung Kim et al. [paper]
  • [INFOCOM'24] Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference. by Shengyuan Ye et al. [paper]
  • [ICSOC'23] Niagara: Scheduling DNN Inference Services on Heterogeneous Edge Processors. by Daliang Xu et al.
  • [MobiSys'23] NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors. by USTC & Microsoft. [paper]
  • [MobiSys'22] CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices. by Fucheng Jia et al. [paper]
  • [INFOCOM'22] Distributed Inference with Deep Learning Models across Heterogeneous Edge Devices. by Chenghao Hu et al. [paper]
  • [TON'20] CoEdge: Cooperative DNN Inference with Adaptive Workload Partitioning over Heterogeneous Edge Devices. by Liekang Zeng et al. [paper]
  • [ICCD'20] A distributed in-situ CNN inference system for IoT applications. by Jiangsu Du et al. [paper]
  • [TPDS'20] Model Parallelism Optimization for Distributed Inference via Decoupled CNN Structure. by Jiangsu Du et al. [paper]
  • [EuroSys'19] μLayer: Low Latency On-Device Inference Using Cooperative Single-Layer Acceleration and Processor-Friendly Quantization. by Youngsok Kim et al. [paper]
  • [TCAD'18] DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters. by Zhuoran Zhao et al. [paper]
  • [DATE'17] MoDNN: Local Distributed Mobile Computing System for Deep Neural Network. by Jiachen Mao et al. [paper]

2.2 Inference Acceleration

  • [ASPLOS'25] Fast On-device LLM Inference with NPUs. by Daliang Xu et al.
  • [TMC'24] EdgeLLM: Fast On-Device LLM Inference With Speculative Decoding. by Daliang Xu et al.
  • [MobiCom'24] FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices. by Xiangyu Li et al.
  • [MobiSys'24] Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization.
  • [MobiSys'23] Boosting DNN Cold Inference on Edge Devices. by Rongjie Yi et al.
  • [MobiSys'23] ConvReLU++: Reference-based Lossless Acceleration of Conv-ReLU Operations on Mobile CPU. by Shanghai Jiao Tong University [paper]
  • [MobiSys'22] Band: Coordinated Multi-DNN Inference on Heterogeneous Mobile Processors. by Seoul National University et al. [paper]
  • [MobiSys'21] nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices. by Li Lyna Zhang et al. [paper]

3. Mobile AI Applications

3.1 Mobile GUI Agent

  • [ICLR'26] UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning.
  • [MobiSys'25] AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation. by Hao Wen et al.
  • [UIST'24] LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation.
  • [Arxiv'24] MobileViews: A Large-Scale Mobile GUI Dataset.
  • [MobiCom'24] AutoDroid: LLM-powered Task Automation in Android. by Hao Wen et al.

3.2 Mobile Visual and Multimodal Tasks

  • [INFOCOM'26] Venus: An Efficient Edge Memory-and-Retrieval System for VLM-based Online Video Understanding. by Shengyuan Ye et al. [paper]
  • [NC'25] Ubiquitous Memory Augmentation via Mobile Multimodal Embedding System. by Dongqi Cai et al.
  • [NSDI'25] Region-based Content Enhancement for Efficient Video Analytics at the Edge.
  • [Arxiv'24] InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.
  • [Arxiv'23] MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices.
  • [MobiSys'22] Approximate Query Service on Autonomous IoT Cameras. [paper]
  • [MobiCom'18] DeepCache: Principled Cache for Mobile Deep Vision. by Mengwei Xu et al. [paper]

3.3 Mobile NLP/Speech

  • [NeurIPS'24] SILENCE: Protecting Privacy in Offloaded Speech Understanding on Wimpy Devices. by Dongqi Cai et al.
  • [UbiComp'18] DeepType: On-Device Deep Learning for Input Personalization Service with Minimal Privacy Concern. by Mengwei Xu et al. [paper]
  • [Arxiv'18] Federated Learning for Mobile Keyboard Prediction. by Google [paper]

4. Survey and Tutorial

  • [Arxiv'24] Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security. by Yuanchun Li.
  • [CSUR'24] A Survey of Resource-efficient LLM and Multimodal Foundation Models. by Mengwei Xu et al. [paper]
  • [CVPR'23 Tutorial] Efficient Neural Networks: From Algorithm Design to Practical Mobile Deployments. by Snap Research [paper]

Open Source Projects

1. DL Framework on Mobile

  • mllm: Fast Multimodal LLM on Mobile Devices. by BUPT Team. [code]
  • TensorFlow Lite: Deploy machine learning models on mobile and edge devices. by Google. [code]
  • TensorFlow.js: A WebGL-accelerated JavaScript library for training and deploying ML models. by Google. [code]
  • MNN: A Universal and Efficient Inference Engine. by Alibaba. [code]
  • TensorRT: A C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators. by Nvidia. [code]
  • TVM: Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators. by Tianqi Chen et al. [code]
  • MACE: A deep learning inference framework optimized for mobile heterogeneous computing platforms. by Xiaomi. [code]
  • NCNN: A high-performance neural network inference framework optimized for the mobile platform. by Tencent. [code]

2. Audio

  • FluidAudio: Local audio AI SDK for Apple platforms with ASR, speaker diarization, VAD, and TTS. Optimized for Apple Neural Engine. by FluidInference. [code]

3. Mobile LLM Apps

  • Airgap: Open-source React Native framework for on-device, offline-first customer support chatbots. Runs Gemma 4 E2B locally via llama.rn (the React Native binding of llama.cpp). Seven industry templates (telco, retail, healthcare, banking, education, insurance, airlines) ship in the repo. by Xavier Puspus. [code]

Contribute

All contributions to this repository are welcome. Open an issue or send a pull request.
