A curated list of awesome projects and papers for AI on Mobile/IoT/Edge devices. The list is continuously updated. Contributions are welcome! Feel free to add not just papers but also new sections, or to adjust the existing content.
- [ICML'22] POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging. by Patil et al. [paper]
- [NeurIPS'22] On-Device Training Under 256KB Memory. by Ji Lin, Song Han et al. [paper]
- [MobiSys'22] Melon: breaking the memory wall for resource-efficient on-device machine learning. by Qipeng Wang et al. [paper]
- [MobiSys'22] Sage: Memory-efficient DNN Training on Mobile Devices. by In Gim et al. [paper]
- [NeurIPS'25] LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades.
- [SenSys'24] PieBridge: Fast and Parameter-Efficient On-Device Training via Proxy Networks. by Wangsong Yin.
- [MobiCom'22] Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading. by Daliang Xu et al. [paper]
- [TPDS'26] Resource-Efficient Personal Large Language Models Fine-Tuning with Collaborative Edge Computing. by Shengyuan Ye et al. [paper]
- [ASPLOS'24] SoCFlow: Efficient and Scalable DNN Training on SoC-Clustered Edge Servers. by Daliang Xu et al.
- [ATC'24] More is Different: Prototyping and Analyzing a New Form of Edge Server with Massive Mobile SoCs. by Li Zhang et al.
- [ATC'24] High-density Mobile Cloud Gaming on Edge SoC Clusters. by Li Zhang et al.
- [ATC'24] FwdLLM: Efficient Federated Finetuning of Large Language Models with Perturbed Inferences. by Mengwei Xu et al.
- [WWW'24] Towards Energy-efficient Federated Learning via INT8-based Training on Mobile DSPs. by Jinliang Yuan et al.
- [ICPP'24] Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning. by Bei Ouyang et al. [paper]
- [MobiCom'24] Asteroid: Resource-Efficient Hybrid Pipeline Parallelism for Collaborative DNN Training on Heterogeneous Edge Devices. by Shengyuan Ye et al. [paper]
- [MobiCom'23] Federated Few-shot Learning for Mobile NLP. by Dongqi Cai et al.
- [MobiCom'23] Efficient Federated Learning for Modern NLP. by Dongqi Cai et al.
- [ICPP'22] Eco-FL: Adaptive Federated Learning with Efficient Edge Collaborative Pipeline Training. by Shengyuan Ye et al. [paper] [code]
- [SEC'21] EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment. by Pengzhan Hao et al. [paper]
- [MobiSys'21 Workshop] Towards Ubiquitous Learning: A First Measurement of On-Device Training Performance. by Dongqi Cai, Mengwei Xu et al. [paper]
- [TMC'25] Resource-Efficient Collaborative Edge Transformer Inference with Hybrid Model Parallelism. by Shengyuan Ye et al. [paper]
- [INFOCOM'25] Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices. by Shengyuan Ye et al. [paper]
- [ICCAD'25] Mitigating Resource Contention for Responsive On-device Machine Learning Inferences. by Minsung Kim et al. [paper]
- [INFOCOM'24] Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference. by Shengyuan Ye et al. [paper]
- [ICSOC'23] Niagara: Scheduling DNN Inference Services on Heterogeneous Edge Processors. by Daliang Xu et al.
- [MobiSys'23] NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors. by USTC & Microsoft. [paper]
- [MobiSys'22] CoDL: efficient CPU-GPU co-execution for deep learning inference on mobile devices. by Fucheng Jia et al. [paper]
- [INFOCOM'22] Distributed Inference with Deep Learning Models across Heterogeneous Edge Devices. by Chenghao Hu et al. [paper]
- [TON'20] CoEdge: Cooperative DNN Inference with Adaptive Workload Partitioning over Heterogeneous Edge Devices. by Liekang Zeng et al. [paper]
- [ICCD'20] A distributed in-situ CNN inference system for IoT applications. by Jiangsu Du et al. [paper]
- [TPDS'20] Model Parallelism Optimization for Distributed Inference via Decoupled CNN Structure. by Jiangsu Du et al. [paper]
- [EuroSys'19] μLayer: Low Latency On-Device Inference Using Cooperative Single-Layer Acceleration and Processor-Friendly Quantization. by Youngsok Kim et al. [paper]
- [TCAD'18] DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters. by Zhuoran Zhao et al. [paper]
- [DATE'17] MoDNN: Local Distributed Mobile Computing System for Deep Neural Network. by Jiachen Mao et al. [paper]
- [ASPLOS'25] Fast On-device LLM Inference with NPUs. by Daliang Xu et al.
- [TMC'24] EdgeLLM: Fast On-Device LLM Inference With Speculative Decoding. by Daliang Xu et al.
- [MobiCom'24] FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices. by Xiangyu Li et al.
- [MobiSys'24] Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization.
- [MobiSys'23] Boosting DNN Cold Inference on Edge Devices. by Rongjie Yi et al.
- [MobiSys'23] ConvReLU++: Reference-based Lossless Acceleration of Conv-ReLU Operations on Mobile CPU. by Shanghai Jiao Tong University [paper]
- [MobiSys'22] Band: coordinated multi-DNN inference on heterogeneous mobile processors. by Seoul National University et al. [paper]
- [MobiSys'21] nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices. by Li Lyna Zhang et al. [paper]
- [ICLR'26] UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning.
- [MobiSys'25] AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation. by Hao Wen et al.
- [UIST'24] LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation.
- [Arxiv'24] MobileViews: A Large-Scale Mobile GUI Dataset.
- [MobiCom'24] AutoDroid: LLM-powered Task Automation in Android. by Hao Wen et al.
- [INFOCOM'26] Venus: An Efficient Edge Memory-and-Retrieval System for VLM-based Online Video Understanding. by Shengyuan Ye et al. [paper]
- [NC'25] Ubiquitous Memory Augmentation via Mobile Multimodal Embedding System. by Dongqi Cai et al.
- [NSDI'25] Region-based Content Enhancement for Efficient Video Analytics at the Edge.
- [Arxiv'24] InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.
- [Arxiv'23] MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices.
- [MobiSys'22] Approximate Query Service on Autonomous IoT Cameras. [paper]
- [MobiCom'18] DeepCache: Principled Cache for Mobile Deep Vision. by Mengwei Xu et al. [paper]
- [NeurIPS'24] SILENCE: Protecting Privacy in Offloaded Speech Understanding on Wimpy Devices. by Dongqi Cai et al.
- [Ubicomp'18] DeepType: On-Device Deep Learning for Input Personalization Service with Minimal Privacy Concern. by Mengwei Xu et al. [paper]
- [Arxiv'18] Federated Learning for Mobile Keyboard Prediction. by Google [paper]
- [Arxiv'24] Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security. by Yuanchun Li.
- [CSUR'24] A Survey of Resource-efficient LLM and Multimodal Foundation Models. by Mengwei Xu et al. [paper]
- [CVPR'23 Tutorial] Efficient Neural Networks: From Algorithm Design to Practical Mobile Deployments. by Snap Research [paper]
- mllm: Fast Multimodal LLM on Mobile Devices. by BUPT Team. [code]
- TensorFlow Lite: Deploy machine learning models on mobile and edge devices. by Google. [code]
- TensorFlow.js: A WebGL accelerated JavaScript library for training and deploying ML models. by Google. [code]
- MNN: A Universal and Efficient Inference Engine. by Alibaba. [code]
- TensorRT: A C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators. by NVIDIA. [code]
- TVM: Open deep learning compiler stack for CPU, GPU and specialized accelerators. by Tianqi Chen et al. [code]
- MACE: a deep learning inference framework optimized for mobile heterogeneous computing platforms. by Xiaomi. [code]
- NCNN: a high-performance neural network inference framework optimized for the mobile platform. by Tencent. [code]
- FluidAudio: Local audio AI SDK for Apple platforms with ASR, speaker diarization, VAD, and TTS. Optimized for Apple Neural Engine. by FluidInference. [code]
- Airgap: Open-source React Native framework for on-device, offline-first customer support chatbots. Runs Gemma 4 E2B locally via llama.rn (the React Native binding of llama.cpp). Seven industry templates (telco, retail, healthcare, banking, education, insurance, airlines) ship in the repo. by Xavier Puspus. [code]
All contributions to this repository are welcome. Open an issue or send a pull request.