An on-device AI chat application for Android that runs LLMs completely offline using llama.cpp.
- ✅ 100% On-Device: Your conversations stay private - no internet required
- ✅ GGUF Model Support: Compatible with any GGUF-format model
- ✅ Real-time Streaming: Watch responses generate token-by-token
- ✅ Modern UI: Built with Jetpack Compose Material3
- ✅ Arm-Optimized: Leverages Arm CPU features for efficient inference
- Language: Kotlin 2.2.0
- UI Framework: Jetpack Compose
- LLM Engine: kotlinllamacpp 0.2.0
- Architecture: MVVM with Kotlin Coroutines & Flow
- Minimum SDK: API 24 (Android 7.0 Nougat)
- Target SDK: API 36
- Android Studio (latest version recommended)
- Android device with arm64-v8a processor
- A GGUF model file (recommended: Q4 or Q5 quantized, < 3GB)
- Clone the repository
  ```bash
  git clone https://github.com/YOUR_USERNAME/NanoMInd.git
  cd NanoMInd
  ```
- Build the project
  ```bash
  ./gradlew assembleDebug
  ```
- Install on your device
  ```bash
  adb install app/build/outputs/apk/debug/app-debug.apk
  ```
  Or use the quick install script:
  ```bash
  ./install.sh
  ```
- Download a GGUF model (e.g., from HuggingFace)
- Rename it to `nanomind_model.gguf`
- Place it in your device's Downloads folder
- Grant storage permissions when the app requests them
- The app will automatically load the model on startup (see the lookup sketch below)
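A minimal sketch of that startup lookup, assuming the conventional file name and location described above; the actual loading code lives in `ChatViewModel.kt`, and `findModelFile` is a hypothetical helper name:

```kotlin
import android.os.Environment
import java.io.File

// Hypothetical helper: locate nanomind_model.gguf in the public Downloads folder.
// Direct file access here depends on the storage permissions the app requests.
fun findModelFile(): File? {
    val downloads = Environment.getExternalStoragePublicDirectory(
        Environment.DIRECTORY_DOWNLOADS
    )
    return File(downloads, "nanomind_model.gguf").takeIf { it.exists() && it.canRead() }
}
```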
Recommended Models:
- TinyLlama 1.1B Q4 - Fast, great for testing
- Phi-2 Q4 - Good balance of size and quality
- Any GGUF model under 3GB for smooth mobile performance
```
app/src/main/java/com/example/nanomind/
├── MainActivity.kt         # Main activity with FileProvider setup
├── ChatViewModel.kt        # Chat logic and LLM integration
└── res/
    └── xml/file_paths.xml  # FileProvider configuration
```
Modern Android requires content:// URIs for file access. This app uses FileProvider to convert file paths to proper URIs:
```kotlin
val contentUri = FileProvider.getUriForFile(
    context,
    "${packageName}.fileprovider",
    modelFile
)
```
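A related sketch of consuming such a `content://` URI through `ContentResolver` (standard Android API; illustrative, not necessarily the app's exact path):

```kotlin
// Open the URI read-only; the ParcelFileDescriptor exposes a raw fd that
// native code (or a copy-to-app-storage step) can work with.
context.contentResolver.openFileDescriptor(contentUri, "r")?.use { pfd ->
    val fd = pfd.fd
    // hand fd to the model loader, or stream the bytes to local storage
}
```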
Real-time response updates using Kotlin Flow:
```kotlin
llmFlow.collect { event ->
    when (event) {
        is LlamaHelper.LLMEvent.Ongoing -> {
            accumulatedText += event.word
            updateMessage(accumulatedText)
        }
    }
}
```
- Use Q4 or Q5 quantized models for best mobile performance
- Adjust `contextLength` in `ChatViewModel.kt` based on available RAM (default: 2048); see the sketch after this list
- Smaller models (< 3B parameters) are recommended for phones
- First response may be slower as the model initializes
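As a rough illustration of sizing `contextLength` to the device, here is a sketch using the standard `ActivityManager` memory query; `suggestContextLength` and its thresholds are illustrative, not part of the app:

```kotlin
import android.app.ActivityManager
import android.content.Context

// Illustrative helper: pick a context length from currently available RAM.
// 2048 matches the app's default in ChatViewModel.kt; the thresholds are rough guesses.
fun suggestContextLength(context: Context): Int {
    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val memInfo = ActivityManager.MemoryInfo()
    am.getMemoryInfo(memInfo)
    val availMb = memInfo.availMem / (1024 * 1024)
    return when {
        availMb > 6_000 -> 4096 // lots of headroom
        availMb > 3_000 -> 2048 // the default
        else -> 1024            // low-memory devices
    }
}
```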
Model not loading?
- Ensure the file is named exactly `nanomind_model.gguf`
- Check it's in the Downloads folder (not a subfolder)
- Verify storage permissions are granted
- Check logcat: `adb logcat -s NanoMInd`
App crashes on model load?
- Model may be too large for available RAM
- Try a smaller or more quantized model (Q4_K_M recommended)
Slow inference?
- Normal for larger models on mobile devices
- Try a smaller model or higher quantization level
- Ensure your device uses the arm64-v8a architecture (see the quick check below)
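One quick way to confirm the ABI from code, using the standard `Build` API (illustrative snippet, not part of the app):

```kotlin
import android.os.Build

// The bundled llama.cpp native libraries target arm64-v8a, so it should
// appear in the device's list of supported ABIs.
val supportsArm64 = Build.SUPPORTED_ABIS.contains("arm64-v8a")
```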
This project overcame several challenges during development:
- ✅ Kotlin 2.0+ Compose Compiler plugin configuration
- ✅ ContentResolver file access on modern Android
- ✅ FileProvider URI generation for external storage
- ✅ Flow collection lifecycle management for streaming responses (sketched below)
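A hedged sketch of one way that lifecycle management can look inside a ViewModel; `llmFlow`, `LlamaHelper`, and `updateMessage` come from the streaming snippet above, while the `streamJob` handling is illustrative rather than the app's exact code:

```kotlin
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.launch

// Illustrative: keep a handle on the collection Job so a new prompt (or the
// ViewModel being cleared) cancels any in-flight generation instead of leaking collectors.
private var streamJob: Job? = null

fun sendPrompt(prompt: String) {
    streamJob?.cancel()
    streamJob = viewModelScope.launch {
        var accumulatedText = ""
        llmFlow.collect { event ->
            if (event is LlamaHelper.LLMEvent.Ongoing) {
                accumulatedText += event.word
                updateMessage(accumulatedText)
            }
        }
    }
}
```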
See walkthrough.md for detailed implementation notes.
MIT License - feel free to use and modify!
- llama.cpp by Georgi Gerganov
- kotlinllamacpp by ljcamargo
- Built with ❤️ for on-device AI
Contributions are welcome! Feel free to:
- Report bugs
- Suggest features
- Submit pull requests
Note: This is an offline AI assistant. No data leaves your device. All processing happens locally on your phone.