PocketRAG

A fully on-device RAG assistant for Android. Drop in PDFs, .txt, .md, or .docx files, ask questions, get answers — without anything leaving your phone.

No cloud calls. No accounts. No telemetry. The model runs locally on CPU via MediaPipe + LiteRT, embeddings are computed with a local TextEmbedder, and chunks live in an ObjectBox vector DB on disk.

What you can do with it

  • Chat with your documents. Upload one file or a whole pile of them; the app chunks, embeds, and indexes them. Ask questions and you get answers with citations to the source page.
  • Workspaces. Keep work, study, and personal libraries separate — each workspace has its own documents, conversations, and system prompt.
  • Multiple conversations per workspace. Gemini-style left drawer with every past chat, auto-titled from your first question.
  • Pick your model. Built-in catalog of small LLMs (Gemma 3 1B, Gemma 2 2B, Phi-4 Mini, Qwen 2.5, Gemma 3n) — tap to download, tap to switch. You can also paste any MediaPipe .task URL.
  • Prompt presets. Balanced, Strict RAG, Tutor, Summarizer, Code Reviewer — or write your own per-workspace prompt.

Install

Grab the latest APK from Releases and sideload it on an Android 10+ phone with about 6 GB of RAM (the bigger models want more). Open the app, hit Settings → Recommended for your device, and download a model. The Gemma 3 1B (int4) is the fastest place to start.

Build it yourself

git clone https://github.com/SharadhNaidu/PocketRag.git
cd PocketRag
# Point local.properties at your Android SDK
echo "sdk.dir=/path/to/Android/Sdk" > local.properties
./gradlew :app:assembleDebug
adb install -r app/build/outputs/apk/debug/app-debug.apk

Models are distributed via Hugging Face at SharadhNaiduTrains/PocketRag. The mirror script (scripts/upload_models.py) populates that repo from upstream LiteRT sources.

How it works

┌─────────────────────┐    chunks + vectors    ┌──────────────────┐
│  DocumentProcessor  │ ─────────────────────▶ │  ObjectBox HNSW  │
│  (PDF/txt/md/docx)  │                        └──────────────────┘
└─────────────────────┘                                 │
                                                        ▼
                                              ┌──────────────────┐
            user query  ──── TextEmbedder ──▶ │  RetrievalCore   │
                                              │  (semantic + MMR)│
                                              └──────────────────┘
                                                        │ top-K
                                                        ▼
                                              ┌──────────────────┐
                                              │    RagManager    │
                                              │(prompt + history)│
                                              └──────────────────┘
                                                        │
                                                        ▼
                                              ┌──────────────────┐
                                              │  MediaPipe LLM   │
                                              │   (CPU, .task)   │
                                              └──────────────────┘

Chunking is paragraph-aware with overlap. Retrieval blends semantic top-K with keyword fallback and MMR-style diversity, then merges adjacent chunks so the model gets contiguous context. Prompt strategy is intent-aware — summary, reasoning, and extraction queries get different scaffolds.
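
To make the chunking and diversity steps concrete, here is a minimal Python sketch. It is not the app's code: the function names (chunk_paragraphs, mmr_select), the character budget, the one-paragraph overlap, and the lam weighting are all assumptions made for illustration. Keyword fallback and adjacent-chunk merging are left out.

```python
import math


def chunk_paragraphs(text, max_chars=200, overlap=1):
    """Group paragraphs into chunks of roughly max_chars characters.

    The last `overlap` paragraphs of each chunk are repeated at the start
    of the next one (hypothetical defaults, not PocketRAG's settings).
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        size_if_added = sum(len(p) for p in current) + len(para)
        if current and size_if_added > max_chars:
            chunks.append("\n\n".join(current))
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-overlap:] if overlap else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, chunk_vecs, k=3, lam=0.7):
    """Pick up to k chunk indices by maximal marginal relevance.

    Each step takes the chunk with the best trade-off between similarity
    to the query and similarity to chunks that were already selected.
    """
    selected = []
    remaining = list(range(len(chunk_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, chunk_vecs[i])
            redundancy = max(
                (cosine(chunk_vecs[i], chunk_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With lam=1.0 the selection reduces to plain top-K by similarity. Lower values push near-duplicate chunks out in favor of ones that add new context.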

Hardware notes

The default LLM backend is CPU. GPU was tried on Realme/Oplus devices and got reliably killed by Athena's GPU-memory watchdog at the ~3.5 GB cap, so it's pinned off until that's worked around.

License

MIT.

Author

@SharadhNaidu — sole maintainer. PRs welcome once the project stabilizes.
