feat(litert): add LiteRT-LM as second on-device inference engine by dishit-wednesday · Pull Request #360 · off-grid-ai/mobile

dishit-wednesday · 2026-05-16T12:18:46Z

# Add LiteRT as a second on-device inference engine

Adds [LiteRT](https://ai.google.dev/edge/litert) (Google's on-device inference runtime) as a peer to the existing llama.cpp engine. Android-only at ship. The JS layer is built platform-agnostic so the iOS Swift LiteRT SDK can drop in when Google releases it with no code changes on this side.

Engine is decided by file extension — .gguf runs on llama.cpp as before, .litertlm runs on LiteRT. No engine toggle in the UI; it follows the model.
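The extension rule above can be sketched as a small lookup. This is a minimal illustration, not the PR's code: the helper name `engineForFile`, the case-insensitive match, and the `null` result for unknown extensions are all assumptions.

```typescript
// Hypothetical helper: name and return type are illustrative, not taken from the PR.
type Engine = 'llama' | 'litert';

function engineForFile(fileName: string): Engine | null {
  const lower = fileName.toLowerCase(); // assumption: extensions match case-insensitively
  if (lower.endsWith('.gguf')) return 'llama';
  if (lower.endsWith('.litertlm')) return 'litert';
  return null; // unknown extension: the caller decides how to reject it
}
```

Because the engine is derived from the file, no separate engine preference needs to be stored or shown in the UI.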

Why a second engine

LiteRT runs Gemma 4, Gemma 3n, and other LiteRT-LM packaged models with GPU acceleration paths that llama.cpp does not have on Android. CPU remains as the universal fallback.

Both engines stay in the app. Users keep their existing .gguf models and existing chats. Adding a .litertlm model is purely additive.

Architecture

Follows the same shape as the existing llama.cpp / ONNX split — a peer service, not a layer on top of llmService.

src/services/
  llm.ts            ──▶  llama.cpp via llama.rn (existing)
  litert.ts         ──▶  LiteRT via LiteRTModule.kt (new)
  engines.ts        ──▶  getActiveEngineService() router
  liteRTCompaction  ──▶  context summarization for LiteRT

activeModelService/loaders.ts
  doLoadTextModel  ──┬─▶ doLoadLiteRTModel  (model.engine === 'litert')
                     └─▶ llama path         (model.engine === 'llama')

generationServiceHelpers.ts
  generateResponseImpl
     ├─▶ runLiteRTResponseImpl
     └─▶ llama path

android/.../litert/LiteRTModule.kt
  ReactContextBaseJavaModule wrapping
  com.google.ai.edge.litertlm:litertlm-android:0.11.0

Type model

DownloadedModel is a discriminated union over engine:

type DownloadedModel = LlamaDownloadedModel | LiteRTDownloadedModel;

interface LlamaDownloadedModel  { engine: 'llama';  mmProjPath?: string; ... }
interface LiteRTDownloadedModel { engine: 'litert'; liteRTVision: boolean; ... }

Engine-specific fields live on the engine-specific type. Consumers narrow with model.engine === 'llama' and let the compiler enforce correctness. isLlamaModel / isLiteRTModel type guards are exported for tests and selectors.
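A minimal sketch of how the guards and narrowing might look. The `id` field and the `projectorPath` consumer are placeholders added for illustration; only `engine`, `mmProjPath`, `liteRTVision`, and the guard names come from the description above.

```typescript
// Fields other than engine, mmProjPath, and liteRTVision are placeholders for this sketch.
interface LlamaDownloadedModel { engine: 'llama'; id: string; mmProjPath?: string }
interface LiteRTDownloadedModel { engine: 'litert'; id: string; liteRTVision: boolean }
type DownloadedModel = LlamaDownloadedModel | LiteRTDownloadedModel;

const isLlamaModel = (m: DownloadedModel): m is LlamaDownloadedModel => m.engine === 'llama';
const isLiteRTModel = (m: DownloadedModel): m is LiteRTDownloadedModel => m.engine === 'litert';

// Hypothetical consumer: mmProjPath is only readable after narrowing to the llama variant.
function projectorPath(m: DownloadedModel): string | undefined {
  if (isLiteRTModel(m)) return undefined; // LiteRT records carry no mmproj sidecar
  return m.mmProjPath; // m is LlamaDownloadedModel here
}
```

Reading `m.mmProjPath` before the guard would be a compile error, which is the enforcement the paragraph above refers to.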

One downloadedModels collection, one activeModelId. The image-model pattern of separate collections is not used here because only one text model is loaded at a time and most call sites are engine-agnostic.

Settings model

LiteRT-specific settings are flat fields prefixed with liteRT:

| Setting | Type | Default |
| --- | --- | --- |
| `liteRTBackend` | `'gpu' \| 'cpu'` | `'gpu'` |
| `liteRTTemperature` | `number` | `0.7` |
| `liteRTTopP` | `number` | `0.9` |
| `liteRTMaxTokens` | `number` | `4096` |

liteRTMaxTokens maps to LiteRT's EngineConfig.maxNumTokens — one budget covering system prompt, history, input, and output combined. Existing llama settings are untouched. Switching engines does not cross-contaminate preferences.

Core features

Text generation

liteRTService.sendMessage() streams tokens through five event types from the native module:

| Native event | JS callback |
| --- | --- |
| `litert_token` | `onToken` (response stream) |
| `litert_thinking` | `onReasoning` (extended-reasoning stream) |
| `litert_complete` | `onComplete` with `BenchmarkInfo` stats |
| `litert_error` | `onError` |
| `litert_tool_call` | tool invocation request |
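The mapping in the table above could be routed with a single switch over the event type. This is a sketch under assumptions: the payload shapes, the `onToolCall` callback name, and the `routeEvent` helper are hypothetical; only the event names and the first four callback names come from the table.

```typescript
// Payload shapes and onToolCall are assumptions; event names come from the table above.
interface StreamCallbacks {
  onToken(text: string): void;
  onReasoning(text: string): void;
  onComplete(stats: Record<string, number>): void;
  onError(message: string): void;
  onToolCall(name: string, args: unknown): void;
}

type NativeEvent =
  | { type: 'litert_token'; text: string }
  | { type: 'litert_thinking'; text: string }
  | { type: 'litert_complete'; stats: Record<string, number> }
  | { type: 'litert_error'; message: string }
  | { type: 'litert_tool_call'; name: string; args: unknown };

// Dispatch one native event to the matching JS callback.
function routeEvent(event: NativeEvent, cb: StreamCallbacks): void {
  switch (event.type) {
    case 'litert_token': cb.onToken(event.text); break;
    case 'litert_thinking': cb.onReasoning(event.text); break;
    case 'litert_complete': cb.onComplete(event.stats); break;
    case 'litert_error': cb.onError(event.message); break;
    case 'litert_tool_call': cb.onToolCall(event.name, event.args); break;
  }
}
```

Modeling the events as a discriminated union lets the compiler flag any event type that is added later without a matching case.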

The native Conversation object holds turn history internally. JS only sends the current user turn after the conversation is set up, avoiding re-prefilling the entire chat on every message.

prepareConversation only triggers a native resetConversation when activeConversationId, activeSystemPrompt, or activeToolsJson changes. Otherwise it reuses the live native session.
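The reuse-or-reset decision described above reduces to comparing three values. A minimal sketch, assuming a hypothetical `ConversationKey` shape and `needsNativeReset` helper; only the three field names are taken from the description.

```typescript
// Hypothetical state shape; only the three field names come from the description above.
interface ConversationKey {
  activeConversationId: string | null;
  activeSystemPrompt: string;
  activeToolsJson: string;
}

// Returns true when the live native session can no longer be reused.
function needsNativeReset(current: ConversationKey | null, next: ConversationKey): boolean {
  if (current === null) return true; // nothing is live yet
  return (
    current.activeConversationId !== next.activeConversationId ||
    current.activeSystemPrompt !== next.activeSystemPrompt ||
    current.activeToolsJson !== next.activeToolsJson
  );
}
```

Comparing the serialized tools JSON as a string keeps the check cheap, at the cost of treating a reordered but equivalent tool list as a change.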

After load on GPU, the engine runs a one-token warmup against an empty prompt to prime the shader/kernel cache and avoid first-compile latency on the first real prompt.

Thinking

Gemma 4 models need a <|think|> token prepended to the system prompt to activate extended reasoning. applyGemma4ThinkToken handles this for both engines:

applyGemma4ThinkToken(prompt, isRemote, { isLiteRT, thinkingEnabled })

Thinking tokens stream into a separate channel (litert_thinking native / onReasoning JS). The chat UI renders them in a collapsible "Thought process" section above the response. Disabling the thinking toggle removes the prefix and the model produces direct answers only.
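As a rough illustration of the prefixing step only, the sketch below prepends the token when thinking is enabled. It is not the PR's `applyGemma4ThinkToken`: the helper name `prependThinkToken`, the absence of a separator, and the idempotence check are assumptions.

```typescript
const THINK_TOKEN = '<|think|>';

// Hypothetical simplification of the behavior described above.
function prependThinkToken(systemPrompt: string, thinkingEnabled: boolean): string {
  if (!thinkingEnabled) return systemPrompt; // toggle off: prompt is left untouched
  if (systemPrompt.startsWith(THINK_TOKEN)) return systemPrompt; // avoid a double prefix
  return THINK_TOKEN + systemPrompt; // assumption: no separator between token and prompt
}
```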

Context compaction

LiteRT models load with a fixed maxNumTokens budget. The service tracks cumulative tokens and auto-compacts at 65% of the budget:

| Situation | Strategy |
| --- | --- |
| Active session loaded | Ask the model to summarize itself in 3-5 sentences, then reset with `[summary, recent turns]` |
| First load, no session | Slice the oldest history (cannot summarize what is not loaded) |

Summarization runs inside the existing KV cache while ~35% headroom remains, then a single reset rebuilds the conversation with the compacted history. Recent-turn selection keeps the last 40% of context by char estimate, with a minimum of two turns. A 20-second timeout falls back to slice-only if summarization stalls.

Implementation: src/services/liteRTCompaction.ts — pure function, takes history + maxTokens + cumulativeTokens, returns the reset call to make.
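A pure planner along these lines might look like the sketch below. It is not the contents of liteRTCompaction.ts: the type names, the four-characters-per-token estimate, the `sessionLoaded` parameter, and the return shape are assumptions. The 65% trigger, the 40% recent-turn share, and the two-turn minimum come from the description above.

```typescript
// Sketch only: names, the chars-per-token estimate, and the return shape are assumptions.
interface Turn { role: 'user' | 'assistant'; text: string }

type CompactionPlan =
  | { kind: 'none' }
  | { kind: 'summarize'; recent: Turn[] } // live session: summarize, then reset
  | { kind: 'slice'; recent: Turn[] };    // no live session: drop the oldest turns

const COMPACT_AT = 0.65;   // compact at 65% of the budget
const KEEP_RECENT = 0.4;   // keep roughly the last 40% of context
const CHARS_PER_TOKEN = 4; // rough char-based estimate (assumption)

// Walk backwards from the newest turn, always keeping at least two turns.
function selectRecentTurns(history: Turn[], maxTokens: number): Turn[] {
  const charBudget = maxTokens * KEEP_RECENT * CHARS_PER_TOKEN;
  const recent: Turn[] = [];
  let used = 0;
  for (const turn of [...history].reverse()) {
    if (recent.length >= 2 && used + turn.text.length > charBudget) break;
    recent.unshift(turn);
    used += turn.text.length;
  }
  return recent;
}

function planCompaction(
  history: Turn[],
  maxTokens: number,
  cumulativeTokens: number,
  sessionLoaded: boolean,
): CompactionPlan {
  if (cumulativeTokens < maxTokens * COMPACT_AT) return { kind: 'none' };
  const recent = selectRecentTurns(history, maxTokens);
  return sessionLoaded ? { kind: 'summarize', recent } : { kind: 'slice', recent };
}
```

Keeping the planner free of native calls means the threshold and turn-selection logic can be unit tested without a loaded model; the 20-second summarization timeout would live in the caller.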

Tools

Tool calling is handled natively by the LiteRT SDK. Tools are passed in as JSON at conversation reset time. When the model emits a tool call, the SDK fires litert_tool_call to JS with the tool name and arguments. JS executes the tool and calls respondToToolCall(id, result) back to the native side. The SDK feeds the result into the conversation and continues generation internally — JS never sees the second prefill.

Three text-based parsing fallbacks exist for models that do not use the SDK's tool format (Gemma 4 emits a non-standard <|tool_call> syntax). These parse the response text after generation and execute tools through the same path.

Vision

LiteRT supports multimodal models (Gemma 3n E2B / E4B variants) with a GPU-accelerated vision encoder. The liteRTVision boolean on the model record gates this:

- At import time, a dialog asks "Text Only / Vision" for any .litertlm file
- If liteRTVision === true, the engine loads with visionBackend = Backend.GPU()
- The image-attach button is shown only when the active model supports vision
- Attaching an image to a non-vision LiteRT model is blocked with a clear error rather than silently dropping the image

The dialog is needed because LiteRT does not expose introspection for vision support on a loaded model.
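The gating rules above can be sketched as two small checks. The helper names, the `name` field, and the error wording are hypothetical; `liteRTVision` is the flag from the model record.

```typescript
// Hypothetical helpers; liteRTVision is the field described above, the rest is illustrative.
interface LiteRTModelRecord { engine: 'litert'; name: string; liteRTVision: boolean }

// Drives whether the image-attach button is shown.
function canAttachImage(model: LiteRTModelRecord): boolean {
  return model.liteRTVision === true;
}

// Fails loudly instead of silently dropping images on a text-only model.
function assertImageAllowed(model: LiteRTModelRecord, imageCount: number): void {
  if (imageCount > 0 && !canAttachImage(model)) {
    throw new Error(`${model.name} was imported as text-only and cannot accept images`);
  }
}
```

Deriving both the button visibility and the send-time check from the same flag avoids the case where the UI offers an attachment that the engine cannot process.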

Backend selection and fallback

| Backend | Notes |
| --- | --- |
| GPU (OpenCL via Adreno) | Works on most modern Android devices |
| CPU | Universal fallback |

The native module does a two-tier fallback at load time with per-tier timeouts (GPU 20s, CPU 15s). The actually-loaded backend is reported back via getActiveBackend() so the UI can show "Requested GPU, running on CPU" if fallback occurs.

Model management

Import flow

Models screen → Import → pick a .litertlm file
             ↓
Vision support dialog (only for .litertlm)
             ↓
File copy to app documents
             ↓
DownloadedModel record created with engine: 'litert'

No LiteRT model catalog or HuggingFace download flow in this PR — users sideload .litertlm files. Curated downloads will come in a follow-up.

Recommended models for sideload:

- google/gemma-3n-E2B-it-litert-lm: vision-capable, smaller
- litert-community/gemma-4-E2B-it-litert-lm: text-only, larger

Settings UI

ModelSettingsScreen and GenerationSettingsModal render one of two sibling components based on the active model:

isLiteRT ? <LiteRTTextSettings /> : <LlamaTextSettings />

LiteRTTextSettings shows: Temperature, Max Tokens, Top P (advanced), Acceleration (GPU/CPU), Show Generation Details, Thinking toggle.

LlamaTextSettings shows: Temperature, Max Tokens, Context Length, Top P, Repeat Penalty, CPU Threads, Batch Size, Backend, Flash Attention, KV Cache Type, Model Loading Strategy.

LiteRT hides controls that do not map to its SDK (no flash attention, no GPU layer count, no manual thread count).

Settings that bake into the engine at load time (backend, maxTokens) trigger a "settings changed, reload required" banner. Settings the engine reads per-generation (temperature, top-P) take effect on the next reset.
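One way to express the banner rule is to compare only the load-time fields between the settings the engine was loaded with and the current ones. A sketch under assumptions: the `reloadRequired` helper and the `LOAD_TIME_KEYS` list are hypothetical; the field names and defaults follow the settings table earlier in this description.

```typescript
interface LiteRTSettings {
  liteRTBackend: 'gpu' | 'cpu';
  liteRTTemperature: number;
  liteRTTopP: number;
  liteRTMaxTokens: number;
}

// Settings baked into the engine at load time; everything else is read per generation.
const LOAD_TIME_KEYS = ['liteRTBackend', 'liteRTMaxTokens'] as const;

// True when the "reload required" banner should be shown.
function reloadRequired(loadedWith: LiteRTSettings, current: LiteRTSettings): boolean {
  return LOAD_TIME_KEYS.some((key) => loadedWith[key] !== current[key]);
}
```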

Memory budget

The memory check at model load now sums RAM across both engines plus any loaded image model. Loading a LiteRT model while llama is resident unloads llama first, and vice versa. Loading a LiteRT model alongside an image model checks both budgets to prevent OOM.

Build infrastructure

Kotlin 2.2.0

The LiteRT SDK requires Kotlin 2.x. kotlinVersion is bumped to 2.2.0 in android/build.gradle along with the Kotlin Gradle plugin classpath. This unlocks the K2 compiler — most modules compile faster and produce slightly smaller bytecode.

`react-native-gesture-handler` patch

Kotlin 2.2's K2 compiler tightened smart-cast inference inside lambdas when the receiver is a property. react-native-gesture-handler@2.30.0 relies on pre-K2 behavior in findRootHelperForViewAncestor:

// Before (fails K2 — smart cast doesn't carry across && for property receivers)
it.rootView is ReactRootView && it.rootView.rootViewTag == rootViewTag

// After (patches/react-native-gesture-handler+2.30.0.patch)
it.rootView is ReactRootView && (it.rootView as ReactRootView).getRootViewTag() == rootViewTag

Same JVM bytecode, same runtime behavior. Applied via patch-package in postinstall. The patch is pinned to 2.30.0 and should be deleted once gesture-handler ships a K2-compatible version upstream.

Native module

android/app/src/main/java/ai/offgridmobile/litert/LiteRTModule.kt registers as a standard ReactContextBaseJavaModule:

| Method | What it does |
| --- | --- |
| `loadModel` | Two-tier backend fallback init |
| `resetConversation` | Close current, create new with system + history + tools |
| `sendMessage` | Send current turn, stream tokens via events |
| `respondToToolCall` | Feed tool result back to the SDK |
| `stopGeneration` | Cancel current generation, null active session |
| `unloadModel` | Close `Conversation`, then `Engine`, in that order |
| `getMemoryInfo` | Native RAM stats for `DeviceStatsChip` |

OpenCL opted in via AndroidManifest:

<uses-native-library android:name="libOpenCL.so" android:required="false" />

Platform readiness

liteRTService detects availability by native module presence (!!NativeModules.LiteRTModule), not Platform.OS === 'android'. When the iOS Swift LiteRT SDK is released and registered under the same JS name, the JS service starts working on iOS with no code changes here.
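The presence check can be illustrated without a React Native runtime by passing the module registry in as a parameter. In the app this would be React Native's `NativeModules`; the `isLiteRTAvailable` helper and its parameter are assumptions made so the sketch runs standalone.

```typescript
// nativeModules stands in for React Native's NativeModules so the sketch runs outside an app.
function isLiteRTAvailable(nativeModules: Record<string, unknown>): boolean {
  // Presence of the module, not the platform name, decides availability.
  return !!nativeModules['LiteRTModule'];
}
```

Any platform that registers a module under the same name passes the check, which is why no `Platform.OS` branch is needed.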

Behavior summary

| Action | Result |
| --- | --- |
| Tap a `.gguf` model | Loads on llama.cpp (existing) |
| Tap a `.litertlm` model | Loads on LiteRT, settings UI swaps |
| Change LiteRT backend while loaded | Reload banner appears |
| Change llama backend while LiteRT loaded | No banner (independent settings) |
| Stop mid-generation | Cancels, next turn safely restarts |
| Send an image to a non-vision LiteRT model | Clean error, no silent drop |
| Conversation context hits 65% of budget | Auto-summarizes and resets |
| App backgrounded with LiteRT loaded, then resumed | State syncs correctly |
| iOS user opens the app | Llama path unchanged, no LiteRT UI |

Test plan

- Add LiteRTModule.kt: native Android module managing Engine/Conversation lifecycle with NPU→GPU→CPU fallback chain and image decode pipeline
- Add LiteRTPackage.kt and register in MainApplication
- Add LiteRTService.ts: JS bridge with streaming token events
- Wire generation routing in generationServiceHelpers (litert vs llama.cpp)
- Add doLoadLiteRTModel in activeModelService loaders
- Add .litertlm import support with per-model vision toggle dialog
- Add liteRTVision and engine fields to DownloadedModel type
- Add persistent debug logs store (AsyncStorage-backed, survives crashes)
- Add DebugLogsScreen modal accessible from ChatHeader terminal icon
- Upgrade litertlm-android 0.10.0→0.11.0, Kotlin 2.1.20→2.2.0, kapt→ksp
- Fix SIGSEGV: gate visionBackend=GPU behind supportsVision flag
- Fix double load: check liteRTService.isModelLoaded() before triggering load
- Fix reload loop: skip hasPendingSettings and handleReloadTextModel for litert
- Add LITERT_TODO.md with full production readiness backlog
- Fix lint errors and update modelManager tests for .litertlm support

Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

gemini-code-assist

Code Review

This pull request introduces LiteRT-LM on-device inference support for Android, featuring a new native module, JS bridge, and debug screen. It also migrates the build system to KSP and updates Kotlin and Gradle versions. Review feedback points out a hardcoded local Java path in gradle.properties that breaks CI, a resource leak in the native engine initialization fallback logic, and several instances where vision support is incorrectly hardcoded to true for all LiteRT models instead of respecting the specific model configuration.

gemini-code-assist · 2026-05-16T12:21:49Z

 # The setting is particularly useful for tweaking memory settings.
 # Default value: -Xmx512m -XX:MaxMetaspaceSize=256m
 org.gradle.jvmargs=-Xmx2048m -XX:MaxMetaspaceSize=512m
+org.gradle.java.home=/Library/Java/JavaVirtualMachines/temurin-21.jdk/Contents/Home


The org.gradle.java.home property is hardcoded to a local path on your machine. This will break the build for other developers and on CI environments. As noted in docs/LITERT_TODO.md, this should be removed before merging.

gemini-code-assist · 2026-05-16T12:21:49Z

+        for (backend in chain) {
+            val name = backendName(backend)
+            Log.i(TAG, "initializeWithFallback — trying $name vision=$visionEnabled")
+            try {
+                val cfg = EngineConfig(
+                    modelPath = modelPath,
+                    backend = backend,
+                    cacheDir = null,
+                    visionBackend = if (visionEnabled) Backend.GPU() else null,
+                )
+                val eng = Engine(cfg)
+                val timeoutMs = when (backend) {
+                    is Backend.NPU -> NPU_TIMEOUT_MS
+                    is Backend.GPU -> GPU_TIMEOUT_MS
+                    else           -> CPU_TIMEOUT_MS
+                }
+                withTimeout(timeoutMs) {
+                    eng.initialize()
+                }
+                engine = eng
+                Log.i(TAG, "initializeWithFallback — $name succeeded")
+                return backend
+            } catch (e: Exception) {
+                Log.w(TAG, "initializeWithFallback — $name failed: ${e.message}")
+                engine?.close()
+                engine = null
+                lastError = e
+                if (backend == chain.last()) break
+                Log.i(TAG, "initializeWithFallback — falling back to next tier")
+            }
+        }


There is a resource leak in the fallback chain. If eng.initialize() fails or times out, the local eng instance is never closed. The catch block calls engine?.close(), but engine (the class property) is still null at that point because the assignment at line 112 is never reached. You should declare eng outside the try block and ensure it is closed on failure.

for (backend in chain) {
    val name = backendName(backend)
    Log.i(TAG, "initializeWithFallback — trying $name vision=$visionEnabled")
    var eng: Engine? = null
    try {
        val cfg = EngineConfig(
            modelPath = modelPath,
            backend = backend,
            cacheDir = null,
            visionBackend = if (visionEnabled) Backend.GPU() else null,
        )
        eng = Engine(cfg)
        val timeoutMs = when (backend) {
            is Backend.NPU -> NPU_TIMEOUT_MS
            is Backend.GPU -> GPU_TIMEOUT_MS
            else -> CPU_TIMEOUT_MS
        }
        withTimeout(timeoutMs) {
            eng.initialize()
        }
        engine = eng
        Log.i(TAG, "initializeWithFallback — $name succeeded")
        return backend
    } catch (e: Exception) {
        Log.w(TAG, "initializeWithFallback — $name failed: ${e.message}")
        eng?.close()
        lastError = e
        if (backend == chain.last()) break
        Log.i(TAG, "initializeWithFallback — falling back to next tier")
    }
}

References

When a process with an iteration limit exceeds that limit, treat it as a failure and invoke the same fallback logic used for other exceptions.

Avoid code duplication by refactoring common operations into a single call, especially when they appear in multiple exit paths of a function.

gemini-code-assist · 2026-05-16T12:21:49Z

    await activeModelService.loadTextModel(activeModelId);
    const multimodalSupport = llmService.getMultimodalSupport();
-    deps.setSupportsVision(multimodalSupport?.vision || false);
+    deps.setSupportsVision(activeModel.engine === 'litert' ? true : (multimodalSupport?.vision || false));


For LiteRT models, supportsVision is hardcoded to true. This will enable vision UI elements (like the image attachment button) even for text-only LiteRT models. It should instead respect the liteRTVision flag stored in the model record.

Suggested change

-    deps.setSupportsVision(activeModel.engine === 'litert' ? true : (multimodalSupport?.vision || false));
+    deps.setSupportsVision(activeModel.engine === 'litert' ? !!activeModel.liteRTVision : (multimodalSupport?.vision || false));

References

Vision-language models should be specifically categorized with type 'vision' to ensure correct UI behavior.

gemini-code-assist · 2026-05-16T12:21:50Z

    await activeModelService.loadTextModel(activeModelId);
    const multimodalSupport = llmService.getMultimodalSupport();
-    deps.setSupportsVision(multimodalSupport?.vision || false);
+    deps.setSupportsVision(activeModel.engine === 'litert' ? true : (multimodalSupport?.vision || false));


Similar to the previous comment, supportsVision should respect the liteRTVision flag from the model record for LiteRT models.

Suggested change

-    deps.setSupportsVision(activeModel.engine === 'litert' ? true : (multimodalSupport?.vision || false));
+    deps.setSupportsVision(activeModel.engine === 'litert' ? !!activeModel.liteRTVision : (multimodalSupport?.vision || false));

References

Vision-language models should be specifically categorized with type 'vision' to ensure correct UI behavior.

gemini-code-assist · 2026-05-16T12:21:50Z

+      deps.setSupportsVision(true);
+      return;
+    }
+    dbg('log', `[LiteRT] ensureModelLoaded — model=${activeModel.name}, triggering load`);
+    deps.setSupportsVision(true);


Hardcoding supportsVision to true here will cause the UI to incorrectly show vision capabilities for text-only LiteRT models. Please use the liteRTVision property from the activeModel.

Suggested change

-      deps.setSupportsVision(true);
+      deps.setSupportsVision(!!activeModel.liteRTVision);
       return;
     }
     dbg('log', `[LiteRT] ensureModelLoaded — model=${activeModel.name}, triggering load`);
-    deps.setSupportsVision(true);
+    deps.setSupportsVision(!!activeModel.liteRTVision);

References

Vision-language models should be specifically categorized with type 'vision' to ensure correct UI behavior.

gemini-code-assist · 2026-05-16T12:21:50Z

    await activeModelService.loadTextModel(model.id);
    const multimodalSupport = llmService.getMultimodalSupport();
-    deps.setSupportsVision(multimodalSupport?.vision || false);
+    deps.setSupportsVision(model.engine === 'litert' ? true : (multimodalSupport?.vision || false));


Ensure supportsVision is set based on the model's actual capabilities rather than hardcoding it to true for all LiteRT models.

Suggested change

-    deps.setSupportsVision(model.engine === 'litert' ? true : (multimodalSupport?.vision || false));
+    deps.setSupportsVision(model.engine === 'litert' ? !!model.liteRTVision : (multimodalSupport?.vision || false));

References

Vision-language models should be specifically categorized with type 'vision' to ensure correct UI behavior.

gemini-code-assist · 2026-05-16T12:21:50Z

    if (activeModelInfo.isRemote) {
      setSupportsVision(activeRemoteModel?.capabilities?.supportsVision ?? false);
+    } else if (activeModel?.engine === 'litert') {
+      setSupportsVision(true);


The supportsVision state should be derived from the liteRTVision flag in the model record to avoid misleading the user with vision UI on text-only models.

Suggested change

-      setSupportsVision(true);
+      setSupportsVision(!!activeModel.liteRTVision);

References

Vision-language models should be specifically categorized with type 'vision' to ensure correct UI behavior.

…/NPU warmup getBenchmarkInfo() requires internal BenchmarkParams not exposed in the public API. Track TTFT, decode tok/s, and token count via wall-clock timers in JS instead. Add model warmup after GPU/NPU load to prime shader caches. Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

- Fix regeneration for LiteRT: use ensureModelReady instead of bare llmService.isModelLoaded() check which always returns false for LiteRT
- Invalidate native conversation before regenerate/edit so native history is correctly rewound to match the JS message array
- Fix context loss after stopGeneration: remove activeConversationId=null which was wiping native turn history on every stop
- Add invalidateConversation() to LiteRTService for explicit resets
- Extend tool call parser to handle: no-args calls, Gemma function-call style args NAME({"k":"v"}), and </tool_call> closing tag variant
- Fix Gemma native parser regex to accept both <tool_call|> and </tool_call> as closing tags
- GPU retry logic in LiteRTModule: retry non-CPU backends up to 3 times with 600ms backoff before falling back, handles transient VRAM pressure after model switches
- Capture benchmark stats from generateRaw path for generation meta display
- Raise debug log capacity from 200 to 2000 entries

Co-Authored-By: Dishit Karia <hamadishit74@gmail.com>

…ore selector Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

…extend reload detection to context length Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

Tapping the input shrank the FlatList viewport without repositioning the scroll, leaving the last AI message hidden behind the keyboard. Track height changes via onLayout and scroll to end when the viewport shrinks. Add a keyboardWillShow/keyboardDidShow listener as a secondary trigger for iOS. Co-Authored-By: Dishit Karia hanmadishit74@gmail.com

…imeout
- Fix Gemma tool call parsing to handle the "tool_name{json}" body pattern alongside the existing key:value format; add key validation so non-word strings are not treated as argument keys
- Pass temperature/topK/topP through prepareConversation in the tool loop so generation settings are respected during tool-call turns
- Unify model init timeout to 90s across all backends (was 45/20/15s) to prevent premature timeout failures on slower devices
- Add debugLog helper in LiteRTModule that emits litert_debug_log events to the in-app debug screen alongside logcat

Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

…ter from history Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

… settings UI Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

Split DownloadedModel into LlamaDownloadedModel | LiteRTDownloadedModel with engine as required discriminant. Legacy records without an engine field are backfilled to 'llama' on load from AsyncStorage. All call sites that touch llama-only fields (mmProjPath, mmProjFileSize, isVisionModel) now narrow via engine === 'llama' guards before access, removing the previous implicit assumption that every model had those fields. Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>
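The backfill step described in this commit could be sketched as a map over stored records. The `StoredRecord` and `MigratedRecord` shapes and the `backfillEngine` name are hypothetical; only the rule (a missing engine becomes 'llama') comes from the commit message.

```typescript
// Hypothetical record shapes; real records carry many more fields.
interface StoredRecord { id: string; engine?: 'llama' | 'litert' }
interface MigratedRecord { id: string; engine: 'llama' | 'litert' }

// Legacy records predate the engine field, so they can only be llama models.
function backfillEngine(records: StoredRecord[]): MigratedRecord[] {
  return records.map((r) => ({ ...r, engine: r.engine ?? 'llama' }));
}
```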

Add liteRTTemperature, liteRTTopP, liteRTContextLength, and liteRTMaxOutputTokens to AppSettings so LiteRT and llama no longer share a single contextLength/temperature/topP field with different semantics per engine. LiteRT generation paths now read liteRT* fields. Llama paths are unchanged. Migration seeds the new fields from existing shared values on first upgrade so user preferences carry over. The pending-reload banner check for LiteRT now watches liteRTContextLength instead of the shared contextLength. Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

Show amber dot on chat settings gear icon and amber tool icon/badge in the quick settings popover when more than 3 tools are active. ToolPickerSheet shows a one-time dismissable banner explaining the latency impact. Dismissed state is persisted — never shown again once acknowledged. Also sets 3-tool default for new users and adds hint copy to the bottom of the tool picker. Co-Authored-By: Dishit Karia hanmadishit74@gmail.com

…d LiteRT
- Revert applicationId, versionCode, versionName to match main
- Revert app_name to "Off Grid"
- Revert JVM heap args to match main
- Migrate kapt → KSP for Room compiler (required for Kotlin 2.2.0)
- Bump Kotlin 2.1.20 → 2.2.0, pin AGP to 8.8.2
- Add KSP plugin at 2.2.0-2.0.2
- Add LiteRT dependency: litertlm-android:0.11.0

Extract color literals to constants, move inline styles to stylesheets, remove unused AVAILABLE_TOOLS imports from Popovers and ChatInput.

… dependency in ChatInput ChatInput now receives showSettingsDot as a prop from ChatMessageArea, keeping the store read out of ChatInput and fixing test failures caused by unmocked useAppStore in existing ChatInput tests.

…ile overhead Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

and remove excessive comments Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

Wrap the settings icon in a relative-positioned View so the dot is anchored to the icon bounds (18px) not the button bounds (32px).

Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

Co-Authored-By: Dishit Karia hanmadishit74@gmail.com

LiteRT runs the tool loop natively via automaticToolCalling, so the JS MAX_TOTAL_TOOL_CALLS cap never applied to it — a single message could trigger unbounded tool calls and overflow the ~4096-token KV cache mid-turn, producing degenerate output or crashing. Add a per-turn counter in buildLiteRTToolCallHandler: calls 1-3 run normally; the 4th+ skips execution and returns a 'stop, answer now' nudge to the model. Counter resets each turn (closure rebuilt per generation). Loop stays native. Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>
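The per-turn cap can be illustrated with a closure-held counter. This is a sketch, not the PR's `buildLiteRTToolCallHandler`: the synchronous `ToolExecutor` type, the `buildCappedToolHandler` name, and the nudge wording are assumptions; the limit of three calls comes from the commit message.

```typescript
const MAX_NATIVE_TOOL_CALLS = 3; // calls 1-3 run; the 4th and later are skipped

// Simplified synchronous executor for illustration (assumption).
type ToolExecutor = (name: string, args: unknown) => string;

// Hypothetical factory: build one handler per generation so the counter resets each turn.
function buildCappedToolHandler(execute: ToolExecutor): ToolExecutor {
  let callsThisTurn = 0; // closure state, discarded with the handler
  return (name, args) => {
    callsThisTurn += 1;
    if (callsThisTurn > MAX_NATIVE_TOOL_CALLS) {
      // Skip execution and nudge the model to answer instead.
      return 'Tool limit reached. Stop calling tools and answer now.';
    }
    return execute(name, args);
  };
}
```

Returning a nudge as the tool result, rather than throwing, keeps the native loop running so the model can still produce a final answer.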

…ume retry When a download fails on the Models screen, the card now renders the error message, a red partial-progress bar, and Retry / Remove buttons directly inside the card boundary — matching the Download Manager UI. Tapping Retry calls backgroundDownloadService.retryDownload with the existing download ID so the native WorkManager resumes from the partial file via HTTP Range instead of starting a fresh download from 0. Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

Remove the two logs that fire in tight loops:
- llm.ts: reasoning_content chunk received (fired on every thinking token, with O(N²) string work serializing accumulated text each call)
- useDownloads.ts: mmproj progress and missed-entry debug logs (fired every 1.5s during download and on every progress event miss)

All other diagnostic logs (model load, download lifecycle, tool calls) are untouched; they fire once per user action and are useful for diagnosing real issues.

Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

ChatsListScreen was subscribing to the entire chatStore and appStore with no selectors, causing it to re-render on every streaming token while mounted in the tab navigator. Actions moved to getState() and data fields use targeted selectors. Adds an informational banner above the chat input when a llama model is loaded with OpenCL selected as the inference backend, nudging users to switch to CPU in Settings. Does not show for LiteRT models or remote models. Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

… StyleSheet Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

…stener split

useDownloads.ts: remove useDownloadListeners() call; now fully independent.

App.tsx: mount useDownloadListeners() directly at root so listener registration is not lost after the split.

TextModelsTab handleRetryDownload:
- Android-only guard; iOS falls back to proceedDownload (fresh download)
- mmproj sidecar retry: set pending before retry, only call resetMmProjForRetry if native retry succeeded, set failed on error. Matches retryAndroidDownload in useDownloadManager exactly, which prevents silent vision loss on retry from the Models screen.
- onRetry branches on Platform.OS
- Use storeDownloads selector instead of getState() snapshot for storeEntry

Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

Co-Authored-By: Dishit Karia hanmadishit74@gmail.com

…callback Replace captured store.downloadIdIndex snapshot with a live getState() call inside the async callback, matching the pattern in reattachRetriedTextDownload in useDownloadManager.ts. Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

- ModelCard: 8 new tests covering failedState / FailedSection (new inline retry UI from fix/69d17d28)
- generationToolLoop: 4 new tests for LiteRT native tool-call cap introduced in fix/73f85ff8; verifies cap at 3, Aborted fast-path, and per-generation counter reset
- activeModelService loaders: fix stale-path test (add isVisionModel:true), add guard tests for text-only model and mmProjFileName repair sentinel
- scan.test.ts (new): unit tests for extractBaseName and findMatchingMmProj, plus curatedLiteRTRegistry entry lookup
- visionRepair: 3 additional branch tests (name-lookup false path, catalog-no-mmproj path, fileName vl-detection path)

Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

…e ID

- Create shared test utilities: mocks.ts with AsyncStorage, logger, whisper service, and HTTP client factories
- Add store-specific reset helpers (resetDownloadStore, resetRemoteServerStore, resetWhisperStore, etc.)
- Add act() wrapper utilities (actStoreUpdate, actAsyncStoreUpdate) to reduce boilerplate
- Refactor remoteServerStore.test.ts to use shared actStoreUpdate() instead of 50+ act() calls
- Refactor whisperStore.test.ts to use resetWhisperStore() from shared utilities
- Change litert bundle ID from ai.offgridmobile to ai.offgridmobile.litert (allows side-by-side install with Play Store version)

Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

Improve perf

sonarqubecloud · 2026-06-03T10:58:45Z

Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

greptile-apps Bot reviewed May 16, 2026

gemini-code-assist Bot reviewed May 16, 2026

dishit-wednesday and others added 27 commits May 19, 2026 15:08

fix(android): add BenchmarkInfo stats, getMemoryInfo, and sampler par…

dcf40c8

…ams to LiteRTModule Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

fix(litert): production fixes - stopGeneration, multi-turn tracking, …

a8aa88b

…memory budget, BenchmarkInfo wiring Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

feat(ui): DeviceStatsChip, hide irrelevant LiteRT settings, backend r…

85f887d

…eload trigger, iOS guard Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

test: add LiteRT service tests and improve existing coverage

d560373

Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

feat(litert): expose full BenchmarkInfo - prefill speed, TTFT, decode…

6265011

… tps, init time in generation details Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

feat(litert): pass maxNumTokens to engine, scale init timeout by cont…

7fa4f29

…ext size, and wire tool call event bridge Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

feat(litert): filter settings UI per engine and add selectIsLiteRT st…

491746e

…ore selector Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

fix(litert): fix read_url colon-arg parsing, track context fill, and …

5b0169c

…extend reload detection to context length Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

fix: resolve lint and type errors blocking push

cad3920

fix(litert): fix engine variable scoping in initializeWithFallback

2da411a

Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

fix(litert): pass image URI through tool loop and generation pipeline

c57a29c

Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

fix(litert): stream thinking tokens incrementally instead of all at once

a33f67e

Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

fix(litert): enable thinking toggle and deduplicate tool text hint

4921b41

Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

fix(tools): update web_search and read_url descriptions for chained use

98b7ab8

Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

feat(litert): add RAM-based context slider limits for LiteRT models

4d5ef28

Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

feat(litert): lower auto-compact threshold to 65% and seed token coun…

8e9605a

…ter from history Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

feat(litert): add liteRTBackend setting defaulting to gpu

b0c7d96

Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

fix(litert): add display branch debug logs to getDisplayMessages

8934c29

Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

fix(litert): detect native module by existence, not Platform.OS

ff2a4e0

Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

feat(litert): wire liteRTBackend to loader, pending-reload check, and…

6b55efb

… settings UI Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

feat(litert): add getActiveEngineService helper

dbafafa

Co-authored-by: Dishit Karia <hanmadishit74@gmail.com>

dishit-wednesday and others added 28 commits May 27, 2026 12:45

fix(lint): fix lint errors in tool warning UI components

67adafd

Extract color literals to constants, move inline styles to stylesheets, remove unused AVAILABLE_TOOLS imports from Popovers and ChatInput.

fix(ios): hide LiteRT recommended card on iOS

aacc410

chore(android): increase JVM Metaspace limit to 1024m for LiteRT comp…

ecd3bf6

…ile overhead Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

fix(settings): default thinking to off

850dff1

and remove excessive comments Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

chore:delete unwanted files nd comments

deb6b1f

fix(tools): fix amber dot position on settings gear icon

019a6ae

Wrap the settings icon in a relative-positioned View so the dot is anchored to the icon bounds (18px) not the button bounds (32px).

feat(perf): enable React 19 compiler

64ae219

Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

feat(tools): on-device HTML parsing for read_url, improve tool prompts

3009df8

Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

test: disable React Compiler for Jest

5ffdd19

Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

fix(lint): remove unused colors variable and extract inline styles to…

b60f24f

… StyleSheet Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

fix tests

5ed5743

Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>

add one more test

3dca3df

fix lint

ff16147

fix test

1a80c41

fix test

e9ec787

fix sonar errors

1715831

Merge pull request #381 from alichherawalla/improve-perf

ff16d3e

Improve perf

dishit-wednesday merged commit a7bc414 into main Jun 3, 2026
5 of 7 checks passed
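One commit in the timeline above, "detect native module by existence, not Platform.OS", fits the stated goal of letting an iOS LiteRT SDK drop in later without JS changes. A minimal sketch of that kind of check follows. The registry type and the module key `LiteRTModule` are assumptions: the key is inferred from the Kotlin file name, and the real check in `litert.ts` is not shown on this page.

```typescript
// Hypothetical stand-in for a native-module registry such as React Native's NativeModules.
type NativeRegistry = Record<string, object | undefined>;

// Feature detection: the engine is usable wherever the native module is registered,
// regardless of which platform the app is running on.
function isLiteRTAvailable(nativeModules: NativeRegistry): boolean {
  return nativeModules['LiteRTModule'] != null;
}

// Illustrative registries for two builds.
const buildWithLiteRT: NativeRegistry = { LiteRTModule: {} };
const buildWithoutLiteRT: NativeRegistry = {};

const availableWithModule = isLiteRTAvailable(buildWithLiteRT);
const availableWithoutModule = isLiteRTAvailable(buildWithoutLiteRT);
```

Compared with a platform check, this form needs no edit when a new platform starts shipping the module; it simply begins returning `true` there.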

dishit-wednesday merged 95 commits into main from litertsupport

1 participant

	deps.setSupportsVision(activeModel.engine === 'litert' ? true : (multimodalSupport?.vision || false));
	deps.setSupportsVision(activeModel.engine === 'litert' ? !!activeModel.liteRTVision : (multimodalSupport?.vision || false));

	deps.setSupportsVision(model.engine === 'litert' ? true : (multimodalSupport?.vision || false));
	deps.setSupportsVision(model.engine === 'litert' ? !!model.liteRTVision : (multimodalSupport?.vision || false));

	setSupportsVision(true);
	setSupportsVision(!!activeModel.liteRTVision);
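The paired lines above replace a hard-coded `true` for LiteRT models with the model's own `liteRTVision` flag. A self-contained sketch of that decision, using the discriminated union from the PR description, might look like the following. `resolveSupportsVision` and the `MultimodalSupport` shape are hypothetical names for illustration; the model interfaces are trimmed to the fields quoted in the description.

```typescript
// Trimmed versions of the types quoted in the PR description.
interface LlamaDownloadedModel {
  engine: 'llama';
  mmProjPath?: string;
}
interface LiteRTDownloadedModel {
  engine: 'litert';
  liteRTVision: boolean;
}
type DownloadedModel = LlamaDownloadedModel | LiteRTDownloadedModel;

// Hypothetical shape of the llama-side capability probe result.
interface MultimodalSupport {
  vision?: boolean;
}

// Narrowing on `engine` lets the compiler allow `liteRTVision` only on the LiteRT branch.
function resolveSupportsVision(
  model: DownloadedModel,
  multimodalSupport?: MultimodalSupport,
): boolean {
  return model.engine === 'litert'
    ? !!model.liteRTVision
    : (multimodalSupport?.vision || false);
}

const textOnlyLiteRT = resolveSupportsVision({ engine: 'litert', liteRTVision: false });
const visionLiteRT = resolveSupportsVision({ engine: 'litert', liteRTVision: true });
const visionLlama = resolveSupportsVision({ engine: 'llama' }, { vision: true });
const unknownLlama = resolveSupportsVision({ engine: 'llama' });
```

With the earlier hard-coded `true`, a text-only `.litertlm` model would have been reported as vision-capable; here it resolves to `false`.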

Why a second engine

Architecture

Type model

Settings model

Core features

Text generation

Thinking

Context compaction

Tools

Vision

Backend selection and fallback

Model management

Import flow

Settings UI

Memory budget

Build infrastructure

Kotlin 2.2.0

react-native-gesture-handler patch

Native module

Platform readiness

Behavior summary

Test plan

`react-native-gesture-handler` patch