fix: iOS + Android stability & performance (MCP boost regression, memory guards, deferred-load switcher)#425

Open
alichherawalla wants to merge 188 commits into
mainfrom
fix/llm-stability-and-perf

Conversation

@alichherawalla

@alichherawalla alichherawalla commented Jun 27, 2026

Collaborator

iOS + Android stability, performance, and image-generation fixes. Pairs with pro PR off-grid-ai/mobile-pro#8.

Stability / performance regression (the primary fix)

  • Remove the MCP context auto-boost (da072bd5) — enabling tools no longer forces a 32k context + model reload, the root cause of the post-Pro slowdown and the iOS Metal-buffer / Android litert OOM crashes on flagship devices. The embedding tool-router already fits schemas in the default window.
  • Un-stick users already pinned at the boosted 32k context via a one-time migration (10236e32).

Memory correctness

  • Budget the embedding model as a residency-tracked sidecar + bound its load so a stalled init can't wedge the global load lock (ccbb7fce).
  • Correct the KV-cache estimate and make the memory guard downgrade context / block instead of warning-then-crashing (a802efd8).

Performance

  • Persist the tool-embedding cache (content-hashed) so the first-message embedding burst happens once ever, cutting TTFT (df6619fa).

Android engine hardening

  • Harden LiteRT load against OOM + Mali GPU crashes: RAM-aware token clamp, vision delegate follows the backend tier (a7ee32f5).

iOS image generation (iOS 26)

  • Right-size the Core ML image RAM estimate and then pick the compute path by RAM tier (96b82e53, 76fcf1f7, bea0e1b7, bd302781, df6d39b0): on iOS 26 the Neural Engine is degraded for palettized diffusion models — it fails to load on the 8 GB iPhone 15 Pro and stalls on smaller devices. So iOS 26 devices with ≥7 GB load on the GPU (the working path), and lower-RAM devices load on the ANE (far smaller system-RAM footprint) instead of OOMing on the GPU. The residency estimate follows the same choice (GPU 2.5×, ANE 1.8×) so the gate admits a load that fits rather than refusing it; the native loader falls back only GPU→ANE (never the reverse, which would OOM).
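
The RAM-tier rule above can be sketched as follows. This is a minimal illustration and not the shipped code: the names `pickComputePath` and `estimateImageModelRamBytes` as written here, the `ComputePath` type, and the parameters are assumptions for the example. Only the 7 GB threshold and the 2.5× / 1.8× multipliers come from the description.

```typescript
// Hypothetical sketch of the compute-path decision described above.
type ComputePath = 'gpu' | 'ane';

// From the PR description: iOS 26 devices with >= 7 GB load on the GPU.
const GPU_RAM_THRESHOLD_GB = 7;

// From the PR description: the residency estimate follows the chosen path.
const RAM_MULTIPLIER: Record<ComputePath, number> = { gpu: 2.5, ane: 1.8 };

function pickComputePath(iosMajorVersion: number, deviceRamGb: number): ComputePath {
  // iOS 26+ with enough RAM: avoid the degraded Neural Engine load.
  if (iosMajorVersion >= 26 && deviceRamGb >= GPU_RAM_THRESHOLD_GB) return 'gpu';
  // Lower-RAM iOS 26 devices and older iOS start on the Neural Engine.
  return 'ane';
}

function estimateImageModelRamBytes(fileSizeBytes: number, path: ComputePath): number {
  // The estimate uses the multiplier of the path that will actually be loaded.
  return Math.round(fileSizeBytes * RAM_MULTIPLIER[path]);
}
```

Keeping the estimate and the path decision in one place is what lets the gate admit an ANE load that fits instead of refusing it with the GPU multiplier.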

Downloads / discovery / UX

  • Faster iOS downloads (foreground URLSession), Off Grid AI Gateway discovery (port 7878), STT download retry, and model-switcher bottom-sheet fix (defer opening the picker until the manager sheet fully closes — iOS present-while-dismissing race) (27884cdb).
  • Single source of truth for STT download state across the Download Manager and Transcriptions tab (c2173e65).
  • iOS background-downloader: marshal event emission to the main thread (94f27d8f).
  • Deferred-load model switcher reflects the selected model (421be2f7).

Refactor

  • Extract download retry handlers into their own module (9af76d50) — keeps useDownloadManager under the complexity/line limits after the STT-retry change.

Test plan

  • Unit + integration tests added/updated (residency budget, KV estimate, image RAM by compute path, download retry). JS gates (eslint/tsc/jest) pass.
  • Device-verified: regression gone on flagship; image generation works on iOS 26 (GPU on 8 GB Pro, ANE on 6 GB).

Download-manager sync, Kokoro fix, Desktop card (latest)

  • Kokoro "downloaded the instant you tap" bug (pro 33ab25a0, tests 8c1a4dda): the Download Manager showed Kokoro TTS completed (82 MB) on the first tap while the Voice panel correctly showed it downloading (4%). Root cause: the engine defined "downloaded" as just the two core .pte models, which survive a prior interrupted download, so the disk scan flipped to completed before the active voice's assets finished. Completeness is now defined once, from the same _activeVoiceSources() the downloader fetches (core .pte + the active voice's embedding/tagger/lexicon), reading listDownloadedFiles() instead of listDownloadedModels() (which filters to .pte and never sees the voice assets). Regression test reproduces the exact case.
  • Off Grid AI Desktop link now points at http://getoffgridai.co/desktop via the single OFF_GRID_DESKTOP_URL constant (all 8 call sites updated) (c2b9dd3f, d9b9a810).
  • Home Desktop card: added a "Copy link" action so people viewing on a phone can copy the URL and share it (WhatsApp/Slack) to open on a Mac later; extracted the card into its own DesktopPromoCard component that owns its copy/dismiss state (d878f403, c506cce6).
  • pro submodule bumped to the pushed pro HEAD 63d1cf52 (Kokoro fix + merge of origin's audio fixes) (66ea85c7).

Concurrency, memory-gate, audio & entitlement hardening (latest)

Downloads - concurrency + lifecycle (fixes the ~25-parallel freeze and the finalize hang)

  • Cap concurrent downloads at 3 via a FIFO queue (98fe0719) - starting a whole catalog no longer spawns ~25 parallel transfers that OOM-froze the device; extra starts wait for a slot. Queued (not-yet-started) downloads render as "Queued" in the Download Manager (a11eac5b), and interrupted iOS downloads are re-queued on relaunch instead of marked failed (a36399f3).
  • One entry per model - never show a model as both Active and Downloaded (56b34002); adopt only in-flight restored downloads on relaunch (a completed one would leak a concurrency slot forever) (bfdc4cf0).
  • Cancellable queued downloads (74ff0e9b) - a "Queued" row had no native id, so Cancel/Remove no-opped; the queue owner now exposes cancelQueued and the service routes a not-found cancel/remove to it by the same uniform id. Queued multi-file image parts are cancellable too and are typed as image (36eb775b).
  • Vision finalize-hang closed - unified the completion catch-up so the main GGUF (not just the mmproj) recovers a completion that fired before its listener attached (fd3b4136), and start the sidecar before the main so the watcher is live before the main can complete under queue pressure (73eace72).
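
A capped FIFO queue with cancellable waiting entries, as described in this list, might look roughly like the sketch below. The class name `DownloadQueue`, the `queuedIds` helper, and the task shape are assumptions for illustration. Only the cap of 3 and the `cancelQueued` name come from the description.

```typescript
// Hypothetical sketch of a FIFO download queue capped at 3 concurrent transfers.
type StartDownload = () => Promise<void>;

interface QueuedDownload {
  id: string;
  start: StartDownload;
}

class DownloadQueue {
  private active = 0;
  private readonly waiting: QueuedDownload[] = [];
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number = 3) {
    this.maxConcurrent = maxConcurrent;
  }

  enqueue(id: string, start: StartDownload): void {
    this.waiting.push({ id, start });
    this.drain();
  }

  // Removes a not-yet-started download; returns false if it already started.
  cancelQueued(id: string): boolean {
    const index = this.waiting.findIndex(entry => entry.id === id);
    if (index === -1) return false;
    this.waiting.splice(index, 1);
    return true;
  }

  // Ids that would render as "Queued" in a UI.
  queuedIds(): string[] {
    return this.waiting.map(entry => entry.id);
  }

  private drain(): void {
    while (this.active < this.maxConcurrent && this.waiting.length > 0) {
      const next = this.waiting.shift();
      if (!next) break;
      this.active += 1;
      // Wrapping in a Promise also captures synchronous throws from start().
      new Promise<void>(resolve => resolve(next.start()))
        .catch(() => { /* a failed download still frees its slot */ })
        .finally(() => {
          this.active -= 1;
          this.drain();
        });
    }
  }
}
```

The slot is released in `finally`, so both success and failure let the next queued entry start. This is also why adopting an already-completed download on relaunch would leak a slot: nothing would ever settle its promise.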

Memory - live os_proc gating (turns OOM/jetsam into graceful refusal)

  • Gate every load on live os_proc_available_memory under dirty-memory pressure (8c349f25), register the image model as dirtyMemory so the resident-dirty arm of the gate actually fires (08acaf31), and log the raw available/total in the [MEM-SM] decision (f94e9b4b).
  • "Free memory & Retry" actually frees memory before retrying (1dd64cd5), single-sourced via reasonFromLoadError (b9aee47b). Device Info shows per-process Available / Footprint / Process Limit (7dab9f1e).

Audio - one owning view-model (bottom bar can't desync)

  • Single deriveAudioActivity view-model drives the bottom bar (generation-stop > tts-stop > mic; a paused clip shows the mic) and a coordinator resets the voice-switch on engine loss (pro side; tests ffeb86b8, dd47aadc, 3f8d4434).
  • Stop also stops TTS (d6f2b194) - the Stop button ended the LLM but left buffered-ahead sentences playing.
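
The priority order stated above (generation-stop > tts-stop > mic, with a paused clip showing the mic) can be written as a small pure function. The state shape and field names below are assumptions for illustration; the real `deriveAudioActivity` lives on the pro side and is not shown in this PR.

```typescript
// Hypothetical sketch of the bottom-bar priority described above.
type AudioActivity = 'generation-stop' | 'tts-stop' | 'mic';

interface AudioState {
  isGenerating: boolean;                   // assumed flag: the LLM is streaming
  ttsStatus: 'idle' | 'playing' | 'paused'; // assumed TTS playback status
}

function deriveAudioActivitySketch(state: AudioState): AudioActivity {
  // Highest priority: an in-flight generation always shows Stop.
  if (state.isGenerating) return 'generation-stop';
  // Next: actively playing TTS shows its own Stop.
  if (state.ttsStatus === 'playing') return 'tts-stop';
  // Idle or paused clips fall through to the mic.
  return 'mic';
}
```

Deriving the button from one function of the state, rather than from several independent flags in the view, is what keeps the bottom bar from desyncing.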

Pro / chat

  • Never upsell a Pro user - gate on real entitlement, not hasRegisteredPro (d2412a5e).
  • Close the model picker before loading, not after, so the sheet doesn't linger over the loading card (5313ecd1).

pro submodule bumped to 5ed8e4b9 (single audio view-model + Kokoro download-state fixes) (d20d006d).

Full JS suite green (247 suites / 6241 tests), tsc + eslint clean. Each fix ships with a regression test that fails against the pre-fix code. Remaining known items (tracked, not blocking): the dirty/clean residual-RAM accounting in the memory gate and CI coverage for the pro audio suite.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added “Get Off Grid AI Desktop” links and updated remote-server/Pro/model screens with consistent deep-link messaging.
    • Improved model selection so “Switch Model” appears for deferred loads, even before the model finishes loading.
  • Bug Fixes
    • Refined remote discovery/model filtering (including gateway support and kind-based exclusions).
    • Improved stability and recovery for model loading and download retries, including safer memory/context budgeting.
    • Fixed deferred sheet opening so navigation after closing the model manager works reliably.
  • Documentation
    • Updated README/Pro guidance, added stability/performance fix plan, and expanded marketing how-tos.

alichherawalla and others added 12 commits June 27, 2026 21:12
Investigation findings and phased fix plan for the crash clusters (iOS Metal
buffer-alloc, Android litert OOM, watchdog hangs) and the post-Pro-activation
slowness regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reduce mcpContextBoost to just isMcpEnabled(); drop applyMcpContextBoost and the
'Raised to the model maximum' Model Settings notes. The auto-boost pinned context
to 32768 and reloaded the model on MCP enable (never restoring), which was the #1
crash cluster on both platforms (iOS metal_buffer_type_alloc_buffer, Android
litert nativeCreateEngine OOM) and the cause of the post-activation slowness on
flagship devices. Tool schemas are thinned by the embedding router instead.

Bumps the pro submodule to the matching boost-removal commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Migrate persisted settings: reset contextLength/maxTokens (llama) and
liteRTMaxTokens (litert) that are at the removed boost ceiling (32768/8192) back
to device-safe defaults, so existing Pro users recover without reinstalling.
Leaves legitimate non-boost settings untouched. Adds migration tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
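
A rough sketch of the one-time migration described in this commit message, under stated assumptions: the function name, the settings shape, and the default values passed in are hypothetical. The commit does not say which ceiling applies to liteRTMaxTokens, so the sketch treats either boost value as boosted.

```typescript
// Hypothetical sketch: reset settings pinned at the removed boost ceiling.
const BOOST_CONTEXT_CEILING = 32768; // from the commit message
const BOOST_TOKEN_CEILING = 8192;    // from the commit message

interface GenerationSettings {
  contextLength: number;   // llama
  maxTokens: number;       // llama
  liteRTMaxTokens: number; // litert
}

function migrateBoostedSettings(
  persisted: GenerationSettings,
  safeDefaults: GenerationSettings, // device-safe defaults, supplied by the caller
): GenerationSettings {
  const migrated = { ...persisted };
  if (persisted.contextLength === BOOST_CONTEXT_CEILING) {
    migrated.contextLength = safeDefaults.contextLength;
  }
  if (persisted.maxTokens === BOOST_TOKEN_CEILING) {
    migrated.maxTokens = safeDefaults.maxTokens;
  }
  // Assumption: either ceiling value marks a boosted LiteRT budget.
  if (
    persisted.liteRTMaxTokens === BOOST_CONTEXT_CEILING ||
    persisted.liteRTMaxTokens === BOOST_TOKEN_CEILING
  ) {
    migrated.liteRTMaxTokens = safeDefaults.liteRTMaxTokens;
  }
  // Any other value is a legitimate user choice and is left untouched.
  return migrated;
}
```
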
The MiniLM embedding model (loaded for RAG and MCP tool-routing) took the global
load lock but never registered with the residency manager — so its footprint was
invisible to the RAM budget and a chat model would load believing it had more free
RAM than it did (OOM contributor on both platforms). Register it as a last-resort
sidecar ('embedding' ResidentType) so its RAM is counted and it can be evicted;
release it on unload.

Also bound the native init with a 30s timeout: a stalled embedding load (the
ThreadPool::startWorkers hang behind the 19-device condition_variable::wait crash)
now releases the lock and fails instead of wedging a concurrent chat-model load and
tripping the OS watchdog. Adds tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing (R3, R4)

The pre-load memory guard was advisory only (warn then load anyway) and its KV-cache
estimate was ~1000x too low (a flat ~2MB regardless of model size), so it never
caught oversized loads — they hit the native allocator and crashed (iOS
metal_buffer_type_alloc_buffer, Android litert OOM).

- Scale the KV estimate with model size and cache type (f16 vs quantized), which puts
  it in the right order of magnitude for real models.
- On an unsafe check, step the context down to the largest size that fits rather than
  loading blindly; only block when the model weights alone exceed available RAM. This
  also covers iOS, where GPU layers were never RAM-capped — reducing context cuts the
  Metal working set. Adds unit tests for the estimate and the downgrade/block paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
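
To illustrate the downgrade-or-block behavior described in this commit, here is a minimal sketch. It swaps in the standard per-layer K/V sizing formula rather than the PR's model-size heuristic, whose constants are not shown here. All names (`estimateKvCacheBytes`, `pickSafeContext`, `KvShape`), the context ladder, and the final fallback are assumptions.

```typescript
// Hypothetical sketch: estimate the KV cache and step the context down until it fits.
type CacheType = 'f16' | 'q8_0' | 'q4_0';

// Approximate bytes per cached element (quantized types include block scales).
const BYTES_PER_ELEMENT: Record<CacheType, number> = { f16: 2, q8_0: 34 / 32, q4_0: 18 / 32 };

interface KvShape {
  layers: number;
  kvHeads: number;
  headDim: number;
}

function estimateKvCacheBytes(shape: KvShape, contextLength: number, cacheType: CacheType): number {
  // Factor 2 = one K tensor and one V tensor per layer.
  return 2 * shape.layers * shape.kvHeads * shape.headDim * contextLength * BYTES_PER_ELEMENT[cacheType];
}

const CONTEXT_LADDER = [16384, 8192, 4096, 2048, 1024, 512]; // assumed step-down sizes

function pickSafeContext(opts: {
  weightsBytes: number;
  availableBytes: number;
  requestedContext: number;
  shape: KvShape;
  cacheType: CacheType;
}): number | null {
  const { weightsBytes, availableBytes, requestedContext, shape, cacheType } = opts;
  // Block only when the weights alone cannot fit.
  if (weightsBytes > availableBytes) return null;
  const candidates = [requestedContext, ...CONTEXT_LADDER.filter(c => c < requestedContext)];
  for (const context of candidates) {
    if (weightsBytes + estimateKvCacheBytes(shape, context, cacheType) <= availableBytes) {
      return context;
    }
  }
  // Assumption: if nothing fits, fall back to the smallest candidate.
  return candidates[candidates.length - 1];
}
```

For a shape of 32 layers, 8 KV heads, and head dimension 128, a 32768-token f16 cache comes to 4 GiB under this formula, which is why a flat ~2MB estimate could never catch an oversized load.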
The MCP tool-router embedded the query + every tool sequentially on the tiny CPU
embedding context, caching results only in memory — so the ~60-embed cold burst
repeated on the first message of every session (a visible time-to-first-token stall
for Pro/MCP users). Persist the cache to AsyncStorage keyed by a content hash, so the
burst happens once ever; a changed tool description re-embeds only that tool. Adds tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
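
The content-hashed persistence described in this commit can be sketched like this. It is an illustration under assumptions: `KeyValueStore` is a stand-in for the async get/set surface of a store such as AsyncStorage, the storage key and `embedTools` are hypothetical names, and the hash is a simple FNV-1a-style function rather than whatever the PR uses. The `{ h, v }` entry shape mirrors the cache entries visible in the review diff.

```typescript
// Hypothetical sketch: persist tool embeddings keyed by a content hash.
interface KeyValueStore {
  getItem(key: string): Promise<string | null>;
  setItem(key: string, value: string): Promise<void>;
}

interface Tool { name: string; description: string; }
interface CacheEntry { h: string; v: number[]; } // h = content hash, v = embedding
type Embedder = (text: string) => Promise<number[]>;

const CACHE_KEY = 'tool-embedding-cache-sketch'; // assumed key name

// FNV-1a-style 32-bit hash over UTF-16 code units (not cryptographic).
function contentHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

async function embedTools(tools: Tool[], embed: Embedder, store: KeyValueStore): Promise<Map<string, number[]>> {
  const raw = await store.getItem(CACHE_KEY);
  const cache: Record<string, CacheEntry> = raw ? JSON.parse(raw) : {};
  const result = new Map<string, number[]>();
  let dirty = false;
  for (const tool of tools) {
    const h = contentHash(`${tool.name}\n${tool.description}`);
    const hit = cache[tool.name];
    if (hit && hit.h === h) {
      result.set(tool.name, hit.v); // unchanged tool: reuse the stored vector
      continue;
    }
    const v = await embed(`${tool.name}: ${tool.description}`); // new or changed tool only
    cache[tool.name] = { h, v };
    result.set(tool.name, v);
    dirty = true;
  }
  if (dirty) await store.setItem(CACHE_KEY, JSON.stringify(cache));
  return result;
}
```

Because the hash covers the tool's name and description, editing one description invalidates only that entry while every other tool keeps its stored vector.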
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (F10)

Patch @kesha-antonov/react-native-background-downloader: safeEmitEvent invoked the
TurboModule JSI event emitter directly from the NSURLSession background delegate queue
(didWriteData -> flushProgressReportsIfNeeded), racing the JS runtime's emitter map and
crashing with EXC_BAD_ACCESS (pointer-authentication failure) during active downloads.
Marshal every emission onto the main queue so emitter calls are serialized off the
volatile delegate threads. Applied via patch-package (postinstall).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s (R7, F8)

The liblitertlm crashes (SIGSEGV in inference, SIGABRT in nativeCreateEngine) are the
top Android crash cluster. Two graceful-degradation fixes:

- Clamp the token budget to free RAM before creating the engine (clampMaxTokens,
  unit-tested). The KV cache grows with the budget; an over-budget request aborts
  engine creation or segfaults under memory pressure. We now degrade to a smaller
  context (>=1024 floor) instead of crashing.
- Tie the vision delegate to the main backend tier instead of always forcing
  Backend.GPU(). The always-on Mali GPU vision delegate SIGSEGVs (libGLES_mali) on
  weak/low-VRAM GPUs; on the CPU fallback tier vision now runs on CPU too, so the
  fallback can actually succeed instead of failing the whole load.

Verified: compileDebugKotlin, lintDebug, and testDebugUnitTest all pass locally.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
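
The actual clampMaxTokens is Kotlin and is not reproduced here. The TypeScript sketch below only illustrates the clamping idea from the first bullet of this commit. The per-token cost, the reserve, and the function signature are assumptions; the 1024 floor comes from the commit message.

```typescript
// Hypothetical sketch of a RAM-aware token clamp with a 1024 floor.
const MIN_TOKENS = 1024; // from the commit message (>=1024 floor)

function clampMaxTokensSketch(
  requestedTokens: number,
  freeRamBytes: number,
  bytesPerToken: number, // assumed per-token KV cost for the model
  reserveBytes: number,  // assumed headroom left for the rest of the process
): number {
  // Requests at or under the floor are passed through, never raised.
  if (requestedTokens <= MIN_TOKENS) return requestedTokens;
  const usableBytes = Math.max(0, freeRamBytes - reserveBytes);
  const affordableTokens = Math.floor(usableBytes / bytesPerToken);
  // Degrade toward what fits, but not below the floor.
  return Math.max(MIN_TOKENS, Math.min(requestedTokens, affordableTokens));
}
```

With 1 GiB free, a 512 MiB reserve, and 128 KiB per token, a request for 8192 tokens would be clamped to 4096 under these assumed numbers.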
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…red loading

With deferred loading no model is in memory until first send, so the selector keyed
its 'Switch Model' UI off the loaded path and showed 'Available Models' with nothing
marked active — reading as 'can't switch models in chat'. Derive the selected model
from activeModelId and reflect it: the switcher now shows 'Switch Model' and highlights
the active choice (still tappable, so a tap loads it), while the
'Currently Loaded'/Unload section stays gated on an actual in-memory load. Deferred
loading is unchanged. Adds a test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alichherawalla

Collaborator Author

/gemini review

alichherawalla and others added 10 commits June 27, 2026 23:12
The Transcriptions tab and the Download Manager could disagree — Download Manager
showed an STT model 'failed' while the tab showed it stuck 'downloading'. They read
different stores: the Download Manager reads the canonical useDownloadStore (driven by
the native background-download events), while the tab read whisperStore.downloadProgressById,
a parallel copy that's only cleared in whisperService's finally — which never runs if the
background promise hangs on failure.

Derive the tab's in-flight STT state from useDownloadStore (filtered to modelType 'stt'),
so a failed entry reports active=false and the model becomes downloadable again instead of
showing a stuck progress bar. whisperStore.downloadProgressById is kept only as a fallback
for the RNFS URL-import path, which has no download-store entry. Adds tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s, kokoro playback)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…her + Desktop links

Downloads
- iOS DownloadManager uses a foreground .default URLSession (was background,
  which iOS throttles) + HTTP/3-off request + no URLCache: large model
  downloads now run at full speed, matching Android.

Remote servers / discovery
- Discover the Off Grid AI Gateway by probing port 7878 (/v1/models) across the
  subnet, alongside Ollama/LM Studio; align the LM Studio probe to /v1/models.
- Categorize remote models by the gateway's `kind` so image/TTS/STT models no
  longer show up as text models in the chat picker.
- Remote Server form placeholders + copy point at Off Grid AI Desktop (:7878).
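
A sketch of the kind-based filtering described above, assuming the gateway attaches an optional `kind` string to each /v1/models entry. The exact kind values ('image', 'tts', 'stt') and the helper name are assumptions for illustration.

```typescript
// Hypothetical sketch: keep non-text models out of the chat picker.
interface RemoteModel {
  id: string;
  kind?: string; // present only when the server reports it
}

const NON_TEXT_KINDS = new Set(['image', 'tts', 'stt']); // assumed kind values

function chatPickerModels(models: RemoteModel[]): RemoteModel[] {
  return models.filter(model => {
    // Servers that report no kind are kept, as before.
    if (model.kind === undefined) return true;
    return !NON_TEXT_KINDS.has(model.kind);
  });
}
```
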

Bug fixes
- STT (transcription) downloads can be retried from the Download Manager
  (was unhandled on iOS; re-invokes whisperService and clears the stale row).
- Chat model switcher: defer opening the picker until the manager sheet has
  fully closed, so the row tap no longer just dismisses the sheet (iOS race).

Copy
- Promote Off Grid AI Desktop (free, Mac, not Pro) everywhere Ollama/LM Studio
  appear and on every Pro surface, linking to /releases/latest.

Docs/marketing
- CLAUDE.md: test every approved behavior change in the same pass.
- 5 dev.to mobile articles (marketing/devto).

Excludes local build artifacts (xcodeproj/Podfile.lock/ggml .so).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gate)

showToast: native ToastAndroid on Android, brief Alert on iOS (no native toast).
Used to tell the user a voice note can't play while the response is streaming.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
estimateImageModelRam used a flat fileSize x2.5 for every platform, but iOS Core
ML pipelines load with reduceMemory=true (submodels load/unload sequentially), so
peak RAM is roughly the largest submodel, not 2.5x the summed on-disk size. SD 1.5
(~1.5GB file) estimated at ~3.7GB and the residency gate refused it on a 6GB
iPhone 15 with 4.9GB free. iOS now uses x2.0 (still under the budget cap); Android
(ONNX/QNN reserves accelerator memory up front) keeps x2.5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Neural Engine is degraded on iOS 26 for these palettized .mlmodelc: the load
either fails instantly (iPhone 15 Pro: 'Failed to load model') or loads but stalls
at step 0 (UNet never finishes compiling for the ANE; iPhone 15). On iOS 26+ the
pipeline now loads CPU+GPU-first (GPU-accelerated, so palettized weights still
decode correctly - no gray images); older iOS keeps ANE-first with a CPU+GPU
fallback on load failure. Also logs the real native load error instead of the
generic 'Failed to load model'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The STT-retry change pushed useDownloadManager over two eslint limits
(handleRetryDownload complexity 22 > 20; file 533 > 500 lines), which blocked
the push gate. Move the per-platform retry helpers (text/image/STT, mmproj
sidecar, finalization-resume) into retryHandlers.ts and expose a single
runRetryDownload dispatcher; handleRetryDownload is now a thin wrapper. No
behavior change — 72 related tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Walkthrough

This PR updates memory handling, discovery, retry flows, iOS/Android loading behavior, and multiple Off Grid AI Desktop links. It also adds new stability guidance and five markdown articles.

Changes

Stability, memory fixes, and UI updates

| Layer | File(s) | Summary |
|---|---|---|
| MCP context boost removal and store migration | `src/services/mcpContextBoost.ts`, `src/stores/appStore.ts`, `src/screens/ModelSettingsScreen/TextGenerationSection.tsx`, `__tests__/unit/stores/appStore.test.ts` | Removes MCP auto-boost/reload logic, resets boosted persisted settings during rehydration, and removes MCP-dependent warning behavior from text-generation sliders. |
| Memory-safe LLM loading with context downgrade | `src/services/llmSafetyChecks.ts`, `src/services/llm.ts`, `src/services/hardware.ts`, `__tests__/unit/services/llmSafetyChecks.test.ts`, `__tests__/unit/services/llm.test.ts`, `__tests__/unit/services/hardware.branches.test.ts` | Refactors memory checks to use object arguments and model-size-based KV cache estimates, adds context downgrade on load, and updates image RAM estimation tests and logic. |
| Android LiteRT token budgeting and vision backend | `android/app/src/main/java/ai/offgridmobile/litert/LiteRTModule.kt`, `android/app/src/test/java/ai/offgridmobile/litert/LiteRTTokenBudgetTest.kt` | Adds RAM-aware token clamping, safe token resolution during load, and backend-aligned vision delegate selection. |
| Embedding residency tracking and load timeout | `src/services/modelResidency/policy.ts`, `src/services/rag/embedding.ts`, `__tests__/unit/services/rag/embedding.test.ts` | Adds embedding residency to the policy, wraps embedding model load in a timeout with orphan cleanup, and releases residency on unload. |
| Persistent tool-embedding cache via AsyncStorage | `src/services/toolEmbeddingRouter.ts`, `__tests__/unit/services/toolEmbeddingRouter.test.ts` | Persists tool embeddings by content hash, hydrates cache before selection, and adds a reset helper for tests. |
| Download retry extraction and STT handling | `src/screens/DownloadManagerScreen/retryHandlers.ts`, `src/screens/DownloadManagerScreen/useDownloadManager.ts`, `__tests__/unit/screens/DownloadManagerScreen/useDownloadManager.branches.test.ts`, `__tests__/rntl/components/TranscriptionModelsTab.test.tsx` | Extracts retry logic into a shared module, delegates retry execution from the hook, and adds whisper retry coverage. |
| Canonical STT download-state tracking | `src/screens/ModelsScreen/TranscriptionModelsTab.tsx`, `__tests__/rntl/components/TranscriptionModelsTab.test.tsx` | Uses the download store as the source of truth for STT download state and falls back to legacy progress when needed. |
| iOS download and CoreML hardening | `ios/DownloadManagerModule.swift`, `ios/CoreMLDiffusionModule.swift`, `patches/@kesha-antonov+react-native-background-downloader+4.5.6.patch`, `__tests__/unit/services/audioRecorderService.test.ts` | Switches to a foreground download session, adds request shaping, changes CoreML compute-unit retry behavior, and moves downloader event emission onto the main queue. |
| Gateway discovery and model kind filtering | `src/services/networkDiscovery.ts`, `src/stores/remoteServerHelpers.ts`, `__tests__/unit/services/networkDiscovery.test.ts`, `__tests__/integration/stores/remoteServerDiscovery.test.ts` | Adds the gateway LAN provider and filters /v1/models results by explicit kind when present. |
| Off Grid Desktop links and deferred model selection | `src/constants/index.ts`, `src/screens/ProDetailScreen/...`, `src/components/settings/ProUpsellBanner.tsx`, `src/screens/ModelsScreen/VoiceModelsUpsell.tsx`, `src/screens/RemoteServersScreen.*`, `src/screens/ModelDownloadScreen.tsx`, `src/components/RemoteServerModal/index.tsx`, `src/screens/SettingsScreen.tsx`, `src/screens/ChatScreen/index.tsx`, `src/components/ModelSelectorModal/...`, `src/utils/toast.ts`, `__tests__/rntl/...` | Adds Off Grid AI Desktop links and copy updates across the UI, defers model-row opening until the manager sheet closes, updates deferred-load selector state, and adds cross-platform toast support. |
| Stability fix plan and guidance | `docs/STABILITY_FIX_PLAN.md`, `CLAUDE.md` | Adds the fix-plan document and updates repository/test guidance. |

Marketing Articles

| Layer | File(s) | Summary |
|---|---|---|
| New marketing documentation articles | `marketing/devto/*` | Adds five markdown articles covering offline PDF chat, desktop model connectivity, local tool calling, voice transcription, and offline vision. |

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 I hop through caches, soft and sly,
With context trimmed and downloads nigh.
The Gateway greets the local lane,
And Desktop links light up again.
My whiskers twitch at memory's glow—
Off Grid now hums in steady flow ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 38.89% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR is detailed, but it does not follow the required template and omits the Type of Change, screenshots, checklist, and related issues sections. | Reformat the description to match the template and add the missing sections, especially screenshots for UI changes and the checklist entries. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title is concise and accurately highlights the main stability, performance, and deferred-load fixes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

alichherawalla and others added 2 commits June 28, 2026 05:14
The iOS 2.0x multiplier under-budgeted the GPU-primary load path: the GPU keeps
diffusion buffers in system RAM (more than the ANE), so the real peak is ~2.5x
the file size. 2.0x let the residency gate allow a load that then OOM/jetsam-
crashed on a 6GB iPhone 15. Restore the flat 2.5x so the gate correctly blocks
the load on devices that can't fit it (graceful 'not enough memory') while 8GB+
devices still fit. Reverts 96b82e5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Decide GPU vs Neural Engine for iOS image gen from device RAM (iOS 26+: GPU when
>=7GB so the 8GB iPhone 15 Pro avoids the failing ANE load; ANE otherwise so the
6GB iPhone 15 uses the lower-system-RAM path instead of OOMing on GPU). The
residency estimate follows the same decision (GPU 2.5x, ANE 1.8x) so the gate
admits an ANE load that fits rather than refusing it. Native wiring follows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 10

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (16)
src/services/modelResidency/policy.ts-80-83 (1)

80-83: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Align the sidecar comment with the eviction algorithm.

Line 80 says these sidecars are "never evicted for capacity", but Lines 107-118 still select SIDECAR_TYPES as last-resort eviction candidates. Either update the comment to reflect "evicted last" behavior or filter sidecars out of the candidate set if they must be pinned.

Proposed comment fix
-  // Speech (whisper), TTS, and the RAG/MCP embedding model are small always-resident
-  // sidecars: never evicted for capacity, and they never trigger eviction of the
-  // active generation model. Only text and image are heavy enough to swap.
+  // Speech (whisper), TTS, and the RAG/MCP embedding model are small sidecars:
+  // loading them never evicts the active generation model. When a generation
+  // model needs memory, they are retained until non-sidecars are exhausted and
+  // may then be evicted as a last resort.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/modelResidency/policy.ts` around lines 80 - 83, The comment for
SIDECAR_TYPES in policy.ts does not match the eviction behavior in the residency
policy. Update the comment near the SIDECAR_TYPES definition to say these models
are evicted last instead of “never evicted for capacity,” or, if they truly must
be pinned, adjust the eviction logic in the residency selection code that uses
SIDECAR_TYPES so they are excluded from candidate eviction.
src/services/rag/embedding.ts-22-34 (1)

22-34: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Clear the timeout when native init fails before the deadline.

If initLlama rejects before Line 26 fires, the promise.then(...) branch rejects and the timer is never cleared. Track whether the timeout actually fired, attach orphan cleanup only then, and clear the timer in finally.

Proposed timeout cleanup fix
 function withTimeout<T>(promise: Promise<T>, opts: { ms: number; message: string; onOrphan: (v: T) => void }): Promise<T> {
   const { ms, message, onOrphan } = opts;
-  let timer: ReturnType<typeof setTimeout>;
+  let timedOut = false;
+  let timer: ReturnType<typeof setTimeout> | null = null;
   const timeout = new Promise<never>((_, reject) => {
-    timer = setTimeout(() => reject(new Error(message)), ms);
+    timer = setTimeout(() => {
+      timedOut = true;
+      reject(new Error(message));
+    }, ms);
   });
   return Promise.race([
-    promise.then(v => { clearTimeout(timer); return v; }),
+    promise,
     timeout,
   ]).catch(err => {
-    promise.then(onOrphan).catch(() => { /* underlying load failed too — nothing to clean up */ });
+    if (timedOut) {
+      promise.then(onOrphan).catch(() => { /* underlying load failed too — nothing to clean up */ });
+    }
     throw err;
+  }).finally(() => {
+    if (timer) clearTimeout(timer);
   });
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/rag/embedding.ts` around lines 22 - 34, The withTimeout helper
leaves its timer running when the wrapped promise rejects before the deadline,
so update the cleanup logic in withTimeout to always clear the timeout in a
finally path and only run the orphan-handling promise.then(onOrphan) when the
timeout has actually fired. Use the existing withTimeout, timeout, and onOrphan
flow to track whether the deadline was reached, and make sure initLlama
rejection does not leave a dangling timer.
__tests__/unit/services/rag/embedding.test.ts-89-101 (1)

89-101: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Restore timers in finally and cover orphan release.

If an assertion fails before Line 100, later tests inherit fake timers. Wrap this test in try/finally; while touching it, make initLlama resolve after the timeout and assert mockRelease runs so the orphan cleanup behavior is covered. Based on learnings, test every approved behavior change in the same pass, including bug fixes, new branches/conditions, and contract changes that other code or tests depend on.

Proposed test hardening
     it('rejects and does not register if the native load times out (F5)', async () => {
       jest.useFakeTimers();
-      // initLlama never resolves → the timeout must fire and release the lock.
-      mockInitLlama.mockReturnValue(new Promise(() => {}) as any);
-      const loadPromise = embeddingService.load().catch((e: Error) => e);
-      await jest.advanceTimersByTimeAsync(31000);
-      const result = await loadPromise;
-      expect(result).toBeInstanceOf(Error);
-      expect((result as Error).message).toMatch('timed out');
-      expect(embeddingService.isLoaded()).toBe(false);
-      expect(modelResidencyManager.isResident('embedding')).toBe(false);
-      jest.useRealTimers();
+      try {
+        let resolveNative!: (ctx: unknown) => void;
+        mockInitLlama.mockReturnValue(new Promise(resolve => { resolveNative = resolve; }) as any);
+        const loadPromise = embeddingService.load().catch((e: Error) => e);
+
+        await jest.advanceTimersByTimeAsync(31000);
+        const result = await loadPromise;
+        expect(result).toBeInstanceOf(Error);
+        expect((result as Error).message).toMatch('timed out');
+        expect(embeddingService.isLoaded()).toBe(false);
+        expect(modelResidencyManager.isResident('embedding')).toBe(false);
+
+        resolveNative({ release: mockRelease });
+        await Promise.resolve();
+        expect(mockRelease).toHaveBeenCalled();
+      } finally {
+        jest.useRealTimers();
+      }
     });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@__tests__/unit/services/rag/embedding.test.ts` around lines 89 - 101, The
embedding timeout test leaves fake timers enabled if an assertion fails, so wrap
the body of the `embeddingService.load()` scenario in a try/finally and always
restore real timers in `finally`. While updating
`__tests__/unit/services/rag/embedding.test.ts`, also add coverage for the
orphan cleanup path by having `mockInitLlama` resolve after the timeout and
asserting `mockRelease` is called, alongside the existing `load`, `isLoaded`,
and `modelResidencyManager` expectations.

Source: Learnings

src/services/toolEmbeddingRouter.ts-45-48 (1)

45-48: 🚀 Performance & Scalability | 🟡 Minor | ⚡ Quick win

Serialize first-time cache hydration.

Setting hydrated = true before AsyncStorage.getItem completes lets a concurrent first routing call skip hydration and proceed with an empty in-memory cache, reintroducing the cold embedding burst despite persisted data. Store and await a shared hydration promise.

Proposed hydration guard
 const toolEmbeddingCache = new Map<string, CacheEntry>();
 let hydrated = false;
+let hydratePromise: Promise<void> | null = null;
 let saveTimer: ReturnType<typeof setTimeout> | null = null;
@@
 async function hydrateCache(): Promise<void> {
   if (hydrated) return;
-  hydrated = true;
-  try {
-    const raw = await AsyncStorage.getItem(CACHE_STORAGE_KEY);
-    if (!raw) return;
-    const parsed = JSON.parse(raw) as Record<string, CacheEntry>;
-    for (const [name, entry] of Object.entries(parsed)) {
-      if (entry && typeof entry.h === 'string' && Array.isArray(entry.v)) toolEmbeddingCache.set(name, entry);
+  if (hydratePromise) return hydratePromise;
+  hydratePromise = (async () => {
+    try {
+      const raw = await AsyncStorage.getItem(CACHE_STORAGE_KEY);
+      if (!raw) return;
+      const parsed = JSON.parse(raw) as Record<string, CacheEntry>;
+      for (const [name, entry] of Object.entries(parsed)) {
+        if (entry && typeof entry.h === 'string' && Array.isArray(entry.v)) toolEmbeddingCache.set(name, entry);
+      }
+      logger.log(`[ToolRouter] hydrated ${toolEmbeddingCache.size} cached tool embeddings`);
+    } catch (e) {
+      logger.warn(`[ToolRouter] failed to hydrate embedding cache: ${String(e)}`);
+    } finally {
+      hydrated = true;
+      hydratePromise = null;
     }
-    logger.log(`[ToolRouter] hydrated ${toolEmbeddingCache.size} cached tool embeddings`);
-  } catch (e) {
-    logger.warn(`[ToolRouter] failed to hydrate embedding cache: ${String(e)}`);
-  }
+  })();
+  return hydratePromise;
 }
@@
 export function _resetToolEmbeddingCache(): void {
   toolEmbeddingCache.clear();
   hydrated = false;
+  hydratePromise = null;
   if (saveTimer) { clearTimeout(saveTimer); saveTimer = null; }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/toolEmbeddingRouter.ts` around lines 45 - 48, The first-time
cache hydration in hydrateCache is being marked complete before
AsyncStorage.getItem finishes, which can let concurrent callers skip loading
persisted data and use an empty cache. Change toolEmbeddingRouter so
hydrateCache stores a shared hydration promise and awaits it for all callers,
setting the hydrated state only after the async load and cache population
completes.
__tests__/unit/services/hardware.branches.test.ts-76-97 (1)

76-97: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Restore Platform.OS after this suite.

These tests mutate a process-global and leave the last case as iOS, so any later tests in this file can inherit the wrong platform. Save the original value and restore it in afterEach to keep the suite order-independent.

Suggested fix
 describe('estimateImageModelRam', () => {
+  const originalOS = Platform.OS;
+
+  afterEach(() => {
+    Platform.OS = originalOS;
+  });
+
   it('budgets 2.5x the model total size on Android (ONNX/QNN reserves NPU memory)', () => {
     Platform.OS = 'android';
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@__tests__/unit/services/hardware.branches.test.ts` around lines 76 - 97, The
hardware.branches test suite mutates the global Platform.OS and leaves it set to
iOS, which can leak into later tests. In the hardwareService
estimateImageModelRam branch tests, save the original Platform.OS before
changing it and restore it in an afterEach so each case stays isolated and
order-independent.
src/screens/ModelSettingsScreen/TextGenerationSection.tsx-72-72 (1)

72-72: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Make the warning copy concrete and guideline-compliant.

Both warnings still use vague "significant RAM / some devices" wording, and Line 72 also introduces an em dash. This screen already knows the threshold, so the warning should state that threshold directly instead of using generic copy.

Suggested fix
-        warning={maxTokens > contextWarnThreshold ? 'High context uses significant RAM — may slow or crash on some devices' : null}
+        warning={maxTokens > contextWarnThreshold ? `Values above ${formatContext(contextWarnThreshold)} use more RAM and can slow generation or fail to load on lower-memory devices` : null}
...
-        warning={contextLength > 8192 ? 'High context uses significant RAM and may crash on some devices' : null}
+        warning={contextLength > 8192 ? 'Context lengths above 8K use more RAM and can fail to load on lower-memory devices' : null}

As per coding guidelines, "Content must use proof-first language, stating measurable specifics instead of vague claims" and "Do not use em dashes in copy."

Also applies to: 127-127

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/screens/ModelSettingsScreen/TextGenerationSection.tsx` at line 72, The
warning copy in TextGenerationSection should be made concrete and
guideline-compliant instead of using vague RAM/device language and an em dash.
Update the warning text used for the maxTokens/contextWarnThreshold check, and
the matching warning at the other referenced location, to state the actual
threshold or measurable condition directly using proof-first language. Keep the
warning tied to the existing maxTokens and contextWarnThreshold logic, and
remove any em dash from the copy.

Source: Coding guidelines

marketing/devto/connect-phone-ollama-lmstudio.md-9-9 (1)

9-9: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Qualify the 14B vs 4B claim. The current wording reads like a fixed cutoff, but the app's own guidance ties results to device memory and quantization. Add the benchmark setup or soften the numbers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@marketing/devto/connect-phone-ollama-lmstudio.md` at line 9, Soften the 14B
vs 4B comparison in the opening copy and avoid presenting it as a hard cutoff.
In the markdown content, update the wording in the intro paragraph to either
qualify the claim with the benchmark/setup used (device memory, GPU, and
quantization) or rephrase it as a general example of expected performance. Keep
the language aligned with the guidance in the article so the statement matches
the behavior described by the phone-to-desktop workflow.

Source: Coding guidelines

marketing/devto/vision-ai-phone-camera-offline.md-31-32 (1)

31-32: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Soften the seven-second claim. The line appears twice; tie it to a specific device, model, prompt length, and OS version, or drop the number.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@marketing/devto/vision-ai-phone-camera-offline.md` around lines 31 - 32, The
performance claim in the article is too absolute and is repeated without enough
context; update the wording in the recommendation section to avoid stating
“about seven seconds” as a general result. Use a specific benchmark tied to a
concrete device, model, prompt length, and OS version if you keep the timing, or
remove the number and replace it with a softer, qualified statement in the
relevant copy.

Source: Coding guidelines

marketing/devto/connect-phone-ollama-lmstudio.md-101-103 (1)

101-103: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Remove the Anthropic-compatible claim. The FAQ at marketing/devto/connect-phone-ollama-lmstudio.md:102 says Anthropic-compatible servers work, but the app only ships an OpenAI-compatible remote provider path; the Anthropic support is just shared parsing/types, so this FAQ overstates the feature set.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@marketing/devto/connect-phone-ollama-lmstudio.md` around lines 101 - 103, The
FAQ in connect-phone-ollama-lmstudio.md overstates provider support by claiming
Anthropic-compatible servers work. Update the “Which servers work?” answer to
mention only the OpenAI-compatible API path currently supported here (for
example Ollama, LM Studio, LocalAI, and similar), and remove the
Anthropic-compatible statement so the marketing copy matches the actual remote
provider behavior.

Source: Coding guidelines

docs/STABILITY_FIX_PLAN.md-1-163 (1)

1-163: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Replace em dashes throughout this doc.

This file uses em dashes in the title, bullets, and section text, but the docs style guide forbids them. Please convert them to plain hyphens consistently. As per coding guidelines, "Do not use em dashes in copy; use a hyphen instead."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/STABILITY_FIX_PLAN.md` around lines 1 - 163, Replace all em dashes in
STABILITY_FIX_PLAN.md with plain hyphens, including the title, bullets, and
section prose, and keep the wording otherwise unchanged. Review the document for
any remaining "—" characters and update the surrounding text consistently so the
style guide rule is followed throughout.

Source: Coding guidelines

docs/STABILITY_FIX_PLAN.md-163-164 (1)

163-164: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Remove the stray closing tags at the end of the doc.

`</content>` and `</invoke>` look accidental and will render as garbage in the markdown output.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/STABILITY_FIX_PLAN.md` around lines 163 - 164, The document ends with
stray closing tags that should not be in the markdown output. Remove the
accidental `</content>` and `</invoke>` text from the end of STABILITY_FIX_PLAN
so the rendered doc is clean; this is just a cleanup in the markdown content,
not a functional change.
src/screens/SettingsScreen.tsx-188-188 (1)

188-188: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Avoid "and more" in this nav copy.

"and more" is vague. Please replace it with the concrete supported category, e.g. "another OpenAI-compatible server", so the row describes an actual capability instead of a catch-all. As per coding guidelines, "Content must use proof-first language, stating measurable specifics instead of vague claims."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/screens/SettingsScreen.tsx` at line 188, Update the Remote Servers nav
copy in SettingsScreen so it removes the vague “and more” phrasing and replaces
it with a concrete supported capability, such as “another OpenAI-compatible
server.” Keep the change localized to the settings row object with title Remote
Servers so the description uses proof-first, specific language instead of a
catch-all claim.

Source: Coding guidelines

docs/STABILITY_FIX_PLAN.md-6-6 (1)

6-6: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Fix the heading level jump at Line 6.

`### Progress` skips the `##` level, which breaks markdown structure and matches the linter warning.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/STABILITY_FIX_PLAN.md` at line 6, The markdown heading in the fix plan
has a level jump because Progress is written as a third-level heading without an
intermediate second-level section. Update the heading for Progress in the
document to use the correct hierarchy so it follows the surrounding section
structure and satisfies the markdown linter.

Source: Linters/SAST tools

src/screens/RemoteServersScreen.tsx-141-143 (1)

141-143: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Replace the vague server wording.

"other LLM servers" is too broad for the copy rule here. Please name the supported class directly, e.g. "another OpenAI-compatible server on your network", so the UI states the contract instead of a catch-all phrase. As per coding guidelines, "Content must use proof-first language, stating measurable specifics instead of vague claims."

Also applies to: 247-249

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/screens/RemoteServersScreen.tsx` around lines 141 - 143, The empty-state
copy in RemoteServersScreen should avoid the vague phrase “other LLM servers”
and instead state the supported contract directly. Update the text rendered in
the empty state to name the class of supported endpoint explicitly, using the
same wording consistently in the other affected copy block, so the messaging is
proof-first and specific (for example, “another OpenAI-compatible server on your
network”).

Source: Coding guidelines

src/screens/ModelsScreen/VoiceModelsUpsell.tsx-35-43 (1)

35-43: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Surface desktop-link failures instead of swallowing them.

If openURL rejects here, the CTA becomes a dead tap. This PR already adds showToast() for short cross-platform hints, so this should report the failure instead of using an empty catch.

Suggested fix
 import { View, Text, TouchableOpacity, Linking } from 'react-native';
 import Icon from 'react-native-vector-icons/Feather';
 import { Button } from '../../components';
 import { useTheme, useThemedStyles } from '../../theme';
 import type { ThemeColors } from '../../theme';
 import { TYPOGRAPHY, SPACING, OFF_GRID_DESKTOP_URL } from '../../constants';
+import { showToast } from '../../utils/toast';
@@
       <TouchableOpacity
         style={styles.desktopLink}
-        onPress={() => Linking.openURL(OFF_GRID_DESKTOP_URL).catch(() => {})}
+        onPress={() => Linking.openURL(OFF_GRID_DESKTOP_URL).catch(() => {
+          showToast('Could not open Off Grid AI Desktop link.');
+        })}
         accessibilityRole="link"
         accessibilityLabel="Get Off Grid AI Desktop"
       >
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/screens/ModelsScreen/VoiceModelsUpsell.tsx` around lines 35 - 43, The
desktop CTA in VoiceModelsUpsell is swallowing Linking.openURL failures, which
leaves the tap with no feedback. Update the TouchableOpacity onPress handler in
VoiceModelsUpsell to catch the rejection and surface a short user-facing error
via showToast instead of an empty catch, using the existing OFF_GRID_DESKTOP_URL
and the current desktop link action for context.
src/screens/ModelDownloadScreen.tsx-250-250 (1)

250-250: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Replace the em dash in the alert title.

Line 250 adds user-facing copy with an em dash, which violates the copy rules for this repo.

As per coding guidelines, "Do not use em dashes in copy; use a hyphen instead."

Suggested fix
-        setAlertState(showAlert('Connected — No Models Found', `${server.name} is reachable but has no models loaded. Start a model in Off Grid AI Desktop, Ollama, or LM Studio, then reconnect.`));
+        setAlertState(showAlert('Connected - No Models Found', `${server.name} is reachable but has no models loaded. Start a model in Off Grid AI Desktop, Ollama, or LM Studio, then reconnect.`));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/screens/ModelDownloadScreen.tsx` at line 250, The user-facing alert title
in ModelDownloadScreen’s showAlert call uses an em dash, which violates the repo
copy rule; update the connected/no-models message to use a hyphen instead of the
em dash while keeping the rest of the text unchanged. Locate the string in the
setAlertState(showAlert(...)) call within ModelDownloadScreen and replace the
title copy accordingly.

Source: Coding guidelines

🧹 Nitpick comments (6)
__tests__/integration/stores/remoteServerDiscovery.test.ts (1)

571-585: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Cover the no-kind fallback with a non-generative model.

This only proves that a generative ID survives when kind is missing. Add an embedding or reranker ID to the same payload and assert it is still excluded, otherwise a regression in isTextModel() can leak non-chat models into discoveredModels. Based on learnings, test every approved behavior change in the same pass, including new branches/conditions and contract changes that other code or tests depend on.

Suggested test extension
       mockFetch.mockImplementation((url: string) => {
         if (url.endsWith('/v1/models')) {
           return Promise.resolve(
-            jsonResponse({ object: 'list', data: [{ id: 'llama-3.2' }] }),
+            jsonResponse({
+              object: 'list',
+              data: [{ id: 'llama-3.2' }, { id: 'bge-base-en-v1.5' }],
+            }),
           );
         }
         return Promise.resolve(jsonResponse({}, false, 404));
       });

       const models = await useRemoteServerStore.getState().discoverModels('srv-plain');
       expect(models.map((m) => m.id)).toEqual(['llama-3.2']);
+      expect(models.some((m) => m.id === 'bge-base-en-v1.5')).toBe(false);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@__tests__/integration/stores/remoteServerDiscovery.test.ts` around lines
571 - 585, The no-kind fallback test in remoteServerDiscovery should also verify
non-generative models are excluded, not just that a chat model survives. Update
the discovery payload in the existing `discoverModels('srv-plain')` test to
include an embedding or reranker entry alongside `llama-3.2`, then assert only
the generative model is returned. Keep the check focused on the
`useRemoteServerStore` discovery path and the `isTextModel()` filtering behavior
so regressions in `discoveredModels` are caught.

Source: Learnings

__tests__/unit/screens/DownloadManagerScreen/useDownloadManager.branches.test.ts (1)

192-220: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add the STT early-failure regression case.

These tests cover the happy path, but not the branch where whisperService.downloadModel() rejects before a new row is registered. That is the risky edge in the new retry flow, and a regression test here would lock it down. Based on learnings, "Test every approved behavior change in the same pass, including bug fixes, new branches/conditions, and contract or copy changes that other code or tests depend on."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@__tests__/unit/screens/DownloadManagerScreen/useDownloadManager.branches.test.ts`
around lines 192 - 220, Add a regression test in
useDownloadManager.branches.test.ts for the STT retry path where
useDownloadManager.handleRetryDownload calls whisperService.downloadModel and
that promise rejects before a new row is created. Reuse the existing
handleRetryDownload, mockWhisperService.downloadModel, and
mockBackgroundDownloadService assertions to verify the failed retry is
surfaced/handled as expected, while ensuring the stale row cleanup and cancel
behavior are still covered by the stt retry branch.

Source: Learnings

__tests__/rntl/components/TranscriptionModelsTab.test.tsx (1)

165-170: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Assert the card itself cannot re-trigger downloads.

This test says the card is not tappable, but only checks that the nested download button is absent. Add a press on transcription-model-card-0 and assert downloadModel was not called.

Proposed test strengthening
   it('treats an active STT download-store entry as downloading (no re-download affordance)', () => {
     seedSttDownload('tiny.en', 'running', 0.6);
-    const { queryByTestId } = render(<TranscriptionModelsTab />);
+    const { getByTestId, queryByTestId } = render(<TranscriptionModelsTab />);
     // Downloading → no download button and the card is not tappable to re-download.
     expect(queryByTestId('transcription-model-card-0-download')).toBeNull();
+    fireEvent.press(getByTestId('transcription-model-card-0'));
+    expect(mockWhisperActions.downloadModel).not.toHaveBeenCalled();
   });

Based on learnings, test every approved behavior change in the same pass.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@__tests__/rntl/components/TranscriptionModelsTab.test.tsx` around lines 165 -
170, The current TranscriptionModelsTab test only verifies that the nested
download affordance is hidden, but it does not prove the card itself cannot
start a re-download. Strengthen the existing test in
TranscriptionModelsTab.test.tsx by triggering a press on the
transcription-model-card-0 element and asserting downloadModel is not called,
using the same seeded running STT state so the behavior is covered in one pass.

Source: Learnings

__tests__/rntl/screens/RemoteServersScreen.test.tsx (1)

143-145: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert the new desktop actions actually open the URL.

These checks only prove the label exists. If the onPress handler is removed or wired to the wrong URL, the tests still pass. Please invoke the captured alert action and the visible link, then assert Linking.openURL(OFF_GRID_DESKTOP_URL) was called. Based on learnings, "Test every approved behavior change in the same pass, including bug fixes, new branches/conditions, and contract or copy changes that other code or tests depend on."

Also applies to: 527-533

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@__tests__/rntl/screens/RemoteServersScreen.test.tsx` around lines 143 - 145,
The empty-state desktop action test only checks that the “Get Off Grid AI
Desktop” label renders, so it won’t catch a broken or miswired press handler.
Update the affected tests around RemoteServersScreen to trigger the captured
alert action and the visible link, then assert that Linking.openURL is called
with OFF_GRID_DESKTOP_URL. Use the existing RemoteServersScreen and
OFF_GRID_DESKTOP_URL symbols so the test verifies the actual behavior, not just
the copy.

Source: Learnings

src/screens/ProDetailScreen/index.tsx (1)

54-54: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Extract the desktop CTA into a shared component/helper.

This same OFF_GRID_DESKTOP_URL + Linking.openURL(...).catch(() => {}) + "Get Off Grid AI Desktop" pattern already exists in src/screens/RemoteServersScreen.tsx:144-152, and this PR is adding it to several more surfaces. Keeping each copy inline will make text, accessibility props, and failure handling drift. Based on learnings, "Before writing any new component, style, hook, or service, search for an existing one and reuse it instead of building a parallel version" and "two screens that show the same kind of thing must use the same component."

Also applies to: 165-184, 305-330

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/screens/ProDetailScreen/index.tsx` at line 54, The desktop CTA logic is
duplicated across ProDetailScreen and RemoteServersScreen, so extract the shared
OFF_GRID_DESKTOP_URL / Linking.openURL(...).catch(() => {}) / “Get Off Grid AI
Desktop” behavior into a reusable helper or component and have ProDetailScreen
use it instead of inline copies. Centralize the tap handler and CTA text/props
in a shared symbol (for example a dedicated desktop CTA component or helper used
by ProDetailScreen and RemoteServersScreen) so future surfaces can reuse the
same implementation consistently.

Source: Learnings

__tests__/rntl/components/RemoteServerModal.test.tsx (1)

114-114: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use stable selectors for these inputs.

Lines 114, 206, 244, 313, and 410 couple the tests to placeholder copy, so every marketing/example-text tweak forces unrelated test churn. Prefer a stable testID or accessibility label for the server name and endpoint fields, and keep placeholder assertions separate if you want copy coverage.

Also applies to: 206-213, 243-245, 312-314, 408-410

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@__tests__/rntl/components/RemoteServerModal.test.tsx` at line 114, The
RemoteServerModal tests are coupled to placeholder text via the server name and
endpoint field queries, so they will churn when copy changes. Update the
relevant assertions in RemoteServerModal.test.tsx to use stable selectors from
the component, such as testID or accessibility labels, by locating the inputs
used around VALID_ENDPOINT and the related render/getBy* checks in the affected
test cases. Keep any placeholder-specific assertions separate if you still want
coverage for the copy itself.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@android/app/src/main/java/ai/offgridmobile/litert/LiteRTModule.kt`:
- Around line 595-597: The vision backend selection in visionBackendFor() still
returns Backend.GPU() even when the existing skipGpu guard should block GPU
usage, which can reintroduce the Pixel 10 crash path. Update
visionBackendFor(mainBackend) in LiteRTModule to consult the same device gate
used by buildBackendChain() and return CPU vision whenever GPU is disallowed,
including for the NPU-backed path.
- Around line 80-84: The clamp logic in LiteRTModule.clampMaxTokens still forces
MIN_TOKEN_FLOOR when kvBudgetMb is non-positive, which can overcommit memory.
Update the early return so it does not default to 1024 tokens; instead, either
reject the load path or cap to the highest affordable token count based on
availMb, modelMb, TOKEN_BUDGET_HEADROOM_MB, and KV_MB_PER_TOKEN. Keep the fix
localized to clampMaxTokens and preserve the existing affordability calculation
used for the requested token clamp.
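
As a rough illustration of the "reject instead of flooring" option for `clampMaxTokens`, the arithmetic could look like the sketch below. It is written in TypeScript rather than the module's Kotlin, and the constant values, the `kvBudgetMb` formula, and the `null` return for "reject the load" are all assumptions made for illustration; only the names `availMb`, `modelMb`, `TOKEN_BUDGET_HEADROOM_MB`, and `KV_MB_PER_TOKEN` come from the review.

```typescript
// Hypothetical values; the real constants live in LiteRTModule.kt and are not shown in the review.
const TOKEN_BUDGET_HEADROOM_MB = 512;
const KV_MB_PER_TOKEN = 0.25;

/**
 * Returns the token count to load with, or null when no KV budget remains.
 * Assumed budget formula: available RAM minus model size minus fixed headroom.
 */
function clampMaxTokens(requested: number, availMb: number, modelMb: number): number | null {
  const kvBudgetMb = availMb - modelMb - TOKEN_BUDGET_HEADROOM_MB;
  // No budget left: refuse the load instead of forcing a floor that overcommits memory.
  if (kvBudgetMb <= 0) return null;
  const affordable = Math.floor(kvBudgetMb / KV_MB_PER_TOKEN);
  if (affordable < 1) return null;
  // Never raise the request; only cap it to what the budget can hold.
  return Math.min(requested, affordable);
}
```

With these assumed constants, a request for 8192 tokens on a device with 4096 MB available and a 2048 MB model is capped to what the remaining 1536 MB can hold, while a device with less free RAM than the model size gets `null` and the caller can block the load.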

In `@ios/CoreMLDiffusionModule.swift`:
- Around line 192-199: The `cpuOnly` flag is being ignored in the
`CoreMLDiffusionModule` pipeline setup, so the current `primaryUnits` selection
can still use GPU-backed compute. Update the logic around `preferNeuralEngine`
and `primaryUnits` so `cpuOnly` maps directly to `.cpuOnly`, and only
non-CPU-only loads choose between `.cpuAndNeuralEngine` and `.cpuAndGPU` in the
same `StableDiffusionPipelineProtocol` setup path.

In `@ios/DownloadManagerModule.swift`:
- Around line 285-302: The persisted download state in DownloadManagerModule’s
restoreTasksFromSession/getActiveDownloads flow can outlive the foreground
URLSessionDownloadTask after relaunch, so missing active records are never
reconciled. Update restoreTasksFromSession to detect persisted running/pending
downloads that are not returned by the session, and mark those records inactive
(failed/paused) or remove them before JS hydration so stale work is not
rehydrated as still running.

In `@src/components/ModelSelectorModal/TextTab.tsx`:
- Around line 106-119: The active-row highlighting in TextTab is still using
selectedModelPath even after a remote model becomes current, so a stale local
model can appear selected alongside the remote one. Update the isSelected logic
in the model row rendering to only apply the deferred local highlight when
currentRemoteModelId is null, keeping the load-on-tap behavior intact while
ensuring only the correct model is marked active.
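
A minimal sketch of the selection rule described above, assuming the row's highlight can be computed from just the three values the review names. The `SelectionState` shape and the `isLocalRowSelected` helper are hypothetical and not taken from TextTab.tsx.

```typescript
// Hypothetical shape; the real component reads these values from its stores/props.
interface SelectionState {
  selectedModelPath: string | null;
  currentRemoteModelId: string | null;
}

/**
 * A local row is highlighted only when no remote model is current
 * and its path matches the (possibly deferred) local selection.
 */
function isLocalRowSelected(rowPath: string, state: SelectionState): boolean {
  if (state.currentRemoteModelId !== null) return false;
  return state.selectedModelPath === rowPath;
}
```

The remote check comes first so a stale `selectedModelPath` can never produce a second highlighted row while a remote model is active.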

In `@src/screens/DownloadManagerScreen/retryHandlers.ts`:
- Around line 185-198: The STT retry flow in retryWhisperDownload removes the
existing store row before whisperService.downloadModel has successfully
registered the replacement, so a startup failure can leave the Download Manager
with no row at all. Update retryWhisperDownload to keep the failed item in
useDownloadStore until the new download is registered, or add catch-path
recovery that restores a failed row if downloadModel rejects; use the existing
retryWhisperDownload, useDownloadStore.getState().remove, and
whisperService.downloadModel hooks to keep the row visible during retry.

In `@src/screens/RemoteServersScreen.styles.ts`:
- Around line 57-67: The `desktopLink` and `desktopLinkText` styles use
hardcoded spacing and font sizing instead of design-system tokens. Replace
`gap`, `marginTop`, and `paddingVertical` in `desktopLink` with the appropriate
`SPACING` values, and replace the `fontSize` in `desktopLinkText` with the
matching `TYPOGRAPHY` token so the CTA stays consistent with the rest of the UI.

In `@src/services/llm.ts`:
- Around line 97-103: resolveSafeContext() is only testing a fixed fallback list
capped at 4096, so it can miss larger safe contexts and return too small a
value. Update the fallback generation in llm.ts to search downward from
requestedCtx in sensible steps (instead of hardcoding [4096, 3072, 2048, 1024])
so values like 12288 or 8192 are tried before smaller ones. Keep the existing
checkMemoryForModel and logger.warn flow, but ensure the first safe ctxLen
returned is the largest fitting context.
- Around line 71-73: The memory guard in llm loading is using the stored cache
setting only, so OpenCL paths can still be checked as quantized even when the
effective KV cache becomes f16. Update the guard setup around
checkMemoryForModel in the llm service to derive quantizedCache from the actual
load mode, and force it to false for OpenCL loads so resolveSafeContext() uses
the f16 estimate. Use the existing llm.ts flow and symbols like
settings.cacheType, checkMemoryForModel, and resolveSafeContext to locate the
fix.
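
The two llm.ts findings above can be sketched as follows. This is an illustration under stated assumptions, not the service's actual code: the `fits` callback stands in for `checkMemoryForModel`, the 1024-token step and floor are hypothetical, `null` is an assumed way to signal "nothing fits", and treating every cache type other than `'f16'` as quantized is also an assumption.

```typescript
// Stand-in for the app's memory check: true when a context of ctxLen tokens is estimated to fit.
type FitsInMemory = (ctxLen: number) => boolean;

// Hypothetical search parameters.
const MIN_CTX = 1024;
const CTX_STEP = 1024;

/**
 * Tries the requested context first, then steps downward on CTX_STEP boundaries,
 * returning the largest candidate that fits, or null if none does.
 */
function resolveSafeContext(requestedCtx: number, fits: FitsInMemory): number | null {
  if (fits(requestedCtx)) return requestedCtx;
  // Highest CTX_STEP multiple strictly below the request.
  let candidate = Math.floor((requestedCtx - 1) / CTX_STEP) * CTX_STEP;
  for (; candidate >= MIN_CTX; candidate -= CTX_STEP) {
    if (fits(candidate)) return candidate;
  }
  return null;
}

/**
 * Derives the flag passed to the memory guard from the effective load mode:
 * OpenCL loads are treated as unquantized (f16) regardless of the stored setting.
 */
function isQuantizedCache(cacheType: string, usesOpenCL: boolean): boolean {
  if (usesOpenCL) return false;
  return cacheType !== 'f16';
}
```

Because the search walks downward from the request, a 16384-token request on a device that can hold 12288 tokens resolves to 12288 instead of dropping straight to 4096.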

In `@src/stores/appStore.ts`:
- Around line 198-205: The rehydrate migration in appStore is too broad because
it uses >= against MCP_BOOST_CTX_CEILING and MCP_BOOST_MAX_OUTPUT_TOKENS, which
can clobber valid persisted user settings. Update the migration logic around the
contextLength/maxTokens and liteRTMaxTokens reset paths to only undo the old MCP
boost when the stored values exactly match the boosted constants, preserving any
legitimately higher user-configured values. Keep the change localized to the
rehydration block in the appStore migration.
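
A minimal sketch of the exact-match migration described above, assuming placeholder values for the constants. The numbers below are illustrative only (the real `MCP_BOOST_CTX_CEILING`, `MCP_BOOST_MAX_OUTPUT_TOKENS`, and defaults live in the app), and `undoMcpBoost` is a hypothetical helper name.

```typescript
// Assumed values for illustration; the real constants are defined in the app.
const MCP_BOOST_CTX_CEILING = 32768;
const MCP_BOOST_MAX_OUTPUT_TOKENS = 8192;
const DEFAULT_CTX = 4096;
const DEFAULT_MAX_TOKENS = 1024;

interface PersistedSettings {
  contextLength: number;
  maxTokens: number;
}

/** Reset only values that exactly match the old boost; leave everything else alone. */
function undoMcpBoost(s: PersistedSettings): PersistedSettings {
  return {
    contextLength: s.contextLength === MCP_BOOST_CTX_CEILING ? DEFAULT_CTX : s.contextLength,
    maxTokens: s.maxTokens === MCP_BOOST_MAX_OUTPUT_TOKENS ? DEFAULT_MAX_TOKENS : s.maxTokens,
  };
}
```

With `===` instead of `>=`, a user who deliberately configured a higher value keeps it. The remaining trade-off is that a user who chose exactly the boosted value by hand is still reset once.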

---

Minor comments:
In `@__tests__/unit/services/hardware.branches.test.ts`:
- Around line 76-97: The hardware.branches test suite mutates the global
Platform.OS and leaves it set to iOS, which can leak into later tests. In the
hardwareService estimateImageModelRam branch tests, save the original
Platform.OS before changing it and restore it in an afterEach so each case stays
isolated and order-independent.

In `@__tests__/unit/services/rag/embedding.test.ts`:
- Around line 89-101: The embedding timeout test leaves fake timers enabled if
an assertion fails, so wrap the body of the `embeddingService.load()` scenario
in a try/finally and always restore real timers in `finally`. While updating
`__tests__/unit/services/rag/embedding.test.ts`, also add coverage for the
orphan cleanup path by having `mockInitLlama` resolve after the timeout and
asserting `mockRelease` is called, alongside the existing `load`, `isLoaded`,
and `modelResidencyManager` expectations.

In `@docs/STABILITY_FIX_PLAN.md`:
- Around line 1-163: Replace all em dashes in STABILITY_FIX_PLAN.md with plain
hyphens, including the title, bullets, and section prose, and keep the wording
otherwise unchanged. Review the document for any remaining "—" characters and
update the surrounding text consistently so the style guide rule is followed
throughout.
- Around line 163-164: The document ends with stray closing tags that should not
be in the markdown output. Remove the accidental `</content>` and `</invoke>`
text from the end of STABILITY_FIX_PLAN so the rendered doc is clean; this is
just a cleanup in the markdown content, not a functional change.
- Line 6: The markdown heading in the fix plan has a level jump because Progress
is written as a third-level heading without an intermediate second-level
section. Update the heading for Progress in the document to use the correct
hierarchy so it follows the surrounding section structure and satisfies the
markdown linter.

In `@marketing/devto/connect-phone-ollama-lmstudio.md`:
- Line 9: Soften the 14B vs 4B comparison in the opening copy and avoid
presenting it as a hard cutoff. In the markdown content, update the wording in
the intro paragraph to either qualify the claim with the benchmark/setup used
(device memory, GPU, and quantization) or rephrase it as a general example of
expected performance. Keep the language aligned with the guidance in the article
so the statement matches the behavior described by the phone-to-desktop
workflow.
- Around line 101-103: The FAQ in connect-phone-ollama-lmstudio.md overstates
provider support by claiming Anthropic-compatible servers work. Update the
“Which servers work?” answer to mention only the OpenAI-compatible API path
currently supported here (for example Ollama, LM Studio, LocalAI, and similar),
and remove the Anthropic-compatible statement so the marketing copy matches the
actual remote provider behavior.

In `@marketing/devto/vision-ai-phone-camera-offline.md`:
- Around line 31-32: The performance claim in the article is too absolute and is
repeated without enough context; update the wording in the recommendation
section to avoid stating “about seven seconds” as a general result. Use a
specific benchmark tied to a concrete device, model, prompt length, and OS
version if you keep the timing, or remove the number and replace it with a
softer, qualified statement in the relevant copy.

In `@src/screens/ModelDownloadScreen.tsx`:
- Line 250: The user-facing alert title in ModelDownloadScreen’s showAlert call
uses an em dash, which violates the repo copy rule; update the
connected/no-models message to use a hyphen instead of the em dash while keeping
the rest of the text unchanged. Locate the string in the
setAlertState(showAlert(...)) call within ModelDownloadScreen and replace the
title copy accordingly.

In `@src/screens/ModelSettingsScreen/TextGenerationSection.tsx`:
- Line 72: The warning copy in TextGenerationSection should be made concrete and
guideline-compliant instead of using vague RAM/device language and an em dash.
Update the warning text used for the maxTokens/contextWarnThreshold check, and
the matching warning at the other referenced location, to state the actual
threshold or measurable condition directly using proof-first language. Keep the
warning tied to the existing maxTokens and contextWarnThreshold logic, and
remove any em dash from the copy.

In `@src/screens/ModelsScreen/VoiceModelsUpsell.tsx`:
- Around line 35-43: The desktop CTA in VoiceModelsUpsell is swallowing
Linking.openURL failures, which leaves the tap with no feedback. Update the
TouchableOpacity onPress handler in VoiceModelsUpsell to catch the rejection and
surface a short user-facing error via showToast instead of an empty catch, using
the existing OFF_GRID_DESKTOP_URL and the current desktop link action for
context.

In `@src/screens/RemoteServersScreen.tsx`:
- Around line 141-143: The empty-state copy in RemoteServersScreen should avoid
the vague phrase “other LLM servers” and instead state the supported contract
directly. Update the text rendered in the empty state to name the class of
supported endpoint explicitly, using the same wording consistently in the other
affected copy block, so the messaging is proof-first and specific (for example,
“another OpenAI-compatible server on your network”).

In `@src/screens/SettingsScreen.tsx`:
- Line 188: Update the Remote Servers nav copy in SettingsScreen so it removes
the vague “and more” phrasing and replaces it with a concrete supported
capability, such as “another OpenAI-compatible server.” Keep the change
localized to the settings row object with title Remote Servers so the
description uses proof-first, specific language instead of a catch-all claim.

In `@src/services/modelResidency/policy.ts`:
- Around line 80-83: The comment for SIDECAR_TYPES in policy.ts does not match
the eviction behavior in the residency policy. Update the comment near the
SIDECAR_TYPES definition to say these models are evicted last instead of “never
evicted for capacity,” or, if they truly must be pinned, adjust the eviction
logic in the residency selection code that uses SIDECAR_TYPES so they are
excluded from candidate eviction.

In `@src/services/rag/embedding.ts`:
- Around line 22-34: The withTimeout helper leaves its timer running when the
wrapped promise rejects before the deadline, so update the cleanup logic in
withTimeout to always clear the timeout in a finally path and only run the
orphan-handling promise.then(onOrphan) when the timeout has actually fired. Use
the existing withTimeout, timeout, and onOrphan flow to track whether the
deadline was reached, and make sure initLlama rejection does not leave a
dangling timer.
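
One way the suggested cleanup could be structured is sketched below. This is not the PR's `withTimeout`; the signature and error message are assumptions, and only the two behaviours from the comment are shown: the timer is always cleared, and `onOrphan` runs only after the deadline has actually fired.

```typescript
// Hypothetical sketch of a timeout wrapper; signature and message are assumptions.
async function withTimeout<T>(
  promise: Promise<T>,
  ms: number,
  onOrphan?: (value: T) => void,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let timedOut = false;

  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      timedOut = true;
      reject(new Error(`Timed out after ${ms}ms`));
    }, ms);
  });

  // If the wrapped promise resolves after the deadline, hand the value to
  // onOrphan so the caller can release it. Late rejections are ignored here.
  promise.then(
    (value) => {
      if (timedOut && onOrphan) onOrphan(value);
    },
    () => {},
  );

  try {
    return await Promise.race([promise, deadline]);
  } finally {
    // Runs on resolve, reject, and timeout alike, so no timer is left dangling.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

In the embedding service this would correspond to wrapping the `initLlama` call and releasing the late context inside `onOrphan`; that wiring is assumed, not shown in the review.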

In `@src/services/toolEmbeddingRouter.ts`:
- Around line 45-48: The first-time cache hydration in hydrateCache is being
marked complete before AsyncStorage.getItem finishes, which can let concurrent
callers skip loading persisted data and use an empty cache. Change
toolEmbeddingRouter so hydrateCache stores a shared hydration promise and awaits
it for all callers, setting the hydrated state only after the async load and
cache population completes.
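
The shared-promise approach can be sketched as follows. `KeyValueStore` stands in for `AsyncStorage`, and `CACHE_KEY` and `createHydrator` are hypothetical names; the actual cache shape in `toolEmbeddingRouter` is not shown in the review, so a simple map of vectors is assumed.

```typescript
// Hypothetical stand-in for AsyncStorage.getItem.
interface KeyValueStore {
  getItem(key: string): Promise<string | null>;
}

const CACHE_KEY = "tool-embedding-cache"; // assumed key name

function createHydrator(store: KeyValueStore, cache: Map<string, number[]>) {
  // One promise shared by every caller; created on first use.
  let hydration: Promise<void> | null = null;

  return function hydrateCache(): Promise<void> {
    if (hydration === null) {
      hydration = (async () => {
        const raw = await store.getItem(CACHE_KEY);
        if (raw === null) return;
        const parsed = JSON.parse(raw) as Record<string, number[]>;
        for (const key of Object.keys(parsed)) cache.set(key, parsed[key]);
      })();
    }
    // Concurrent callers await the same load instead of seeing an empty cache.
    return hydration;
  };
}
```

Callers that arrive mid-load await the in-flight promise, so "hydrated" is only observable after the cache is populated. A production version would probably also decide what to do when the read fails (for example, clearing `hydration` so a later call can retry).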

---

Nitpick comments:
In `@__tests__/integration/stores/remoteServerDiscovery.test.ts`:
- Around line 571-585: The no-kind fallback test in remoteServerDiscovery should
also verify non-generative models are excluded, not just that a chat model
survives. Update the discovery payload in the existing
`discoverModels('srv-plain')` test to include an embedding or reranker entry
alongside `llama-3.2`, then assert only the generative model is returned. Keep
the check focused on the `useRemoteServerStore` discovery path and the
`isTextModel()` filtering behavior so regressions in `discoveredModels` are
caught.

In `@__tests__/rntl/components/RemoteServerModal.test.tsx`:
- Line 114: The RemoteServerModal tests are coupled to placeholder text via the
server name and endpoint field queries, so they will churn when copy changes.
Update the relevant assertions in RemoteServerModal.test.tsx to use stable
selectors from the component, such as testID or accessibility labels, by
locating the inputs used around VALID_ENDPOINT and the related render/getBy*
checks in the affected test cases. Keep any placeholder-specific assertions
separate if you still want coverage for the copy itself.

In `@__tests__/rntl/components/TranscriptionModelsTab.test.tsx`:
- Around line 165-170: The current TranscriptionModelsTab test only verifies
that the nested download affordance is hidden, but it does not prove the card
itself cannot start a re-download. Strengthen the existing test in
TranscriptionModelsTab.test.tsx by triggering a press on the
transcription-model-card-0 element and asserting downloadModel is not called,
using the same seeded running STT state so the behavior is covered in one pass.

In `@__tests__/rntl/screens/RemoteServersScreen.test.tsx`:
- Around line 143-145: The empty-state desktop action test only checks that the
“Get Off Grid AI Desktop” label renders, so it won’t catch a broken or miswired
press handler. Update the affected tests around RemoteServersScreen to trigger
the captured alert action and the visible link, then assert that Linking.openURL
is called with OFF_GRID_DESKTOP_URL. Use the existing RemoteServersScreen and
OFF_GRID_DESKTOP_URL symbols so the test verifies the actual behavior, not just
the copy.

In `@__tests__/unit/screens/DownloadManagerScreen/useDownloadManager.branches.test.ts`:
- Around line 192-220: Add a regression test in
useDownloadManager.branches.test.ts for the STT retry path where
useDownloadManager.handleRetryDownload calls whisperService.downloadModel and
that promise rejects before a new row is created. Reuse the existing
handleRetryDownload, mockWhisperService.downloadModel, and
mockBackgroundDownloadService assertions to verify the failed retry is
surfaced/handled as expected, while ensuring the stale row cleanup and cancel
behavior are still covered by the stt retry branch.

In `@src/screens/ProDetailScreen/index.tsx`:
- Line 54: The desktop CTA logic is duplicated across ProDetailScreen and
RemoteServersScreen, so extract the shared OFF_GRID_DESKTOP_URL /
Linking.openURL(...).catch(() => {}) / “Get Off Grid AI Desktop” behavior into a
reusable helper or component and have ProDetailScreen use it instead of inline
copies. Centralize the tap handler and CTA text/props in a shared symbol (for
example a dedicated desktop CTA component or helper used by ProDetailScreen and
RemoteServersScreen) so future surfaces can reuse the same implementation
consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e16ebeda-bff1-493d-b1a5-9a308e57bed3

📥 Commits

Reviewing files that changed from the base of the PR and between e65db82 and 9af76d5.

📒 Files selected for processing (60)
  • CLAUDE.md
  • README.md
  • __tests__/integration/stores/remoteServerDiscovery.test.ts
  • __tests__/rntl/components/ModelSelectorModal.test.tsx
  • __tests__/rntl/components/RemoteServerModal.test.tsx
  • __tests__/rntl/components/TranscriptionModelsTab.test.tsx
  • __tests__/rntl/screens/ChatScreen.test.tsx
  • __tests__/rntl/screens/ModelDownloadScreen.test.tsx
  • __tests__/rntl/screens/ProDetailScreen.test.tsx
  • __tests__/rntl/screens/RemoteServersScreen.test.tsx
  • __tests__/rntl/screens/SettingsScreen.test.tsx
  • __tests__/unit/screens/DownloadManagerScreen/useDownloadManager.branches.test.ts
  • __tests__/unit/services/audioRecorderService.test.ts
  • __tests__/unit/services/hardware.branches.test.ts
  • __tests__/unit/services/llm.test.ts
  • __tests__/unit/services/llmSafetyChecks.test.ts
  • __tests__/unit/services/networkDiscovery.test.ts
  • __tests__/unit/services/rag/embedding.test.ts
  • __tests__/unit/services/toolEmbeddingRouter.test.ts
  • __tests__/unit/stores/appStore.test.ts
  • __tests__/unit/utils/toast.test.ts
  • android/app/src/main/java/ai/offgridmobile/litert/LiteRTModule.kt
  • android/app/src/test/java/ai/offgridmobile/litert/LiteRTTokenBudgetTest.kt
  • docs/STABILITY_FIX_PLAN.md
  • ios/CoreMLDiffusionModule.swift
  • ios/DownloadManagerModule.swift
  • marketing/devto/chat-with-pdfs-phone-offline.md
  • marketing/devto/connect-phone-ollama-lmstudio.md
  • marketing/devto/local-ai-tools-web-search-phone.md
  • marketing/devto/talk-to-local-ai-voice-phone.md
  • marketing/devto/vision-ai-phone-camera-offline.md
  • patches/@kesha-antonov+react-native-background-downloader+4.5.6.patch
  • pro
  • src/components/ModelSelectorModal/TextTab.tsx
  • src/components/ModelSelectorModal/index.tsx
  • src/components/RemoteServerModal/index.tsx
  • src/components/settings/ProUpsellBanner.tsx
  • src/constants/index.ts
  • src/screens/ChatScreen/index.tsx
  • src/screens/DownloadManagerScreen/retryHandlers.ts
  • src/screens/DownloadManagerScreen/useDownloadManager.ts
  • src/screens/ModelDownloadScreen.tsx
  • src/screens/ModelSettingsScreen/TextGenerationSection.tsx
  • src/screens/ModelsScreen/TranscriptionModelsTab.tsx
  • src/screens/ModelsScreen/VoiceModelsUpsell.tsx
  • src/screens/ProDetailScreen/index.tsx
  • src/screens/RemoteServersScreen.styles.ts
  • src/screens/RemoteServersScreen.tsx
  • src/screens/SettingsScreen.tsx
  • src/services/hardware.ts
  • src/services/llm.ts
  • src/services/llmSafetyChecks.ts
  • src/services/mcpContextBoost.ts
  • src/services/modelResidency/policy.ts
  • src/services/networkDiscovery.ts
  • src/services/rag/embedding.ts
  • src/services/toolEmbeddingRouter.ts
  • src/stores/appStore.ts
  • src/stores/remoteServerHelpers.ts
  • src/utils/toast.ts

Comment thread android/app/src/main/java/ai/offgridmobile/litert/LiteRTModule.kt
Comment thread android/app/src/main/java/ai/offgridmobile/litert/LiteRTModule.kt Outdated
Comment thread ios/CoreMLDiffusionModule.swift Outdated
Comment on lines +192 to +199
let preferNeuralEngine: Bool
if #available(iOS 26.0, *) {
preferNeuralEngine = false
} else {
pipe = try StableDiffusionPipeline(
resourcesAt: url,
controlNet: [],
configuration: config,
reduceMemory: true
)
preferNeuralEngine = true
}

// Skip prewarm for 'original' variant (low-memory devices): prewarm
// loads the full Unet into memory just to unload it, causing an OOM spike.
// With reduceMemory=true the pipeline lazily loads each submodel during
// generateImages(), so prewarming is unnecessary.
if attentionVariant != "original" {
try pipe.loadResources()
let primaryUnits: MLComputeUnits = (cpuOnly || !preferNeuralEngine) ? .cpuAndGPU : .cpuAndNeuralEngine
let pipe: StableDiffusionPipelineProtocol


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

cpuOnly currently routes to .cpuAndGPU.

Line 198 ignores the CPU-only request and still enables the GPU. That breaks the parameter contract and can keep callers on the exact fallback path they were trying to avoid. Split this into three cases so cpuOnly maps to .cpuOnly, then choose ANE vs CPU+GPU only for non-CPU-only loads.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ios/CoreMLDiffusionModule.swift` around lines 192 - 199, The `cpuOnly` flag
is being ignored in the `CoreMLDiffusionModule` pipeline setup, so the current
`primaryUnits` selection can still use GPU-backed compute. Update the logic
around `preferNeuralEngine` and `primaryUnits` so `cpuOnly` maps directly to
`.cpuOnly`, and only non-CPU-only loads choose between `.cpuAndNeuralEngine` and
`.cpuAndGPU` in the same `StableDiffusionPipelineProtocol` setup path.
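
The three-way split the comment asks for is small enough to show as a decision function. The module itself is Swift; the sketch below uses TypeScript only to illustrate the branching, with string literals standing in for the `MLComputeUnits` cases and `pickComputeUnits` as a hypothetical name.

```typescript
// Language-neutral illustration; the string values mirror MLComputeUnits case names.
type ComputeUnits = "cpuOnly" | "cpuAndGPU" | "cpuAndNeuralEngine";

function pickComputeUnits(cpuOnly: boolean, preferNeuralEngine: boolean): ComputeUnits {
  // A CPU-only request wins outright and never enables the GPU or ANE.
  if (cpuOnly) return "cpuOnly";
  // Otherwise choose between the Neural Engine and the GPU.
  return preferNeuralEngine ? "cpuAndNeuralEngine" : "cpuAndGPU";
}
```

Checking `cpuOnly` first keeps the parameter contract intact regardless of how `preferNeuralEngine` was derived.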

Comment thread ios/DownloadManagerModule.swift
Comment thread src/components/ModelSelectorModal/TextTab.tsx
Comment thread src/screens/DownloadManagerScreen/retryHandlers.ts Outdated
Comment on lines +57 to +67
desktopLink: {
flexDirection: 'row' as const,
alignItems: 'center' as const,
gap: 6,
marginTop: 12,
paddingVertical: 4,
},
desktopLinkText: {
fontSize: 14,
color: colors.primary,
},


📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Replace the hardcoded spacing and font size with design-system tokens.

gap: 6, marginTop: 12, paddingVertical: 4, and fontSize: 14 bypass the required SPACING/TYPOGRAPHY tokens, so this new CTA can drift from the rest of the UI. As per coding guidelines, "Use design-system TYPOGRAPHY tokens only; do not hardcode font sizes" and "Use design-system SPACING tokens only; do not hardcode margin or padding values."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/screens/RemoteServersScreen.styles.ts` around lines 57 - 67, The
`desktopLink` and `desktopLinkText` styles use hardcoded spacing and font sizing
instead of design-system tokens. Replace `gap`, `marginTop`, and
`paddingVertical` in `desktopLink` with the appropriate `SPACING` values, and
replace the `fontSize` in `desktopLinkText` with the matching `TYPOGRAPHY` token
so the CTA stays consistent with the rest of the UI.

Source: Coding guidelines

Comment thread src/services/llm.ts
Comment on lines +71 to +73
const quantizedCache = settings.cacheType !== 'f16';
const getMem = () => hardwareService.getAppMemoryUsage();
let memCheck = await checkMemoryForModel({ modelFileSize: fileSize, contextLength: params.ctxLen, getAvailableMemory: getMem, quantizedCache });


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Treat OpenCL loads as f16 in the memory guard.

quantizedCache is derived from the stored setting only, but OpenCL loads can force the effective cache type to f16. On Android that makes the guard use the cheaper quantized estimate even when the actual KV cache is f16, so resolveSafeContext() can still approve contexts that later OOM during load.

Suggested fix
-    const quantizedCache = settings.cacheType !== 'f16';
+    const quantizedCache =
+      settings.inferenceBackend !== INFERENCE_BACKENDS.OPENCL &&
+      settings.cacheType !== 'f16';
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    const quantizedCache = settings.cacheType !== 'f16';
-    const getMem = () => hardwareService.getAppMemoryUsage();
-    let memCheck = await checkMemoryForModel({ modelFileSize: fileSize, contextLength: params.ctxLen, getAvailableMemory: getMem, quantizedCache });
+    const quantizedCache =
+      settings.inferenceBackend !== INFERENCE_BACKENDS.OPENCL &&
+      settings.cacheType !== 'f16';
+    const getMem = () => hardwareService.getAppMemoryUsage();
+    let memCheck = await checkMemoryForModel({ modelFileSize: fileSize, contextLength: params.ctxLen, getAvailableMemory: getMem, quantizedCache });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/llm.ts` around lines 71 - 73, The memory guard in llm loading is
using the stored cache setting only, so OpenCL paths can still be checked as
quantized even when the effective KV cache becomes f16. Update the guard setup
around checkMemoryForModel in the llm service to derive quantizedCache from the
actual load mode, and force it to false for OpenCL loads so resolveSafeContext()
uses the f16 estimate. Use the existing llm.ts flow and symbols like
settings.cacheType, checkMemoryForModel, and resolveSafeContext to locate the
fix.

Comment thread src/services/llm.ts Outdated
Comment thread src/stores/appStore.ts Outdated
alichherawalla and others added 2 commits June 28, 2026 05:22
Thread preferGpu (from hardwareService.preferGpuForImageGen) through the image
load path into the native loadModel params, so the native compute path matches
the residency estimate the gate sized the load against.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pick compute units from the JS-supplied preferGpu instead of guessing natively:
GPU when chosen (devices with RAM; iOS 26 ANE load fails on the 8GB Pro), ANE
otherwise (low-RAM devices where the GPU OOMs). Fall back only GPU -> ANE (lower
system RAM); never ANE -> GPU, which would OOM the device the ANE was protecting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@qodo-code-review


CI Feedback 🧐

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

Action: typecheck

Failed stage: Type check [❌]

Failed test name: ""

Failure summary:

The action failed during the TypeScript compile step (npx tsc --noEmit) with exit code 2 due to
TypeScript errors in test files:
- Multiple TS2307 module resolution failures: tests import files under pro/ that cannot be found
  (or have no type declarations), e.g.:
  - __tests__/integration/audio/streamingPlayback.test.ts(47,29): cannot find ../../../pro/audio/ttsStore
  - __tests__/integration/audio/streamingPlayback.test.ts(50,8): cannot find ../../../pro/audio/streamingSpeech
  - __tests__/integration/audio/streamingStateMachine.test.ts(50,29): cannot find ../../../pro/audio/ttsStore
  - __tests__/integration/audio/streamingStateMachine.test.ts(53,8): cannot find ../../../pro/audio/streamingSpeech
  - __tests__/integration/audio/streamingStateMachine.test.ts(54,42): cannot find ../../../pro/audio/ttsLog
  - __tests__/rntl/components/PlaybackControls.test.tsx(15,28): cannot find ../../../pro/audio/ui/AudioMessageBubble/PlaybackControls
  - __tests__/unit/audioProgressCaption.test.ts(6,41): cannot find ../../pro/audio/ui/AudioMessageBubble/useAudioProgressCaption
  - __tests__/unit/mcp/McpToolExtension.test.ts(19,34): cannot find ../../../pro/mcp/McpToolExtension
- One TS7006 strict typing error:
  - __tests__/integration/audio/streamingStateMachine.test.ts(92,29): parameter e implicitly has an any type.

Relevant error logs:
1:  ##[group]Runner Image Provisioner
2:  Hosted Compute Agent
...

160:  > offgrid-mobile@0.0.100 prepare
161:  > husky && (cd pro && git config core.hooksPath .githooks 2>/dev/null || true)
162:  added 1168 packages, and audited 1169 packages in 30s
163:  234 packages are looking for funding
164:  run `npm fund` for details
165:  35 vulnerabilities (1 low, 21 moderate, 10 high, 3 critical)
166:  To address issues that do not require attention, run:
167:  npm audit fix
168:  To address all issues (including breaking changes), run:
169:  npm audit fix --force
170:  Run `npm audit` for details.
171:  ##[group]Run npx tsc --noEmit
172:  npx tsc --noEmit
173:  shell: /usr/bin/bash -e {0}
174:  ##[endgroup]
175:  ##[error]__tests__/integration/audio/streamingPlayback.test.ts(47,29): error TS2307: Cannot find module '../../../pro/audio/ttsStore' or its corresponding type declarations.
176:  ##[error]__tests__/integration/audio/streamingPlayback.test.ts(50,8): error TS2307: Cannot find module '../../../pro/audio/streamingSpeech' or its corresponding type declarations.
177:  ##[error]__tests__/integration/audio/streamingStateMachine.test.ts(50,29): error TS2307: Cannot find module '../../../pro/audio/ttsStore' or its corresponding type declarations.
178:  ##[error]__tests__/integration/audio/streamingStateMachine.test.ts(53,8): error TS2307: Cannot find module '../../../pro/audio/streamingSpeech' or its corresponding type declarations.
179:  ##[error]__tests__/integration/audio/streamingStateMachine.test.ts(54,42): error TS2307: Cannot find module '../../../pro/audio/ttsLog' or its corresponding type declarations.
180:  ##[error]__tests__/integration/audio/streamingStateMachine.test.ts(92,29): error TS7006: Parameter 'e' implicitly has an 'any' type.
181:  ##[error]__tests__/rntl/components/PlaybackControls.test.tsx(15,28): error TS2307: Cannot find module '../../../pro/audio/ui/AudioMessageBubble/PlaybackControls' or its corresponding type declarations.
182:  ##[error]__tests__/unit/audioProgressCaption.test.ts(6,41): error TS2307: Cannot find module '../../pro/audio/ui/AudioMessageBubble/useAudioProgressCaption' or its corresponding type declarations.
183:  ##[error]__tests__/unit/mcp/McpToolExtension.test.ts(19,34): error TS2307: Cannot find module '../../../pro/mcp/McpToolExtension' or its corresponding type declarations.
184:  ##[error]Process completed with exit code 2.
185:  Node 20 is being deprecated. This workflow is running with Node 24 by default. If you need to temporarily use Node 20, you can set the ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true environment variable. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

alichherawalla and others added 29 commits July 1, 2026 02:02
The public repo's CI does not check out the private pro/ submodule, so any
__tests__ file importing ../pro/* fails to resolve there. Several such tests were
added to the branch without updating both CI configs, so tsc --noEmit (typecheck
job) errored on integration/audio/*, PlaybackControls, audioProgressCaption and
unit/mcp/McpToolExtension, and the test job would have failed on the latter two
(absent from jest's ignore list). Bring tsconfig.exclude and
jest.config testPathIgnorePatterns into lockstep so both cover the same full set
of pro-importing tests; they run in pro's CI where the submodule is present.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pleted)

Regression for the exact device case: every Kokoro asset basename is present on
disk (executorch creates files before their bytes finish) yet a download is live —
the engine must report downloading, not completed, so the Download Manager and
Voice panel agree.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reframe the card from a feature list to the outcomes people get: chat, on-device
image and voice, live artifacts, projects that answer from your own docs with
citations, plus the private layer that remembers what you see, rewinds your screen,
runs one search across your day, connects your tools, dictates anywhere, keeps a
searchable clipboard, and turns activity into approval-gated to-dos and actions -
all on-device. Follows the brand arc (recognition → return → freedom); no em
dashes, no forbidden words.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…RegisteredPro

The Pro "aha" sheet and the settings upsell banner showed even when Pro was
unlocked, because they gated on hasRegisteredPro (set only when a license is
registered in-app) while loadProFeatures actually unlocks Pro from the keychain
entitlement OR a __DEV__ unlock — and never reflected that in the store. So a
keychain/dev-unlocked Pro user saw the upgrade prompt.

Add an authoritative isProActive flag set by loadProFeatures (the same `active`
signal that activates paid features), and gate both upsells on
hasRegisteredPro || isProActive. Not persisted — recomputed each launch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…he freeze

Firing every requested download at the native layer at once split bandwidth across
all of them and, with multi-GB models, drove iOS into a freeze (25 concurrent
observed on device). backgroundDownloadService.startDownload — the single chokepoint
every native download (text/image/stt) passes through — now admits at most
MAX_CONCURRENT_DOWNLOADS (3); the rest wait in a FIFO queue and begin as running
downloads complete, error, or are cancelled. A reservation token is added
synchronously so a same-tick burst can't over-admit; duplicate queued starts for the
same model coalesce. restore adopts resumed downloads against the cap so a relaunch
can't admit a fresh batch on top of them. Because only ≤3 ever start, the native
store never holds more than that, so a relaunch can't resume a storm either.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
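
The admission scheme in the commit message above (a fixed cap, a FIFO wait queue, synchronous slot reservation, and coalescing of duplicate queued starts) might be sketched like this. `DownloadLimiter` and its method names are hypothetical; only the cap of 3 comes from the commit message, and restore/cancel handling is omitted.

```typescript
const MAX_CONCURRENT_DOWNLOADS = 3; // value stated in the commit message

type Start = () => Promise<void>;

// Hypothetical limiter, not the actual backgroundDownloadService implementation.
class DownloadLimiter {
  private active = 0;
  private queue: Array<{ id: string; start: Start }> = [];
  private limit: number;

  constructor(limit: number = MAX_CONCURRENT_DOWNLOADS) {
    this.limit = limit;
  }

  /** Start now if a slot is free; otherwise wait in FIFO order. */
  enqueue(id: string, start: Start): void {
    // Coalesce duplicate queued starts for the same model.
    if (this.queue.some((q) => q.id === id)) return;
    this.queue.push({ id, start });
    this.drain();
  }

  /** Ids still waiting for a slot (what a UI could render as "Queued"). */
  getQueuedIds(): string[] {
    return this.queue.map((q) => q.id);
  }

  private drain(): void {
    while (this.active < this.limit && this.queue.length > 0) {
      const next = this.queue.shift()!;
      // Reserve the slot synchronously so a same-tick burst cannot over-admit.
      this.active++;
      let run: Promise<void>;
      try {
        run = next.start();
      } catch (e) {
        run = Promise.reject(e);
      }
      // Completion and failure both free the slot and admit the next waiter.
      run.then(() => this.release(), () => this.release());
    }
  }

  private release(): void {
    this.active--;
    this.drain();
  }
}
```

Incrementing `active` before the first `await` is what makes the cap hold when many starts arrive in the same tick.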
…d of failing them

After an app-kill, iOS foreground downloads can't resume, so reconcile() used to mark
every interrupted download 'failed' ("Interrupted — tap retry"), forcing a manual tap
per model. Now it re-queues them: marks 'pending' (→ 'queued' in the service
vocabulary) and re-issues each through restartIosTextDownload — the SAME path retry
uses (modelManager.downloadModelBackground → backgroundDownloadService.startDownload),
so they flow through the 3-slot concurrency cap, auto-resuming up to 3 and queueing the
rest. Re-issue is fire-and-forget so launch isn't blocked behind the cap. Shared helper
keeps retry and reconcile DRY; no new status, no caller branching, no cap bypass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ve and Downloaded

A model that completed and was then re-started (a stale restore / re-download) showed
in the Download Manager as BOTH a failed "Active Downloads" row and a "Downloaded
Models" row: two rows for the same model (SDXL Core ML appeared in both sections). The DM
builds those sections from separate stores (downloadStore vs registered models) with
no cross-dedup.

Fix at two seams:
- Display: a downloaded (on-disk, registered) model is authoritative, so drop any
  in-flight/failed row for the same model from activeItems, keyed by the shared
  uniformDownloadId.
- State: textProvider.reconcile now drops a stale in-flight row when the model is
  already registered (never re-download something we already have) instead of
  re-queuing it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…wnload Manager

A download waiting for a concurrency slot has no native downloadId and no store row
(performBackgroundDownload adds the row only after startDownload resolves), so it was
invisible while queued. Surface the waiting starts from their owner
(backgroundDownloadService.getQueuedItems) and render them as 'pending' (→ "Queued").
Merged into Active Downloads after the started rows, deduped against started + already
downloaded models so a model never appears twice.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Selecting a new model in the chat picker loaded it with the sheet still open, then
closed the sheet only once the load finished — because proceedWithModelLoadFn closed
the picker in its finally block (after awaiting loadTextModel). Move the close to the
start so the sheet dismisses immediately and the minimal in-chat loading card shows
during the load, matching the normal flow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…mic)

Guards the single bottom-bar view-model: mic when idle, tts-stop while playing/paused
(disabled only while preparing), generation-stop outranks playback, and voice-switch is
reported alongside without competing with the center action.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The outcome-led copy was too long on the card. Trim to the core outcomes — chat,
image, voice, on-device projects, plus the private layer that remembers, rewinds,
searches, and drafts actions — kept brand-voice clean (no em dashes, no forbidden
words). Nothing leaves the device.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ressure (OOM/jetsam)

The app was jetsam-killed (code 9) when a whisper sidecar auto-loaded onto an actively-
generating CoreML image model: makeRoomFor said fits=true (7279+142 ≤ 0.78*12GB) and
stacked them, blind to the "~120s GPU optimization" compile spike that had already
consumed the real RAM. Root cause: budgetForSpec only consulted the live
os_proc_available memory for DIRTY specs; an mmap sidecar took the static
physical-cap path and skipped the live check entirely.

Fix, in ONE owner (budgetForSpec — no duplicated memory math in makeRoomFor): apply the
live os_proc budget whenever there is dirty-memory PRESSURE — the incoming model is
dirty OR a dirty model is already resident. A dirty model's working set can't be paged
out like clean mmap weights, so while one is present every load (even a sidecar) must
fit real free RAM. With no dirty pressure, mmap GGUF stays bounded by physical RAM only
(a big LLM still loads on a high-RAM phone when instantaneous available is low). Adds
dirtyMemory to the Resident type so residents carry it. Regression tests cover the
sidecar-refused-under-dirty-pressure case and the no-false-refusal case.
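
The dirty-pressure rule might look roughly like this. It is a sketch under stated assumptions: the 0.78 fraction and the 7279 MB / 142 MB figures come from the text above, while the exact live-budget formula (resident footprint plus the live available figure), the field names, and the function names are hypothetical.

```typescript
interface IncomingSpec {
  sizeMB: number;
  dirtyMemory: boolean;
}

interface Resident {
  sizeMB: number;
  dirtyMemory?: boolean;
}

interface MemorySnapshot {
  totalMB: number;
  procAvailableMB: number; // stands in for the live os_proc_available reading
}

const PHYSICAL_CAP_FRACTION = 0.78; // the static cap quoted in the commit message

function hasDirtyPressure(spec: IncomingSpec, residents: Resident[]): boolean {
  return spec.dirtyMemory || residents.some((r) => r.dirtyMemory === true);
}

// Single owner of the budget math: the static physical cap by default, and the
// tighter live figure whenever dirty memory is incoming or already resident.
function budgetForSpecSketch(
  spec: IncomingSpec,
  residents: Resident[],
  mem: MemorySnapshot,
): number {
  const staticBudget = mem.totalMB * PHYSICAL_CAP_FRACTION;
  if (!hasDirtyPressure(spec, residents)) return staticBudget;
  const residentMB = residents.reduce((sum, r) => sum + r.sizeMB, 0);
  // Assumption: residents already count against the process, so the live
  // budget is what they hold plus what is actually free right now.
  return Math.min(staticBudget, residentMB + mem.procAvailableMB);
}

function fitsBudget(
  spec: IncomingSpec,
  residents: Resident[],
  mem: MemorySnapshot,
): boolean {
  const residentMB = residents.reduce((sum, r) => sum + r.sizeMB, 0);
  return residentMB + spec.sizeMB <= budgetForSpecSketch(spec, residents, mem);
}
```

With a 12 GB device (12288 MB), a 7279 MB dirty resident, and only 100 MB actually free, a 142 MB sidecar is refused under this sketch, whereas the same sidecar next to a clean mmap resident still passes the static cap.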

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…cision

A refusal on a 12GB device is only explainable with the raw numbers. Log
os_procAvailMB, totalMB, and dirty alongside the budget so we can read directly,
rather than infer from the budget, whether real free RAM is genuinely low (a correct
refusal) or the app footprint is bloated or leaking (the real bug).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The memory-pressure failure card offers "Free memory & Retry", but the handler just
re-ran the same request into the same wall (makeRoomFor returns fits=false without
evicting, so nothing was freed). Now a memory-pressure retry ejects resident models
(activeModelService.ejectAll, lazy-required to avoid the import cycle) BEFORE re-running,
so the label is honest and the retry can actually recover memory when models are loaded.
(It still can't conjure RAM that isn't model-held — when the real os_proc headroom is
below the model's need, the message correctly points to a smaller model.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ocess Limit

Total RAM alone is misleading — iOS caps a process well below physical, and model
loads are gated on the live os_proc figure. Surface the real picture on Device
Information: Available Now, App Footprint, and the derived Process Limit
(available + footprint) — the actual OS cap on the app. This makes a "not enough
memory" refusal explainable from the UI (and readable in release, where the __DEV__
file log sink is off), and confirms whether the increased-memory entitlement is in
effect: a distribution build should show a much higher Process Limit than a
development-signed one. Adds hardwareService.getProcessMemory().
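
The derived figure is simple arithmetic, sketched here. The ProcessMemory field names and the formatting helper are assumptions; the text above only states that Process Limit is available plus footprint.

```typescript
// Hypothetical return shape for hardwareService.getProcessMemory().
interface ProcessMemory {
  availableBytes: number; // "Available Now"
  footprintBytes: number; // "App Footprint"
}

const BYTES_PER_GIB = 1024 * 1024 * 1024;

// Process Limit = what the app may still allocate + what it already holds.
function deriveProcessLimitBytes(mem: ProcessMemory): number {
  return mem.availableBytes + mem.footprintBytes;
}

function formatGiB(bytes: number): string {
  return (bytes / BYTES_PER_GIB).toFixed(1) + " GB";
}
```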

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…y gate fires

The live-os_proc gate has two triggers: incoming-is-dirty (worked) and
a-dirty-model-is-resident. The second was a runtime no-op because activeModelService
registered the image model WITHOUT dirtyMemory: true, so residents.some(r => r.dirtyMemory)
was always false and exactly the jetsam case (a whisper sidecar stacking onto a resident
image model that is generating) went ungated. Tests passed only because they hand-set the
flag. Set it on the real register call so the resident-dirty arm actually engages.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…permanent slot leak)

adoptActive was fed every restored id including 'completed' ones. A completed download
never emits another terminal event, so its concurrency slot never released — after a few
background-completions the cap starved and all downloads silently stalled. Adopt only
genuinely in-flight (running/pending/retrying/waiting_for_network) restored downloads.
Also repairs two mocks that were left missing adoptActive/getQueuedItems (bypassed the
pre-commit test gate). Adds a regression test that a completed restored download is not
adopted.
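
The adoption filter can be sketched like this. The four in-flight statuses are the ones listed above; the RestoredDownload shape and the helper name idsToAdopt are hypothetical.

```typescript
type RestoredStatus =
  | "running"
  | "pending"
  | "retrying"
  | "waiting_for_network"
  | "completed";

interface RestoredDownload {
  id: string;
  status: RestoredStatus;
}

const IN_FLIGHT: RestoredStatus[] = [
  "running",
  "pending",
  "retrying",
  "waiting_for_network",
];

// Only downloads that can still emit a terminal event may take a concurrency
// slot; a 'completed' one would hold its slot forever.
function idsToAdopt(restored: RestoredDownload[]): string[] {
  return restored
    .filter((d) => IN_FLIGHT.indexOf(d.status) !== -1)
    .map((d) => d.id);
}
```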

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e memory & Retry"

The card's "Free memory & Retry" label (from reasonFromLoadError → memoryPressure) and
the retry's decision to actually eject used TWO different regexes that could disagree —
so the label could promise a free the retry never performed. Derive both from the one
owner (reasonFromLoadError === 'insufficient-memory'); drop the second regex and the
dead lazy-require (no import cycle exists — activeModelService is already imported).
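
A minimal sketch of deriving both the label and the eject decision from one owner. The regex, the "Retry" fallback label, and the planRetry helper are assumptions for illustration; only reasonFromLoadError, 'insufficient-memory', and the "Free memory & Retry" label appear in the text above.

```typescript
type LoadFailureReason = "insufficient-memory" | "unknown";

// Single owner of the classification (the pattern here is illustrative only).
function reasonFromLoadError(message: string): LoadFailureReason {
  return /not enough memory|insufficient memory|out of memory/i.test(message)
    ? "insufficient-memory"
    : "unknown";
}

interface RetryPlan {
  label: string;
  ejectBeforeRetry: boolean;
}

// Label and behavior come from the same decision, so they cannot disagree.
function planRetry(message: string): RetryPlan {
  const memoryPressure = reasonFromLoadError(message) === "insufficient-memory";
  return {
    label: memoryPressure ? "Free memory & Retry" : "Retry",
    ejectBeforeRetry: memoryPressure,
  };
}
```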

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… them

The review fixes route through backgroundDownloadService.adoptActive (restore) and
appStore.setProActive (loadProFeatures); update the parallelMmproj and proBootFlow
mocks to include them.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t mmproj

watchBackgroundDownload only reconciled the mmproj sidecar against native
'completed' status after subscribing — the main GGUF had no catch-up. Under the
new 3-concurrent cap the listener can be registered AFTER the main completes in
native (setup delayed behind an awaited-queued sidecar start), so its single
DownloadComplete event fires with no subscriber and is lost: mainCompleted stays
false and tryFinalize hangs at 100% forever.

Extract handleMainComplete/handleMmProjComplete as the single completion handlers
invoked by EITHER the live onComplete event OR one reconcile that checks BOTH
downloads against native status (the *CompleteHandled flags guard double-runs).
One owner per completion, no duplicated move/finalize logic.

Regression test: a non-vision main reported 'completed' before listener
registration finalizes via the reconcile alone, with no live event fired.
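
The "one owner per completion" pattern can be sketched as a small class. The class name, the finalizeCount counter, and the reconcile signature are assumptions; handleMainComplete, handleMmProjComplete, tryFinalize, and the handled flags follow the names in the text above.

```typescript
type NativeStatus = "running" | "completed";

class CompletionTracker {
  finalizeCount = 0;
  private mainCompleted = false;
  private mmProjCompleted: boolean;
  private mainCompleteHandled = false;
  private mmProjCompleteHandled = false;

  constructor(hasMmProj: boolean) {
    // A non-vision model has no sidecar to wait for.
    this.mmProjCompleted = !hasMmProj;
  }

  // Called by the live onComplete event OR by reconcile; the flag guards double-runs.
  handleMainComplete(): void {
    if (this.mainCompleteHandled) return;
    this.mainCompleteHandled = true;
    this.mainCompleted = true;
    this.tryFinalize();
  }

  handleMmProjComplete(): void {
    if (this.mmProjCompleteHandled) return;
    this.mmProjCompleteHandled = true;
    this.mmProjCompleted = true;
    this.tryFinalize();
  }

  // Catch-up after subscribing: check BOTH downloads against native status.
  reconcile(main: NativeStatus, mmProj: NativeStatus | null): void {
    if (main === "completed") this.handleMainComplete();
    if (mmProj === "completed") this.handleMmProjComplete();
  }

  private tryFinalize(): void {
    if (this.mainCompleted && this.mmProjCompleted && this.finalizeCount === 0) {
      this.finalizeCount += 1; // stands in for the real move/finalize work
    }
  }
}
```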

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A start waiting for a concurrency slot lived only in backgroundDownloadService's
queue — no native downloadId, no store row — so tapping remove on a "Queued" row
hit the provider's findEntry, missed, and no-opped: the item stayed queued and its
startDownload() promise never settled until a slot freed.

Fix at the seam, keeping single ownership:
- backgroundDownloadService.cancelQueued(key) — the queue owner removes the waiting
  start and settles its promise as a user cancellation (reusing the `.cancelled`
  convention the onError path already uses), touching no native download.
- ModelDownloadService.dispatch routes a not-found cancel/remove to the queue,
  mapping the uniform id onto the queued item via the SAME uniformDownloadId the
  providers' list() and the View use — so queued and started rows route identically.
  retry is not routed (a not-yet-started item can't be retried).
- startModelDownload swallows a `.cancelled` rejection (no store row to fail, not an
  error) instead of firing onError.
- The Download Manager's queued projection now refreshes on the service's notify, so
  a cancel drops the "Queued" row immediately rather than on the 1s poll.

Tests: cancelQueued removes+settles+frees (and returns false for an unknown key);
dispatch routes a queued-only id to cancelQueued by uniform id and refuses retry;
startModelDownload does not surface onError on a cancellation.
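
A simplified sketch of the queue-owner cancel. Slot promotion is left out, and the class name, waitForSlot, getQueuedKeys, and the error helper are hypothetical; cancelQueued, its boolean result for unknown keys, and the `.cancelled` convention come from the text above.

```typescript
interface CancellableError extends Error {
  cancelled?: boolean;
}

function makeCancelledError(): CancellableError {
  const err: CancellableError = new Error("Download cancelled");
  err.cancelled = true;
  return err;
}

interface WaitingStart {
  key: string;
  reject: (err: CancellableError) => void;
}

class StartQueue {
  private waiting: WaitingStart[] = [];

  // A start waiting for a concurrency slot; it has no native downloadId yet.
  waitForSlot(key: string): Promise<void> {
    return new Promise<void>((_resolve, reject) => {
      this.waiting.push({ key, reject });
    });
  }

  getQueuedKeys(): string[] {
    return this.waiting.map((w) => w.key);
  }

  // Remove the waiting start and settle its promise as a user cancellation.
  // Returns false for an unknown key; touches no native download.
  cancelQueued(key: string): boolean {
    const target = this.waiting.find((w) => w.key === key);
    if (target === undefined) return false;
    this.waiting = this.waiting.filter((w) => w !== target);
    target.reject(makeCancelledError());
    return true;
  }
}
```

A caller awaiting waitForSlot would catch the rejection, check the cancelled flag, and return quietly instead of firing onError.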

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
handleStopFn ended the generation session and stopped LLM + image generation but
never fired HOOKS.audioStop. Streaming TTS runs ahead of playback (synthesis is
~100ms, playback is seconds), so at Stop time there are always buffered sentences
— the phone kept talking through them after the user aborted. The new-turn path
(handleSendFn) already fired audio.stop; the manual-stop path was missing it.

Regression test asserts handleStopFn fires HOOKS.audioStop (fails against the prior
code).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pe it as image

A HuggingFace multi-file image download runs its parts sequentially through
downloadFileTo. When the current part is queued behind the 3-download cap it has no
native downloadId, so cancelSyntheticImageDownload (which only had currentDownloadId
to work with) could not reach it — the part sat in the queue holding a slot, then
promoted, briefly started, and was only cancelled by wireCurrentDownloadPromise. A
cancel that visibly does nothing until a slot frees.

- cancelSyntheticImageDownload now also calls backgroundDownloadService.cancelQueued
  with the part's key (== makeImageModelKey) to drop a queued part at once.
- The part now carries modelType:'image' (it defaulted to 'text'), so getQueuedItems
  types it correctly and the service's uniform-id cancel routing matches the View.
- isCancelledError recognizes the cross-service `.cancelled` convention (message
  "Download cancelled") in addition to the local sentinel, so a cancelled part is
  treated as a cancel, not surfaced as a download failure — this also hardens the
  existing active-cancel path.

Regression tests: cancel drops a queued part via cancelQueued (no cancelDownload,
no native id); multi-file parts are typed image. Both fail against the prior code.
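
The widened cancellation check might look like the sketch below. The local sentinel value is a placeholder (the real sentinel is not shown in this log), and treating either the cancelled flag or the "Download cancelled" message as sufficient is an assumption about the convention.

```typescript
// Placeholder for the module-local sentinel; the actual value is not shown here.
const LOCAL_CANCEL_SENTINEL = "image-download-cancelled";

function isCancelledError(err: unknown): boolean {
  if (err === LOCAL_CANCEL_SENTINEL) return true;
  if (typeof err !== "object" || err === null) return false;
  const candidate = err as { cancelled?: unknown; message?: unknown };
  // Cross-service convention: a `.cancelled` flag or the shared message.
  return (
    candidate.cancelled === true || candidate.message === "Download cancelled"
  );
}
```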

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… vision finalize-hang

startModelDownload attaches watchDownload only AFTER downloadModelBackground resolves,
and startBgDownload started the MAIN first, then blocked awaiting the sidecar's
startDownload. Under the 3-concurrent cap the sidecar can queue for minutes; during
that wait the main (already started) can complete and fire its single DownloadComplete
with no listener attached yet — the event is lost, ctx.mainCompleted never flips, and
finalization hangs at 100%. The reconcile added in fd3b413 only recovers this when
native still reports the main 'completed' at watcher-attach time, which is not
guaranteed after a long unwatched wait (iOS URLSession evicts completed tasks).

Fix: start the sidecar FIRST, so the main is the last download started before the
function returns and the caller attaches the watcher. Nothing long is awaited after
the main starts, so the watcher is live before the main can complete (the reconcile
still covers the microtask gap). Behavior-neutral otherwise: same two downloads, same
store row, same listeners, same finalize path — only the start order swaps and
setMmProjDownloadId moves just after the store row is added.

Tests: assert the sidecar's startDownload precedes the main's (fails against the prior
order). stubStartDownload now keys ids by role (main vs mmproj fileName) not call
order, so the suite is agnostic to start ordering.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… queue contract

The dispatch queue-fallback (74ff0e9) consults backgroundDownloadService.getQueuedItems
on a not-found cancel/remove, but this integration test mocked the service without it,
so remove of an unknown id threw instead of refusing cleanly. Mirror the real contract
(getQueuedItems/cancelQueued) so the mock can't diverge from the code under test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nload-state fixes)

Records the pro-side commits made this session so core builds against them:
- single audio view-model (deriveAudioActivity) for the bottom bar; paused clip
  shows the mic, not a stop button
- coordinator resets the voice-switch on engine loss
- Kokoro: a live download wins over disk-presence (DM no longer shows completed at
  3%); a fetch collision is treated as benign; [KOKORO-DL] decision logging

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This session's stop-TTS (#2) and queued-image-cancel (#4) fixes tipped
useChatGenerationActions.ts and imageDownloadActions.ts just over the max-lines
limit. Condensed verbose comments only — no behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>