fix: iOS + Android stability & performance (MCP boost regression, memory guards, deferred-load switcher)#425
alichherawalla wants to merge 188 commits into
Conversation
Investigation findings and phased fix plan for the crash clusters (iOS Metal buffer-alloc, Android litert OOM, watchdog hangs) and the post-Pro-activation slowness regression. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reduce mcpContextBoost to just isMcpEnabled(); drop applyMcpContextBoost and the 'Raised to the model maximum' Model Settings notes. The auto-boost pinned context to 32768 and reloaded the model on MCP enable (never restoring), which was the #1 crash cluster on both platforms (iOS metal_buffer_type_alloc_buffer, Android litert nativeCreateEngine OOM) and the cause of the post-activation slowness on flagship devices. Tool schemas are thinned by the embedding router instead. Bumps the pro submodule to the matching boost-removal commit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Migrate persisted settings: reset contextLength/maxTokens (llama) and liteRTMaxTokens (litert) that are at the removed boost ceiling (32768/8192) back to device-safe defaults, so existing Pro users recover without reinstalling. Leaves legitimate non-boost settings untouched. Adds migration tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
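A minimal sketch of the migration described above, for illustration only. The field names come from the commit message; the mapping of the two ceilings onto fields, the `SAFE_DEFAULTS` values, and the function name `migrateBoostedSettings` are assumptions, not the PR's actual code.

```typescript
// Hypothetical settings shape; only the three field names are taken from the commit message.
interface PersistedSettings {
  contextLength: number;
  maxTokens: number;
  liteRTMaxTokens: number;
}

// Assumed mapping of the "32768/8192" ceilings onto fields.
const BOOST_CONTEXT_CEILING = 32768;
const BOOST_TOKEN_CEILING = 8192;

// Placeholder defaults; real device-safe values would depend on the device tier.
const SAFE_DEFAULTS: PersistedSettings = {
  contextLength: 4096,
  maxTokens: 1024,
  liteRTMaxTokens: 1024,
};

// Reset only values that sit exactly at the removed boost ceiling; leave everything else alone.
function migrateBoostedSettings(s: PersistedSettings): PersistedSettings {
  const next: PersistedSettings = { ...s };
  if (s.contextLength === BOOST_CONTEXT_CEILING) next.contextLength = SAFE_DEFAULTS.contextLength;
  if (s.maxTokens === BOOST_TOKEN_CEILING) next.maxTokens = SAFE_DEFAULTS.maxTokens;
  if (s.liteRTMaxTokens === BOOST_TOKEN_CEILING) next.liteRTMaxTokens = SAFE_DEFAULTS.liteRTMaxTokens;
  return next;
}
```

Matching on the exact ceiling value is what keeps a user's own, non-boost choice (say, a context length of 8192) untouched.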
The MiniLM embedding model (loaded for RAG and MCP tool-routing) took the global
load lock but never registered with the residency manager — so its footprint was
invisible to the RAM budget and a chat model would load believing it had more free
RAM than it did (OOM contributor on both platforms). Register it as a last-resort
sidecar ('embedding' ResidentType) so its RAM is counted and it can be evicted;
release it on unload.
Also bound the native init with a 30s timeout: a stalled embedding load (the
ThreadPool::startWorkers hang behind the 19-device condition_variable::wait crash)
now releases the lock and fails instead of wedging a concurrent chat-model load and
tripping the OS watchdog. Adds tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
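To illustrate what "last-resort sidecar" could mean for eviction order, here is a rough sketch. Only the 'embedding' type name appears in the commit; the other type names, the `Resident` shape, the least-recently-used tie-break, and `pickEvictions` itself are assumptions.

```typescript
// Illustrative only: apart from 'embedding', these names are assumptions.
type ResidentType = "text" | "image" | "whisper" | "tts" | "embedding";

const SIDECAR_TYPES: ReadonlySet<ResidentType> = new Set<ResidentType>(["whisper", "tts", "embedding"]);

interface Resident {
  type: ResidentType;
  ramBytes: number;
  lastUsedAt: number;
}

// Choose models to evict until bytesNeeded is freed: heavy models first
// (least recently used first), sidecars only after those are exhausted.
function pickEvictions(residents: Resident[], bytesNeeded: number): Resident[] {
  const rank = (r: Resident): number => (SIDECAR_TYPES.has(r.type) ? 1 : 0);
  const ordered = [...residents].sort(
    (a, b) => rank(a) - rank(b) || a.lastUsedAt - b.lastUsedAt,
  );
  const evicted: Resident[] = [];
  let freed = 0;
  for (const r of ordered) {
    if (freed >= bytesNeeded) break;
    evicted.push(r);
    freed += r.ramBytes;
  }
  return evicted;
}
```

Because the embedding model is now registered, its `ramBytes` counts toward the budget and it shows up as a candidate, which is the point of the change.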
…ing (R3, R4)
The pre-load memory guard was advisory only (warn, then load anyway) and its KV-cache estimate was ~1000x too low (a flat ~2MB regardless of model size), so it never caught oversized loads; they hit the native allocator and crashed (iOS metal_buffer_type_alloc_buffer, Android litert OOM).
- Scale the KV estimate with model size and cache type (f16 vs quantized), the right order of magnitude for real models.
- On an unsafe check, step the context down to the largest size that fits rather than loading blindly; only block when the model weights alone exceed available RAM. This also covers iOS, where GPU layers were never RAM-capped; reducing context cuts the Metal working set.
Adds unit tests for the estimate and the downgrade/block paths.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
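The step-down behavior can be sketched roughly as follows. This is not the PR's estimator: it uses the standard transformer KV-cache size (K and V, per layer, per token, per KV head) and needs explicit architecture numbers, whereas the commit only says the estimate scales with model size and cache type. `ModelShape`, `CONTEXT_STEPS`, and `planLoad` are hypothetical names.

```typescript
type CacheType = "f16" | "q8_0" | "q4_0";

// Approximate bytes per cached element. The quantized figures follow the ggml
// block layouts (32 elements in 34 bytes for q8_0, 32 in 18 bytes for q4_0).
const BYTES_PER_ELEMENT: Record<CacheType, number> = {
  f16: 2,
  q8_0: 34 / 32,
  q4_0: 18 / 32,
};

// Hypothetical description of a model; a real guard may only know the file size.
interface ModelShape {
  layers: number;
  kvHeads: number;
  headDim: number;
  weightsBytes: number;
}

// KV cache = 2 (K and V) x layers x context x kvHeads x headDim x bytes per element.
function estimateKvBytes(m: ModelShape, contextLength: number, cache: CacheType): number {
  return 2 * m.layers * contextLength * m.kvHeads * m.headDim * BYTES_PER_ELEMENT[cache];
}

// Assumed ladder of fallback context sizes, largest first.
const CONTEXT_STEPS = [32768, 16384, 8192, 4096, 2048, 1024];

type LoadPlan = { kind: "load"; contextLength: number } | { kind: "block" };

function planLoad(m: ModelShape, requestedCtx: number, cache: CacheType, availableBytes: number): LoadPlan {
  // Block only when the weights alone cannot fit.
  if (m.weightsBytes > availableBytes) return { kind: "block" };
  const candidates = [requestedCtx, ...CONTEXT_STEPS.filter((c) => c < requestedCtx)];
  for (const ctx of candidates) {
    if (m.weightsBytes + estimateKvBytes(m, ctx, cache) <= availableBytes) {
      return { kind: "load", contextLength: ctx };
    }
  }
  // Weights fit but no candidate context does: fall back to the smallest candidate.
  return { kind: "load", contextLength: Math.min(...candidates) };
}
```

For a model with 32 layers, 8 KV heads, and a head dimension of 128, an f16 cache costs 128 KiB per token, so a 32768-token context alone needs 4 GiB. That is the order of magnitude a flat ~2MB estimate misses.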
The MCP tool-router embedded the query + every tool sequentially on the tiny CPU embedding context, caching results only in memory — so the ~60-embed cold burst repeated on the first message of every session (a visible time-to-first-token stall for Pro/MCP users). Persist the cache to AsyncStorage keyed by a content hash, so the burst happens once ever; a changed tool description re-embeds only that tool. Adds tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
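A sketch of the content-hash idea, under stated assumptions. The `{ h, v }` entry shape matches the `CacheEntry` fields visible in the review diff for `toolEmbeddingRouter.ts`; the FNV-1a hash, the `Tool` shape, and `splitByCache` are assumptions, and the AsyncStorage read/write is left out so the block stays self-contained.

```typescript
// h = content hash of the tool description, v = embedding vector.
interface CacheEntry {
  h: string;
  v: number[];
}

interface Tool {
  name: string;
  description: string;
}

// 32-bit FNV-1a over UTF-16 code units: a cheap change detector, not a cryptographic hash.
function contentHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Split tools into cache hits (reuse the stored vector) and stale tools
// (missing, or description changed since the vector was stored).
function splitByCache(
  tools: Tool[],
  cache: Map<string, CacheEntry>,
): { hits: Map<string, number[]>; stale: Tool[] } {
  const hits = new Map<string, number[]>();
  const stale: Tool[] = [];
  for (const tool of tools) {
    const entry = cache.get(tool.name);
    if (entry && entry.h === contentHash(tool.description)) {
      hits.set(tool.name, entry.v);
    } else {
      stale.push(tool);
    }
  }
  return { hits, stale };
}
```

Only the `stale` list would go to the embedding model; the rest is served from the persisted cache, so a changed description re-embeds one tool rather than all of them.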
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (F10) Patch @kesha-antonov/react-native-background-downloader: safeEmitEvent invoked the TurboModule JSI event emitter directly from the NSURLSession background delegate queue (didWriteData -> flushProgressReportsIfNeeded), racing the JS runtime's emitter map and crashing with EXC_BAD_ACCESS (pointer-authentication failure) during active downloads. Marshal every emission onto the main queue so emitter calls are serialized off the volatile delegate threads. Applied via patch-package (postinstall). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s (R7, F8)
The liblitertlm crashes (SIGSEGV in inference, SIGABRT in nativeCreateEngine) are the top Android crash cluster. Two graceful-degradation fixes:
- Clamp the token budget to free RAM before creating the engine (clampMaxTokens, unit-tested). The KV cache grows with the budget; an over-budget request aborts engine creation or segfaults under memory pressure. We now degrade to a smaller context (>=1024 floor) instead of crashing.
- Tie the vision delegate to the main backend tier instead of always forcing Backend.GPU(). The always-on Mali GPU vision delegate SIGSEGVs (libGLES_mali) on weak/low-VRAM GPUs; on the CPU fallback tier vision now runs on CPU too, so the fallback can actually succeed instead of failing the whole load.
Verified: compileDebugKotlin, lintDebug, and testDebugUnitTest all pass locally.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
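The clamp can be illustrated like this. The real `clampMaxTokens` lives in the Android (Kotlin) code and its signature is not shown in this PR page, so the parameters below (`bytesPerToken`, `reserveBytes`) are assumptions; the sketch is in TypeScript only to match the other examples here. The 1024 floor is from the commit message.

```typescript
// Floor from the commit message: never degrade below 1024 tokens.
const MIN_TOKENS = 1024;

// Hypothetical signature: the real Kotlin helper may take different inputs.
function clampMaxTokens(
  requested: number,
  freeBytes: number,
  bytesPerToken: number,
  reserveBytes: number = 0,
): number {
  const usable = Math.max(0, freeBytes - reserveBytes);
  const affordable = Math.floor(usable / bytesPerToken);
  // Never exceed the request; never drop below the floor unless the request itself is smaller.
  return Math.min(requested, Math.max(MIN_TOKENS, affordable));
}
```

With 300 MiB free and 128 KiB of KV cache per token, a request for 8192 tokens would be clamped to 2400 rather than passed to engine creation unchanged.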
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…red loading With deferred loading no model is in memory until first send, so the selector keyed its 'Switch Model' UI off the loaded path and showed 'Available Models' with nothing marked active, which read as 'can't switch models in chat'. Derive the selected model from activeModelId and reflect it: the switcher now shows 'Switch Model' and highlights the active choice (still tappable, so a tap loads it), while the 'Currently Loaded'/Unload section stays gated on an actual in-memory load. Deferred loading is unchanged. Adds a test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
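In sketch form, the selector's state now has two independent inputs. `activeModelId` and the two titles come from the commit message; `loadedModelId`, `SwitcherView`, and `deriveSwitcherView` are hypothetical names.

```typescript
interface SwitcherView {
  title: "Switch Model" | "Available Models";
  highlightedId: string | null;
  showLoadedSection: boolean;
}

// The selection follows the chosen model; the Unload section follows what is actually in memory.
function deriveSwitcherView(activeModelId: string | null, loadedModelId: string | null): SwitcherView {
  return {
    title: activeModelId ? "Switch Model" : "Available Models",
    highlightedId: activeModelId,
    showLoadedSection: loadedModelId !== null,
  };
}
```

Before first send under deferred loading, `activeModelId` is set and `loadedModelId` is null, so the picker highlights a choice without offering Unload.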
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/gemini review
The Transcriptions tab and the Download Manager could disagree — Download Manager showed an STT model 'failed' while the tab showed it stuck 'downloading'. They read different stores: the Download Manager reads the canonical useDownloadStore (driven by the native background-download events), while the tab read whisperStore.downloadProgressById, a parallel copy that's only cleared in whisperService's finally — which never runs if the background promise hangs on failure. Derive the tab's in-flight STT state from useDownloadStore (filtered to modelType 'stt'), so a failed entry reports active=false and the model becomes downloadable again instead of showing a stuck progress bar. whisperStore.downloadProgressById is kept only as a fallback for the RNFS URL-import path, which has no download-store entry. Adds tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
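A rough sketch of the derivation, assuming simplified store shapes. `modelType 'stt'` and the fallback role of `downloadProgressById` come from the commit message; the `DownloadEntry` fields, the status names, and `sttDownloadView` are assumptions.

```typescript
type DownloadStatus = "pending" | "downloading" | "failed" | "completed";

// Hypothetical slice of a download-store entry.
interface DownloadEntry {
  modelId: string;
  modelType: "text" | "image" | "stt";
  status: DownloadStatus;
  progress: number; // 0..1
}

interface SttDownloadView {
  active: boolean;
  progress: number;
}

function sttDownloadView(
  modelId: string,
  entries: DownloadEntry[],
  fallbackProgressById: Record<string, number>,
): SttDownloadView {
  const entry = entries.find((e) => e.modelType === "stt" && e.modelId === modelId);
  if (entry) {
    // The canonical store wins: a failed or completed entry is never "in flight".
    const active = entry.status === "pending" || entry.status === "downloading";
    return { active, progress: active ? entry.progress : 0 };
  }
  // No store entry (e.g. the URL-import path): fall back to the parallel progress map.
  const fallback = fallbackProgressById[modelId];
  return fallback === undefined ? { active: false, progress: 0 } : { active: true, progress: fallback };
}
```

A stale value left in the fallback map no longer matters once the store holds a failed entry for the same model, which is the stuck-progress-bar case described above.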
…s, kokoro playback) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…her + Desktop links
Downloads
- iOS DownloadManager uses a foreground .default URLSession (was background, which iOS throttles) + HTTP/3-off request + no URLCache: large model downloads now run at full speed, matching Android.
Remote servers / discovery
- Discover the Off Grid AI Gateway by probing port 7878 (/v1/models) across the subnet, alongside Ollama/LM Studio; align the LM Studio probe to /v1/models.
- Categorize remote models by the gateway's `kind` so image/TTS/STT models no longer show up as text models in the chat picker.
- Remote Server form placeholders + copy point at Off Grid AI Desktop (:7878).
Bug fixes
- STT (transcription) downloads can be retried from the Download Manager (was unhandled on iOS; re-invokes whisperService and clears the stale row).
- Chat model switcher: defer opening the picker until the manager sheet has fully closed, so the row tap no longer just dismisses the sheet (iOS race).
Copy
- Promote Off Grid AI Desktop (free, Mac, not Pro) everywhere Ollama/LM Studio appear and on every Pro surface, linking to /releases/latest.
Docs/marketing
- CLAUDE.md: test every approved behavior change in the same pass.
- 5 dev.to mobile articles (marketing/devto).
Excludes local build artifacts (xcodeproj/Podfile.lock/ggml .so).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
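For the discovery item, the set of URLs to probe might be built along these lines. The port (7878) and path (/v1/models) come from the commit message; the /24 subnet assumption, skipping the device's own address, and the name `probeTargets` are illustrative guesses, and the actual HTTP probing is omitted.

```typescript
const GATEWAY_PORT = 7878;
const MODELS_PATH = "/v1/models";

// Build one probe URL per host on the device's /24 subnet (assumed), skipping the device itself.
function probeTargets(localIp: string, port: number = GATEWAY_PORT): string[] {
  const octets = localIp.split(".").map(Number);
  if (octets.length !== 4 || octets.some((o) => !Number.isInteger(o) || o < 0 || o > 255)) {
    throw new Error(`not an IPv4 address: ${localIp}`);
  }
  const [a, b, c, self] = octets;
  const targets: string[] = [];
  for (let host = 1; host <= 254; host++) {
    if (host === self) continue;
    targets.push(`http://${a}.${b}.${c}.${host}:${port}${MODELS_PATH}`);
  }
  return targets;
}
```

A caller would then request each URL with a short timeout and treat a valid models response as a discovered server.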
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gate) showToast: native ToastAndroid on Android, brief Alert on iOS (no native toast). Used to tell the user a voice note can't play while the response is streaming. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
estimateImageModelRam used a flat fileSize x2.5 for every platform, but iOS Core ML pipelines load with reduceMemory=true (submodels load/unload sequentially), so peak RAM is roughly the largest submodel, not 2.5x the summed on-disk size. SD 1.5 (~1.5GB file) estimated at ~3.7GB and the residency gate refused it on a 6GB iPhone 15 with 4.9GB free. iOS now uses x2.0 (still under the budget cap); Android (ONNX/QNN reserves accelerator memory up front) keeps x2.5. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Neural Engine is degraded on iOS 26 for these palettized .mlmodelc: the load either fails instantly (iPhone 15 Pro: 'Failed to load model') or loads but stalls at step 0 (UNet never finishes compiling for the ANE; iPhone 15). On iOS 26+ the pipeline now loads CPU+GPU-first (GPU-accelerated, so palettized weights still decode correctly - no gray images); older iOS keeps ANE-first with a CPU+GPU fallback on load failure. Also logs the real native load error instead of the generic 'Failed to load model'. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The STT-retry change pushed useDownloadManager over two eslint limits (handleRetryDownload complexity 22 > 20; file 533 > 500 lines), which blocked the push gate. Move the per-platform retry helpers (text/image/STT, mmproj sidecar, finalization-resume) into retryHandlers.ts and expose a single runRetryDownload dispatcher; handleRetryDownload is now a thin wrapper. No behavior change — 72 related tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 Walkthrough
This PR updates memory handling, discovery, retry flows, iOS/Android loading behavior, and multiple Off Grid AI Desktop links. It also adds new stability guidance and five markdown articles.
Changes
Stability, memory fixes, and UI updates
Marketing Articles
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
The iOS 2.0x multiplier under-budgeted the GPU-primary load path: the GPU keeps diffusion buffers in system RAM (more than the ANE), so the real peak is ~2.5x the file size. 2.0x let the residency gate allow a load that then OOM/jetsam- crashed on a 6GB iPhone 15. Restore the flat 2.5x so the gate correctly blocks the load on devices that can't fit it (graceful 'not enough memory') while 8GB+ devices still fit. Reverts 96b82e5. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Decide GPU vs Neural Engine for iOS image gen from device RAM (iOS 26+: GPU when >=7GB so the 8GB iPhone 15 Pro avoids the failing ANE load; ANE otherwise so the 6GB iPhone 15 uses the lower-system-RAM path instead of OOMing on GPU). The residency estimate follows the same decision (GPU 2.5x, ANE 1.8x) so the gate admits an ANE load that fits rather than refusing it. Native wiring follows. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
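The decision and the matching estimate can be sketched as below. The thresholds (iOS 26+, >=7GB) and multipliers (GPU 2.5x, ANE 1.8x) are from the commit message; the function names, the `"gpu" | "ane"` labels, and the inputs are hypothetical, and the native wiring is not shown.

```typescript
type ImageComputeUnits = "gpu" | "ane";

// iOS 26+ with >=7GB RAM goes GPU-first; everything else stays on the Neural Engine path.
function pickImageComputeUnits(iosMajorVersion: number, deviceRamGb: number): ImageComputeUnits {
  return iosMajorVersion >= 26 && deviceRamGb >= 7 ? "gpu" : "ane";
}

const RAM_MULTIPLIER: Record<ImageComputeUnits, number> = { gpu: 2.5, ane: 1.8 };

// The residency estimate follows the same decision so the gate and the loader agree.
function estimateIosImageRamBytes(fileBytes: number, units: ImageComputeUnits): number {
  return Math.round(fileBytes * RAM_MULTIPLIER[units]);
}
```

For the ~1.5GB SD 1.5 file mentioned earlier, this gives roughly 2.7GB on the ANE path versus about 3.75GB on the GPU path, which is why the 6GB device is steered to the ANE.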
Actionable comments posted: 10
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🟡 Minor comments (16)
src/services/modelResidency/policy.ts-80-83 (1)
80-83: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Align the sidecar comment with the eviction algorithm.
Line 80 says these sidecars are "never evicted for capacity", but Lines 107-118 still select SIDECAR_TYPES as last-resort eviction candidates. Either update the comment to reflect "evicted last" behavior or filter sidecars out of the candidate set if they must be pinned.
Proposed comment fix
- // Speech (whisper), TTS, and the RAG/MCP embedding model are small always-resident
- // sidecars: never evicted for capacity, and they never trigger eviction of the
- // active generation model. Only text and image are heavy enough to swap.
+ // Speech (whisper), TTS, and the RAG/MCP embedding model are small sidecars:
+ // loading them never evicts the active generation model. When a generation
+ // model needs memory, they are retained until non-sidecars are exhausted and
+ // may then be evicted as a last resort.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/modelResidency/policy.ts` around lines 80 - 83, The comment for SIDECAR_TYPES in policy.ts does not match the eviction behavior in the residency policy. Update the comment near the SIDECAR_TYPES definition to say these models are evicted last instead of “never evicted for capacity,” or, if they truly must be pinned, adjust the eviction logic in the residency selection code that uses SIDECAR_TYPES so they are excluded from candidate eviction.
src/services/rag/embedding.ts-22-34 (1)
22-34: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Clear the timeout when native init fails before the deadline.
If initLlama rejects before Line 26 fires, the promise.then(...) branch rejects and the timer is never cleared. Track whether the timeout actually fired, attach orphan cleanup only then, and clear the timer in finally.
Proposed timeout cleanup fix
  function withTimeout<T>(promise: Promise<T>, opts: { ms: number; message: string; onOrphan: (v: T) => void }): Promise<T> {
    const { ms, message, onOrphan } = opts;
-   let timer: ReturnType<typeof setTimeout>;
+   let timedOut = false;
+   let timer: ReturnType<typeof setTimeout> | null = null;
    const timeout = new Promise<never>((_, reject) => {
-     timer = setTimeout(() => reject(new Error(message)), ms);
+     timer = setTimeout(() => {
+       timedOut = true;
+       reject(new Error(message));
+     }, ms);
    });
    return Promise.race([
-     promise.then(v => { clearTimeout(timer); return v; }),
+     promise,
      timeout,
    ]).catch(err => {
-     promise.then(onOrphan).catch(() => { /* underlying load failed too — nothing to clean up */ });
+     if (timedOut) {
+       promise.then(onOrphan).catch(() => { /* underlying load failed too — nothing to clean up */ });
+     }
      throw err;
+   }).finally(() => {
+     if (timer) clearTimeout(timer);
    });
  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/rag/embedding.ts` around lines 22 - 34, The withTimeout helper leaves its timer running when the wrapped promise rejects before the deadline, so update the cleanup logic in withTimeout to always clear the timeout in a finally path and only run the orphan-handling promise.then(onOrphan) when the timeout has actually fired. Use the existing withTimeout, timeout, and onOrphan flow to track whether the deadline was reached, and make sure initLlama rejection does not leave a dangling timer.
__tests__/unit/services/rag/embedding.test.ts-89-101 (1)
89-101: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Restore timers in finally and cover orphan release.
If an assertion fails before Line 100, later tests inherit fake timers. Wrap this test in try/finally; while touching it, make initLlama resolve after the timeout and assert mockRelease runs so the orphan cleanup behavior is covered. Based on learnings, test every approved behavior change in the same pass, including bug fixes, new branches/conditions, and contract changes that other code or tests depend on.
Proposed test hardening
  it('rejects and does not register if the native load times out (F5)', async () => {
    jest.useFakeTimers();
-   // initLlama never resolves → the timeout must fire and release the lock.
-   mockInitLlama.mockReturnValue(new Promise(() => {}) as any);
-   const loadPromise = embeddingService.load().catch((e: Error) => e);
-   await jest.advanceTimersByTimeAsync(31000);
-   const result = await loadPromise;
-   expect(result).toBeInstanceOf(Error);
-   expect((result as Error).message).toMatch('timed out');
-   expect(embeddingService.isLoaded()).toBe(false);
-   expect(modelResidencyManager.isResident('embedding')).toBe(false);
-   jest.useRealTimers();
+   try {
+     let resolveNative!: (ctx: unknown) => void;
+     mockInitLlama.mockReturnValue(new Promise(resolve => { resolveNative = resolve; }) as any);
+     const loadPromise = embeddingService.load().catch((e: Error) => e);
+
+     await jest.advanceTimersByTimeAsync(31000);
+     const result = await loadPromise;
+     expect(result).toBeInstanceOf(Error);
+     expect((result as Error).message).toMatch('timed out');
+     expect(embeddingService.isLoaded()).toBe(false);
+     expect(modelResidencyManager.isResident('embedding')).toBe(false);
+
+     resolveNative({ release: mockRelease });
+     await Promise.resolve();
+     expect(mockRelease).toHaveBeenCalled();
+   } finally {
+     jest.useRealTimers();
+   }
  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@__tests__/unit/services/rag/embedding.test.ts` around lines 89 - 101, The embedding timeout test leaves fake timers enabled if an assertion fails, so wrap the body of the `embeddingService.load()` scenario in a try/finally and always restore real timers in `finally`. While updating `__tests__/unit/services/rag/embedding.test.ts`, also add coverage for the orphan cleanup path by having `mockInitLlama` resolve after the timeout and asserting `mockRelease` is called, alongside the existing `load`, `isLoaded`, and `modelResidencyManager` expectations.
Source: Learnings
src/services/toolEmbeddingRouter.ts-45-48 (1)
45-48: 🚀 Performance & Scalability | 🟡 Minor | ⚡ Quick win
Serialize first-time cache hydration.
Setting hydrated = true before AsyncStorage.getItem completes lets a concurrent first routing call skip hydration and proceed with an empty in-memory cache, reintroducing the cold embedding burst despite persisted data. Store and await a shared hydration promise.
Proposed hydration guard
  const toolEmbeddingCache = new Map<string, CacheEntry>();
  let hydrated = false;
+ let hydratePromise: Promise<void> | null = null;
  let saveTimer: ReturnType<typeof setTimeout> | null = null;
@@
  async function hydrateCache(): Promise<void> {
    if (hydrated) return;
-   hydrated = true;
-   try {
-     const raw = await AsyncStorage.getItem(CACHE_STORAGE_KEY);
-     if (!raw) return;
-     const parsed = JSON.parse(raw) as Record<string, CacheEntry>;
-     for (const [name, entry] of Object.entries(parsed)) {
-       if (entry && typeof entry.h === 'string' && Array.isArray(entry.v)) toolEmbeddingCache.set(name, entry);
+   if (hydratePromise) return hydratePromise;
+   hydratePromise = (async () => {
+     try {
+       const raw = await AsyncStorage.getItem(CACHE_STORAGE_KEY);
+       if (!raw) return;
+       const parsed = JSON.parse(raw) as Record<string, CacheEntry>;
+       for (const [name, entry] of Object.entries(parsed)) {
+         if (entry && typeof entry.h === 'string' && Array.isArray(entry.v)) toolEmbeddingCache.set(name, entry);
+       }
+       logger.log(`[ToolRouter] hydrated ${toolEmbeddingCache.size} cached tool embeddings`);
+     } catch (e) {
+       logger.warn(`[ToolRouter] failed to hydrate embedding cache: ${String(e)}`);
+     } finally {
+       hydrated = true;
+       hydratePromise = null;
      }
-     logger.log(`[ToolRouter] hydrated ${toolEmbeddingCache.size} cached tool embeddings`);
-   } catch (e) {
-     logger.warn(`[ToolRouter] failed to hydrate embedding cache: ${String(e)}`);
-   }
+   })();
+   return hydratePromise;
  }
@@
  export function _resetToolEmbeddingCache(): void {
    toolEmbeddingCache.clear();
    hydrated = false;
+   hydratePromise = null;
    if (saveTimer) { clearTimeout(saveTimer); saveTimer = null; }
  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/toolEmbeddingRouter.ts` around lines 45 - 48, The first-time cache hydration in hydrateCache is being marked complete before AsyncStorage.getItem finishes, which can let concurrent callers skip loading persisted data and use an empty cache. Change toolEmbeddingRouter so hydrateCache stores a shared hydration promise and awaits it for all callers, setting the hydrated state only after the async load and cache population completes.
__tests__/unit/services/hardware.branches.test.ts-76-97 (1)
76-97: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Restore Platform.OS after this suite.
These tests mutate a process-global and leave the last case as iOS, so any later tests in this file can inherit the wrong platform. Save the original value and restore it in afterEach to keep the suite order-independent.
Suggested fix
  describe('estimateImageModelRam', () => {
+   const originalOS = Platform.OS;
+
+   afterEach(() => {
+     Platform.OS = originalOS;
+   });
+
    it('budgets 2.5x the model total size on Android (ONNX/QNN reserves NPU memory)', () => {
      Platform.OS = 'android';
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@__tests__/unit/services/hardware.branches.test.ts` around lines 76 - 97, The hardware.branches test suite mutates the global Platform.OS and leaves it set to iOS, which can leak into later tests. In the hardwareService estimateImageModelRam branch tests, save the original Platform.OS before changing it and restore it in an afterEach so each case stays isolated and order-independent.
src/screens/ModelSettingsScreen/TextGenerationSection.tsx-72-72 (1)
72-72: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Make the warning copy concrete and guideline-compliant.
Both warnings still use vague "significant RAM / some devices" wording, and Line 72 also introduces an em dash. This screen already knows the threshold, so the warning should state that threshold directly instead of using generic copy.
Suggested fix
- warning={maxTokens > contextWarnThreshold ? 'High context uses significant RAM — may slow or crash on some devices' : null}
+ warning={maxTokens > contextWarnThreshold ? `Values above ${formatContext(contextWarnThreshold)} use more RAM and can slow generation or fail to load on lower-memory devices` : null}
  ...
- warning={contextLength > 8192 ? 'High context uses significant RAM and may crash on some devices' : null}
+ warning={contextLength > 8192 ? 'Context lengths above 8K use more RAM and can fail to load on lower-memory devices' : null}
As per coding guidelines, "Content must use proof-first language, stating measurable specifics instead of vague claims" and "Do not use em dashes in copy."
Also applies to: 127-127
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/screens/ModelSettingsScreen/TextGenerationSection.tsx` at line 72, The warning copy in TextGenerationSection should be made concrete and guideline-compliant instead of using vague RAM/device language and an em dash. Update the warning text used for the maxTokens/contextWarnThreshold check, and the matching warning at the other referenced location, to state the actual threshold or measurable condition directly using proof-first language. Keep the warning tied to the existing maxTokens and contextWarnThreshold logic, and remove any em dash from the copy.
Source: Coding guidelines
marketing/devto/connect-phone-ollama-lmstudio.md-9-9 (1)
9-9: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Qualify the 14B vs 4B claim. The current wording reads like a fixed cutoff, but the app's own guidance ties results to device memory and quantization. Add the benchmark setup or soften the numbers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@marketing/devto/connect-phone-ollama-lmstudio.md` at line 9, Soften the 14B vs 4B comparison in the opening copy and avoid presenting it as a hard cutoff. In the markdown content, update the wording in the intro paragraph to either qualify the claim with the benchmark/setup used (device memory, GPU, and quantization) or rephrase it as a general example of expected performance. Keep the language aligned with the guidance in the article so the statement matches the behavior described by the phone-to-desktop workflow.
Source: Coding guidelines
marketing/devto/vision-ai-phone-camera-offline.md-31-32 (1)
31-32: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Soften the seven-second claim. The line appears twice; tie it to a specific device, model, prompt length, and OS version, or drop the number.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@marketing/devto/vision-ai-phone-camera-offline.md` around lines 31 - 32, The performance claim in the article is too absolute and is repeated without enough context; update the wording in the recommendation section to avoid stating “about seven seconds” as a general result. Use a specific benchmark tied to a concrete device, model, prompt length, and OS version if you keep the timing, or remove the number and replace it with a softer, qualified statement in the relevant copy.
Source: Coding guidelines
marketing/devto/connect-phone-ollama-lmstudio.md-101-103 (1)
101-103: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Remove the Anthropic-compatible claim.
marketing/devto/connect-phone-ollama-lmstudio.md:102 only ships an OpenAI-compatible remote provider path; the Anthropic support here is just shared parsing/types, so this FAQ overstates the feature set.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@marketing/devto/connect-phone-ollama-lmstudio.md` around lines 101 - 103, The FAQ in connect-phone-ollama-lmstudio.md overstates provider support by claiming Anthropic-compatible servers work. Update the “Which servers work?” answer to mention only the OpenAI-compatible API path currently supported here (for example Ollama, LM Studio, LocalAI, and similar), and remove the Anthropic-compatible statement so the marketing copy matches the actual remote provider behavior.
Source: Coding guidelines
docs/STABILITY_FIX_PLAN.md-1-163 (1)
1-163: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Replace em dashes throughout this doc.
This file uses em dashes in the title, bullets, and section text, but the docs style guide forbids them. Please convert them to plain hyphens consistently. As per coding guidelines, "Do not use em dashes in copy; use a hyphen instead."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/STABILITY_FIX_PLAN.md` around lines 1 - 163, Replace all em dashes in STABILITY_FIX_PLAN.md with plain hyphens, including the title, bullets, and section prose, and keep the wording otherwise unchanged. Review the document for any remaining "—" characters and update the surrounding text consistently so the style guide rule is followed throughout.
Source: Coding guidelines
docs/STABILITY_FIX_PLAN.md-163-164 (1)
163-164: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Remove the stray closing tags at the end of the doc.
</content>and</invoke>look accidental and will render as garbage in the markdown output.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/STABILITY_FIX_PLAN.md` around lines 163 - 164, The document ends with stray closing tags that should not be in the markdown output. Remove the accidental `</content>` and `</invoke>` text from the end of STABILITY_FIX_PLAN so the rendered doc is clean; this is just a cleanup in the markdown content, not a functional change.src/screens/SettingsScreen.tsx-188-188 (1)
188-188: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Avoid "and more" in this nav copy.
"and more" is vague. Please replace it with the concrete supported category, e.g. "another OpenAI-compatible server", so the row describes an actual capability instead of a catch-all. As per coding guidelines, "Content must use proof-first language, stating measurable specifics instead of vague claims."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/screens/SettingsScreen.tsx` at line 188, Update the Remote Servers nav copy in SettingsScreen so it removes the vague “and more” phrasing and replaces it with a concrete supported capability, such as “another OpenAI-compatible server.” Keep the change localized to the settings row object with title Remote Servers so the description uses proof-first, specific language instead of a catch-all claim.
Source: Coding guidelines
docs/STABILITY_FIX_PLAN.md-6-6 (1)
6-6: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Fix the heading level jump at Line 6.
"### Progress" skips "##", which breaks markdown structure and matches the linter warning.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/STABILITY_FIX_PLAN.md` at line 6, The markdown heading in the fix plan has a level jump because Progress is written as a third-level heading without an intermediate second-level section. Update the heading for Progress in the document to use the correct hierarchy so it follows the surrounding section structure and satisfies the markdown linter.
Source: Linters/SAST tools
src/screens/RemoteServersScreen.tsx-141-143 (1)
141-143: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Replace the vague server wording.
"other LLM servers" is too broad for the copy rule here. Please name the supported class directly, e.g. "another OpenAI-compatible server on your network", so the UI states the contract instead of a catch-all phrase. As per coding guidelines, "Content must use proof-first language, stating measurable specifics instead of vague claims."
Also applies to: 247-249
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/screens/RemoteServersScreen.tsx` around lines 141 - 143, The empty-state copy in RemoteServersScreen should avoid the vague phrase “other LLM servers” and instead state the supported contract directly. Update the text rendered in the empty state to name the class of supported endpoint explicitly, using the same wording consistently in the other affected copy block, so the messaging is proof-first and specific (for example, “another OpenAI-compatible server on your network”).
Source: Coding guidelines
src/screens/ModelsScreen/VoiceModelsUpsell.tsx-35-43 (1)
35-43: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Surface desktop-link failures instead of swallowing them.
If openURL rejects here, the CTA becomes a dead tap. This PR already adds showToast() for short cross-platform hints, so this should report the failure instead of using an empty catch.
Suggested fix
  import { View, Text, TouchableOpacity, Linking } from 'react-native';
  import Icon from 'react-native-vector-icons/Feather';
  import { Button } from '../../components';
  import { useTheme, useThemedStyles } from '../../theme';
  import type { ThemeColors } from '../../theme';
  import { TYPOGRAPHY, SPACING, OFF_GRID_DESKTOP_URL } from '../../constants';
+ import { showToast } from '../../utils/toast';
@@
  <TouchableOpacity
    style={styles.desktopLink}
-   onPress={() => Linking.openURL(OFF_GRID_DESKTOP_URL).catch(() => {})}
+   onPress={() => Linking.openURL(OFF_GRID_DESKTOP_URL).catch(() => {
+     showToast('Could not open Off Grid AI Desktop link.');
+   })}
    accessibilityRole="link"
    accessibilityLabel="Get Off Grid AI Desktop"
  >
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/screens/ModelsScreen/VoiceModelsUpsell.tsx` around lines 35 - 43, The desktop CTA in VoiceModelsUpsell is swallowing Linking.openURL failures, which leaves the tap with no feedback. Update the TouchableOpacity onPress handler in VoiceModelsUpsell to catch the rejection and surface a short user-facing error via showToast instead of an empty catch, using the existing OFF_GRID_DESKTOP_URL and the current desktop link action for context.
src/screens/ModelDownloadScreen.tsx-250-250 (1)
250-250: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Replace the em dash in the alert title.
Line 250 adds user-facing copy with an em dash, which violates the copy rules for this repo.
As per coding guidelines, "Do not use em dashes in copy; use a hyphen instead."
Suggested fix
```diff
- setAlertState(showAlert('Connected — No Models Found', `${server.name} is reachable but has no models loaded. Start a model in Off Grid AI Desktop, Ollama, or LM Studio, then reconnect.`));
+ setAlertState(showAlert('Connected - No Models Found', `${server.name} is reachable but has no models loaded. Start a model in Off Grid AI Desktop, Ollama, or LM Studio, then reconnect.`));
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/screens/ModelDownloadScreen.tsx` at line 250, The user-facing alert title in ModelDownloadScreen’s showAlert call uses an em dash, which violates the repo copy rule; update the connected/no-models message to use a hyphen instead of the em dash while keeping the rest of the text unchanged. Locate the string in the setAlertState(showAlert(...)) call within ModelDownloadScreen and replace the title copy accordingly.
Source: Coding guidelines
🧹 Nitpick comments (6)
__tests__/integration/stores/remoteServerDiscovery.test.ts (1)
571-585: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Cover the no-`kind` fallback with a non-generative model.
This only proves that a generative ID survives when `kind` is missing. Add an embedding or reranker ID to the same payload and assert it is still excluded, otherwise a regression in `isTextModel()` can leak non-chat models into `discoveredModels`. Based on learnings, test every approved behavior change in the same pass, including new branches/conditions and contract changes that other code or tests depend on.
Suggested test extension
```diff
 mockFetch.mockImplementation((url: string) => {
   if (url.endsWith('/v1/models')) {
     return Promise.resolve(
-      jsonResponse({ object: 'list', data: [{ id: 'llama-3.2' }] }),
+      jsonResponse({
+        object: 'list',
+        data: [{ id: 'llama-3.2' }, { id: 'bge-base-en-v1.5' }],
+      }),
     );
   }
   return Promise.resolve(jsonResponse({}, false, 404));
 });

 const models = await useRemoteServerStore.getState().discoverModels('srv-plain');
 expect(models.map((m) => m.id)).toEqual(['llama-3.2']);
+expect(models.some((m) => m.id === 'bge-base-en-v1.5')).toBe(false);
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@__tests__/integration/stores/remoteServerDiscovery.test.ts` around lines 571 - 585, The no-kind fallback test in remoteServerDiscovery should also verify non-generative models are excluded, not just that a chat model survives. Update the discovery payload in the existing `discoverModels('srv-plain')` test to include an embedding or reranker entry alongside `llama-3.2`, then assert only the generative model is returned. Keep the check focused on the `useRemoteServerStore` discovery path and the `isTextModel()` filtering behavior so regressions in `discoveredModels` are caught.
Source: Learnings
__tests__/unit/screens/DownloadManagerScreen/useDownloadManager.branches.test.ts (1)
192-220: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Add the STT early-failure regression case.
These tests cover the happy path, but not the branch where `whisperService.downloadModel()` rejects before a new row is registered. That is the risky edge in the new retry flow, and a regression test here would lock it down. Based on learnings, "Test every approved behavior change in the same pass, including bug fixes, new branches/conditions, and contract or copy changes that other code or tests depend on."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@__tests__/unit/screens/DownloadManagerScreen/useDownloadManager.branches.test.ts` around lines 192 - 220, Add a regression test in useDownloadManager.branches.test.ts for the STT retry path where useDownloadManager.handleRetryDownload calls whisperService.downloadModel and that promise rejects before a new row is created. Reuse the existing handleRetryDownload, mockWhisperService.downloadModel, and mockBackgroundDownloadService assertions to verify the failed retry is surfaced/handled as expected, while ensuring the stale row cleanup and cancel behavior are still covered by the stt retry branch.
Source: Learnings
__tests__/rntl/components/TranscriptionModelsTab.test.tsx (1)
165-170: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win
Assert the card itself cannot re-trigger downloads.
This test says the card is not tappable, but only checks that the nested download button is absent. Add a press on `transcription-model-card-0` and assert `downloadModel` was not called.
Proposed test strengthening
```diff
 it('treats an active STT download-store entry as downloading (no re-download affordance)', () => {
   seedSttDownload('tiny.en', 'running', 0.6);
-  const { queryByTestId } = render(<TranscriptionModelsTab />);
+  const { getByTestId, queryByTestId } = render(<TranscriptionModelsTab />);
   // Downloading → no download button and the card is not tappable to re-download.
   expect(queryByTestId('transcription-model-card-0-download')).toBeNull();
+  fireEvent.press(getByTestId('transcription-model-card-0'));
+  expect(mockWhisperActions.downloadModel).not.toHaveBeenCalled();
 });
```
Based on learnings, test every approved behavior change in the same pass.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@__tests__/rntl/components/TranscriptionModelsTab.test.tsx` around lines 165 - 170, The current TranscriptionModelsTab test only verifies that the nested download affordance is hidden, but it does not prove the card itself cannot start a re-download. Strengthen the existing test in TranscriptionModelsTab.test.tsx by triggering a press on the transcription-model-card-0 element and asserting downloadModel is not called, using the same seeded running STT state so the behavior is covered in one pass.
Source: Learnings
__tests__/rntl/screens/RemoteServersScreen.test.tsx (1)
143-145: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Assert the new desktop actions actually open the URL.
These checks only prove the label exists. If the `onPress` handler is removed or wired to the wrong URL, the tests still pass. Please invoke the captured alert action and the visible link, then assert `Linking.openURL(OFF_GRID_DESKTOP_URL)` was called. Based on learnings, "Test every approved behavior change in the same pass, including bug fixes, new branches/conditions, and contract or copy changes that other code or tests depend on."
Also applies to: 527-533
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@__tests__/rntl/screens/RemoteServersScreen.test.tsx` around lines 143 - 145, The empty-state desktop action test only checks that the “Get Off Grid AI Desktop” label renders, so it won’t catch a broken or miswired press handler. Update the affected tests around RemoteServersScreen to trigger the captured alert action and the visible link, then assert that Linking.openURL is called with OFF_GRID_DESKTOP_URL. Use the existing RemoteServersScreen and OFF_GRID_DESKTOP_URL symbols so the test verifies the actual behavior, not just the copy.
Source: Learnings
src/screens/ProDetailScreen/index.tsx (1)
54-54: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Extract the desktop CTA into a shared component/helper.
This same `OFF_GRID_DESKTOP_URL` + `Linking.openURL(...).catch(() => {})` + "Get Off Grid AI Desktop" pattern already exists in `src/screens/RemoteServersScreen.tsx:144-152`, and this PR is adding it to several more surfaces. Keeping each copy inline will make text, accessibility props, and failure handling drift. Based on learnings, "Before writing any new component, style, hook, or service, search for an existing one and reuse it instead of building a parallel version" and "two screens that show the same kind of thing must use the same component."
Also applies to: 165-184, 305-330
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/screens/ProDetailScreen/index.tsx` at line 54, The desktop CTA logic is duplicated across ProDetailScreen and RemoteServersScreen, so extract the shared OFF_GRID_DESKTOP_URL / Linking.openURL(...).catch(() => {}) / “Get Off Grid AI Desktop” behavior into a reusable helper or component and have ProDetailScreen use it instead of inline copies. Centralize the tap handler and CTA text/props in a shared symbol (for example a dedicated desktop CTA component or helper used by ProDetailScreen and RemoteServersScreen) so future surfaces can reuse the same implementation consistently.
Source: Learnings
__tests__/rntl/components/RemoteServerModal.test.tsx (1)
114-114: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Use stable selectors for these inputs.
Lines 114, 206, 244, 313, and 410 couple the tests to placeholder copy, so every marketing/example-text tweak forces unrelated test churn. Prefer a stable `testID` or accessibility label for the server name and endpoint fields, and keep placeholder assertions separate if you want copy coverage.
Also applies to: 206-213, 243-245, 312-314, 408-410
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@__tests__/rntl/components/RemoteServerModal.test.tsx` at line 114, The RemoteServerModal tests are coupled to placeholder text via the server name and endpoint field queries, so they will churn when copy changes. Update the relevant assertions in RemoteServerModal.test.tsx to use stable selectors from the component, such as testID or accessibility labels, by locating the inputs used around VALID_ENDPOINT and the related render/getBy* checks in the affected test cases. Keep any placeholder-specific assertions separate if you still want coverage for the copy itself.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@android/app/src/main/java/ai/offgridmobile/litert/LiteRTModule.kt`:
- Around line 595-597: The vision backend selection in visionBackendFor() still
returns Backend.GPU() even when the existing skipGpu guard should block GPU
usage, which can reintroduce the Pixel 10 crash path. Update
visionBackendFor(mainBackend) in LiteRTModule to consult the same device gate
used by buildBackendChain() and return CPU vision whenever GPU is disallowed,
including for the NPU-backed path.
- Around line 80-84: The clamp logic in LiteRTModule.clampMaxTokens still forces
MIN_TOKEN_FLOOR when kvBudgetMb is non-positive, which can overcommit memory.
Update the early return so it does not default to 1024 tokens; instead, either
reject the load path or cap to the highest affordable token count based on
availMb, modelMb, TOKEN_BUDGET_HEADROOM_MB, and KV_MB_PER_TOKEN. Keep the fix
localized to clampMaxTokens and preserve the existing affordability calculation
used for the requested token clamp.
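The clamp described above could look roughly like the sketch below. It is a TypeScript illustration of the arithmetic only, not the Kotlin in `LiteRTModule.kt`: the constant values are invented for the example, and returning `null` to signal "reject the load" is an assumption about how the caller would handle an empty budget.

```typescript
// Hypothetical values; the real constants live in LiteRTModule.kt and may differ.
const TOKEN_BUDGET_HEADROOM_MB = 512;
const KV_MB_PER_TOKEN = 0.125;

/**
 * Returns the requested token count capped at what the KV budget can afford,
 * or null when nothing is affordable (assumed signal for rejecting the load).
 */
function clampMaxTokens(requested: number, availMb: number, modelMb: number): number | null {
  const kvBudgetMb = availMb - modelMb - TOKEN_BUDGET_HEADROOM_MB;
  if (kvBudgetMb <= 0) {
    // No memory left for the KV cache: do not fall back to a fixed token floor.
    return null;
  }
  const affordable = Math.floor(kvBudgetMb / KV_MB_PER_TOKEN);
  return Math.min(requested, affordable);
}
```

The point of the sketch is that a non-positive budget never produces a positive token count, so the caller has to handle the failure explicitly.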
In `@ios/CoreMLDiffusionModule.swift`:
- Around line 192-199: The `cpuOnly` flag is being ignored in the
`CoreMLDiffusionModule` pipeline setup, so the current `primaryUnits` selection
can still use GPU-backed compute. Update the logic around `preferNeuralEngine`
and `primaryUnits` so `cpuOnly` maps directly to `.cpuOnly`, and only
non-CPU-only loads choose between `.cpuAndNeuralEngine` and `.cpuAndGPU` in the
same `StableDiffusionPipelineProtocol` setup path.
In `@ios/DownloadManagerModule.swift`:
- Around line 285-302: The persisted download state in DownloadManagerModule’s
restoreTasksFromSession/getActiveDownloads flow can outlive the foreground
URLSessionDownloadTask after relaunch, so missing active records are never
reconciled. Update restoreTasksFromSession to detect persisted running/pending
downloads that are not returned by the session, and mark those records inactive
(failed/paused) or remove them before JS hydration so stale work is not
rehydrated as still running.
In `@src/components/ModelSelectorModal/TextTab.tsx`:
- Around line 106-119: The active-row highlighting in TextTab is still using
selectedModelPath even after a remote model becomes current, so a stale local
model can appear selected alongside the remote one. Update the isSelected logic
in the model row rendering to only apply the deferred local highlight when
currentRemoteModelId is null, keeping the load-on-tap behavior intact while
ensuring only the correct model is marked active.
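A minimal sketch of the selection rule, assuming the row compares its own path with `selectedModelPath` and that `currentRemoteModelId` is `null` when no remote model is active (the helper name is hypothetical):

```typescript
// Hypothetical helper: a local row is highlighted only when no remote model is current.
function isLocalRowSelected(
  modelPath: string,
  selectedModelPath: string | null,
  currentRemoteModelId: string | null,
): boolean {
  return currentRemoteModelId === null && modelPath === selectedModelPath;
}
```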
In `@src/screens/DownloadManagerScreen/retryHandlers.ts`:
- Around line 185-198: The STT retry flow in retryWhisperDownload removes the
existing store row before whisperService.downloadModel has successfully
registered the replacement, so a startup failure can leave the Download Manager
with no row at all. Update retryWhisperDownload to keep the failed item in
useDownloadStore until the new download is registered, or add catch-path
recovery that restores a failed row if downloadModel rejects; use the existing
retryWhisperDownload, useDownloadStore.getState().remove, and
whisperService.downloadModel hooks to keep the row visible during retry.
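One way to keep the row visible is to remove the failed entry only after the replacement exists. The sketch below uses a hypothetical in-memory store and assumes `startDownload` resolves once the new row is registered; the real `useDownloadStore` and `whisperService.downloadModel` shapes are not shown in this review and may differ.

```typescript
interface DownloadRow {
  id: string;
  status: 'failed' | 'running';
}

// Hypothetical minimal store shape, for illustration only.
interface RowStore {
  rows: Map<string, DownloadRow>;
}

async function retryKeepingRow(
  store: RowStore,
  failedId: string,
  startDownload: () => Promise<DownloadRow>,
): Promise<void> {
  // If startDownload rejects, nothing below runs and the failed row stays in the store.
  const replacement = await startDownload();
  store.rows.set(replacement.id, replacement);
  if (replacement.id !== failedId) {
    store.rows.delete(failedId);
  }
}
```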
In `@src/screens/RemoteServersScreen.styles.ts`:
- Around line 57-67: The `desktopLink` and `desktopLinkText` styles use
hardcoded spacing and font sizing instead of design-system tokens. Replace
`gap`, `marginTop`, and `paddingVertical` in `desktopLink` with the appropriate
`SPACING` values, and replace the `fontSize` in `desktopLinkText` with the
matching `TYPOGRAPHY` token so the CTA stays consistent with the rest of the UI.
In `@src/services/llm.ts`:
- Around line 97-103: resolveSafeContext() is only testing a fixed fallback list
capped at 4096, so it can miss larger safe contexts and return too small a
value. Update the fallback generation in llm.ts to search downward from
requestedCtx in sensible steps (instead of hardcoding [4096, 3072, 2048, 1024])
so values like 12288 or 8192 are tried before smaller ones. Keep the existing
checkMemoryForModel and logger.warn flow, but ensure the first safe ctxLen
returned is the largest fitting context.
- Around line 71-73: The memory guard in llm loading is using the stored cache
setting only, so OpenCL paths can still be checked as quantized even when the
effective KV cache becomes f16. Update the guard setup around
checkMemoryForModel in the llm service to derive quantizedCache from the actual
load mode, and force it to false for OpenCL loads so resolveSafeContext() uses
the f16 estimate. Use the existing llm.ts flow and symbols like
settings.cacheType, checkMemoryForModel, and resolveSafeContext to locate the
fix.
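A downward search of the kind suggested for `resolveSafeContext()` might look like the sketch below. The `fits` predicate stands in for `checkMemoryForModel`, whose real signature is not shown here; the 1024 step, the 1024 floor, and the `null` return when nothing fits are assumptions.

```typescript
// Hypothetical predicate: true when a load at ctxLen fits in memory.
type FitsInMemory = (ctxLen: number) => boolean;

const MIN_CTX = 1024; // assumed floor (smallest value in the old fallback list)
const STEP = 1024; // assumed step size

function resolveSafeContext(requestedCtx: number, fits: FitsInMemory): number | null {
  if (fits(requestedCtx)) return requestedCtx;
  // Start at the largest multiple of STEP strictly below the request, then walk down.
  for (let ctx = Math.floor((requestedCtx - 1) / STEP) * STEP; ctx >= MIN_CTX; ctx -= STEP) {
    if (fits(ctx)) return ctx;
  }
  return null;
}
```

Because candidates are tried from largest to smallest, the first hit is the largest context in the step grid that fits, so a 16384 request on a device that can afford 12288 no longer drops straight to 4096.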
In `@src/stores/appStore.ts`:
- Around line 198-205: The rehydrate migration in appStore is too broad because
it uses >= against MCP_BOOST_CTX_CEILING and MCP_BOOST_MAX_OUTPUT_TOKENS, which
can clobber valid persisted user settings. Update the migration logic around the
contextLength/maxTokens and liteRTMaxTokens reset paths to only undo the old MCP
boost when the stored values exactly match the boosted constants, preserving any
legitimately higher user-configured values. Keep the change localized to the
rehydration block in the appStore migration.
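The exact-match migration can be sketched as follows. The boost constants use the 32768/8192 values given in the PR description; the settings shape, the function name, and the safe defaults are hypothetical, and `liteRTMaxTokens` would presumably follow the same pattern.

```typescript
const MCP_BOOST_CTX_CEILING = 32768;
const MCP_BOOST_MAX_OUTPUT_TOKENS = 8192;

interface PersistedSettings {
  contextLength: number;
  maxTokens: number;
}

// Hypothetical device-safe defaults, for illustration only.
const SAFE_DEFAULTS: PersistedSettings = { contextLength: 4096, maxTokens: 1024 };

function undoMcpBoost(s: PersistedSettings): PersistedSettings {
  return {
    // Strict equality: only values that exactly match the old boost are reset.
    contextLength:
      s.contextLength === MCP_BOOST_CTX_CEILING ? SAFE_DEFAULTS.contextLength : s.contextLength,
    maxTokens:
      s.maxTokens === MCP_BOOST_MAX_OUTPUT_TOKENS ? SAFE_DEFAULTS.maxTokens : s.maxTokens,
  };
}
```

With `===` in place of `>=`, a user who deliberately configured a context above 32768 keeps that setting across rehydration.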
---
Minor comments:
In `@__tests__/unit/services/hardware.branches.test.ts`:
- Around line 76-97: The hardware.branches test suite mutates the global
Platform.OS and leaves it set to iOS, which can leak into later tests. In the
hardwareService estimateImageModelRam branch tests, save the original
Platform.OS before changing it and restore it in an afterEach so each case stays
isolated and order-independent.
In `@__tests__/unit/services/rag/embedding.test.ts`:
- Around line 89-101: The embedding timeout test leaves fake timers enabled if
an assertion fails, so wrap the body of the `embeddingService.load()` scenario
in a try/finally and always restore real timers in `finally`. While updating
`__tests__/unit/services/rag/embedding.test.ts`, also add coverage for the
orphan cleanup path by having `mockInitLlama` resolve after the timeout and
asserting `mockRelease` is called, alongside the existing `load`, `isLoaded`,
and `modelResidencyManager` expectations.
In `@docs/STABILITY_FIX_PLAN.md`:
- Around line 1-163: Replace all em dashes in STABILITY_FIX_PLAN.md with plain
hyphens, including the title, bullets, and section prose, and keep the wording
otherwise unchanged. Review the document for any remaining "—" characters and
update the surrounding text consistently so the style guide rule is followed
throughout.
- Around line 163-164: The document ends with stray closing tags that should not
be in the markdown output. Remove the accidental `</content>` and `</invoke>`
text from the end of STABILITY_FIX_PLAN so the rendered doc is clean; this is
just a cleanup in the markdown content, not a functional change.
- Line 6: The markdown heading in the fix plan has a level jump because Progress
is written as a third-level heading without an intermediate second-level
section. Update the heading for Progress in the document to use the correct
hierarchy so it follows the surrounding section structure and satisfies the
markdown linter.
In `@marketing/devto/connect-phone-ollama-lmstudio.md`:
- Line 9: Soften the 14B vs 4B comparison in the opening copy and avoid
presenting it as a hard cutoff. In the markdown content, update the wording in
the intro paragraph to either qualify the claim with the benchmark/setup used
(device memory, GPU, and quantization) or rephrase it as a general example of
expected performance. Keep the language aligned with the guidance in the article
so the statement matches the behavior described by the phone-to-desktop
workflow.
- Around line 101-103: The FAQ in connect-phone-ollama-lmstudio.md overstates
provider support by claiming Anthropic-compatible servers work. Update the
“Which servers work?” answer to mention only the OpenAI-compatible API path
currently supported here (for example Ollama, LM Studio, LocalAI, and similar),
and remove the Anthropic-compatible statement so the marketing copy matches the
actual remote provider behavior.
In `@marketing/devto/vision-ai-phone-camera-offline.md`:
- Around line 31-32: The performance claim in the article is too absolute and is
repeated without enough context; update the wording in the recommendation
section to avoid stating “about seven seconds” as a general result. Use a
specific benchmark tied to a concrete device, model, prompt length, and OS
version if you keep the timing, or remove the number and replace it with a
softer, qualified statement in the relevant copy.
In `@src/screens/ModelDownloadScreen.tsx`:
- Line 250: The user-facing alert title in ModelDownloadScreen’s showAlert call
uses an em dash, which violates the repo copy rule; update the
connected/no-models message to use a hyphen instead of the em dash while keeping
the rest of the text unchanged. Locate the string in the
setAlertState(showAlert(...)) call within ModelDownloadScreen and replace the
title copy accordingly.
In `@src/screens/ModelSettingsScreen/TextGenerationSection.tsx`:
- Line 72: The warning copy in TextGenerationSection should be made concrete and
guideline-compliant instead of using vague RAM/device language and an em dash.
Update the warning text used for the maxTokens/contextWarnThreshold check, and
the matching warning at the other referenced location, to state the actual
threshold or measurable condition directly using proof-first language. Keep the
warning tied to the existing maxTokens and contextWarnThreshold logic, and
remove any em dash from the copy.
In `@src/screens/ModelsScreen/VoiceModelsUpsell.tsx`:
- Around line 35-43: The desktop CTA in VoiceModelsUpsell is swallowing
Linking.openURL failures, which leaves the tap with no feedback. Update the
TouchableOpacity onPress handler in VoiceModelsUpsell to catch the rejection and
surface a short user-facing error via showToast instead of an empty catch, using
the existing OFF_GRID_DESKTOP_URL and the current desktop link action for
context.
In `@src/screens/RemoteServersScreen.tsx`:
- Around line 141-143: The empty-state copy in RemoteServersScreen should avoid
the vague phrase “other LLM servers” and instead state the supported contract
directly. Update the text rendered in the empty state to name the class of
supported endpoint explicitly, using the same wording consistently in the other
affected copy block, so the messaging is proof-first and specific (for example,
“another OpenAI-compatible server on your network”).
In `@src/screens/SettingsScreen.tsx`:
- Line 188: Update the Remote Servers nav copy in SettingsScreen so it removes
the vague “and more” phrasing and replaces it with a concrete supported
capability, such as “another OpenAI-compatible server.” Keep the change
localized to the settings row object with title Remote Servers so the
description uses proof-first, specific language instead of a catch-all claim.
In `@src/services/modelResidency/policy.ts`:
- Around line 80-83: The comment for SIDECAR_TYPES in policy.ts does not match
the eviction behavior in the residency policy. Update the comment near the
SIDECAR_TYPES definition to say these models are evicted last instead of “never
evicted for capacity,” or, if they truly must be pinned, adjust the eviction
logic in the residency selection code that uses SIDECAR_TYPES so they are
excluded from candidate eviction.
In `@src/services/rag/embedding.ts`:
- Around line 22-34: The withTimeout helper leaves its timer running when the
wrapped promise rejects before the deadline, so update the cleanup logic in
withTimeout to always clear the timeout in a finally path and only run the
orphan-handling promise.then(onOrphan) when the timeout has actually fired. Use
the existing withTimeout, timeout, and onOrphan flow to track whether the
deadline was reached, and make sure initLlama rejection does not leave a
dangling timer.
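A sketch of a `withTimeout` with the cleanup described above, under the assumption that the helper takes a promise, a deadline in milliseconds, and an `onOrphan` callback; the actual signature in `embedding.ts` is not shown in this review.

```typescript
function withTimeout<T>(
  promise: Promise<T>,
  ms: number,
  onOrphan: (value: T) => void,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let timedOut = false;

  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      timedOut = true;
      reject(new Error(`Timed out after ${ms}ms`));
    }, ms);
  });

  // A late resolution is handed to onOrphan only if the deadline actually fired.
  promise.then(
    (value) => {
      if (timedOut) onOrphan(value);
    },
    () => {
      // The rejection is surfaced through the race below; nothing to clean up here.
    },
  );

  // finally() clears the timer on success, rejection, and timeout alike.
  return Promise.race([promise, deadline]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

An early rejection from the wrapped promise now clears the timer immediately, and `onOrphan` is skipped unless the caller has already been told the load timed out.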
In `@src/services/toolEmbeddingRouter.ts`:
- Around line 45-48: The first-time cache hydration in hydrateCache is being
marked complete before AsyncStorage.getItem finishes, which can let concurrent
callers skip loading persisted data and use an empty cache. Change
toolEmbeddingRouter so hydrateCache stores a shared hydration promise and awaits
it for all callers, setting the hydrated state only after the async load and
cache population completes.
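The shared-promise approach can be sketched like this. The class and the `Loader` type are hypothetical stand-ins; in the real router the loader would wrap `AsyncStorage.getItem` plus parsing.

```typescript
// Hypothetical stand-in for the persisted read.
type Loader = () => Promise<Record<string, number[]>>;

class EmbeddingCache {
  private cache = new Map<string, number[]>();
  private hydration: Promise<void> | null = null;
  private load: Loader;

  constructor(load: Loader) {
    this.load = load;
  }

  // All callers share one in-flight hydration and resolve only after the cache is populated.
  hydrate(): Promise<void> {
    if (this.hydration === null) {
      this.hydration = this.load().then((stored) => {
        for (const [key, vec] of Object.entries(stored)) {
          this.cache.set(key, vec);
        }
      });
    }
    return this.hydration;
  }

  async get(key: string): Promise<number[] | undefined> {
    await this.hydrate();
    return this.cache.get(key);
  }
}
```

Note that this sketch also caches a failed load; a real implementation may want to reset `hydration` on rejection so a later call can retry.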
---
Nitpick comments:
In `@__tests__/integration/stores/remoteServerDiscovery.test.ts`:
- Around line 571-585: The no-kind fallback test in remoteServerDiscovery should
also verify non-generative models are excluded, not just that a chat model
survives. Update the discovery payload in the existing
`discoverModels('srv-plain')` test to include an embedding or reranker entry
alongside `llama-3.2`, then assert only the generative model is returned. Keep
the check focused on the `useRemoteServerStore` discovery path and the
`isTextModel()` filtering behavior so regressions in `discoveredModels` are
caught.
In `@__tests__/rntl/components/RemoteServerModal.test.tsx`:
- Line 114: The RemoteServerModal tests are coupled to placeholder text via the
server name and endpoint field queries, so they will churn when copy changes.
Update the relevant assertions in RemoteServerModal.test.tsx to use stable
selectors from the component, such as testID or accessibility labels, by
locating the inputs used around VALID_ENDPOINT and the related render/getBy*
checks in the affected test cases. Keep any placeholder-specific assertions
separate if you still want coverage for the copy itself.
In `@__tests__/rntl/components/TranscriptionModelsTab.test.tsx`:
- Around line 165-170: The current TranscriptionModelsTab test only verifies
that the nested download affordance is hidden, but it does not prove the card
itself cannot start a re-download. Strengthen the existing test in
TranscriptionModelsTab.test.tsx by triggering a press on the
transcription-model-card-0 element and asserting downloadModel is not called,
using the same seeded running STT state so the behavior is covered in one pass.
In `@__tests__/rntl/screens/RemoteServersScreen.test.tsx`:
- Around line 143-145: The empty-state desktop action test only checks that the
“Get Off Grid AI Desktop” label renders, so it won’t catch a broken or miswired
press handler. Update the affected tests around RemoteServersScreen to trigger
the captured alert action and the visible link, then assert that Linking.openURL
is called with OFF_GRID_DESKTOP_URL. Use the existing RemoteServersScreen and
OFF_GRID_DESKTOP_URL symbols so the test verifies the actual behavior, not just
the copy.
In
`@__tests__/unit/screens/DownloadManagerScreen/useDownloadManager.branches.test.ts`:
- Around line 192-220: Add a regression test in
useDownloadManager.branches.test.ts for the STT retry path where
useDownloadManager.handleRetryDownload calls whisperService.downloadModel and
that promise rejects before a new row is created. Reuse the existing
handleRetryDownload, mockWhisperService.downloadModel, and
mockBackgroundDownloadService assertions to verify the failed retry is
surfaced/handled as expected, while ensuring the stale row cleanup and cancel
behavior are still covered by the stt retry branch.
In `@src/screens/ProDetailScreen/index.tsx`:
- Line 54: The desktop CTA logic is duplicated across ProDetailScreen and
RemoteServersScreen, so extract the shared OFF_GRID_DESKTOP_URL /
Linking.openURL(...).catch(() => {}) / “Get Off Grid AI Desktop” behavior into a
reusable helper or component and have ProDetailScreen use it instead of inline
copies. Centralize the tap handler and CTA text/props in a shared symbol (for
example a dedicated desktop CTA component or helper used by ProDetailScreen and
RemoteServersScreen) so future surfaces can reuse the same implementation
consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: e16ebeda-bff1-493d-b1a5-9a308e57bed3
📒 Files selected for processing (60)
CLAUDE.md
README.md
__tests__/integration/stores/remoteServerDiscovery.test.ts
__tests__/rntl/components/ModelSelectorModal.test.tsx
__tests__/rntl/components/RemoteServerModal.test.tsx
__tests__/rntl/components/TranscriptionModelsTab.test.tsx
__tests__/rntl/screens/ChatScreen.test.tsx
__tests__/rntl/screens/ModelDownloadScreen.test.tsx
__tests__/rntl/screens/ProDetailScreen.test.tsx
__tests__/rntl/screens/RemoteServersScreen.test.tsx
__tests__/rntl/screens/SettingsScreen.test.tsx
__tests__/unit/screens/DownloadManagerScreen/useDownloadManager.branches.test.ts
__tests__/unit/services/audioRecorderService.test.ts
__tests__/unit/services/hardware.branches.test.ts
__tests__/unit/services/llm.test.ts
__tests__/unit/services/llmSafetyChecks.test.ts
__tests__/unit/services/networkDiscovery.test.ts
__tests__/unit/services/rag/embedding.test.ts
__tests__/unit/services/toolEmbeddingRouter.test.ts
__tests__/unit/stores/appStore.test.ts
__tests__/unit/utils/toast.test.ts
android/app/src/main/java/ai/offgridmobile/litert/LiteRTModule.kt
android/app/src/test/java/ai/offgridmobile/litert/LiteRTTokenBudgetTest.kt
docs/STABILITY_FIX_PLAN.md
ios/CoreMLDiffusionModule.swift
ios/DownloadManagerModule.swift
marketing/devto/chat-with-pdfs-phone-offline.md
marketing/devto/connect-phone-ollama-lmstudio.md
marketing/devto/local-ai-tools-web-search-phone.md
marketing/devto/talk-to-local-ai-voice-phone.md
marketing/devto/vision-ai-phone-camera-offline.md
patches/@kesha-antonov+react-native-background-downloader+4.5.6.patch
pro
src/components/ModelSelectorModal/TextTab.tsx
src/components/ModelSelectorModal/index.tsx
src/components/RemoteServerModal/index.tsx
src/components/settings/ProUpsellBanner.tsx
src/constants/index.ts
src/screens/ChatScreen/index.tsx
src/screens/DownloadManagerScreen/retryHandlers.ts
src/screens/DownloadManagerScreen/useDownloadManager.ts
src/screens/ModelDownloadScreen.tsx
src/screens/ModelSettingsScreen/TextGenerationSection.tsx
src/screens/ModelsScreen/TranscriptionModelsTab.tsx
src/screens/ModelsScreen/VoiceModelsUpsell.tsx
src/screens/ProDetailScreen/index.tsx
src/screens/RemoteServersScreen.styles.ts
src/screens/RemoteServersScreen.tsx
src/screens/SettingsScreen.tsx
src/services/hardware.ts
src/services/llm.ts
src/services/llmSafetyChecks.ts
src/services/mcpContextBoost.ts
src/services/modelResidency/policy.ts
src/services/networkDiscovery.ts
src/services/rag/embedding.ts
src/services/toolEmbeddingRouter.ts
src/stores/appStore.ts
src/stores/remoteServerHelpers.ts
src/utils/toast.ts
| let preferNeuralEngine: Bool | ||
| if #available(iOS 26.0, *) { | ||
| preferNeuralEngine = false | ||
| } else { | ||
| pipe = try StableDiffusionPipeline( | ||
| resourcesAt: url, | ||
| controlNet: [], | ||
| configuration: config, | ||
| reduceMemory: true | ||
| ) | ||
| preferNeuralEngine = true | ||
| } | ||
|
|
||
| // Skip prewarm for 'original' variant (low-memory devices): prewarm | ||
| // loads the full Unet into memory just to unload it, causing an OOM spike. | ||
| // With reduceMemory=true the pipeline lazily loads each submodel during | ||
| // generateImages(), so prewarming is unnecessary. | ||
| if attentionVariant != "original" { | ||
| try pipe.loadResources() | ||
| let primaryUnits: MLComputeUnits = (cpuOnly || !preferNeuralEngine) ? .cpuAndGPU : .cpuAndNeuralEngine | ||
| let pipe: StableDiffusionPipelineProtocol |
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
cpuOnly currently routes to .cpuAndGPU.
Line 198 ignores the CPU-only request and still enables the GPU. That breaks the parameter contract and can keep callers on the exact fallback path they were trying to avoid. Split this into three cases so cpuOnly maps to .cpuOnly, then choose ANE vs CPU+GPU only for non-CPU-only loads.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ios/CoreMLDiffusionModule.swift` around lines 192 - 199, The `cpuOnly` flag
is being ignored in the `CoreMLDiffusionModule` pipeline setup, so the current
`primaryUnits` selection can still use GPU-backed compute. Update the logic
around `preferNeuralEngine` and `primaryUnits` so `cpuOnly` maps directly to
`.cpuOnly`, and only non-CPU-only loads choose between `.cpuAndNeuralEngine` and
`.cpuAndGPU` in the same `StableDiffusionPipelineProtocol` setup path.
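The three-way split requested in this comment can be modeled as a small pure function. This is a TypeScript sketch of the decision only, not the Swift in `ios/CoreMLDiffusionModule.swift`; the string values mirror the `MLComputeUnits` case names, and `selectComputeUnits` is a hypothetical name.

```typescript
// Mirrors the MLComputeUnits cases named in the review comment.
type ComputeUnits = 'cpuOnly' | 'cpuAndGPU' | 'cpuAndNeuralEngine';

// Hypothetical helper: a CPU-only request wins outright. Only loads that
// are not CPU-only choose between the Neural Engine and the GPU.
function selectComputeUnits(cpuOnly: boolean, preferNeuralEngine: boolean): ComputeUnits {
  if (cpuOnly) {
    return 'cpuOnly';
  }
  return preferNeuralEngine ? 'cpuAndNeuralEngine' : 'cpuAndGPU';
}
```

With this shape, `cpuOnly` can never reach a GPU-backed case, which is the contract the comment says the current ternary breaks.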
    desktopLink: {
      flexDirection: 'row' as const,
      alignItems: 'center' as const,
      gap: 6,
      marginTop: 12,
      paddingVertical: 4,
    },
    desktopLinkText: {
      fontSize: 14,
      color: colors.primary,
    },
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Replace the hardcoded spacing and font size with design-system tokens.
gap: 6, marginTop: 12, paddingVertical: 4, and fontSize: 14 bypass the required SPACING/TYPOGRAPHY tokens, so this new CTA can drift from the rest of the UI. As per coding guidelines, "Use design-system TYPOGRAPHY tokens only; do not hardcode font sizes" and "Use design-system SPACING tokens only; do not hardcode margin or padding values."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/screens/RemoteServersScreen.styles.ts` around lines 57 - 67, The
`desktopLink` and `desktopLinkText` styles use hardcoded spacing and font sizing
instead of design-system tokens. Replace `gap`, `marginTop`, and
`paddingVertical` in `desktopLink` with the appropriate `SPACING` values, and
replace the `fontSize` in `desktopLinkText` with the matching `TYPOGRAPHY` token
so the CTA stays consistent with the rest of the UI.
Source: Coding guidelines
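A token-based version of these two styles might look like the sketch below. The token keys and values (`SPACING.xs`, `SPACING.sm`, `SPACING.md`, `TYPOGRAPHY.body`) and the color are assumptions made for illustration; the project's real design-system exports may use different names and numbers.

```typescript
// ASSUMPTION: illustrative tokens only, not the project's real exports.
const SPACING = { xs: 4, sm: 6, md: 12 } as const;
const TYPOGRAPHY = { body: { fontSize: 14 } } as const;
const colors = { primary: '#0A84FF' } as const; // placeholder color

const desktopLinkStyles = {
  desktopLink: {
    flexDirection: 'row' as const,
    alignItems: 'center' as const,
    gap: SPACING.sm,
    marginTop: SPACING.md,
    paddingVertical: SPACING.xs,
  },
  desktopLinkText: {
    // Spread the typography token instead of hardcoding fontSize.
    ...TYPOGRAPHY.body,
    color: colors.primary,
  },
};
```

If the design system has no token equal to a current literal (for example `gap: 6`), the nearest token would change the layout slightly, so the swap is worth a visual check.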
    const quantizedCache = settings.cacheType !== 'f16';
    const getMem = () => hardwareService.getAppMemoryUsage();
    let memCheck = await checkMemoryForModel({ modelFileSize: fileSize, contextLength: params.ctxLen, getAvailableMemory: getMem, quantizedCache });
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Treat OpenCL loads as f16 in the memory guard.
quantizedCache is derived from the stored setting only, but OpenCL loads can force the effective cache type to f16. On Android that makes the guard use the cheaper quantized estimate even when the actual KV cache is f16, so resolveSafeContext() can still approve contexts that later OOM during load.
Suggested fix
- const quantizedCache = settings.cacheType !== 'f16';
+ const quantizedCache =
+ settings.inferenceBackend !== INFERENCE_BACKENDS.OPENCL &&
+ settings.cacheType !== 'f16';
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/services/llm.ts` around lines 71 - 73, The memory guard in llm loading is
using the stored cache setting only, so OpenCL paths can still be checked as
quantized even when the effective KV cache becomes f16. Update the guard setup
around checkMemoryForModel in the llm service to derive quantizedCache from the
actual load mode, and force it to false for OpenCL loads so resolveSafeContext()
uses the f16 estimate. Use the existing llm.ts flow and symbols like
settings.cacheType, checkMemoryForModel, and resolveSafeContext to locate the
fix.
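To see why the cache type matters to the guard, here is a rough KV-cache size estimate. It is a sketch, not the estimator inside `checkMemoryForModel`: the model shape is hypothetical, the `'opencl'` string stands in for whatever `INFERENCE_BACKENDS.OPENCL` resolves to, and `q8_0` is assumed as the quantized cache type (llama.cpp stores it as 34-byte blocks of 32 values).

```typescript
// Bytes per cached element: f16 is 2 bytes; q8_0 packs 32 int8 values plus
// one f16 scale into a 34-byte block.
const BYTES_PER_ELEMENT = { f16: 2, q8_0: 34 / 32 } as const;
type CacheType = keyof typeof BYTES_PER_ELEMENT;

interface ModelShape {
  layers: number;
  kvHeads: number;
  headDim: number;
}

// K and V each hold layers * ctxLen * kvHeads * headDim elements.
function kvCacheBytes(shape: ModelShape, ctxLen: number, cacheType: CacheType): number {
  const elements = 2 * shape.layers * ctxLen * shape.kvHeads * shape.headDim;
  return elements * BYTES_PER_ELEMENT[cacheType];
}

// ASSUMPTION: per the review comment, an OpenCL load forces an f16 cache
// regardless of the stored setting.
function effectiveCacheType(backend: string, stored: CacheType): CacheType {
  return backend === 'opencl' ? 'f16' : stored;
}

// Hypothetical 32-layer model with 8 KV heads of dimension 128.
const shape: ModelShape = { layers: 32, kvHeads: 8, headDim: 128 };
const MIB = 1024 * 1024;
const f16MiB = kvCacheBytes(shape, 8192, 'f16') / MIB;
const q8MiB = kvCacheBytes(shape, 8192, 'q8_0') / MIB;
```

For this hypothetical shape at an 8192-token context, the f16 cache is 1024 MiB against 544 MiB for `q8_0`, so a guard that assumes the quantized figure underestimates the cache by nearly half.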
Thread preferGpu (from hardwareService.preferGpuForImageGen) through the image load path into the native loadModel params, so the native compute path matches the residency estimate the gate sized the load against. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pick compute units from the JS-supplied preferGpu instead of guessing natively: GPU when preferGpu is set (devices with enough RAM; on iOS 26 the ANE load fails on the 8 GB iPhone 15 Pro), ANE otherwise (low-RAM devices where the GPU OOMs). Fall back only from GPU to ANE, which uses less system RAM; never from ANE to GPU, which would OOM the device the ANE was protecting. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
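The one-directional fallback and the matching residency estimate can be sketched as follows. The multipliers (GPU 2.5×, ANE 1.8×) are the ones quoted in the PR description; treating them as multiples of the on-disk model size is an assumption, and every function name here is hypothetical.

```typescript
type ComputePath = 'gpu' | 'ane';

// Multipliers quoted in the PR description.
// ASSUMPTION: they scale the on-disk model size; the source does not say.
const RESIDENCY_MULTIPLIER: Record<ComputePath, number> = { gpu: 2.5, ane: 1.8 };

function estimateResidencyMB(modelSizeMB: number, path: ComputePath): number {
  return modelSizeMB * RESIDENCY_MULTIPLIER[path];
}

// Ordered attempts: GPU may fall back to ANE, but ANE never falls back to
// GPU, because the GPU path needs more system RAM.
function loadAttempts(preferGpu: boolean): ComputePath[] {
  return preferGpu ? ['gpu', 'ane'] : ['ane'];
}

// Hypothetical loader: tries each path in order and returns the first one
// that loads, or null when every attempt fails.
function loadWithFallback(
  preferGpu: boolean,
  tryLoad: (path: ComputePath) => boolean,
): ComputePath | null {
  for (const path of loadAttempts(preferGpu)) {
    if (tryLoad(path)) {
      return path;
    }
  }
  return null;
}
```

Driving both the estimate and the native load from the same `preferGpu` value keeps the gate from sizing a load for one path while the loader takes the other.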
CI Feedback 🧐 A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
The public repo's CI does not check out the private pro/ submodule, so any __tests__ file importing ../pro/* fails to resolve there. Several such tests were added to the branch without updating both CI configs, so tsc --noEmit (typecheck job) errored on integration/audio/*, PlaybackControls, audioProgressCaption and unit/mcp/McpToolExtension, and the test job would have failed on the latter two (absent from jest's ignore list). Bring tsconfig.exclude and jest.config testPathIgnorePatterns into lockstep so both cover the same full set of pro-importing tests; they run in pro's CI where the submodule is present. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pleted) Regression for the exact device case: every Kokoro asset basename is present on disk (executorch creates files before their bytes finish) yet a download is live — the engine must report downloading, not completed, so the Download Manager and Voice panel agree. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reframe the card from a feature list to the outcomes people get: chat, on-device image and voice, live artifacts, projects that answer from your own docs with citations, plus the private layer that remembers what you see, rewinds your screen, runs one search across your day, connects your tools, dictates anywhere, keeps a searchable clipboard, and turns activity into approval-gated to-dos and actions - all on-device. Follows the brand arc (recognition → return → freedom); no em dashes, no forbidden words. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…RegisteredPro The Pro "aha" sheet and the settings upsell banner showed even when Pro was unlocked, because they gated on hasRegisteredPro (set only when a license is registered in-app) while loadProFeatures actually unlocks Pro from the keychain entitlement OR a __DEV__ unlock — and never reflected that in the store. So a keychain/dev-unlocked Pro user saw the upgrade prompt. Add an authoritative isProActive flag set by loadProFeatures (the same `active` signal that activates paid features), and gate both upsells on hasRegisteredPro || isProActive. Not persisted — recomputed each launch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…he freeze Firing every requested download at the native layer at once split bandwidth across all of them and, with multi-GB models, drove iOS into a freeze (25 concurrent observed on device). backgroundDownloadService.startDownload — the single chokepoint every native download (text/image/stt) passes through — now admits at most MAX_CONCURRENT_DOWNLOADS (3); the rest wait in a FIFO queue and begin as running downloads complete, error, or are cancelled. A reservation token is added synchronously so a same-tick burst can't over-admit; duplicate queued starts for the same model coalesce. restore adopts resumed downloads against the cap so a relaunch can't admit a fresh batch on top of them. Because only ≤3 ever start, the native store never holds more than that, so a relaunch can't resume a storm either. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
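The admission rule described in this commit can be sketched as a synchronous gate. This is a simplified model, not the real `backgroundDownloadService` (which is promise-based): the class and method names other than `MAX_CONCURRENT_DOWNLOADS` and `getQueuedItems` are hypothetical, and treating an already-active key as a duplicate is an assumption beyond the queued-duplicate coalescing the commit mentions.

```typescript
const MAX_CONCURRENT_DOWNLOADS = 3; // cap named in the commit message

type Admission = 'started' | 'queued' | 'coalesced';

// Hypothetical admission gate keyed by model.
class DownloadGate {
  private active: string[] = [];
  private queue: string[] = [];

  // Runs synchronously, so the slot is reserved before any async work and
  // a same-tick burst cannot over-admit.
  request(key: string): Admission {
    if (this.active.indexOf(key) !== -1 || this.queue.indexOf(key) !== -1) {
      return 'coalesced';
    }
    if (this.active.length < MAX_CONCURRENT_DOWNLOADS) {
      this.active.push(key);
      return 'started';
    }
    this.queue.push(key);
    return 'queued';
  }

  // Call on complete, error, or cancel. Returns the key promoted from the
  // head of the FIFO queue, or null when nothing was waiting.
  release(key: string): string | null {
    const index = this.active.indexOf(key);
    if (index === -1) {
      return null;
    }
    this.active.splice(index, 1);
    const next = this.queue.shift();
    if (next === undefined) {
      return null;
    }
    this.active.push(next);
    return next;
  }

  getQueuedItems(): string[] {
    return this.queue.slice();
  }

  activeCount(): number {
    return this.active.length;
  }
}
```

Because a slot is only ever freed through `release`, every terminal path (complete, error, cancel) has to call it, otherwise the cap starves.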
…d of failing them
After an app-kill, iOS foreground downloads can't resume, so reconcile() used to mark
every interrupted download 'failed' ("Interrupted — tap retry"), forcing a manual tap
per model. Now it re-queues them: marks 'pending' (→ 'queued' in the service
vocabulary) and re-issues each through restartIosTextDownload — the SAME path retry
uses (modelManager.downloadModelBackground → backgroundDownloadService.startDownload),
so they flow through the 3-slot concurrency cap, auto-resuming up to 3 and queueing the
rest. Re-issue is fire-and-forget so launch isn't blocked behind the cap. Shared helper
keeps retry and reconcile DRY; no new status, no caller branching, no cap bypass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ve and Downloaded
A model that completed and was then re-started (a stale restore / re-download) showed in the Download Manager as BOTH a failed "Active Downloads" row and a "Downloaded Models" row: two rows for the same model (SDXL Core ML appeared in both sections). The DM builds those sections from separate stores (downloadStore vs registered models) with no cross-dedup. Fix at two seams:
- Display: a downloaded (on-disk, registered) model is authoritative, so drop any in-flight/failed row for the same model from activeItems, keyed by the shared uniformDownloadId.
- State: textProvider.reconcile now drops a stale in-flight row when the model is already registered (never re-download something we already have) instead of re-queuing it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…wnload Manager A download waiting for a concurrency slot has no native downloadId and no store row (performBackgroundDownload adds the row only after startDownload resolves), so it was invisible while queued. Surface the waiting starts from their owner (backgroundDownloadService.getQueuedItems) and render them as 'pending' (→ "Queued"). Merged into Active Downloads after the started rows, deduped against started + already downloaded models so a model never appears twice. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Selecting a new model in the chat picker loaded it with the sheet still open, then closed the sheet only once the load finished — because proceedWithModelLoadFn closed the picker in its finally block (after awaiting loadTextModel). Move the close to the start so the sheet dismisses immediately and the minimal in-chat loading card shows during the load, matching the normal flow. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…mic) Guards the single bottom-bar view-model: mic when idle, tts-stop while playing/paused (disabled only while preparing), generation-stop outranks playback, and voice-switch is reported alongside without competing with the center action. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The outcome-led copy was too long on the card. Trim to the core outcomes — chat, image, voice, on-device projects, plus the private layer that remembers, rewinds, searches, and drafts actions — kept brand-voice clean (no em dashes, no forbidden words). Nothing leaves the device. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ressure (OOM/jetsam) The app was jetsam-killed (code 9) when a whisper sidecar auto-loaded onto an actively generating CoreML image model: makeRoomFor said fits=true (7279+142 ≤ 0.78*12GB) and stacked them, blind to the "~120s GPU optimization" compile spike that had already consumed the real RAM. Root cause: budgetForSpec only consulted the live os_proc_available memory for DIRTY specs; an mmap sidecar took the static physical-cap path and skipped the live check entirely. Fix, in ONE owner (budgetForSpec, with no duplicated memory math in makeRoomFor): apply the live os_proc budget whenever there is dirty-memory PRESSURE, meaning the incoming model is dirty OR a dirty model is already resident. A dirty model's working set can't be paged out like clean mmap weights, so while one is present every load (even a sidecar) must fit real free RAM. With no dirty pressure, mmap GGUF stays bounded by physical RAM only (a big LLM still loads on a high-RAM phone when instantaneous available is low). Adds dirtyMemory to the Resident type so residents carry it. Regression tests cover the sidecar-refused-under-dirty-pressure case and the no-false-refusal case. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
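A minimal model of the gate described above, under stated assumptions: the 0.78 physical-cap fraction and the 7279 MB / 142 MB sizes come from the commit message, while the function name, the `Spec` shape, and the rule that "fits real free RAM" means "incoming size ≤ live available" are illustrative guesses, not the real `budgetForSpec`.

```typescript
interface Spec {
  label: string;
  sizeMB: number;
  dirtyMemory: boolean;
}

const PHYSICAL_CAP_FRACTION = 0.78; // from "0.78*12GB" in the commit message

// Hypothetical fit check. Under dirty-memory pressure the load is judged
// against live free RAM; otherwise against the static physical cap.
function fitsSketch(
  incoming: Spec,
  residents: Spec[],
  totalMB: number,
  procAvailableMB: number,
): boolean {
  const dirtyPressure = incoming.dirtyMemory || residents.some((r) => r.dirtyMemory);
  if (dirtyPressure) {
    // ASSUMPTION: dirty pages cannot be paged out, so only the live
    // per-process figure is trusted.
    return incoming.sizeMB <= procAvailableMB;
  }
  const residentMB = residents.reduce((sum, r) => sum + r.sizeMB, 0);
  return residentMB + incoming.sizeMB <= PHYSICAL_CAP_FRACTION * totalMB;
}

const whisper: Spec = { label: 'whisper sidecar', sizeMB: 142, dirtyMemory: false };
const imageDirty: Spec = { label: 'CoreML image', sizeMB: 7279, dirtyMemory: true };
const imageUnflagged: Spec = { label: 'CoreML image', sizeMB: 7279, dirtyMemory: false };

// Hypothetical live figure: only 100 MB actually free on a 12 GB device.
const refused = fitsSketch(whisper, [imageDirty], 12288, 100);
// Without the flag the static path admits the sidecar (7421 MB under the cap).
const stacked = fitsSketch(whisper, [imageUnflagged], 12288, 100);
```

The second call shows why a resident registered without `dirtyMemory` bypasses the gate: the static cap still has room on paper even when live free RAM is nearly gone.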
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…cision A refusal on a 12GB device is only explainable with the raw numbers: log os_procAvailMB + totalMB + dirty alongside the budget, so we can tell whether real free RAM is genuinely low (correct refusal) or the app footprint is bloated/leaking (the real bug) rather than deriving it from the budget. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The memory-pressure failure card offers "Free memory & Retry", but the handler just re-ran the same request into the same wall (makeRoomFor returns fits=false without evicting, so nothing was freed). Now a memory-pressure retry ejects resident models (activeModelService.ejectAll, lazy-required to avoid the import cycle) BEFORE re-running, so the label is honest and the retry can actually recover memory when models are loaded. (It still can't conjure RAM that isn't model-held — when the real os_proc headroom is below the model's need, the message correctly points to a smaller model.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ocess Limit Total RAM alone is misleading — iOS caps a process well below physical, and model loads are gated on the live os_proc figure. Surface the real picture on Device Information: Available Now, App Footprint, and the derived Process Limit (available + footprint) — the actual OS cap on the app. This makes a "not enough memory" refusal explainable from the UI (and readable in release, where the __DEV__ file log sink is off), and confirms whether the increased-memory entitlement is in effect: a distribution build should show a much higher Process Limit than a development-signed one. Adds hardwareService.getProcessMemory(). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
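The derived figure is simple arithmetic, sketched below. The function and field names are hypothetical (the real `hardwareService.getProcessMemory()` return shape is not shown in this PR), and the numbers are made up for illustration.

```typescript
interface ProcessMemorySketch {
  availableMB: number;    // what the OS says the process can still allocate
  footprintMB: number;    // what the app already holds
  processLimitMB: number; // derived: the effective OS cap on the app
}

// Process Limit = Available Now + App Footprint, as described in the commit.
function deriveProcessMemory(availableMB: number, footprintMB: number): ProcessMemorySketch {
  return { availableMB, footprintMB, processLimitMB: availableMB + footprintMB };
}

// Hypothetical reading: 2100 MB available with a 900 MB footprint.
const reading = deriveProcessMemory(2100, 900);
```

Here the process limit works out to 3000 MB, far below the physical RAM of a flagship phone, which is the gap the Device Information screen is meant to expose.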
…y gate fires The live-os_proc gate has two triggers: incoming-is-dirty (worked) and a-dirty-model-is-resident. The second was a runtime no-op because activeModelService registered the image model WITHOUT dirtyMemory: true, so residents.some(r => r.dirtyMemory) was always false. That left exactly the jetsam case (a whisper sidecar stacking onto a resident, generating image model) ungated. Tests passed only because they hand-set the flag. Set it on the real register so the resident-dirty arm actually engages. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…permanent slot leak) adoptActive was fed every restored id including 'completed' ones. A completed download never emits another terminal event, so its concurrency slot never released — after a few background-completions the cap starved and all downloads silently stalled. Adopt only genuinely in-flight (running/pending/retrying/waiting_for_network) restored downloads. Also repairs two mocks that were left missing adoptActive/getQueuedItems (bypassed the pre-commit test gate). Adds a regression test that a completed restored download is not adopted. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
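The adoption filter can be sketched as below. The four in-flight statuses are the ones listed in the commit message; the `'failed'` status, the `RestoredDownload` shape, and the function name are assumptions for illustration.

```typescript
type RestoredStatus =
  | 'running'
  | 'pending'
  | 'retrying'
  | 'waiting_for_network'
  | 'completed'
  | 'failed'; // ASSUMPTION: 'failed' is illustrative, not named in the commit

interface RestoredDownload {
  id: string;
  status: RestoredStatus;
}

// Statuses the commit message lists as in-flight.
const IN_FLIGHT: RestoredStatus[] = ['running', 'pending', 'retrying', 'waiting_for_network'];

// Only in-flight downloads may take a concurrency slot. A completed one
// never emits another terminal event, so its slot would never be released.
function selectAdoptable(restored: RestoredDownload[]): string[] {
  return restored
    .filter((d) => IN_FLIGHT.indexOf(d.status) !== -1)
    .map((d) => d.id);
}
```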
…e memory & Retry" The card's "Free memory & Retry" label (from reasonFromLoadError → memoryPressure) and the retry's decision to actually eject used TWO different regexes that could disagree — so the label could promise a free the retry never performed. Derive both from the one owner (reasonFromLoadError === 'insufficient-memory'); drop the second regex and the dead lazy-require (no import cycle exists — activeModelService is already imported). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… them The review fixes route through backgroundDownloadService.adoptActive (restore) and appStore.setProActive (loadProFeatures); update the parallelMmproj and proBootFlow mocks to include them. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t mmproj watchBackgroundDownload only reconciled the mmproj sidecar against native 'completed' status after subscribing — the main GGUF had no catch-up. Under the new 3-concurrent cap the listener can be registered AFTER the main completes in native (setup delayed behind an awaited-queued sidecar start), so its single DownloadComplete event fires with no subscriber and is lost: mainCompleted stays false and tryFinalize hangs at 100% forever. Extract handleMainComplete/handleMmProjComplete as the single completion handlers invoked by EITHER the live onComplete event OR one reconcile that checks BOTH downloads against native status (the *CompleteHandled flags guard double-runs). One owner per completion, no duplicated move/finalize logic. Regression test: a non-vision main reported 'completed' before listener registration finalizes via the reconcile alone, with no live event fired. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
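The "one handler, two callers" idea reads roughly like this. It is a sketch that covers only the main file of a non-vision model; the class, the `NativeStatus` values other than `'completed'`, and the counter are hypothetical stand-ins for the real watcher and its move/finalize work.

```typescript
type NativeStatus = 'running' | 'completed' | 'failed';

class CompletionWatcherSketch {
  private mainCompleteHandled = false;
  finalizeCount = 0;

  // Single completion handler, safe to call from the live onComplete event
  // and from reconcile. The flag guards against running twice.
  handleMainComplete(): void {
    if (this.mainCompleteHandled) {
      return;
    }
    this.mainCompleteHandled = true;
    this.finalizeCount += 1; // stands in for the move + finalize work
  }

  // Catch-up for a completion event that fired before the listener was
  // registered: consult native status once after subscribing.
  reconcile(nativeStatus: NativeStatus): void {
    if (nativeStatus === 'completed') {
      this.handleMainComplete();
    }
  }
}
```

Calling `reconcile('completed')` and then receiving a late live event still finalizes exactly once, which is the property the regression test in this commit checks for the reconcile-only path.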
A start waiting for a concurrency slot lived only in backgroundDownloadService's queue (no native downloadId, no store row), so tapping remove on a "Queued" row hit the provider's findEntry, missed, and no-opped: the item stayed queued and its startDownload() promise never settled until a slot freed. Fix at the seam, keeping single ownership:
- backgroundDownloadService.cancelQueued(key): the queue owner removes the waiting start and settles its promise as a user cancellation (reusing the `.cancelled` convention the onError path already uses), touching no native download.
- ModelDownloadService.dispatch routes a not-found cancel/remove to the queue, mapping the uniform id onto the queued item via the SAME uniformDownloadId the providers' list() and the View use, so queued and started rows route identically. retry is not routed (a not-yet-started item can't be retried).
- startModelDownload swallows a `.cancelled` rejection (no store row to fail, not an error) instead of firing onError.
- The Download Manager's queued projection now refreshes on the service's notify, so a cancel drops the "Queued" row immediately rather than on the 1s poll.
Tests: cancelQueued removes+settles+frees (and returns false for an unknown key); dispatch routes a queued-only id to cancelQueued by uniform id and refuses retry; startModelDownload does not surface onError on a cancellation.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
handleStopFn ended the generation session and stopped LLM + image generation but never fired HOOKS.audioStop. Streaming TTS runs ahead of playback (synthesis is ~100ms, playback is seconds), so at Stop time there are always buffered sentences — the phone kept talking through them after the user aborted. The new-turn path (handleSendFn) already fired audio.stop; the manual-stop path was missing it. Regression test asserts handleStopFn fires HOOKS.audioStop (fails against the prior code). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pe it as image
A HuggingFace multi-file image download runs its parts sequentially through downloadFileTo. When the current part is queued behind the 3-download cap it has no native downloadId, so cancelSyntheticImageDownload (which only had currentDownloadId to work with) could not reach it: the part sat in the queue holding a slot, was then promoted, briefly started, and was only cancelled by wireCurrentDownloadPromise. The result was a cancel that visibly does nothing until a slot frees.
- cancelSyntheticImageDownload now also calls backgroundDownloadService.cancelQueued with the part's key (== makeImageModelKey) to drop a queued part at once.
- The part now carries modelType:'image' (it defaulted to 'text'), so getQueuedItems types it correctly and the service's uniform-id cancel routing matches the View.
- isCancelledError recognizes the cross-service `.cancelled` convention (message "Download cancelled") in addition to the local sentinel, so a cancelled part is treated as a cancel, not surfaced as a download failure. This also hardens the existing active-cancel path.
Regression tests: cancel drops a queued part via cancelQueued (no cancelDownload, no native id); multi-file parts are typed image. Both fail against the prior code.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… vision finalize-hang startModelDownload attaches watchDownload only AFTER downloadModelBackground resolves, and startBgDownload started the MAIN first, then blocked awaiting the sidecar's startDownload. Under the 3-concurrent cap the sidecar can queue for minutes; during that wait the main (already started) can complete and fire its single DownloadComplete with no listener attached yet — the event is lost, ctx.mainCompleted never flips, and finalization hangs at 100%. The reconcile added in fd3b413 only recovers this when native still reports the main 'completed' at watcher-attach time, which is not guaranteed after a long unwatched wait (iOS URLSession evicts completed tasks). Fix: start the sidecar FIRST, so the main is the last download started before the function returns and the caller attaches the watcher. Nothing long is awaited after the main starts, so the watcher is live before the main can complete (the reconcile still covers the microtask gap). Behavior-neutral otherwise: same two downloads, same store row, same listeners, same finalize path — only the start order swaps and setMmProjDownloadId moves just after the store row is added. Tests: assert the sidecar's startDownload precedes the main's (fails against the prior order). stubStartDownload now keys ids by role (main vs mmproj fileName) not call order, so the suite is agnostic to start ordering. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… queue contract The dispatch queue-fallback (74ff0e9) consults backgroundDownloadService.getQueuedItems on a not-found cancel/remove, but this integration test mocked the service without it, so remove of an unknown id threw instead of refusing cleanly. Mirror the real contract (getQueuedItems/cancelQueued) so the mock can't diverge from the code under test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nload-state fixes)
Records the pro-side commits made this session so core builds against them:
- single audio view-model (deriveAudioActivity) for the bottom bar; paused clip shows the mic, not a stop button
- coordinator resets the voice-switch on engine loss
- Kokoro: a live download wins over disk-presence (DM no longer shows completed at 3%); a fetch collision is treated as benign; [KOKORO-DL] decision logging
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
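The priority order for the bottom bar (generation-stop over tts-stop over mic, with a paused clip showing the mic) can be sketched as a pure function. The real `deriveAudioActivity` lives in the private pro submodule and is not shown here, so the input and output shapes below are hypothetical, and the "preparing" state mentioned in an earlier commit is left out.

```typescript
type Playback = 'idle' | 'playing' | 'paused';
type CenterAction = 'generation-stop' | 'tts-stop' | 'mic';

interface AudioInputs {
  isGenerating: boolean;
  playback: Playback;
  voiceSwitchAvailable: boolean;
}

interface AudioActivitySketch {
  center: CenterAction;
  showVoiceSwitch: boolean;
}

// Hypothetical view-model: exactly one center action, chosen by priority.
// The voice-switch is reported alongside and never competes for the center.
function deriveAudioActivitySketch(input: AudioInputs): AudioActivitySketch {
  let center: CenterAction = 'mic';
  if (input.isGenerating) {
    center = 'generation-stop';
  } else if (input.playback === 'playing') {
    center = 'tts-stop';
  }
  // A paused clip falls through to the mic.
  return { center, showVoiceSwitch: input.voiceSwitchAvailable };
}
```

Deriving the center action in one place means two separate components cannot disagree about whether to show Stop or the mic.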
iOS + Android stability, performance, and image-generation fixes. Pairs with pro PR off-grid-ai/mobile-pro#8.
Stability / performance regression (the primary fix)
- … (da072bd5): enabling tools no longer forces a 32k context + model reload, the root cause of the post-Pro slowdown and the iOS Metal-buffer / Android litert OOM crashes on flagship devices. The embedding tool-router already fits schemas in the default window.
- … (10236e32).

Memory correctness
- … (ccbb7fce).
- … (a802efd8).

Performance
- … (df6619fa).

Android engine hardening
- … (a7ee32f5).

iOS image generation (iOS 26)
- … (96b82e53, 76fcf1f7, bea0e1b7, bd302781, df6d39b0): on iOS 26 the Neural Engine is degraded for palettized diffusion models; it fails to load on the 8 GB iPhone 15 Pro and stalls on smaller devices. So iOS 26 devices with ≥7 GB load on the GPU (the working path), and lower-RAM devices load on the ANE (far smaller system-RAM footprint) instead of OOMing on the GPU. The residency estimate follows the same choice (GPU 2.5×, ANE 1.8×) so the gate admits a load that fits rather than refusing it; the native loader falls back only GPU→ANE (never the reverse, which would OOM).

Downloads / discovery / UX
- … (27884cdb).
- … (c2173e65).
- … (94f27d8f).
- … (421be2f7).

Refactor
- … (9af76d50): keeps `useDownloadManager` under the complexity/line limits after the STT-retry change.

Test plan
Download-manager sync, Kokoro fix, Desktop card (latest)
- … (33ab25a0, tests 8c1a4dda): the Download Manager showed Kokoro TTS completed (82 MB) on the first tap while the Voice panel correctly showed it downloading (4%). Root cause: the engine defined "downloaded" as just the two core `.pte` models, which survive a prior interrupted download, so the disk scan flipped to completed conclusively before the active voice's assets finished. Completeness is now defined once, from the same `_activeVoiceSources()` the downloader fetches (core `.pte` + the active voice's embedding/tagger/lexicon), reading `listDownloadedFiles()` instead of `listDownloadedModels()` (which filters to `.pte` and never sees the voice assets). Regression test reproduces the exact case.
- … `http://getoffgridai.co/desktop` via the single `OFF_GRID_DESKTOP_URL` constant (all 8 call sites updated) (c2b9dd3f, d9b9a810).
- … `DesktopPromoCard` component that owns its copy/dismiss state (d878f403, c506cce6).
- … 63d1cf52 (Kokoro fix + merge of origin's audio fixes) (66ea85c7).

Concurrency, memory-gate, audio & entitlement hardening (latest)

Downloads - concurrency + lifecycle (fixes the ~25-parallel freeze and the finalize hang)
- … (98fe0719) - starting a whole catalog no longer spawns ~25 parallel transfers that OOM-froze the device; extra starts wait for a slot. Queued (not-yet-started) downloads render as "Queued" in the Download Manager (a11eac5b), and interrupted iOS downloads are re-queued on relaunch instead of marked failed (a36399f3).
- … (56b34002); adopt only in-flight restored downloads on relaunch (a completed one would leak a concurrency slot forever) (bfdc4cf0).
- … (74ff0e9b) - a "Queued" row had no native id, so Cancel/Remove no-opped; the queue owner now exposes `cancelQueued` and the service routes a not-found cancel/remove to it by the same uniform id. Queued multi-file image parts are cancellable too and typed `image` (36eb775b).
- … (fd3b4136), and start the sidecar before the main so the watcher is live before the main can complete under queue pressure (73eace72).

Memory - live os_proc gating (turns OOM/jetsam into graceful refusal)
- … `os_proc_available_memory` under dirty-memory pressure (8c349f25), register the image model as `dirtyMemory` so the resident-dirty arm of the gate actually fires (08acaf31), and log the raw available/total in the `[MEM-SM]` decision (f94e9b4b).
- … (1dd64cd5), single-sourced via `reasonFromLoadError` (b9aee47b). Device Info shows per-process Available / Footprint / Process Limit (7dab9f1e).

Audio - one owning view-model (bottom bar can't desync)
- … `deriveAudioActivity` view-model drives the bottom bar (generation-stop > tts-stop > mic; a paused clip shows the mic) and a coordinator resets the voice-switch on engine loss (pro side; tests ffeb86b8, dd47aadc, 3f8d4434).
- … (d6f2b194) - the Stop button ended the LLM but left buffered-ahead sentences playing.

Pro / chat
- … `hasRegisteredPro` (d2412a5e).
- … (5313ecd1).

pro submodule bumped to 5ed8e4b9 (single audio view-model + Kokoro download-state fixes) (d20d006d).

Full JS suite green (247 suites / 6241 tests), tsc + eslint clean. Each fix ships with a regression test that fails against the pre-fix code. Remaining known items (tracked, not blocking): the dirty/clean residual-RAM accounting in the memory gate and CI coverage for the pro audio suite.
🤖 Generated with Claude Code
Summary by CodeRabbit