Releases: roryford/ManifoldKit
v0.49.1
v0.49.0
Highlights
Route individual turns to a secondary backend without unloading the primary (#1799) — InferenceService gains a deepBackend property and a GenerationRoute enum. Setting route: .deep on any enqueue call dispatches that turn through the host-owned secondary backend while the primary model stays loaded; the existing cancel/stop path targets the correct backend automatically. Existing callers are unaffected — route defaults to .primary and the byte-identical code path is preserved.
inferenceService.deepBackend = myCloudBackend
let stream = try await inferenceService.enqueue(
    messages: history,
    systemPrompt: systemPrompt,
    config: config,
    route: .deep // primary stays loaded; this turn goes to deepBackend
)
Resumable runs are now persisted end-to-end (#1795) — ConversationRun is a full SwiftData @Model type. Runs are written through RunStore, survive app restart, and can be rehydrated via resume(from:) on ConversationRuntime. The ResumableRunDriver wires reconnect logic — partial output already delivered to the UI is not re-streamed.
Selection-time model profiles for Apple Foundation Models (#1783) — ModelProfile is computed at selection time from DeviceCapability and the new MoE-aware recommender inputs, giving the Apple FM tier (Tier 0) an accurate capability signal before the model loads. This feeds hasDeepBackend detection and will drive auto-routing once Fireside P1 lands.
Performance
- Streaming render is O(n) again (#1788) — the UI streaming path was rebuilding the full message array on every token event; it now appends to an existing buffer.
- Session fetch by ID instead of full table scan (#1789) — ConversationRuntime was scanning the entire sessions table to locate the active session on every turn; it now fetches by primary key.
- Cloud stream frames parsed once (#1800) — each SSE frame was decoded 8–12 times through the provider chain; a single parse result is now threaded through.
- Model management no longer scans disk on every open (#1798) — ModelManagementSheet dropped its per-appear invalidateModelCache() call; the GGUF syscall storm on sheet open is gone.
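The streaming-render fix (#1788) comes down to appending each token to an existing buffer rather than rebuilding the whole transcript on every token event. The sketch below contrasts the two approaches by counting characters written. StreamingBuffer and rebuildCost are hypothetical names used only for illustration; they are not ManifoldKit API.

```swift
// Hypothetical illustration of append-vs-rebuild; not ManifoldKit's actual types.
struct StreamingBuffer {
    private(set) var text = ""
    private(set) var charactersWritten = 0

    // Append-only path: each token is written once, so total work is O(n).
    mutating func append(_ token: String) {
        text += token
        charactersWritten += token.count
    }
}

// Rebuild path: re-join every token seen so far on each event, O(n^2) overall.
func rebuildCost(tokens: [String]) -> (text: String, charactersWritten: Int) {
    var seen: [String] = []
    var text = ""
    var written = 0
    for token in tokens {
        seen.append(token)
        text = seen.joined()      // full rebuild on every token
        written += text.count
    }
    return (text, written)
}

let tokens = ["Hel", "lo", ", ", "wor", "ld"]
var buffer = StreamingBuffer()
for token in tokens { buffer.append(token) }
let rebuilt = rebuildCost(tokens: tokens)
```

Both paths end with the same text, but the rebuild path writes the growing prefix again on every event, which is where the quadratic cost comes from.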
Fixes
- Branch flow uses a transactional copy with rollback to prevent partial-branch corruption on error (#1792).
- HuggingFace download delegate and path handling hardened against missing-file and redirect edge cases (#1793).
- Model-load memory budgeting now accounts for MoE sparse activation, preventing over-allocation on large mixture models (#1794).
- CloudImageEncoding.encodeHook access made race-free under concurrent generation (#1791).
- MCP OAuth and SwiftData encoding errors now surface their underlying message instead of being swallowed (#1790).
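The branch-flow fix (#1792) is described only as "a transactional copy with rollback". One common way to get that property is to copy into a staging area and commit in a single step, so a failure part-way through leaves nothing behind. The sketch below assumes an in-memory store; ConversationStore, Message and BranchError are hypothetical and say nothing about how ManifoldKit persists branches.

```swift
// Hypothetical in-memory model; the real branch flow and its storage are not shown in these notes.
struct Message { let id: Int; let text: String }

enum BranchError: Error { case copyFailed(messageID: Int) }

final class ConversationStore {
    private(set) var branches: [String: [Message]] = [:]

    // Copies `source` into a new branch. Work happens on a staging array and is
    // committed in one assignment, so a mid-copy failure leaves no partial branch.
    func branch(named name: String, from source: [Message],
                copy: (Message) throws -> Message) throws {
        var staged: [Message] = []
        staged.reserveCapacity(source.count)
        for message in source {
            staged.append(try copy(message))   // a throw here discards `staged`
        }
        branches[name] = staged                // commit point
    }
}

let store = ConversationStore()
let history = [Message(id: 1, text: "hi"), Message(id: 2, text: "there"), Message(id: 3, text: "!")]

try store.branch(named: "ok", from: history) { $0 }

var failed = false
do {
    try store.branch(named: "broken", from: history) { message in
        if message.id == 3 { throw BranchError.copyFailed(messageID: message.id) }
        return message
    }
} catch {
    failed = true
}
```

After the failing call, the store holds the "ok" branch and no trace of "broken", which is the property the fix is after.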
v0.48.2
Highlights
The Model Management sheet opens instantly again (#1775) — Opening the model browser re-scanned the on-disk GGUF catalog synchronously on the main thread every time the sheet appeared, stalling the UI for ~2 seconds behind a spinner. The blanket per-open rescan is gone: ModelManagementSheet.onAppear no longer calls invalidateModelCache(), and the discovery cache is instead invalidated only on the events that actually change it — download completion, delete, and import. Reopening the sheet is now immediate, with regression coverage asserting the cache survives a re-appear when nothing changed. No API change.
Documentation
- Architecture plan reflects shipped v0.48 reality (#1776) — docs/plans/target-architecture.md gains an Implementation Status table mapping each migration phase (P0–P7) to its verified state in Sources/, and the superseded P2c de-tangle brief is archived.
v0.48.1
Highlights
Streaming-completion wait no longer busy-polls (#1772) — ChatGenerationCoordinator.awaitStreamCompletion() previously spun a 1 ms Task.sleep loop on every turn while waiting for the active stream handle to clear. It now suspends on a continuation that resumes the instant the handle is cleared — eliminating the per-turn polling and closing a latent hang where a caller parked across stream teardown would never wake.
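The change from a sleep loop to a continuation can be sketched with a small actor. This is a minimal illustration of the technique, not the ChatGenerationCoordinator implementation; StreamGate and its methods are hypothetical names.

```swift
// Hypothetical stand-in for a coordinator that tracks one active stream.
actor StreamGate {
    private var isActive = false
    private var waiters: [CheckedContinuation<Void, Never>] = []

    func begin() { isActive = true }

    // Clearing the active flag resumes every parked waiter immediately.
    func finish() {
        isActive = false
        let parked = waiters
        waiters.removeAll()
        for waiter in parked { waiter.resume() }
    }

    // Suspends without polling; returns at once if nothing is streaming.
    func awaitCompletion() async {
        guard isActive else { return }
        await withCheckedContinuation { continuation in
            waiters.append(continuation)
        }
    }
}

let gate = StreamGate()
await gate.begin()
let waiter = Task { () -> Bool in
    await gate.awaitCompletion()
    return true
}
await gate.finish()
let woke = await waiter.value
```

Whichever of finish() and awaitCompletion() reaches the actor first, the waiter returns: it either sees the cleared flag or is resumed by finish(), so there is no window in which a parked caller is missed.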
Fixes
v0.48.0
ManifoldKit's packaging is rebuilt. SwiftPM traits are retired in favor of library products, and the heavy MLX and llama.cpp backends move to companion packages — swift build just works, in every configuration, with no trait matrix. Full upgrade guide: docs/MIGRATION-0.48.md.
This release lands automatically if you depend on ManifoldKit with a from: requirement. SwiftPM resolves from: "0.47.0" as 0.47.0..<1.0.0 — there is no 0.x minor-pinning special case. Pin .upToNextMinor(from: "0.47.0") to stay behind; follow the migration guide to move forward.
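The two requirement forms mentioned above look like this in a manifest. This is a sketch, not a recommended configuration: the package name MyApp and the tools version are assumptions, and the by-name "ManifoldKit" target dependency follows the form these notes show for v0.48.

```swift
// swift-tools-version:5.9
// Package.swift: hypothetical consumer manifest ("MyApp" is an assumed name).
import PackageDescription

let package = Package(
    name: "MyApp",
    dependencies: [
        // Up-to-next-major: accepts 0.47.0 ..< 1.0.0, so 0.48.0 is picked up on update.
        .package(url: "https://github.com/roryford/ManifoldKit", from: "0.47.0"),

        // Up-to-next-minor: accepts 0.47.0 ..< 0.48.0 and stays on the 0.47 line.
        // Use this line instead of the one above to hold back.
        // .package(url: "https://github.com/roryford/ManifoldKit", .upToNextMinor(from: "0.47.0")),
    ],
    targets: [
        .executableTarget(name: "MyApp", dependencies: ["ManifoldKit"])
    ]
)
```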
Highlights
Traits are gone — products are the new build switch (#1764, #1765, #1768, #1769) — The MCP, MCPBuiltinCatalog, Voice, Tools, AppIntents, Skills, Ollama, CloudSaaS, and AnyLanguageModel traits no longer exist; passing any of them in a traits: array is now a resolve error. Those modules either compile unconditionally (MCP, Voice, Tools, AppIntents, Skills) or became products you opt into by importing (ManifoldOllama, ManifoldCloudSaaS, ManifoldAnyLanguageModel). Only Server and Macros remain as build switches.
// Before (v0.47)
.package(url: "…/ManifoldKit", from: "0.47.0",
traits: ["CloudSaaS", "Ollama", "MCP"])
// After (v0.48) — no traits; pick products instead
.package(url: "…/ManifoldKit", from: "0.48.0"),
// target deps: "ManifoldKit", .product(name: "ManifoldOllama", package: "ManifoldKit")
MLX and llama.cpp move to companion packages (#1771, #1749) — ManifoldMLX (with the vendored FluxSwift/StableDiffusion diffusion backends) and ManifoldLlama now live at roryford/manifold-mlx and roryford/manifold-llama, tagged 0.1.0 alongside this release. They plug back in through one registration call. The ManifoldBackends umbrella remains for one release as a deprecated Foundation+Cloud shim.
// Package.swift
.package(url: "https://github.com/roryford/ManifoldKit", from: "0.48.0"),
.package(url: "https://github.com/roryford/manifold-llama", from: "0.1.0"),
// App entry point
import ManifoldKit
import ManifoldLlama
let kit = try await ManifoldKit.quickStart(backends: [LlamaBackends.self])
quickStart(backends:) and runtime capability checks (#1766) — quickStart accepts companion registrars and folds them in before the availability guard, starter-model seed, and model selection run. Capability checks that used to reflect compile-time traits now reflect live registration: on-disk models with no registered backend are flagged instead of auto-selected, the starter seed gates on whether a registered backend can actually load it, and a configuration with no usable backend produces an actionable diagnostic naming the companion packages.
A frozen seam and a TestKit for backend authors (#1762, #1767) — ManifoldBackendTestKit and ManifoldTestSupport are now products: third-party backends run the same BackendContractChecks conformance suite the built-in families do. The cross-package seam (registration surface, Contract kernel, and the @_spi(BackendInternals) internals the families need) is pinned by a compile-time freeze fixture, and scripts/split-proof.sh proves the family sources build and pass their contracts out-of-package.
Why: SwiftPM traits can't do this job — The investigation of #1737 showed trait-conditional product edges are evaluated inconsistently between resolution and test-graph derivation upstream (swift-package-manager#8350), Xcode's trait support is broken through 26.x, and the per-combination build matrix could only ever be sampled. Products and companion packages eliminate the bug class structurally.
Features
ManifoldOllama and ManifoldCloudSaaS products (#1761) — the cloud families are now real products with explicit registrars; shared SSE/TLS plumbing stays in ManifoldCloudCore.
Faster prompt assembly (#1759) — PromptContextPipeline queries its providers concurrently; wall time is now the slowest provider, not the sum.
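Querying providers concurrently while keeping their output in declaration order can be sketched with a task group. ContextProvider and assembleContext are hypothetical; the real PromptContextPipeline provider protocol is not shown in these notes and may differ.

```swift
// Hypothetical provider shape; simulated latency stands in for real work.
struct ContextProvider: Sendable {
    let name: String
    let delayMilliseconds: UInt64

    func fetch() async -> String {
        try? await Task.sleep(nanoseconds: delayMilliseconds * 1_000_000)
        return "[\(name)]"
    }
}

// Queries every provider concurrently, then restores declaration order.
func assembleContext(from providers: [ContextProvider]) async -> [String] {
    await withTaskGroup(of: (Int, String).self) { group in
        for (index, provider) in providers.enumerated() {
            group.addTask {
                let fragment = await provider.fetch()
                return (index, fragment)
            }
        }
        var results = [String](repeating: "", count: providers.count)
        for await (index, fragment) in group {
            results[index] = fragment   // completion order varies; index restores order
        }
        return results
    }
}

let providers = [
    ContextProvider(name: "memory", delayMilliseconds: 30),
    ContextProvider(name: "rag", delayMilliseconds: 60),
    ContextProvider(name: "tools", delayMilliseconds: 10),
]
let fragments = await assembleContext(from: providers)
```

Because the child tasks overlap, the total wait tracks the slowest provider (60 ms here) rather than the 100 ms sum.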
Fixes
Package.resolved freshness for the unconditional AnyLanguageModel edge (#1770) — lockfile updated alongside the trait retirement, plus README Hello World gate repairs.
v0.47.0
Highlights
BackendName is now an extensible struct (#1742) — BackendName was a closed enum; it is now a struct with a String raw value so third-party backends can register names without forking the library. CaseIterable is removed — use BackendName.wellKnown (or the allCases alias). BackendName(rawValue:) is now non-failable. Exhaustive switch statements must add a default: arm.
// Before — exhaustive switch compiled; BackendName(rawValue:) returned Optional
switch backendName {
case .ollama: …
case .anthropic: …
} // ❌ now needs default:
// After
switch backendName {
case .ollama: …
case .anthropic: …
default: … // required for extensibility
}
// Non-failable init
let name = BackendName(rawValue: "my-backend") // BackendName, not BackendName?
TurnDriver seam and resumable ConversationRun (#1744) — The turn loop is now driven through a TurnDriver protocol so the execution strategy can be swapped or tested independently of ConversationRuntime. Runs are represented as a ConversationRun value that carries enough state to be resumed after an interruption (background kill, context window swap).
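For the BackendName change (#1742) described earlier in this release, the general extensible-struct pattern can be sketched as follows. BackendID and label(for:) are hypothetical stand-ins; the real BackendName may expose more API than shown here.

```swift
// Hypothetical mirror of the pattern: a struct with a String raw value
// replaces a closed enum so other modules can add names.
struct BackendID: RawRepresentable, Hashable, Sendable {
    let rawValue: String
    init(rawValue: String) { self.rawValue = rawValue }   // non-failable

    static let ollama = BackendID(rawValue: "ollama")
    static let anthropic = BackendID(rawValue: "anthropic")
    static let wellKnown: [BackendID] = [.ollama, .anthropic]
}

// A third-party package adds a name without touching the library.
extension BackendID {
    static let myBackend = BackendID(rawValue: "my-backend")
}

func label(for backend: BackendID) -> String {
    switch backend {
    case .ollama: return "Ollama"
    case .anthropic: return "Anthropic"
    default: return "Custom (\(backend.rawValue))"   // open set, so default is required
    }
}

let custom = label(for: .myBackend)
```

The trade-off is the one the release note names: the compiler can no longer prove a switch exhaustive, so every switch needs a default arm.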
Seed a starter model on first launch (#1735) — ManifoldBootstrap.quickStart() now writes a default model entry into the model registry the first time it executes, so new app installs have a working model without any extra setup.
// One-call bootstrap now includes a starter model
let runtime = try await ManifoldBootstrap.quickStart()
// Model registry is pre-populated — no additional seeding required
Features
- ManifoldHardware: add structured content sidecar to ToolResult (#1741)
- ManifoldServer: brew install manifold-server support and command rename (#1734)
- Start pre-1.0 deprecation clocks for flagged back-compat aliases (#1743)
Fixes
- ManifoldVoice: fix @MainActor isolation crash in AppleSpeechTranscriber on first Voice tap (#1758)
- Close connect-time DNS-rebinding TOCTOU in cloud transport (#1756)
- Close DNS-rebinding TOCTOU in MCP HTTP/SSE transport
- Security hardening bundle — action pins, file protection, output bounds (#1750)
- ManifoldMLX: load diffusion model from its correct directory
- Extract slow-to-type-check SwiftUI bodies (478ms/292ms/272ms → <200ms)
- Inline MLX tokenizer loader to drop swift-syntax from default builds
v0.46.0
Deprecated turn-input and cloud-backend APIs are removed, the turn loop is decomposed into per-turn seams behind a thin ManifoldContract leaf, and background generation lands a BGContinuedProcessingTask bridge for iOS.
Highlights
Remove deprecated turn-input and cloud-backend API surface (#1717) — The SendInput/RegenerateInput/EditInput/BranchInput structs and their ConversationRuntime overloads are removed; use processTurn(TurnInput(...)). InferenceService's currentCloudBackend/registerCloudBackendFactory/loadCloudBackend(from:) and CloudBackendFactory are removed; use the …EndpointBackend… equivalents. NoResponseError is renamed SendMessageError.
// Before (removed)
try await runtime.send(SendInput(text: "hello"))
// After
try await runtime.processTurn(TurnInput(text: "hello"))
ManifoldContract extracted as a thin leaf module (#1723) — The core turn-loop contract (TurnInput, TurnOutput, TurnDriver) now lives in a dependency-free ManifoldContract target that sits below ManifoldRuntime. This lets local backends, MCP, and voice components depend on the contract without pulling in SwiftData or persistence ports.
Background generation bridge for iOS (#1715) — BGContinuedProcessingTask is wired into ConversationRuntime so long-running inference requests can survive an app moving to the background on iOS 26. The bridge requests background processing time via BGContinuedProcessingTask when a turn starts and cancels it cleanly on completion or cancellation.
Features
- ManifoldHardware: expose M5 Neural Accelerator availability probe (#1714)
- ManifoldHardware: registry-driven backend descriptor routing — BackendDescriptorRegistry replaces per-site switch statements on ModelType/APIProvider for display and routing metadata (#1733)
Fixes
- Declare missing target dependencies, drop dead edges, capability-based cloud detection (#1727)
v0.45.0
Glass Box observability wiring completes across the full turn loop and media timelines, the framework ships a unified DocC documentation site, and the fuzz harness gains cloud targeting and block-rotation for broader coverage.
Highlights
Glass Box event wiring complete (#1672) — All previously dangling emits across the turn loop, image generation, and video timeline are now wired into the Glass Box observability layer. Turn-loop scenarios, per-chunk image-gen progress checkpoints, and video-timeline markers are all instrumented and available to the inspector. XCUITest smoke coverage for the Glass Box inspector panel ships alongside. (#1689)
Unified ManifoldKit DocC site (#1687) — ManifoldKit now ships a unified DocC documentation site with an umbrella root, cross-catalog curation, and hosting on GitHub Pages. The full public API surface — inference, runtime, persistence, cloud, UI, and all specialty modules — is browsable from one root, with curated article groups that map the module graph into a reader-friendly hierarchy.
Fuzz harness reaches cloud endpoints (#1690, #1676, #1700) — fuzz-chat can now target any OpenAI-compatible cloud endpoint (including OpenRouter) via --endpoint, broadening coverage beyond local Ollama. Block-rotation cycles through a pool of fuzz models each campaign to amortize per-model load cost. --request-timeout bounds per-request hangs so a stalled cloud provider doesn't lock the harness.
scripts/fuzz.sh --endpoint https://openrouter.ai/api/v1 \
--api-key "$OPENROUTER_KEY" \
--request-timeout 30
Features
- Fuzz block-rotate models — amortize model-load cost across campaigns (#1676)
- Fuzz --request-timeout — bound cloud fuzz request hangs (#1700)
- Fuzz OpenAI-compatible cloud endpoints — target OpenRouter and compatible providers in fuzz-chat (#1690)
Bug Fixes
- deleteSession atomicity — message purge and session delete now commit in a single transaction (#1686)
- First-run onboarding hardened — robust BYO default selection, clearer model-gating error messages, and gated BYO snippets (#1680)
- HuggingFace download reliability — background URLSession and per-chunk progress in HuggingFaceDownloadService (#1692)
- StreamAction switch exhaustiveness and doc drift — corrects accumulated doc drift and adds a path-existence audit to prevent future drift (#1697)
v0.44.0
ManifoldUI gains a full theming and customization system, and the framework's provider reach widens through a graduated AnyLanguageModel bridge and a cross-encoder rerank stage for RAG. Under the hood, the P1 kernel-thinning continues with ManifoldModelCatalog, and a pre-v1 naming pass tightens the public API surface ahead of 1.0.
Highlights
Theming and UI customization for ManifoldUI — Consumers can now restyle the chat UI without forking ChatView or dropping to full BYO-UI, through a three-layer environment-driven stack. Layer 1 is a ChatTheme token struct (per-role bubble fills, corner radius, padding, spacing, fonts) applied with .chatTheme(_:). Layer 2 is a MessageBubbleStyle protocol with .plain (the themed default), .iMessage, and .card recipes applied with .messageBubbleStyle(_:). Layer 3 is a per-message renderer slot, .chatMessageRenderer(_:). Every layer is a thin shell over SwiftUI's native resolution, so Dark Mode, Dynamic Type, and Increase Contrast keep working. (#1640)
ChatView(…)
.chatTheme(ChatTheme(userBubbleBackground: AnyShapeStyle(.blue), cornerRadius: 20))
.messageBubbleStyle(.iMessage) // or .card, or .plain (the default, which reads ChatTheme)
AnyLanguageModel provider-breadth bridge — The bridge graduates from a hidden trait into a documented, contract-tested path for providers without a native backend — Gemini, xAI, Groq, Mistral, OpenRouter, and any OpenAI/Anthropic-compatible endpoint. It advertises a conservative capability floor (isRemote on; tools, structured output, native JSON mode, thinking, and grammar all off) and fail-closes on unsupported requests rather than silently dropping them, so the capability router never routes those requests here. Ships docs/PROVIDER-BRIDGE.md and an env-gated conformance suite that needs no API key to compile or pass. (#1638)
import ManifoldBackends // behind the `AnyLanguageModel` trait
let backend = AnyLanguageModelBackend()
let url = URL(string: "gemini://gemini-2.0-flash?apiKey=\(key)")!
try await backend.loadModel(from: url, plan: plan)
Cross-encoder rerank stage in RAG — RAGService gains an optional rerank stage between retrieval and prompt injection — the single biggest RAG-quality lever still open. When a reranker is configured and ready, retrieval widens the first-stage pool to 3× and reranks down to limit; with no reranker it is a byte-for-byte passthrough, keyword fallback included. The Reranker port lives in ManifoldInference alongside EmbeddingBackend; LlamaReranker scores [query, document] pairs through a RANK-pooling cross-encoder GGUF (e.g. bge-reranker). (#1637)
let rag = RAGService(
    documentStore: documents,
    vectorStore: vectors,
    embeddingBackend: embedder,
    reranker: reranker // any Reranker (e.g. LlamaReranker); omit for prior behaviour
)
Pre-v1 API surface — naming pass and ManifoldModelCatalog extraction — Two threads of pre-1.0 surface work. The P1 kernel-thinning continues: ManifoldModelCatalog (model descriptors, catalog, logging) is extracted from ManifoldInference into a standalone zero-dependency product, following the ManifoldSecrets/ManifoldHardware/ManifoldNetworking split from v0.43.0 (#1611) — transparent to consumers via @_exported import. Separately, a naming pass tightens the public API ahead of v1: the Record suffix is dropped from inference-layer DTOs (#1650), EndpointBackend protocols are renamed, GenerationStream gains AsyncSequence conformance, and the deprecated configure* shims are removed in favour of configure(bootstrap:) (#1614). The renames and the shim removal are breaking — update affected call sites.
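Returning to the rerank stage described earlier in this release (#1637): the "widen to 3×, rerank down to limit, otherwise pass through" flow can be sketched as below. Everything here is a hypothetical illustration. The Scoring protocol is not the real Reranker port, and the keyword-overlap scorer is a stand-in for a cross-encoder, chosen only so the example runs without a model.

```swift
struct Candidate { let text: String; let retrievalScore: Double }

// Hypothetical port; the real Reranker protocol's shape is not shown in these notes.
protocol Scoring { func score(query: String, document: String) -> Double }

// Stand-in scorer: counts shared words. A real cross-encoder would score the pair with a model.
struct KeywordOverlapScorer: Scoring {
    func score(query: String, document: String) -> Double {
        let terms = Set(query.lowercased().split(separator: " "))
        let words = Set(document.lowercased().split(separator: " "))
        return Double(terms.intersection(words).count)
    }
}

func retrieve(query: String, corpus: [Candidate], limit: Int, reranker: Scoring?) -> [Candidate] {
    let ranked = corpus.sorted { $0.retrievalScore > $1.retrievalScore }
    // No reranker: plain first-stage top-`limit`, unchanged.
    guard let reranker else { return Array(ranked.prefix(limit)) }
    // With a reranker: widen the pool to 3x, rescore, then cut back to `limit`.
    let pool = ranked.prefix(limit * 3)
    return pool
        .map { ($0, reranker.score(query: query, document: $0.text)) }
        .sorted { $0.1 > $1.1 }
        .prefix(limit)
        .map { $0.0 }
}

let corpus = [
    Candidate(text: "swift package traits", retrievalScore: 0.9),
    Candidate(text: "cooking pasta recipes", retrievalScore: 0.8),
    Candidate(text: "llama backend registration in swift", retrievalScore: 0.7),
    Candidate(text: "swift backend registration guide", retrievalScore: 0.6),
]
let query = "swift backend registration"
let plain = retrieve(query: query, corpus: corpus, limit: 1, reranker: nil)
let reranked = retrieve(query: query, corpus: corpus, limit: 1, reranker: KeywordOverlapScorer())
```

With limit 1, the plain path returns the top first-stage hit, while the reranked path considers the top three and promotes the third. The fourth document never enters the pool, which shows why the pool width matters.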
Features
- Adaptive prefill memory headroom — PrefillFootprintEstimator adapts the memory budget at prefill time using a measured per-model resident-byte-per-token EWMA, aborting before a prefill exceeds headroom instead of relying solely on the static 40% heuristic. Dormant (behaviour unchanged) until the first accepted sample. (#1592)
- Per-layer MLX prompt-cache reuse for hybrid architectures — Prompt-cache reuse is now decided per layer instead of being disqualified wholesale by a single non-KVCacheSimple layer, so mixed and recurrent-hybrid models reuse KV where they previously re-prefilled every turn. Falls back to a full prefill whenever a layer cannot be reduced byte-exactly. (#1597)
- quickStart() backend-selection policy — First-launch quickStart() now applies a Foundation-first → first-local → labeled-empty-state selection policy, wiring the built-in Foundation model into the candidate list before the policy runs. (#1612)
- Developer-journey quickstarts — New BYO-UI, tool-calling, and AppIntents quickstart guides. (#1658)
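The adaptive-headroom item above relies on an exponentially weighted moving average (EWMA) of resident bytes per token that stays dormant until it has a sample. A minimal sketch of that idea follows; BytesPerTokenEstimator, its method names and the smoothing factor are assumptions for illustration, not the PrefillFootprintEstimator API.

```swift
// Hypothetical estimator; names and the default smoothing factor are illustrative.
struct BytesPerTokenEstimator {
    let alpha: Double                  // weight given to the newest sample
    private(set) var average: Double?  // nil until the first accepted sample

    init(alpha: Double = 0.25) { self.alpha = alpha }

    mutating func record(residentBytes: Double, tokens: Int) {
        guard tokens > 0, residentBytes > 0 else { return }   // reject unusable samples
        let sample = residentBytes / Double(tokens)
        if let current = average {
            average = current + alpha * (sample - current)    // EWMA update
        } else {
            average = sample                                  // first sample seeds the average
        }
    }

    // Returns nil while dormant so the caller can fall back to its static heuristic.
    func fits(promptTokens: Int, headroomBytes: Double) -> Bool? {
        guard let average else { return nil }
        return average * Double(promptTokens) <= headroomBytes
    }
}

var estimator = BytesPerTokenEstimator(alpha: 0.5)
let dormant = estimator.fits(promptTokens: 100, headroomBytes: 1_000)
estimator.record(residentBytes: 1_000, tokens: 10)   // sample: 100 bytes per token
estimator.record(residentBytes: 2_000, tokens: 10)   // sample: 200, average becomes 150
let withinBudget = estimator.fits(promptTokens: 10, headroomBytes: 1_500)
let overBudget = estimator.fits(promptTokens: 10, headroomBytes: 1_499)
```

Returning an optional keeps the "dormant until the first accepted sample" behaviour explicit at the call site.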
Bug Fixes
- Gemma GBNF grammar disabled — Gemma models truncate structured (JSON-object) grammars under llama.cpp — they open the object, stall on whitespace, and never complete — so grammar-constrained sampling is now disabled for the Gemma family (detected by GGUF architecture), routing them to JSON-mode parsing. (#1670)
- Ollama Gemma 4 thinking-flag backfill and fuzz marker accuracy (#1664)
- OllamaBackend registrar init made package-visible — fixes a cross-module registration call under the Ollama trait. (#1660)
- ModelLoadPlan review fixes — if-let unwrap in ModelLoadPlan, XCTSkip on network reclaim, and an explicit self capture. (#1667)
- Backend conformance claim methods made parallel-safe (#1601)
- DX cleanups — OllamaBackend registrar warning, GenerationStream AsyncSequence conformance, and a thinking-token sample fix. (#1649)
v0.43.0
The P1 kernel-thinning pass completes its first three modules — ManifoldSecrets, ManifoldHardware, and ManifoldNetworking — each now a zero-dependency leaf product. The release also ships a configurable idle timeout for cloud/LAN backends and closes four resource-correctness bugs.
Highlights
P1 kernel thinning — ManifoldSecrets, ManifoldHardware, and ManifoldNetworking extracted — Three clusters of types that had no dependency on the inference kernel are now standalone zero-dependency SwiftPM products. ManifoldSecrets holds the Keychain service and Secure Enclave key manager (#1609); ManifoldHardware holds device-capability probes, GGUF readers, memory-pressure broadcast, and ModelLoadPlan (#1610); ManifoldNetworking holds all URLSession/SSE infrastructure (#1608). Existing import ManifoldInference consumers keep compiling without changes — the kernel shims each module via @_exported import.
Configurable idle timeout for cloud and LAN backends — GenerationConfig now accepts an idleTimeout: Duration? that fires when no SSE bytes arrive within the window, surfacing a GenerationError.streamTimeout instead of leaving the UI stalled on a slow or saturated model. The default is nil, preserving existing behaviour. (#1633)
var config = GenerationConfig()
config.idleTimeout = .seconds(30) // fires if prefill stalls for 30 s
try await runtime.send("Hello", config: config)
Features
- Device-aware model recommendation UI — The model browser surfaces a ranked recommendation tailored to the current device's memory and compute profile, built on the ModelFitScorer layer introduced in v0.42.0.
Bug Fixes
- SessionToolSource tool dispatch — Tools advertised via SessionToolSource were registered in ToolRegistry but never dispatched at call sites; the routing gap is closed. (#1620)
- Grammar constraints corrupting thinking blocks — Applying a grammar constraint (JSON mode or BNF grammar) to a request with enableThinking: true injected the grammar sampler into the thinking-block phase, producing malformed <think> output. Grammar constraints are now suppressed during thinking-token emission. (#1624)
- Resource-correctness fixes (MLX, RAG, search, MCP) — Four separate bugs: the MLX backend deinit dropped its strong reference before async cleanup finished; RAG document deletion left orphaned chunk records and crashed under concurrent access; embedding search exhausted memory on large corpora; MCP tool calls leaked the per-call timeout handle. (#1627)
- SSE error message sanitization on all cloud paths — In-stream SSE error payloads were forwarded to the UI unsanitized on partial-stream paths. CloudErrorSanitizer is now applied consistently across every cloud backend error surface. (#1628)