12 Jun 12:23

8204d95

Merlin v2.4.0 Latest

Merlin v2.4.0

Release gate status: gates #1-#16 completed.

This release publishes the v2.4.0 evidence-backed build. The attached evidence report summarizes the full release battery, and the repository now includes the public release notes and README screenshot assets on main.

Electronics/KiCad boundary: the electronics domain is released as evidence-gated workflow infrastructure. It includes deterministic KiCad generation, routing, DRC/SPICE/fab gates, and visual KiCad evidence. It is not a blanket fabrication-ready claim for every generated board; high-stakes signoff remains explicitly gated.

Screenshots

Documentation

Attached Assets

Release assets include REPORT.md, RELEASE-RUN.md, the Merlin UI screenshots, and the KiCad screenshots.

20 May 03:57

j-zuilkowski

v2.2.5

a3bbd3f

v2.2.5 — Repetition-stall escalation rung + E2E robustness

Patch release. The escalation feature shipped this round adds a new
capability-failure rung — EscalationReason.repetitionStall — that
detects a model emitting the same response verbatim (now including
identical tool-call signatures) across a 6-turn window and routes
straight to the designated stronger provider, skipping refinement. The
fingerprint is conservative: a productive model varies either its
narration or its tool-call args, so only a genuine loop trips it.
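
The fingerprint check described above can be sketched roughly as follows. This is a minimal illustration, not Merlin's implementation: the type names `TurnFingerprint` and `RepetitionStallDetector` are hypothetical, and only the 6-turn window and the "same narration plus same tool-call signatures" rule come from the notes.

```swift
/// One assistant turn, reduced to the parts that matter for loop detection.
/// Hypothetical type: the release notes do not show Merlin's real model.
struct TurnFingerprint: Hashable {
    let narration: String
    let toolCallSignatures: [String]   // e.g. "read_file(path=a.swift)"
}

/// Flags a stall when the last `window` turns are all identical.
struct RepetitionStallDetector {
    let window: Int
    private var recent: [TurnFingerprint] = []

    init(window: Int = 6) {
        self.window = window
    }

    /// Records a turn and returns true once `window` identical turns
    /// have been seen back to back.
    mutating func record(_ turn: TurnFingerprint) -> Bool {
        recent.append(turn)
        if recent.count > window {
            recent.removeFirst(recent.count - window)
        }
        guard recent.count == window, let first = recent.first else { return false }
        return recent.allSatisfy { $0 == first }
    }
}
```

Because the comparison covers both narration and tool-call arguments, a model that changes either one between turns never fills the window with identical entries.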

Five other defects fixed alongside, each caught by S1's end-to-end run:

EvalShell.run had no timeout. A transient filesystem stall once hung the proving suite for 40 minutes; the run is now bounded by a watchdog and SIGKILLed on timeout.
LiveShellRunner deadlocked on its pipe (it read after wait) and had no timeout, so xcodebuild test hung the critic for the full 1800 s test window. It now drains the pipe on a background queue with a 300 s deadline.
Fixture extraction no longer chdir's into ~/Documents via git -C; it uses --git-dir from a temp cwd, sidestepping the TCC getcwd wedge on a freshly rebuilt, ad-hoc-signed test host.
cannotDecompose on a preflight overflow now routes only to a provider whose usableInputTokens actually fits minContextRequired, instead of to the strongest capability target regardless of budget.
consecutiveCriticFailures now increments only when the escalation truly gives up, not on every routed-provider retry exhaustion, which fixes the circuit-breaker double-counting.
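
The cannotDecompose routing fix can be illustrated with a short sketch. Only `usableInputTokens` and `minContextRequired` are names from the notes; `ProviderCandidate`, `capabilityRank`, and `routeOverflow` are assumptions made for the example.

```swift
/// Hypothetical provider description; only `usableInputTokens` and
/// `minContextRequired` are names taken from the release notes.
struct ProviderCandidate {
    let name: String
    let capabilityRank: Int        // higher means stronger (assumed ordering)
    let usableInputTokens: Int
}

/// Picks the strongest provider whose input budget fits the step,
/// or nil when no provider fits.
func routeOverflow(minContextRequired: Int,
                   among providers: [ProviderCandidate]) -> ProviderCandidate? {
    providers
        .filter { $0.usableInputTokens >= minContextRequired }
        .max { $0.capabilityRank < $1.capabilityRank }
}
```

Filtering on budget before ranking by capability is the point of the fix: the strongest provider is only a valid target if the request actually fits its window.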

Plus a documented local-signing strategy: MerlinTests-Live test
invocations now use the project's Merlin Dev Signing identity so the
macOS TCC Full Disk Access grant survives rebuilds; compile gates and
CI keep CODE_SIGNING_ALLOWED=NO as before. See CLAUDE.md and
merlin-eval/HANDOFF.md for the runbook.

Verified against the full proving suite: S1 passes legitimately in
1240 s (the preflightOverflow → DeepSeek handoff fixes TaskBoard, and
its xcodebuild test is green at the end). All 1828 unit tests pass;
both schemes compile clean.

16 May 03:13

j-zuilkowski

v2.2.4

cb728bf

v2.2.4 — Context-overflow fix, tool detection, vision launchpad

Merlin v2.2.4

Summary

v2.2.4 makes the provider context-overflow class of failures structurally
impossible, adds first-use detection of missing external tools, lets you target a
specific loaded local model per role slot, and introduces vision.md as the first
artifact of the Project Discipline pipeline.

What's new

Context-overflow HTTP 400s are fixed at the source. Three layers, end to end: tool output (run_shell, read_file) is capped before it can enter the model context (phase 284); the per-request budget is discovered from the active model's real context window, which is queried live for local runners and OpenRouter and learned from the first 400 and persisted for commercial providers (phase 285); and every LLM request on every engine path (planner, critic, subagents, summariser, memory, KAG, vision) is sized to fit the provider window before it is sent, not just the main turn loop (phase 286).
Local model picker. When a local runner has several models loaded, each can be assigned to a role slot directly from the chat HUD and the slot picker (phase 283).
Missing-tool detection. When a feature needs an external CLI tool that is not installed, Merlin detects it on first use and offers a one-click brew install for the Homebrew-safe tools, or shows the install command/URL for the rest, instead of a raw "command not found" (phase 287).
Vision launchpad. vision.md is now the first artifact of the discipline pipeline: vision → architecture → phase → code. project:init seeds it, project:adopt incorporates an existing one, project:revise grows and promotes ideas from it (phase 288).
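
The first of the three layers above, capping tool output before it reaches the model context, might look like this minimal sketch. The function name, the 8,000-character default, and the truncation marker are assumptions, not Merlin's actual values.

```swift
/// Caps tool output at roughly `maxCharacters`, keeping the head and tail
/// and noting how much was dropped. The limit and the marker text are
/// assumptions made for this example.
func capToolOutput(_ output: String, maxCharacters: Int = 8_000) -> String {
    guard output.count > maxCharacters else { return output }
    let half = maxCharacters / 2
    let dropped = output.count - 2 * half
    let head = String(output.prefix(half))
    let tail = String(output.suffix(half))
    return head + "\n[... \(dropped) characters truncated ...]\n" + tail
}
```

Keeping both ends is a design choice for the sketch: shell output often carries the command echo at the top and the error at the bottom. The returned string is slightly longer than `maxCharacters` because of the marker line.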

Internal changes

New types: ToolOutput, ContextBudgetResolver / ContextBudgetStore, PreflightGuard, ToolRequirement / ToolRequirements / ToolRequirementChecker.
All 14 provider.complete send sites now route through PreflightGuard.
Learned context windows persist to ProviderConfig.budget in providers.json, the same field a manually-entered budget uses.

Migration

None. No configuration changes are required; context-budget discovery and tool
detection are automatic.

15 May 21:43

j-zuilkowski

v2.2.3

38742a3

v2.2.3 — Built-in Skill Installation Fix

Merlin v2.2.3 — Built-in Skill Installation Fix

Released: 2026-05-15

Summary

v2.2.3 fixes built-in skill installation. The Merlin/Skills/Builtin/ directory is now
bundled inside the app, so a fresh install ships every skill and installs them to
~/.merlin/skills/ on first launch — on any machine, not just the machine the app was
built on.

What's new

All 13 built-in skills now ship inside the app bundle: the 8 core skills (commit, debug, explain, plan, refactor, review, summarise, test) and the 5 project:* discipline skills (project:init, project:phase, project:revise, project:release, project:adopt).
installBuiltinSkills() copies any missing skill to ~/.merlin/skills/ at launch; skills already present, including ones you have customised, are left untouched.
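
The copy-if-missing behaviour can be sketched with FileManager. This is an illustration under stated assumptions: `installMissingSkills(from:to:)` is a hypothetical stand-in, since the notes do not show the real installBuiltinSkills() signature or how it locates the bundled directory.

```swift
import Foundation

/// Copies each skill folder from `bundled` into `installed` unless an entry
/// with the same name already exists there. Returns the names it copied.
/// Both URLs are parameters here; the real function's signature is not
/// shown in the release notes.
func installMissingSkills(from bundled: URL, to installed: URL) throws -> [String] {
    let fm = FileManager.default
    try fm.createDirectory(at: installed, withIntermediateDirectories: true)
    var copied: [String] = []
    for name in try fm.contentsOfDirectory(atPath: bundled.path).sorted() {
        let destination = installed.appendingPathComponent(name)
        // An existing folder may hold user customisations, so leave it alone.
        if fm.fileExists(atPath: destination.path) { continue }
        try fm.copyItem(at: bundled.appendingPathComponent(name), to: destination)
        copied.append(name)
    }
    return copied
}
```

Checking for the destination folder, rather than comparing file contents, is what keeps customised skills untouched across upgrades.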

Internal changes

project.yml adds Merlin/Skills/Builtin as a folder-reference resource on the Merlin target, so the directory is copied into Merlin.app/Contents/Resources/Builtin/. Previously the directory was excluded from the target and never bundled: installBuiltinSkills() only resolved its input via a build-machine #filePath fallback, so a distributed build installed no skills at all.
The 5 project:* SKILL.md files are now version-controlled in Merlin/Skills/Builtin/ rather than living only in ~/.merlin/ and in phase files.

Migration

No user data migration required. installBuiltinSkills() skips any skill folder that
already exists in ~/.merlin/skills/, so existing and customised skills are preserved.

15 May 20:27

j-zuilkowski

v2.2.2

c651ba3

v2.2.2 — Project Discipline: CI Readiness & Regression Fixes

Merlin v2.2.2 — Project Discipline: CI Readiness & Regression Fixes

Released: 2026-05-15

Summary

v2.2.2 makes the v2.2 Project Discipline subsystem real and the test suite green on a
headless runner. It wires the discipline engine and pending-attention chip into the
running app, gates environment-dependent engine tests behind an opt-in so GitHub CI
passes, and fixes two genuine engine regressions found in code review. It also adds a
full external-dependency inventory.

What's new

The Project Discipline subsystem is now wired into the running app: DisciplineEngine is constructed in AppState, the pending-attention chip/panel appear in ChatView, the SessionStart hook surfaces findings, and a scan runs after each turn.
Live-environment test gate: engine tests that need a real LLM endpoint are gated behind RUN_LIVE_TESTS=1 (skipUnlessLiveEnvironment()), so CI and headless sandboxes run green; developers opt in for full coverage.
Requirements.md: a complete external-dependency inventory (toolchain, providers, local runners, models, LoRA, KiCad, doc tools, services, MCP, frameworks) with a source link for every dependency.

Internal changes

Fixed the pending-attention chip showing stale data: the view model now reads through the shared DisciplineEngine instead of a separate queue instance.
Fixed an unbounded context-overrun retry: EscalationHandler now consumes its per-turn budget on every escalation attempt, closing a loop that retried ~199 times without a terminal event.
Fixed parseSteps silently dropping a planner step (and a downstream crash): ComplexityTier now decodes high_stakes / highStakes / high-stakes and falls back to .standard for unknown values.
Removed the dead TelemetryRecorder / TelemetrySink / TelemetryEmitter.sink test seam; telemetry tests use the file-based resetForTesting / flushForTesting API via a shared readTelemetryEvents(fromFile:) helper.
CI workflow: the build step now uses set -o pipefail so a failed build fails the job.
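
The tolerant ComplexityTier decoding mentioned above could be sketched like this. The enum is deliberately named `ComplexityTierSketch` because it is an assumption: only the `.standard` fallback and the three accepted spellings come from the notes, and the real type may have more cases.

```swift
import Foundation

/// Sketch of a tolerant decoder. Only `standard` and the three high-stakes
/// spellings come from the release notes; no other cases are assumed.
enum ComplexityTierSketch: Equatable, Decodable {
    case standard
    case highStakes

    init(from decoder: Decoder) throws {
        let raw = try decoder.singleValueContainer().decode(String.self)
        switch raw {
        case "high_stakes", "highStakes", "high-stakes":
            self = .highStakes
        default:
            // Unknown values fall back instead of failing the whole plan.
            self = .standard
        }
    }
}
```

Falling back in the `default` branch means one unfamiliar tier string no longer causes the enclosing step to fail decoding and be dropped.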

Migration

No user data migration required.
The v2.2.1 tag remains at the Phase 273b commit as an unreleased intermediate;
v2.2.2 is the published successor to v2.2.0.

15 May 20:27

j-zuilkowski

v2.2.0

ecbae6f

v2.2.0 — Project Discipline Subsystem

Merlin v2.2.0 — Project Discipline Subsystem

Released: 2026-05-14

What's New

Project Discipline Subsystem (v2.2.0) — 25 phase pairs (241a–265b) building the
construction-discipline layer directly into Merlin.

Adapter System (241–242)

AdapterRegistry + ProjectAdapter — per-language/per-toolchain configuration consumed
by every discipline component. Seed adapters for Swift/Xcode and Rust/Cargo.
.merlin/project.toml + ProjectConfigLoader — per-project adapter selection and
decaying-baseline configuration.

Phase Validation (243)

PhaseScanner — reads phases/ and cross-checks declared surfaces against the current
codebase. Four-colour drift report: green / yellow / red / orange.

Pending Attention Queue (244)

PendingAttentionQueue — persisted, deduplicated queue of discipline findings.
Finding, FindingCategory, Severity types.

DisciplineEngine (245)

DisciplineEngine actor — central coordinator. Runs all scanners, accumulates findings,
integrates with the hook engine. Circuit breaker: 3 consecutive failures disable the
engine for the session.
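
A circuit breaker of the kind described here can be sketched in a few lines. The type and method names are illustrative; the notes only state the threshold of 3 consecutive failures and that the engine stays disabled for the session. Resetting the count on success is an assumption implied by "consecutive".

```swift
/// Counts consecutive failures and trips after `threshold` of them.
/// A success resets the count; once tripped, the breaker stays open.
/// Names are illustrative, not Merlin's API.
struct SessionCircuitBreaker {
    let threshold: Int
    private(set) var consecutiveFailures = 0
    private(set) var isOpen = false

    init(threshold: Int = 3) {
        self.threshold = threshold
    }

    mutating func recordSuccess() {
        guard !isOpen else { return }
        consecutiveFailures = 0
    }

    mutating func recordFailure() {
        guard !isOpen else { return }
        consecutiveFailures += 1
        if consecutiveFailures >= threshold { isOpen = true }
    }
}
```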

Hook Integration (246–248)

SessionStart hook event + system-reminder injection — top-3 findings surfaced at
session open.
UserPromptSubmit discipline check — flags unscoped feature requests without phase files.
GitHookInstaller — post-commit and pre-push hook installer / uninstaller.

Manual Coverage (249–250)

ManualCoverageScanner — enumerates user-facing surfaces via adapter regex patterns;
reads  doc blocks; returns gaps.
ManualBaselineManager + ManualSectionTemplateWriter — decaying baseline enforcement;
template section writer for uncovered surfaces.

Doc Reference Graph (251)

DocReferenceGraph automatic mode — greps doc files for symbol-shaped identifiers;
cross-checks against source symbol index; returns stale references.

API & Guide Generation (252–253)

APIDocGenerator — drives DocC (Swift) or rustdoc (Rust) for API doc regeneration.
DevGuideGenerator — regenerates mechanical sections of developer-guide.md from
the adapter; preserves prose outside  markers.

WHY-Comment Enforcement (254–255)

WhyCommentScanner — trigger-pattern scanning with ±3-line comment check.
rationale-not-needed: annotation suppresses individual triggers.
WHYCommentGate + OverrideAnnotationParser — pre-commit gate blocks on missing
WHY comments; parses override annotations.

Prose Readability (256–257)

ProseReadabilityChecker — Vale integration; dry-run mode for tests.
ValeStyleWriter — writes Merlin Vale style files (readability, accept, passive-voice,
weasel).
ProseGate — pre-commit gate blocks doc files exceeding target Flesch-Kincaid grade.

Override Audit (258)

OverrideAuditLog — JSONL override log; weekly review adds
overrideAuditAccumulation finding when any category exceeds 5 overrides/week.
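
The weekly review rule can be illustrated with a small sketch that reads a JSONL log and reports categories over the limit. The field names `category` and `timestamp`, the ISO 8601 date format, and the function name are assumptions; the notes specify only the JSONL format and the threshold of more than 5 overrides per week.

```swift
import Foundation

/// One line of the JSONL log. Field names are assumptions.
struct OverrideEntry: Codable {
    let category: String
    let timestamp: Date
}

/// Returns the categories with more than `limit` overrides in the seven
/// days ending at `now`, sorted by name.
func categoriesOverLimit(jsonl: String, now: Date, limit: Int = 5) throws -> [String] {
    let decoder = JSONDecoder()
    decoder.dateDecodingStrategy = .iso8601
    let weekAgo = now.addingTimeInterval(-7 * 24 * 60 * 60)
    var counts: [String: Int] = [:]
    // split(separator:) drops empty lines by default.
    for line in jsonl.split(separator: "\n") {
        let entry = try decoder.decode(OverrideEntry.self, from: Data(line.utf8))
        if entry.timestamp > weekAgo && entry.timestamp <= now {
            counts[entry.category, default: 0] += 1
        }
    }
    return counts.filter { $0.value > limit }.keys.sorted()
}
```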

Project Skills (259–263)

/project:init — scaffold a new project with full discipline support.
/project:phase — build an NNa/NNb phase pair with structured questioning.
/project:revise — scan for drift, present findings, apply patches.
/project:release — consolidated release gate with 14-check checklist.
/project:adopt — apply discipline to an existing project; first target: Merlin itself.

Discipline UI (264)

PendingAttentionViewModel — @MainActor ObservableObject backed by the queue.
PendingAttentionChipView — compact count chip in the chat toolbar.
PendingAttentionPanelView — expandable panel with per-finding dismiss affordances.

Known Issues

DocReferenceGraph automatic mode produces false positives on short identifiers (< 4 characters). This is mitigated by a minimum-length heuristic; explicit mode (future) will be more precise.
ProseReadabilityChecker requires vale to be installed as a dev tool. Graceful degradation: the checker returns grade 0 (always passes) when vale is not found.
WhyCommentScanner does not yet scan Rust test files; it is restricted to *.swift and *.rs in non-test directories.
Skill files (259–263) require the ~/.merlin/skills/ directory to be writable. On sandboxed deployments the skills cannot be installed.

Upgrade Notes

From v2.1.0: No breaking changes to existing v2.1.0 APIs. The v2.2 subsystem is additive.

To activate the Project Discipline Subsystem on your project:

Run /project:adopt in a Merlin session with your project open.
Follow the adoption report recommendations.
Run /project:revise to start working through the backlog.

The discipline subsystem is opt-in at the project level (.merlin/project.toml must exist).
Sessions on projects without .merlin/project.toml are unaffected.

Build number: 17 (was 16 in v2.1.0)

14 May 22:52

j-zuilkowski

v2.1.0

f4c8332

v2.1.0 — Budget-Aware Execution

Release v2.1.0 - Budget-Aware Execution

Summary

Budget-Aware Execution. Merlin now sizes every request to the active provider's input window,
decomposes oversized work, and stops cleanly on unrecoverable overflow. This works regardless of
provider, model, or context size.

What's new

Per-provider ProviderBudget registered as configuration data.
Pre-flight estimator gates every LLM call.
Working-set caps for system prompt, RAG, recent turns, and tool bursts.
Adaptive RAG injection sized to the active budget.
Enriched PlanStep with token budget, success criteria, critic mode, and minimum context.
PlannerEngine.refineStep(...) as the single decomposition entry point.
EscalationHandler as the single bounded retry and escalation policy. No recursion anywhere.
Critic gating by skill frontmatter, per-step policy, and deterministic short-circuit.
Decompose-first overflow handling with cross-provider routing as the last-resort fallback.
New telemetry: engine.preflight.*, engine.escalation.*, planner.refine.*, engine.rag.selected, critic.stage1.short_circuit.

Internal changes

PlanStep.successCriteria now uses [StepCriterion]. The decoder still accepts the legacy single-string form, so existing serialized plans continue to load.
AgenticEngine no longer uses contextLengthRetryCount, maxContextOverrunRecoveryAttempts, or contextOverrunRecoveryDirective. Recovery now flows through EscalationHandler.
New .cleanStop case on AgentEvent. Existing UI consumers can keep falling through to the .systemNote rendering path until a dedicated affordance ships.
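
A decoder that accepts both the legacy single-string form and the new array form might be written as follows. This is a sketch under assumptions: `PlanStepSketch` and `StepCriterionSketch` are stand-ins, and the `text` field is hypothetical because the notes do not list the real StepCriterion fields.

```swift
import Foundation

/// Hypothetical criterion shape; the real StepCriterion fields are not
/// listed in the release notes.
struct StepCriterionSketch: Decodable, Equatable {
    let text: String
}

struct PlanStepSketch: Decodable {
    let successCriteria: [StepCriterionSketch]

    private enum CodingKeys: String, CodingKey {
        case successCriteria
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        if let legacy = try? container.decode(String.self, forKey: .successCriteria) {
            // Legacy plans stored one free-form string.
            successCriteria = [StepCriterionSketch(text: legacy)]
        } else {
            successCriteria = try container.decode([StepCriterionSketch].self,
                                                   forKey: .successCriteria)
        }
    }
}
```

Trying the legacy shape first and falling back to the array keeps older serialized plans loadable without a migration step.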

Migration

Existing skills without critic: frontmatter continue to use the heuristic unchanged.
Existing config without ProviderBudget falls through to the conservative default (maxInputTokens: 32_000, reservedOutputTokens: 4_096).
No user data migration is required.

14 May 15:15

j-zuilkowski

v2.0.0

dc1452d

v2.0.0 — Electronics Domain, Multi-Domain Sessions, Memory Backend

Merlin 2.0.0

New in this release

Electronics / KiCad Domain — full electronics workflow via merlin-kicad-mcp: schematic ingestion, KiCad project generation, FreeRouting autoroute, ERC/DRC/SPICE/fab verification gates, BOM and order workflows. High-stakes signoff boundaries block irreversible manufacturing actions.
Multi-Domain Sessions — activate multiple domains simultaneously (e.g. software + electronics); DomainRegistry scopes tool sets and task types per session.
Local Memory Backend — project-scoped vector search via MemoryBackendPlugin with search(query:topK:projectPath:) overload.
Session Hardening — LiveSession.lifecycleTasks startup sequence, isClosed double-teardown guard, AuthMemory chmod 0600.
Provider Reliability — per-provider ephemeral URLSession, 4-attempt retry with 5/10/20s backoff, context-length auto-recovery.
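
The 4-attempt retry with 5/10/20 s backoff listed under Provider Reliability can be sketched as a small helper. The function name `retrying` and the injected `sleep` closure are assumptions made so the example is testable; the actual implementation is not shown in the notes and is presumably asynchronous.

```swift
import Foundation

/// Runs `operation` up to `delays.count + 1` times, pausing between attempts.
/// With the default delays that is 4 attempts and waits of 5, 10, and 20 s.
/// `sleep` is injected so callers and tests control the waiting.
func retrying<T>(delays: [TimeInterval] = [5, 10, 20],
                 sleep: (TimeInterval) -> Void = { Thread.sleep(forTimeInterval: $0) },
                 operation: () throws -> T) throws -> T {
    var attempt = 0
    while true {
        do {
            return try operation()
        } catch {
            // Give up once every delay has been used; rethrow the last error.
            guard attempt < delays.count else { throw error }
            sleep(delays[attempt])
            attempt += 1
        }
    }
}
```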

Bug fixes (phases 219b / 220b / 221b)

ContextLengthRecoveryTests: fixed wrong systemNote format check and case-sensitivity issues.
MCPHTTPTransport: JSON decode errors now throw typed MCPTransportError.decodeError instead of escaping as raw NSError.
MCPSSETransportTests: fixed raw-string-literal \n syntax bug.
DomainRegistry.taskTypes(): now mirrors activeDomain() non-software preference; fixed test inconsistency.

13 May 15:30

j-zuilkowski

v1.9.1

d486b68

v1.9.1 — Native tool call collapse, window resize fix

v1.9.1 — Native tool call collapse, resize fix, prompt compression

UI fixes

Tool call rows now use native <details>/<summary> HTML elements — no JavaScript onclick handlers, arrow indicator via CSS ::before
Fixed duplicate bubble bug: removed addMessage fallback from appendChunk JS that created phantom second bubbles during streaming
Fixed window resize reflow: dispatches JS resize event on WKWebView frame change; added width: 100% to CSS body
Fixed content order: tool groups render above assistant text in the bubble

Prompt compression (three-layer, phases 205–207)

Mid-loop compaction — ContextManager tracks tokens after every tool result; compacts automatically at 40,000 tokens inside the execute loop (before the next LLM call) to keep per-turn cost linear
LLM summarisation — mid-loop compaction now calls the active provider once to produce a short narrative digest of removed exchanges rather than inserting a static truncation marker
Instruction distillation — compact built-in core system prompt (~80 tokens vs ~350); optional CLAUDE.md compression via Settings → Agent → Prompt Compression (cached on SHA256 hash, re-distils only on file change)

Config

Enable CLAUDE.md distillation: prompt_compression_enabled = true in ~/.merlin/config.toml or Settings → Agent → Prompt Compression.

11 May 18:10

j-zuilkowski

v1.9.0

e2ad5f4

v1.9.0 — Performance optimizations

What's new

Stable system prompt prefix cache — The stable portion of the system prompt is now cached and reused across loop iterations. llama.cpp's KV prefix cache gets a consistent byte-identical prefix every turn, eliminating redundant prefill work. Invalidates automatically when CLAUDE.md, memories, standing instructions, permission mode, or working directory change.

Async batch tool dispatch — All tool calls from a single LLM response are now dispatched in one parallel batch via ToolRouter's existing TaskGroup, rather than sequentially one at a time. Reading 4 files now takes the time of 1.

Parallel worker execution — spawn_agent calls in one response now launch all subagents concurrently instead of sequentially. PlannerEngine now annotates plan steps with parallel_safe, and independent steps are grouped into parallel batches rather than forced into sequential continuation turns.
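
Grouping independent plan steps into parallel batches, as described above, could look roughly like this. The sketch assumes that only consecutive steps marked parallel-safe may share a batch; `StepSketch` and `batches(for:)` are hypothetical names, and the real PlannerEngine grouping rule may differ.

```swift
/// Minimal plan step; `parallelSafe` mirrors the `parallel_safe`
/// annotation named above, everything else is illustrative.
struct StepSketch: Equatable {
    let id: Int
    let parallelSafe: Bool
}

/// Groups consecutive parallel-safe steps into one batch and gives every
/// other step a batch of its own, preserving plan order.
func batches(for steps: [StepSketch]) -> [[StepSketch]] {
    var result: [[StepSketch]] = []
    for step in steps {
        if step.parallelSafe,
           let last = result.last,
           last.allSatisfy({ $0.parallelSafe }) {
            result[result.count - 1].append(step)
        } else {
            result.append([step])
        }
    }
    return result
}
```

Each inner array could then be dispatched concurrently, while the outer array is still executed in order.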

Releases: j-zuilkowski/merlin
