Releases: peteromallet/desloppify
v0.9.14
This release overhauls the plan/execute lifecycle — consolidating phase derivation into a single canonical function, fixing a force-rescan bug that re-queued completed subjective reviews, and simplifying the internal state machine from fine-grained phase names down to just plan and execute. Also fixes stale dashboard counts, graph normalization sampling, and Windows UTF-8 encoding issues.
64 files changed | 11 commits | 5,660 tests passing
Headline Feature
Lifecycle Consolidation
The plan/execute lifecycle — the state machine that decides whether you're planning work or executing it — has been significantly refactored for clarity and correctness:
- Shared phase derivation: both the reconciliation pipeline and the work queue snapshot now delegate to a single derive_display_phase() pure function with a documented priority chain. Previously, two independent implementations had to be kept manually in sync.
- Pure reader: current_lifecycle_phase() no longer mutates plan data on read. Legacy phase name migration now runs once at plan-load time.
- Marker invariants documented: the three scan-count markers that drive lifecycle transitions (lifecycle_phase, postflight_scan_completed_at_scan_count, subjective_review_completed_at_scan_count) are now documented with valid values, transitions, and single-writer functions.
- No more bypass paths: snapshot phase resolution now routes all derivation through the shared function with no short-circuit returns.
This was driven by a recurring class of bugs where lifecycle markers got out of sync — most recently, force-rescan re-queuing completed subjective reviews.
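The idea of a single pure derivation function with a priority chain can be sketched as follows. This is an illustration only: the signal names and their order are hypothetical, and the actual chain inside derive_display_phase() is not shown in these notes.

```python
from typing import Mapping, Tuple

# Hypothetical planning signals, ordered from highest to lowest priority.
# The real priority chain in desloppify may differ.
PLAN_SIGNALS = ("has_stale_reviews", "has_triage_items", "has_workflow_items")


def derive_phase(signals: Mapping[str, bool]) -> Tuple[str, str]:
    """Return (mode, reason) from boolean signals.

    Pure: it only reads its input and has a single exit path per outcome,
    so every caller sees the same answer for the same signals.
    """
    for name in PLAN_SIGNALS:
        if signals.get(name, False):
            # The first signal that is set wins, which is what makes
            # this a priority chain rather than a set of independent checks.
            return ("plan", name)
    return ("execute", "no_planning_signals")
```

Because the function takes plain booleans and returns a value, both the reconciliation side and the snapshot side could call it without sharing any mutable state.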
Other Features
Score Checkpoint with Sparkline
plan_checkpoint progression events now include a sparkline showing score trajectory across checkpoints. This makes it easier to see at a glance whether the score is trending up or plateauing.
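A sparkline of this kind can be produced by scaling each score onto a small set of Unicode block characters. The sketch below is one common way to do it and is not taken from desloppify; the function name and the choice of eight block levels are assumptions.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight levels, lowest to highest


def sparkline(scores):
    """Render a list of numeric scores as a one-line sparkline."""
    if not scores:
        return ""
    lo, hi = min(scores), max(scores)
    span = hi - lo
    if span == 0:
        # A flat series has no trend to show; use the lowest block throughout.
        return BLOCKS[0] * len(scores)
    last = len(BLOCKS) - 1
    # Scale each score into an index between 0 and last.
    return "".join(BLOCKS[round((s - lo) / span * last)] for s in scores)
```

Scaling is relative to the minimum and maximum of the series, so a plateau after an early jump shows up as a run of equal-height blocks.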
Simplified User-Facing Lifecycle
Users now see just "plan mode" and "execute mode" instead of internal phase names like workflow_postflight or triage_postflight. The communicate-score workflow step auto-resolves when no prior baseline exists, eliminating a confusing manual step on first use.
Bug Fixes
- Force-rescan no longer resets subjective reviews: When --force-rescan ran during plan mode, the scan count increment caused subjective_review_completed_at_scan_count to go stale, re-queuing all 20 subjective reviews. The new carry_forward_subjective_review() promotes the marker when the old review matches the cycle being replaced.
- Stale focus counts: status, next, and scan commands now show current focus counts instead of stale cached values. Closes #503.
- Graph normalization sampling: check_all_graph_keys now inspects all keys for normalization, not just the first 3. Previously, graphs with only late-position abnormal keys could pass validation. Closes #502.
- UTF-8 encoding for external tool reports: read_text() calls in jscpd_adapter.py, complexity.py, and test_coverage/io.py now specify encoding="utf-8" explicitly. On Windows, the system codepage default (cp1252) would crash when reports contained non-ASCII characters. Closes #505, reported by @pietrondo.
- Tree-sitter CI stability: Spec tests now skip gracefully when grammar files aren't available, instead of failing the entire suite.
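The UTF-8 fix comes down to passing an explicit encoding to pathlib's read_text() instead of relying on the platform default. A minimal sketch, with a hypothetical helper name and a made-up report file:

```python
import tempfile
from pathlib import Path


def read_report(path: Path) -> str:
    """Read a tool report as UTF-8 regardless of the system codepage."""
    # Without encoding=..., Python falls back to the locale's preferred
    # encoding, which is often cp1252 on Windows.
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.json"
    report.write_text('{"file": "naïve.py"}', encoding="utf-8")
    content = read_report(report)
```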
Refactoring & Internal
- Single-writer lifecycle enforcement: eliminated side-channel phase writes that could put the lifecycle into inconsistent states.
- Legacy phase name removal: all fine-grained persisted phase names (review_initial, assessment_postflight, workflow_postflight, etc.) are now migrated to coarse plan/execute modes at load time.
- Snapshot signal shaping: mode-aware suppression of postflight signals (assessment/workflow/triage/review) now lives in the caller (_phase_for_snapshot) rather than inside the shared delegation layer, keeping _derive_display_phase as a pure items-to-bools mapper.
Community
Thanks to @pietrondo for reporting the Windows UTF-8 encoding issue (#505).
v0.9.13
This release bundles all 11 skill overlays for one-step global install via desloppify setup, and fixes false positives across three detectors — orphaned, hardcoded_secret_name, and cycles — based on community reports.
31 files changed | 7 commits | 6,539 tests passing
Headline Feature
Bundled Skill Overlays with desloppify setup
All 11 agent skill files (Claude, Cursor, Copilot, Codex, Droid, Windsurf, Gemini, AMP, Hermes, OpenCode, Skill) are now bundled in the package and installed globally via desloppify setup. No network access needed — the command copies bundled docs to the right locations so agents discover desloppify across all projects. A pre-commit hook and make sync-docs target keep bundled copies in sync with docs/.
The README now points to desloppify setup as the primary install path, with update-skill as a per-project fallback.
Bug Fixes
- Orphaned detector false positives: Files with __all__ exports are now recognized as intentional public API surfaces and excluded from orphan detection. Closes #496, reported by @Git-on-my-level.
- hardcoded_secret_name false positives: Added an entropy heuristic to filter non-secret values, so field name constants (token_usage), sentinel strings, and label prefixes (agent_workspace@) are no longer flagged. Closes #496.
- Cycles detector TYPE_CHECKING false positives: Imports inside if TYPE_CHECKING: blocks are now marked as deferred and excluded from cycle detection. From @Git-on-my-level's comment on #496.
- Assessment crash on corrupted state: store_assessments now guards against raw int values in state with an isinstance check before calling .get(). Closes #465, reported by @Vuk97.
- fix_debug_logs negative-index corruption: Added lower-bound guard (start < 0) matching the pattern in all sibling fixers. Prevents silent wrong-line edits when entry["line"] is 0. Cherry-picked from PR #499 by @cpjet64.
- .gitignore CLAUDE.md/AGENTS.md scope: Patterns now anchored to repo root (/CLAUDE.md) so subdirectory copies aren't ignored.
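An entropy heuristic of the kind mentioned for hardcoded_secret_name is usually built on Shannon entropy over the characters of the value. The sketch below shows that general technique; the threshold, the minimum length, and the function names are assumptions, not the values desloppify uses.

```python
import math
from collections import Counter

# Assumed cut-offs for illustration only.
ENTROPY_THRESHOLD = 3.5
MIN_SECRET_LENGTH = 16


def shannon_entropy(value: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(value: str) -> bool:
    """Treat only long, high-entropy strings as candidate secrets."""
    return len(value) >= MIN_SECRET_LENGTH and shannon_entropy(value) >= ENTROPY_THRESHOLD
```

Short, word-like values such as token_usage fall below both cut-offs, while long random-looking strings stay above them.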
Refactoring & Internal
- Review pipeline improvements — Added bias-to-action guidance: confirmed bugs get implemented immediately instead of deferred. Stage 3 now collects open questions for the maintainer instead of guessing. Hard rule: decisions are presented for approval before execution.
Community
Thanks to @Git-on-my-level for the detailed false-positive reports across three detectors (#496), @Vuk97 for reproducing the assessment crash (#465) and language detection bug (#466), and @cpjet64 for the fix_debug_logs guard (PR #499).
v0.9.12
This release adds the strategist triage role for trend-aware strategic oversight, a desloppify setup command for one-step global skill installation, and a cluster of fixes that finally get stale subjective reviews unstuck from queue ordering issues. It also includes community-contributed fixes for Python src-layout projects, Windows encoding, and Knip hangs.
149 files changed | 21 commits | 6,485 tests passing
Headline Feature
Strategist Triage Role
The triage pipeline gains a new "strategist" stage that acts as a CEO-level overseer for the cleanup cycle. Instead of blindly trusting agent-reported trends, the strategist cross-checks score and debt trends against computed trajectory data, overriding mismatches. It can create high-priority strategy:: work items that jump to the front of the queue, and it supports an explicit confirmation gate (like other triage stages) so humans can review strategic assessments before they take effect.
The underlying ScoreTrajectory now tracks all-time highs from full scan history (not just the 5-scan window), introduces a "recovering" trend for scores that are improving but haven't reached their previous peak, and detects cross-cycle regression where a plateau masks a post-reset decline.
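To make the "recovering" trend concrete, here is a minimal sketch of a classifier that compares a recent window of scores against an all-time high. Only the "recovering" label comes from the notes above; the other labels, the tolerance, and the function signature are hypothetical, and cross-cycle regression detection is not modeled.

```python
def classify_trend(window, all_time_high, tolerance=0.5):
    """Classify a score window (oldest first) against the all-time high."""
    if len(window) < 2:
        return "unknown"
    delta = window[-1] - window[0]
    if abs(delta) <= tolerance:
        return "plateau"
    if delta < 0:
        return "declining"
    # The score is rising. If it is still below the historical peak,
    # the run is winning back lost ground rather than breaking new ground.
    if window[-1] < all_time_high:
        return "recovering"
    return "improving"
```

The key input is that all_time_high is taken from the full scan history, so a short window that looks like steady improvement can still be labeled as recovery.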
Other Features
desloppify setup Command
New command: pip install desloppify && desloppify setup. Copies bundled skill definitions to ~/.claude/ and ~/.cursor/ without network access, so agents discover desloppify globally across all projects. Per-project installs remain via update-skill.
Subjective Anti-Gaming Policy Disabled
The integrity policy that zeroed dimension scores when they converged near the target was producing false positives — blind-packet subagent reviews have no way to anchor to target scores, so legitimate convergence was being penalized. The policy now passes assessments through unchanged. It can be re-enabled later if a better detection strategy emerges.
Bug Fixes
- Stale subjective reviews blocked by queue state: Three related fixes ensure stale reviews are always detected and injected regardless of whether mechanical items remain in the queue. Previously, the live_planned_queue_empty guard blocked all reconciliation when objective items existed, leaving review dimensions perpetually stale.
- Phase ordering for stale reviews: Stale subjective reviews now take priority over non-critical workflows (communicate score) and triage items. Pre-review workflows (deferred disposition, run scan, import scores) still jump ahead of everything.
- Force-rescan stale review injection: Force-rescan now correctly injects stale reviews by bypassing both queue-empty guards and the cycle-just-completed deferral logic. Also adds _refresh_plan_start_baseline() that reseeds scores without clearing workflow sentinels.
- Clusters not marked done: Clusters stayed active even when all their items were resolved, causing completed work to reappear after rescan. Now sets execution_status="done" on completion and sweeps active clusters during reconciliation.
- Resolved items superseded from clusters: _supersede_dead_references was stripping resolved/fixed/wontfix items from clusters, making completed clusters appear incomplete. Now only supersedes items that are truly gone from state.
- Python src-layout test coverage: resolve_import_spec now tries src/-prefixed candidates, and module name computation strips the src/ prefix. Fixes false "transitive_only" reports for PEP 621 src-layout projects. Cherry-picked from PR #489 by @AreboursTLS.
- Python multi-line import regex: PY_IMPORT_RE now handles parenthesized imports (from pkg import (\n name, ...)), fixing the root cause of false transitive-only coverage reports.
- Test coverage graph supplementation: Source parsing now always runs as a supplement to the import graph, catching submodule imports the graph resolves to __init__.py instead of the actual file.
- Knip hang on missing dependency: Added stdin=subprocess.DEVNULL and --yes flag to prevent npx from blocking on interactive prompts when Knip is not a local dependency. Closes #494, reported by @goobsnake.
- Windows UTF-8 encoding in review runner: Explicit encoding="utf-8", errors="replace" on all file reads in the review pipeline. Prevents charmap decode errors when Codex runners emit UTF-8 on Windows. Cherry-picked from PR #495 by @pietrondo.
- Force-rescan queue-empty guard bypass: reconcile_plan() was double-guarded by queue emptiness, preventing stale review injection when any objective items remained.
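The multi-line import fix can be illustrated with a regex that accepts either a parenthesized name list or a single-line one. This pattern is a hypothetical stand-in for PY_IMPORT_RE, whose actual definition is not shown in these notes, and it deliberately ignores comments inside the parentheses.

```python
import re

# Group 2 captures a parenthesized (possibly multi-line) name list,
# group 3 captures a plain single-line name list.
FROM_IMPORT_RE = re.compile(
    r"^\s*from\s+([\w.]+)\s+import\s+(?:\(([^)]*)\)|([^\n(]+))",
    re.MULTILINE,
)


def imported_names(source: str):
    """Return (module, [names]) pairs for each from-import in source."""
    result = []
    for match in FROM_IMPORT_RE.finditer(source):
        module = match.group(1)
        body = match.group(2) if match.group(2) is not None else match.group(3)
        # Drop "as alias" parts and surrounding whitespace or newlines.
        names = [part.split(" as ")[0].strip() for part in body.split(",")]
        result.append((module, [name for name in names if name]))
    return result


sample = "from pkg import (\n    alpha,\n    beta as b,\n)\nfrom os import path\n"
parsed = imported_names(sample)
```

A line-by-line regex would see only "from pkg import (" for the first statement and miss alpha and beta, which is the kind of gap that leads to false transitive-only reports.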
Refactoring & Internal
- Skill doc improvements — Tightened skill descriptions to reduce false activations on generic programming questions. Added explicit "run next after scan" instruction so agents follow the tool's workflow instead of interpreting scan output themselves.
- Review pipeline results — Stage 1/2/3 assessments for PRs #495, #493, #489, #189 and issues #494-#490. Backfilled Stage 2 files for older items.
- Setup command scope trim: Removed --local mode and global skill discovery integration from the initial implementation, keeping the command focused on a single responsibility.
Community
Thanks to @AreboursTLS for the Python src-layout test coverage fix (PR #489), @pietrondo for the Windows UTF-8 encoding fix (PR #495), and @goobsnake for reporting the Knip hang issue (#494).
v0.9.11
This release adds the progression log — an append-only lifecycle event timeline that gives AI agents persistent memory across cycles — along with a batch of bug fixes from the community covering cross-platform issues, regex safety, serialization crashes, and triage quality.
62 files changed | 18 commits | 5,495 tests passing
Progression Log
The biggest addition in this release is .desloppify/progression.jsonl — an append-only event log that records lifecycle boundary events as they happen. Each line is a self-contained JSON object with a discriminated event_type, timestamps, scores, and a structured payload.
The problem this solves: desloppify's lifecycle is a loop (execute → scan → review → triage → execute), and until now there was no persistent record of what happened at each boundary. The scan history had the last 20 scans, the execution log had plan actions, and query.json was ephemeral. A CEO agent guiding triage had no way to answer "what improved since last cycle?" without reconstructing it from scattered sources.
The progression log records 7 boundary events:
- scan_preflight: gate decision (allowed/blocked/bypassed) with queue state
- scan_complete: scores, dimension deltas, scan diff, execution summary with resolved/skipped IDs, suppression metrics
- postflight_scan_completed: scan marker flip for the current cycle
- subjective_review_completed: reviewer observations (which dimensions were covered, evidence summaries, new review issue IDs and summaries, import provenance)
- triage_complete: strategy summary, cluster names and theses, verdict counts, organized/total counts
- entered_planning_mode: phase transition into any planning phase, with trigger
- execution_drain: queue drained via resolve or workflow resolve, with scores at drain
Key design decisions:
- Timestamps as join keys: events carry enough summary for quick reads, but timestamps let a reader query state.json and the plan's execution_log for full detail. The progression log is a timeline index, not a data copy.
- Idempotent marker triggers: events fire on mark_postflight_scan_completed() and mark_subjective_review_completed() returning True. These are idempotent per scan_count, so no double-fire.
- Best-effort, never break parent: all hooks are wrapped in try/except. Advisory file locking with 2s timeout. On lock failure, appends without the lock and logs a warning.
- Corruption-resilient reads: corrupt JSONL lines are skipped with a warning; the file is never erased. Periodic trim keeps it under 2000 lines.
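The append-only write path and the corruption-resilient read path can be sketched in a few lines. This is a simplified illustration: the record fields other than event_type are assumed, the helper names are hypothetical, and file locking and trimming are left out.

```python
import json
import logging
import tempfile
import time
from pathlib import Path

log = logging.getLogger("progression_sketch")


def append_event(path: Path, event_type: str, payload: dict) -> None:
    """Append one self-contained JSON object as a single line."""
    record = {"event_type": event_type, "timestamp": time.time(), "payload": payload}
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def read_events(path: Path) -> list:
    """Read all parseable events, skipping corrupt lines with a warning."""
    events = []
    if not path.exists():
        return events
    with path.open(encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                # Never rewrite or erase the file on a bad line; just move on.
                log.warning("skipping corrupt line %d in %s", number, path)
    return events


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "progression.jsonl"
    append_event(log_path, "scan_complete", {"score": 71.5})
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write('{"event_type": "trunc\n')  # simulate a torn write
    append_event(log_path, "triage_complete", {"clusters": 3})
    event_types = [event["event_type"] for event in read_events(log_path)]
```

Because each line stands alone, one damaged line costs one event and nothing more, which is the main reason JSONL suits a best-effort log.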
Triage: Stop Obsessing Over Test Coverage
Multiple users reported the tool pushes test writing over actual code cleanup. The root cause: triage LLMs promote test_coverage clusters because they appear first (sorted by issue count) with no guidance to deprioritize.
Changes:
- Triage prompt now explicitly says: clean up code quality BEFORE test coverage — writing tests for sloppy code locks in the slop
- Added defer action type for auto-clusters (keeps in backlog for later cycles)
- Scan coaching and catalog guidance reworded to "review coverage gaps" instead of "add tests"
Bug Fixes
- Phantom cluster membership: action_step refs are traceability metadata, not membership. Merging them into issue_ids caused bare shorthand IDs from triage runners to become phantom cluster members that reappear after every reconcile cycle.
- Update-skill duplicate detection: raises CommandError when skill content is already present but begin/end markers are missing, preventing silent duplicate appends.
- Generic fixer crash: four autofix pipeline sites assumed FixResult entries always have a removed key. Generic fixers (e.g., eslint-warning) return {file, line} or {file, fixed} without it. Cherry-picked from PR #484 by @AugusteBalas.
- JSON serialization crash: EcosystemFrameworkDetection dataclass instances leaked into review_cache via shared dict references, causing TypeError on state serialization. Adds a dataclass handler to json_default. Bug identified by @0-CYBERDYNE-SYSTEMS-0 in PR #486.
- Synthetic ID deferred skip loop: synthetic queue IDs (workflow::*, triage::*) in the skipped dict caused phantom deferred-disposition items. Cherry-picked from PR #485 by @ryexLLC.
- Dart regex catastrophic backtracking: the annotation sub-pattern had overlapping whitespace consumption between a character class and trailing \s*, wrapped in ()*, causing exponential backtracking. Possessive quantifiers prevent it. Cherry-picked from PR #477 by @AvoMandjian.
- Framework cache state bloat: framework detection wrote dataclass objects into review_cache, which shares a dict reference with persisted state. New runtime_cache field separates ephemeral scan-scoped data from persisted review state. Cherry-picked from PR #483 by @maciej-trebacz.
- Zone reclassification stale issues: when zone rules change (e.g., adding JS test patterns), existing open issues for reclassified files now auto-resolve instead of persisting forever. Bug identified by @claytona500 in PR #478.
- macOS SSL for update-skill: uses certifi CA bundle instead of the system cert store, which on Homebrew Python often has no CA certificates. Closes #468, reported by @Vuk97.
- Windows font fallbacks: scorecard image generator now has Consolas, Georgia, Segoe UI, and Arial as Windows fallbacks instead of falling through to Pillow's bitmap font.
- Windows .exe process execution: _resolve_executable() no longer wraps .exe binaries in cmd /c, which caused \" escaping errors when prompts contain spaces. Closes #487, reported by @Dteyn.
- Snyk false positive: renamed session_token placeholder to session_hmac in review JSON examples to avoid Snyk W007 credential-detection heuristic. Closes #473, reported by @mark-major.
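The JSON serialization crash and its fix follow a standard pattern: json.dumps() raises TypeError on dataclass instances unless a default handler converts them. A minimal sketch, where FrameworkDetection is a hypothetical stand-in for EcosystemFrameworkDetection and the handler body is an assumption about what such a hook might look like:

```python
import dataclasses
import json


@dataclasses.dataclass
class FrameworkDetection:
    """Hypothetical stand-in for a detection result stored in a cache."""
    name: str
    confidence: float


def json_default(obj):
    """Fallback for json.dumps: turn dataclass instances into dicts."""
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


state = {"review_cache": {"framework": FrameworkDetection("nextjs", 0.9)}}
encoded = json.dumps(state, default=json_default)
```

A handler like this makes serialization tolerant, while keeping ephemeral objects out of persisted state in the first place (the runtime_cache change) addresses the underlying leak.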
R Code Smell Detectors
Contributed by @sims1253 (PR #450). Ten R-specific smell checks detecting common anti-patterns: setwd(), <<- global assignment, attach(), rm(list=ls()), debug leftovers, T/F ambiguity, 1:n() off-by-one risk, deprecated stringsAsFactors, and library() inside functions. The library_in_function check uses tree-sitter when available with a regex fallback.
Also extends the generic language framework with custom_phases support — any generic plugin can now inject language-specific detector phases without converting to a full plugin. 22 tests covering detection, false positive suppression, and edge cases.
Documentation
- Work queue README: explains how items flow from scan to execution queue, why test_coverage dominates pre-triage ordering, that tier is display-only metadata, and the full sort order with filter chain
- Main README process overview: new "How it works" section explaining the scan → score → review → triage → execute loop
- Docs reorganization: internal documentation (DEVELOPMENT_PHILOSOPHY, QUEUE_LIFECYCLE, ci_plan) and release infrastructure moved from docs/ to dev/. Website separated as its own repo.
Community
Cherry-picks and bug reports from @AugusteBalas (generic fixer crash), @AvoMandjian (Dart regex DoS), @maciej-trebacz (framework cache bloat), @ryexLLC (synthetic ID loop), @0-CYBERDYNE-SYSTEMS-0 (dataclass serialization), @claytona500 (zone reclassification), @Vuk97 (macOS SSL), @Dteyn (Windows .exe execution), and @mark-major (Snyk false positive).
@sims1253 continues to drive R language support forward — this is the fifth R-focused PR across the last few releases, and the custom_phases framework extension benefits every language plugin.
Thank you all — every report and PR made this release better.
v0.9.10
This release adds experimental Hermes Agent integration for fully autonomous cleanup loops, framework-aware detection with a full Next.js spec, SCSS language support, significant R language improvements, and a scan performance boost from detector prefetch + caching — alongside a batch of bug fixes from the community.
152 files changed | 54 commits | 5,466 tests passing
Hermes Agent Integration (Experimental)
We've been exploring what it looks like when a codebase health tool can actually drive an AI agent — not just generate reports, but orchestrate the entire cleanup loop autonomously. This release ships our first experimental integration with Hermes Agent.
The core idea: desloppify already knows what needs to be done (scan, triage, review, fix). Instead of printing instructions for a human, it can now tell the agent directly — switch to a cheap model for mechanical fixes, switch to an expensive one for architectural review, reset context between tasks, and keep the agent working via /autoreply, all without a human in the loop.
What the integration enables:
- Autonomous review loops: desloppify orchestrates blind reviews via delegate_task subagents (up to 3 concurrent), no human needed
- Model switching at phase boundaries: cheap models for execution, expensive for planning/review, switched automatically
- Context management: automatic resets between tasks to keep the agent focused on long sessions
- Lifecycle transitions: desloppify tells Hermes what to do next via the Control API
How to try it
This requires the Control API branch of Hermes (NousResearch/hermes-agent#1508), which hasn't been merged upstream yet. Without it, Hermes works as a normal harness but can't do autonomous model switching or self-prompting.
Copy-paste this into your agent (it handles both fresh installs and existing Hermes setups):
Install desloppify with Hermes Control API support. The Control API lives on a fork branch until NousResearch/hermes-agent#1508 merges upstream.
1. If Hermes is already installed (check with `which hermes`), switch it to the Control API branch:
cd $(pip show hermes-agent 2>/dev/null | grep -i location | cut -d' ' -f2)/hermes-agent
git remote add peteromallet https://github.com/peteromallet/hermes-agent.git 2>/dev/null || true
git fetch peteromallet
git checkout peteromallet/feat/control-api-autoreply
pip install -e .
IMPORTANT: If Hermes is currently running, tell the user they need to restart it with `HERMES_CONTROL_API=1 hermes` after this step — the new branch won't take effect until they do.
If Hermes is NOT installed, clone fresh:
git clone -b feat/control-api-autoreply https://github.com/peteromallet/hermes-agent.git
cd hermes-agent
pip install -e .
2. Install desloppify and set up the skill doc:
pip install desloppify[full]
cd /path/to/your/project
desloppify update-skill hermes
3. Start Hermes with the Control API enabled:
cd /path/to/your/project
HERMES_CONTROL_API=1 hermes
4. In the Hermes session, run:
Run desloppify scan, then follow its coaching output to clean up the codebase.
Desloppify will guide Hermes through the full lifecycle — scanning, triaging, blind reviews with subagents, and fixing. It switches models and resets context automatically at phase boundaries.
This is experimental and we're iterating fast. We'd love feedback on the approach, rough edges, and what you'd want to see next. If you try it, please open an issue — every report helps.
Framework-Aware Detection
Massive contribution from @MacHatter1 (PR #414). A new FrameworkSpec abstraction layer for framework-specific detection, shipping with a full Next.js spec that understands App Router conventions, server components, use client/use server directives, and Next.js-specific lint rules. This means dramatically fewer false positives when scanning Next.js projects — framework idioms are recognized, not flagged. The spec system is extensible, so adding support for other frameworks (Remix, SvelteKit, etc.) is now a matter of writing a spec, not changing the engine.
SCSS Language Plugin
Thanks to @klausagnoletti for adding SCSS/Sass support via stylelint integration (PR #428). Detects code smells, unused variables, and style issues in .scss and .sass files. @klausagnoletti has also submitted a follow-up PR (#452) with bug fixes, tests, and honest documentation — expected to land shortly after release.
Plugin Tests, Docs, and Ruby Improvements
@klausagnoletti also contributed across multiple language plugins:
- Ruby plugin improvements (PR #462): expanded exclusions, detect markers (Gemfile, Rakefile, .ruby-version, *.gemspec), default_src="lib", spec/ + test/ support, and 13 wiring tests. Also adds external_test_dirs and test_file_extensions params to the generic plugin framework.
- JavaScript plugin tests + README (PR #458): 12 sanity tests covering ESLint integration, command construction, fixer registration, and output parsing.
- Python plugin README (PR #459): user-facing documentation covering phases, requirements, and usage.
R Language Improvements
@sims1253 has been steadily building out R support and contributed four PRs to this release:
- Jarl linter with autofix support (PR #425): adds a fast R linter as an alternative to lintr
- Shell quote escaping fix for lintr commands (PR #424): prevents command injection on paths with special characters
- Tree-sitter query improvements (PR #449): captures anonymous functions in lapply/sapply calls and pkg::fn namespace imports
- Factory Droid harness support (PR #451): adds Droid as a new skill target, following the existing harness pattern exactly
Scan Performance: Detector Prefetch + Cache
Another big one from @MacHatter1 (PR #432). Cold and full scan times reduced significantly. Detectors now prefetch file contents and cache results across detection phases, avoiding redundant I/O. On large codebases this is a noticeable improvement.
Lifecycle & Triage
- Lifecycle transition messages — the tool now tells agents what phase they're in and what to do next, with structured directives for each transition
- Unified triage pipeline with step detail display
- Staged triage now requires explicit decisions for auto-clusters before proceeding — no more accidentally skipping triage steps
Bug Fixes
- Binding-aware unused import detection for JS/TS: @MacHatter1 (PR #433). No longer flags imports used via destructuring, as renames, or re-export patterns. This was a significant source of false positives in real JS/TS projects.
- Rust dep graph hangs: @fluffypony (PR #429). String literals that look like import paths (e.g., "path/to/thing") no longer cause the dependency graph builder to hang. @fluffypony also contributed Rust inline-test filtering (PR #440), which prevents #[cfg(test)] diagnostic noise from inflating production debt scores.
- Project root detection (PR #439): fixed cases where the project root was derived incorrectly, plus force-rescan now properly wipes stale plan data, and manual clusters are visible in triage.
- workflow::create-plan re-injection: @cdunda-perchwell (PR #435). Resolved workflow items no longer reappear in the execution queue after reconciliation. @cdunda-perchwell also identified the related communicate-score cycle-boundary sentinel issue (#447, fix in PR #448).
- PHPStan parser fixes: @nickperkins (PR #420). stderr output and malformed JSON from PHPStan no longer crash the parser. Clean, focused fix.
- Preserve plan_start_scores during force-rescan: manual clusters are no longer wiped when force-rescanning.
- Import run project root: --scan-after-import now derives the project root correctly from the state file path.
- Windows codex runner (PR #453): proper cmd /c argument quoting + UTF-8 log encoding for Windows. Reported by @DenysAshikhin.
- Scan after queue drain (PR #454): score_display_mode now returns LIVE when queue is empty, fixing the UX contradiction where next says "run scan" but scan refuses. Reported by @kgelpes.
- SKILL.md cleanup (PR #455): removes unsupported allowed-tools frontmatter, fixes batch naming inconsistency (.raw.txt not .json), adds pip fallback alongside uvx. Three issues all reported by @willfrey.
- Batch retry coverage gate (PR #456): partial retries now bypass the full-coverage requirement instead of being rejected. Reported by @imetandy.
- R anonymous function extraction (PR #461): the tree-sitter anonymous function pattern from PR #449 now actually works (extractor handles missing @name capture with <anonymous> fallback).
Community
This release wouldn't exist without the community. Seriously — thank you all.
@MacHatter1 delivered three major PRs (framework-aware detection, detector prefetch + cache, binding-aware unused imports) that each individually would have been a headline feature. The framework spec system in particular opens up a whole new category of detection accuracy.
@fluffypony contributed both the Rust dep graph hang fix and the inline-test filtering — the latter being 1,000+ lines of carefully tested Rust syntax parsing with conservative cfg predicate handling and thorough edge-case coverage.
@sims1253 has been the driving force behind R language support, with four PRs spanning linting, tree-sitter queries, and harness support. The R plugin is becoming genuinely useful thanks to this sustained effort.
@klausagnoletti added SCSS support, imp...
v0.9.9
This release focuses on plan lifecycle robustness — fixing workflow deadlocks, auto-resolving stale issues, hardening the reconciliation pipeline, and replacing heuristics with explicit cluster semantics. It also includes C++ detector scoping improvements from a community contributor and several UX fixes that prevent agents from getting stuck mid-cycle.
366 files changed | 16 commits | 5,367 tests passing
Refactoring & Internal Cleanup
This release continues the pattern of tightening seams and reducing indirection across the codebase. Over half the 366 changed files are internal restructuring:
- Cluster and override → subpackages: cluster_ops_display.py, cluster_ops_manage.py, cluster_ops_reorder.py, cluster_update.py, and cluster_steps.py moved into a cluster/ subpackage. Same treatment for override_io.py, override_misc.py, override_skip.py, and override_resolve_* into override/.
- Holistic cluster accessors inlined: ~8 small wrapper files in context_holistic/ deleted (_clusters_complexity.py, _clusters_consistency.py, _clusters_dependency.py, _clusters_security.py, etc.) and inlined into their callers
- Plan sync pipeline extracted: new sync/pipeline.py and sync/phase_cleanup.py pulled out of the monolithic workflow, with reconcile.py renamed to scan_issue_reconcile.py and review import reconcile moved into sync/review_import.py
- Issue semantics centralized: new issue_semantics.py (~225 lines) consolidating classification logic that was previously scattered across multiple modules
- Plan reconcile simplified: scan/plan_reconcile.py cut from ~470 lines to ~200 by extracting shared logic into the engine layer
- Work queue snapshot overhaul: snapshot.py gained ~470 lines of phase-aware partitioning and ranking refinements, replacing ad-hoc ordering logic
- TS dead code removed: helpers_blocks.py and helpers_line_state.py deleted (~200 lines of unused smell detection helpers)
- Broad type/schema updates: issue type references and state schema types updated across 130+ files for consistency with the new issue semantics
Auto-Resolve Issues for Deleted Files
When a scan runs and a previously-flagged file no longer exists on disk, its open issues are now automatically set to auto_resolved with a clear note. Previously, issues for deleted files would remain open and pollute the work queue indefinitely — particularly painful in Rust projects where module reorganization is common. Closes #412.
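The auto-resolve behavior can be pictured as a pass over open issues that checks whether each issue's file still exists. The sketch below assumes a simple dict shape for issues (id, file, status, note) and a hypothetical function name; only the auto_resolved status value comes from the notes.

```python
import tempfile
from pathlib import Path


def auto_resolve_missing(issues, root: Path):
    """Mark open issues whose file is gone as auto_resolved; return their IDs."""
    resolved = []
    for issue in issues:
        if issue["status"] != "open":
            continue
        if not (root / issue["file"]).exists():
            issue["status"] = "auto_resolved"
            issue["note"] = "file no longer exists on disk"
            resolved.append(issue["id"])
    return resolved


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "kept.rs").write_text("fn main() {}\n", encoding="utf-8")
    issues = [
        {"id": "a", "file": "kept.rs", "status": "open"},
        {"id": "b", "file": "removed.rs", "status": "open"},
    ]
    resolved_ids = auto_resolve_missing(issues, root)
```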
Triage Deadlock Fix
Fixed a deadlock where triage was stale (new review issues arrived mid-cycle), but triage couldn't start because objective backlog was still open, and objective resolves were blocked because triage was stale. The fix detects this "pending behind objective backlog" state and allows objective work to continue while keeping review resolves gated. The banner now shows TRIAGE PENDING instead of nudging toward a triage command that can't run yet. Community contribution from @imetandy (#413).
Batch Runner Stall Detection Fix
The review batch runner's stall detector was prematurely killing codex batches during their initialization phase — before any output file was written. This caused --import-run to fail with "missing result files for batches" errors. The stall detector now never declares a stall when no output file exists yet, while the hard timeout still catches truly hung batches. Closes #417 and #401.
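The corrected rule has two parts: a missing output file never counts as a stall, and the hard timeout applies regardless. A minimal sketch under assumed names and an assumed mtime-based staleness check, since the notes do not describe how the real detector measures progress:

```python
import os
import tempfile
import time


def is_stalled(output_path, stall_seconds, now=None):
    """Stalled only if the output file exists and has not changed recently."""
    if not os.path.exists(output_path):
        # Still initializing: no output yet is not evidence of a stall.
        return False
    now = time.time() if now is None else now
    return now - os.path.getmtime(output_path) > stall_seconds


def should_kill(output_path, started_at, stall_seconds, hard_timeout, now=None):
    """Kill on hard timeout, or on a stall once output has started."""
    now = time.time() if now is None else now
    if now - started_at > hard_timeout:
        return True
    return is_stalled(output_path, stall_seconds, now)


with tempfile.NamedTemporaryFile() as handle:
    mtime = os.path.getmtime(handle.name)
    stalled_demo = is_stalled(handle.name, stall_seconds=10, now=mtime + 20)
```

Keeping the hard timeout separate means a batch that never writes anything is still caught eventually, just not by the stall check.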
Sequential Reconciliation Pipeline
Fixes a cluster tracker race condition on parallel updates. A new shared reconciliation pipeline runs all sync steps sequentially: subjective dimensions, auto-clustering, score communication, plan creation, triage, and lifecycle phase. This replaces the previous approach where parallel operations could produce inconsistent plan state.
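As a loose illustration of the idea, the sketch below runs the six sync steps named above one after another against a single plan object. The step identifiers, the reconcile function, and the dict-of-callables shape are assumptions made for this example, not the pipeline's real interface.

```python
# Step names follow the order listed above; the identifiers are hypothetical.
STEP_ORDER = (
    "subjective_dimensions",
    "auto_clustering",
    "score_communication",
    "plan_creation",
    "triage",
    "lifecycle_phase",
)


def reconcile(plan, steps):
    """Run every sync step once, in STEP_ORDER, against the same plan dict.

    `steps` maps a step name to a callable that takes the plan and
    mutates it in place.
    """
    for name in STEP_ORDER:
        steps[name](plan)  # each step sees the state left by the previous one
    return plan
```

Because no two steps run at the same time, a later step such as lifecycle phase always observes the clusters and triage state produced earlier in the same pass.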
Explicit Cluster Semantics
Clusters now carry explicit action_type (auto_fix, refactor, manual_fix, reorganize) and execution_policy (ephemeral_autopromote, planned_only) rather than relying on command-string sniffing. A new cluster_semantics.py module provides canonical semantic helpers, and the work queue uses these for phase-aware ordering instead of inferring intent from command strings.
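One way to picture explicit semantics is a pair of enums carried on the cluster. The enum values below are the ones listed above; the class names, the Cluster dataclass, and the is_autopromoted helper are hypothetical and not taken from cluster_semantics.py.

```python
from dataclasses import dataclass
from enum import Enum


class ActionType(str, Enum):
    AUTO_FIX = "auto_fix"
    REFACTOR = "refactor"
    MANUAL_FIX = "manual_fix"
    REORGANIZE = "reorganize"


class ExecutionPolicy(str, Enum):
    EPHEMERAL_AUTOPROMOTE = "ephemeral_autopromote"
    PLANNED_ONLY = "planned_only"


@dataclass(frozen=True)
class Cluster:
    name: str
    action_type: ActionType
    execution_policy: ExecutionPolicy


def is_autopromoted(cluster):
    """Decide from the explicit policy field, not from a command string."""
    return cluster.execution_policy is ExecutionPolicy.EPHEMERAL_AUTOPROMOTE
```

With fields like these, ordering code can branch on cluster.action_type directly, and an unknown value fails loudly at construction time instead of being silently misread.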
C++ Detector Scoping Improvements
Three targeted fixes to the C++ plugin, contributed by @Dragoy (#415):
- Security findings scoped to first-party files — clang-tidy and cppcheck findings from vendor/external headers are now filtered out instead of being reported as project issues
- CMake-based test coverage mapping: CMakeLists.txt files are parsed for add_executable/add_library/target_sources to discover which source files a test target compiles, treating that as direct test coverage
- Unused-imports phase disabled for C++: the generic tree-sitter unused-import detector is unsound for #include semantics and now skips C++ projects
- Header extension support: _extract_import_name now handles .h, .hh, .hpp extensions correctly
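To make the CMake-based mapping concrete, here is a deliberately simplified sketch that collects source files per target from CMakeLists.txt text. The regex, the function name, and the suffix list are assumptions for this example; it ignores variables such as ${SRCS}, comments, and generator expressions, which a real parser would have to handle.

```python
import re

# Matches add_executable(...), add_library(...) and target_sources(...) calls.
_TARGET_RE = re.compile(
    r"\b(?:add_executable|add_library|target_sources)\s*\(([^)]*)\)",
    re.IGNORECASE,
)
_SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".cxx")


def sources_by_target(cmake_text):
    """Map each target name to the source files listed in its CMake calls."""
    targets = {}
    for match in _TARGET_RE.finditer(cmake_text):
        tokens = match.group(1).split()
        if not tokens:
            continue
        name, rest = tokens[0], tokens[1:]
        # Keep only tokens that look like source files; this drops keywords
        # such as PRIVATE or STATIC.
        files = [tok for tok in rest if tok.endswith(_SOURCE_SUFFIXES)]
        targets.setdefault(name, []).extend(files)
    return targets
```

Any source file that ends up under a test target can then be counted as directly covered by that test.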
Flexible Triage Attestations
Triage attestation validation for organize, enrich, and sense-check stages no longer requires literal cluster name references. Users can now provide substantive work-product descriptions as an alternative, making the triage workflow less rigid for both human and AI operators.
Triage Validation & Sense-Check Enhancements
- Sense-check stage gets a dedicated orchestrator with expanded prompts and evidence parsing
- Triage completion policy significantly enhanced with richer stage validation
- Stage prompt instruction blocks expanded for clearer agent guidance
- Evidence parsing extracted into a dedicated module
Other Improvements
- .gitignore reminder added to README setup instructions (#416)
- PyPI publish workflow push triggers restored while maintaining the main-branch gate
- Tweet release tests now properly stub the requests module for CI isolation
Community
Thanks to @imetandy for the triage deadlock fix and @Dragoy for the C++ detector scoping improvements. Issues and feedback from @guillaumejay, @wuurrd, @astappiev, @efstathiosntonas, @xliry, @kendonB, @WojciechBednarski, and @jakob1379 helped shape this release.
v0.9.8
This release adds full C++ and Rust language plugins, introduces two-phase review scoring, unified issue lifecycle status, anti-gaming safeguards, and delivers extensive triage validation, work queue, and cross-platform improvements — alongside continued code quality cleanup that removes 23 compat wrappers and tightens seams throughout the codebase.
609 files changed | 78 commits | 5,266 tests passing
C++ Language Support
C++ is now a full-depth language plugin with tree-sitter-based extraction, structural analysis, and tool-backed security scanning. The plugin includes:
- Function/class/include extraction from C++ source files
- Dependency graph analysis via #include graphs from compile_commands.json and Makefile projects
- Structural and coupling phases with cppcheck integration and batch issue scanning
- Security detection with normalized findings from cppcheck and clang-tidy
- Review surfaces, test coverage hooks, and move support
- 14 test files with fixtures for CMake and Makefile sample projects
Rust Language Support
Rust gains full plugin parity with 13 Rust-specific detectors, 3 auto-fixers, and deep cargo toolchain integration:
- 13 detectors across 6 modules: API surface, cargo policy, safety, smells, dependencies, and custom rules
- 3 auto-fixers: crate imports, cargo features, readme doctests
- Cargo tool integration: clippy, cargo check, rustdoc
- Rust-aware dependency graphing and test coverage mapping with inline #[cfg(test)] recognition
- 117 tests across 12 test files
Two-Phase Review Scoring
Holistic review is restructured into two distinct phases:
- Phase 1 — Observe: collect characteristics and defects without scoring
- Phase 2 — Judge: synthesize dimension character from observations, then score
Positive observations now persist as context insights with full provenance (added_at, source, positive: true), replacing ephemeral strengths. A new context_schema.json defines the review data framework.
Unified Issue Lifecycle Status
DEFERRED and TRIAGED_OUT are added to the Status enum so state is always authoritative for issue disposition. Previously, temporary and triaged-out skips left issue.status as "open", causing overcounting in plan rendering and queue surfaces. The change includes status migration on scan reconcile, new status icons, and updated plan rendering. History is also surfaced to reviewers, with --retrospective now set to True by default.
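The overcounting problem can be illustrated with a small sketch. Only the member names DEFERRED and TRIAGED_OUT come from the notes above; the string values, the other members, and the count_open helper are assumptions for this example.

```python
from enum import Enum


class Status(str, Enum):
    OPEN = "open"
    DEFERRED = "deferred"        # hypothetical string value
    TRIAGED_OUT = "triaged_out"  # hypothetical string value
    WONTFIX = "wontfix"


def count_open(issues):
    """Count issues whose status is genuinely open."""
    return sum(1 for issue in issues if issue["status"] == Status.OPEN)


issues = [
    {"id": "a", "status": Status.OPEN},
    {"id": "b", "status": Status.DEFERRED},     # would have been "open" before
    {"id": "c", "status": Status.TRIAGED_OUT},  # would have been "open" before
]
```

Here count_open(issues) counts only the first issue, whereas a model that leaves skipped issues as "open" would report all three.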
Anti-Gaming Safeguards
Two targeted fixes prevent AI agent score-anchoring:
- Numeric target redacted from penalty messages — "matched target 95.0" replaced with "clustered on the scoring target" so agents cannot infer and anchor on the exact number
- Blind-review workflow surfaced in the very first penalty message (previously only after streak ≥ 2), pointing agents to the blind packet and overlay docs immediately
Triage Validation Overhaul
- Reflect dispositions are now binding for organize: a structured ReflectDisposition dataclass is parsed from the Coverage Ledger; organize validates that plan state matches every reflected disposition before submission
- Organize validation extracted into dedicated organize_policy.py, separated from batch context normalization
- Suggestion and evidence surfaced in show and cluster commands
- Completion flow, observe batches, and stage queue gained new focused submodules
Work Queue Decomposition
The monolithic _work_queue/core.py and lifecycle.py are split into 5 focused modules (models.py, inputs.py, selection.py, finalize.py, snapshot.py). The engine/plan_queue.py facade is deleted. A canonical QueueSnapshot provides phase-aware partitioning.
Cross-Platform Hardening
- Cross-platform state locking: fcntl on Unix, msvcrt on Windows for atomic state file persistence
- Windows tool argv parsing: generic tool commands now execute correctly on Windows
- Windows WinError 2 fix: codex exec spawning uses shutil.which() to resolve .cmd batch shims
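The locking split can be sketched with the standard-library modules named above. This is an illustrative pattern rather than the project's code: the locked context manager, the helper names, and the choice to lock a single byte on Windows are assumptions.

```python
import sys
from contextlib import contextmanager

if sys.platform == "win32":
    import msvcrt

    def _lock(fh):
        fh.seek(0)  # msvcrt locks bytes starting at the current position
        msvcrt.locking(fh.fileno(), msvcrt.LK_LOCK, 1)

    def _unlock(fh):
        fh.seek(0)
        msvcrt.locking(fh.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def _lock(fh):
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)  # blocks until the lock is free

    def _unlock(fh):
        fcntl.flock(fh.fileno(), fcntl.LOCK_UN)


@contextmanager
def locked(lock_path):
    """Hold an exclusive lock on `lock_path` for the duration of the block."""
    with open(lock_path, "a+") as fh:
        _lock(fh)
        try:
            yield fh
        finally:
            _unlock(fh)
```

A caller would wrap its read-modify-write of the state file in `with locked(path):`, so that two processes cannot interleave their writes.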
Code Quality & Cleanup
- 23 compat wrapper files deleted: 13 _framework/ wrappers, 8 context_holistic/ wrappers, 2 helpers/ wrappers, plus removal of SimpleNamespace fake-module antipatterns
- Go generated files now skipped during scanning, with improved import-run error messages
- Scan export and scoring impact crash paths fixed
- Rust workspace rustdoc execution repaired
- Extensive seam-tightening: holistic review prep, triage validation, scan sync workflow, language framework surface, tree-sitter runtime caches, and reporting/planning helpers all streamlined
v0.9.5
This release adds Julia language support, extends the tree-sitter framework, rebalances the health score toward subjective dimensions (now 75/25), and delivers a broad set of improvements to stability, triage process reliability, security hardening, and platform-specific fixes — alongside significant code quality cleanups that reduce indirection and remove over-extractions.
354+ files changed | 80+ commits | 5,022 tests passing
Julia Language Support
Julia is now a supported language. The initial plugin skeleton includes tree-sitter-based parsing and import resolution, following the same framework as the existing Python and TypeScript plugins.
Tree-sitter Framework Extensions
The tree-sitter framework gains functional specs and functional import resolvers — new extension points that language plugins can use to define analysis rules declaratively. These underpin the Julia plugin and will simplify future language additions.
Reviewer Finding Adjudication
Review subagents now receive structured access to judgment-required findings during batch construction. Per-detector finding counts are embedded in batches, concern signals carry fingerprints and finding IDs, and the CLI renders exploration commands (desloppify show <det> --no-budget). The concern signal cap is raised from 8 to 30 with overflow guidance, and the dismiss path is simplified to 2 fields.
Scoring Rebalance
Scoring weights shift to 75% subjective / 25% mechanical. Subjective design quality — review dimensions like naming, cohesion, and abstraction quality — is now the primary driver of the health score. All docs, reporting strings, tests, and snapshots updated to match.
Judgment-Required Detectors Excluded from Auto-Clustering
Detectors marked needs_judgment (e.g., structural, dict_keys, smells, responsibility_cohesion) now return None from the clustering grouping key. This means they flow through the review process as mechanical evidence rather than being auto-grouped into plan tasks. The cluster strategy is simplified: special-case grouping by file, subtype, and detector for judgment-required issues has been removed entirely, along with unused parameters in generate_description.
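A minimal sketch of the effect, assuming a simple dict-based issue shape: grouping_key, auto_cluster, and the "auto/<detector>" key format are hypothetical; the detector names are the examples given above.

```python
# Detector names taken from the examples above.
NEEDS_JUDGMENT = {"structural", "dict_keys", "smells", "responsibility_cohesion"}


def grouping_key(issue):
    """Return a cluster key, or None when the detector needs judgment."""
    if issue["detector"] in NEEDS_JUDGMENT:
        return None  # stays out of auto-clusters; reaches review as evidence
    return f"auto/{issue['detector']}"


def auto_cluster(issues):
    """Group issue ids by key, skipping issues that have no grouping key."""
    clusters = {}
    for issue in issues:
        key = grouping_key(issue)
        if key is not None:
            clusters.setdefault(key, []).append(issue["id"])
    return clusters
```

Returning None from the key function keeps the exclusion in one place, so the clustering loop needs no detector-specific branches.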
Living Plan & Queue Overhaul
The next command now follows the living plan directly rather than computing priorities independently. The queue system was substantially rearchitected:
- Execution vs. backlog queues are now separate surfaces: next pulls from the execution queue (current cluster work), while the backlog queue shows what's upcoming.
- Queue lifecycle phases are explicitly persisted, giving clear visibility into where you are in the scan → triage → execute flow.
- Scan is a first-class postflight phase — the queue system recognizes and tracks post-scan state transitions.
- Queue output is labeled by surface so it's always clear which queue you're looking at.
- Prompts and coaching text aligned with the new execution queue semantics.
Triage & Review Improvements
- Triage dashboard fix: after completing triage, the dashboard no longer incorrectly shows "start with observe" restart guidance. The root cause was inferring status from an empty triage_stages dict instead of checking triaged_ids.
- No-op triage completion allowed for empty review batches: completing triage without findings no longer errors.
- Staged triage flow consolidated — routing, validation, and state contracts tightened across the triage pipeline.
- Review rerun preflight scoped — rerun checks are now properly gated.
- Review dimension metadata made monkeypatchable — enables test and plugin customization.
- Stale wontfix review tails are now cleared on completion.
State Recovery & Resilience
- Triage state recovery from saved plans — if triage state is lost (e.g., from a crash), it can be reconstructed from the persisted plan.
- State recovery from saved plans with deduplication of update-skill operations.
- Plan recovery consolidated on scan_metadata, dropping the _saved_plan_recovery marker.
- scan_metadata schema simplified: inventory_available/metrics_available are now derived from scan source rather than stored as separate bools.
- Stale cluster focus cleared on completion and on skip/cluster mutations.
Security Hardening
- Subprocess command paths hardened for detectors — command resolution tightened.
- Controlled subprocess security seams tightened: removed # nosec comment noise and the copy-pasted _resolve_cli_executable helper that existed in 8 files.
- Source security findings hardened in detector outputs.
- Silent excepts removed from treesitter resolvers — errors are no longer swallowed.
- Defer policy key overwrites removed — policy values are now immutable once set.
Platform Fixes
- Windows WinError 2 fix (#383): codex subprocess spawning now uses shutil.which() to resolve .cmd batch shims on Windows. Also recognizes [WinError 2] in runner failure detection.
- Monorepo path validation fix (#387): the _PATH_RE regex in triage enrich was anchored on src/ and discarded leading path components like packages/backend/, causing false-positive path failures in monorepos.
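The monorepo failure mode is easy to reproduce with two toy patterns. Both regexes below are invented for illustration and are not the actual _PATH_RE before or after the fix; they only show how anchoring on src/ drops the leading components.

```python
import re

# Hypothetical "before": the match can only start at src/, so the prefix is lost.
ANCHORED_RE = re.compile(r"src/[\w./-]+")

# Hypothetical "after": optional leading directories are part of the match.
MONOREPO_RE = re.compile(r"(?:[\w.-]+/)*src/[\w./-]+")

text = "see packages/backend/src/app/main.py for details"

truncated = ANCHORED_RE.search(text).group(0)  # "src/app/main.py"
full = MONOREPO_RE.search(text).group(0)       # "packages/backend/src/app/main.py"
```

Validating the truncated path against the repository root fails, because src/app/main.py does not exist at the top level of a monorepo; the full match resolves correctly.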
Code Quality & Cleanup
A major theme of this release is reversing over-extractions and reducing indirection. Several rounds of mechanical function splits were reverted:
- ~800+ net lines removed across 4 revert/inline commits — single-use helpers inlined back into their callers across triage stages, cluster display, queue flow, render, dispatch, and batch orchestration.
- Triage stage commands (observe/reflect/organize) now read top-to-bottom as linear validation chains with early returns.
- QueueRenderContext dataclass removed: explicit kwargs proved more readable.
- review_quality unified as the single canonical key (killed dual-key handling).
- Import score provenance metadata reduced from 10 fields to 4.
- Scan workflow imports tightened: direct imports from planning.scan instead of going through the package namespace.
- Remaining split-driver plan items bulk-skipped to prevent recurrence.
Tree-sitter Cleanup
- Import cycle broken in tree-sitter spec modules.
- Unused bridge modules deleted.
- Compatibility bridges centralized into a single location.
Contract & Registry Alignment
- Registry and subjective contracts aligned.
- Plugin scaffold contracts updated.
- TypeScript command surfaces normalized.
- Command registry annotations tightened.
- Schema drift payload builders normalized.
Test Improvements
- Over-mocked flow tests replaced with direct coverage.
- Direct coverage strengthened for import flows.
- New scan orchestration test verifying the planning scan surface integration.
- New review import support test helpers for plan sync runtime patching.
- Batch triage test helpers refreshed.
