Releases: peteromallet/desloppify

v0.9.14

24 Mar 01:10

This release overhauls the plan/execute lifecycle — consolidating phase derivation into a single canonical function, fixing a force-rescan bug that re-queued completed subjective reviews, and simplifying the internal state machine from fine-grained phase names down to just plan and execute. Also fixes stale dashboard counts, graph normalization sampling, and Windows UTF-8 encoding issues.


64 files changed | 11 commits | 5,660 tests passing

Headline Feature

Lifecycle Consolidation

The plan/execute lifecycle — the state machine that decides whether you're planning work or executing it — has been significantly refactored for clarity and correctness:

  • Shared phase derivation — both the reconciliation pipeline and the work queue snapshot now delegate to a single derive_display_phase() pure function with a documented priority chain. Previously, two independent implementations had to be kept manually in sync.
  • Pure reader — current_lifecycle_phase() no longer mutates plan data on read. Legacy phase name migration now runs once at plan-load time.
  • Marker invariants documented — the three scan-count markers that drive lifecycle transitions (lifecycle_phase, postflight_scan_completed_at_scan_count, subjective_review_completed_at_scan_count) are now documented with valid values, transitions, and single-writer functions.
  • No more bypass paths — snapshot phase resolution now routes all derivation through the shared function with no short-circuit returns.

This was driven by a recurring class of bugs where lifecycle markers got out of sync — most recently, force-rescan re-queuing completed subjective reviews.
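
A minimal sketch of what a single pure phase-derivation function with a priority chain could look like. The signal names, their ordering, and the signature are assumptions made for illustration; the real derive_display_phase() may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueSignals:
    """Hypothetical boolean signals computed from queue items."""
    has_pending_review: bool = False
    has_pending_triage: bool = False
    has_executable_items: bool = False


def derive_phase_sketch(signals: QueueSignals) -> str:
    """Return "plan" or "execute" from signals without mutating anything.

    Assumed priority chain: planning work (review, then triage) wins over
    execution, and an empty queue falls back to planning.
    """
    if signals.has_pending_review:
        return "plan"
    if signals.has_pending_triage:
        return "plan"
    if signals.has_executable_items:
        return "execute"
    return "plan"
```

Because the function only reads its input, both a reconciliation step and a snapshot builder could call it and be sure they agree on the result.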

Other Features

Score Checkpoint with Sparkline

plan_checkpoint progression events now include a sparkline showing score trajectory across checkpoints. This makes it easier to see at a glance whether the score is trending up or plateauing.
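
For readers unfamiliar with sparklines, here is a small sketch of how a list of checkpoint scores can be rendered as one. The function name and the scaling choice (min-to-max normalization over eight block characters) are assumptions, not the rendering desloppify uses.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(scores: list[float]) -> str:
    """Map each score to one of eight block characters, scaled min-to-max."""
    if not scores:
        return ""
    low, high = min(scores), max(scores)
    span = high - low
    if span == 0:
        # A flat series has no trend to show; use the lowest bar throughout.
        return BARS[0] * len(scores)
    top = len(BARS) - 1
    return "".join(BARS[round((s - low) / span * top)] for s in scores)
```

A rising series ends on the tallest bar, while a plateau shows up as a run of equal-height bars.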

Simplified User-Facing Lifecycle

Users now see just "plan mode" and "execute mode" instead of internal phase names like workflow_postflight or triage_postflight. The communicate-score workflow step auto-resolves when no prior baseline exists, eliminating a confusing manual step on first use.

Bug Fixes

  • Force-rescan no longer resets subjective reviews — When --force-rescan ran during plan mode, the scan count increment caused subjective_review_completed_at_scan_count to go stale, re-queuing all 20 subjective reviews. The new carry_forward_subjective_review() promotes the marker when the old review matches the cycle being replaced.
  • Stale focus counts — status, next, and scan commands now show current focus counts instead of stale cached values. Closes #503.
  • Graph normalization sampling — check_all_graph_keys now inspects all keys for normalization, not just the first 3. Previously, graphs with only late-position abnormal keys could pass validation. Closes #502.
  • UTF-8 encoding for external tool reports — read_text() calls in jscpd_adapter.py, complexity.py, and test_coverage/io.py now specify encoding="utf-8" explicitly. On Windows, the system codepage default (cp1252) would crash when reports contained non-ASCII characters. Closes #505, reported by @pietrondo.
  • Tree-sitter CI stability — Spec tests now skip gracefully when grammar files aren't available, instead of failing the entire suite.
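
The graph normalization fix above can be illustrated with a short sketch that contrasts a sampled check with an exhaustive one. The normalization rule and function names here are hypothetical; only the "first 3 keys versus all keys" behavior comes from the notes.

```python
def is_normalized(key: str) -> bool:
    """Hypothetical rule: forward slashes only, no leading './'."""
    return "\\" not in key and not key.startswith("./")


def check_sampled(graph: dict, sample_size: int = 3) -> bool:
    """Old-style check: only the first few keys are inspected."""
    return all(is_normalized(k) for k in list(graph)[:sample_size])


def check_all(graph: dict) -> bool:
    """Exhaustive check: every key is inspected."""
    return all(is_normalized(k) for k in graph)


graph = {"a.py": [], "b.py": [], "c.py": [], "pkg\\d.py": []}
sampled_ok = check_sampled(graph)  # the bad key sits in position 4
full_ok = check_all(graph)
```

With the abnormal key in fourth position, the sampled check passes while the exhaustive check fails, which is the class of miss described in #502.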

Refactoring & Internal

  • Single-writer lifecycle enforcement — eliminated side-channel phase writes that could put the lifecycle into inconsistent states.
  • Legacy phase name removal — all fine-grained persisted phase names (review_initial, assessment_postflight, workflow_postflight, etc.) are now migrated to coarse plan/execute modes at load time.
  • Snapshot signal shaping — mode-aware suppression of postflight signals (assessment/workflow/triage/review) now lives in the caller (_phase_for_snapshot) rather than inside the shared delegation layer, keeping _derive_display_phase as a pure items-to-bools mapper.

Community

Thanks to @pietrondo for reporting the Windows UTF-8 encoding issue (#505).

v0.9.13

22 Mar 17:47

This release bundles all 11 skill overlays for one-step global install via desloppify setup, and fixes false positives across three detectors — orphaned, hardcoded_secret_name, and cycles — based on community reports.


31 files changed | 7 commits | 6,539 tests passing

Headline Feature

Bundled Skill Overlays with desloppify setup

All 11 agent skill files (Claude, Cursor, Copilot, Codex, Droid, Windsurf, Gemini, AMP, Hermes, OpenCode, Skill) are now bundled in the package and installed globally via desloppify setup. No network access needed — the command copies bundled docs to the right locations so agents discover desloppify across all projects. A pre-commit hook and make sync-docs target keep bundled copies in sync with docs/.

The README now points to desloppify setup as the primary install path, with update-skill as a per-project fallback.

Bug Fixes

  • Orphaned detector false positives — Files with __all__ exports are now recognized as intentional public API surfaces and excluded from orphan detection. Closes #496, reported by @Git-on-my-level.
  • hardcoded_secret_name false positives — Added entropy heuristic to filter non-secret values: field name constants (token_usage), sentinel strings, and label prefixes (agent_workspace@) are no longer flagged. Closes #496.
  • Cycles detector TYPE_CHECKING false positives — Imports inside if TYPE_CHECKING: blocks are now marked as deferred and excluded from cycle detection. From @Git-on-my-level's comment on #496.
  • Assessment crash on corrupted state — store_assessments now guards against raw int values in state with isinstance check before calling .get(). Closes #465, reported by @Vuk97.
  • fix_debug_logs negative-index corruption — Added lower-bound guard (start < 0) matching the pattern in all sibling fixers. Prevents silent wrong-line edits when entry["line"] is 0. Cherry-picked from PR #499 by @cpjet64.
  • .gitignore CLAUDE.md/AGENTS.md scope — Patterns now anchored to repo root (/CLAUDE.md) so subdirectory copies aren't ignored.
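
As a rough illustration of the entropy heuristic mentioned for hardcoded_secret_name, the sketch below computes Shannon entropy per character and applies a threshold. The thresholds and function names are assumptions chosen for the example, not the values the detector uses.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(value: str, min_length: int = 20, min_entropy: float = 4.0) -> bool:
    """Hypothetical filter: only long, high-entropy values are treated as secrets."""
    return len(value) >= min_length and shannon_entropy(value) >= min_entropy
```

Under these assumed thresholds, short identifier-like values such as token_usage or agent_workspace@ fall below the cutoff, while a long string of mostly distinct characters does not.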

Refactoring & Internal

  • Review pipeline improvements — Added bias-to-action guidance: confirmed bugs get implemented immediately instead of deferred. Stage 3 now collects open questions for the maintainer instead of guessing. Hard rule: decisions are presented for approval before execution.

Community

Thanks to @Git-on-my-level for the detailed false-positive reports across three detectors (#496), @Vuk97 for reproducing the assessment crash (#465) and language detection bug (#466), and @cpjet64 for the fix_debug_logs guard (PR #499).

v0.9.12

21 Mar 03:54

This release adds the strategist triage role for trend-aware strategic oversight, a desloppify setup command for one-step global skill installation, and a cluster of fixes that finally get stale subjective reviews unstuck from queue ordering issues. It also includes community-contributed fixes for Python src-layout projects, Windows encoding, and Knip hangs.


149 files changed | 21 commits | 6,485 tests passing

Headline Feature

Strategist Triage Role

The triage pipeline gains a new "strategist" stage that acts as a CEO-level overseer for the cleanup cycle. Instead of blindly trusting agent-reported trends, the strategist cross-checks score and debt trends against computed trajectory data, overriding mismatches. It can create high-priority strategy:: work items that jump to the front of the queue, and it supports an explicit confirmation gate (like other triage stages) so humans can review strategic assessments before they take effect.

The underlying ScoreTrajectory now tracks all-time highs from full scan history (not just the 5-scan window), introduces a "recovering" trend for scores that are improving but haven't reached their previous peak, and detects cross-cycle regression where a plateau masks a post-reset decline.
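
To make the "recovering" trend concrete, here is a minimal sketch that classifies a score history using a recent window plus the all-time high. The function name, tolerance, and every label other than "recovering" are assumptions; the actual ScoreTrajectory logic, including cross-cycle regression detection, is not reproduced here.

```python
def classify_trend(history: list[float], window: int = 5, tolerance: float = 0.5) -> str:
    """Classify the recent trend while consulting the full history for the peak."""
    if len(history) < 2:
        return "insufficient_data"
    recent = history[-window:]
    delta = recent[-1] - recent[0]
    all_time_high = max(history)  # full history, not just the window
    if delta > tolerance:
        # Rising, but still below the previous peak: recovering, not improving.
        return "recovering" if recent[-1] < all_time_high else "improving"
    if delta < -tolerance:
        return "declining"
    return "plateau"
```

The peak is taken from the whole history on purpose: a score climbing from 70 to 80 looks like plain improvement inside a 5-scan window even if an earlier scan reached 90.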

Other Features

desloppify setup Command

New command: pip install desloppify && desloppify setup. Copies bundled skill definitions to ~/.claude/ and ~/.cursor/ without network access, so agents discover desloppify globally across all projects. Per-project installs remain via update-skill.

Subjective Anti-Gaming Policy Disabled

The integrity policy that zeroed dimension scores when they converged near the target was producing false positives — blind-packet subagent reviews have no way to anchor to target scores, so legitimate convergence was being penalized. The policy now passes assessments through unchanged. It can be re-enabled later if a better detection strategy emerges.

Bug Fixes

  • Stale subjective reviews blocked by queue state — Three related fixes ensure stale reviews are always detected and injected regardless of whether mechanical items remain in the queue. Previously, the live_planned_queue_empty guard blocked all reconciliation when objective items existed, leaving review dimensions perpetually stale.
  • Phase ordering for stale reviews — Stale subjective reviews now take priority over non-critical workflows (communicate score) and triage items. Pre-review workflows (deferred disposition, run scan, import scores) still jump ahead of everything.
  • Force-rescan stale review injection — Force-rescan now correctly injects stale reviews by bypassing both queue-empty guards and the cycle-just-completed deferral logic. Also adds _refresh_plan_start_baseline() that reseeds scores without clearing workflow sentinels.
  • Clusters not marked done — Clusters stayed active even when all their items were resolved, causing completed work to reappear after rescan. Now sets execution_status="done" on completion and sweeps active clusters during reconciliation.
  • Resolved items superseded from clusters — _supersede_dead_references was stripping resolved/fixed/wontfix items from clusters, making completed clusters appear incomplete. Now only supersedes items that are truly gone from state.
  • Python src-layout test coverage — resolve_import_spec now tries src/-prefixed candidates, and module name computation strips the src/ prefix. Fixes false "transitive_only" reports for PEP 621 src-layout projects. Cherry-picked from PR #489 by @AreboursTLS.
  • Python multi-line import regex — PY_IMPORT_RE now handles parenthesized imports (from pkg import (\n name, ...)), fixing the root cause of false transitive-only coverage reports.
  • Test coverage graph supplementation — Source parsing now always runs as a supplement to the import graph, catching submodule imports the graph resolves to __init__.py instead of the actual file.
  • Knip hang on missing dependency — Added stdin=subprocess.DEVNULL and --yes flag to prevent npx from blocking on interactive prompts when Knip is not a local dependency. Closes #494, reported by @goobsnake.
  • Windows UTF-8 encoding in review runner — Explicit encoding="utf-8", errors="replace" on all file reads in the review pipeline. Prevents charmap decode errors when Codex runners emit UTF-8 on Windows. Cherry-picked from PR #495 by @pietrondo.
  • Force-rescan queue-empty guard bypass — reconcile_plan() was double-guarded by queue emptiness, preventing stale review injection when any objective items remained.
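
The Windows encoding fix above comes down to passing explicit arguments to the file read. A minimal sketch, with a hypothetical helper name:

```python
from pathlib import Path


def read_report(path: Path) -> str:
    """Read a tool report as UTF-8 regardless of the platform default codepage.

    errors="replace" turns undecodable bytes into U+FFFD instead of raising
    UnicodeDecodeError, so one bad byte cannot abort the whole read.
    """
    return path.read_text(encoding="utf-8", errors="replace")
```

Without the encoding argument, Path.read_text() falls back to the locale's preferred encoding, which is how a cp1252 default on Windows ends up decoding UTF-8 output incorrectly or failing outright.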

Refactoring & Internal

  • Skill doc improvements — Tightened skill descriptions to reduce false activations on generic programming questions. Added explicit "run next after scan" instruction so agents follow the tool's workflow instead of interpreting scan output themselves.
  • Review pipeline results — Stage 1/2/3 assessments for PRs #495, #493, #489, #189 and issues #494-#490. Backfilled Stage 2 files for older items.
  • Setup command scope trim — Removed --local mode and global skill discovery integration from the initial implementation, keeping the command focused on a single responsibility.

Community

Thanks to @AreboursTLS for the Python src-layout test coverage fix (PR #489), @pietrondo for the Windows UTF-8 encoding fix (PR #495), and @goobsnake for reporting the Knip hang issue (#494).

v0.9.11

19 Mar 00:18

This release adds the progression log — an append-only lifecycle event timeline that gives AI agents persistent memory across cycles — along with a batch of bug fixes from the community covering cross-platform issues, regex safety, serialization crashes, and triage quality.


62 files changed | 18 commits | 5,495 tests passing

Progression Log

The biggest addition in this release is .desloppify/progression.jsonl — an append-only event log that records lifecycle boundary events as they happen. Each line is a self-contained JSON object with a discriminated event_type, timestamps, scores, and a structured payload.

The problem this solves: desloppify's lifecycle is a loop (execute → scan → review → triage → execute), and until now there was no persistent record of what happened at each boundary. The scan history had the last 20 scans, the execution log had plan actions, and query.json was ephemeral. A CEO agent guiding triage had no way to answer "what improved since last cycle?" without reconstructing it from scattered sources.

The progression log records 7 boundary events:

  • scan_preflight — gate decision (allowed/blocked/bypassed) with queue state
  • scan_complete — scores, dimension deltas, scan diff, execution summary with resolved/skipped IDs, suppression metrics
  • postflight_scan_completed — scan marker flip for the current cycle
  • subjective_review_completed — reviewer observations: which dimensions were covered, evidence summaries, new review issue IDs and summaries, import provenance
  • triage_complete — strategy summary, cluster names and theses, verdict counts, organized/total counts
  • entered_planning_mode — phase transition into any planning phase, with trigger
  • execution_drain — queue drained via resolve or workflow resolve, with scores at drain

Key design decisions:

  • Timestamps as join keys — events carry enough summary for quick reads, but timestamps let a reader query state.json and the plan's execution_log for full detail. The progression log is a timeline index, not a data copy.
  • Idempotent marker triggers — events fire on mark_postflight_scan_completed() and mark_subjective_review_completed() returning True. These are idempotent per scan_count, so no double-fire.
  • Best-effort, never break parent — all hooks are wrapped in try/except. Advisory file locking with 2s timeout. On lock failure, appends without the lock and logs a warning.
  • Corruption-resilient reads — corrupt JSONL lines are skipped with a warning; the file is never erased. Periodic trim keeps it under 2000 lines.
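
The append-only and corruption-resilient behavior described above can be sketched as follows. The record keys other than event_type and the helper names are assumptions, and file locking and periodic trimming are deliberately left out to keep the example short.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_event(log_path: Path, event_type: str, payload: dict) -> None:
    """Append one self-contained JSON object as a single line."""
    record = {
        "event_type": event_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def read_events(log_path: Path) -> list[dict]:
    """Return all parseable events; corrupt lines are skipped, never erased."""
    events: list[dict] = []
    if not log_path.exists():
        return events
    for line in log_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real implementation would log a warning here
        if isinstance(record, dict):
            events.append(record)
    return events
```

Since every line stands alone, a half-written or damaged line costs one event rather than the whole timeline.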

Triage: Stop Obsessing Over Test Coverage

Multiple users reported the tool pushes test writing over actual code cleanup. The root cause: triage LLMs promote test_coverage clusters because they appear first (sorted by issue count) with no guidance to deprioritize.

Changes:

  • Triage prompt now explicitly says: clean up code quality BEFORE test coverage — writing tests for sloppy code locks in the slop
  • Added defer action type for auto-clusters (keeps in backlog for later cycles)
  • Scan coaching and catalog guidance reworded to "review coverage gaps" instead of "add tests"

Bug Fixes

  • Phantom cluster membership — action_step refs are traceability metadata, not membership. Merging them into issue_ids caused bare shorthand IDs from triage runners to become phantom cluster members that reappear after every reconcile cycle.
  • Update-skill duplicate detection — raises CommandError when skill content is already present but begin/end markers are missing, preventing silent duplicate appends.
  • Generic fixer crash — four autofix pipeline sites assumed FixResult entries always have a removed key. Generic fixers (e.g., eslint-warning) return {file, line} or {file, fixed} without it. Cherry-picked from PR #484 by @AugusteBalas.
  • JSON serialization crash — EcosystemFrameworkDetection dataclass instances leaked into review_cache via shared dict references, causing TypeError on state serialization. Adds a dataclass handler to json_default. Bug identified by @0-CYBERDYNE-SYSTEMS-0 in PR #486.
  • Synthetic ID deferred skip loop — synthetic queue IDs (workflow::*, triage::*) in the skipped dict caused phantom deferred-disposition items. Cherry-picked from PR #485 by @ryexLLC.
  • Dart regex catastrophic backtracking — the annotation sub-pattern had overlapping whitespace consumption between a character class and trailing \s*, wrapped in ()*, causing exponential backtracking. Possessive quantifiers prevent it. Cherry-picked from PR #477 by @AvoMandjian.
  • Framework cache state bloat — framework detection wrote dataclass objects into review_cache, which shares a dict reference with persisted state. New runtime_cache field separates ephemeral scan-scoped data from persisted review state. Cherry-picked from PR #483 by @maciej-trebacz.
  • Zone reclassification stale issues — when zone rules change (e.g., adding JS test patterns), existing open issues for reclassified files now auto-resolve instead of persisting forever. Bug identified by @claytona500 in PR #478.
  • macOS SSL for update-skill — uses certifi CA bundle instead of the system cert store, which on Homebrew Python often has no CA certificates. Closes #468, reported by @Vuk97.
  • Windows font fallbacks — scorecard image generator now has Consolas, Georgia, Segoe UI, and Arial as Windows fallbacks instead of falling through to Pillow's bitmap font.
  • Windows .exe process execution — _resolve_executable() no longer wraps .exe binaries in cmd /c, which caused \" escaping errors when prompts contain spaces. Closes #487, reported by @Dteyn.
  • Snyk false positive — renamed session_token placeholder to session_hmac in review JSON examples to avoid Snyk W007 credential-detection heuristic. Closes #473, reported by @mark-major.
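
The JSON serialization crash in the list above is a general Python pitfall: json.dumps raises TypeError on dataclass instances unless a default handler converts them. A minimal sketch of such a handler, using a hypothetical stand-in dataclass and handler name rather than the project's own:

```python
import dataclasses
import json


@dataclasses.dataclass
class FrameworkDetection:
    """Hypothetical stand-in for a detection result stored in a cache."""
    name: str
    confidence: float


def json_default_sketch(obj):
    """Fallback for json.dumps: convert dataclass instances to plain dicts."""
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


state = {"framework": FrameworkDetection(name="nextjs", confidence=0.9)}
serialized = json.dumps(state, default=json_default_sketch)
```

A handler like this treats the symptom. Keeping ephemeral objects out of persisted state, as the runtime_cache change does, addresses the cause.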

R Code Smell Detectors

Contributed by @sims1253 (PR #450). Ten R-specific smell checks detecting common anti-patterns, including setwd(), <<- global assignment, attach(), rm(list=ls()), debug leftovers, T/F ambiguity, 1:n() off-by-one risk, deprecated stringsAsFactors, and library() inside functions. The library_in_function check uses tree-sitter when available with a regex fallback.

Also extends the generic language framework with custom_phases support — any generic plugin can now inject language-specific detector phases without converting to a full plugin. 22 tests covering detection, false positive suppression, and edge cases.

Documentation

  • Work queue README — explains how items flow from scan to execution queue, why test_coverage dominates pre-triage ordering, that tier is display-only metadata, and the full sort order with filter chain
  • Main README process overview — new "How it works" section explaining the scan → score → review → triage → execute loop
  • Docs reorganization — internal documentation (DEVELOPMENT_PHILOSOPHY, QUEUE_LIFECYCLE, ci_plan) and release infrastructure moved from docs/ to dev/. Website separated as its own repo.

Community

Cherry-picks and bug reports from @AugusteBalas (generic fixer crash), @AvoMandjian (Dart regex DoS), @maciej-trebacz (framework cache bloat), @ryexLLC (synthetic ID loop), @0-CYBERDYNE-SYSTEMS-0 (dataclass serialization), @claytona500 (zone reclassification), @Vuk97 (macOS SSL), @Dteyn (Windows .exe execution), and @mark-major (Snyk false positive).

@sims1253 continues to drive R language support forward — this is the fifth R-focused PR across the last few releases, and the custom_phases framework extension benefits every language plugin.

Thank you all — every report and PR made this release better.

v0.9.10

16 Mar 21:55

This release adds experimental Hermes Agent integration for fully autonomous cleanup loops, framework-aware detection with a full Next.js spec, SCSS language support, significant R language improvements, and a scan performance boost from detector prefetch + caching — alongside a batch of bug fixes from the community.


152 files changed | 54 commits | 5,466 tests passing

Hermes Agent Integration (Experimental)

We've been exploring what it looks like when a codebase health tool can actually drive an AI agent — not just generate reports, but orchestrate the entire cleanup loop autonomously. This release ships our first experimental integration with Hermes Agent.

The core idea: desloppify already knows what needs to be done (scan, triage, review, fix). Instead of printing instructions for a human, it can now tell the agent directly — switch to a cheap model for mechanical fixes, switch to an expensive one for architectural review, reset context between tasks, and keep the agent working via /autoreply, all without a human in the loop.

What the integration enables:

  • Autonomous review loops — desloppify orchestrates blind reviews via delegate_task subagents (up to 3 concurrent), no human needed
  • Model switching at phase boundaries — cheap models for execution, expensive for planning/review, switched automatically
  • Context management — automatic resets between tasks to keep the agent focused on long sessions
  • Lifecycle transitions — desloppify tells Hermes what to do next via the Control API

How to try it

This requires the Control API branch of Hermes (NousResearch/hermes-agent#1508), which hasn't been merged upstream yet. Without it, Hermes works as a normal harness but can't do autonomous model switching or self-prompting.

Copy-paste this into your agent (it handles both fresh installs and existing Hermes setups):

Install desloppify with Hermes Control API support. The Control API lives on a fork branch until NousResearch/hermes-agent#1508 merges upstream.

1. If Hermes is already installed (check with `which hermes`), switch it to the Control API branch:

   cd $(pip show hermes-agent 2>/dev/null | grep -i location | cut -d' ' -f2)/hermes-agent
   git remote add peteromallet https://github.com/peteromallet/hermes-agent.git 2>/dev/null || true
   git fetch peteromallet
   git checkout peteromallet/feat/control-api-autoreply
   pip install -e .

   IMPORTANT: If Hermes is currently running, tell the user they need to restart it with `HERMES_CONTROL_API=1 hermes` after this step — the new branch won't take effect until they do.

   If Hermes is NOT installed, clone fresh:

   git clone -b feat/control-api-autoreply https://github.com/peteromallet/hermes-agent.git
   cd hermes-agent
   pip install -e .

2. Install desloppify and set up the skill doc:

   pip install desloppify[full]
   cd /path/to/your/project
   desloppify update-skill hermes

3. Start Hermes with the Control API enabled:

   cd /path/to/your/project
   HERMES_CONTROL_API=1 hermes

4. In the Hermes session, run:

   Run desloppify scan, then follow its coaching output to clean up the codebase.

Desloppify will guide Hermes through the full lifecycle — scanning, triaging, blind reviews with subagents, and fixing. It switches models and resets context automatically at phase boundaries.

This is experimental and we're iterating fast. We'd love feedback on the approach, rough edges, and what you'd want to see next. If you try it, please open an issue — every report helps.

Framework-Aware Detection

Massive contribution from @MacHatter1 (PR #414). A new FrameworkSpec abstraction layer for framework-specific detection, shipping with a full Next.js spec that understands App Router conventions, server components, use client/use server directives, and Next.js-specific lint rules. This means dramatically fewer false positives when scanning Next.js projects — framework idioms are recognized, not flagged. The spec system is extensible, so adding support for other frameworks (Remix, SvelteKit, etc.) is now a matter of writing a spec, not changing the engine.

SCSS Language Plugin

Thanks to @klausagnoletti for adding SCSS/Sass support via stylelint integration (PR #428). Detects code smells, unused variables, and style issues in .scss and .sass files. @klausagnoletti has also submitted a follow-up PR (#452) with bug fixes, tests, and honest documentation — expected to land shortly after release.

Plugin Tests, Docs, and Ruby Improvements

@klausagnoletti also contributed across multiple language plugins:

  • Ruby plugin improvements (PR #462) — expanded exclusions, detect markers (Gemfile, Rakefile, .ruby-version, *.gemspec), default_src="lib", spec/ + test/ support, and 13 wiring tests. Also adds external_test_dirs and test_file_extensions params to the generic plugin framework.
  • JavaScript plugin tests + README (PR #458) — 12 sanity tests covering ESLint integration, command construction, fixer registration, and output parsing.
  • Python plugin README (PR #459) — user-facing documentation covering phases, requirements, and usage.

R Language Improvements

@sims1253 has been steadily building out R support and contributed four PRs to this release:

  • Jarl linter with autofix support (PR #425) — adds a fast R linter as an alternative to lintr
  • Shell quote escaping fix for lintr commands (PR #424) — prevents command injection on paths with special characters
  • Tree-sitter query improvements (PR #449) — captures anonymous functions in lapply/sapply calls and pkg::fn namespace imports
  • Factory Droid harness support (PR #451) — adds Droid as a new skill target, following the existing harness pattern exactly

Scan Performance: Detector Prefetch + Cache

Another big one from @MacHatter1 (PR #432). Cold and full scan times reduced significantly. Detectors now prefetch file contents and cache results across detection phases, avoiding redundant I/O. On large codebases this is a noticeable improvement.
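
The general idea of prefetching file contents and sharing them across detection phases can be sketched as below. The class and its counter are hypothetical and only illustrate how redundant I/O is avoided; they are not the implementation from PR #432.

```python
from pathlib import Path
from typing import Iterable


class FileContentCache:
    """Hypothetical scan-scoped cache: each file is read from disk at most once."""

    def __init__(self) -> None:
        self._contents: dict[Path, str] = {}
        self.disk_reads = 0

    def prefetch(self, paths: Iterable[Path]) -> None:
        """Warm the cache before any detector phase runs."""
        for path in paths:
            self.get(path)

    def get(self, path: Path) -> str:
        """Return cached content, reading from disk only on the first request."""
        if path not in self._contents:
            self._contents[path] = path.read_text(encoding="utf-8", errors="replace")
            self.disk_reads += 1
        return self._contents[path]
```

Each detector phase would then call get() instead of opening files itself, so the number of disk reads stays at one per file however many phases run.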

Lifecycle & Triage

  • Lifecycle transition messages — the tool now tells agents what phase they're in and what to do next, with structured directives for each transition
  • Unified triage pipeline with step detail display
  • Staged triage now requires explicit decisions for auto-clusters before proceeding — no more accidentally skipping triage steps

Bug Fixes

  • Binding-aware unused import detection for JS/TS — @MacHatter1 (PR #433). No longer flags imports used via destructuring, as renames, or re-export patterns. This was a significant source of false positives in real JS/TS projects.
  • Rust dep graph hangs — @fluffypony (PR #429). String literals that look like import paths (e.g., "path/to/thing") no longer cause the dependency graph builder to hang. @fluffypony also contributed Rust inline-test filtering (PR #440), which prevents #[cfg(test)] diagnostic noise from inflating production debt scores.
  • Project root detection (PR #439) — fixed cases where the project root was derived incorrectly, plus force-rescan now properly wipes stale plan data, and manual clusters are visible in triage.
  • workflow::create-plan re-injection — @cdunda-perchwell (PR #435). Resolved workflow items no longer reappear in the execution queue after reconciliation. @cdunda-perchwell also identified the related communicate-score cycle-boundary sentinel issue (#447, fix in PR #448).
  • PHPStan parser fixes — @nickperkins (PR #420). stderr output and malformed JSON from PHPStan no longer crash the parser. Clean, focused fix.
  • Preserve plan_start_scores during force-rescan — manual clusters are no longer wiped when force-rescanning.
  • Import run project root — --scan-after-import now derives the project root correctly from the state file path.
  • Windows codex runner (PR #453) — proper cmd /c argument quoting + UTF-8 log encoding for Windows. Reported by @DenysAshikhin.
  • Scan after queue drain (PR #454) — score_display_mode now returns LIVE when queue is empty, fixing the UX contradiction where next says "run scan" but scan refuses. Reported by @kgelpes.
  • SKILL.md cleanup (PR #455) — removes unsupported allowed-tools frontmatter, fixes batch naming inconsistency (.raw.txt not .json), adds pip fallback alongside uvx. Three issues all reported by @willfrey.
  • Batch retry coverage gate (PR #456) — partial retries now bypass the full-coverage requirement instead of being rejected. Reported by @imetandy.
  • R anonymous function extraction (PR #461) — the tree-sitter anonymous function pattern from PR #449 now actually works (extractor handles missing @name capture with <anonymous> fallback).

Community

This release wouldn't exist without the community. Seriously — thank you all.

@MacHatter1 delivered three major PRs (framework-aware detection, detector prefetch + cache, binding-aware unused imports) that each individually would have been a headline feature. The framework spec system in particular opens up a whole new category of detection accuracy.

@fluffypony contributed both the Rust dep graph hang fix and the inline-test filtering — the latter being 1,000+ lines of carefully tested Rust syntax parsing with conservative cfg predicate handling and thorough edge-case coverage.

@sims1253 has been the driving force behind R language support, with four PRs spanning linting, tree-sitter queries, and harness support. The R plugin is becoming genuinely useful thanks to this sustained effort.

@klausagnoletti added SCSS support, imp...

v0.9.9

13 Mar 19:47
9c233a4

This release focuses on plan lifecycle robustness — fixing workflow deadlocks, auto-resolving stale issues, hardening the reconciliation pipeline, and replacing heuristics with explicit cluster semantics. It also includes C++ detector scoping improvements from a community contributor and several UX fixes that prevent agents from getting stuck mid-cycle.


366 files changed | 16 commits | 5,367 tests passing

Refactoring & Internal Cleanup

This release continues the pattern of tightening seams and reducing indirection across the codebase. Over half the 366 changed files are internal restructuring:

  • Cluster and override → subpackages — cluster_ops_display.py, cluster_ops_manage.py, cluster_ops_reorder.py, cluster_update.py, and cluster_steps.py moved into a cluster/ subpackage. Same treatment for override_io.py, override_misc.py, override_skip.py, and override_resolve_* into override/.
  • Holistic cluster accessors inlined — ~8 small wrapper files in context_holistic/ deleted (_clusters_complexity.py, _clusters_consistency.py, _clusters_dependency.py, _clusters_security.py, etc.) and inlined into their callers
  • Plan sync pipeline extracted — new sync/pipeline.py and sync/phase_cleanup.py pulled out of the monolithic workflow, with reconcile.py renamed to scan_issue_reconcile.py and review import reconcile moved into sync/review_import.py
  • Issue semantics centralized — new issue_semantics.py (~225 lines) consolidating classification logic that was previously scattered across multiple modules
  • Plan reconcile simplified — scan/plan_reconcile.py cut from ~470 lines to ~200 by extracting shared logic into the engine layer
  • Work queue snapshot overhaul — snapshot.py gained ~470 lines of phase-aware partitioning and ranking refinements, replacing ad-hoc ordering logic
  • TS dead code removed — helpers_blocks.py and helpers_line_state.py deleted (~200 lines of unused smell detection helpers)
  • Broad type/schema updates — issue type references and state schema types updated across 130+ files for consistency with the new issue semantics

Auto-Resolve Issues for Deleted Files

When a scan runs and a previously-flagged file no longer exists on disk, its open issues are now automatically set to auto_resolved with a clear note. Previously, issues for deleted files would remain open and pollute the work queue indefinitely — particularly painful in Rust projects where module reorganization is common. Closes #412.
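
A minimal sketch of the auto-resolve rule, assuming issues are plain dicts with file and status keys. That shape, the note text, and the function name are assumptions for illustration; only the auto_resolved status value comes from the notes.

```python
from pathlib import Path


def auto_resolve_missing(issues: list[dict], root: Path) -> int:
    """Mark open issues as auto_resolved when their file is gone from disk.

    Returns the number of issues that were resolved.
    """
    resolved = 0
    for issue in issues:
        if issue.get("status") != "open":
            continue  # leave already-resolved or wontfix issues untouched
        if not (root / issue["file"]).exists():
            issue["status"] = "auto_resolved"
            issue["note"] = "file no longer exists on disk"
            resolved += 1
    return resolved
```

Running this once per scan keeps the work queue limited to issues that can still be acted on.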

Triage Deadlock Fix

Fixed a deadlock where triage was stale (new review issues arrived mid-cycle), but triage couldn't start because objective backlog was still open, and objective resolves were blocked because triage was stale. The fix detects this "pending behind objective backlog" state and allows objective work to continue while keeping review resolves gated. The banner now shows TRIAGE PENDING instead of nudging toward a triage command that can't run yet. Community contribution from @imetandy (#413).

Batch Runner Stall Detection Fix

The review batch runner's stall detector was prematurely killing codex batches during their initialization phase — before any output file was written. This caused --import-run to fail with "missing result files for batches" errors. The stall detector now never declares a stall when no output file exists yet, while the hard timeout still catches truly hung batches. Closes #417 and #401.
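A sketch of the corrected check, assuming stalls are judged from the output file's modification time. The function name, parameters, and status strings are hypothetical; the actual runner may track progress differently.

```python
import os


def batch_status(output_path, started_at, now, stall_seconds, hard_timeout_seconds):
    """Classify a running batch as "running", "stalled", or "timed_out".

    All times are seconds on the same clock as os.path.getmtime().
    """
    if now - started_at >= hard_timeout_seconds:
        return "timed_out"  # the hard timeout applies even with no output
    if not os.path.exists(output_path):
        return "running"  # still initializing: never declare a stall yet
    last_write = os.path.getmtime(output_path)
    if now - last_write >= stall_seconds:
        return "stalled"  # output exists but has not changed recently
    return "running"
```

Checking the hard timeout first is what keeps a batch that never writes any output from running forever.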

Sequential Reconciliation Pipeline

Fixes a cluster tracker race condition on parallel updates. A new shared reconciliation pipeline runs all sync steps sequentially: subjective dimensions, auto-clustering, score communication, plan creation, triage, and lifecycle phase. This replaces the previous approach where parallel operations could produce inconsistent plan state.
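As a rough illustration of the sequential approach, the sketch below applies one step at a time in a fixed order, so each step sees the plan left by the previous one. The step identifiers are slugs derived from the list above, and `run_pipeline` is a hypothetical name, not the project's actual entry point.

```python
# Step order as listed in the release note; the identifiers are assumed slugs.
STEP_ORDER = (
    "subjective_dimensions",
    "auto_clustering",
    "score_communication",
    "plan_creation",
    "triage",
    "lifecycle_phase",
)


def run_pipeline(plan, steps):
    """Apply every step in STEP_ORDER, threading the plan through each one.

    `steps` maps a step name to a callable that takes and returns a plan.
    """
    for name in STEP_ORDER:
        plan = steps[name](plan)  # no step starts before the previous one returns
    return plan
```

Because there is a single writer at any moment, two steps cannot observe or produce conflicting plan state.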

Explicit Cluster Semantics

Clusters now carry explicit action_type (auto_fix, refactor, manual_fix, reorganize) and execution_policy (ephemeral_autopromote, planned_only) rather than relying on command-string sniffing. A new cluster_semantics.py module provides canonical semantic helpers, and the work queue uses these for phase-aware ordering instead of inferring intent from command strings.
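One way such explicit semantics might be modelled is with enums, shown here as a sketch. The enum values are the ones named above; the class names, the `Cluster` dataclass, and the `is_autopromoted` helper are assumptions and may not match what cluster_semantics.py actually defines.

```python
from dataclasses import dataclass
from enum import Enum


class ActionType(str, Enum):
    AUTO_FIX = "auto_fix"
    REFACTOR = "refactor"
    MANUAL_FIX = "manual_fix"
    REORGANIZE = "reorganize"


class ExecutionPolicy(str, Enum):
    EPHEMERAL_AUTOPROMOTE = "ephemeral_autopromote"
    PLANNED_ONLY = "planned_only"


@dataclass(frozen=True)
class Cluster:
    name: str
    action_type: ActionType
    execution_policy: ExecutionPolicy


def is_autopromoted(cluster):
    """Read intent from the explicit field instead of parsing a command string."""
    return cluster.execution_policy is ExecutionPolicy.EPHEMERAL_AUTOPROMOTE
```

Constructing an enum from an unknown string raises `ValueError`, so a misspelled action type fails loudly instead of being silently misclassified.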

C++ Detector Scoping Improvements

Four targeted fixes to the C++ plugin, contributed by @Dragoy (#415):

  • Security findings scoped to first-party files — clang-tidy and cppcheck findings from vendor/external headers are now filtered out instead of being reported as project issues
  • CMake-based test coverage mapping — CMakeLists.txt files are parsed for add_executable/add_library/target_sources to discover which source files a test target compiles, treating that as direct test coverage
  • Unused-imports phase disabled for C++ — the generic tree-sitter unused-import detector is unsound for #include semantics and now skips C++ projects
  • Header extension support — _extract_import_name now handles .h, .hh, .hpp extensions correctly
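The CMake-based mapping in the list above could be approximated with a regular expression, as in this deliberately simplified sketch. The function name, the keyword set, and the suffix list are assumptions; the plugin's real parser is not shown in these notes and is likely more thorough.

```python
import re

# Matches add_executable(...), add_library(...) and target_sources(...) calls.
_TARGET_RE = re.compile(
    r"\b(add_executable|add_library|target_sources)\s*\(([^)]*)\)",
    re.IGNORECASE,
)
# CMake keywords that can appear among the arguments but are not source files.
_KEYWORDS = {
    "PRIVATE", "PUBLIC", "INTERFACE", "STATIC", "SHARED",
    "MODULE", "OBJECT", "EXCLUDE_FROM_ALL", "WIN32", "MACOSX_BUNDLE",
}
_SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".cxx", ".h", ".hh", ".hpp")


def sources_by_target(cmake_text):
    """Map each target name to the source files listed for it."""
    targets = {}
    for match in _TARGET_RE.finditer(cmake_text):
        tokens = match.group(2).split()
        if not tokens:
            continue
        name, rest = tokens[0], tokens[1:]
        sources = [
            tok for tok in rest
            if tok not in _KEYWORDS and tok.endswith(_SOURCE_SUFFIXES)
        ]
        targets.setdefault(name, []).extend(sources)
    return targets
```

This sketch ignores variables such as `${SOURCES}`, quoted arguments containing spaces, comments, and generator expressions, all of which a production parser would need to handle.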

Flexible Triage Attestations

Triage attestation validation for organize, enrich, and sense-check stages no longer requires literal cluster name references. Users can now provide substantive work-product descriptions as an alternative, making the triage workflow less rigid for both human and AI operators.

Triage Validation & Sense-Check Enhancements

  • Sense-check stage gets a dedicated orchestrator with expanded prompts and evidence parsing
  • Triage completion policy significantly enhanced with richer stage validation
  • Stage prompt instruction blocks expanded for clearer agent guidance
  • Evidence parsing extracted into a dedicated module

Other Improvements

  • .gitignore reminder added to README setup instructions (#416)
  • PyPI publish workflow push triggers restored while maintaining the main-branch gate
  • Tweet release tests now properly stub the requests module for CI isolation

Community

Thanks to @imetandy for the triage deadlock fix and @Dragoy for the C++ detector scoping improvements. Issues and feedback from @guillaumejay, @wuurrd, @astappiev, @efstathiosntonas, @xliry, @kendonB, @WojciechBednarski, and @jakob1379 helped shape this release.

v0.9.8

12 Mar 22:25

This release adds full C++ and Rust language plugins, introduces two-phase review scoring, unified issue lifecycle status, anti-gaming safeguards, and delivers extensive triage validation, work queue, and cross-platform improvements — alongside continued code quality cleanup that removes 23 compat wrappers and tightens seams throughout the codebase.


609 files changed | 78 commits | 5,266 tests passing

C++ Language Support

C++ is now a full-depth language plugin with tree-sitter-based extraction, structural analysis, and tool-backed security scanning. The plugin includes:

  • Function/class/include extraction from C++ source files
  • Dependency graph analysis via #include graphs from compile_commands.json and Makefile projects
  • Structural and coupling phases with cppcheck integration and batch issue scanning
  • Security detection with normalized findings from cppcheck and clang-tidy
  • Review surfaces, test coverage hooks, and move support
  • 14 test files with fixtures for CMake and Makefile sample projects

Rust Language Support

Rust gains full plugin parity with 13 Rust-specific detectors, 3 auto-fixers, and deep cargo toolchain integration:

  • 13 detectors across 6 modules: API surface, cargo policy, safety, smells, dependencies, and custom rules
  • 3 auto-fixers: crate imports, cargo features, readme doctests
  • Cargo tool integration: clippy, cargo check, rustdoc
  • Rust-aware dependency graphing and test coverage mapping with inline #[cfg(test)] recognition
  • 117 tests across 12 test files

Two-Phase Review Scoring

Holistic review is restructured into two distinct phases:

  • Phase 1 — Observe: collect characteristics and defects without scoring
  • Phase 2 — Judge: synthesize dimension character from observations, then score

Positive observations now persist as context insights with full provenance (added_at, source, positive: true), replacing ephemeral strengths. A new context_schema.json defines the review data framework.

Unified Issue Lifecycle Status

DEFERRED and TRIAGED_OUT are added to the Status enum so state is always authoritative for issue disposition. Previously, temporary and triaged-out skips left issue.status as "open", causing overcounting in plan rendering and queue surfaces. The change includes status migration on scan reconcile, new status icons, and updated plan rendering. It also surfaces history to reviewers, with --retrospective set to True by default.
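To see why dedicated statuses fix the overcounting, consider this small sketch. The `Status` members shown are only the ones mentioned above plus an open state; their string values and the `count_open` helper are assumptions, and the real enum has other members not listed here.

```python
from enum import Enum


class Status(str, Enum):
    OPEN = "open"
    DEFERRED = "deferred"        # assumed string value
    TRIAGED_OUT = "triaged_out"  # assumed string value


def count_open(issues):
    """Count issues that are genuinely open, trusting status alone."""
    return sum(1 for issue in issues if issue["status"] == Status.OPEN)
```

With skips recorded in the status field itself, the count needs no side lookup into a separate skip list.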

Anti-Gaming Safeguards

Two targeted fixes prevent AI agent score-anchoring:

  • Numeric target redacted from penalty messages — "matched target 95.0" replaced with "clustered on the scoring target" so agents cannot infer and anchor on the exact number
  • Blind-review workflow surfaced in the very first penalty message (previously only after streak ≥ 2), pointing agents to the blind packet and overlay docs immediately

Triage Validation Overhaul

  • Reflect dispositions are now binding for organize — a structured ReflectDisposition dataclass is parsed from the Coverage Ledger; organize validates that plan state matches every reflected disposition before submission
  • Organize validation extracted into dedicated organize_policy.py, separated from batch context normalization
  • Suggestion and evidence surfaced in show and cluster commands
  • Completion flow, observe batches, and stage queue gained new focused submodules

Work Queue Decomposition

The monolithic _work_queue/core.py and lifecycle.py are split into 5 focused modules (models.py, inputs.py, selection.py, finalize.py, snapshot.py). The engine/plan_queue.py facade is deleted. A canonical QueueSnapshot provides phase-aware partitioning.

Cross-Platform Hardening

  • Cross-platform state locking: fcntl on Unix, msvcrt on Windows for atomic state file persistence
  • Windows tool argv parsing: generic tool commands now execute correctly on Windows
  • Windows WinError 2 fix: codex exec spawning uses shutil.which() to resolve .cmd batch shims
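The state-locking item above names fcntl and msvcrt; the sketch below shows one common way to combine them behind a single context manager. The helper names, the sidecar `.lock` file, and the temp-file-then-replace write are assumptions for illustration, not the project's confirmed implementation.

```python
import json
import os
import sys
from contextlib import contextmanager

if sys.platform == "win32":
    import msvcrt

    def _lock(handle):
        handle.seek(0)
        # Locks one byte at offset 0; LK_LOCK retries for a while, then raises OSError.
        msvcrt.locking(handle.fileno(), msvcrt.LK_LOCK, 1)

    def _unlock(handle):
        handle.seek(0)
        msvcrt.locking(handle.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def _lock(handle):
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX)  # blocks until acquired

    def _unlock(handle):
        fcntl.flock(handle.fileno(), fcntl.LOCK_UN)


@contextmanager
def state_lock(lock_path):
    """Hold an exclusive lock on a sidecar lock file for the duration of the block."""
    with open(lock_path, "a+") as handle:
        _lock(handle)
        try:
            yield
        finally:
            _unlock(handle)


def save_state(path, state):
    """Write state to a temp file, then swap it into place while holding the lock."""
    tmp_path = f"{path}.tmp"
    with state_lock(f"{path}.lock"):
        with open(tmp_path, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp)
        os.replace(tmp_path, path)  # replaces the destination in a single step
```

Locking a sidecar file rather than the state file itself keeps the lock valid across the replace, since the state file's inode changes on every save.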

Code Quality & Cleanup

  • 23 compat wrapper files deleted — 13 _framework/ wrappers, 8 context_holistic/ wrappers, 2 helpers/ wrappers — plus removal of SimpleNamespace fake-module antipatterns
  • Go generated files now skipped during scanning, with improved import-run error messages
  • Scan export and scoring impact crash paths fixed
  • Rust workspace rustdoc execution repaired
  • Extensive seam-tightening: holistic review prep, triage validation, scan sync workflow, language framework surface, tree-sitter runtime caches, and reporting/planning helpers all streamlined

v0.9.5

11 Mar 13:20

This release adds Julia language support, extends the tree-sitter framework, rebalances the health score toward subjective dimensions (now 75/25), and delivers a broad set of improvements to stability, triage process reliability, security hardening, and platform-specific fixes — alongside significant code quality cleanups that reduce indirection and remove over-extractions.


354+ files changed | 80+ commits | 5,022 tests passing

Julia Language Support

Julia is now a supported language. The initial plugin skeleton includes tree-sitter-based parsing and import resolution, following the same framework as the existing Python and TypeScript plugins.

Tree-sitter Framework Extensions

The tree-sitter framework gains functional specs and functional import resolvers — new extension points that language plugins can use to define analysis rules declaratively. These underpin the Julia plugin and will simplify future language additions.

Reviewer Finding Adjudication

Review subagents now receive structured access to judgment-required findings during batch construction. Per-detector finding counts are embedded in batches, concern signals carry fingerprints and finding IDs, and the CLI renders exploration commands (desloppify show <det> --no-budget). The concern signal cap is raised from 8 to 30 with overflow guidance, and the dismiss path is simplified to 2 fields.

Scoring Rebalance

Scoring weights shift to 75% subjective / 25% mechanical. Subjective design quality — review dimensions like naming, cohesion, and abstraction quality — is now the primary driver of the health score. All docs, reporting strings, tests, and snapshots updated to match.
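As a worked example of the new weighting, assume both component scores are on a 0 to 100 scale and are combined by a plain weighted sum. That combination rule and the function name are assumptions; the release note states only the 75/25 split.

```python
SUBJECTIVE_WEIGHT = 0.75
MECHANICAL_WEIGHT = 0.25


def health_score(subjective, mechanical):
    """Combine the two component scores using the 75/25 split."""
    return SUBJECTIVE_WEIGHT * subjective + MECHANICAL_WEIGHT * mechanical


# A codebase with perfect mechanical checks but middling design quality:
print(health_score(subjective=80, mechanical=100))  # prints 85.0
```

Under this model a 10-point change in the subjective score moves the overall score by 7.5 points, while the same change in the mechanical score moves it by 2.5.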

Judgment-Required Detectors Excluded from Auto-Clustering

Detectors marked needs_judgment (e.g., structural, dict_keys, smells, responsibility_cohesion) now return None from the clustering grouping key. This means they flow through the review process as mechanical evidence rather than being auto-grouped into plan tasks. The cluster strategy is simplified: special-case grouping by file, subtype, and detector for judgment-required issues has been removed entirely, along with unused parameters in generate_description.
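The effect of returning None from the grouping key can be sketched as follows. The detector names come from the examples above; grouping the remaining issues by detector name, the dict-shaped issues, and both function names are simplifying assumptions.

```python
# Detectors named in the release note as examples of needs_judgment.
NEEDS_JUDGMENT = frozenset(
    {"structural", "dict_keys", "smells", "responsibility_cohesion"}
)


def grouping_key(issue):
    """Return a cluster key, or None when the issue must go through review."""
    if issue["detector"] in NEEDS_JUDGMENT:
        return None
    return issue["detector"]  # assumed: mechanical issues group by detector


def auto_cluster(issues):
    """Group issues by key, leaving judgment-required ones unclustered."""
    clusters = {}
    for issue in issues:
        key = grouping_key(issue)
        if key is None:
            continue  # flows to reviewers as evidence, not into a plan task
        clusters.setdefault(key, []).append(issue)
    return clusters
```

A None key acts as an opt-out signal, so the clustering loop needs no detector-specific branches of its own.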

Living Plan & Queue Overhaul

The next command now follows the living plan directly rather than computing priorities independently. The queue system was substantially rearchitected:

  • Execution vs. backlog queues are now separate surfaces — next pulls from the execution queue (current cluster work), while the backlog queue shows what's upcoming.
  • Queue lifecycle phases are explicitly persisted, giving clear visibility into where you are in the scan → triage → execute flow.
  • Scan is a first-class postflight phase — the queue system recognizes and tracks post-scan state transitions.
  • Queue output is labeled by surface so it's always clear which queue you're looking at.
  • Prompts and coaching text aligned with the new execution queue semantics.

Triage & Review Improvements

  • Triage dashboard fix: after completing triage, the dashboard no longer incorrectly shows "start with observe" restart guidance. The root cause was inferring status from an empty triage_stages dict instead of checking triaged_ids.
  • No-op triage completion allowed for empty review batches — completing triage without findings no longer errors.
  • Staged triage flow consolidated — routing, validation, and state contracts tightened across the triage pipeline.
  • Review rerun preflight scoped — rerun checks are now properly gated.
  • Review dimension metadata made monkeypatchable — enables test and plugin customization.
  • Stale wontfix review tails are now cleared on completion.

State Recovery & Resilience

  • Triage state recovery from saved plans — if triage state is lost (e.g., from a crash), it can be reconstructed from the persisted plan.
  • State recovery from saved plans with deduplication of update-skill operations.
  • Plan recovery consolidated on scan_metadata, dropping the _saved_plan_recovery marker.
  • scan_metadata schema simplified — inventory_available/metrics_available are now derived from scan source rather than stored as separate bools.
  • Stale cluster focus cleared on completion and on skip/cluster mutations.

Security Hardening

  • Subprocess command paths hardened for detectors — command resolution tightened.
  • Controlled subprocess security seams tightened — removed # nosec comment noise and the copy-pasted _resolve_cli_executable helper that existed in 8 files.
  • Source security findings hardened in detector outputs.
  • Silent excepts removed from treesitter resolvers — errors are no longer swallowed.
  • Defer policy key overwrites removed — policy values are now immutable once set.

Platform Fixes

  • Windows WinError 2 fix (#383): codex subprocess spawning now uses shutil.which() to resolve .cmd batch shims on Windows. Also recognizes [WinError 2] in runner failure detection.
  • Monorepo path validation fix (#387): the _PATH_RE regex in triage enrich was anchored on src/ and discarded leading path components like packages/backend/, causing false-positive path failures in monorepos.

Code Quality & Cleanup

A major theme of this release is reversing over-extractions and reducing indirection. Several rounds of mechanical function splits were reverted:

  • ~800+ net lines removed across 4 revert/inline commits — single-use helpers inlined back into their callers across triage stages, cluster display, queue flow, render, dispatch, and batch orchestration.
  • Triage stage commands (observe/reflect/organize) now read top-to-bottom as linear validation chains with early returns.
  • QueueRenderContext dataclass removed — explicit kwargs proved more readable.
  • review_quality unified as the single canonical key (killed dual-key handling).
  • Import score provenance metadata reduced from 10 fields to 4.
  • Scan workflow imports tightened — direct imports from planning.scan instead of going through the package namespace.
  • Remaining split-driver plan items bulk-skipped to prevent recurrence.

Tree-sitter Cleanup

  • Import cycle broken in tree-sitter spec modules.
  • Unused bridge modules deleted.
  • Compatibility bridges centralized into a single location.

Contract & Registry Alignment

  • Registry and subjective contracts aligned.
  • Plugin scaffold contracts updated.
  • TypeScript command surfaces normalized.
  • Command registry annotations tightened.
  • Schema drift payload builders normalized.

Test Improvements

  • Over-mocked flow tests replaced with direct coverage.
  • Direct coverage strengthened for import flows.
  • New scan orchestration test verifying the planning scan surface integration.
  • New review import support test helpers for plan sync runtime patching.
  • Batch triage test helpers refreshed.