Releases · peteromallet/desloppify

24 Mar 01:10

v0.9.14

19b9ebe

v0.9.14 Latest


This release overhauls the plan/execute lifecycle — consolidating phase derivation into a single canonical function, fixing a force-rescan bug that re-queued completed subjective reviews, and simplifying the internal state machine from fine-grained phase names down to just plan and execute. Also fixes stale dashboard counts, graph normalization sampling, and Windows UTF-8 encoding issues.

64 files changed | 11 commits | 5,660 tests passing

Headline Feature

Lifecycle Consolidation

The plan/execute lifecycle — the state machine that decides whether you're planning work or executing it — has been significantly refactored for clarity and correctness:

Shared phase derivation — both the reconciliation pipeline and the work queue snapshot now delegate to a single derive_display_phase() pure function with a documented priority chain. Previously, two independent implementations had to be kept manually in sync.
Pure reader — current_lifecycle_phase() no longer mutates plan data on read. Legacy phase name migration now runs once at plan-load time.
Marker invariants documented: the three markers that drive lifecycle transitions (lifecycle_phase, postflight_scan_completed_at_scan_count, subjective_review_completed_at_scan_count) are now documented with valid values, transitions, and single-writer functions.
No more bypass paths — snapshot phase resolution now routes all derivation through the shared function with no short-circuit returns.

This was driven by a recurring class of bugs where lifecycle markers got out of sync — most recently, force-rescan re-queuing completed subjective reviews.
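
As a rough illustration of the single-derivation idea described above, the sketch below routes two callers through one pure function. The signal names, the priority order, and the function names are assumptions made for illustration; the notes say only that a documented priority chain exists, not what it contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseSignals:
    """Hypothetical boolean inputs; the real signal set is not given in the notes."""
    has_pending_review: bool = False
    has_pending_triage: bool = False
    has_execution_items: bool = False


def derive_phase_sketch(signals: PhaseSignals) -> str:
    """Pure function: same inputs give the same phase, and nothing is mutated."""
    # Assumed priority chain: planning work outranks execution work.
    if signals.has_pending_review or signals.has_pending_triage:
        return "plan"
    if signals.has_execution_items:
        return "execute"
    # Assumed default when the queue is empty.
    return "plan"


def phase_for_reconcile(signals: PhaseSignals) -> str:
    # Both callers delegate; neither keeps its own copy of the rules.
    return derive_phase_sketch(signals)


def phase_for_snapshot(signals: PhaseSignals) -> str:
    return derive_phase_sketch(signals)
```

Because both callers share one function, they cannot disagree for the same inputs, which is the property the refactor is after.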

Other Features

Score Checkpoint with Sparkline

plan_checkpoint progression events now include a sparkline showing score trajectory across checkpoints. This makes it easier to see at a glance whether the score is trending up or plateauing.
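
For readers unfamiliar with sparklines, here is a minimal sketch of how a score series can be turned into one. The glyph set and scaling are assumptions; the notes do not describe how desloppify renders its sparkline.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(scores):
    """Map each score to one of eight block glyphs, scaled to the series range."""
    if not scores:
        return ""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A flat series has no range to scale against; draw it at the baseline.
        return BARS[0] * len(scores)
    last = len(BARS) - 1
    return "".join(BARS[round((s - lo) / (hi - lo) * last)] for s in scores)


print(sparkline([70, 70, 90]))  # prints ▁▁█
```

A flat run of glyphs suggests a plateau, and a rising run suggests the score is still improving.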

Simplified User-Facing Lifecycle

Users now see just "plan mode" and "execute mode" instead of internal phase names like workflow_postflight or triage_postflight. The communicate-score workflow step auto-resolves when no prior baseline exists, eliminating a confusing manual step on first use.

Bug Fixes

Force-rescan no longer resets subjective reviews — When --force-rescan ran during plan mode, the scan count increment caused subjective_review_completed_at_scan_count to go stale, re-queuing all 20 subjective reviews. The new carry_forward_subjective_review() promotes the marker when the old review matches the cycle being replaced.
Stale focus counts — status, next, and scan commands now show current focus counts instead of stale cached values. Closes #503.
Graph normalization sampling — check_all_graph_keys now inspects all keys for normalization, not just the first 3. Previously, graphs with only late-position abnormal keys could pass validation. Closes #502.
UTF-8 encoding for external tool reports — read_text() calls in jscpd_adapter.py, complexity.py, and test_coverage/io.py now specify encoding="utf-8" explicitly. On Windows, the system codepage default (cp1252) would crash when reports contained non-ASCII characters. Closes #505, reported by @pietrondo.
Tree-sitter CI stability — Spec tests now skip gracefully when grammar files aren't available, instead of failing the entire suite.
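
The UTF-8 fix in the list above comes down to passing an explicit encoding to read_text(). The following sketch shows the pattern with a temporary file; the helper name read_report and the file contents are made up for illustration.

```python
import tempfile
from pathlib import Path


def read_report(path: Path) -> str:
    # Explicit encoding: the result no longer depends on the platform default.
    return path.read_text(encoding="utf-8")


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        report = Path(tmp) / "report.json"
        # A report containing non-ASCII characters, written as UTF-8.
        report.write_text('{"file": "naïve_é.py"}', encoding="utf-8")
        return read_report(report)
```

Without the encoding argument, read_text() falls back to the locale's preferred encoding, which is how a Windows codepage ends up decoding UTF-8 bytes.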

Refactoring & Internal

Single-writer lifecycle enforcement — eliminated side-channel phase writes that could put the lifecycle into inconsistent states.
Legacy phase name removal — all fine-grained persisted phase names (review_initial, assessment_postflight, workflow_postflight, etc.) are now migrated to coarse plan/execute modes at load time.
Snapshot signal shaping — mode-aware suppression of postflight signals (assessment/workflow/triage/review) now lives in the caller (_phase_for_snapshot) rather than inside the shared delegation layer, keeping _derive_display_phase as a pure items-to-bools mapper.

Community

Thanks to @pietrondo for reporting the Windows UTF-8 encoding issue (#505).

Contributors

pietrondo


22 Mar 17:47

peteromallet

v0.9.13

dd7a67a

v0.9.13

This release bundles all 11 skill overlays for one-step global install via desloppify setup, and fixes false positives across three detectors — orphaned, hardcoded_secret_name, and cycles — based on community reports.

31 files changed | 7 commits | 6,539 tests passing

Headline Feature

Bundled Skill Overlays with `desloppify setup`

All 11 agent skill files (Claude, Cursor, Copilot, Codex, Droid, Windsurf, Gemini, AMP, Hermes, OpenCode, Skill) are now bundled in the package and installed globally via desloppify setup. No network access needed — the command copies bundled docs to the right locations so agents discover desloppify across all projects. A pre-commit hook and make sync-docs target keep bundled copies in sync with docs/.

The README now points to desloppify setup as the primary install path, with update-skill as a per-project fallback.

Bug Fixes

Orphaned detector false positives — Files with __all__ exports are now recognized as intentional public API surfaces and excluded from orphan detection. Closes #496, reported by @Git-on-my-level.
hardcoded_secret_name false positives — Added entropy heuristic to filter non-secret values: field name constants (token_usage), sentinel strings, and label prefixes (agent_workspace@) are no longer flagged. Closes #496.
Cycles detector TYPE_CHECKING false positives — Imports inside if TYPE_CHECKING: blocks are now marked as deferred and excluded from cycle detection. From @Git-on-my-level's comment on #496.
Assessment crash on corrupted state — store_assessments now guards against raw int values in state with isinstance check before calling .get(). Closes #465, reported by @Vuk97.
fix_debug_logs negative-index corruption — Added lower-bound guard (start < 0) matching the pattern in all sibling fixers. Prevents silent wrong-line edits when entry["line"] is 0. Cherry-picked from PR #499 by @cpjet64.
.gitignore CLAUDE.md/AGENTS.md scope — Patterns now anchored to repo root (/CLAUDE.md) so subdirectory copies aren't ignored.
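
To illustrate the kind of entropy heuristic mentioned for hardcoded_secret_name, the sketch below computes Shannon entropy over a string's characters and combines it with a length floor. The thresholds and function names are assumptions; the notes do not say how desloppify computes or applies entropy.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Bits per character, based on the character frequency distribution."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(value: str, min_length: int = 20, min_entropy: float = 3.5) -> bool:
    # Assumed rule: short or repetitive values are unlikely to be real secrets.
    return len(value) >= min_length and shannon_entropy(value) >= min_entropy
```

Under these assumed thresholds, short identifiers such as token_usage or agent_workspace@ are rejected by the length floor, and long but repetitive strings are rejected by the entropy floor.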

Refactoring & Internal

Review pipeline improvements — Added bias-to-action guidance: confirmed bugs get implemented immediately instead of deferred. Stage 3 now collects open questions for the maintainer instead of guessing. Hard rule: decisions are presented for approval before execution.

Community

Thanks to @Git-on-my-level for the detailed false-positive reports across three detectors (#496), @Vuk97 for reproducing the assessment crash (#465) and language detection bug (#466), and @cpjet64 for the fix_debug_logs guard (PR #499).

Contributors

Git-on-my-level, Vuk97, and cpjet64


21 Mar 03:54

peteromallet

v0.9.12

2d740c3

v0.9.12

This release adds the strategist triage role for trend-aware strategic oversight, a desloppify setup command for one-step global skill installation, and a cluster of fixes that finally get stale subjective reviews unstuck from queue ordering issues. It also includes community-contributed fixes for Python src-layout projects, Windows encoding, and Knip hangs.

149 files changed | 21 commits | 6,485 tests passing

Headline Feature

Strategist Triage Role

The triage pipeline gains a new "strategist" stage that acts as a CEO-level overseer for the cleanup cycle. Instead of blindly trusting agent-reported trends, the strategist cross-checks score and debt trends against computed trajectory data, overriding mismatches. It can create high-priority strategy:: work items that jump to the front of the queue, and it supports an explicit confirmation gate (like other triage stages) so humans can review strategic assessments before they take effect.

The underlying ScoreTrajectory now tracks all-time highs from full scan history (not just the 5-scan window), introduces a "recovering" trend for scores that are improving but haven't reached their previous peak, and detects cross-cycle regression where a plateau masks a post-reset decline.
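
A minimal sketch of how a "recovering" trend could be separated from plain improvement follows. Only the "recovering" label and the 5-scan window come from the notes; the other labels, the tolerance, and the function name are assumptions, and cross-cycle regression detection is not covered here.

```python
def classify_trend(history, window=5, tolerance=0.5):
    """Classify a score history (oldest first) using a recent window
    plus the all-time high taken from the full history."""
    if len(history) < 2:
        return "insufficient_data"
    recent = history[-window:]
    delta = recent[-1] - recent[0]
    all_time_high = max(history)  # full history, not just the window
    if delta > tolerance:
        # Rising, but still below the previous peak: recovering.
        return "recovering" if recent[-1] < all_time_high else "improving"
    if delta < -tolerance:
        return "declining"
    return "plateau"


print(classify_trend([90, 70, 72, 75, 78, 80]))  # prints recovering
```

Looking only at the window, the example series would read as a steady gain; the all-time high of 90 is what reveals that the score is still below its earlier peak.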

Other Features

`desloppify setup` Command

New command: pip install desloppify && desloppify setup. Copies bundled skill definitions to ~/.claude/ and ~/.cursor/ without network access, so agents discover desloppify globally across all projects. Per-project installs remain via update-skill.

Subjective Anti-Gaming Policy Disabled

The integrity policy that zeroed dimension scores when they converged near the target was producing false positives — blind-packet subagent reviews have no way to anchor to target scores, so legitimate convergence was being penalized. The policy now passes assessments through unchanged. It can be re-enabled later if a better detection strategy emerges.

Bug Fixes

Stale subjective reviews blocked by queue state — Three related fixes ensure stale reviews are always detected and injected regardless of whether mechanical items remain in the queue. Previously, the live_planned_queue_empty guard blocked all reconciliation when objective items existed, leaving review dimensions perpetually stale.
Phase ordering for stale reviews — Stale subjective reviews now take priority over non-critical workflows (communicate score) and triage items. Pre-review workflows (deferred disposition, run scan, import scores) still jump ahead of everything.
Force-rescan stale review injection — Force-rescan now correctly injects stale reviews by bypassing both queue-empty guards and the cycle-just-completed deferral logic. Also adds _refresh_plan_start_baseline() that reseeds scores without clearing workflow sentinels.
Clusters not marked done — Clusters stayed active even when all their items were resolved, causing completed work to reappear after rescan. Now sets execution_status="done" on completion and sweeps active clusters during reconciliation.
Resolved items superseded from clusters — _supersede_dead_references was stripping resolved/fixed/wontfix items from clusters, making completed clusters appear incomplete. Now only supersedes items that are truly gone from state.
Python src-layout test coverage — resolve_import_spec now tries src/-prefixed candidates, and module name computation strips the src/ prefix. Fixes false "transitive_only" reports for PEP 621 src-layout projects. Cherry-picked from PR #489 by @AreboursTLS.
Python multi-line import regex — PY_IMPORT_RE now handles parenthesized imports (from pkg import (\n name, ...)), fixing the root cause of false transitive-only coverage reports.
Test coverage graph supplementation — Source parsing now always runs as a supplement to the import graph, catching submodule imports the graph resolves to __init__.py instead of the actual file.
Knip hang on missing dependency — Added stdin=subprocess.DEVNULL and --yes flag to prevent npx from blocking on interactive prompts when Knip is not a local dependency. Closes #494, reported by @goobsnake.
Windows UTF-8 encoding in review runner — Explicit encoding="utf-8", errors="replace" on all file reads in the review pipeline. Prevents charmap decode errors when Codex runners emit UTF-8 on Windows. Cherry-picked from PR #495 by @pietrondo.
Force-rescan queue-empty guard bypass — reconcile_plan() was double-guarded by queue emptiness, preventing stale review injection when any objective items remained.
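
The Knip fix in the list above relies on a general subprocess pattern: give the child no stdin to wait on. The sketch below shows that pattern using a small Python child process in place of npx, so it runs anywhere; the helper name run_noninteractive is made up for illustration.

```python
import subprocess
import sys


def run_noninteractive(cmd, timeout=60):
    """Run a command that must never block on an interactive prompt."""
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # any read from stdin returns EOF immediately
        capture_output=True,
        text=True,
        timeout=timeout,  # backstop in case the child hangs for another reason
        check=False,
    )


def demo() -> str:
    # Stand-in child that tries to read a prompt answer from stdin.
    child = [sys.executable, "-c", "import sys; print(repr(sys.stdin.read()))"]
    return run_noninteractive(child).stdout.strip()
```

With stdin pointed at the null device the child sees end-of-file instead of waiting for input, so a prompt cannot stall the scan.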

Refactoring & Internal

Skill doc improvements — Tightened skill descriptions to reduce false activations on generic programming questions. Added explicit "run next after scan" instruction so agents follow the tool's workflow instead of interpreting scan output themselves.
Review pipeline results: Stage 1/2/3 assessments for PRs #495, #493, #489, #189 and issues #490-#494. Backfilled Stage 2 files for older items.
Setup command scope trim — Removed --local mode and global skill discovery integration from the initial implementation, keeping the command focused on a single responsibility.

Community

Thanks to @AreboursTLS for the Python src-layout test coverage fix (PR #489), @pietrondo for the Windows UTF-8 encoding fix (PR #495), and @goobsnake for reporting the Knip hang issue (#494).

Contributors

pietrondo, goobsnake, and AreboursTLS


19 Mar 00:18

peteromallet

v0.9.11

bc333c2

v0.9.11

This release adds the progression log — an append-only lifecycle event timeline that gives AI agents persistent memory across cycles — along with a batch of bug fixes from the community covering cross-platform issues, regex safety, serialization crashes, and triage quality.

62 files changed | 18 commits | 5,495 tests passing

Progression Log

The biggest addition in this release is .desloppify/progression.jsonl — an append-only event log that records lifecycle boundary events as they happen. Each line is a self-contained JSON object with a discriminated event_type, timestamps, scores, and a structured payload.

The problem this solves: desloppify's lifecycle is a loop (execute → scan → review → triage → execute), and until now there was no persistent record of what happened at each boundary. The scan history had the last 20 scans, the execution log had plan actions, and query.json was ephemeral. A CEO agent guiding triage had no way to answer "what improved since last cycle?" without reconstructing it from scattered sources.

The progression log records 7 boundary events:

scan_preflight — gate decision (allowed/blocked/bypassed) with queue state
scan_complete — scores, dimension deltas, scan diff, execution summary with resolved/skipped IDs, suppression metrics
postflight_scan_completed — scan marker flip for the current cycle
subjective_review_completed — reviewer observations: which dimensions were covered, evidence summaries, new review issue IDs and summaries, import provenance
triage_complete — strategy summary, cluster names and theses, verdict counts, organized/total counts
entered_planning_mode — phase transition into any planning phase, with trigger
execution_drain — queue drained via resolve or workflow resolve, with scores at drain

Key design decisions:

Timestamps as join keys — events carry enough summary for quick reads, but timestamps let a reader query state.json and the plan's execution_log for full detail. The progression log is a timeline index, not a data copy.
Idempotent marker triggers — events fire on mark_postflight_scan_completed() and mark_subjective_review_completed() returning True. These are idempotent per scan_count, so no double-fire.
Best-effort, never break parent — all hooks are wrapped in try/except. Advisory file locking with 2s timeout. On lock failure, appends without the lock and logs a warning.
Corruption-resilient reads — corrupt JSONL lines are skipped with a warning; the file is never erased. Periodic trim keeps it under 2000 lines.
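
The append-only write and corruption-resilient read described above can be sketched as follows. The event fields beyond event_type and a timestamp, and both function names, are assumptions; file locking and periodic trimming are left out to keep the example short.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_event(log_path: Path, event_type: str, payload: dict) -> None:
    """Append one self-contained JSON object as a single line."""
    event = {
        "event_type": event_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def read_events(log_path: Path) -> list:
    """Return parsed events, skipping corrupt lines without touching the file."""
    events = []
    if not log_path.exists():
        return events
    for line in log_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # a real reader would also log a warning here
    return events


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "progression.jsonl"
        append_event(log, "scan_complete", {"score": 81.5})
        with log.open("a", encoding="utf-8") as fh:
            fh.write("{not valid json\n")  # simulate a corrupt line
        append_event(log, "triage_complete", {"clusters": 3})
        return [e["event_type"] for e in read_events(log)]
```

Since every line is independent, one damaged line costs a single event rather than the whole history.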

Triage: Stop Obsessing Over Test Coverage

Multiple users reported the tool pushes test writing over actual code cleanup. The root cause: triage LLMs promote test_coverage clusters because they appear first (sorted by issue count) with no guidance to deprioritize.

Changes:

Triage prompt now explicitly says: clean up code quality BEFORE test coverage — writing tests for sloppy code locks in the slop
Added defer action type for auto-clusters (keeps in backlog for later cycles)
Scan coaching and catalog guidance reworded to "review coverage gaps" instead of "add tests"

Bug Fixes

Phantom cluster membership — action_step refs are traceability metadata, not membership. Merging them into issue_ids caused bare shorthand IDs from triage runners to become phantom cluster members that reappear after every reconcile cycle.
Update-skill duplicate detection — raises CommandError when skill content is already present but begin/end markers are missing, preventing silent duplicate appends.
Generic fixer crash — four autofix pipeline sites assumed FixResult entries always have a removed key. Generic fixers (e.g., eslint-warning) return {file, line} or {file, fixed} without it. Cherry-picked from PR #484 by @AugusteBalas.
JSON serialization crash — EcosystemFrameworkDetection dataclass instances leaked into review_cache via shared dict references, causing TypeError on state serialization. Adds a dataclass handler to json_default. Bug identified by @0-CYBERDYNE-SYSTEMS-0 in PR #486.
Synthetic ID deferred skip loop — synthetic queue IDs (workflow::*, triage::*) in the skipped dict caused phantom deferred-disposition items. Cherry-picked from PR #485 by @ryexLLC.
Dart regex catastrophic backtracking — the annotation sub-pattern had overlapping whitespace consumption between a character class and trailing \s*, wrapped in ()*, causing exponential backtracking. Possessive quantifiers prevent it. Cherry-picked from PR #477 by @AvoMandjian.
Framework cache state bloat — framework detection wrote dataclass objects into review_cache, which shares a dict reference with persisted state. New runtime_cache field separates ephemeral scan-scoped data from persisted review state. Cherry-picked from PR #483 by @maciej-trebacz.
Zone reclassification stale issues — when zone rules change (e.g., adding JS test patterns), existing open issues for reclassified files now auto-resolve instead of persisting forever. Bug identified by @claytona500 in PR #478.
macOS SSL for update-skill — uses certifi CA bundle instead of the system cert store, which on Homebrew Python often has no CA certificates. Closes #468, reported by @Vuk97.
Windows font fallbacks — scorecard image generator now has Consolas, Georgia, Segoe UI, and Arial as Windows fallbacks instead of falling through to Pillow's bitmap font.
Windows .exe process execution — _resolve_executable() no longer wraps .exe binaries in cmd /c, which caused \" escaping errors when prompts contain spaces. Closes #487, reported by @Dteyn.
Snyk false positive — renamed session_token placeholder to session_hmac in review JSON examples to avoid Snyk W007 credential-detection heuristic. Closes #473, reported by @mark-major.
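
The JSON serialization fix in the list above adds a dataclass handler to json_default. A minimal sketch of such a handler is shown here; the FrameworkDetection class and the state layout are stand-ins for illustration, not the project's actual types.

```python
import dataclasses
import json


@dataclasses.dataclass
class FrameworkDetection:
    """Hypothetical stand-in for a dataclass that leaked into persisted state."""
    name: str
    confidence: float


def json_default(obj):
    """Fallback passed to json.dumps for objects it cannot encode itself."""
    # is_dataclass is also true for dataclass types, so exclude classes.
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


state = {"review_cache": {"framework": FrameworkDetection("nextjs", 0.9)}}
encoded = json.dumps(state, default=json_default)
```

Without the default hook, json.dumps raises TypeError on the dataclass instance, which matches the crash described in the bug.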

R Code Smell Detectors

Contributed by @sims1253 (PR #450). Ten R-specific smell checks detect common anti-patterns, including setwd(), <<- global assignment, attach(), rm(list=ls()), debug leftovers, T/F ambiguity, 1:n() off-by-one risk, deprecated stringsAsFactors, and library() inside functions. The library_in_function check uses tree-sitter when available, with a regex fallback.

Also extends the generic language framework with custom_phases support — any generic plugin can now inject language-specific detector phases without converting to a full plugin. 22 tests covering detection, false positive suppression, and edge cases.

Documentation

Work queue README — explains how items flow from scan to execution queue, why test_coverage dominates pre-triage ordering, that tier is display-only metadata, and the full sort order with filter chain
Main README process overview — new "How it works" section explaining the scan → score → review → triage → execute loop
Docs reorganization — internal documentation (DEVELOPMENT_PHILOSOPHY, QUEUE_LIFECYCLE, ci_plan) and release infrastructure moved from docs/ to dev/. Website separated as its own repo.

Community

Cherry-picks and bug reports from @AugusteBalas (generic fixer crash), @AvoMandjian (Dart regex DoS), @maciej-trebacz (framework cache bloat), @ryexLLC (synthetic ID loop), @0-CYBERDYNE-SYSTEMS-0 (dataclass serialization), @claytona500 (zone reclassification), @Vuk97 (macOS SSL), @Dteyn (Windows .exe execution), and @mark-major (Snyk false positive).

@sims1253 continues to drive R language support forward — this is the fifth R-focused PR across the last few releases, and the custom_phases framework extension benefits every language plugin.

Thank you all — every report and PR made this release better.

Contributors

maciej-trebacz, sims1253, and 8 other contributors


16 Mar 21:55

peteromallet

v0.9.10

7a835d4

v0.9.10

This release adds experimental Hermes Agent integration for fully autonomous cleanup loops, framework-aware detection with a full Next.js spec, SCSS language support, significant R language improvements, and a scan performance boost from detector prefetch + caching — alongside a batch of bug fixes from the community.

152 files changed | 54 commits | 5,466 tests passing

Hermes Agent Integration (Experimental)

We've been exploring what it looks like when a codebase health tool can actually drive an AI agent — not just generate reports, but orchestrate the entire cleanup loop autonomously. This release ships our first experimental integration with Hermes Agent.

The core idea: desloppify already knows what needs to be done (scan, triage, review, fix). Instead of printing instructions for a human, it can now tell the agent directly — switch to a cheap model for mechanical fixes, switch to an expensive one for architectural review, reset context between tasks, and keep the agent working via /autoreply, all without a human in the loop.

What the integration enables:

Autonomous review loops — desloppify orchestrates blind reviews via delegate_task subagents (up to 3 concurrent), no human needed
Model switching at phase boundaries — cheap models for execution, expensive for planning/review, switched automatically
Context management — automatic resets between tasks to keep the agent focused on long sessions
Lifecycle transitions — desloppify tells Hermes what to do next via the Control API

How to try it

This requires the Control API branch of Hermes (NousResearch/hermes-agent#1508), which hasn't been merged upstream yet. Without it, Hermes works as a normal harness but can't do autonomous model switching or self-prompting.

Copy-paste this into your agent (it handles both fresh installs and existing Hermes setups):

Install desloppify with Hermes Control API support. The Control API lives on a fork branch until NousResearch/hermes-agent#1508 merges upstream.

1. If Hermes is already installed (check with `which hermes`), switch it to the Control API branch:

   cd "$(pip show hermes-agent 2>/dev/null | grep -i '^location' | cut -d' ' -f2-)/hermes-agent"
   git remote add peteromallet https://github.com/peteromallet/hermes-agent.git 2>/dev/null || true
   git fetch peteromallet
   git checkout peteromallet/feat/control-api-autoreply
   pip install -e .

   IMPORTANT: If Hermes is currently running, tell the user they need to restart it with `HERMES_CONTROL_API=1 hermes` after this step — the new branch won't take effect until they do.

   If Hermes is NOT installed, clone fresh:

   git clone -b feat/control-api-autoreply https://github.com/peteromallet/hermes-agent.git
   cd hermes-agent
   pip install -e .

2. Install desloppify and set up the skill doc:

   pip install "desloppify[full]"
   cd /path/to/your/project
   desloppify update-skill hermes

3. Start Hermes with the Control API enabled:

   cd /path/to/your/project
   HERMES_CONTROL_API=1 hermes

4. In the Hermes session, run:

   Run desloppify scan, then follow its coaching output to clean up the codebase.

Desloppify will guide Hermes through the full lifecycle — scanning, triaging, blind reviews with subagents, and fixing. It switches models and resets context automatically at phase boundaries.

This is experimental and we're iterating fast. We'd love feedback on the approach, rough edges, and what you'd want to see next. If you try it, please open an issue — every report helps.

Framework-Aware Detection

Massive contribution from @MacHatter1 (PR #414). A new FrameworkSpec abstraction layer for framework-specific detection, shipping with a full Next.js spec that understands App Router conventions, server components, use client/use server directives, and Next.js-specific lint rules. This means dramatically fewer false positives when scanning Next.js projects — framework idioms are recognized, not flagged. The spec system is extensible, so adding support for other frameworks (Remix, SvelteKit, etc.) is now a matter of writing a spec, not changing the engine.

SCSS Language Plugin

Thanks to @klausagnoletti for adding SCSS/Sass support via stylelint integration (PR #428). Detects code smells, unused variables, and style issues in .scss and .sass files. @klausagnoletti has also submitted a follow-up PR (#452) with bug fixes, tests, and honest documentation — expected to land shortly after release.

Plugin Tests, Docs, and Ruby Improvements

@klausagnoletti also contributed across multiple language plugins:

Ruby plugin improvements (PR #462) — expanded exclusions, detect markers (Gemfile, Rakefile, .ruby-version, *.gemspec), default_src="lib", spec/ + test/ support, and 13 wiring tests. Also adds external_test_dirs and test_file_extensions params to the generic plugin framework.
JavaScript plugin tests + README (PR #458) — 12 sanity tests covering ESLint integration, command construction, fixer registration, and output parsing.
Python plugin README (PR #459) — user-facing documentation covering phases, requirements, and usage.

R Language Improvements

@sims1253 has been steadily building out R support and contributed four PRs to this release:

Jarl linter with autofix support (PR #425) — adds a fast R linter as an alternative to lintr
Shell quote escaping fix for lintr commands (PR #424) — prevents command injection on paths with special characters
Tree-sitter query improvements (PR #449) — captures anonymous functions in lapply/sapply calls and pkg::fn namespace imports
Factory Droid harness support (PR #451) — adds Droid as a new skill target, following the existing harness pattern exactly

Scan Performance: Detector Prefetch + Cache

Another big one from @MacHatter1 (PR #432). Cold and full scan times reduced significantly. Detectors now prefetch file contents and cache results across detection phases, avoiding redundant I/O. On large codebases this is a noticeable improvement.

Lifecycle & Triage

Lifecycle transition messages — the tool now tells agents what phase they're in and what to do next, with structured directives for each transition
Unified triage pipeline with step detail display
Staged triage now requires explicit decisions for auto-clusters before proceeding — no more accidentally skipping triage steps

Bug Fixes

Binding-aware unused import detection for JS/TS — @MacHatter1 (PR #433). No longer flags imports used via destructuring, as renames, or re-export patterns. This was a significant source of false positives in real JS/TS projects.
Rust dep graph hangs — @fluffypony (PR #429). String literals that look like import paths (e.g., "path/to/thing") no longer cause the dependency graph builder to hang. @fluffypony also contributed Rust inline-test filtering (PR #440), which prevents #[cfg(test)] diagnostic noise from inflating production debt scores.
Project root detection (PR #439) — fixed cases where the project root was derived incorrectly, plus force-rescan now properly wipes stale plan data, and manual clusters are visible in triage.
workflow::create-plan re-injection — @cdunda-perchwell (PR #435). Resolved workflow items no longer reappear in the execution queue after reconciliation. @cdunda-perchwell also identified the related communicate-score cycle-boundary sentinel issue (#447, fix in PR #448).
PHPStan parser fixes — @nickperkins (PR #420). stderr output and malformed JSON from PHPStan no longer crash the parser. Clean, focused fix.
Preserve plan_start_scores during force-rescan — manual clusters are no longer wiped when force-rescanning.
Import run project root — --scan-after-import now derives the project root correctly from the state file path.
Windows codex runner (PR #453) — proper cmd /c argument quoting + UTF-8 log encoding for Windows. Reported by @DenysAshikhin.
Scan after queue drain (PR #454) — score_display_mode now returns LIVE when queue is empty, fixing the UX contradiction where next says "run scan" but scan refuses. Reported by @kgelpes.
SKILL.md cleanup (PR #455) — removes unsupported allowed-tools frontmatter, fixes batch naming inconsistency (.raw.txt not .json), adds pip fallback alongside uvx. Three issues all reported by @willfrey.
Batch retry coverage gate (PR #456) — partial retries now bypass the full-coverage requirement instead of being rejected. Reported by @imetandy.
R anonymous function extraction (PR #461) — the tree-sitter anonymous function pattern from PR #449 now actually works (extractor handles missing @name capture with <anonymous> fallback).

Community

This release wouldn't exist without the community. Seriously — thank you all.

@MacHatter1 delivered three major PRs (framework-aware detection, detector prefetch + cache, binding-aware unused imports) that each individually would have been a headline feature. The framework spec system in particular opens up a whole new category of detection accuracy.

@fluffypony contributed both the Rust dep graph hang fix and the inline-test filtering — the latter being 1,000+ lines of carefully tested Rust syntax parsing with conservative cfg predicate handling and thorough edge-case coverage.

@sims1253 has been the driving force behind R language support, with four PRs spanning linting, tree-sitter queries, and harness support. The R plugin is becoming genuinely useful thanks to this sustained effort.

@klausagnoletti added SCSS support, imp...

Contributors

nickperkins, fluffypony, and 8 other contributors


13 Mar 19:47

peteromallet

v0.9.9

9c233a4

v0.9.9

This release focuses on plan lifecycle robustness — fixing workflow deadlocks, auto-resolving stale issues, hardening the reconciliation pipeline, and replacing heuristics with explicit cluster semantics. It also includes C++ detector scoping improvements from a community contributor and several UX fixes that prevent agents from getting stuck mid-cycle.

366 files changed | 16 commits | 5,367 tests passing

Refactoring & Internal Cleanup

This release continues the pattern of tightening seams and reducing indirection across the codebase. Over half the 366 changed files are internal restructuring:

Cluster and override → subpackages — cluster_ops_display.py, cluster_ops_manage.py, cluster_ops_reorder.py, cluster_update.py, and cluster_steps.py moved into a cluster/ subpackage. Same treatment for override_io.py, override_misc.py, override_skip.py, and override_resolve_* into override/.
Holistic cluster accessors inlined — ~8 small wrapper files in context_holistic/ deleted (_clusters_complexity.py, _clusters_consistency.py, _clusters_dependency.py, _clusters_security.py, etc.) and inlined into their callers
Plan sync pipeline extracted — new sync/pipeline.py and sync/phase_cleanup.py pulled out of the monolithic workflow, with reconcile.py renamed to scan_issue_reconcile.py and review import reconcile moved into sync/review_import.py
Issue semantics centralized — new issue_semantics.py (~225 lines) consolidating classification logic that was previously scattered across multiple modules
Plan reconcile simplified — scan/plan_reconcile.py cut from ~470 lines to ~200 by extracting shared logic into the engine layer
Work queue snapshot overhaul — snapshot.py gained ~470 lines of phase-aware partitioning and ranking refinements, replacing ad-hoc ordering logic
TS dead code removed — helpers_blocks.py and helpers_line_state.py deleted (~200 lines of unused smell detection helpers)
Broad type/schema updates — issue type references and state schema types updated across 130+ files for consistency with the new issue semantics

Auto-Resolve Issues for Deleted Files

When a scan runs and a previously-flagged file no longer exists on disk, its open issues are now automatically set to auto_resolved with a clear note. Previously, issues for deleted files would remain open and pollute the work queue indefinitely — particularly painful in Rust projects where module reorganization is common. Closes #412.
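
The auto-resolve behavior can be sketched as a pass over open issues that checks whether each file still exists. The issue dictionary shape, the note text, and the function name are assumptions; only the auto_resolved status comes from the notes.

```python
import tempfile
from pathlib import Path


def auto_resolve_missing(issues, root: Path):
    """Mark open issues as auto_resolved when their file is gone from disk."""
    resolved_ids = []
    for issue in issues:
        if issue["status"] != "open":
            continue
        if not (root / issue["file"]).exists():
            issue["status"] = "auto_resolved"
            issue["note"] = "file no longer exists on disk"
            resolved_ids.append(issue["id"])
    return resolved_ids


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "kept.rs").write_text("fn main() {}\n", encoding="utf-8")
        issues = [
            {"id": "a", "file": "kept.rs", "status": "open"},
            {"id": "b", "file": "gone.rs", "status": "open"},
        ]
        return auto_resolve_missing(issues, root), issues
```

Issues that are already closed are left alone, so the pass only affects entries that would otherwise stay in the work queue.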

Triage Deadlock Fix

Fixed a deadlock where triage was stale (new review issues arrived mid-cycle), but triage couldn't start because objective backlog was still open, and objective resolves were blocked because triage was stale. The fix detects this "pending behind objective backlog" state and allows objective work to continue while keeping review resolves gated. The banner now shows TRIAGE PENDING instead of nudging toward a triage command that can't run yet. Community contribution from @imetandy (#413).

Batch Runner Stall Detection Fix

The review batch runner's stall detector was prematurely killing codex batches during their initialization phase — before any output file was written. This caused --import-run to fail with "missing result files for batches" errors. The stall detector now never declares a stall when no output file exists yet, while the hard timeout still catches truly hung batches. Closes #417 and #401.
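A minimal sketch of the rule, assuming the detector watches the output file's modification time (the release notes do not say which signal is used). The function names and parameters are hypothetical.

```python
import os
import time
from typing import Optional


def is_stalled(output_path: str, stall_seconds: float,
               now: Optional[float] = None) -> bool:
    """Declare a stall only when an output file exists and has gone quiet."""
    if not os.path.exists(output_path):
        # Still initializing: never a stall. The hard timeout covers true hangs.
        return False
    now = time.time() if now is None else now
    return now - os.path.getmtime(output_path) > stall_seconds


def should_kill(output_path: str, started_at: float, stall_seconds: float,
                hard_timeout: float, now: Optional[float] = None) -> bool:
    """Combine the hard timeout with the stall check."""
    now = time.time() if now is None else now
    if now - started_at > hard_timeout:
        return True  # hard timeout applies regardless of output state
    return is_stalled(output_path, stall_seconds, now)
```

Keeping the two checks separate means a slow-starting batch is bounded only by the hard timeout, while a batch that started writing and then went silent is still caught early.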

Sequential Reconciliation Pipeline

Fixes a cluster tracker race condition on parallel updates. A new shared reconciliation pipeline runs all sync steps sequentially: subjective dimensions, auto-clustering, score communication, plan creation, triage, and lifecycle phase. This replaces the previous approach where parallel operations could produce inconsistent plan state.

Explicit Cluster Semantics

Clusters now carry explicit action_type (auto_fix, refactor, manual_fix, reorganize) and execution_policy (ephemeral_autopromote, planned_only) rather than relying on command-string sniffing. A new cluster_semantics.py module provides canonical semantic helpers, and the work queue uses these for phase-aware ordering instead of inferring intent from command strings.
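As an illustration of what explicit semantics buy, the sketch below models the two fields as enums. The value strings come from the release notes; the `Cluster` dataclass, the enum class names, and the ordering rule in `order_for_execution` are assumptions, not the contents of cluster_semantics.py.

```python
from dataclasses import dataclass
from enum import Enum


class ActionType(str, Enum):
    AUTO_FIX = "auto_fix"
    REFACTOR = "refactor"
    MANUAL_FIX = "manual_fix"
    REORGANIZE = "reorganize"


class ExecutionPolicy(str, Enum):
    EPHEMERAL_AUTOPROMOTE = "ephemeral_autopromote"
    PLANNED_ONLY = "planned_only"


@dataclass(frozen=True)
class Cluster:
    # Hypothetical minimal cluster record.
    name: str
    action_type: ActionType
    execution_policy: ExecutionPolicy


def order_for_execution(clusters: list[Cluster]) -> list[Cluster]:
    """Assumed ordering: autopromoted clusters first, then auto-fixes, then by name."""
    return sorted(
        clusters,
        key=lambda c: (
            c.execution_policy is not ExecutionPolicy.EPHEMERAL_AUTOPROMOTE,
            c.action_type is not ActionType.AUTO_FIX,
            c.name,
        ),
    )
```

With typed fields, an unknown value fails at construction (`ActionType("typo")` raises ValueError) instead of silently falling through a string match on the command.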

C++ Detector Scoping Improvements

Four targeted fixes to the C++ plugin, contributed by @Dragoy (#415):

Security findings scoped to first-party files — clang-tidy and cppcheck findings from vendor/external headers are now filtered out instead of being reported as project issues
CMake-based test coverage mapping — CMakeLists.txt files are parsed for add_executable/add_library/target_sources to discover which source files a test target compiles, treating that as direct test coverage
Unused-imports phase disabled for C++ — the generic tree-sitter unused-import detector is unsound for #include semantics and now skips C++ projects
Header extension support — _extract_import_name now handles .h, .hh, .hpp extensions correctly
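The CMake-based coverage mapping can be approximated with a small regex pass. This is a simplified sketch under stated assumptions: it handles only literal file arguments (no `${VARIABLES}`, quoted paths, generator expressions, or comments), and `sources_by_target` is a hypothetical name, not the plugin's actual parser.

```python
import re

# Matches add_executable(...), add_library(...), target_sources(...).
_CMAKE_CALL = re.compile(
    r"\b(add_executable|add_library|target_sources)\s*\(([^)]*)\)",
    re.IGNORECASE,
)
_SOURCE_EXT = (".c", ".cc", ".cpp", ".cxx", ".h", ".hh", ".hpp")
_KEYWORDS = {"PRIVATE", "PUBLIC", "INTERFACE", "STATIC", "SHARED", "MODULE", "OBJECT"}


def sources_by_target(cmake_text: str) -> dict[str, list[str]]:
    """Map each CMake target name to the source files listed for it."""
    result: dict[str, list[str]] = {}
    for _command, args in _CMAKE_CALL.findall(cmake_text):
        tokens = args.split()
        if not tokens:
            continue
        target, rest = tokens[0], tokens[1:]
        files = [
            t for t in rest
            if t not in _KEYWORDS and t.lower().endswith(_SOURCE_EXT)
        ]
        result.setdefault(target, []).extend(files)
    return result
```

A source file that appears under a test target (for example, one whose name ends in `_tests`) can then be treated as directly covered by that test, which is the mapping the bullet above describes.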

Flexible Triage Attestations

Triage attestation validation for organize, enrich, and sense-check stages no longer requires literal cluster name references. Users can now provide substantive work-product descriptions as an alternative, making the triage workflow less rigid for both human and AI operators.

Triage Validation & Sense-Check Enhancements

Sense-check stage gets a dedicated orchestrator with expanded prompts and evidence parsing
Triage completion policy significantly enhanced with richer stage validation
Stage prompt instruction blocks expanded for clearer agent guidance
Evidence parsing extracted into a dedicated module

Other Improvements

.gitignore reminder added to README setup instructions (#416)
PyPI publish workflow push triggers restored while maintaining the main-branch gate
Tweet release tests now properly stub the requests module for CI isolation

Community

Thanks to @imetandy for the triage deadlock fix and @Dragoy for the C++ detector scoping improvements. Issues and feedback from @guillaumejay, @wuurrd, @astappiev, @efstathiosntonas, @xliry, @kendonB, @WojciechBednarski, and @jakob1379 helped shape this release.

Contributors

wuurrd, efstathiosntonas, and 8 other contributors

12 Mar 22:25

peteromallet

v0.9.8

0c02a70

v0.9.8

This release adds full C++ and Rust language plugins; introduces two-phase review scoring, a unified issue lifecycle status, and anti-gaming safeguards; and delivers extensive triage validation, work queue, and cross-platform improvements, alongside continued code quality cleanup that removes 23 compat wrappers and tightens seams throughout the codebase.

609 files changed | 78 commits | 5,266 tests passing

C++ Language Support

C++ is now a full-depth language plugin with tree-sitter-based extraction, structural analysis, and tool-backed security scanning. The plugin includes:

Function/class/include extraction from C++ source files
Dependency graph analysis via #include graphs from compile_commands.json and Makefile projects
Structural and coupling phases with cppcheck integration and batch issue scanning
Security detection with normalized findings from cppcheck and clang-tidy
Review surfaces, test coverage hooks, and move support
14 test files with fixtures for CMake and Makefile sample projects

Rust Language Support

Rust gains full plugin parity with 13 Rust-specific detectors, 3 auto-fixers, and deep cargo toolchain integration:

13 detectors across 6 modules: API surface, cargo policy, safety, smells, dependencies, and custom rules
3 auto-fixers: crate imports, cargo features, readme doctests
Cargo tool integration: clippy, cargo check, rustdoc
Rust-aware dependency graphing and test coverage mapping with inline #[cfg(test)] recognition
117 tests across 12 test files

Two-Phase Review Scoring

Holistic review is restructured into two distinct phases:

Phase 1 — Observe: collect characteristics and defects without scoring
Phase 2 — Judge: synthesize dimension character from observations, then score

Positive observations now persist as context insights with full provenance (added_at, source, positive: true), replacing ephemeral strengths. A new context_schema.json defines the review data framework.

Unified Issue Lifecycle Status

DEFERRED and TRIAGED_OUT are added to the Status enum so state is always authoritative for issue disposition. Previously, temporary and triaged-out skips left issue.status as "open", causing overcounting in plan rendering and queue surfaces. The change includes status migration on scan reconcile, new status icons, and updated plan rendering, and it surfaces history to reviewers, with --retrospective now True by default.
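The overcounting problem and its fix can be illustrated with a small sketch. The enum member names DEFERRED and TRIAGED_OUT come from the release notes; the string values, the `skip_kind` field, and both helper functions are assumptions made for illustration.

```python
from enum import Enum


class Status(str, Enum):
    OPEN = "open"
    DEFERRED = "deferred"        # assumed string value
    TRIAGED_OUT = "triaged_out"  # assumed string value


def migrate_status(issue: dict) -> dict:
    """One-time migration: move skip markers into the status field.

    `skip_kind` is a hypothetical legacy field standing in for however
    skips were recorded before the status became authoritative.
    """
    if issue.get("status") == Status.OPEN.value:
        skip = issue.get("skip_kind")
        if skip == "temporary":
            issue["status"] = Status.DEFERRED.value
        elif skip == "triaged_out":
            issue["status"] = Status.TRIAGED_OUT.value
    return issue


def count_open(issues: list[dict]) -> int:
    """After migration, counting open work needs only the status field."""
    return sum(1 for i in issues if i.get("status") == Status.OPEN.value)
```

Before migration, all three issues in a list like `[open, open+temporary skip, open+triaged-out skip]` would count as open; afterwards only the first does.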

Anti-Gaming Safeguards

Two targeted fixes prevent AI agent score-anchoring:

Numeric target redacted from penalty messages — "matched target 95.0" replaced with "clustered on the scoring target" so agents cannot infer and anchor on the exact number
Blind-review workflow surfaced in the very first penalty message (previously only after streak ≥ 2), pointing agents to the blind packet and overlay docs immediately

Triage Validation Overhaul

Reflect dispositions are now binding for organize — a structured ReflectDisposition dataclass is parsed from the Coverage Ledger; organize validates that plan state matches every reflected disposition before submission
Organize validation extracted into dedicated organize_policy.py, separated from batch context normalization
Suggestion and evidence surfaced in show and cluster commands
Completion flow, observe batches, and stage queue gained new focused submodules

Work Queue Decomposition

The monolithic _work_queue/core.py and lifecycle.py are split into 5 focused modules (models.py, inputs.py, selection.py, finalize.py, snapshot.py). The engine/plan_queue.py facade is deleted. A canonical QueueSnapshot provides phase-aware partitioning.

Cross-Platform Hardening

Cross-platform state locking: fcntl on Unix, msvcrt on Windows for atomic state file persistence
Windows tool argv parsing: generic tool commands now execute correctly on Windows
Windows WinError 2 fix: codex exec spawning uses shutil.which() to resolve .cmd batch shims
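A minimal sketch of the locking approach, assuming a sidecar lock file and a write-then-rename save. The `locked` and `save_state` helpers and the `.lock`/`.tmp` suffixes are hypothetical; the `fcntl.flock` and `msvcrt.locking` calls are the standard-library primitives the first bullet names.

```python
import os
import sys
from contextlib import contextmanager

if sys.platform == "win32":
    import msvcrt

    def _lock(f):
        f.seek(0)  # msvcrt locks a byte range starting at the current position
        msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, 1)

    def _unlock(f):
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def _lock(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # blocks until the lock is free

    def _unlock(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)


@contextmanager
def locked(lock_path: str):
    """Hold an exclusive lock on lock_path for the duration of the block."""
    with open(lock_path, "a+") as f:
        _lock(f)
        try:
            yield f
        finally:
            _unlock(f)


def save_state(path: str, text: str) -> None:
    """Write to a temp file, then rename over the target while holding the lock."""
    with locked(path + ".lock"):
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as out:
            out.write(text)
        os.replace(tmp, path)  # atomic rename on both POSIX and Windows
```

The lock serializes writers, while `os.replace` ensures a reader never observes a half-written state file.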

Code Quality & Cleanup

23 compat wrapper files deleted — 13 _framework/ wrappers, 8 context_holistic/ wrappers, 2 helpers/ wrappers — plus removal of SimpleNamespace fake-module antipatterns
Go generated files now skipped during scanning, with improved import-run error messages
Scan export and scoring impact crash paths fixed
Rust workspace rustdoc execution repaired
Extensive seam-tightening: holistic review prep, triage validation, scan sync workflow, language framework surface, tree-sitter runtime caches, and reporting/planning helpers all streamlined

11 Mar 13:20

peteromallet

v0.9.5

e5f90c0

v0.9.5

This release adds Julia language support, extends the tree-sitter framework, rebalances the health score toward subjective dimensions (now 75/25), and delivers broad improvements to stability, triage reliability, and security, plus platform-specific fixes, alongside significant code quality cleanups that reduce indirection and remove over-extractions.

354+ files changed | 80+ commits | 5,022 tests passing

Julia Language Support

Julia is now a supported language. The initial plugin skeleton includes tree-sitter-based parsing and import resolution, following the same framework as the existing Python and TypeScript plugins.

Tree-sitter Framework Extensions

The tree-sitter framework gains functional specs and functional import resolvers — new extension points that language plugins can use to define analysis rules declaratively. These underpin the Julia plugin and will simplify future language additions.

Reviewer Finding Adjudication

Review subagents now receive structured access to judgment-required findings during batch construction. Per-detector finding counts are embedded in batches, concern signals carry fingerprints and finding IDs, and the CLI renders exploration commands (desloppify show <det> --no-budget). The concern signal cap is raised from 8 to 30 with overflow guidance, and the dismiss path is simplified to 2 fields.

Scoring Rebalance

Scoring weights shift to 75% subjective / 25% mechanical. Subjective design quality — review dimensions like naming, cohesion, and abstraction quality — is now the primary driver of the health score. All docs, reporting strings, tests, and snapshots updated to match.
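As a worked example of the new weighting, the sketch below assumes the health score is a plain weighted average of a subjective and a mechanical score on the same 0 to 100 scale. The release notes give only the 75/25 split; the linear combination and the `health_score` name are assumptions.

```python
SUBJECTIVE_WEIGHT = 0.75
MECHANICAL_WEIGHT = 0.25


def health_score(subjective: float, mechanical: float) -> float:
    """Assumed linear blend of the two component scores (0-100 each)."""
    return SUBJECTIVE_WEIGHT * subjective + MECHANICAL_WEIGHT * mechanical


# A codebase with perfect mechanical checks but middling design quality:
# 0.75 * 60 + 0.25 * 100 = 45 + 25 = 70
example = health_score(subjective=60, mechanical=100)
```

Under this assumption, a 10-point change in the subjective score moves the health score by 7.5 points, while the same change in the mechanical score moves it by only 2.5.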

Judgment-Required Detectors Excluded from Auto-Clustering

Detectors marked needs_judgment (e.g., structural, dict_keys, smells, responsibility_cohesion) now return None from the clustering grouping key. This means they flow through the review process as mechanical evidence rather than being auto-grouped into plan tasks. The cluster strategy is simplified: special-case grouping by file, subtype, and detector for judgment-required issues has been removed entirely, along with unused parameters in generate_description.
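The effect of returning None from the grouping key can be sketched as follows. The four detector names come from the examples above; grouping the remaining issues by detector name, the dict-based issue shape, and the `unused_imports` detector used in the usage note are assumptions for illustration.

```python
from collections import defaultdict
from typing import Optional

# Detectors flagged needs_judgment (examples listed in the release notes).
NEEDS_JUDGMENT = {"structural", "dict_keys", "smells", "responsibility_cohesion"}


def grouping_key(issue: dict) -> Optional[str]:
    """Return None for judgment-required detectors so they are never clustered."""
    if issue["detector"] in NEEDS_JUDGMENT:
        return None
    return issue["detector"]  # assumed grouping rule for everything else


def auto_cluster(issues: list[dict]) -> dict[str, list[dict]]:
    """Group issues by key, skipping any issue whose key is None."""
    clusters: dict[str, list[dict]] = defaultdict(list)
    for issue in issues:
        key = grouping_key(issue)
        if key is not None:
            clusters[key].append(issue)
    return dict(clusters)
```

Given one `smells` issue and two `unused_imports` issues, `auto_cluster` returns a single cluster of two; the `smells` issue is left for the review process to pick up as evidence.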

Living Plan & Queue Overhaul

The next command now follows the living plan directly rather than computing priorities independently. The queue system was substantially rearchitected:

Execution vs. backlog queues are now separate surfaces — next pulls from the execution queue (current cluster work), while the backlog queue shows what's upcoming.
Queue lifecycle phases are explicitly persisted, giving clear visibility into where you are in the scan → triage → execute flow.
Scan is a first-class postflight phase — the queue system recognizes and tracks post-scan state transitions.
Queue output is labeled by surface so it's always clear which queue you're looking at.
Prompts and coaching text aligned with the new execution queue semantics.

Triage & Review Improvements

Triage dashboard fix: after completing triage, the dashboard no longer incorrectly shows "start with observe" restart guidance. The root cause was inferring status from an empty triage_stages dict instead of checking triaged_ids.
No-op triage completion allowed for empty review batches — completing triage without findings no longer errors.
Staged triage flow consolidated — routing, validation, and state contracts tightened across the triage pipeline.
Review rerun preflight scoped — rerun checks are now properly gated.
Review dimension metadata made monkeypatchable — enables test and plugin customization.
Stale wontfix review tails are now cleared on completion.

State Recovery & Resilience

Triage state recovery from saved plans — if triage state is lost (e.g., from a crash), it can be reconstructed from the persisted plan.
State recovery from saved plans with deduplication of update-skill operations.
Plan recovery consolidated on scan_metadata, dropping the _saved_plan_recovery marker.
scan_metadata schema simplified — inventory_available/metrics_available are now derived from scan source rather than stored as separate bools.
Stale cluster focus cleared on completion and on skip/cluster mutations.

Security Hardening

Subprocess command paths hardened for detectors — command resolution tightened.
Controlled subprocess security seams tightened — removed # nosec comment noise and the copy-pasted _resolve_cli_executable helper that existed in 8 files.
Source security findings hardened in detector outputs.
Silent excepts removed from treesitter resolvers — errors are no longer swallowed.
Defer policy key overwrites removed — policy values are now immutable once set.

Platform Fixes

Windows WinError 2 fix (#383): codex subprocess spawning now uses shutil.which() to resolve .cmd batch shims on Windows. Also recognizes [WinError 2] in runner failure detection.
Monorepo path validation fix (#387): the _PATH_RE regex in triage enrich was anchored on src/ and discarded leading path components like packages/backend/, causing false-positive path failures in monorepos.
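The monorepo failure mode can be reproduced with two illustrative patterns. Neither regex is the project's actual _PATH_RE, whose text is not shown in these notes; both are hypothetical stand-ins that show how anchoring on src/ drops leading path components.

```python
import re
from typing import Optional

# Hypothetical "before": the match can only begin at src/.
ANCHORED_RE = re.compile(r"src/[\w./-]+\.\w+")

# Hypothetical "after": any number of leading directory components.
GENERAL_RE = re.compile(r"(?:[\w.-]+/)+[\w.-]+\.\w+")


def extract_path(text: str, pattern: re.Pattern) -> Optional[str]:
    """Return the first path-like match in text, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


note = "see packages/backend/src/app/main.py for details"
before = extract_path(note, ANCHORED_RE)  # 'src/app/main.py'
after = extract_path(note, GENERAL_RE)    # 'packages/backend/src/app/main.py'
```

With the anchored pattern, the extracted path is relative to the wrong directory, so an existence check from the repository root fails even though the file is present.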

Code Quality & Cleanup

A major theme of this release is reversing over-extractions and reducing indirection. Several rounds of mechanical function splits were reverted:

800+ net lines removed across 4 revert/inline commits: single-use helpers were inlined back into their callers across triage stages, cluster display, queue flow, render, dispatch, and batch orchestration.
Triage stage commands (observe/reflect/organize) now read top-to-bottom as linear validation chains with early returns.
QueueRenderContext dataclass removed — explicit kwargs proved more readable.
review_quality unified as the single canonical key (killed dual-key handling).
Import score provenance metadata reduced from 10 fields to 4.
Scan workflow imports tightened — direct imports from planning.scan instead of going through the package namespace.
Remaining split-driver plan items bulk-skipped to prevent recurrence.

Tree-sitter Cleanup

Import cycle broken in tree-sitter spec modules.
Unused bridge modules deleted.
Compatibility bridges centralized into a single location.

Contract & Registry Alignment

Registry and subjective contracts aligned.
Plugin scaffold contracts updated.
TypeScript command surfaces normalized.
Command registry annotations tightened.
Schema drift payload builders normalized.

Test Improvements

Over-mocked flow tests replaced with direct coverage.
Direct coverage strengthened for import flows.
New scan orchestration test verifying the planning scan surface integration.
New review import support test helpers for plan sync runtime patching.
Batch triage test helpers refreshed.
