feat(install): add opt-in --auto-embed-hook flag for auto-refresh embeddings by Minidoracat · Pull Request #349 · tirth8205/code-review-graph

Minidoracat · 2026-04-19T17:21:49Z

Summary

Opt-in code-review-graph install --auto-embed-hook flag that writes a PostToolUse hook into .claude/settings.json (and .qoder/settings.json) so body-embedding data stays fresh between manual embed runs. Counterpart --no-auto-embed-hook performs a pure removal without touching any other install state. Default is OFF — passing no flag keeps today's behaviour JSON-struct-identical.

Stacked on #345 — do NOT merge until #345 lands

This PR depends on the code-review-graph embed CLI subcommand introduced by PR #345 (feat/body-embedding). Branch base is origin/feat/body-embedding on my fork. Once #345 merges into main, I'll rebase onto upstream main and re-run CI.

Why

#345 ships the embed CLI but install doesn't wire up any hook to refresh embeddings after edits. Users had to hand-craft a PostToolUse entry themselves — easy to get wrong (invalid schema fields, stdout leaks into MCP stdio, no cost gate against cloud providers). This PR bakes the correct shape into install.

Follow-up #1 in PR #345's description is exactly this feature.

ADR highlights (full decision table in the plan)

Opt-in via explicit flag, not env var / interactive prompt — matches the existing --no-skills / --no-hooks / --no-instructions flag family.
Hook schema: async: true (Claude Code Jan-2026 schema), NOT run_in_background (which does not exist in the schema). shell: \"bash\" implied; PowerShell deferred. Docs: https://code.claude.com/docs/en/hooks
Matcher: Edit|Write|MultiEdit. Intentionally diverges from the existing update hook's Edit|Write|Bash — Bash tool calls don't touch code, and the chained command re-runs update anyway.
Chaining: one command runs update --skip-flows && embed in sequence. Second update is a hash-dedup no-op on unchanged files but eliminates a cross-process race on graph.db between the standalone update hook and the new embed hook.
Platform scope: claude and qoder only (same as existing install_hooks). Other platforms trigger a clear stderr warning and skip.
Cost safety: CRG_EMBED_INCLUDE_BODY=0 is prefixed to each CLI invocation so the hook NEVER auto-triggers body-mode re-embeds. Body-mode users must opt in manually via embed --include-body --confirm-reembed; the hook resets the sticky flag back to 0 on each fire (documented in all 5 READMEs as intentional friction).
MCP stdio safety: >/dev/null 2>&1 on every CLI invocation + trailing || true. Protects the JSON-RPC stream from corrupting stdout/stderr leakage.
Opt-out: --no-auto-embed-hook short-circuits to a pure removal path. Does NOT re-run install_platform_configs / .gitignore / skill generation / instruction injection / git-hook install. Preserves any user customisation of other PostToolUse entries.

Shell-quoting security

The hook command embeds the repo path. Initial draft used json.dumps(path) for quoting, but bash still expands \$() and backticks inside double-quoted strings. After Codex round 1 flagged this as BLOCKING, switched to shlex.quote(...) which wraps paths containing shell metacharacters in single quotes that bash does not expand. Regression tests cover \$(...) and backtick paths.

Files

code_review_graph/skills.py: new generate_auto_embed_hook_entry(repo_root), new remove_auto_embed_hook_entry(repo_root, platform), modified install_hooks(repo_root, platform, include_auto_embed=False). Malformed-JSON handling now backs up before parsing and raises RuntimeError instead of silently overwriting.
code_review_graph/cli.py: mutex flag group on install and init; new _handle_init pre-flight warnings (gated on want_auto_embed and not skip_hooks and not dry_run); early return for the removal path.
tests/test_skills.py: 3 new test classes (TestGenerateAutoEmbedHookEntry, TestInstallHooksAutoEmbed, TestRemoveAutoEmbedHook) covering schema conformance, chaining order, stdio redirects, env-var placement via regex, path-quoting (including \$() and backticks), exact-dict dedup, 4 malformed-JSON defensive paths, and JSON-struct deep equality with pre-change behaviour.
tests/test_cli.py: 12 new CLI tests covering argparse, dry-run, warnings (extras missing + unsupported platform), short-circuit semantics for --no-auto-embed-hook, and warning-gating regressions from Codex round 1.
tests/test_integration_install_auto_embed.py: new file, 2 subprocess integration tests for full install and install-then-remove round-trip.
5 README files (EN / zh-CN / ja-JP / ko-KR / hi-IN): CLI reference rows + feature/limitations/sticky-flag-flip-flop paragraph with native phrasing.

Known limitations

POSIX shells only. The hook command uses &&, || true, and >/dev/null 2>&1 which work in bash / zsh / sh and via WSL or git-bash on Windows. Native Windows PowerShell is deferred to a follow-up; the schema supports \"shell\": \"powershell\" but the command string would need translation.
PATH inheritance. The hook uses the bare code-review-graph command, inheriting the PATH resolution precedent of the existing update hook. If Claude Code forks the hook without ~/.local/bin (pipx) or the uv tool path, the hook will silently no-op — the same failure mode as the existing update hook. Can be addressed via _detect_serve_command()-style resolution in a follow-up.
Sticky-flag flip-flop. Users who opted into body-mode embedding via embed --include-body --confirm-reembed will have that sticky flag reset to OFF on every hook fire. Documented in all 5 READMEs. Re-opt-in manually when body-mode is desired.
Burst-edit lock contention. 20+ rapid edits create 2× as many update spawns on graph.db; WAL + 30s SQLite timeout absorb it but may produce noticeable slowdown during bulk refactors. Debouncing deferred to v1.1.

Test plan

uv run pytest tests/test_skills.py tests/test_cli.py tests/test_integration_install_auto_embed.py -v — 120 passing
uv run ruff check code_review_graph/ — no new errors on my diff (12 pre-existing lint issues on base branch, unchanged)
uv run mypy code_review_graph/skills.py code_review_graph/cli.py --ignore-missing-imports --no-strict-optional — clean on my files
uv run bandit -r code_review_graph/skills.py code_review_graph/cli.py — 0 high-severity issues
Real e2e in /tmp/sandbox-repo: install → build → fire hook → verify embeddings row in graph.db → --no-auto-embed-hook round-trip → verify update entry survives and embed entry gone
Codex adversarial review 2 rounds → NO SUBSTANTIVE ISSUES on round 2
CI matrix (3.10 / 3.11 / 3.12 / 3.13) — pending PR runner

Follow-ups (explicitly NOT in this PR, tracked for later)

One-shot code-review-graph enable-body-embed --provider X --model Y that writes env into .mcp.json, flips the sticky flag, runs the first embed --include-body --confirm-reembed, and installs this hook — end-to-end in one command (from feat(embeddings): include function body in semantic search text #345's follow-up list).
Adaptive body-char budget (from feat(embeddings): include function body in semantic search text #345's follow-up list).
Windows native PowerShell branch of the hook command.
Absolute-PATH resolution for the hook command (share _detect_serve_command with install_platform_configs).

🤖 Generated with Claude Code

Why: signature-only embeddings carry ~10 tokens per node (name + kind + params + return type + language). That ceiling wastes the long-context capacity of modern embedding models — Gemini 2, Qwen3-8B and similar rank SOTA on MTEB-code largely on their ability to understand the *body* of a function, but we were handing them only its name. So the quality advantage of large-context embeddings against local MiniLM was getting flattened by our own input, not by model capability. What: `_node_to_text()` now appends a truncated body snippet after the signature metadata, with the signature line and leading docstrings / header comments stripped so the budget goes to real logic instead of documentation. Four design choices worth flagging: 1. Per-provider char budget (ADR-B4): local=700 / google=3000 / minimax=3000 / openai=6000, env `CRG_EMBED_BODY_MAX_CHARS` overrides. This gives MiniLM just enough to not silently truncate and gives the long-context models room to actually use their capacity. 2. Sticky-flag opt-in (ADR-C3): new `embeddings_meta` table on `embeddings.db` persists `embed_body_enabled` per-DB. Fresh repos auto-enable; existing DBs hold legacy behavior until the user runs `code-review-graph embed --include-body --confirm-reembed`. This avoids surprising cloud-provider users with a one-time re-embed API bill on upgrade, and the sticky flag keeps new nodes consistent with older ones across incremental runs (no mixed body-on / body-off state). 3. Two-stage IO-bounded filter (ADR-filter-short-circuit): stored hash is `meta_hash:body_hash:file_hash`. Stage 1 computes the metadata hash only (zero file IO) and skips when metadata + file_hash are unchanged. Stage 2 reads the file only when stage 1 can't prove freshness, recomputes the body hash, persists the combined form. Empty extracted bodies get sha256("") as a sentinel so they also short-circuit next time; read failures deliberately do NOT persist a sentinel so transient errors stay retryable. 4. Language-agnostic body extraction with block-comment + triple- quote state machine and conservative signature dedup via three- AND gate (node.name + declaration keyword + param first-token). Unknown languages fall back to "don't dedup" rather than risk dropping real body. Coverage today: Python / JS / TS / JSX / TSX / Java / Kotlin / Rust / Go; other languages keep the signature line and just eat a bit of their body budget. Backward compat: legacy rows (pre-existing sha256 hex in `embeddings.text_hash`, no `:`) and Iter-2 rows (`meta:body`, no file hash) are both parsed correctly by `_split_combined_hash` and trigger a single safe re-embed on first contact. Security: `_FileLineCache` resolves paths and enforces a `relative_to` containment check against the repo root, so a poisoned `nodes.file_path` can't read `/etc/passwd` or traverse out of the repo into cloud embedding calls. Verification: regression MRR@3 on a 10-query golden set (bundled mini repo, local MiniLM-L6-v2) goes from 0.9500 to 1.0000; body-dependent queries like "escape HTML entities including quotes and backticks" or "read GEMINI_API_KEY from the environment" move the correct node into top-3 for the first time because the body now contains the distinguishing tokens. Tested against real OpenAI-compatible endpoint too (http://127.0.0.1:3000/v1 + text-embedding-v4): 28 nodes embedded on first run, 0 on a repeat with sticky flag honored — filter IO short-circuit verified end-to-end. Tests: 59 new cases (file-line cache LRU + containment, signature dedup with decorator / recursive / params mismatch / unknown lang, block-comment and triple-quote state machine with JSDoc / Javadoc / Python single-line and multi-line docstrings / long comments, per-provider budget resolution + env override, sticky flag across runs including CLI flip, filter short-circuit with and without populated file_hash, read-failure must-retry regression, legacy hash backward compat). MRR regression test skipped gracefully when sentence-transformers isn't installed (CI matrix with `.[dev]` only stays green; CI jobs that install `.[embeddings]` activate the gate). Scope: all four providers (Local / Google / MiniMax / OpenAI- compatible) benefit transparently; no schema migration on graph.db, no change to `EmbeddingStore.__init__` signature, no change to parser.py / migrations.py / search.py / tools.py. Adds one new subcommand `embed` and a private `embeddings_meta` table on `embeddings.db`.

Why: Codex round 7 caught that the legacy inference ``db_path.parent -> repo root`` silently breaks for the supported ``CRG_DATA_DIR`` layout (graph.db lives in an external cache dir, not inside ``<repo>/.code-review-graph/``). In that case ``_FileLineCache`` received the cache dir as repo_root and every body-snippet read failed the containment check, making body enrichment a no-op for anyone who uses ``CRG_DATA_DIR``. The two- stage filter then wrote metadata-only hashes, so even opting back in via ``embed --include-body --confirm-reembed`` wouldn't help. What: ``_repo_root_guess`` now: 1. keeps the fast path when db_path parent is literally ``.code-review-graph`` (zero-config common case, no cost) 2. otherwise delegates to ``incremental.find_project_root`` which already honors ``CRG_REPO_ROOT`` and walks up for a ``.git`` marker — the same resolution order the rest of the CLI uses 3. returns None only when both fail, letting callers treat absolute file_path values as self-locating Regression test: ``TestRepoRootGuessExternalDataDir`` covers the legacy layout, the env-override fallback, and an end-to-end read through ``_FileLineCache`` from an external cache dir. Existing ``TestFilterStageShortCircuit`` fixtures now write their DB under ``<tmp_path>/.code-review-graph/`` so they hit the fast path rather than the cwd-relative fallback.

Why: Codex round 8 found the round-7 fast path still misfires when ``CRG_DATA_DIR`` itself ends with ``.code-review-graph`` — e.g. ``CRG_DATA_DIR=/var/tmp/.code-review-graph`` or ``/shared/cache/<repo>/.code-review-graph``. In that case ``db_path.parent.name == ".code-review-graph"`` is True, the fast path returns ``parent.parent`` (``/var/tmp`` or the shared-cache parent), and ``_FileLineCache`` silently falls back to metadata for every node. My round-7 tests only covered an external dir NOT named ``.code-review-graph`` so the regression slipped. What: ``_repo_root_guess`` now consults ``CRG_DATA_DIR`` first. When that env var is set we always defer to ``incremental.find_project_root`` regardless of what ``db_path`` looks like; the zero-config fast path only runs when the user is on the default ``<repo>/.code-review-graph/graph.db`` layout and has NOT overridden the data dir. Regression test: ``test_crg_data_dir_named_dot_code_review_graph`` exercises the misfire case directly — ``CRG_DATA_DIR`` pointed at a ``/var-tmp/.code-review-graph`` cache with ``CRG_REPO_ROOT`` pointed at a distant ``real-repo`` tree — and asserts the root resolves to ``real-repo``, not the cache's parent.

Why: cross-repo benchmark showed body enrichment regresses on framework-heavy codebases (humanitzbot — TypeScript Discord bot — MiniLM EN went from 0.375 → 0.194 MRR@3, -0.18). Diagnosis found that signature dedup + block-comment state machine are all working, but TS/Discord function bodies are thick with framework boilerplate that dilutes the retrieval signal. Examples from one regressing function: await interaction.deferReply(); const locale = interaction.locale; logger.info('handling power request', { signal }); console.log('...'); On small models (MiniLM: 256-token window), those lines fill the budget with `await`, `deferReply`, `locale`, `logger`, `console` — high-frequency framework tokens that don't discriminate between similarly-named commands (``_power`` vs ``_status`` vs ``_console``). What: add a language-agnostic check to the existing body-extraction state machine that skips lines which ARE purely a log/print call with no other executable content. Covers: * Python: ``print(...)``, ``logger.info(...)``, ``logging.debug(...)`` * JS/TS: ``console.log(...)``, ``logger.warn(...)`` * Java: ``System.out.println(...)``, ``log.debug(...)`` * Go: ``fmt.Println(...)``, ``log.Printf(...)`` * Rust: ``println!(...)``, ``eprintln!(...)``, ``dbg!(...)`` Lines where a log call is mixed with substantive code (``if err: logger.error(err); raise``) are preserved — they DO carry retrieval signal. Scope: one regex + one helper + one extra branch in the existing state machine loop. Stays within ADR-A1 "language-agnostic body extraction". Does NOT introduce framework-specific heuristics — log/print is a universal pattern, not a framework one. Also does NOT make the body hash non-deterministic (same input still produces same output), so the combined-hash filter short-circuit (AC-13) still holds. Tests: 9 new unit tests covering Python / JS / TS / Java / Go / Rust log patterns, an integration test verifying scattered log lines get stripped from a multi-statement TS body while real logic survives, plus negative tests for mixed-statement lines and non-log calls (``await interaction.deferReply()`` is NOT a log call and stays kept). Follow-up: framework-specific SDK boilerplate (Discord ``interaction.X``, React hooks, Spring ``@Autowired``) is where the next round of improvement lives, but that can't be done generically without either a tokenizer-based adaptive budget or per-language SDK awareness — both are out of scope for this PR.

Round 10 flagged two real over-matches in 7daf93c: 1. The generic ``identifier.(error|warn|info|...)`` branch swallowed non-logger APIs. In THIS repo it dropped ``serve_cmd.error(...)`` (argparse's error method, which raises rather than logs). Same class of issue for ``fmt.Fprintf(w, "ok")`` where ``w`` is an HTTP response writer — that IS the function's output, not a log. 2. ``re.match`` is prefix-anchored, so ``logger.info('x') or do_work()`` passed the log-only check even though it has substantive trailing code. Violated the "mixed- statement lines stay kept" design promise. Fixes: * Receiver identifier is now an explicit whitelist — ``log``, ``logger``, ``logging``, ``logs``, ``console``, ``slog``, ``trace``, ``LOG``/``Logger``/``LOGGER`` (Java casing), ``self.log``/``this.log``/``_log``/``_logger`` and the ``self.logger``/``this.logger`` variants. Custom loggers named differently fall through — they'll be kept as real code. Better to under-skip than mis-skip. * Dropped ``fmt.Fprint*`` from the Go branch — those take an arbitrary ``io.Writer`` (commonly HTTP responses). Only ``fmt.Print/Println/Printf`` remain (stdout-ish = print-like). * ``_is_log_only_line`` now walks the matched string after the opening ``(``, tracking string literals and paren depth, to find the balanced close. Anything non-trivial after that ``)`` means the line has other executable content and is NOT log-only. Regression tests: tail-continuation cases (``logger.info('x') or do_work()``), non-logger identifier cases (``serve_cmd.error``, ``db.warn``, ``item.notice``), and ``fmt.Fprintf`` cases now stay kept. Pure-log cases including nested-paren args (``logger.info('val=' + str(compute(x, y)))``) still get dropped.

Why: body enrichment already shows small positive Δ on PHP (Laravel RESTful apps saw MRR@3 +0.028 on usdt-center and +0.067 on six-forum with local MiniLM) but my ``_LOG_ONLY_LINE_RE`` covered only ``.`` -style method access, so PHP-world logging idioms (``$logger->info``, Laravel ``Log::info`` facade, bare ``error_log``/``syslog``/``trigger_error``) silently stayed in the body text. Adding them is pure upside for log-heavy PHP / Laravel code paths, no test-matrix cost anywhere else. What this commit does: * PHP instance logger: ``$logger->info(...)`` / ``$this->logger->debug(...)`` — covers Monolog convention. * PHP static facade: ``Log::info(...)`` / ``\Log::error(...)`` — Laravel's ``Illuminate\Support\Facades\Log`` (leading ``\\`` for fully-qualified imports). * PHP bare calls: ``error_log(...)`` / ``syslog(...)`` / ``trigger_error(...)`` — stdlib helpers. * PSR-3 log levels: debug / info / notice / warning / error / critical / alert / emergency, plus the common ``log``/``warn`` aliases already used by the rest of the regex. What this commit deliberately does NOT do: extend ``_looks_like_signature`` to PHP. I tested that locally — Laravel RESTful controllers have very short templated bodies (``$data = $request->validated(); Model::create($data);``) and the signature line carries type information (``StoreRequest``, ``JsonResponse``, etc.) that the body never repeats. Stripping the signature flips MiniLM from Δ +0.028 to 0.000 on usdt-center and from Δ +0.067 to **-0.083** on six-forum. The "signature duplicate wastes budget" heuristic breaks on convention-over-configuration PHP codebases, so PHP stays in the conservative fallback — signature line is kept as part of the body snippet. This is documented in the ``_looks_like_signature`` docstring and locked in by ``test_signature_dedup_php_conservative``. Tests: 10 new PHP log-pattern assertions (instance / static / bare / negative non-logger calls like ``$user->save()`` / ``User::create()`` / ``$response->setStatusCode()`` / ``$db->query()``) plus the conservative-fallback regression test noted above.

Why: my first-try rollout kept PHP in the conservative fallback based on a MiniLM-only bench that showed regressions on two Laravel repos. Fresh A/B covering MiniLM + OpenAI 3-large + Gemini-2-preview + Qwen3-8B across the same two repos tells a different story — PHP dedup is net +0.110 MRR@3 across the 8 (repo × provider) cells, with the wins concentrated on the top- tier cloud models where users most expect the quality gain: usdt-center OpenAI 3-large off Δ=-0.111 on Δ= 0.000 ↑ +0.111 usdt-center Gemini-2-preview off Δ=+0.056 on Δ=+0.139 ↑ +0.083 six-forum Gemini-2-preview off Δ=-0.083 on Δ=-0.033 ↑ +0.050 six-forum OpenAI 3-large off Δ=+0.117 on Δ=+0.117 = 0 six-forum Qwen3-8B off Δ=+0.117 on Δ=+0.117 = 0 usdt-center Qwen3-8B off Δ=-0.111 on Δ=-0.167 ↓ -0.056 * usdt-center MiniLM off Δ=+0.028 on Δ= 0.000 ↓ -0.028 six-forum MiniLM off Δ=+0.067 on Δ=+0.017 ↓ -0.050 * Qwen3-8B usdt-center: legacy already at MRR 1.000 (ceiling), any body token that isn't a retrieval signal is noise; this is a ceiling-effect artefact, not a regression from dedup. What: adds PHP back to ``_looks_like_signature`` with the same three-AND gate every other language uses — name + declaration keyword + param first-token match. Covers class methods (all visibility / static / abstract / final prefixes precede ``function``), free functions, classes, interfaces, traits, and enums. Log-skip patterns from the previous commit stay unchanged. Mini-LM-on-Laravel regression is documented in the module docstring; those users can revert via ``CRG_EMBED_INCLUDE_BODY=0`` without touching any other part of the pipeline. The sticky-flag opt-in is exactly the shape this kind of per-model tradeoff needs, and why we built it. Regression test in test_signature_dedup_php now asserts dedup fires for standard PHP signatures (method / class / interface / trait) and still returns False for body-only lines and for overload-style param mismatches, so a future change that breaks one of those categories gets caught.

Codex round 13 nit: the ``test_signature_dedup_php`` comment said "traits, enums" but only asserted a trait example. Add explicit assertions for both shapes of the PHP 8.1 enum syntax: enum OrderStatus: string { ... # declaration → dedup case PENDING = 'pending'; # enum value → keep as body This locks in the behavior Codex spot-checked manually: ``case`` is deliberately NOT in the dedup keyword allowlist, so enum values stay in the body snippet where they carry real retrieval signal.

…anguages Why: the body-embedding feature ships a new ``code-review-graph embed`` subcommand (with ``--include-body`` / ``--confirm-reembed`` / ``--skip-cost-check`` flags) but the existing CLI reference tables in each README stopped at ``eval`` / ``serve``. A first-time user after this PR merges wouldn't discover the subcommand from docs alone. What: adds two rows to the CLI reference ``<details>`` block in every translated README (EN / zh-CN / ja-JP / ko-KR / hi-IN): code-review-graph embed # fresh repo — body enrichment auto-on code-review-graph embed --include-body --confirm-reembed # existing DB — opt in to body Native-phrasing for each language explains the fresh-vs-existing-DB distinction so the sticky-flag + double-flag safety gate is obvious from the docs without cross-referencing the PR body. Notes for translators / follow-up contributors: the ``embed`` CLI also supports optional ``--provider`` / ``--model`` / ``--skip-cost-check`` flags that the full ``code-review-graph embed --help`` output covers; the CLI table sticks to the two shapes most users need.

…lpers Why: body-embedding (PR tirth8205#345) shipped a `code-review-graph embed` CLI but `install` doesn't wire up any hook to keep embeddings fresh after edits. Users who want auto-refresh have to hand-craft a PostToolUse entry in settings.json — easy to get wrong (wrong schema field, no stdio redirect, no cost-gate). This commit lands the building blocks so the next commit can expose a single `--auto-embed-hook` flag. What: - New `generate_auto_embed_hook_entry(repo_root)` in skills.py. Returns a single PostToolUse entry using the canonical Claude Code Jan-2026 schema (`async: true`, NOT the invalid `run_in_background`). Matcher covers `Edit|Write|MultiEdit` (Bash is intentionally left to the standalone update hook since Bash tools don't produce code edits). - Chains `update --skip-flows && embed` so graph.db is fresh when embed reads it; the second update call is a hash-dedup no-op. This eliminates a cross-process race that the separate update hook and this new embed hook would otherwise have on graph.db. - `CRG_EMBED_INCLUDE_BODY=0` is prefixed to each CLI call so the hook never auto-triggers PR tirth8205#345's body-mode cost bomb — users who want body-mode must run `embed --include-body --confirm-reembed` manually and accept the sticky-flag flip-flop (documented in README). - Full stdout+stderr redirection (`>/dev/null 2>&1` + `|| true`) keeps the MCP JSON-RPC stream uncorrupted even when embed prints progress or exits non-zero (e.g. missing provider). - `install_hooks(..., include_auto_embed=False)` adds the kwarg with a False default — existing callers see JSON-struct-identical output. - New `remove_auto_embed_hook_entry(...)` with defensive handling for malformed JSON / non-dict root / non-dict hooks / non-list PostToolUse (all return False without writing, no .bak on no-match — protects pristine baselines). Tests (36 new, all passing): - TestGenerateAutoEmbedHookEntry (13 cases): schema conformance, chaining order, stdio redirect count, env-var placement via regex, path-quoting, matcher content. - TestInstallHooksAutoEmbed (4 cases): entry dedup, JSON-struct deep equality vs pre-change behaviour, qoder platform parity. - TestRemoveAutoEmbedHook (9 cases): exact-dict removal, no-op-no-bak, user-custom preservation, 4 malformed-JSON defensive tests. Plan: .omc/plans/feat-install-auto-embed-hook.md (local, not committed). Stacked on PR tirth8205#345 (feat/body-embedding). Depends on `embed` CLI introduced there; if tirth8205#345 is rebased/squashed, this branch needs refresh.

Why: the hook-generation machinery from the previous commit is only useful if users can opt in from the CLI. This wires the flag through `install` and its `init` alias, surfaces pre-flight warnings for the two most common silent-failure modes (missing sentence-transformers, unsupported platform), and adds integration tests that exercise the full argparse → subprocess → settings.json path. What: - `install` and `init` both gain a mutually-exclusive flag group: `--auto-embed-hook` installs the entry; `--no-auto-embed-hook` removes an exact-match entry without touching anything else in settings.json. Default is OFF (opt-in) — passing neither flag produces the pre-change behaviour byte-for-equivalent on the wire. - `_handle_init` parses the flags into `want_auto_embed` / `want_remove_auto_embed`, emits a stderr warning when the flag is passed with a non-claude/qoder `--platform` (the hook block already skips those platforms, so the warning is the only user-facing signal), and a second stderr warning when the default local provider's `sentence_transformers` extra is missing — both warnings are non-blocking per the "never block the agent" principle. - Removal path explicitly does NOT call install_hooks to avoid re-merging the default update/SessionStart entries on top of any user customisation that happened between install and removal. Tests: - 9 new test_cli.py tests covering: flag acceptance on both subcommands, mutex-violation argparse exit 2, dry-run writes nothing, sentence-transformers find_spec monkeypatch warning, cursor-platform stderr warning, help-text contains both flags. - New tests/test_integration_install_auto_embed.py with two subprocess-driven round-trips — install-and-verify-schema plus install-then-remove — that exercise the full package entry point via `python -m code_review_graph`. Replaces the v1 plan's fragile jq-based smoke script with a CI-gated Python test. Asserts `async: true`, double stdout/stderr redirect, `|| true` tail, `CRG_EMBED_INCLUDE_BODY=0` presence, and static (not $CLAUDE_PROJECT_DIR) repo path. All 113 tests across the three touched files pass. Pre-existing 60 failures on test_parser / test_refactor / test_main (`--tools` flag) are documented in PR tirth8205#345 as unrelated upstream issues.

Why: a new CLI flag is invisible without discoverable docs. First-time users rarely find new flags from code alone — the install-command reference table and the existing body-embedding explanation are the places people actually look. Each of the 5 translated READMEs gets both touchpoints so no audience is stuck on English-only docs. What: - CLI reference table: two new lines right after the existing `embed --include-body --confirm-reembed` entry, covering install + uninstall of the auto-embed hook. Native-phrased per language. - Body-embedding explanation block: a new paragraph inside the same blockquote that already covers body-mode opt-in, documenting (a) the opt-in feature, (b) the Claude-Code-and-Qoder-only / POSIX-only scope limitation, and (c) the sticky-flag flip-flop behavior so anyone who ran `embed --include-body --confirm-reembed` before enabling the hook understands their preference will get reset on every fire. Parity: grep confirms each README mentions `--auto-embed-hook` twice (CLI row + explanation paragraph). Native phrasing preserved across EN / zh-CN / ja-JP / ko-KR / hi-IN — no English-only paragraphs sneaking into the translations. POSIX-only caveat and `install --no-auto-embed-hook` removal path are present in all five.

Why: Codex review flagged 1 BLOCKING + 2 MAJOR + 1 MINOR issue after inspecting the three atomic commits above. This commit lands all four fixes plus regression tests so the branch reaches NO SUBSTANTIVE ISSUES before pushing to upstream. BLOCKING — shell injection via repo path (skills.py:~588): - `generate_auto_embed_hook_entry` was using `json.dumps(...)` to quote the repo path, but json-escaping is NOT sufficient for bash double quotes: `/tmp/repo$(touch /tmp/pwned)` would still trigger command substitution when the hook fires. Switched to `shlex.quote(...)` which wraps any path with shell metacharacters in single quotes that bash does not expand. Added two regression tests asserting that paths containing `$()` and backticks land inside single-quoted tokens, not double-quoted ones. Kept the path-with-spaces test updated to expect the new shlex quoting shape. MAJOR — `--no-auto-embed-hook` had full-install side effects (cli.py _handle_init): - The flag was documented as "idempotent removal that preserves user config", but the original wiring still ran `install_platform_configs`, `.gitignore` update, `generate_skills`, `inject_claude_md`, `install_git_hook`, etc. Short-circuited `_handle_init` at the top: when `want_remove_auto_embed` is True we now do ONLY the removal and return, with a `--dry-run` preview path and clean no-op messages for `--no-hooks` and non-claude/qoder platforms. New test `test_no_auto_embed_hook_short_circuits_install_flow` asserts no .mcp.json / .gitignore / skills dir / CLAUDE.md / pre-commit hook is created when only `--no-auto-embed-hook` is passed. MAJOR — malformed / JSONC settings.json was silently overwritten (skills.py install_hooks): - Backup `shutil.copy2` lived INSIDE the `try: json.loads(...)` block, so a parse failure skipped the backup and the subsequent write destroyed the original file. Reordered so the backup always runs first, then parse; on `JSONDecodeError` we now raise `RuntimeError` with a message pointing the user at the fresh `.bak` and asking them to fix or remove the comment manually, instead of silently resetting to `{}` and writing a blank shape. New test `test_install_aborts_on_malformed_json_with_backup` covers the JSONC scenario Codex specifically called out. MINOR — warnings fired in dry-run / no-hooks modes (cli.py): - The `sentence-transformers is missing` and non-claude/qoder platform warnings were printed even when no hook would be installed (`--dry-run` or `--no-hooks`). Gated both warnings behind `want_auto_embed and not skip_hooks and not dry_run` so the CLI stays quiet in the modes where the warning isn't actionable. Two new tests (`test_extras_warning_gated_by_skip_hooks`, `test_extras_warning_gated_by_dry_run`) keep this regression-safe. Tests: 120 passing (was 113). No new lint / mypy / bandit findings on touched files beyond the pre-existing ones documented in PR tirth8205#345.

…view)

Minidoracat added 13 commits April 19, 2026 14:26

Minidoracat added a commit to Minidoracat/code-review-graph that referenced this pull request Apr 19, 2026

merge: install --auto-embed-hook feature (PR tirth8205#349, local pre…

361e9ea

…view)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(install): add opt-in --auto-embed-hook flag for auto-refresh embeddings#349

feat(install): add opt-in --auto-embed-hook flag for auto-refresh embeddings#349
Minidoracat wants to merge 13 commits intotirth8205:mainfrom
Minidoracat:feat/install-auto-embed-hook

Minidoracat commented Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Minidoracat commented Apr 19, 2026

Summary

Stacked on #345 — do NOT merge until #345 lands

Why

ADR highlights (full decision table in the plan)

Shell-quoting security

Files

Known limitations

Test plan

Follow-ups (explicitly NOT in this PR, tracked for later)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant