Fork contributions: bug fixes, Windows compatibility, pipeline resilience, and new features

I've been using OpenAnt (primarily on Windows) and have accumulated some changes in [my fork](https://github.com/joshbouncesecurity/OpenAnt) that I'd like to contribute back. I've already opened PR #15 (test suite + CI). Before submitting the rest as individual PRs, I wanted to check which changes you'd be interested in.

I have working implementations for all of these — happy to open PRs for whichever ones you'd like. Check the ones you're interested in, or let me know if you have questions about any of them.

### Testing & CI

- [ ] **1. Test suite + CI** *(already open as PR #15)* — 60 pytest tests covering parsers, token tracking, language detection, and Go CLI integration. GitHub Actions CI on Linux, macOS, and Windows.
- [ ] **2. Add ruff lint to CI** — Adds ruff with two rules: `F821` (undefined name) and `F811` (redefined unused name). Unlike compiled languages, Python won't report an undefined name until that code path executes at runtime, so a missing import can ship undetected and only crash when a user hits that branch. These two rules catch that statically with zero false positives and no style noise.

### Bug Fixes

Unless struck through as addressed upstream, these bugs exist in the current codebase.

- [x] ~~**3. Findings count shows 0 in build-output** — `PrintBuildOutputSummary` in the Go CLI expects `findings_count` in the JSON response, but the Python CLI returns only `{"pipeline_output_path": path}` — so "Findings included: 0" is always displayed regardless of actual results.~~ *Addressed in upstream PR #23 — `build_pipeline_output` now returns `findings_count` and `cmd_build_output` includes it in the JSON response.*
- [ ] **4. Parse defaults to `--level all` instead of `reachable`** — The Go CLI `parse` command defaults `--level` to `"all"`, while both `scan` and the Python CLI default to `"reachable"`. This means `openant parse` standalone produces a different (larger, noisier) dataset than `openant scan` with no indication to the user.
- [x] ~~**5. Analyze summary missing verdict categories** — The Python analyzer tracks 6 verdict categories (vulnerable, bypassable, inconclusive, protected, safe, errors) but `PrintAnalyzeSummary` only displays Vulnerable and Safe — so the totals don't add up. `PrintScanSummaryV2` already displays all categories correctly; analyze should match.~~ *Addressed in upstream PR #23 — `PrintAnalyzeSummary` now displays all verdict categories (vulnerable, bypassable, protected, safe, inconclusive, errors).*
- [x] ~~**6. Report paths missing from Go CLI output** — `PrintReportSummary` expects `html_path`/`csv_path`/`summary_path` keys, but `ReportResult.to_dict()` only returns `output_path` and `format` — so "Reports Generated" is always blank.~~ *Addressed in upstream PR #23 — `PrintReportSummary` redesigned to read `format` + `output_path` directly, matching `ReportResult.to_dict()`. Also added a Go-based HTML report renderer.*
- [ ] **7. Incomplete call graph for TypeScript/NestJS codebases using dependency injection** — The TypeScript parser doesn't extract constructor parameter types, so dependency-injected service calls (e.g., `this.userService.findById()`) are unresolved in the call graph. This means the security analysis silently misses data flow through injected services — a major blind spot for most production NestJS apps. This adds DI-aware resolution by extracting `constructorDeps` metadata from the AST and using it to resolve `this.service.method()` calls to the correct class.
- [ ] **21. JS parser has no npm-install bootstrap** — `openant parse` on a JS/TS repo fails out of the box with `Cannot find module 'ts-morph'` because nothing in the install flow populates `parsers/javascript/node_modules/`. The fix adds a lazy bootstrap in `_parse_javascript` that runs `npm install` once when `node_modules/` is missing, mirroring the Go CLI's venv bootstrap. See fork PR [#39](https://github.com/joshbouncesecurity/OpenAnt/pull/39).

### Windows Compatibility

These prevent OpenAnt from working correctly on Windows.

- [ ] **8. JS/Go parser path handling on Windows** — `path.relative()` produces backslash paths on Windows, but ts-morph treats backslashes as escape characters — so the JS parser finds 0 files. Additionally, Windows `\r\n` line endings leave trailing `\r` in file lists, and Unicode symbols (`✓✗→`) crash on cp1252 consoles.
- [ ] **9. UTF-8 file I/O across the codebase** — All bare `open()` calls use the system encoding (cp1252 on Windows), causing `charmap codec can't decode` errors on any target codebase containing non-ASCII characters. This adds centralized UTF-8 helpers (`open_utf8`, `read_json`, `write_json`, `run_utf8`) and migrates all file I/O.

### Pipeline Resilience

New capabilities that limit the cost (API spend) and time lost when a long-running scan fails midway.

- [x] ~~**10. Crash recovery: checkpoint and resume** — Enhance already supports checkpoint/resume via `--checkpoint <path>`, but it's opt-in and manual. This makes checkpointing always-on and automatic (replaces `--checkpoint` with `--fresh` to opt out), extends per-unit checkpointing to analyze and verify (which currently have none), adds scan step-level resume (re-running skips completed steps), adds `--fresh` to parse (so parser improvements take effect without manually deleting `dataset.json`), and wraps all JSON writes in atomic temp-file-then-rename to prevent corrupt files on crash. Currently, if a multi-hour scan crashes mid-analyze or mid-verify, all progress (and API spend) for that stage is lost.~~ *Addressed in upstream PR #23 — added `core/checkpoint.py` with directory-based per-unit `StepCheckpoint` for enhance, analyze, and verify. Uses `--checkpoint` flag and `--workers`/`--backoff` for parallelization. Does not include atomic JSON writes or `--fresh` flag.*
- [x] ~~**11. Auto-retry errored units** — Enhance and analyze automatically retry units that errored on the previous run instead of treating errors as complete. `--skip-errors` flag to opt out. Currently, API errors (rate limits, timeouts) permanently mark units as "done" — the only recovery is to re-process everything.~~ *Addressed in upstream PR #23 — `StepCheckpoint.load_ids(skip_errors=True)` excludes errored units by default so they are retried on resume. No opt-out flag; errors are always retried.*
- [x] ~~**12. Parallel LLM calls** — `ThreadPoolExecutor`-based parallelization for enhance, analyze, and verify with lock-protected checkpoints. `--concurrency`/`-j` flag (default 4). A scan that takes 2 hours serially completes in approximately 30 minutes with `--concurrency 4`.~~ *Addressed in upstream PR #23 — `ThreadPoolExecutor` with `--workers` (default 8) and `--backoff` (default 30s) flags, plus a `GlobalRateLimiter` for coordinated backoff across all workers.*

### New Features

- [ ] **13. Auto-detect language in `init`** — Makes `--language` optional by auto-detecting the project language from file extensions. Also makes git repository optional for local paths. Reduces friction for new users — `openant init` just works.
- [ ] **14. Auto-detect dependency changes** — Go CLI hashes `pyproject.toml` after `pip install -e` and re-runs install automatically when dependencies change. Prevents stale venv issues after `git pull`.
- [ ] **15. Centralize model IDs** — Single `model_config.py` with `MODEL_PRIMARY`/`MODEL_AUXILIARY`/`MODEL_DEFAULT` constants, replacing hardcoded model strings across 15 files. Makes model updates a one-line change instead of find-and-replace across the codebase.
- [ ] **16. Migrate to Claude Agent SDK** — Routes all LLM calls (analyze, enhance, verify, report/context) through the Claude Agent SDK instead of the `anthropic` API. SDK handles API-key and local Claude Code session auth natively, provides Read/Grep/Glob/Bash native tools for enhance/verify (replacing the manual multi-turn tool loop), and gives accurate cost tracking via `ResultMessage.total_cost_usd`. Fully implemented on fork, not yet submitted upstream — see detail section for history and cherry-pick order.
- [ ] **17. `generate-context` CLI command with auto-discovery** — Adds `openant generate-context [repo-path]` as a standalone pipeline step to generate `application_context.json`. Fully integrated with the project system (`openant init` / `project switch`) — defaults output to the project scan directory. Also wires up auto-discovery of `application_context.json` in `analyze` and `verify` commands (both Go and Python CLIs) so `--app-context` is no longer required when the file exists in the scan dir. Previously, running individual steps required manually passing `--app-context` to every command or skipping application context entirely.
- [ ] **18. Override merge mode for `generate-context`** — When `generate-context` detects a manual override file (`OPENANT.md`/`OPENANT.json`), it now prompts the user to choose: **use** (as-is, skip LLM), **merge** (feed override into LLM alongside other sources), or **ignore** (skip override, generate from scratch). New `--override-mode <use|merge|ignore>` flag bypasses the prompt for CI/automation. `--force` kept as backward-compatible shortcut for `--override-mode ignore`. Previously, override files were all-or-nothing — either they fully replaced LLM generation or were ignored entirely, with no way to combine developer-provided hints with LLM analysis.
- [ ] **19. `--fresh` flag for parse** — Adds `--fresh` to `openant parse` to force a full reparse from scratch without manually deleting `dataset.json`. Useful when parser improvements are deployed and the existing dataset needs to be regenerated. Also clears the dataset during `openant scan` when `--fresh` is passed. See fork PR [#21](https://github.com/joshbouncesecurity/OpenAnt/pull/21).
- [ ] **20. Atomic JSON writes for pipeline outputs** — Wraps all final output writes (`results.json`, `enhanced_dataset.json`, `results_verified.json`) in atomic temp-file-then-rename via `atomic_write_json()`. If a crash or power loss occurs mid-write, the previous output file is preserved intact rather than being left truncated or corrupt. Upstream currently uses plain `json.dump()` with `open()`, which can corrupt multi-hour scan results on interrupt. See fork PR [#7](https://github.com/joshbouncesecurity/OpenAnt/pull/7).

---

<details>
<summary>Detailed implementation notes, dependencies, and cherry-pick risks</summary>

## Detailed Implementation Notes

### Change 1: Test suite + CI (fork PRs [#1](https://github.com/joshbouncesecurity/OpenAnt/pull/1) + [#4](https://github.com/joshbouncesecurity/OpenAnt/pull/4), already upstream PR [#15](https://github.com/knostic/OpenAnt/pull/15))

- **Dependencies**: None — this is the foundation for all other changes
- **Cherry-pick difficulty**: Already submitted

### Change 2: Ruff lint in CI (fork PR [#8](https://github.com/joshbouncesecurity/OpenAnt/pull/8), partial)

- **Files**: `.github/workflows/test.yaml`
- **Dependencies**: Change 1 (CI workflow to add the step to)
- **Cherry-pick difficulty**: Clean
- **Note**: Run `ruff check .` against upstream/master before submitting to verify a clean baseline
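To illustrate the runtime-only failure mode described in item 2, here is a hypothetical module (not OpenAnt code). It imports cleanly and its common path works; the bugs only show up when the affected code runs. The two marked lines are the kind of thing `ruff check --select F821,F811` reports statically, without executing anything.

```python
"""Hypothetical module, not OpenAnt code: both bugs below survive a plain import."""


def parse_level(value):
    return value.lower()


def parse_level(value):  # F811: silently replaces the definition above
    return value


def export_summary(results, fmt="json"):
    if fmt == "json":
        return {"count": len(results)}
    # F821: `csv` is never imported. Nothing fails until a caller asks for CSV,
    # at which point this line raises NameError.
    return csv.DictWriter(None, fieldnames=["count"])
```

Calling `export_summary(results)` works, `parse_level("ALL")` quietly returns `"ALL"` because the second definition wins, and only `export_summary(results, fmt="csv")` crashes.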

### Change 3: Findings count (fork PR [#12](https://github.com/joshbouncesecurity/OpenAnt/pull/12))

> **Superseded by upstream PR [#23](https://github.com/knostic/OpenAnt/pull/23)** — no longer needs to be submitted.

### Change 4: Parse --level default (fork PR [#16](https://github.com/joshbouncesecurity/OpenAnt/pull/16))

- **Files**: `apps/openant-cli/cmd/parse.go`
- **Dependencies**: None
- **Cherry-pick difficulty**: Clean (one-line change)

### Change 5: Analyze summary verdicts (fork PR [#18](https://github.com/joshbouncesecurity/OpenAnt/pull/18))

> **Superseded by upstream PR [#23](https://github.com/knostic/OpenAnt/pull/23)** — no longer needs to be submitted.

### Change 6: Report paths (fork PR [#19](https://github.com/joshbouncesecurity/OpenAnt/pull/19))

> **Superseded by upstream PR [#23](https://github.com/knostic/OpenAnt/pull/23)** — no longer needs to be submitted.

### Change 7: DI-aware call resolution for TypeScript/NestJS (fork PR [#20](https://github.com/joshbouncesecurity/OpenAnt/pull/20))

- **Files**: `parsers/javascript/typescript_analyzer.js`, `utilities/agentic_enhancer/agent.py`, `utilities/agentic_enhancer/prompts.py`
- **Dependencies**: None (parser-level change)
- **Cherry-pick difficulty**: Clean
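A minimal sketch of the resolution step, written in Python for readability. The fork's real implementation lives in `typescript_analyzer.js` and works on the ts-morph AST; here `constructor_deps` is assumed to be a plain mapping from constructor-parameter property name to declared type (what `constructor(private readonly userService: UserService)` would yield), and `resolve_di_call` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Matches calls of the form `this.<property>.<method>` (arguments already stripped).
THIS_MEMBER_CALL = re.compile(r"^this\.(\w+)\.(\w+)$")


def resolve_di_call(call_expr: str, constructor_deps: Dict[str, str]) -> Optional[str]:
    """Map `this.userService.findById` to `UserService.findById`, or return None."""
    match = THIS_MEMBER_CALL.match(call_expr)
    if match is None:
        return None  # not a `this.x.y` call; other resolution rules apply
    prop, method = match.groups()
    dep_class = constructor_deps.get(prop)
    if dep_class is None:
        return None  # property was not injected through the constructor
    return f"{dep_class}.{method}"


# Hypothetical metadata for: constructor(private readonly userService: UserService) {}
deps = {"userService": "UserService"}
print(resolve_di_call("this.userService.findById", deps))  # UserService.findById
```

Without the `constructorDeps` mapping, the call graph has no way to tie `this.userService` to a class, which is the blind spot item 7 describes.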

### Change 8: JS/Go parser Windows paths (fork PR [#3](https://github.com/joshbouncesecurity/OpenAnt/pull/3))

- **Files**: `parsers/javascript/typescript_analyzer.js`, `parsers/javascript/test_pipeline.py`, `parsers/go/test_pipeline.py`
- **Dependencies**: None
- **Cherry-pick difficulty**: Clean
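The three Windows problems can be sketched as follows. This is a hypothetical Python rendition with illustrative helper names; the fork's actual path fix is in `typescript_analyzer.js`. `str.splitlines()` treats `\r\n`, `\n`, and `\r` all as line boundaries, so no stray `\r` survives, and `encode(..., errors="replace")` shows what a cp1252 console does to symbols it cannot represent.

```python
from pathlib import PureWindowsPath
from typing import List


def normalize_rel_path(path: str) -> str:
    """Turn `src\\app\\main.ts` into `src/app/main.ts`; forward slashes pass through."""
    return PureWindowsPath(path).as_posix()


def parse_file_list(output: str) -> List[str]:
    """Split subprocess output into paths, tolerating CRLF line endings."""
    # splitlines() breaks on \r\n, \n and \r, so no trailing \r is left behind.
    return [normalize_rel_path(line.strip()) for line in output.splitlines() if line.strip()]


def console_safe(text: str, encoding: str = "cp1252") -> str:
    """Replace characters the console encoding cannot represent with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)
```

For example, `parse_file_list("a.go\r\nb\\c.go\r\n")` yields `["a.go", "b/c.go"]`, and `console_safe("✓ ok")` yields `"? ok"` instead of raising `UnicodeEncodeError`.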

### Change 9: UTF-8 file I/O (fork PR [#13](https://github.com/joshbouncesecurity/OpenAnt/pull/13))

- **Files**: New `utilities/file_io.py`, plus migrations across ~20 files
- **Dependencies**: None (but benefits from test suite for coverage)
- **Cherry-pick difficulty**: Clean — large diff but mechanical replacement
- **Note**: Cross-cutting change that touches many files. Best submitted early to avoid conflicts with other PRs.
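A minimal sketch of what `utilities/file_io.py` could contain. The four helper names come from item 9; their signatures and the `errors="replace"` choice for subprocess output are assumptions, not the fork's exact code. The key point is that every call passes `encoding="utf-8"` explicitly instead of relying on the locale default (cp1252 on many Windows machines).

```python
import json
import subprocess
from pathlib import Path
from typing import Any, List, Union

PathLike = Union[str, Path]


def open_utf8(path: PathLike, mode: str = "r", **kwargs: Any):
    """open() with UTF-8 forced for text modes, regardless of the OS locale."""
    if "b" in mode:
        raise ValueError("open_utf8 is for text mode; use open() for binary files")
    return open(path, mode, encoding="utf-8", **kwargs)


def read_json(path: PathLike) -> Any:
    with open_utf8(path) as f:
        return json.load(f)


def write_json(path: PathLike, data: Any, indent: int = 2) -> None:
    # ensure_ascii=False keeps non-ASCII text readable; safe because the file is UTF-8.
    with open_utf8(path, "w") as f:
        json.dump(data, f, indent=indent, ensure_ascii=False)


def run_utf8(args: List[str], **kwargs: Any) -> subprocess.CompletedProcess:
    """subprocess.run() that decodes child output as UTF-8 instead of the locale codec."""
    return subprocess.run(
        args, capture_output=True, encoding="utf-8", errors="replace", **kwargs
    )
```

The migration is then mechanical: each bare `open(path)` becomes `open_utf8(path)`, and each `json.load(open(...))` pair becomes `read_json(...)`.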

### Change 10: Checkpoint and resume (fork PRs [#7](https://github.com/joshbouncesecurity/OpenAnt/pull/7), [#9](https://github.com/joshbouncesecurity/OpenAnt/pull/9), [#10](https://github.com/joshbouncesecurity/OpenAnt/pull/10), [#21](https://github.com/joshbouncesecurity/OpenAnt/pull/21))

> **Largely superseded by upstream PR [#23](https://github.com/knostic/OpenAnt/pull/23).** Upstream added `core/checkpoint.py` with directory-based per-unit `StepCheckpoint` for enhance, analyze, and verify. The remaining sub-features not in upstream are now tracked as separate items: **change 19** (`--fresh` for parse) and **change 20** (atomic JSON writes).

Original fork implementation (kept as historical reference):

| Stage | Fork PR | Scope | Status |
|-------|---------|-------|--------|
| 1 | [#7](https://github.com/joshbouncesecurity/OpenAnt/pull/7) | Atomic writes (`atomic_write_json`) + enhance auto-checkpoint + `--fresh` flag | Atomic writes split to **change 20**; `--fresh` for parse split to **change 19** |
| 2 | [#9](https://github.com/joshbouncesecurity/OpenAnt/pull/9) | Analyze/verify per-unit checkpointing + `--fresh` for analyze/verify | **Superseded by upstream PR #23** |
| 3 | [#10](https://github.com/joshbouncesecurity/OpenAnt/pull/10) | Scan step-level resume (skip completed steps on re-run) | **Superseded by upstream PR #23** |
| — | [#21](https://github.com/joshbouncesecurity/OpenAnt/pull/21) | `--fresh` flag for parse (forces full reparse, clears dataset in scan) | Now **change 19** |

### Change 11: Auto-retry errored units (fork PR [#15](https://github.com/joshbouncesecurity/OpenAnt/pull/15))

> **Superseded by upstream PR [#23](https://github.com/knostic/OpenAnt/pull/23)** — no longer needs to be submitted. Upstream's `StepCheckpoint.load_ids(skip_errors=True)` provides equivalent behavior.

### Change 12: Parallel LLM calls (fork PR [#17](https://github.com/joshbouncesecurity/OpenAnt/pull/17))

> **Superseded by upstream PR [#23](https://github.com/knostic/OpenAnt/pull/23)** — no longer needs to be submitted. Upstream uses `--workers`/`--backoff` with `ThreadPoolExecutor` and `GlobalRateLimiter`. The fork's `parallel_executor.py` has been deleted as dead code.

### Change 13: Auto-detect language (fork PR [#6](https://github.com/joshbouncesecurity/OpenAnt/pull/6))

- **Dependencies**: None
- **Cherry-pick difficulty**: Clean
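A sketch of extension-based detection, assuming a simple majority vote over file counts. The extension-to-language table, the skipped directories, and the returned language names are all assumptions; the fork's heuristic and the values `--language` actually accepts may differ.

```python
import os
from collections import Counter
from pathlib import Path
from typing import Optional

# Hypothetical mapping; TS is grouped with JS on the assumption that both
# go through the parser in parsers/javascript.
EXTENSION_LANGUAGES = {
    ".py": "python",
    ".js": "javascript", ".jsx": "javascript",
    ".ts": "javascript", ".tsx": "javascript",
    ".go": "go",
}
# Vendored or generated trees would otherwise skew the vote.
SKIP_DIRS = {".git", "node_modules", "vendor", "venv", ".venv", "__pycache__"}


def detect_language(repo: Path) -> Optional[str]:
    counts: Counter = Counter()
    for _root, dirs, files in os.walk(repo):
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]  # prune in place
        for name in files:
            lang = EXTENSION_LANGUAGES.get(Path(name).suffix.lower())
            if lang:
                counts[lang] += 1
    return counts.most_common(1)[0][0] if counts else None
```

Returning `None` when nothing matches lets `init` fall back to asking for `--language` explicitly.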

### Change 14: Auto-detect dependency changes (fork PR [#23](https://github.com/joshbouncesecurity/OpenAnt/pull/23))

- **Dependencies**: None (Go CLI only)
- **Cherry-pick difficulty**: Clean
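The check-and-reinstall logic, sketched in Python for consistency with the other examples (the real change is in the Go CLI). The stamp-file location and helper names are hypothetical. The idea is to store the SHA-256 of `pyproject.toml` after a successful `pip install -e` and reinstall whenever the current hash differs.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_reinstall(pyproject: Path, stamp: Path) -> bool:
    """True if deps were never installed or pyproject.toml changed since."""
    if not stamp.is_file():
        return True
    return stamp.read_text(encoding="utf-8").strip() != file_sha256(pyproject)


def record_install(pyproject: Path, stamp: Path) -> None:
    """Call only after `pip install -e` succeeded, so a failed install is retried."""
    stamp.write_text(file_sha256(pyproject), encoding="utf-8")
```

Hashing the file contents rather than comparing modification times means a `git pull` that rewrites `pyproject.toml` without changing it does not trigger a needless reinstall.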

### Change 15: Centralize model IDs (fork PR [#24](https://github.com/joshbouncesecurity/OpenAnt/pull/24))

- **Dependencies**: None
- **Cherry-pick difficulty**: Clean
- **Note**: Upstream may want different default model IDs but the centralization pattern is valuable
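A sketch of the centralized module. The two model IDs are the strings upstream currently hardcodes (see the `llm_client.py` divergence notes); which role maps to which model, and the `model_for` helper, are assumptions for illustration.

```python
"""Hypothetical sketch of utilities/model_config.py (change 15)."""

# Role assignments below are assumptions; only the ID strings come from upstream.
MODEL_PRIMARY = "claude-opus-4-20250514"      # heavier analysis calls
MODEL_AUXILIARY = "claude-sonnet-4-20250514"  # cheaper helper calls
MODEL_DEFAULT = MODEL_AUXILIARY               # used when a caller does not choose


def model_for(role: str) -> str:
    """Hypothetical lookup by role name, falling back to MODEL_DEFAULT."""
    return {"primary": MODEL_PRIMARY, "auxiliary": MODEL_AUXILIARY}.get(role, MODEL_DEFAULT)
```

Call sites then write `from utilities.model_config import MODEL_PRIMARY` instead of repeating the string, so a model bump touches one file.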

### Change 16: Claude Agent SDK migration (fork PR [#25](https://github.com/joshbouncesecurity/OpenAnt/pull/25))

**Status**: complete on fork as of 2026-04-19 via fork PRs [#30](https://github.com/joshbouncesecurity/OpenAnt/pull/30), [#31](https://github.com/joshbouncesecurity/OpenAnt/pull/31), [#32](https://github.com/joshbouncesecurity/OpenAnt/pull/32), [#33](https://github.com/joshbouncesecurity/OpenAnt/pull/33), [#34](https://github.com/joshbouncesecurity/OpenAnt/pull/34), [#36](https://github.com/joshbouncesecurity/OpenAnt/pull/36), [#37](https://github.com/joshbouncesecurity/OpenAnt/pull/37), [#38](https://github.com/joshbouncesecurity/OpenAnt/pull/38). Not yet submitted upstream.

**Timeline**:

| Date | Event |
|---|---|
| 2026-03-23 | Fork PR [#25](https://github.com/joshbouncesecurity/OpenAnt/pull/25) completed the migration — zero `anthropic` refs in the four call sites. |
| 2026-04-14 | Upstream PR [#23](https://github.com/knostic/OpenAnt/pull/23) expanded `anthropic` usage on its own copy of the four files, adding typed rate-limit handling for the new parallel execution path. |
| 2026-04-16 | Fork PR [#29](https://github.com/joshbouncesecurity/OpenAnt/pull/29) merged upstream/master into fork and silently absorbed the regression — `anthropic` was back in all four files (8 / 5 / 5 / 3 refs). |
| 2026-04-19 | Fork PR #30 re-declared `anthropic` and added a regression-guard test; PRs #31 through #38 completed the re-migration and dropped the dep. |

This is the largest change. It subsumes all earlier local Claude attempts (fork PRs [#2](https://github.com/joshbouncesecurity/OpenAnt/pull/2), [#5](https://github.com/joshbouncesecurity/OpenAnt/pull/5), [#11](https://github.com/joshbouncesecurity/OpenAnt/pull/11), [#14](https://github.com/joshbouncesecurity/OpenAnt/pull/14)) which should NOT be submitted individually.

**Key benefits:**
- **Single backend**: All LLM calls route through the SDK
- **Local Claude Code support**: SDK natively supports both API key auth and local session auth
- **Native tools**: SDK provides Read/Grep/Glob/Bash tools natively, replacing the manual tool loop
- **Accurate cost tracking**: Uses SDK's `ResultMessage.total_cost_usd`
- **Dependency change**: Drops `anthropic` in favour of `claude-agent-sdk`. History: fork PR #25 originally dropped it; upstream PR #23 re-expanded its usage; fork PR #29 absorbed that back; fork PR #30 re-declared `anthropic` as a minimal bugfix; fork PRs #31–38 completed the re-migration and PR #38 dropped `anthropic` from `pyproject.toml` for good. Current fork master has zero `import anthropic` anywhere in `libs/openant-core/`.

- **Dependencies**: Centralize model IDs (change 15) should go first.
- **Cherry-pick difficulty**: Cherry-pickable in order if done one fork PR at a time (see table below). The original PR #25 diff alone is no longer sufficient — upstream has since added `GlobalRateLimiter` and parallel execution paths that rely on typed `anthropic` exceptions. The fork's re-migration PR chain already handles those additions and can be cherry-picked against current upstream.
- **Note**: Must account for upstream's parallelization, rate limiter, and checkpoint system (introduced in upstream PR [#23](https://github.com/knostic/OpenAnt/pull/23)). The fork's re-migration (PRs #31–38) already does this.

**Fork PRs that re-implement change 16 against current upstream** (cherry-pick targets, in dependency order):

| Fork PR | Scope |
|---|---|
| [#30](https://github.com/joshbouncesecurity/OpenAnt/pull/30) | Regression guard (`tests/test_declared_dependencies.py`) that fails CI if any imported distribution isn't in `pyproject.toml`. **Cherry-pick this even if nothing else lands** — it prevents the same silent drift repeating on any future upstream merge. |
| [#31](https://github.com/joshbouncesecurity/OpenAnt/pull/31) | New `utilities/sdk_errors.py` taxonomy — one exception class per `AssistantMessageError` literal value. |
| [#33](https://github.com/joshbouncesecurity/OpenAnt/pull/33) | Wires `AssistantMessage.error` detection into `llm_client._run_query`. Centralises rate-limit reporting to `GlobalRateLimiter`. |
| [#32](https://github.com/joshbouncesecurity/OpenAnt/pull/32) | Ports `report/generator.py` (single-turn, smallest site). |
| [#34](https://github.com/joshbouncesecurity/OpenAnt/pull/34) | Ports `context_enhancer._build_error_info`'s isinstance chain to `sdk_errors.*`. |
| [#37](https://github.com/joshbouncesecurity/OpenAnt/pull/37) | Ports `finding_verifier.py` — manual tool-dispatch loop replaced with `run_native_verification` (SDK native tools). |
| [#36](https://github.com/joshbouncesecurity/OpenAnt/pull/36) | Ports `agentic_enhancer/agent.py`; absorbed `shared_client` removal in `context_enhancer.py`. |
| [#38](https://github.com/joshbouncesecurity/OpenAnt/pull/38) | Drops `anthropic>=0.40.0` from `pyproject.toml`; cleans up one straggler in `openant/cli.py`'s remediation-guidance LLM call that the earlier ports didn't touch. |

**Technical note on SDK rate-limit surfacing:** the Claude Agent SDK exposes API errors via `AssistantMessage.error: AssistantMessageError | None`, a Literal whose values include `"rate_limit"`, `"authentication_failed"`, `"billing_error"`, `"invalid_request"`, `"server_error"`, and `"unknown"` (see `claude_agent_sdk/types.py:767`). No stderr parsing is required; earlier notes suggesting otherwise were wrong. The `retry-after` and `request-id` headers are not surfaced, so in practice the `GlobalRateLimiter`'s default 30s backoff stands in for `retry-after`.
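A sketch of the one-class-per-literal taxonomy that fork PR #31 describes. The literal values are the ones listed in the note on SDK rate-limit surfacing; the class names and the `error_from_code` helper are hypothetical, and the sketch deliberately avoids importing `claude_agent_sdk` so it runs standalone. In `_run_query`, the value of `AssistantMessage.error` would be passed to a function like this, and a rate-limit error reported to `GlobalRateLimiter`.

```python
from typing import Optional


class SdkError(Exception):
    """Base class for errors surfaced via AssistantMessage.error."""
    code = "unknown"


class RateLimitError(SdkError):
    code = "rate_limit"


class AuthenticationFailedError(SdkError):
    code = "authentication_failed"


class BillingError(SdkError):
    code = "billing_error"


class InvalidRequestError(SdkError):
    code = "invalid_request"


class ServerError(SdkError):
    code = "server_error"


class UnknownSdkError(SdkError):
    code = "unknown"


_BY_CODE = {
    cls.code: cls
    for cls in (RateLimitError, AuthenticationFailedError, BillingError,
                InvalidRequestError, ServerError, UnknownSdkError)
}


def error_from_code(code: Optional[str], detail: str = "") -> Optional[SdkError]:
    """Turn an AssistantMessage.error value into an exception (None means no error)."""
    if code is None:
        return None
    # Literal values added by future SDK versions degrade to UnknownSdkError.
    return _BY_CODE.get(code, UnknownSdkError)(detail or code)
```

Because every class shares `SdkError` as a base, callers like `context_enhancer._build_error_info` can branch on `isinstance` the way they previously did with the typed `anthropic` exceptions.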

### Change 17: `generate-context` CLI command with auto-discovery (fork PR [#26](https://github.com/joshbouncesecurity/OpenAnt/pull/26))

- **Files**: New `apps/openant-cli/cmd/generatecontext.go`, modified `apps/openant-cli/cmd/root.go`, `apps/openant-cli/cmd/analyze.go`, `apps/openant-cli/cmd/verify.go`, `libs/openant-core/openant/cli.py`, `libs/openant-core/tests/test_go_cli.py`
- **Documentation**: Also includes doc updates (fork PR [#28](https://github.com/joshbouncesecurity/OpenAnt/pull/28)) to `PIPELINE_MANUAL.md`, `CURRENT_IMPLEMENTATION.md`, `README.md`, and `DOCUMENTATION.md` — these should be included when cherry-picking this change.
- **Dependencies**: None — standalone addition reusing existing `context.application_context` module
- **Cherry-pick difficulty**: Clean
- **Testing status**: Automated tests cover help output and API key validation. Manual testing with an API key is still needed for: generate-context with active project, auto-discovery in analyze/verify, `--force`/`--show-prompt`/`--json` flags, and explicit `--app-context` precedence over auto-discovery.
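One of the behaviours still awaiting manual verification, explicit `--app-context` taking precedence over auto-discovery, can be stated as a small function. This is a hypothetical Python sketch; in the fork the lookup happens in both the Go and Python CLIs, and the function name is illustrative.

```python
from pathlib import Path
from typing import Optional

APP_CONTEXT_FILENAME = "application_context.json"


def resolve_app_context(explicit: Optional[str], scan_dir: Path) -> Optional[Path]:
    """Pick the application context file for analyze/verify."""
    if explicit:
        return Path(explicit)  # an explicit --app-context always wins
    candidate = scan_dir / APP_CONTEXT_FILENAME
    # Auto-discovery: use the file generate-context wrote, if it exists.
    return candidate if candidate.is_file() else None
```

A manual test then needs three cases: no file and no flag (no context), file present (auto-discovered), and file present plus flag (flag wins).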

### Change 18: Override merge mode for `generate-context` (fork PR [#27](https://github.com/joshbouncesecurity/OpenAnt/pull/27))

- **Files**: Modified `apps/openant-cli/cmd/generatecontext.go`, `libs/openant-core/context/application_context.py`, `libs/openant-core/openant/cli.py`, `libs/openant-core/tests/test_go_cli.py`, plus docs (`CLAUDE.md`, `CURRENT_IMPLEMENTATION.md`, `PIPELINE_MANUAL.md`, `context/OPENANT_TEMPLATE.md`)
- **Dependencies**: Change 17 (`generate-context` command must exist)
- **Cherry-pick difficulty**: Clean — builds on top of change 17's `generatecontext.go` and `application_context.py`
- **Implementation**: Go CLI adds interactive prompt (following `uninstall.go` pattern) and `--override-mode` flag. Python core adds `find_override_file()` helper, `override_mode` parameter to `generate_application_context()`, and merge prompt supplement fed to the LLM. Terminal detection via `os.Stdin.Stat()` skips the prompt in non-interactive/CI environments (defaults to `use`).
- **Testing status**: Automated tests cover `--override-mode` in help output and `--force`/`--override-mode` mutual exclusion. Manual testing needed for: interactive prompt with a repo containing `OPENANT.md`, merge mode LLM output (verify with `--show-prompt`), and non-interactive default behavior.
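A sketch of the mode-selection rules described above, written as plain Python so each branch is visible. In the fork, the prompt and flag handling live in `generatecontext.go`; `find_override_file()` exists in the Python core, but its signature and the `OPENANT.md`-before-`OPENANT.json` search order shown here are assumptions.

```python
from pathlib import Path
from typing import Callable, Optional

OVERRIDE_FILENAMES = ("OPENANT.md", "OPENANT.json")  # search order is an assumption
OVERRIDE_MODES = ("use", "merge", "ignore")


def find_override_file(repo: Path) -> Optional[Path]:
    for name in OVERRIDE_FILENAMES:
        candidate = repo / name
        if candidate.is_file():
            return candidate
    return None


def resolve_override_mode(flag: Optional[str], force: bool, interactive: bool,
                          prompt: Callable[[], str]) -> str:
    if force and flag:
        raise ValueError("--force and --override-mode are mutually exclusive")
    if force:
        return "ignore"  # backward-compatible shortcut
    if flag:
        if flag not in OVERRIDE_MODES:
            raise ValueError(f"--override-mode must be one of {OVERRIDE_MODES}")
        return flag
    if not interactive:
        return "use"  # CI / piped stdin: never block on a prompt
    return prompt()
```

Keeping the non-interactive default at `use` preserves the previous behaviour for automation that already relies on override files.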

### Change 19: `--fresh` flag for parse (fork PR [#21](https://github.com/joshbouncesecurity/OpenAnt/pull/21))

- **Files**: `apps/openant-cli/cmd/parse.go`, `libs/openant-core/openant/cli.py`, `libs/openant-core/core/scanner.py`
- **Dependencies**: None
- **Cherry-pick difficulty**: Clean

### Change 20: Atomic JSON writes (fork PR [#7](https://github.com/joshbouncesecurity/OpenAnt/pull/7), partial)

- **Files**: `core/utils.py` (new), `core/analyzer.py`, `core/enhancer.py`, `core/verifier.py`
- **Dependencies**: None
- **Cherry-pick difficulty**: Clean
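A minimal sketch of temp-file-then-rename. The name `atomic_write_json` comes from the fork; this signature is an assumption. The temp file is created in the destination directory so `os.replace()` stays on one filesystem, where POSIX guarantees the rename is atomic (on Windows it still replaces the target in one call, with weaker guarantees), and `fsync` pushes the data to disk before the swap.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Union


def atomic_write_json(path: Union[str, Path], data: Any, indent: int = 2) -> None:
    path = Path(path)
    # Same directory as the target, so the final rename never crosses filesystems.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=indent, ensure_ascii=False)
            f.flush()
            os.fsync(f.fileno())  # data is on disk before the old file is replaced
        os.replace(tmp_name, path)
    except BaseException:
        # Serialization error, Ctrl-C, etc.: drop the partial temp file and
        # leave the previous output untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If `json.dump` fails halfway (for example on a non-serializable value), `results.json` still holds the previous complete run.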

### Change 21: JS parser lazy npm bootstrap (fork PR [#39](https://github.com/joshbouncesecurity/OpenAnt/pull/39))

- **Files**: `libs/openant-core/core/parser_adapter.py` (new `_ensure_js_parser_dependencies()` helper called from `_parse_javascript`), new `libs/openant-core/tests/test_js_parser_bootstrap.py` (5 unit tests covering: node_modules present → skip, missing → run install, npm missing → clear error, install failure → surfaced, bootstrap failure aborts before running Node).
- **Dependencies**: None — self-contained addition. Does not require change 9 (UTF-8 I/O) since the helper uses only `subprocess.run` and `shutil.which`.
- **Cherry-pick difficulty**: Clean — ~40 lines of production code plus a tests file.
- **Design note**: Lazy (on first JS parse) rather than eager (at CLI startup) so users who only scan Python/Go repos never need Node or npm installed. Mirrors the Go CLI's existing venv bootstrap pattern in `apps/openant-cli/internal/python/runtime.go` (check → install → cache), just scoped to the JS parser subdir instead of the user's home dir. The "installing deps (first run, this may take a minute)..." message is printed to stderr so it's visible but doesn't pollute machine-readable output.
- **Alternative considered**: Adding `npm install` to `apps/openant-cli/Makefile`'s `build` target or to a new `setup` target — rejected because it forces npm on users who never parse JS, and because users who skip the README still hit the same opaque `Cannot find module 'ts-morph'` error. The lazy check in `_parse_javascript` fails at the exact point the dep is needed, and either installs automatically or prints a clear "run npm install in X" message with the exact directory path.
- **Testing status**: Automated unit tests cover all four branches (present / missing+install / npm-missing / install-fails) plus the integration surface (`_parse_javascript` does not fall through to Node when bootstrap raises). Tests monkeypatch `subprocess.run` and `shutil.which` so they run without Node installed. Manually verified against a Linux cloudshell reproduction of the exact `Cannot find module 'ts-morph'` failure documented in the PR description.
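A condensed sketch of the four branches listed above. The real helper is `_ensure_js_parser_dependencies()` in `parser_adapter.py`; the signature, exception class, and error messages here are assumptions. Like the fork's tests, it touches only `shutil.which` and `subprocess.run`, so both can be monkeypatched.

```python
import shutil
import subprocess
import sys
from pathlib import Path


class JsParserBootstrapError(RuntimeError):
    """Raised before Node runs, so the user sees an actionable message."""


def ensure_js_parser_dependencies(parser_dir: Path) -> None:
    # Branch 1: already installed, so do nothing (the common, fast path).
    if (parser_dir / "node_modules").is_dir():
        return
    # Branch 2: npm itself is missing; tell the user exactly where to install.
    npm = shutil.which("npm")
    if npm is None:
        raise JsParserBootstrapError(
            f"npm not found on PATH; install Node.js, then run `npm install` in {parser_dir}"
        )
    # Branch 3: first JS parse; stderr keeps machine-readable stdout clean.
    print("installing deps (first run, this may take a minute)...", file=sys.stderr)
    result = subprocess.run([npm, "install"], cwd=parser_dir, capture_output=True, text=True)
    # Branch 4: surface npm's own error output instead of a later ts-morph failure.
    if result.returncode != 0:
        raise JsParserBootstrapError(
            f"`npm install` failed in {parser_dir}:\n{result.stderr.strip()}"
        )
```

Calling this at the top of `_parse_javascript`, before Node is launched, is what gives the "bootstrap failure aborts before running Node" behaviour covered by the fifth test.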

---

## Superseded Changes (Not for Upstream)

These were intermediate steps toward local Claude Code support or features now covered by upstream. They should NOT be submitted upstream individually.

| Fork PR | Description | Why superseded |
|---------|-------------|----------------|
| [#2](https://github.com/joshbouncesecurity/OpenAnt/pull/2) | feat: support local Claude Code session for LLM calls | Replaced by SDK's native auth support in [#25](https://github.com/joshbouncesecurity/OpenAnt/pull/25) |
| [#5](https://github.com/joshbouncesecurity/OpenAnt/pull/5) | test: add tests for local Claude Code mode | Tests for `LocalClaudeClient` which is deleted in [#25](https://github.com/joshbouncesecurity/OpenAnt/pull/25) |
| [#8](https://github.com/joshbouncesecurity/OpenAnt/pull/8) | fix: missing os import in context_enhancer | Bug introduced by fork's checkpoint code, not present in upstream |
| [#11](https://github.com/joshbouncesecurity/OpenAnt/pull/11) | feat: add text-based tool use simulation to LocalClaudeClient | Replaced by SDK's native tool support in [#25](https://github.com/joshbouncesecurity/OpenAnt/pull/25) |
| [#14](https://github.com/joshbouncesecurity/OpenAnt/pull/14) | fix: pass prompt via stdin to avoid WinError 206 | Fix for `local_claude.py` which is deleted in [#25](https://github.com/joshbouncesecurity/OpenAnt/pull/25) |
| [#22](https://github.com/joshbouncesecurity/OpenAnt/pull/22) | fix: pass --fresh flag to chained verify | Depends on `--fresh` for analyze/verify which is superseded by upstream's checkpoint system |

---

## Recommended Submission Order

The order below respects dependencies and minimizes merge conflicts:

| # | Change | Fork PR(s) | Type | Depends on |
|---|--------|-----------|------|------------|
| 1 | 1 | [#1](https://github.com/joshbouncesecurity/OpenAnt/pull/1), [#4](https://github.com/joshbouncesecurity/OpenAnt/pull/4) | Tests + CI | — (already upstream PR [#15](https://github.com/knostic/OpenAnt/pull/15)) |
| 2 | 2 | [#8](https://github.com/joshbouncesecurity/OpenAnt/pull/8) (partial) | Ruff lint in CI | 1 |
| 3 | 8 | [#3](https://github.com/joshbouncesecurity/OpenAnt/pull/3) | Windows JS/Go parser paths | — |
| 4 | 9 | [#13](https://github.com/joshbouncesecurity/OpenAnt/pull/13) | UTF-8 file I/O | 1 (for tests) |
| 5 | 4 | [#16](https://github.com/joshbouncesecurity/OpenAnt/pull/16) | Parse --level default | — |
| 6 | 7 | [#20](https://github.com/joshbouncesecurity/OpenAnt/pull/20) | DI-aware call resolution | — |
| 7 | 13 | [#6](https://github.com/joshbouncesecurity/OpenAnt/pull/6) | Auto-detect language | — |
| 8 | 14 | [#23](https://github.com/joshbouncesecurity/OpenAnt/pull/23) | Auto-detect dependency changes | — |
| 9 | 20 | [#7](https://github.com/joshbouncesecurity/OpenAnt/pull/7) (partial) | Atomic JSON writes | — |
| 10 | 19 | [#21](https://github.com/joshbouncesecurity/OpenAnt/pull/21) | `--fresh` flag for parse | — |
| 11 | 15 | [#24](https://github.com/joshbouncesecurity/OpenAnt/pull/24) | Centralize model IDs | — |
| 12 | 16 | [#30](https://github.com/joshbouncesecurity/OpenAnt/pull/30), [#31](https://github.com/joshbouncesecurity/OpenAnt/pull/31), [#33](https://github.com/joshbouncesecurity/OpenAnt/pull/33), [#32](https://github.com/joshbouncesecurity/OpenAnt/pull/32), [#34](https://github.com/joshbouncesecurity/OpenAnt/pull/34), [#37](https://github.com/joshbouncesecurity/OpenAnt/pull/37), [#36](https://github.com/joshbouncesecurity/OpenAnt/pull/36), [#38](https://github.com/joshbouncesecurity/OpenAnt/pull/38) (in this order) | Claude Agent SDK migration — see change 16's detail section | 15 |
| 13 | 17 | [#26](https://github.com/joshbouncesecurity/OpenAnt/pull/26) | `generate-context` CLI command + auto-discovery | — |
| 14 | 18 | [#27](https://github.com/joshbouncesecurity/OpenAnt/pull/27) | Override merge mode for `generate-context` | 17 |
| 15 | 21 | [#39](https://github.com/joshbouncesecurity/OpenAnt/pull/39) | JS parser lazy npm bootstrap | — |

---

## Cherry-Pick Risks & Notes

### Critical: Each PR must be rebased, not cherry-picked directly

Every fork PR was developed incrementally on top of previous fork PRs (including the superseded local Claude ones: [#2](https://github.com/joshbouncesecurity/OpenAnt/pull/2), [#5](https://github.com/joshbouncesecurity/OpenAnt/pull/5), [#11](https://github.com/joshbouncesecurity/OpenAnt/pull/11), [#14](https://github.com/joshbouncesecurity/OpenAnt/pull/14)). Direct `git cherry-pick` will produce diffs relative to the fork's history, not upstream's. **Each change must be rebased or re-authored as a standalone diff against upstream/master** (or against the upstream branch that includes previously merged contributions).

### Import chain dependencies

Several fork PRs create new modules that later PRs import. Missing a dependency in the chain will cause `ImportError` at runtime:

| Module | Created by | Required by |
|--------|-----------|-------------|
| `core/utils.py` (`atomic_write_json`) | Change 20 (atomic writes) | Change 9 |
| `utilities/file_io.py` (`open_utf8`, etc.) | Change 9 (UTF-8 I/O) | Every subsequent PR that touches file I/O |
| `utilities/model_config.py` (`MODEL_PRIMARY`, etc.) | Change 15 (model IDs) | Change 16 (SDK) |

### `cli.py` flag accumulation

The fork added CLI flags across multiple PRs. When rebasing against upstream, each PR needs to add its flags against **upstream's version** of `cli.py`, not the fork's. The same applies to the Go CLI `cmd/*.go` files.

### Go CLI cmd files

Upstream has 17 files in `apps/openant-cli/cmd/` and may have modified some of the same files the fork changed. Diff each Go CLI PR against upstream's current version of those files.

### `llm_client.py` divergence

This file diverged between fork and upstream:
- **Upstream**: Uses `anthropic.Anthropic()` directly, model IDs hardcoded as `"claude-opus-4-20250514"` / `"claude-sonnet-4-20250514"`
- **Change 15** (model IDs): Replaced hardcoded strings with `model_config` imports
- **Change 16** (SDK): Rewritten. Fork PRs [#31](https://github.com/joshbouncesecurity/OpenAnt/pull/31) (new `utilities/sdk_errors.py`) and [#33](https://github.com/joshbouncesecurity/OpenAnt/pull/33) (wires SDK error surfacing into `_run_query`) are the clean cherry-pick targets for this file. Change 15 still needs re-authoring against upstream's version since it predates the SDK work.

### `pyproject.toml` dependency divergence

Upstream declares `anthropic>=0.40.0`. The fork has fully migrated off it (as of 2026-04-19) via the PR sequence listed under change 16. When ported upstream, change 16 removes `anthropic` and adds `claude-agent-sdk>=0.1.48`. Every fork PR before the last one in that sequence keeps both dependencies declared while individual call sites are ported. Only the final PR (#38) removes `anthropic`. If cherry-picking incrementally upstream, follow the same order: removing the dependency before all usages are ported breaks the build.

### Ruff lint may fail on upstream code

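Change 2 adds ruff with the F821/F811 rules to CI. If upstream has introduced code that violates either rule, CI will fail. **Run `ruff check --select F821,F811 .` against upstream/master before submitting** to verify a clean baseline. A bare `ruff check .` uses whatever ruff configuration the checkout has (ruff's defaults if none), which may not match the two rules CI will enforce.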

### `context_enhancer.py` and `finding_verifier.py` evolution

These files were modified by multiple fork PRs:
- `context_enhancer.py`: change 9 (UTF-8 I/O), change 16 — fork PRs [#34](https://github.com/joshbouncesecurity/OpenAnt/pull/34) (`_build_error_info` port) and [#36](https://github.com/joshbouncesecurity/OpenAnt/pull/36) (`shared_client` removal).
- `finding_verifier.py`: change 9 (UTF-8 I/O), change 16 — fork PR [#37](https://github.com/joshbouncesecurity/OpenAnt/pull/37) replaces the manual tool loop with `run_native_verification`.

Each of those change-16 PRs applies cleanly against current upstream. Ordering matters (see the table under change 16). The intermediate states still work because `anthropic` stays declared in `pyproject.toml` until the final PR (#38) removes it.

### SDK migration (change 16): cherry-pick order matters

Change 16 is cherry-pickable one fork PR at a time in the order listed under change 16's detail section. The historical PR #25 diff alone is **not** sufficient — upstream has since added `GlobalRateLimiter` and parallel execution paths that rely on typed `anthropic` exceptions. The fork's re-migration PRs (#31, #33, #32, #34, #37, #36, #38) already handle those additions.

Key facts when cherry-picking:

1. The SDK surfaces rate limits via `AssistantMessage.error: AssistantMessageError | None`, where `AssistantMessageError` is a `Literal` whose values include `"rate_limit"`. No stderr parsing is required.
2. `GlobalRateLimiter.report_rate_limit()` is called centrally from `llm_client._run_query` whenever that field fires. Callers do not need their own `except anthropic.RateLimitError` blocks.
3. The SDK does not surface the `retry-after` header value, so that hint is lost. In practice, the limiter's default backoff (30s, configurable) takes its place.
4. **Cherry-pick fork PR [#30](https://github.com/joshbouncesecurity/OpenAnt/pull/30) first.** It adds a regression-guard test that catches any future merge silently re-introducing an undeclared import. Cheap insurance; landing it alone provides value even if the rest of change 16 is deferred.
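The centralized flow in points 1 to 3 can be sketched with stand-in types. `AssistantMessageStub`, `RateLimiterSketch`, `surface_message_error`, and the exception classes are hypothetical. They mimic the shape described above: an `error` field carrying a literal such as `"rate_limit"`, a `report_rate_limit()` hook, and a 30s default backoff. The sketch does not import `claude-agent-sdk` or the fork's `GlobalRateLimiter`, `sdk_errors`, or `llm_client` code.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


class SDKError(Exception):
    """Base class for errors reported through AssistantMessage.error (sketch)."""


class RateLimitError(SDKError):
    """Raised for the "rate_limit" literal."""


# One class per literal value; only "rate_limit" is taken from the notes above.
ERROR_CLASSES = {"rate_limit": RateLimitError}


@dataclass
class AssistantMessageStub:
    """Stand-in for the SDK's AssistantMessage: text plus the optional error literal."""
    text: str
    error: Optional[str] = None


class RateLimiterSketch:
    """Process-wide pause shared by all workers, in the spirit of GlobalRateLimiter."""

    def __init__(self, default_backoff: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.default_backoff = default_backoff
        self._clock = clock
        self._lock = threading.Lock()
        self._resume_at = 0.0

    def report_rate_limit(self) -> None:
        # No retry-after value is available, so always use the default backoff.
        with self._lock:
            self._resume_at = max(self._resume_at, self._clock() + self.default_backoff)

    def seconds_until_ready(self) -> float:
        with self._lock:
            return max(0.0, self._resume_at - self._clock())


def surface_message_error(message: AssistantMessageStub, limiter: RateLimiterSketch) -> str:
    """Single place that inspects message.error, like the central check in _run_query."""
    if message.error is None:
        return message.text
    if message.error == "rate_limit":
        limiter.report_rate_limit()
    raise ERROR_CLASSES.get(message.error, SDKError)(message.error)
```

Because the check lives in one function, callers in this sketch only need to handle the raised exception. They never parse stderr or report to the limiter themselves.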

### Upstream may have moved

All analysis is based on `upstream/master` as of 2026-04-19. Upstream PR [#23](https://github.com/knostic/OpenAnt/pull/23) has been merged, superseding fork changes 3, 5, 6, 10 (stages 2-3), 11, and 12. If upstream merges other PRs before these contributions land, additional conflicts may arise. Re-fetch and recheck before each submission.

</details>
