v1.1.7: automated test suite, cross-platform hardening, internal doctor tooling #81

Merged: jamubc merged 3 commits into main from release/1.1.7 on Jun 1, 2026
@jamubc jamubc commented Jun 1, 2026

Owner

Summary

Reliability patch plus the project's first automated test suite. Hardens cross-platform execution and adds a categorized node:test suite that gates CI. No runtime or default-config changes vs 1.1.6 — the only new knob is the opt-in GEMINI_CLI_PATH.

Closes / References


Testing infrastructure

A test/ tree segmented into four categories by how much of the real world they touch:

| Category | Folder | Touches real gemini? | Runs in CI? | Command |
|---|---|---|---|---|
| unit | test/unit/ | No | Yes (gates merges) | npm run test:unit |
| integration | test/integration/ | No | Yes (gates merges) | npm run test:integration |
| e2e | test/e2e/ | Yes (live, opt-in) | No | npm run test:e2e |
| judge | test/judge/ | Yes + LLM judge | No | npm run doctor:judge |

  • 57 hermetic unit + integration tests gate CI on Node 18/20/22.
  • E2E suite drives the real Gemini CLI through the built MCP server over stdio with the MCP SDK client — the automated replacement for manual mcpjam testing. Auto-skips when gemini isn't on PATH.
  • LLM-as-a-Judge (test/judge/, WIP) semantically evaluates tool output against rubrics via DeepSeek/OpenRouter. Opt-in, key-gated, never in CI.
  • Self-contained test env: .env / .env.example live in test/; shared config via test/envParser.ts.
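
The auto-skip behaviour described above can be approximated with a plain PATH scan. This is a minimal sketch, not the project's harness: `isOnPath` and its `extensions` parameter are hypothetical names, and `existsSync` also returns true for directories, which a stricter check would rule out.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper (not from the PR): report whether `command` can be found
// in any directory of the given PATH-style string. `extensions` lets a caller
// also try suffixes such as ".cmd" on Windows.
export function isOnPath(
  command: string,
  pathEnv: string,
  extensions: string[] = [""],
): boolean {
  return pathEnv
    .split(path.delimiter)
    .filter((dir) => dir.length > 0)
    .some((dir) =>
      extensions.some((ext) => fs.existsSync(path.join(dir, command + ext))),
    );
}
```

A live suite could then pass `{ skip: !isOnPath("gemini", process.env.PATH ?? "") }` as the options argument of `test()` from node:test, so the test is reported as skipped rather than failed on machines without the CLI.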

Developer tooling

  • scripts/run-tests.mjs — category-aware test runner.
  • scripts/doctor.mjs — internal preflight diagnostics + e2e/judge runner (npm run doctor, doctor test, doctor:judge). Unpublished (excluded from files/bin).
  • tsconfig.test.json — type-checks both src/ and test/.
  • test/README.md — documents the categories, commands, the hermetic boundary, and how to add tests.
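
To make "category-aware" concrete, here is one way such a runner might map a category name to a `node --test` invocation. It is a sketch under assumptions: the folder names come from the table above, while `CATEGORY_DIRS`, `collectTestFiles`, `buildNodeArgs`, and the `.test.ts` suffix filter are illustrative and not taken from scripts/run-tests.mjs.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type Category = "unit" | "integration" | "e2e" | "judge";

// Folder layout as described in the PR; the constant name is hypothetical.
const CATEGORY_DIRS: Record<Category, string> = {
  unit: "test/unit",
  integration: "test/integration",
  e2e: "test/e2e",
  judge: "test/judge",
};

export function isCategory(value: string): value is Category {
  return Object.prototype.hasOwnProperty.call(CATEGORY_DIRS, value);
}

// Recursively gather *.test.ts files; a missing folder yields an empty list.
export function collectTestFiles(dir: string): string[] {
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .flatMap((entry) => {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) return collectTestFiles(full);
      return entry.name.endsWith(".test.ts") ? [full] : [];
    })
    .sort();
}

// Build the argument list for `node`. A real runner should refuse to spawn
// when no files were found, because a bare `--test` runs default discovery.
export function buildNodeArgs(category: Category, root: string = process.cwd()): string[] {
  const files = collectTestFiles(path.join(root, CATEGORY_DIRS[category]));
  return ["--import", "tsx", "--test", ...files];
}
```

Passing explicit file paths rather than a glob keeps the sketch independent of shell expansion, which differs between cmd.exe and POSIX shells.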

Windows & cross-platform hardening

  • Deliver changeMode / @file prompts on stdin instead of -p, avoiding cmd.exe argument parsing and the command-line length limit.
  • Resolve the gemini .cmd shim via where, honour GEMINI_CLI_PATH; never select an unlaunchable .ps1/extensionless shim.
  • Add windowsHide to suppress the popup console window.
  • Guard against uncaught EPIPE when a child closes stdin early.
  • Fix the Help tool: --help (was -help, which yargs split into -h -e -l -p).
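
The stdin delivery, `windowsHide`, and EPIPE guard items above can be pictured together in a few lines. This is a hedged sketch rather than the code in src/utils/commandExecutor.ts: `runWithStdin` and `isBrokenPipe` are hypothetical names, and the sketch leaves out the `shell: true` handling that the PR uses for .cmd shims on Windows.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: true for the error a parent sees when it writes to a
// pipe whose reading end (the child's stdin) has already been closed.
export function isBrokenPipe(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === "EPIPE"
  );
}

// Send `prompt` on stdin instead of as a command-line argument, so it never
// passes through argument parsing or the command-line length limit.
export function runWithStdin(
  command: string,
  args: string[],
  prompt: string,
): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, {
      stdio: ["pipe", "pipe", "inherit"],
      windowsHide: true, // no console window pops up on Windows
    });

    let stdout = "";
    child.stdout.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => {
      stdout += chunk;
    });

    // Without a listener, an EPIPE on stdin would surface as an uncaught error.
    child.stdin.on("error", (err) => {
      if (!isBrokenPipe(err)) reject(err);
    });

    child.on("error", reject);
    child.on("close", (code) => {
      if (code === 0) resolve(stdout);
      else reject(new Error(`child exited with code ${code}`));
    });

    child.stdin.end(prompt);
  });
}
```

Swallowing only EPIPE keeps other stream errors visible while letting the child's exit code decide whether the call succeeded.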

Internal / non-breaking

  • Logger mutes routine [GMCPT] output under NODE_ENV=test (errors still print; production unchanged).
  • Export buildBrainstormPrompt / getMethodologyInstructions for unit tests.
  • Expose CLI flags + GEMINI_CLI_PATH in constants.
  • CI: drop Node 16 (no node:test), add Node 22, add a type-check step, remove continue-on-error.
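
The logger change is small enough to illustrate directly. The sketch below assumes a factory shape (`createLogger` with an injected sink) purely so the behaviour is easy to test; the real src/utils/logger.ts may be structured differently. Only the `[GMCPT]` prefix and the `NODE_ENV=test` condition come from the PR.

```typescript
type Sink = (line: string) => void;

// Hypothetical factory: routine output is dropped under NODE_ENV=test,
// errors are always forwarded, and every other environment is unchanged.
export function createLogger(
  env: Record<string, string | undefined>,
  sink: Sink,
) {
  const muted = env.NODE_ENV === "test";
  return {
    log(message: string): void {
      if (!muted) sink(`[GMCPT] ${message}`);
    },
    error(message: string): void {
      sink(`[GMCPT] ${message}`);
    },
  };
}
```

In production a caller would pass `process.env` and a sink that writes to stderr.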

Verification

  • npm test → 57/57 pass (exit 0)
  • npm run lint → clean (src + tests)
  • npm run build → clean
  • E2E verified locally against the gemini CLI with real LLM round-trips (npm run doctor test).

Copilot AI review requested due to automatic review settings June 1, 2026 05:05

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the tool to version 1.1.7, focusing on cross-platform reliability and introducing a comprehensive automated test suite (unit, integration, e2e, and LLM-as-a-judge). Key changes include hardening Windows execution by passing prompts via stdin, improving executable resolution, adding clearer ENOENT guidance, and introducing a diagnostic doctor script. The review feedback highlights a few areas for improvement: first, the doctor script should mirror the robust Windows executable resolution logic to avoid execution failures when checking the Gemini version; second, the test configuration parser should resolve the .env file path relative to the module's directory using import.meta.url rather than relying on process.cwd(), which improves the portability of the test suite.
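
The second suggestion, resolving the .env file relative to the module rather than `process.cwd()`, might look like the following. `envPathFor` is a hypothetical name; the module URL is a parameter so the function can be exercised without an ES-module context, and in an ES module the caller would pass `import.meta.url`.

```typescript
import * as path from "node:path";
import { fileURLToPath, pathToFileURL } from "node:url";

// Hypothetical helper: locate a ".env" file sitting next to the module whose
// URL is given, independent of the directory the process was started from.
export function envPathFor(moduleUrl: string): string {
  return path.join(path.dirname(fileURLToPath(moduleUrl)), ".env");
}

// Illustration only: build a file URL for an imaginary test/envParser.ts and
// resolve its sibling .env. Real code would call envPathFor(import.meta.url).
export const exampleEnvPath = envPathFor(
  pathToFileURL(path.resolve("test", "envParser.ts")).href,
);
```

With this shape, running the suite from the repository root or from inside test/ resolves the same file.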

Comment thread scripts/doctor.mjs
Comment thread test/envParser.ts
Comment thread test/envParser.ts Outdated

Copilot AI left a comment

Pull request overview

This PR bumps the project to v1.1.7 and focuses on reliability and cross-platform hardening (notably Windows execution), while introducing the project’s first automated node:test suite (unit/integration CI-gating, plus opt-in e2e/judge suites) and internal “doctor” diagnostics tooling.

Changes:

  • Add categorized unit + integration tests (CI gating) plus opt-in e2e and LLM judge suites, with a category-aware test runner and tsconfig.test.json.
  • Harden Gemini CLI execution cross-platform (stdin prompt passing for changeMode/@file, Windows shim resolution, windowsHide, EPIPE handling, improved ENOENT guidance, --help fix).
  • Add internal doctor tooling and update CI + package scripts for new validation flow.

Reviewed changes

Copilot reviewed 30 out of 31 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| tsconfig.test.json | Adds a dedicated TS config to type-check src/ + test/ with noEmit. |
| test/unit/utils/geminiExecutor.test.ts | Unit coverage for @file reference safety guard. |
| test/unit/utils/commandExecutor.test.ts | Unit coverage for Windows quoting, command resolution, and ENOENT guidance. |
| test/unit/utils/chunkCache.test.ts | Unit coverage for chunk cache key shape, TTL expiry, eviction, and stats. |
| test/unit/utils/changeModeTranslator.test.ts | Unit coverage for changeMode response formatting and summaries. |
| test/unit/utils/changeModeParser.test.ts | Unit coverage for parsing/validating OLD/NEW blocks. |
| test/unit/utils/changeModeChunker.test.ts | Unit coverage for chunking behavior under budget constraints. |
| test/unit/tools/registry.test.ts | Validates registry tool schema and prompt formatting helpers. |
| test/unit/tools/brainstorm.test.ts | Unit coverage for brainstorm prompt construction + methodology selection. |
| test/README.md | Documents test categories/commands and hermetic boundaries. |
| test/judge/judge.test.ts | Adds opt-in semantic “LLM-as-a-Judge” suite for tool outputs. |
| test/integration/tool-contract.test.ts | Hermetic integration tests for registry→tool validation and non-spawn paths. |
| test/integration/changeMode-pipeline.test.ts | Hermetic integration test wiring the full changeMode parse→chunk→cache→fetch flow. |
| test/envParser.ts | Shared configuration loader for e2e/judge suites via test/.env and env vars. |
| test/e2e/server.e2e.test.ts | E2E protocol/system tool checks that don’t require Gemini. |
| test/e2e/harness.ts | E2E harness for spawning built server and producing rich diagnostics. |
| test/e2e/fixtures/sentinel.txt | Fixture for validating @file inlining in live Gemini runs. |
| test/e2e/ask-gemini.e2e.test.ts | Opt-in live Gemini CLI E2E coverage (auto-skip when Gemini missing). |
| test/.env.example | Example env for enabling judge runs (API keys + optional settings). |
| src/utils/logger.ts | Mutes routine logs under NODE_ENV=test to keep test output readable. |
| src/utils/geminiExecutor.ts | Sends changeMode/@file prompts via stdin; integrates stdin path into executor; changeMode pipeline tweaks. |
| src/utils/commandExecutor.ts | Windows shim resolution, safer quoting, stdin support, windowsHide, EPIPE guard, ENOENT guidance. |
| src/tools/simple-tools.ts | Uses constants for command names/flags; fixes Help to use --help. |
| src/tools/brainstorm.tool.ts | Exports prompt builders for unit testing. |
| src/tools/ask-gemini.tool.ts | Minor formatting changes; continuation validation remains in place. |
| src/constants.ts | Adds ENV.GEMINI_CLI_PATH and corrects --help flag constant. |
| scripts/run-tests.mjs | Category-aware test runner for unit/integration/e2e/judge suites. |
| scripts/doctor.mjs | Adds internal diagnostics runner and helpers for running e2e/judge suites. |
| package.json | Updates scripts to run real tests, adds doctor commands, updates lint to include tests, bumps version to 1.1.7. |
| CHANGELOG.md | Documents v1.1.7 changes and cross-platform hardening. |
| .github/workflows/ci.yml | Updates CI matrix (Node 18/20/22) and gates merges on build + type-check + hermetic tests. |

Comment thread test/envParser.ts Outdated
Comment thread test/envParser.ts Outdated
Comment thread test/README.md Outdated
Comment thread test/README.md Outdated
Comment thread scripts/doctor.mjs Outdated
Comment thread scripts/doctor.mjs Outdated
Comment thread src/utils/geminiExecutor.ts
@jamubc jamubc force-pushed the release/1.1.7 branch 2 times, most recently from 9b76827 to 24a3c12 Compare June 1, 2026 05:20
@jamubc jamubc requested a review from Copilot June 1, 2026 05:21

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 6 comments.

Comment thread src/utils/geminiExecutor.ts
Comment thread test/README.md Outdated
Comment thread test/.env.example Outdated
Comment thread test/.env.example
Comment thread test/envParser.ts Outdated
Comment thread scripts/doctor.mjs
…or tooling

Introduce categorized node:test coverage (unit / integration / e2e / judge),
cross-platform reliability fixes, and internal developer tooling.

### Testing
- Add 56 hermetic unit + integration tests gating CI (Node 18/20/22)
- Add e2e suite driving the real Gemini CLI through the built MCP server
  (auto-skips when gemini is not on PATH)
- Add LLM-as-a-Judge semantic evaluation suite (test/judge/) using
  DeepSeek or OpenRouter against validation rubrics (WIP)
- Add scripts/run-tests.mjs (category-aware runner) and
  scripts/doctor.mjs (preflight / e2e / judge helper)
- Add test README, tsconfig.test.json, test/.env.example, and e2e harness
  with diagnostic logging (spawned CWD, tool args, raw responses)

### Windows & cross-platform hardening
- Deliver changeMode and @file prompts on stdin instead of -p flag,
  avoiding cmd.exe argument parsing and command-line length limits
- Resolve the gemini .cmd shim via `where`, honour GEMINI_CLI_PATH
- Add windowsHide to suppress popup console windows
- Guard against uncaught EPIPE when a child closes stdin early
- Fix --help flag (was -help, parsed by yargs as -h -e -l -p)

### Internal
- Logger mutes routine output under NODE_ENV=test
- Expose CLI flags and GEMINI_CLI_PATH in constants
- Update CHANGELOG for 1.1.7

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 5 comments.

Comment thread test/e2e/harness.ts Outdated
Comment thread src/utils/geminiExecutor.ts
Comment thread test/integration/tool-contract.test.ts Outdated
Comment thread test/.env.example
Comment thread test/README.md Outdated
  - scripts/doctor.mjs:47: version probing now runs the resolved primary path, handles Windows paths with spaces, and reports a
    bad GEMINI_CLI_PATH clearly.

  - src/utils/geminiExecutor.ts:196: changeMode continuation with an empty raw result now returns an explicit cache miss or
    invalid chunk index.

  - test/integration/tool-contract.test.ts:42: updated the missing-cache assertion.
  - test/envParser.ts:24: cleaned import placement, trailing comma, and precedence docs.
  - test/.env.example:2: fixed doctor:judge command and DeepSeek default model comment.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 6 comments.

Comment thread src/utils/commandExecutor.ts Outdated
Comment thread src/utils/geminiExecutor.ts Outdated
Comment thread scripts/run-tests.mjs
Comment thread src/utils/commandExecutor.ts Outdated
Comment thread src/utils/geminiExecutor.ts Outdated
Comment thread scripts/run-tests.mjs
…emini detection, doc accuracy

- commandExecutor: extract selectWindowsGeminiCandidate() (pure, unit-tested);
  drop the candidates[0] fallback so a lone .ps1/extensionless shim is never
  selected (it isn't launchable via shell:true) — falls back to gemini.cmd.
- e2e harness: hasGemini() now reuses resolveCommandForExecution + spawnSync with
  Windows-safe quoting, so the live-suite skip matches the server's real resolution
  (fixes false auto-skip on Windows .cmd shims).
- geminiExecutor: correct the prompt-quoting comment (executeCommand uses shell:true
  on Windows, not shell:false).
- run-tests.mjs + test/README.md: document the judge category; Node >= 18.19 (the
  --import tsx / --test-concurrency floor).
@jamubc

jamubc commented Jun 1, 2026

Owner Author

Review feedback addressed

All automated review findings (gemini-code-assist + Copilot) are resolved as of 6f05b95. Mapping for the record:

Fixed earlier in this branch (547cfd8):

  • doctor.mjs — version probe now runs the resolved primary/override path (mirrors commandExecutor resolution), and reports a bad GEMINI_CLI_PATH explicitly. (gemini-code-assist high-priority)
  • geminiExecutor: changeMode continuation with an empty/expired cache now returns an explicit cache-miss / invalid-chunk-index message; removed the stray leading + from the "no edits" message.
  • tool-contract.test.ts — asserts the new cache-miss message.
  • envParser.ts: .env path resolved via import.meta.url (cwd-independent); added the missing trailing comma; precedence doc matches the implementation (system env wins).
  • test/README.md — "four categories"; relative link instead of an absolute file:// path.
  • test/.env.example — uses npm run doctor:judge.
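
The "system env wins" precedence noted for envParser.ts can be shown with a small loader. This is a sketch under assumptions, not the project's parser: `parseDotEnv` and `resolveConfig` are hypothetical names, the supported .env syntax is deliberately minimal (KEY=value lines, `#` comments, optional matching quotes), and the key names in the usage note are placeholders.

```typescript
// Hypothetical minimal .env parser: KEY=value per line, "#" starts a comment
// line, and a value wrapped in matching single or double quotes is unwrapped.
export function parseDotEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const first = value.charAt(0);
    if (value.length >= 2 && (first === '"' || first === "'") && value.endsWith(first)) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

// File values act as defaults; any variable defined in the system environment
// overrides the value from the file.
export function resolveConfig(
  fileText: string,
  systemEnv: Record<string, string | undefined>,
): Record<string, string> {
  const merged = parseDotEnv(fileText);
  for (const [key, value] of Object.entries(systemEnv)) {
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}
```

For example, `resolveConfig("EXAMPLE_KEY=from-file", { EXAMPLE_KEY: "from-shell" })` yields `from-shell` for `EXAMPLE_KEY`, so a value exported in the shell or in CI is never shadowed by a stale local file.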

Fixed in 6f05b95 (this push):

  • commandExecutor — extracted selectWindowsGeminiCandidate() (now unit-tested) and dropped the candidates[0] fallback, so a lone .ps1/extensionless shim is never selected; falls back to gemini.cmd.
  • e2e/harness.ts: hasGemini() reuses resolveCommandForExecution + spawnSync with Windows-safe quoting, so the live suite no longer false-skips on a Windows .cmd shim.
  • geminiExecutor — corrected the prompt-quoting comment (executeCommand uses shell:true on Windows, not shell:false).
  • run-tests.mjs + README — document the judge category; Node ≥ 18.19 (the --import tsx / --test-concurrency floor).
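
The candidate-selection rule described for commandExecutor is a pure function over the output of `where`, which is what makes it easy to unit-test. The sketch below illustrates the idea and is not the project's selectWindowsGeminiCandidate(): `pickLaunchableCandidate` is a hypothetical name, and treating .exe and .bat as launchable alongside .cmd is an assumption not stated in the PR.

```typescript
// Assumption: extensions that can be launched through the shell on Windows.
// The PR only names the .cmd shim explicitly.
const LAUNCHABLE_EXTENSIONS = [".cmd", ".exe", ".bat"];

// Hypothetical helper: given the raw, line-per-path output of `where gemini`,
// return the first launchable path. A lone .ps1 or extensionless shim is never
// returned; the function falls back to the bare shim name instead.
export function pickLaunchableCandidate(
  whereOutput: string,
  fallback: string = "gemini.cmd",
): string {
  const candidates = whereOutput
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const match = candidates.find((candidate) =>
    LAUNCHABLE_EXTENSIONS.some((ext) => candidate.toLowerCase().endsWith(ext)),
  );
  return match ?? fallback;
}
```

Because there is no "first candidate" fallback, an unlaunchable shim that `where` lists first cannot be picked by accident.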

Intentional (not a defect):

  • DeepSeek judge default model deepseek-v4-flash is correct and confirmed working; test/.env.example and judge.test.ts are aligned on it.

Resolving these threads.

@jamubc

jamubc commented Jun 1, 2026

Owner Author

Thanks to contributors

1.1.7 builds on reports and review from the community:

@jamubc jamubc merged commit c3d33ee into main Jun 1, 2026
4 checks passed