v1.1.7: automated test suite, cross-platform hardening, internal doctor tooling #81
Code Review
This pull request updates the tool to version 1.1.7, focusing on cross-platform reliability and introducing a comprehensive automated test suite (unit, integration, e2e, and LLM-as-a-judge). Key changes include hardening Windows execution by passing prompts via stdin, improving executable resolution, adding clearer ENOENT guidance, and introducing a diagnostic doctor script. The review feedback highlights a few areas for improvement: first, the doctor script should mirror the robust Windows executable resolution logic to avoid execution failures when checking the Gemini version; second, the test configuration parser should resolve the .env file path relative to the module's directory using import.meta.url rather than relying on process.cwd(), which improves the portability of the test suite.
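The second suggestion in the review (resolving the `.env` path from the module's location rather than `process.cwd()`) could be sketched roughly as follows. The helper name `resolveEnvPath` is hypothetical and is not taken from `test/envParser.ts`; only `fileURLToPath`, `dirname`, and `join` are real Node.js APIs.

```typescript
import { fileURLToPath } from "node:url";
import { dirname, join } from "node:path";

// Hypothetical helper: build the path of a file that sits next to the
// calling module, independent of the directory the process was started in.
function resolveEnvPath(moduleUrl: string, fileName: string = ".env"): string {
  const moduleDir = dirname(fileURLToPath(moduleUrl));
  return join(moduleDir, fileName);
}

// Inside an ES module the caller would pass its own URL:
//   const envPath = resolveEnvPath(import.meta.url);
```

With this shape, running the suite from the repository root or from a subdirectory yields the same `.env` location, which is the portability gain the review points at.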
Pull request overview
This PR bumps the project to v1.1.7 and focuses on reliability and cross-platform hardening (notably Windows execution), while introducing the project’s first automated node:test suite (unit/integration CI-gating, plus opt-in e2e/judge suites) and internal “doctor” diagnostics tooling.
Changes:
- Add categorized unit + integration tests (CI gating) plus opt-in e2e and LLM judge suites, with a category-aware test runner and tsconfig.test.json.
- Harden Gemini CLI execution cross-platform (stdin prompt passing for changeMode/@file, Windows shim resolution, windowsHide, EPIPE handling, improved ENOENT guidance, --help fix).
- Add internal doctor tooling and update CI + package scripts for new validation flow.
Reviewed changes
Copilot reviewed 30 out of 31 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tsconfig.test.json | Adds a dedicated TS config to type-check src/ + test/ with noEmit. |
| test/unit/utils/geminiExecutor.test.ts | Unit coverage for @file reference safety guard. |
| test/unit/utils/commandExecutor.test.ts | Unit coverage for Windows quoting, command resolution, and ENOENT guidance. |
| test/unit/utils/chunkCache.test.ts | Unit coverage for chunk cache key shape, TTL expiry, eviction, and stats. |
| test/unit/utils/changeModeTranslator.test.ts | Unit coverage for changeMode response formatting and summaries. |
| test/unit/utils/changeModeParser.test.ts | Unit coverage for parsing/validating OLD/NEW blocks. |
| test/unit/utils/changeModeChunker.test.ts | Unit coverage for chunking behavior under budget constraints. |
| test/unit/tools/registry.test.ts | Validates registry tool schema and prompt formatting helpers. |
| test/unit/tools/brainstorm.test.ts | Unit coverage for brainstorm prompt construction + methodology selection. |
| test/README.md | Documents test categories/commands and hermetic boundaries. |
| test/judge/judge.test.ts | Adds opt-in semantic “LLM-as-a-Judge” suite for tool outputs. |
| test/integration/tool-contract.test.ts | Hermetic integration tests for registry→tool validation and non-spawn paths. |
| test/integration/changeMode-pipeline.test.ts | Hermetic integration test wiring the full changeMode parse→chunk→cache→fetch flow. |
| test/envParser.ts | Shared configuration loader for e2e/judge suites via test/.env and env vars. |
| test/e2e/server.e2e.test.ts | E2E protocol/system tool checks that don’t require Gemini. |
| test/e2e/harness.ts | E2E harness for spawning built server and producing rich diagnostics. |
| test/e2e/fixtures/sentinel.txt | Fixture for validating @file inlining in live Gemini runs. |
| test/e2e/ask-gemini.e2e.test.ts | Opt-in live Gemini CLI E2E coverage (auto-skip when Gemini missing). |
| test/.env.example | Example env for enabling judge runs (API keys + optional settings). |
| src/utils/logger.ts | Mutes routine logs under NODE_ENV=test to keep test output readable. |
| src/utils/geminiExecutor.ts | Sends changeMode/@file prompts via stdin; integrates stdin path into executor; changeMode pipeline tweaks. |
| src/utils/commandExecutor.ts | Windows shim resolution, safer quoting, stdin support, windowsHide, EPIPE guard, ENOENT guidance. |
| src/tools/simple-tools.ts | Uses constants for command names/flags; fixes Help to use --help. |
| src/tools/brainstorm.tool.ts | Exports prompt builders for unit testing. |
| src/tools/ask-gemini.tool.ts | Minor formatting changes; continuation validation remains in place. |
| src/constants.ts | Adds ENV.GEMINI_CLI_PATH and corrects --help flag constant. |
| scripts/run-tests.mjs | Category-aware test runner for unit/integration/e2e/judge suites. |
| scripts/doctor.mjs | Adds internal diagnostics runner and helpers for running e2e/judge suites. |
| package.json | Updates scripts to run real tests, adds doctor commands, updates lint to include tests, bumps version to 1.1.7. |
| CHANGELOG.md | Documents v1.1.7 changes and cross-platform hardening. |
| .github/workflows/ci.yml | Updates CI matrix (Node 18/20/22) and gates merges on build + type-check + hermetic tests. |
9b76827 to 24a3c12
…or tooling

Introduce categorized node:test coverage (unit / integration / e2e / judge), cross-platform reliability fixes, and internal developer tooling.

### Testing
- Add 56 hermetic unit + integration tests gating CI (Node 18/20/22)
- Add e2e suite driving the real Gemini CLI through the built MCP server (auto-skips when gemini is not on PATH)
- Add LLM-as-a-Judge semantic evaluation suite (test/judge/) using DeepSeek or OpenRouter against validation rubrics (WIP)
- Add scripts/run-tests.mjs (category-aware runner) and scripts/doctor.mjs (preflight / e2e / judge helper)
- Add test README, tsconfig.test.json, test/.env.example, and e2e harness with diagnostic logging (spawned CWD, tool args, raw responses)

### Windows & cross-platform hardening
- Deliver changeMode and @file prompts on stdin instead of -p flag, avoiding cmd.exe argument parsing and command-line length limits
- Resolve the gemini .cmd shim via `where`, honour GEMINI_CLI_PATH
- Add windowsHide to suppress popup console windows
- Guard against uncaught EPIPE when a child closes stdin early
- Fix --help flag (was -help, parsed by yargs as -h -e -l -p)

### Internal
- Logger mutes routine output under NODE_ENV=test
- Expose CLI flags and GEMINI_CLI_PATH in constants
- Update CHANGELOG for 1.1.7
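A minimal sketch of the stdin delivery and EPIPE guard described in this commit message, under stated assumptions: the names `planInvocation`, `writePrompt`, and `StdinLike` are hypothetical, and the `@file` detection regex is an illustrative guess rather than the check used in `src/utils/geminiExecutor.ts`.

```typescript
// Hypothetical result shape: either CLI args carrying the prompt, or a
// prompt to be written to the child's stdin with no -p flag at all.
interface Invocation {
  args: string[];
  stdin?: string;
}

// Assumption: a prompt counts as an "@file" prompt when it contains a
// token starting with "@". The real detection rule may differ.
function planInvocation(prompt: string, changeMode: boolean): Invocation {
  const hasFileRef = /(^|\s)@\S+/.test(prompt);
  if (changeMode || hasFileRef) {
    return { args: [], stdin: prompt }; // never passes through cmd.exe parsing
  }
  return { args: ["-p", prompt] }; // simple prompt: -p only, no positional
}

// Minimal structural view of a writable stream such as child.stdin.
interface StdinLike {
  on(event: "error", listener: (err: { code?: string }) => void): unknown;
  end(chunk: string): unknown;
}

// Write the prompt and swallow EPIPE, which is emitted when the child
// closes stdin before the write completes; other errors are forwarded.
function writePrompt(
  stdin: StdinLike,
  prompt: string,
  onError: (err: { code?: string }) => void
): void {
  stdin.on("error", (err) => {
    if (err.code === "EPIPE") return;
    onError(err);
  });
  stdin.end(prompt);
}
```

Keeping the planning step pure means the choice between `-p` and stdin can be unit-tested without spawning a process.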
- scripts/doctor.mjs:47: version probing now runs the resolved primary path, handles Windows paths with spaces, and reports a bad GEMINI_CLI_PATH clearly.
- src/utils/geminiExecutor.ts:196: changeMode continuation with an empty raw result now returns an explicit cache miss or invalid chunk index.
- test/integration/tool-contract.test.ts:42: updated the missing-cache assertion.
- test/envParser.ts:24: cleaned import placement, trailing comma, and precedence docs.
- test/.env.example:2: fixed doctor:judge command and DeepSeek default model comment.
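The continuation fix in the list above (an explicit cache miss or invalid chunk index instead of an empty result falling through to the parser) might look roughly like this. The function name, the result type, the message wording, and the 1-based chunk index are all assumptions made for illustration; none of them are taken from `src/utils/geminiExecutor.ts`.

```typescript
// Hypothetical discriminated result for a changeMode continuation lookup.
type ContinuationResult =
  | { ok: true; chunk: string }
  | { ok: false; reason: "cache-miss" | "invalid-chunk-index"; message: string };

// Assumption: chunks are cached per key and addressed with a 1-based index.
function lookupContinuation(
  cache: Map<string, string[]>,
  cacheKey: string,
  chunkIndex: number
): ContinuationResult {
  const chunks = cache.get(cacheKey);
  if (chunks === undefined) {
    return {
      ok: false,
      reason: "cache-miss",
      message: `Cache miss: no cached chunks for key "${cacheKey}" (missing or expired).`,
    };
  }
  // Out-of-range, fractional, and NaN indices all yield undefined here.
  const chunk = chunks[chunkIndex - 1];
  if (chunk === undefined) {
    return {
      ok: false,
      reason: "invalid-chunk-index",
      message: `Invalid chunk index ${chunkIndex}: expected 1 to ${chunks.length}.`,
    };
  }
  return { ok: true, chunk };
}
```

Returning a tagged result keeps the two failure modes distinguishable for the caller, so neither can be mistaken for a response that simply contained no edits.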
…emini detection, doc accuracy

- commandExecutor: extract selectWindowsGeminiCandidate() (pure, unit-tested); drop the candidates[0] fallback so a lone .ps1/extensionless shim is never selected (it isn't launchable via shell:true); falls back to gemini.cmd.
- e2e harness: hasGemini() now reuses resolveCommandForExecution + spawnSync with Windows-safe quoting, so the live-suite skip matches the server's real resolution (fixes false auto-skip on Windows .cmd shims).
- geminiExecutor: correct the prompt-quoting comment (executeCommand uses shell:true on Windows, not shell:false).
- run-tests.mjs + test/README.md: document the judge category; Node >= 18.19 (the --import tsx / --test-concurrency floor).
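The candidate-selection rule described for commandExecutor can be illustrated with a small pure function. This is a sketch of the stated behavior only, not the project's `selectWindowsGeminiCandidate()`: the name `pickWindowsCandidate` and the extension preference list are assumptions.

```typescript
// Assumption: these are the extensions treated as launchable via a shell
// on Windows, in preference order. The real list may differ.
const LAUNCHABLE_EXTENSIONS = [".cmd", ".exe", ".bat"];

// Given the paths reported for `gemini` (for example by `where`), pick one
// that can actually be launched. A lone .ps1 or extensionless shim is never
// returned; the function falls back to the bare name "gemini.cmd" instead.
function pickWindowsCandidate(candidates: string[]): string {
  const cleaned = candidates.map((c) => c.trim()).filter((c) => c.length > 0);
  for (const ext of LAUNCHABLE_EXTENSIONS) {
    const match = cleaned.find((c) => c.toLowerCase().endsWith(ext));
    if (match !== undefined) return match;
  }
  return "gemini.cmd";
}
```

Because the function takes the candidate list as input and touches neither the file system nor `PATH`, it can be unit-tested on any platform.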
Review feedback addressed
All automated review findings (gemini-code-assist + Copilot) are resolved as of 6f05b95. Mapping for the record:
Fixed earlier in this branch (547cfd8):
Fixed in 6f05b95 (this push):
Intentional (not a defect):
Resolving these threads.
Thanks to contributors
1.1.7 builds on reports and review from the community:
Summary
Reliability patch plus the project's first automated test suite. Hardens cross-platform execution and adds a categorized `node:test` suite that gates CI. No runtime or default-config changes vs 1.1.6; the only new knob is the opt-in `GEMINI_CLI_PATH`.
Closes / References
- `changeMode` continuation with a missing/expired cache returned the generic "No edits found…" (an empty string fell through to the parser). It now returns an explicit cache-miss / invalid-chunk-index message.
- `-p` "cannot use both a positional prompt and `--prompt`" error: `@file`/`changeMode` prompts are now delivered on stdin (no `-p`), and simple prompts pass only `-p` (never a positional), so the conflict can't arise.
- `CHANGELOG.md` documenting 1.1.1 → 1.1.7.
- … `windowsHide`) are harvested into this branch. Close those as superseded once this merges.
Testing infrastructure
A `test/` tree segmented into four categories by how much of the real world they touch:

| Directory | Command |
|---|---|
| `test/unit/` | `npm run test:unit` |
| `test/integration/` | `npm run test:integration` |
| `test/e2e/` | `npm run test:e2e` |
| `test/judge/` | `npm run doctor:judge` |

- … `gemini` isn't on `PATH`.
- … `test/judge/`, WIP) semantically evaluates tool output against rubrics via DeepSeek/OpenRouter. Opt-in, key-gated, never in CI.
- `.env` / `.env.example` live in `test/`; shared config via `test/envParser.ts`.
Developer tooling
- `scripts/run-tests.mjs`: category-aware test runner.
- `scripts/doctor.mjs`: internal preflight diagnostics + e2e/judge runner (`npm run doctor`, `doctor test`, `doctor:judge`). Unpublished (excluded from `files`/`bin`).
- `tsconfig.test.json`: type-checks both `src/` and `test/`.
- `test/README.md`: documents the categories, commands, the hermetic boundary, and how to add tests.
Windows & cross-platform hardening
- Deliver `changeMode`/`@file` prompts on stdin instead of `-p`, avoiding cmd.exe argument parsing and the command-line length limit.
- Resolve the gemini `.cmd` shim via `where`, honour `GEMINI_CLI_PATH`; never select an unlaunchable `.ps1`/extensionless shim.
- Add `windowsHide` to suppress the popup console window.
- `Help` tool: `--help` (was `-help`, which yargs split into `-h -e -l -p`).
Internal / non-breaking
- Logger mutes routine `[GMCPT]` output under `NODE_ENV=test` (errors still print; production unchanged).
- Export `buildBrainstormPrompt`/`getMethodologyInstructions` for unit tests.
- Expose CLI flags and `GEMINI_CLI_PATH` in constants.
- … `node:test`), add Node 22, add a type-check step, remove `continue-on-error`.
Verification
- `npm test` → 57/57 pass (exit 0)
- `npm run lint` → clean (src + tests)
- `npm run build` → clean
- … `npm run doctor test`).
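For readers wondering why the `Help` tool fix under Windows & cross-platform hardening mattered, the following toy model shows how a single-dash argument is expanded as a cluster of one-letter flags. It is a simplified illustration of the short-flag convention, not yargs' actual parser, and `expandShortFlags` is a hypothetical name.

```typescript
// Toy model of short-flag clustering: "-abc" is read as "-a -b -c", while
// "--abc" is a single long option. Values attached to flags are ignored.
function expandShortFlags(arg: string): string[] {
  const isLongOption = arg.startsWith("--");
  const isShortCluster = !isLongOption && arg.startsWith("-") && arg.length > 2;
  if (!isShortCluster) return [arg];
  return arg
    .slice(1)
    .split("")
    .map((ch) => `-${ch}`);
}

console.log(expandShortFlags("-help").join(" "));  // prints "-h -e -l -p"
console.log(expandShortFlags("--help").join(" ")); // prints "--help"
```

Under this model `-help` happens to include `-p`, the prompt flag, which is consistent with the fix switching to the unambiguous `--help`.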