jamubc · Jun 1, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -9,25 +9,31 @@ on:
 jobs:
   test:
     runs-on: ubuntu-latest
-    
+
     strategy:
       matrix:
-        node-version: [16.x, 18.x, 20.x]
-
+        # node:test (used by the suite) requires Node >= 18.
+        node-version: [18.x, 20.x, 22.x]
+
     steps:
     - uses: actions/checkout@v4
-    
+
     - name: Use Node.js ${{ matrix.node-version }}
       uses: actions/setup-node@v4
       with:
         node-version: ${{ matrix.node-version }}
-    
+
     - name: Install dependencies
       run: npm ci
-    
+
     - name: Build
       run: npm run build
-
+
+    - name: Type-check (source + tests)
+      run: npm run lint
+
+    # Hermetic suite only (unit + integration). The e2e suite needs an
+    # authenticated gemini CLI and is run on demand via `npm run test:e2e`
+    # (or `npm run doctor test`).
     - name: Run tests
       run: npm test
-      continue-on-error: true
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,7 +1,20 @@
 # Changelog
 
+## [1.1.7] - 2026-05-31
+Reliability patch plus the project's first automated test suite. Hardens cross-platform execution (the Windows fixes and a few robustness guards) and adds a categorized `node:test` suite that gates CI. **No runtime or default-config changes vs 1.1.6** — the only new knob is the opt-in `GEMINI_CLI_PATH`.
+
+- **Windows: stdin prompt passing** — `changeMode` and `@file` prompts are sent to the Gemini CLI on **stdin** instead of the `-p` flag, sidestepping cmd.exe argument parsing and the OS command-line length limit; this also avoids the deprecated-`-p` positional-prompt conflict for those prompts (#48). Adds `windowsHide` to suppress the popup console window. (harvested from #27 via #77)
+- **Windows: executable resolution** — honours `GEMINI_CLI_PATH`, otherwise resolves the real `gemini` shim via `where` (preferring `.cmd`), fixing "command not found" when the MCP server doesn't inherit your shell's PATH.
+- **Clearer ENOENT guidance** when the executable isn't found, including the `GEMINI_CLI_PATH` hint.
+- **stdin EPIPE / spawn-error hardening** — a child that closes stdin early no longer throws an uncaught error that could drop the long-lived server connection (candidate fix for the disconnects in #64).
+- **`Help` tool** now invokes `gemini --help` instead of `-help`, which the Gemini CLI's yargs parser split into `-h -e -l -p`.
+- **Test suite** — categorized `node:test` coverage under `test/`: **unit** (command quoting / Windows resolution / ENOENT, the `@file` guard, the changeMode parser/chunker/translator, the chunk cache, the tool registry, brainstorm prompt building), **integration** (the changeMode → `fetch-chunk` pipeline and the registry → tool contract, both hermetic), and **e2e** (the real gemini driven through the built MCP server; auto-skips without gemini). `npm test` runs unit+integration and now **gates CI** (Node 18/20/22); `npm run test:e2e` runs the live suite. Includes a regression test for the changeMode cache-miss path (#67).
+- **Internal `doctor`** (work in progress) — `npm run doctor` reports the Node version and the detected `gemini` install; `npm run doctor test` builds the server and runs the e2e suite (the automated replacement for manual MCP inspector checks and other token-burning manual tests). Excluded from the npm package (`files`/`bin`).
+- **LLM judge semantic test suite** (`test/judge/`) — uses DeepSeek or OpenRouter to evaluate tool outputs against validation rubrics. Work in progress.
+- **Diagnostics logging** — E2E harness now logs the spawned server's working directory (`📂 SPAWNED CWD`) for easier local debugging.
+
 ## [1.1.6] - 2026-05-30
-_Emergency security patch — the CVE-2026-0755 fix only, ahead of the larger 1.2.0 release._
+_Emergency security patch — CVE-2026-0755 fix only._
 - Security fix: OS command-injection / `@file` exfiltration via prompt quoting in `geminiExecutor.ts` (CVE-2026-0755, CWE-78). Fixes #73 (and the literal-quote corruption in #66).
   - Removed the broken double-quote wrapping from both the primary and fallback paths. With `spawn` running `shell: false`, those quotes were passed as literal characters — they provided no protection and corrupted `@file` references. Windows `.cmd` argument quoting is hardened separately (see below).
   - Added `assertSafeFileReferences()`, which rejects any `@file` reference that resolves outside the project working directory (absolute paths, `~` home references, and `../` traversal), closing the arbitrary-file-read exfiltration vector while preserving legitimate in-project `@file` usage.

diff --git a/package.json b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "gemini-mcp-tool",
-  "version": "1.1.6",
+  "version": "1.1.7",
   "description": "MCP server for Gemini CLI integration",
   "type": "module",
   "main": "dist/index.js",
@@ -11,8 +11,13 @@
     "build": "tsc",
     "start": "node dist/index.js",
     "dev": "tsc && node dist/index.js",
-    "test": "echo \"No tests yet\" && exit 0",
-    "lint": "tsc --noEmit",
+    "doctor": "node scripts/doctor.mjs",
+    "doctor:judge": "node scripts/doctor.mjs judge",
+    "test": "node scripts/run-tests.mjs unit integration",
+    "test:unit": "node scripts/run-tests.mjs unit",
+    "test:integration": "node scripts/run-tests.mjs integration",
+    "test:e2e": "npm run build && node scripts/run-tests.mjs e2e",
+    "lint": "tsc -p tsconfig.test.json",
     "contribute": "tsx src/contribute.ts",
     "prepublishOnly": "echo '⚠️  Remember to test locally first!' && npm run build",
     "docs:dev": "vitepress dev docs",

diff --git a/scripts/doctor.mjs b/scripts/doctor.mjs
@@ -0,0 +1,224 @@
+#!/usr/bin/env node
+// gemini-mcp-tool doctor — INTERNAL dev / diagnostic + test tool.
+//
+// Not published: deliberately excluded from package.json "bin" and "files", so
+// it ships with the repo but NOT the npm package. Run it from a checkout:
+//
+//   npm run doctor        → report the live system state for the MCP server
+//   npm run doctor test    → preflight + run the e2e suite (the automated MCP
+//                            test that replaces manual mcpjam clicking)
+//
+// This is the 1.1.7 seed: it reports what 1.1.7 actually has (node, the gemini
+// CLI, GEMINI_CLI_PATH) and runs the test suite. Later feature PRs grow it with
+// backend / model / approval / timeout diagnostics.
+// Self-contained: pure Node, no build step or dependencies.
+
+import { spawnSync } from "node:child_process";
+import { existsSync, readFileSync } from "node:fs";
+import path from "node:path";
+import { fileURLToPath } from "node:url";
+
+const ENV = {
+  GEMINI_CLI_PATH: "GEMINI_CLI_PATH", // explicit path to the gemini executable
+};
+
+const isWindows = process.platform === "win32";
+const useColor = process.stdout.isTTY && !process.env.NO_COLOR;
+const paint = (code, s) => (useColor ? `\x1b[${code}m${s}\x1b[0m` : s);
+const c = {
+  bold: (s) => paint("1", s),
+  dim: (s) => paint("2", s),
+  green: (s) => paint("32", s),
+  yellow: (s) => paint("33", s),
+  red: (s) => paint("31", s),
+  cyan: (s) => paint("36", s),
+};
+const OK = c.green("✓");
+const WARN = c.yellow("⚠");
+const BAD = c.red("✗");
+
+const repoRoot = path.resolve(path.dirname(fileURLToPath(import.meta.url)), "..");
+
+function heading(title) {
+  console.log("\n" + c.bold(title));
+  console.log(c.dim("─".repeat(Math.max(title.length, 16))));
+}
+
+function runCmd(cmd, args) {
+  try {
+    const executable = isWindows && /\s/.test(cmd) ? `"${cmd.replace(/"/g, '""')}"` : cmd;
+    const r = spawnSync(executable, args, { encoding: "utf8", timeout: 20000, shell: isWindows, windowsHide: true });
+    if (r.error) return { ok: false, err: r.error.message };
+    return { ok: r.status === 0, status: r.status, out: (r.stdout || "").trim(), err: (r.stderr || "").trim() };
+  } catch (e) {
+    return { ok: false, err: e instanceof Error ? e.message : String(e) };
+  }
+}
+
+function locate(cmd) {
+  const r = runCmd(isWindows ? "where" : "which", [cmd]);
+  if (!r.ok || !r.out) return [];
+  return r.out.split(/\r?\n/).map((s) => s.trim()).filter(Boolean);
+}
+
+// Mirror commandExecutor's resolution: honour GEMINI_CLI_PATH, else PATH.
+function detectGemini() {
+  const override = (process.env[ENV.GEMINI_CLI_PATH] || "").trim();
+  const pathCandidates = locate("gemini");
+  const candidates = override ? [override, ...pathCandidates.filter((p) => p !== override)] : pathCandidates;
+  let primary = override || null;
+  if (!primary && candidates.length > 0) {
+    if (isWindows) {
+      const byExt = (ext) => candidates.find((c) => c.toLowerCase().endsWith(ext));
+      primary = byExt(".cmd") || byExt(".exe") || byExt(".bat") || candidates[0];
+    } else {
+      primary = candidates[0];
+    }
+  }
+  const found = override ? existsSync(override) : candidates.length > 0;
+  let version = null;
+  if (found && primary) {
+    const v = runCmd(primary, ["--version"]);
+    if (v.ok && v.out) version = v.out.split(/\r?\n/)[0].trim();
+  }
+  return { found: !!found, primary, candidates, override: override || null, version };
+}
+
+// ── report ───────────────────────────────────────────────────────────────────
+function runReport() {
+  const problems = [];
+
+  heading("System");
+  console.log(`  node      ${process.version}`);
+  console.log(`  platform  ${process.platform} (${process.arch})`);
+
+  heading("Gemini CLI");
+  const gemini = detectGemini();
+  if (gemini.found) {
+    console.log(`  ${OK} found${gemini.override ? " (via " + ENV.GEMINI_CLI_PATH + ")" : ""}`);
+    console.log(`     path     ${gemini.primary}`);
+    console.log(`     version  ${gemini.version ? c.cyan(gemini.version) : c.yellow("(could not read --version)")}`);
+    if (gemini.candidates.length > 1) console.log(c.dim(`     also on PATH: ${gemini.candidates.slice(1).join(", ")}`));
+  } else {
+    console.log(`  ${BAD} ${gemini.override ? ENV.GEMINI_CLI_PATH + " path not found" : "not found on PATH"}`);
+    problems.push(
+      gemini.override
+        ? `${ENV.GEMINI_CLI_PATH} is set to ${gemini.override}, but that path does not exist.`
+        : `Gemini CLI not found. Install it (npm i -g @google/gemini-cli) or set ${ENV.GEMINI_CLI_PATH} to its full path.`
+    );
+  }
+
+  heading("Summary");
+  if (problems.length === 0) {
+    console.log(`  ${OK} ${c.green("No problems detected.")}`);
+  } else {
+    console.log(`  ${BAD} ${c.red(`${problems.length} issue(s) found:`)}`);
+    for (const p of problems) console.log(`     - ${p}`);
+  }
+  console.log(c.dim(`\n  Tips:`));
+  console.log(c.dim(`    \`npm run doctor test\`    → build + run live e2e tests`));
+  console.log(c.dim(`    \`npm run doctor judge\`   → build + run semantic LLM judge tests`));
+  console.log("");
+  process.exit(problems.length === 0 ? 0 : 1);
+}
+
+// ── test (automated MCP test, replaces manual mcpjam) ──────────────────────────
+function runTest() {
+  heading("Preflight");
+  const gemini = detectGemini();
+  if (gemini.found) {
+    console.log(`  ${OK} gemini ${gemini.version ? c.cyan(gemini.version) : ""} ${c.dim("(" + gemini.primary + ")")}`);
+  } else {
+    console.log(`  ${WARN} ${gemini.override ? ENV.GEMINI_CLI_PATH + " path not found" : "gemini not on PATH"} — live model tests will skip; only the gemini-independent server tests run.`);
+  }
+
+  heading("Build");
+  const build = spawnSync(isWindows ? "npm.cmd" : "npm", ["run", "build"], {
+    stdio: "inherit",
+    cwd: repoRoot,
+    shell: isWindows,
+  });
+  if (build.status !== 0) {
+    console.log(`  ${BAD} ${c.red("build failed — aborting.")}`);
+    process.exit(build.status ?? 1);
+  }
+  console.log(`  ${OK} build succeeded`);
+
+  heading("E2E suite (real gemini through the MCP server)");
+  const runner = path.join(repoRoot, "scripts", "run-tests.mjs");
+  const e2e = spawnSync(process.execPath, [runner, "e2e"], { stdio: "inherit", cwd: repoRoot });
+  if (e2e.status === 0) {
+    console.log(`\n  ${OK} ${c.green("e2e suite passed — the MCP server works end-to-end.")}`);
+  } else {
+    console.log(`\n  ${BAD} ${c.red("e2e suite failed.")}`);
+  }
+  process.exit(e2e.status ?? 1);
+}
+
+// ── judge (semantic evaluation) ────────────────────────────────────────────────
+function runJudgeTest() {
+  heading("Judge Preflight");
+  const config = detectJudgeKeys();
+  if (config.hasKey) {
+    console.log(`  ${OK} LLM Judge configured via: ${config.keyType}`);
+  } else {
+    console.log(`  ${BAD} No LLM Judge keys found. Please set DEEPSEEK_API_KEY or OPENROUTER_API_KEY in your test/.env file.`);
+    process.exit(1);
+  }
+
+  heading("Build");
+  const build = spawnSync(isWindows ? "npm.cmd" : "npm", ["run", "build"], {
+    stdio: "inherit",
+    cwd: repoRoot,
+    shell: isWindows,
+  });
+  if (build.status !== 0) {
+    console.log(`  ${BAD} ${c.red("build failed — aborting.")}`);
+    process.exit(build.status ?? 1);
+  }
+  console.log(`  ${OK} build succeeded`);
+
+  heading("LLM-as-a-Judge semantic test suite");
+  const runner = path.join(repoRoot, "scripts", "run-tests.mjs");
+  const judgeRun = spawnSync(process.execPath, [runner, "judge"], { stdio: "inherit", cwd: repoRoot });
+  if (judgeRun.status === 0) {
+    console.log(`\n  ${OK} ${c.green("Judge suite passed — semantic checks successful!")}`);
+  } else {
+    console.log(`\n  ${BAD} ${c.red("Judge suite failed.")}`);
+  }
+  process.exit(judgeRun.status ?? 1);
+}
+
+function detectJudgeKeys() {
+  let hasKey = false;
+  let keyType = "";
+  const envPath = path.join(repoRoot, "test", ".env");
+
+  if (process.env.DEEPSEEK_API_KEY) {
+    hasKey = true;
+    keyType = "process.env.DEEPSEEK_API_KEY";
+  } else if (process.env.OPENROUTER_API_KEY) {
+    hasKey = true;
+    keyType = "process.env.OPENROUTER_API_KEY";
+  }
+
+  if (!hasKey && existsSync(envPath)) {
+    try {
+      const content = readFileSync(envPath, "utf-8");
+      if (/^[ \t]*(?:export[ \t]+)?DEEPSEEK_API_KEY[ \t]*=[ \t]*[^\s#]+/im.test(content)) {
+        hasKey = true;
+        keyType = "test/.env (DEEPSEEK_API_KEY)";
+      } else if (/^[ \t]*(?:export[ \t]+)?OPENROUTER_API_KEY[ \t]*=[ \t]*[^\s#]+/im.test(content)) {
+        hasKey = true;
+        keyType = "test/.env (OPENROUTER_API_KEY)";
+      }
+    } catch {}
+  }
+  return { hasKey, keyType };
+}
+
+// ── dispatch ───────────────────────────────────────────────────────────────────
+const mode = (process.argv[2] || "").toLowerCase();
+if (mode === "test") runTest();
+else if (mode === "judge") runJudgeTest();
+else runReport();
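
Review note on `detectJudgeKeys()`: the `test/.env` probe has to tell a real assignment apart from a commented-out line or an empty value. One way to make that explicit is a small line-by-line check. The sketch below is only an illustration, not code from this PR: `envFileHasKey` is a hypothetical helper name, and its handling of `export` prefixes, quotes and inline comments is an assumption about what a dotenv-style file may contain.

```javascript
// Hypothetical helper (not part of this PR): returns true when a
// dotenv-style file body assigns a non-empty value to `name`.
function envFileHasKey(content, name) {
  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blank or commented out
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not an assignment
    // Accept an optional shell-style `export ` prefix (assumption).
    const key = line.slice(0, eq).replace(/^export\s+/, "").trim();
    if (key !== name) continue;
    const value = line
      .slice(eq + 1)
      .replace(/(^|\s+)#.*$/, "") // drop an inline comment
      .trim()
      .replace(/^(['"])(.*)\1$/, "$2"); // strip matching surrounding quotes
    if (value !== "") return true;
  }
  return false;
}

const sample = [
  "# DEEPSEEK_API_KEY=old-value",
  "DEEPSEEK_API_KEY=",
  "export OPENROUTER_API_KEY='abc123'",
].join("\n");

console.log(envFileHasKey(sample, "DEEPSEEK_API_KEY")); // false
console.log(envFileHasKey(sample, "OPENROUTER_API_KEY")); // true
```

Unlike the regex in the diff, this sketch compares key names case-sensitively, which matches how environment variables behave on POSIX systems.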
diff --git a/scripts/run-tests.mjs b/scripts/run-tests.mjs
@@ -0,0 +1,86 @@
+#!/usr/bin/env node
+// Category-aware test runner. Discovers *.test.ts under the selected category
+// folders (test/unit, test/integration, test/e2e, test/judge) and runs them with the
+// built-in node:test runner via the tsx loader, so the TypeScript sources run
+// directly.
+//
+// Usage:
+//   node scripts/run-tests.mjs                  # default: unit + integration (hermetic)
+//   node scripts/run-tests.mjs unit             # one category
+//   node scripts/run-tests.mjs integration e2e  # several
+//   node scripts/run-tests.mjs judge            # semantic LLM judge tests
+//   node scripts/run-tests.mjs all              # unit + integration + e2e + judge
+//
+// Categories:
+//   unit         pure, single-module tests. No subprocess, no network, no real CLI.
+//   integration  several real modules wired together. Still hermetic — never the real gemini CLI.
+//   e2e          the real gemini CLI driven through the real MCP server over stdio. Opt-in (live).
+//   judge        live Gemini CLI output evaluated by a second LLM judge. Opt-in (live).
+import { spawnSync } from "node:child_process";
+import { readdirSync, statSync, existsSync } from "node:fs";
+import path from "node:path";
+import { fileURLToPath } from "node:url";
+
+const scriptDir = path.dirname(fileURLToPath(import.meta.url));
+const testDir = path.join(scriptDir, "..", "test");
+
+const KNOWN = ["unit", "integration", "e2e", "judge"];
+const DEFAULT = ["unit", "integration"]; // the hermetic suite `npm test` runs and CI gates on
+
+function resolveCategories(argv) {
+  const args = argv.slice(2).map((a) => a.toLowerCase());
+  if (args.length === 0) return DEFAULT;
+  if (args.includes("all")) return KNOWN;
+  const unknown = args.filter((a) => !KNOWN.includes(a));
+  if (unknown.length > 0) {
+    console.error(`Unknown test category: ${unknown.join(", ")}`);
+    console.error(`Valid categories: ${KNOWN.join(", ")}, all`);
+    process.exit(2);
+  }
+  // De-dupe while preserving the documented order.
+  return KNOWN.filter((c) => args.includes(c));
+}
+
+function findTests(dir) {
+  const found = [];
+  if (!existsSync(dir)) return found;
+  for (const entry of readdirSync(dir)) {
+    const full = path.join(dir, entry);
+    if (statSync(full).isDirectory()) found.push(...findTests(full));
+    else if (entry.endsWith(".test.ts")) found.push(full);
+  }
+  return found;
+}
+
+const categories = resolveCategories(process.argv);
+const tests = categories.flatMap((c) => findTests(path.join(testDir, c)));
+
+if (tests.length === 0) {
+  console.log(`No test files found for: ${categories.join(", ")}`);
+  process.exit(0);
+}
+
+console.log(`Running ${tests.length} test file(s) [${categories.join(", ")}]`);
+
+// tsx requires Node >= 18.19 which always supports --import.
+// The older --loader flag is deprecated and breaks on CI (Node 18.19+/20/22).
+const loaderArgs = ["--import", "tsx"];
+
+// Mute routine [GMCPT] logging (NODE_ENV=test) unless the e2e category is
+// selected, so the reporter output stays readable. e2e runs keep full server
+// logs (the child server inherits this env), which helps when debugging live calls.
+const env = { ...process.env };
+if (!categories.includes("e2e")) env.NODE_ENV = "test";
+
+// Run test files serially (--test-concurrency=1). The changeMode chunk cache is
+// a single shared on-disk dir (os.tmpdir()/gemini-mcp-chunks); files that touch
+// it (chunkCache, changeMode-pipeline) would otherwise race across parallel
+// worker processes. Serial e2e also avoids hitting the gemini quota in parallel.
+// The hermetic suite is tiny, so the cost is negligible. (Flag available on the
+// Node 18.19+/20.10+/22 versions CI runs.)
+const result = spawnSync(
+  process.execPath,
+  [...loaderArgs, "--test", "--test-concurrency=1", ...tests],
+  { stdio: "inherit", env },
+);
+process.exit(result.status ?? 1);
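
Review note on `resolveCategories`: the de-dupe and ordering behaviour is easiest to see in a side-effect-free variant. The sketch below is an illustration under stated assumptions, not the PR's code: `resolveCategoriesPure` is a hypothetical name, it takes the category arguments directly instead of `process.argv`, and it throws a `RangeError` where the runner prints a message and calls `process.exit(2)`.

```javascript
const KNOWN = ["unit", "integration", "e2e", "judge"];
const DEFAULT = ["unit", "integration"]; // the hermetic pair

// Hypothetical pure variant of resolveCategories for illustration.
function resolveCategoriesPure(args) {
  const wanted = args.map((a) => a.toLowerCase());
  if (wanted.length === 0) return [...DEFAULT];
  if (wanted.includes("all")) return [...KNOWN];
  const unknown = wanted.filter((a) => !KNOWN.includes(a));
  if (unknown.length > 0) {
    throw new RangeError(`Unknown test category: ${unknown.join(", ")}`);
  }
  // Filtering KNOWN (rather than `wanted`) removes duplicates and keeps
  // the documented order regardless of how the arguments were given.
  return KNOWN.filter((c) => wanted.includes(c));
}

console.log(resolveCategoriesPure([])); // ["unit", "integration"]
console.log(resolveCategoriesPure(["e2e", "unit", "E2E"])); // ["unit", "e2e"]
console.log(resolveCategoriesPure(["judge", "all"])); // all four categories
```

Throwing instead of exiting keeps the function testable under `node:test` without spawning a subprocess.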