Merged
22 changes: 14 additions & 8 deletions .github/workflows/ci.yml
@@ -9,25 +9,31 @@ on:
jobs:
  test:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        node-version: [16.x, 18.x, 20.x]

        # node:test (used by the suite) requires Node >= 18.
        node-version: [18.x, 20.x, 22.x]

    steps:
      - uses: actions/checkout@v4

      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}

      - name: Install dependencies
        run: npm ci

      - name: Build
        run: npm run build


      - name: Type-check (source + tests)
        run: npm run lint

      # Hermetic suite only (unit + integration). The e2e suite needs an
      # authenticated gemini CLI and is run on demand via `npm run test:e2e`
      # (or `npm run doctor test`).
      - name: Run tests
        run: npm test
        continue-on-error: true
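The workflow comment keeps the e2e suite out of CI because it needs an authenticated gemini CLI, and the changelog says e2e tests auto-skip when gemini is absent. A minimal sketch of that skip pattern with `node:test`, written in plain JavaScript rather than the suite's TypeScript. `geminiAvailable()` is a hypothetical helper, not the project's detection code, and it only checks that `gemini --version` runs, not that the CLI is authenticated.

```javascript
import { spawnSync } from "node:child_process";
import { test } from "node:test";

// Hypothetical helper: true if a `gemini` executable answers --version.
// This proves the CLI can be launched, not that it is authenticated.
// On Windows the npm shim is gemini.cmd, which needs shell: true to spawn.
export function geminiAvailable() {
  const r = spawnSync("gemini", ["--version"], {
    encoding: "utf8",
    timeout: 20000,
    shell: process.platform === "win32",
    windowsHide: true,
  });
  // ENOENT (not installed) shows up as r.error rather than a thrown exception.
  return !r.error && r.status === 0;
}

const hasGemini = geminiAvailable();

// A truthy `skip` (a string doubles as the reason) marks the test as skipped,
// not failed, so the run still exits 0 on machines without gemini.
test("gemini reports a version", { skip: hasGemini ? false : "gemini CLI not found" }, () => {
  const r = spawnSync("gemini", ["--version"], {
    encoding: "utf8",
    shell: process.platform === "win32",
  });
  if (r.status !== 0 || !r.stdout.trim()) {
    throw new Error(`gemini --version failed (status ${r.status})`);
  }
});
```

Deciding once at module load keeps the skip reason visible in the reporter output instead of burying it in a failing assertion.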
15 changes: 14 additions & 1 deletion CHANGELOG.md
@@ -1,7 +1,20 @@
# Changelog

## [1.1.7] - 2026-05-31
Reliability patch plus the project's first automated test suite. Hardens cross-platform execution (the Windows fixes and a few robustness guards) and adds a categorized `node:test` suite that gates CI. **No runtime or default-config changes vs 1.1.6** — the only new knob is the opt-in `GEMINI_CLI_PATH`.

- **Windows: stdin prompt passing** — `changeMode` and `@file` prompts are sent to the Gemini CLI on **stdin** instead of the `-p` flag, sidestepping cmd.exe argument parsing and the OS command-line length limit; this also avoids the deprecated-`-p` positional-prompt conflict for those prompts (#48). Adds `windowsHide` to suppress the popup console window. (harvested from #27 via #77)
- **Windows: executable resolution** — honours `GEMINI_CLI_PATH`, otherwise resolves the real `gemini` shim via `where` (preferring `.cmd`), fixing "command not found" when the MCP server doesn't inherit your shell's PATH.
- **Clearer ENOENT guidance** when the executable isn't found, including the `GEMINI_CLI_PATH` hint.
- **stdin EPIPE / spawn-error hardening** — a child that closes stdin early no longer throws an uncaught error that could drop the long-lived server connection (candidate fix for the disconnects in #64).
- **`Help` tool** now invokes `gemini --help` instead of `-help`, which the Gemini CLI's yargs parser split into `-h -e -l -p`.
- **Test suite** — categorized `node:test` coverage under `test/`: **unit** (command quoting / Windows resolution / ENOENT, the `@file` guard, the changeMode parser/chunker/translator, the chunk cache, the tool registry, brainstorm prompt building), **integration** (the changeMode → `fetch-chunk` pipeline and the registry → tool contract, both hermetic), and **e2e** (the real gemini driven through the built MCP server; auto-skips without gemini). `npm test` runs unit+integration and now **gates CI** (Node 18/20/22); `npm run test:e2e` runs the live suite. Includes a regression test for the changeMode cache-miss path (#67).
- **Internal `doctor`** (work in progress) — `npm run doctor` reports node and the detected `gemini` install; `npm run doctor test` builds the server and runs the e2e suite, an automated replacement for manual MCP inspector sessions and costly, token-burning manual checks; `npm run doctor judge` builds the server and runs the LLM judge suite. Excluded from the npm package (`files`/`bin`).
- **LLM judge semantic test suite** (`test/judge/`) — uses DeepSeek or OpenRouter as a second LLM that evaluates tool outputs against validation rubrics; needs `DEEPSEEK_API_KEY` or `OPENROUTER_API_KEY` in the environment or in `test/.env`. Work in progress.
- **Diagnostics logging** — E2E harness now logs the spawned server's working directory (`📂 SPAWNED CWD`) for easier local debugging.
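The stdin and EPIPE entries describe two related guards: send the prompt on the child's stdin instead of as an argv flag, and attach an `error` listener to the stdin stream so a child that exits before reading it cannot raise an uncaught `EPIPE`. A minimal sketch using only `node:child_process`; `runWithStdin()` is a hypothetical helper, not the code in `geminiExecutor.ts`, and the demo spawns Node itself as a stand-in for gemini.

```javascript
import { spawn } from "node:child_process";

// Hypothetical helper: run `cmd args`, write `input` to the child's stdin,
// and collect stdout/stderr. Resolves once the child exits and stdout/stderr close.
export function runWithStdin(cmd, args, input) {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: ["pipe", "pipe", "pipe"], windowsHide: true });
    let stdout = "";
    let stderr = "";
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk) => (stdout += chunk));
    child.stderr.on("data", (chunk) => (stderr += chunk));

    // Spawn failures (e.g. ENOENT) arrive as an 'error' event, not a throw.
    child.on("error", reject);

    // Without this listener, a child that closes stdin early turns the write
    // below into an unhandled 'error' event (EPIPE) that crashes the process.
    child.stdin.on("error", (err) => {
      if (err.code !== "EPIPE") stderr += `stdin error: ${err.message}\n`;
    });

    child.on("close", (code) => resolve({ code, stdout, stderr }));
    // The prompt travels over the pipe, so no command-line length limit applies.
    child.stdin.end(input);
  });
}

// Demo with Node standing in for the CLI: echo stdin back to stdout.
const echoed = await runWithStdin(
  process.execPath,
  ["-e", "process.stdin.pipe(process.stdout)"],
  "hello from stdin",
);
console.log(echoed.stdout);

// A child that exits without reading stdin: the 1 MiB write can hit EPIPE,
// but the promise still resolves instead of crashing the parent.
const early = await runWithStdin(process.execPath, ["-e", "process.exit(0)"], "x".repeat(1 << 20));
console.log(early.code);
```

Resolving on `close` rather than `exit` ensures all buffered stdout has been read before the caller sees the result.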

## [1.1.6] - 2026-05-30
_Emergency security patch — the CVE-2026-0755 fix only, ahead of the larger 1.2.0 release._
_Emergency security patch — CVE-2026-0755 fix only._
- Security fix: OS command-injection / `@file` exfiltration via prompt quoting in `geminiExecutor.ts` (CVE-2026-0755, CWE-78). Fixes #73 (and the literal-quote corruption in #66).
- Removed the broken double-quote wrapping from both the primary and fallback paths. With `spawn` running `shell: false`, those quotes were passed as literal characters — they provided no protection and corrupted `@file` references. Windows `.cmd` argument quoting is hardened separately (see below).
- Added `assertSafeFileReferences()`, which rejects any `@file` reference that resolves outside the project working directory (absolute paths, `~` home references, and `../` traversal), closing the arbitrary-file-read exfiltration vector while preserving legitimate in-project `@file` usage.
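The `assertSafeFileReferences()` entry describes a containment rule: an `@file` reference is allowed only if it resolves inside the project directory. A minimal sketch of that rule, assuming `@` is followed by a non-space path token (a simplification of the CLI's real syntax); `isSafeFileReference()` and `assertSafeReferences()` are hypothetical names, not the project's implementation.

```javascript
import path from "node:path";

// Hypothetical sketch of the containment rule: a reference is safe only if
// it resolves to a path inside projectDir.
export function isSafeFileReference(ref, projectDir) {
  // "~" means the home directory to the CLI, but path.resolve() would treat it
  // as an ordinary folder name, so reject it before resolving.
  if (ref === "~" || ref.startsWith("~/") || ref.startsWith("~\\")) return false;

  const root = path.resolve(projectDir);
  // path.resolve() keeps absolute refs as-is and collapses "../" segments.
  const target = path.resolve(root, ref);
  const rel = path.relative(root, target);

  // The target escapes if the relative path climbs out ("..", "../x") or,
  // on Windows, sits on another drive (path.relative then returns an absolute path).
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  return !escapes;
}

// Hypothetical wrapper: scan a prompt for @tokens and reject any unsafe one.
export function assertSafeReferences(prompt, projectDir) {
  for (const match of prompt.matchAll(/@(\S+)/g)) {
    if (!isSafeFileReference(match[1], projectDir)) {
      throw new Error(`@file reference outside the project: ${match[1]}`);
    }
  }
}
```

The check is lexical: a symlink inside the project that points outside it would pass, so a stricter variant would compare `fs.realpathSync` results instead. Absolute paths are judged by where they resolve, so `/proj/docs/a.md` passes for a `/proj` root while `/etc/passwd` does not.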
11 changes: 8 additions & 3 deletions package.json
@@ -1,6 +1,6 @@
{
  "name": "gemini-mcp-tool",
  "version": "1.1.6",
  "version": "1.1.7",
  "description": "MCP server for Gemini CLI integration",
  "type": "module",
  "main": "dist/index.js",
@@ -11,8 +11,13 @@
    "build": "tsc",
    "start": "node dist/index.js",
    "dev": "tsc && node dist/index.js",
    "test": "echo \"No tests yet\" && exit 0",
    "lint": "tsc --noEmit",
    "doctor": "node scripts/doctor.mjs",
    "doctor:judge": "node scripts/doctor.mjs judge",
    "test": "node scripts/run-tests.mjs unit integration",
    "test:unit": "node scripts/run-tests.mjs unit",
    "test:integration": "node scripts/run-tests.mjs integration",
    "test:e2e": "npm run build && node scripts/run-tests.mjs e2e",
    "lint": "tsc -p tsconfig.test.json",
    "contribute": "tsx src/contribute.ts",
    "prepublishOnly": "echo '⚠️ Remember to test locally first!' && npm run build",
    "docs:dev": "vitepress dev docs",
224 changes: 224 additions & 0 deletions scripts/doctor.mjs
@@ -0,0 +1,224 @@
#!/usr/bin/env node
// gemini-mcp-tool doctor — INTERNAL dev / diagnostic + test tool.
//
// Not published: deliberately excluded from package.json "bin" and "files", so
// it ships with the repo but NOT the npm package. Run it from a checkout:
//
//   npm run doctor         → report the live system state for the MCP server
//   npm run doctor test    → preflight + build + run the e2e suite (the automated
//                            MCP test that replaces manual mcpjam clicking)
//   npm run doctor judge   → check for a judge API key + build + run the LLM judge suite
//
// This is the 1.1.7 seed: it reports what 1.1.7 actually has (node, the gemini
// CLI, GEMINI_CLI_PATH) and runs the test suites. Later feature PRs grow it with
// backend / model / approval / timeout diagnostics.
// Self-contained: pure Node, no build step or dependencies.

import { spawnSync } from "node:child_process";
import { existsSync, readFileSync } from "node:fs";
import path from "node:path";
import { fileURLToPath } from "node:url";

const ENV = {
  GEMINI_CLI_PATH: "GEMINI_CLI_PATH", // explicit path to the gemini executable
};

const isWindows = process.platform === "win32";
const useColor = process.stdout.isTTY && !process.env.NO_COLOR;
const paint = (code, s) => (useColor ? `\x1b[${code}m${s}\x1b[0m` : s);
const c = {
  bold: (s) => paint("1", s),
  dim: (s) => paint("2", s),
  green: (s) => paint("32", s),
  yellow: (s) => paint("33", s),
  red: (s) => paint("31", s),
  cyan: (s) => paint("36", s),
};
const OK = c.green("✓");
const WARN = c.yellow("⚠");
const BAD = c.red("✗");

const repoRoot = path.resolve(path.dirname(fileURLToPath(import.meta.url)), "..");

function heading(title) {
  console.log("\n" + c.bold(title));
  console.log(c.dim("─".repeat(Math.max(title.length, 16))));
}

function runCmd(cmd, args) {
  try {
    const executable = isWindows && /\s/.test(cmd) ? `"${cmd.replace(/"/g, '""')}"` : cmd;
    const r = spawnSync(executable, args, { encoding: "utf8", timeout: 20000, shell: isWindows, windowsHide: true });
    if (r.error) return { ok: false, err: r.error.message };
    return { ok: r.status === 0, status: r.status, out: (r.stdout || "").trim(), err: (r.stderr || "").trim() };
  } catch (e) {
    return { ok: false, err: e instanceof Error ? e.message : String(e) };
  }
}

function locate(cmd) {
  const r = runCmd(isWindows ? "where" : "which", [cmd]);
  if (!r.ok || !r.out) return [];
  return r.out.split(/\r?\n/).map((s) => s.trim()).filter(Boolean);
}

// Mirror commandExecutor's resolution: honour GEMINI_CLI_PATH, else PATH.
function detectGemini() {
  const override = (process.env[ENV.GEMINI_CLI_PATH] || "").trim();
  const pathCandidates = locate("gemini");
  const candidates = override ? [override, ...pathCandidates.filter((p) => p !== override)] : pathCandidates;
  let primary = override || null;
  if (!primary && candidates.length > 0) {
    if (isWindows) {
      const byExt = (ext) => candidates.find((p) => p.toLowerCase().endsWith(ext));
      primary = byExt(".cmd") || byExt(".exe") || byExt(".bat") || candidates[0];
    } else {
      primary = candidates[0];
    }
  }
  const found = override ? existsSync(override) : candidates.length > 0;
  let version = null;
  if (found && primary) {
    const v = runCmd(primary, ["--version"]);
    if (v.ok && v.out) version = v.out.split(/\r?\n/)[0].trim();
  }
  return { found: !!found, primary, candidates, override: override || null, version };
}

// ── report ───────────────────────────────────────────────────────────────────
function runReport() {
  const problems = [];

  heading("System");
  console.log(` node ${process.version}`);
  console.log(` platform ${process.platform} (${process.arch})`);

  heading("Gemini CLI");
  const gemini = detectGemini();
  if (gemini.found) {
    console.log(` ${OK} found${gemini.override ? " (via " + ENV.GEMINI_CLI_PATH + ")" : ""}`);
    console.log(` path ${gemini.primary}`);
    console.log(` version ${gemini.version ? c.cyan(gemini.version) : c.yellow("(could not read --version)")}`);
    if (gemini.candidates.length > 1) console.log(c.dim(` also on PATH: ${gemini.candidates.slice(1).join(", ")}`));
  } else {
    console.log(` ${BAD} ${gemini.override ? ENV.GEMINI_CLI_PATH + " path not found" : "not found on PATH"}`);
    problems.push(
      gemini.override
        ? `${ENV.GEMINI_CLI_PATH} is set to ${gemini.override}, but that path does not exist.`
        : `Gemini CLI not found. Install it (npm i -g @google/gemini-cli) or set ${ENV.GEMINI_CLI_PATH} to its full path.`
    );
  }

  heading("Summary");
  if (problems.length === 0) {
    console.log(` ${OK} ${c.green("No problems detected.")}`);
  } else {
    console.log(` ${BAD} ${c.red(`${problems.length} issue(s) found:`)}`);
    for (const p of problems) console.log(` - ${p}`);
  }
  console.log(c.dim(`\n Tips:`));
  console.log(c.dim(` \`npm run doctor test\` → build + run live e2e tests`));
  console.log(c.dim(` \`npm run doctor judge\` → build + run semantic LLM judge tests`));
  console.log("");
  process.exit(problems.length === 0 ? 0 : 1);
}

// ── test (automated MCP test, replaces manual mcpjam) ──────────────────────────
function runTest() {
  heading("Preflight");
  const gemini = detectGemini();
  if (gemini.found) {
    console.log(` ${OK} gemini ${gemini.version ? c.cyan(gemini.version) : ""} ${c.dim("(" + gemini.primary + ")")}`);
  } else {
    console.log(` ${WARN} gemini not on PATH — live model tests will skip; only the gemini-independent server tests run.`);
  }

  heading("Build");
  const build = spawnSync(isWindows ? "npm.cmd" : "npm", ["run", "build"], {
    stdio: "inherit",
    cwd: repoRoot,
    shell: isWindows,
  });
  if (build.status !== 0) {
    console.log(` ${BAD} ${c.red("build failed — aborting.")}`);
    process.exit(build.status ?? 1);
  }
  console.log(` ${OK} build succeeded`);

  heading("E2E suite (real gemini through the MCP server)");
  const runner = path.join(repoRoot, "scripts", "run-tests.mjs");
  const e2e = spawnSync(process.execPath, [runner, "e2e"], { stdio: "inherit", cwd: repoRoot });
  if (e2e.status === 0) {
    console.log(`\n ${OK} ${c.green("e2e suite passed — the MCP server works end-to-end.")}`);
  } else {
    console.log(`\n ${BAD} ${c.red("e2e suite failed.")}`);
  }
  process.exit(e2e.status ?? 1);
}

// ── judge (semantic evaluation) ────────────────────────────────────────────────
function runJudgeTest() {
  heading("Judge Preflight");
  const config = detectJudgeKeys();
  if (config.hasKey) {
    console.log(` ${OK} LLM Judge configured via: ${config.keyType}`);
  } else {
    console.log(` ${BAD} No LLM Judge keys found. Set DEEPSEEK_API_KEY or OPENROUTER_API_KEY in your environment or in test/.env.`);
    process.exit(1);
  }

  heading("Build");
  const build = spawnSync(isWindows ? "npm.cmd" : "npm", ["run", "build"], {
    stdio: "inherit",
    cwd: repoRoot,
    shell: isWindows,
  });
  if (build.status !== 0) {
    console.log(` ${BAD} ${c.red("build failed — aborting.")}`);
    process.exit(build.status ?? 1);
  }
  console.log(` ${OK} build succeeded`);

  heading("LLM-as-a-Judge semantic test suite");
  const runner = path.join(repoRoot, "scripts", "run-tests.mjs");
  const judgeRun = spawnSync(process.execPath, [runner, "judge"], { stdio: "inherit", cwd: repoRoot });
  if (judgeRun.status === 0) {
    console.log(`\n ${OK} ${c.green("Judge suite passed — semantic checks successful!")}`);
  } else {
    console.log(`\n ${BAD} ${c.red("Judge suite failed.")}`);
  }
  process.exit(judgeRun.status ?? 1);
}

function detectJudgeKeys() {
  let hasKey = false;
  let keyType = "";
  const envPath = path.join(repoRoot, "test", ".env");

  if (process.env.DEEPSEEK_API_KEY) {
    hasKey = true;
    keyType = "process.env.DEEPSEEK_API_KEY";
  } else if (process.env.OPENROUTER_API_KEY) {
    hasKey = true;
    keyType = "process.env.OPENROUTER_API_KEY";
  }

  if (!hasKey && existsSync(envPath)) {
    try {
      const content = readFileSync(envPath, "utf-8");
      // Anchor each match to the start of a line (optionally after "export")
      // so commented-out keys like "# DEEPSEEK_API_KEY=..." do not count, and
      // use [ \t]* instead of \s* so an empty value cannot borrow the next line.
      if (/^[ \t]*(?:export[ \t]+)?DEEPSEEK_API_KEY[ \t]*=[ \t]*[^\s#]+/im.test(content)) {
        hasKey = true;
        keyType = "test/.env (DEEPSEEK_API_KEY)";
      } else if (/^[ \t]*(?:export[ \t]+)?OPENROUTER_API_KEY[ \t]*=[ \t]*[^\s#]+/im.test(content)) {
        hasKey = true;
        keyType = "test/.env (OPENROUTER_API_KEY)";
      }
    } catch {
      // An unreadable test/.env is treated the same as a missing key.
    }
  }
  return { hasKey, keyType };
}

// ── dispatch ───────────────────────────────────────────────────────────────────
const mode = (process.argv[2] || "").toLowerCase();
if (mode === "test") runTest();
else if (mode === "judge") runJudgeTest();
else runReport();
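`detectJudgeKeys()` checks the real environment first, then `test/.env`, preferring DeepSeek over OpenRouter in both. A minimal sketch of the same precedence with a small `.env` parser instead of per-key regexes; `parseDotEnv()` and `pickJudgeKey()` are hypothetical helpers, and the parser deliberately ignores multi-line values and escape sequences.

```javascript
// Hypothetical sketch: parse KEY=value lines from .env text, skipping blank
// lines and "#" comments. Simplified: no multi-line values or escapes.
export function parseDotEnv(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith("#")) continue;
    const m = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$/.exec(line);
    if (!m) continue;
    let value = m[2];
    const quote = value[0];
    if (value.length >= 2 && (quote === '"' || quote === "'") && value.endsWith(quote)) {
      value = value.slice(1, -1); // quoted: keep everything between the quotes
    } else {
      value = value.replace(/\s+#.*$/, "").trim(); // unquoted: drop a trailing " # comment"
    }
    vars[m[1]] = value;
  }
  return vars;
}

// Same precedence as detectJudgeKeys(): real env before the file, DeepSeek
// before OpenRouter. An empty value counts as "not set".
export function pickJudgeKey(env, fileVars) {
  const names = ["DEEPSEEK_API_KEY", "OPENROUTER_API_KEY"];
  for (const name of names) if (env[name]) return `process.env.${name}`;
  for (const name of names) if (fileVars[name]) return `test/.env (${name})`;
  return null;
}
```

On Node 20.12+ and 21.7+, the built-in `util.parseEnv()` can replace the hand-rolled parser, but CI still runs Node 18, which lacks it.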
86 changes: 86 additions & 0 deletions scripts/run-tests.mjs
@@ -0,0 +1,86 @@
#!/usr/bin/env node
// Category-aware test runner. Discovers *.test.ts under the selected category
// folders (test/unit, test/integration, test/e2e, test/judge) and runs them with the
// built-in node:test runner via the tsx loader, so the TypeScript sources run
// directly.
//
// Usage:
//   node scripts/run-tests.mjs                    # default: unit + integration (hermetic)
//   node scripts/run-tests.mjs unit               # one category
//   node scripts/run-tests.mjs integration e2e    # several
//   node scripts/run-tests.mjs judge              # semantic LLM judge tests
//   node scripts/run-tests.mjs all                # unit + integration + e2e + judge
//
// Categories:
//   unit          pure, single-module tests. No subprocess, no network, no real CLI.
//   integration   several real modules wired together. Still hermetic — never the real gemini CLI.
//   e2e           the real gemini CLI driven through the real MCP server over stdio. Opt-in (live).
//   judge         live Gemini CLI output evaluated by a second LLM judge. Opt-in (live).
import { spawnSync } from "node:child_process";
import { readdirSync, statSync, existsSync } from "node:fs";
import path from "node:path";
import { fileURLToPath } from "node:url";

const scriptDir = path.dirname(fileURLToPath(import.meta.url));
const testDir = path.join(scriptDir, "..", "test");

const KNOWN = ["unit", "integration", "e2e", "judge"];
const DEFAULT = ["unit", "integration"]; // the hermetic suite `npm test` runs and CI gates on

function resolveCategories(argv) {
  const args = argv.slice(2).map((a) => a.toLowerCase());
  if (args.length === 0) return DEFAULT;
  if (args.includes("all")) return KNOWN;
  const unknown = args.filter((a) => !KNOWN.includes(a));
  if (unknown.length > 0) {
    console.error(`Unknown test category: ${unknown.join(", ")}`);
    console.error(`Valid categories: ${KNOWN.join(", ")}, all`);
    process.exit(2);
  }
  // De-dupe while preserving the documented order.
  return KNOWN.filter((c) => args.includes(c));
}

function findTests(dir) {
  const found = [];
  if (!existsSync(dir)) return found;
  for (const entry of readdirSync(dir)) {
    const full = path.join(dir, entry);
    if (statSync(full).isDirectory()) found.push(...findTests(full));
    else if (entry.endsWith(".test.ts")) found.push(full);
  }
  return found;
}

const categories = resolveCategories(process.argv);
const tests = categories.flatMap((c) => findTests(path.join(testDir, c)));

if (tests.length === 0) {
  console.log(`No test files found for: ${categories.join(", ")}`);
  process.exit(0);
}

console.log(`Running ${tests.length} test file(s) [${categories.join(", ")}]`);

// tsx requires Node >= 18.19, and every such version supports --import.
// The older --loader flag is deprecated and breaks on CI (Node 18.19+/20/22).
const loaderArgs = ["--import", "tsx"];

// Mute routine [GMCPT] logging for the hermetic categories so the reporter
// output stays readable. The e2e suite keeps full server logs (its child
// server process inherits this env), which is useful for debugging live calls.
const env = { ...process.env };
if (!categories.includes("e2e")) env.NODE_ENV = "test";

// Run test files serially (--test-concurrency=1). The changeMode chunk cache is
// a single shared on-disk dir (os.tmpdir()/gemini-mcp-chunks); files that touch
// it (chunkCache, changeMode-pipeline) would otherwise race across parallel
// worker processes. Serial e2e also avoids hitting the gemini quota in parallel.
// The hermetic suite is tiny, so the cost is negligible. (Flag available on the
// Node 18.19+/20.10+/22 versions CI runs.)
const result = spawnSync(
  process.execPath,
  [...loaderArgs, "--test", "--test-concurrency=1", ...tests],
  { stdio: "inherit", env },
);
process.exit(result.status ?? 1);
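The runner's comments rely on `--import tsx` and `--test-concurrency=1` being available on "Node 18.19+/20.10+/22". A minimal preflight sketch that turns those floors into an explicit check with a readable message. It is hypothetical and not part of `run-tests.mjs`. Treating every release from 21 onward as supported, and 19.x as unsupported, is an assumption beyond what the comments state.

```javascript
// Hypothetical preflight: per-major minimums taken from the runner's comments.
const FLOORS = { 18: [18, 19, 0], 20: [20, 10, 0] };

// "v20.10.1" -> [20, 10, 1]
export function parseNodeVersion(v) {
  const m = /^v?(\d+)\.(\d+)\.(\d+)/.exec(v);
  if (!m) throw new Error(`unrecognised Node version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Lexicographic comparison of [major, minor, patch] tuples: is a >= b?
function atLeast(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return true;
}

export function supportsRunnerFlags(version) {
  const v = parseNodeVersion(version);
  if (v[0] >= 21) return true; // assumption: 21+ has both flags
  const floor = FLOORS[v[0]];
  return floor ? atLeast(v, floor) : false; // 19.x and <= 17.x: unsupported
}

if (!supportsRunnerFlags(process.version)) {
  console.error(`Node ${process.version} is too old for --import tsx / --test-concurrency; use 18.19+, 20.10+ or 22.`);
}
```

Failing fast here would replace an opaque "bad option" error from Node with a message that names the required versions.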