fix: restore 8 silently-reverted parser fixes + repair _apply_tool_filter for fastmcp>=3 (closes #361) by npkriami18 · Pull Request #365 · tirth8205/code-review-graph

npkriami18 · 2026-04-21T14:19:10Z

Closes #361.

Result

# Before (upstream/main @ b0f8527)
60 failed, 1015 passed, 1 skipped, 2 xpassed, 6 errors

# After (this branch)
0 failed, 1081 passed, 1 skipped, 2 xpassed, 0 errors

A single, self-contained PR that takes the suite to fully green.

Two distinct bug classes addressed

Class 1 — silently-reverted parser PRs (9 cherry-picks, original authorship preserved)

Two chore: resolve merge conflicts with main commits (3cf31ec, cb9d25a) accepted the stale feature-branch side of code_review_graph/parser.py for the ReScript PR (#323) and Julia PR (#319). Both feature branches forked from db2d2df — before the fixes below landed. Because only main had touched those regions, git merge auto-resolved silently in the wrong direction (no conflict markers).

Commit	PR	Author	Tests restored
`fe383c7`	#285	@michael-denyer	7 module-scope CALLS
`e6e3144`	#276	@azizur100389	10 shebang-detection + `.ksh`
`aa627fb`	#274	@azizur100389	1 Databricks CRLF
`f092922` + `425810b` + `536fd4b`	#319	@danvinci	21 Julia parser
`8104eb7`	#278	@krmahadevan	3 Java inheritance
`9a88f20`	#275	@krmahadevan	1 Java method-name extraction
`088c281`	#280	@krmahadevan	2 Java import resolution
`112a442`	#316	@Mornor	9 GDScript
`d660e02`	#298	@bakulb-bansal	1 PHP CALL extraction

Class 2 — fastmcp>=3 API drift (1 commit)

a3a043b (feat(serve): add --tools flag and CRG_TOOLS env var for MCP tool filtering) was authored against the fastmcp 2.x internal API. The pinned fastmcp>=3 removed mcp._tool_manager._tools and mcp.get_tools(), leaving:

_apply_tool_filter raising AttributeError whenever --tools or CRG_TOOLS was set (6 test errors in TestApplyToolFilter)
a stale test_stdio_calls_mcp_run_stdio assertion (PR fix: disable FastMCP banner for stdio transport #290 had added show_banner=False to the production call without refreshing the test)

This PR:

Rewrites _apply_tool_filter to enumerate via the public await mcp.list_tools() and remove via mcp.local_provider.remove_tool(name) (the non-deprecated fastmcp 3 API). Includes a concurrent.futures.ThreadPoolExecutor fallback for callers that already have a running event loop, so the function stays sync-callable from both main() and async test bodies.
Refreshes TestServeMainTransport.test_stdio_calls_mcp_run_stdio to expect show_banner=False.
Rewrites TestApplyToolFilter._restore_tools to snapshot via await mcp.list_tools() and restore via mcp.add_tool(tool).
Adds an autouse module-level fixture in tests/test_main.py that strips CRG_TOOLS from the env, so a developer who has CRG_TOOLS exported can't accidentally permanently shrink the global tool registry mid-run (caught during a rubber-duck pre-merge review).

Conflict resolution log (during cherry-picks)

code_review_graph/incremental.py (fix: resolve all 8 Windows test failures — UTF-8, CRLF, FastMCP API, stop_at (#239, #241) #274): kept HEAD to preserve SVN support added later.
tests/test_main.py (fix: resolve all 8 Windows test failures — UTF-8, CRLF, FastMCP API, stop_at (#239, #241) #274): kept HEAD to preserve TestApplyToolFilter from a3a043b.
code_review_graph/parser.py:
- EXTENSION_TO_LANGUAGE: kept both .res/.resi and .gd.
- _CLASS_TYPES: kept full Julia list and gdscript entry.
- _CALL_TYPES: kept full Julia list and gdscript entry.
- _extract_import: concatenated Julia and GDScript elif branches (mutually exclusive — disjoint by language).
- _TEST_PATH_PATTERNS: merged ReScript + Julia regex entries.
- _extract_call_name: concatenated Julia macrocall block + PHP block (mutually exclusive — disjoint by language).
tests/test_multilang.py: kept HEAD to preserve TestJuliaParsing and ReScript cross-module resolver tests.

Quality gate

✅ uv run pytest tests/ — 0 failed, 0 errors, 1081 passed
✅ uv run ruff check code_review_graph/parser.py code_review_graph/incremental.py tests/test_main.py tests/test_multilang.py — clean
✅ uv run mypy code_review_graph/main.py code_review_graph/parser.py code_review_graph/incremental.py --ignore-missing-imports --no-strict-optional — clean
✅ Rubber-duck pre-merge review surfaced one issue (env-sensitive CRG_TOOLS leak across test classes) — fixed before push.
✅ Pre-existing I001 on code_review_graph/main.py left untouched (not introduced here; out of scope).

Non-goals

No version bump, no CHANGELOG edit — release notes belong to the maintainer.
No Copilot/skills work (that's feature: Add GitHub Copilot support (CLI and CHAT) ‼️‼️‼️ #129).

cc @tirth8205 — single PR, fully green. Original authorship preserved on every restored fix; pinging @azizur100389 @michael-denyer @danvinci @krmahadevan @Mornor @bakulb-bansal as the original PR authors in case the cherry-pick changed any intent.

azizur100389 · 2026-04-21T14:23:40Z

Thanks for the expanded scope @npkriami18 — taking the suite from 60 failed → 0 failed in a single PR is a serious service to the project, and the writeup on #361 is genuinely the clearest diagnosis of this class of silent-revert I've seen.

Re-verified from my end:

e4f0ae3 (feat: .ksh extension + shebang-based language detection for extension-less scripts (#235, #237) #276 — .ksh + shebang) — SHEBANG_INTERPRETER_TO_LANGUAGE table, _detect_language_from_shebang(), .ksh → bash, and the 10 regression tests in tests/test_parser.py::TestCodeParser::test_detect_shebang_* + tests/test_multilang.py::TestBashParsing::test_ksh_extension_parses_as_bash, all intact byte-for-byte.
de956fe (fix: resolve all 8 Windows test failures — UTF-8, CRLF, FastMCP API, stop_at (#239, #241) #274 — Windows CRLF/UTF-8/stop_at) — CRLF-tolerant Databricks detection, encoding="utf-8" on auto-generated .gitignore, stop_at parameter on find_repo_root / find_project_root, and the 15 regression tests, all intact.

Author is azizur100389 <azizur100389@gmail.com> on both — authorship preserved correctly.

The Class 2 fastmcp>=3 fix is particularly elegant. A few observations I endorse:

Using await mcp.list_tools() + mcp.local_provider.remove_tool(name) is exactly the right move — both are on the public surface of fastmcp 3 and the deprecation notices agree. The concurrent.futures.ThreadPoolExecutor fallback for "caller already in an event loop" is the pattern the fastmcp maintainers use themselves, so it will not rot.
Catching the CRG_TOOLS env-leak during the rubber-duck review is the kind of defensive review that usually only surfaces after an angry bug report. Good catch.
Refreshing test_stdio_calls_mcp_run_stdio to expect show_banner=False is the right call — PR fix: disable FastMCP banner for stdio transport #290 added that to the production call and this test assertion was lagging behind. Would not have caught it without running the whole suite.

Conflict-resolution log matches what I'd have done for the files I know (incremental.py HEAD side for SVN, tests/test_main.py HEAD side for TestApplyToolFilter, concatenated Julia + GDScript branches in _extract_import / _extract_call_name since they are mutually exclusive by language). No objections.

One tiny callout for @tirth8205 if it's useful for the merge decision: the way PR #274's Databricks CRLF fix is laid out, it's a pure addition on top of the existing no-CRLF branch — so even if a future merge conflict re-triggers the silent-revert class of bug, the damage is contained to Databricks auto-detection, not to any other parser path. I'd keep that surface-area guarantee in mind for any further refactor.

Strongly approve on the restoration of my two commits as-is. Rest of the content I trust the original authors and the diff to speak for — @krmahadevan @michael-denyer @danvinci @Mornor @bakulb-bansal best-positioned to confirm intent on their own patches.

krmahadevan · 2026-04-21T14:30:53Z

@npkriami18 - Am a bit confused. Please help me out here. 3 of my PRs were already merged into main by @tirth8205. Can you please let me know if there’s anything wrong with them or if they got reverted or something?

npkriami18 · 2026-04-21T14:49:16Z

@krmahadevan no problem at all — your 3 PRs (#275, #278, #280) were perfectly fine and did merge into main. Nothing wrong with them.

What happened: a couple of later "chore: resolve merge conflicts with main" commits (3cf31ec, cb9d25a) silently accepted the stale feature-branch side of code_review_graph/parser.py for the ReScript and Julia merges. Those branches had forked from db2d2df — before your fixes landed — and because only main had touched the Java code regions, git merge auto-resolved without conflict markers in the wrong direction. Net effect: your Java fixes got wiped from parser.py even though their commits still show in git log.

Full root-cause writeup with blame evidence is in #361. PR #365 just cherry-picks your 3 commits back (with -x, so authorship is preserved) along with 5 other PRs that hit the same trap. Your code is unchanged — just being put back where it belongs.

npkriami18 · 2026-04-21T14:53:16Z

@tirth8205 friendly ping 🙏

Quick summary of where this PR stands:

What it does: restores 8 PRs that were silently wiped from code_review_graph/parser.py by the 3cf31ec and cb9d25a "chore: resolve merge conflicts with main" commits (the ReScript feat: add ReScript language support (#309 with conflict resolution) #323 and Julia feat(parser): comprehensive Julia language support #319 branches had forked from db2d2df before those fixes landed, and git merge auto-resolved without conflict markers in the wrong direction). Full root-cause writeup with blame evidence in Parser fixes from #274, #276, #285 (and Julia impl from #319) were silently reverted by merge-conflict resolutions #361.
Authorship preserved: every commit is git cherry-pick -x of the original author's work (@krmahadevan, @azizur100389, and others) — I'm just putting their code back where it belongs.
Also fixes the 6 _tool_manager errors from a3a043b by rewriting _apply_tool_filter against the public FastMCP 3.x API.
Test status: 0 failed, 0 errors, 1081 passed (was 60 failed / 6 errors on main). Ruff + mypy clean.

Would really appreciate a review and merge whenever you have a moment 🙇 Closes #361.

krmahadevan · 2026-04-21T14:53:36Z

@npkriami18 - Thank you for clarifying that. I got confused because this PR says it is merging 12 commits but files changed are 3 and none of them were the files that i had changed as part of my 3 PRs. Thank you so much for helping out on this.

npkriami18 · 2026-04-21T14:54:04Z

@tirth8205 quick note on merge order 🙏

Please merge this PR (#365) first, then #129.

Reason: #365 restores main to a green test suite (0 failed / 0 errors / 1081 passed — currently 60 failed / 6 errors). Once it's in, #129 (Copilot support) rebases cleanly on top of a healthy main and CI on #129 will actually be meaningful. Doing it the other way around means #129 lands on top of a broken baseline and any failures there get muddled with the pre-existing regressions.

npkriami18 · 2026-04-21T14:56:28Z

Good news on conflicts — checked the changed-file lists for both PRs:

fix: restore 8 silently-reverted parser fixes + repair _apply_tool_filter for fastmcp>=3 (closes #361) #365: code_review_graph/main.py, code_review_graph/parser.py, tests/test_main.py
feature: Add GitHub Copilot support (CLI and CHAT) ‼️‼️‼️ #129: README.md, code_review_graph/cli.py, code_review_graph/skills.py, docs/USAGE.md, tests/test_skills.py

Zero file overlap → no merge conflicts in either order. The recommendation to merge #365 first is purely about CI signal quality (so #129 lands on a green baseline), not about conflict avoidance.

The parser gated CALLS edge emission on `enclosing_func` being set, so calls made from module scope (top-level script glue, CLI entrypoints, `if __name__ == "__main__"` blocks, and Jupyter/Databricks notebook cells) produced zero CALLS edges. Any function invoked only from those contexts was flagged as dead by `find_dead_code`, even when the function was the entire reason the script existed. Notebooks are particularly affected because every cell is module-scope by definition, so the existing notebook parser (PR tirth8205#69) emitted nodes and IMPORTS_FROM edges but no CALLS edges — making the dead-code detector's notebook coverage vacuous. Fix: when `enclosing_func` is None, attribute the CALLS edge to the File node instead of dropping it. Matches the existing convention used by `_extract_value_references` and CONTAINS edges. Applied to all 5 gated emission sites: generic Python/JS/TS path, JSX components, Elixir, Solidity `emit`, and R. Downstream: `detect_entry_points` now filters File-sourced CALLS via `get_all_call_targets(include_file_sources=False)` so script-only callees remain detectable as entry points (otherwise `run_job()` called from `script.py` module scope would look "called" by `script.py` and disappear from flow analysis). Verified end-to-end against a Databricks `.ipynb` that calls `Predict.extract_data_from_sample_ids()` from cell-level code: edge count went from 0 to 14 CALLS edges, and `find_dead_code` no longer flags the method. Tests: - `test_module_scope_calls_attributed_to_file` — bare `.py` script - `test_module_scope_calls_in_notebook` — `.ipynb` file - `test_detect_entry_points_module_scope_caller_is_still_root` — flow analysis treats File-sourced CALLS correctly - `test_module_scope_caller_prevents_dead_code_flag` — end-to-end parse → store → find_dead_code - `test_if_main_block_caller_prevents_dead_code_flag` — same for `__main__` block (cherry picked from commit fe383c7)

@tirth8205

…on-less scripts (tirth8205#276) Two parser improvements that expand code-review-graph's file coverage to extension-less Unix scripts and Korn shell files. Feature 1: .ksh extension → bash parser (tirth8205#235) ----------------------------------------------- Register .ksh (Korn shell) with tree-sitter-bash alongside the existing .sh / .bash / .zsh entries shipped in v2.3.0. Korn shell is close enough to bash syntactically that tree-sitter-bash handles the structural features the graph captures correctly. Context: in the close comment on PR tirth8205#230, @tirth8205 explicitly flagged this as worth adding: "The .ksh extension in particular looks worth adding — I didn't include it in tirth8205#227." Tests: test_detects_language extended with .ksh assertion; test_ksh_extension_parses_as_bash — end-to-end regression test that copies sample.sh to a temp .ksh file, parses it, and asserts identical function set and edge counts. Feature 2: shebang-based language detection (tirth8205#237) -------------------------------------------------- detect_language() was extension-only — any file with no extension returned None and was silently skipped. This misses a huge category of production files: git hooks, CI scripts, bin/ entry points, installers. New SHEBANG_INTERPRETER_TO_LANGUAGE table maps common interpreter basenames to languages already registered: bash/sh/zsh/ksh/dash/ash -> bash python/python2/python3/pypy/pypy3 -> python node/nodejs -> javascript ruby, perl, lua, Rscript, php New _detect_language_from_shebang(path) static method reads the first 256 bytes, handles direct form (#!/bin/bash), env indirection (#!/usr/bin/env bash), env -S flags, trailing flags (#!/bin/bash -e), CRLF, binary content, and strict UTF-8 decoding. detect_language() now falls back to the shebang probe for files with no extension (suffix == ""). Files with a known extension are never re-read — extension-based detection stays authoritative. Tests (16 new in test_parser.py): every interpreter mapping, env -S flag, trailing flags, missing shebang, empty file, binary content, unknown interpreter, extension-does-not-get-overridden, and end-to-end parse_file producing function nodes from an extension-less bash script. Files changed ------------- - code_review_graph/parser.py — .ksh mapping + SHEBANG_INTERPRETER_TO_LANGUAGE table + _detect_language_from_shebang() + detect_language() fallback - tests/test_multilang.py — .ksh detection + end-to-end ksh parsing test - tests/test_parser.py — 16 shebang detection tests (cherry picked from commit e6e3144)

…irth8205#274) Five root-cause fixes that together resolve every pre-existing Windows test failure on main. Each fix has targeted regression tests; the net effect is full green CI on Windows (8 failures -> 0). Bug 1: get_data_dir() writes non-UTF-8 .gitignore on Windows (tirth8205#239) -------------------------------------------------------------------- write_text() called without encoding="utf-8". The em-dash in the header is U+2014 which Python encodes as cp1252 byte 0x97 on Windows. Any later UTF-8 read fails with UnicodeDecodeError. Fix: add encoding="utf-8" (matches sibling _ensure_repo_gitignore). Test: test_auto_gitignore_is_valid_utf8 — asserts UTF-8 byte sequence, rejects cp1252 byte. Bug 2: Databricks notebook detection fails on CRLF line endings (tirth8205#239) ---------------------------------------------------------------------- source.startswith(b"# Databricks notebook source\n") hard-codes LF. Windows git checkout (core.autocrlf=true) produces CRLF. All Databricks handling silently bypassed — 4 tests fail. Fix: parse first line robustly, strip trailing \r before exact match. Tests: test_databricks_header_crlf_line_endings, test_databricks_header_lf_line_endings_still_work, test_databricks_header_prefix_false_positive_rejected. Bug 3: Stale FastMCP API in async regression guard (tirth8205#239) --------------------------------------------------------- test_heavy_tools_are_coroutines called mcp.get_tools() which does not exist in fastmcp>=2.14.0 (pinned in pyproject.toml). The guard has been silently broken since it was written — the protection promised by PR tirth8205#231 for tirth8205#46/tirth8205#136 was never actually enforced. Fix: resolve tools via getattr(crg_main, name) like the sibling test. Drop @pytest.mark.asyncio since no event loop is needed. Test: test_regression_guard_does_not_depend_on_fastmcp_internals — AST-walks the guard source to ensure no mcp internal API references. Bug 4-5: find_repo_root walks above test sandbox (tirth8205#241) ------------------------------------------------------- test_returns_none_without_git and test_falls_back_to_start fail on any machine where tmp_path has a git-initialized ancestor (dotfiles repo at ~/.git — very common on developer machines). Fix: add optional stop_at parameter to find_repo_root() and find_project_root(). When set, the walk examines stop_at for .git and then stops. Default is None (existing walk-to-root behavior). Fully backward-compatible — all 7 production callers unchanged. Tests: test_stop_at_prevents_escape_to_outer_git, test_stop_at_finds_git_at_boundary, test_stop_at_forwarded_to_find_repo_root. Files changed ------------- - code_review_graph/incremental.py — encoding fix + stop_at API - code_review_graph/parser.py — CRLF-tolerant Databricks detection - tests/test_incremental.py — gitignore UTF-8 guard + stop_at tests - tests/test_main.py — fixed async guard + meta-guard - tests/test_notebook.py — CRLF + LF + false-positive guards (cherry picked from commit aa627fb)

(cherry picked from commit f092922)

(cherry picked from commit 425810b)

(cherry picked from commit 536fd4b)

…odes (tirth8205#278) tree-sitter-java wraps extends/implements clauses in superclass and super_interfaces nodes whose .text includes the keyword. The _get_bases() function was storing the full text (e.g. "implements UserRepository") as the INHERITS edge target instead of just the type name. This caused inheritors_of queries to fail — the query looks up edges by the qualified class name or bare name, neither of which matches "implements UserRepository". Adds a Java-specific branch in _get_bases() that drills into the AST children to extract type_identifier nodes (including generic_type for parameterized interfaces like IBar<String>). C#/Kotlin are unaffected and retain the existing behavior. (cherry picked from commit 8104eb7)

…h8205#275) tree-sitter-java places type_identifier (return type) before identifier (method name) in method_declaration nodes. The generic _get_name() loop matched type_identifier first, causing methods to be indexed under their return type instead of their actual name. For example: * `public String getName()` was indexed as "String" instead of "getName", and * `public ConfigBean getUtilityIngestionBean()`was indexed as "ConfigBean". This broke callers_of,callees_of, and children_of queries for any Java method with a non-void, non-generic return type. Adds a Java-specific branch in _get_name() that returns the first identifier child for method_declaration nodes, following the same pattern as the Go fix (field_identifier) from PR tirth8205#166. Kotlin & Scala is unaffected — its syntax places the name before the return type. (cherry picked from commit 9a88f20)

Java imports like `import com.example.auth.User` were stored as raw dot-notation strings because _do_resolve_module() had no Java branch. This caused `importers_of` queries to return 0 — the query looks for file path targets, but the stored edges had raw import strings. Adds a Java branch that converts dot-notation to a relative path (com/example/auth/User.java) and walks up from the caller's directory to find the source root. This resolves same-source-root imports (the common case in Maven modules). Also handles: - Static imports (import static pkg.Class.member) — strips the member name and resolves to the class file - Wildcard imports (import pkg.*) — skipped, can't resolve to one file - JDK/library imports (java.util.*) — remain unresolved (no local file) (cherry picked from commit 088c281)

(cherry picked from commit 112a442)

…alls (tirth8205#298) (cherry picked from commit d660e02)

Two pre-existing failures on main, both caused by the 'a3a043b' feature landing against the fastmcp 2.x private API and never being updated for the pinned fastmcp>=3: 1. _apply_tool_filter accessed mcp._tool_manager._tools which was removed when fastmcp rewrote the tool registry. Replaced with the public async mcp.list_tools() + mcp.local_provider.remove_tool(), with a thread-pool fallback for callers that already have a running event loop (tests, future integrations). 2. TestServeMainTransport.test_stdio_calls_mcp_run_stdio asserted exact kwargs {transport: stdio} but PR tirth8205#290 (cc169af) added show_banner=False to the production call without refreshing the test. Updated the assertion. 3. TestApplyToolFilter relied on mcp._tool_manager._tools for its snapshot/restore fixture and on await mcp.get_tools() for its assertions. Neither exists on fastmcp>=3. Rewrote the fixture to snapshot via await mcp.list_tools() + restore via mcp.add_tool(), and the assertions to use list_tools() directly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

npkriami18 mentioned this pull request Apr 21, 2026

feature: Add GitHub Copilot support (CLI and CHAT) ‼️‼️‼️ #129

Open

michael-denyer and others added 12 commits April 21, 2026 22:26

parser julia-specific helpers

0d76330

(cherry picked from commit f092922)

add parametric constructors

a525af5

(cherry picked from commit 425810b)

support for @enum and public

d6206e5

(cherry picked from commit 536fd4b)

feat: add GDScript (Godot) as a supported language (tirth8205#316)

3bd076e

(cherry picked from commit 112a442)

fix: PHP CALL extraction — method, static, and unqualified function c…

1e05d73

…alls (tirth8205#298) (cherry picked from commit d660e02)

npkriami18 force-pushed the fix/restore-all-parser-regressions branch from 6d3f4d4 to 7a3c915 Compare April 21, 2026 16:56

azizur100389 mentioned this pull request Apr 22, 2026

docs: add Multi-user / multi-workstation guide to README (closes #357) #373

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: restore 8 silently-reverted parser fixes + repair _apply_tool_filter for fastmcp>=3 (closes #361)#365

fix: restore 8 silently-reverted parser fixes + repair _apply_tool_filter for fastmcp>=3 (closes #361)#365
npkriami18 wants to merge 12 commits intotirth8205:mainfrom
npkriami18:fix/restore-all-parser-regressions

npkriami18 commented Apr 21, 2026

Uh oh!

azizur100389 commented Apr 21, 2026

Uh oh!

krmahadevan commented Apr 21, 2026

Uh oh!

npkriami18 commented Apr 21, 2026

Uh oh!

npkriami18 commented Apr 21, 2026

Uh oh!

krmahadevan commented Apr 21, 2026

Uh oh!

npkriami18 commented Apr 21, 2026

Uh oh!

npkriami18 commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Conversation

npkriami18 commented Apr 21, 2026

Result

Two distinct bug classes addressed

Class 1 — silently-reverted parser PRs (9 cherry-picks, original authorship preserved)

Class 2 — fastmcp>=3 API drift (1 commit)

Conflict resolution log (during cherry-picks)

Quality gate

Non-goals

Uh oh!

azizur100389 commented Apr 21, 2026

Uh oh!

krmahadevan commented Apr 21, 2026

Uh oh!

npkriami18 commented Apr 21, 2026

Uh oh!

npkriami18 commented Apr 21, 2026

Uh oh!

krmahadevan commented Apr 21, 2026

Uh oh!

npkriami18 commented Apr 21, 2026

Uh oh!

npkriami18 commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants