fix: restore dropped parser fixes from #274 / #276 / #285 (+ Julia impl from #319) by npkriami18 · Pull Request #362 · tirth8205/code-review-graph

npkriami18 · 2026-04-21T13:47:02Z

Fixes #361.

Context

Four already-merged fixes were silently reverted on main by two chore: resolve merge conflicts with main commits (3cf31ec, cb9d25a). Neither conflict emitted markers because the feature branches had not touched those regions at all — only the main side had them — so git merge produced a clean auto-resolution with the wrong side accepted. See #361 for the full root-cause analysis with blame evidence.

What this PR does

Cherry-picks the four commits back onto main, preserving authorship:

Commit	PR	Author	Tests restored
`fe383c7`	#285	@michael-denyer	7 module-scope CALLS tests
`e6e3144`	#276	@azizur100389	10 shebang-detection tests
`aa627fb`	#274	@azizur100389	1 Databricks CRLF test
`f092922` + `425810b` + `536fd4b`	#319	@danvinci	21 Julia parser tests

Conflict resolution notes:

aa627fb: only code_review_graph/parser.py (the one file on main that actually reverted); incremental.py stop_at changes and test scaffolding were still on main, so the no-op hunks dropped out. tests/test_main.py HEAD-side kept (preserves TestApplyToolFilter introduced by a3a043b).
incremental.py: HEAD side kept (retains SVN support added after fix: resolve all 8 Windows test failures — UTF-8, CRLF, FastMCP API, stop_at (#239, #241) #274).
f092922: trivial — kept both ReScript and Julia test-file patterns in _TEST_PATH_PATTERNS.

Test results

# Before (upstream/main @ b0f8527)
60 failed, 1015 passed, 1 skipped, 2 xpassed, 6 errors

# After (this branch)
16 failed, 1059 passed, 1 skipped, 2 xpassed, 6 errors

44 test failures fixed. Remaining 16 + 6 are out of scope:

6 TestApplyToolFilter errors — separate FastMCP._tool_manager private-API bug introduced by a3a043b.
1 test_stdio_calls_mcp_run_stdio — separate stdio-transport regression.
6 Java / 1 PHP / 9 GDScript — pre-existing regressions unrelated to any of the four PRs restored here.

Quality gate

✅ uv run ruff check code_review_graph/parser.py code_review_graph/incremental.py
✅ uv run mypy code_review_graph/parser.py code_review_graph/incremental.py --ignore-missing-imports --no-strict-optional

Non-goals

No changes to skills.py, cli.py, or any Copilot work (see feature: Add GitHub Copilot support (CLI and CHAT) ‼️‼️‼️ #129 for that).
No version bump, no CHANGELOG edit — release notes belong to the maintainer.
No touching of the _tool_manager or Java/PHP/GDScript regressions — those deserve their own PRs.

cc @tirth8205 — could you take a look? @azizur100389 @michael-denyer @danvinci — pinging as original authors in case I misread any of your original intent during the cherry-pick.

The parser gated CALLS edge emission on `enclosing_func` being set, so calls made from module scope (top-level script glue, CLI entrypoints, `if __name__ == "__main__"` blocks, and Jupyter/Databricks notebook cells) produced zero CALLS edges. Any function invoked only from those contexts was flagged as dead by `find_dead_code`, even when the function was the entire reason the script existed. Notebooks are particularly affected because every cell is module-scope by definition, so the existing notebook parser (PR tirth8205#69) emitted nodes and IMPORTS_FROM edges but no CALLS edges — making the dead-code detector's notebook coverage vacuous. Fix: when `enclosing_func` is None, attribute the CALLS edge to the File node instead of dropping it. Matches the existing convention used by `_extract_value_references` and CONTAINS edges. Applied to all 5 gated emission sites: generic Python/JS/TS path, JSX components, Elixir, Solidity `emit`, and R. Downstream: `detect_entry_points` now filters File-sourced CALLS via `get_all_call_targets(include_file_sources=False)` so script-only callees remain detectable as entry points (otherwise `run_job()` called from `script.py` module scope would look "called" by `script.py` and disappear from flow analysis). Verified end-to-end against a Databricks `.ipynb` that calls `Predict.extract_data_from_sample_ids()` from cell-level code: edge count went from 0 to 14 CALLS edges, and `find_dead_code` no longer flags the method. Tests: - `test_module_scope_calls_attributed_to_file` — bare `.py` script - `test_module_scope_calls_in_notebook` — `.ipynb` file - `test_detect_entry_points_module_scope_caller_is_still_root` — flow analysis treats File-sourced CALLS correctly - `test_module_scope_caller_prevents_dead_code_flag` — end-to-end parse → store → find_dead_code - `test_if_main_block_caller_prevents_dead_code_flag` — same for `__main__` block (cherry picked from commit fe383c7)

@tirth8205

…on-less scripts (tirth8205#276) Two parser improvements that expand code-review-graph's file coverage to extension-less Unix scripts and Korn shell files. Feature 1: .ksh extension → bash parser (tirth8205#235) ----------------------------------------------- Register .ksh (Korn shell) with tree-sitter-bash alongside the existing .sh / .bash / .zsh entries shipped in v2.3.0. Korn shell is close enough to bash syntactically that tree-sitter-bash handles the structural features the graph captures correctly. Context: in the close comment on PR tirth8205#230, @tirth8205 explicitly flagged this as worth adding: "The .ksh extension in particular looks worth adding — I didn't include it in tirth8205#227." Tests: test_detects_language extended with .ksh assertion; test_ksh_extension_parses_as_bash — end-to-end regression test that copies sample.sh to a temp .ksh file, parses it, and asserts identical function set and edge counts. Feature 2: shebang-based language detection (tirth8205#237) -------------------------------------------------- detect_language() was extension-only — any file with no extension returned None and was silently skipped. This misses a huge category of production files: git hooks, CI scripts, bin/ entry points, installers. New SHEBANG_INTERPRETER_TO_LANGUAGE table maps common interpreter basenames to languages already registered: bash/sh/zsh/ksh/dash/ash -> bash python/python2/python3/pypy/pypy3 -> python node/nodejs -> javascript ruby, perl, lua, Rscript, php New _detect_language_from_shebang(path) static method reads the first 256 bytes, handles direct form (#!/bin/bash), env indirection (#!/usr/bin/env bash), env -S flags, trailing flags (#!/bin/bash -e), CRLF, binary content, and strict UTF-8 decoding. detect_language() now falls back to the shebang probe for files with no extension (suffix == ""). Files with a known extension are never re-read — extension-based detection stays authoritative. Tests (16 new in test_parser.py): every interpreter mapping, env -S flag, trailing flags, missing shebang, empty file, binary content, unknown interpreter, extension-does-not-get-overridden, and end-to-end parse_file producing function nodes from an extension-less bash script. Files changed ------------- - code_review_graph/parser.py — .ksh mapping + SHEBANG_INTERPRETER_TO_LANGUAGE table + _detect_language_from_shebang() + detect_language() fallback - tests/test_multilang.py — .ksh detection + end-to-end ksh parsing test - tests/test_parser.py — 16 shebang detection tests (cherry picked from commit e6e3144)

…irth8205#274) Five root-cause fixes that together resolve every pre-existing Windows test failure on main. Each fix has targeted regression tests; the net effect is full green CI on Windows (8 failures -> 0). Bug 1: get_data_dir() writes non-UTF-8 .gitignore on Windows (tirth8205#239) -------------------------------------------------------------------- write_text() called without encoding="utf-8". The em-dash in the header is U+2014 which Python encodes as cp1252 byte 0x97 on Windows. Any later UTF-8 read fails with UnicodeDecodeError. Fix: add encoding="utf-8" (matches sibling _ensure_repo_gitignore). Test: test_auto_gitignore_is_valid_utf8 — asserts UTF-8 byte sequence, rejects cp1252 byte. Bug 2: Databricks notebook detection fails on CRLF line endings (tirth8205#239) ---------------------------------------------------------------------- source.startswith(b"# Databricks notebook source\n") hard-codes LF. Windows git checkout (core.autocrlf=true) produces CRLF. All Databricks handling silently bypassed — 4 tests fail. Fix: parse first line robustly, strip trailing \r before exact match. Tests: test_databricks_header_crlf_line_endings, test_databricks_header_lf_line_endings_still_work, test_databricks_header_prefix_false_positive_rejected. Bug 3: Stale FastMCP API in async regression guard (tirth8205#239) --------------------------------------------------------- test_heavy_tools_are_coroutines called mcp.get_tools() which does not exist in fastmcp>=2.14.0 (pinned in pyproject.toml). The guard has been silently broken since it was written — the protection promised by PR tirth8205#231 for tirth8205#46/tirth8205#136 was never actually enforced. Fix: resolve tools via getattr(crg_main, name) like the sibling test. Drop @pytest.mark.asyncio since no event loop is needed. Test: test_regression_guard_does_not_depend_on_fastmcp_internals — AST-walks the guard source to ensure no mcp internal API references. Bug 4-5: find_repo_root walks above test sandbox (tirth8205#241) ------------------------------------------------------- test_returns_none_without_git and test_falls_back_to_start fail on any machine where tmp_path has a git-initialized ancestor (dotfiles repo at ~/.git — very common on developer machines). Fix: add optional stop_at parameter to find_repo_root() and find_project_root(). When set, the walk examines stop_at for .git and then stops. Default is None (existing walk-to-root behavior). Fully backward-compatible — all 7 production callers unchanged. Tests: test_stop_at_prevents_escape_to_outer_git, test_stop_at_finds_git_at_boundary, test_stop_at_forwarded_to_find_repo_root. Files changed ------------- - code_review_graph/incremental.py — encoding fix + stop_at API - code_review_graph/parser.py — CRLF-tolerant Databricks detection - tests/test_incremental.py — gitignore UTF-8 guard + stop_at tests - tests/test_main.py — fixed async guard + meta-guard - tests/test_notebook.py — CRLF + LF + false-positive guards (cherry picked from commit aa627fb)

(cherry picked from commit f092922)

(cherry picked from commit 425810b)

(cherry picked from commit 536fd4b)

azizur100389 · 2026-04-21T13:54:47Z

Thanks for catching this @npkriami18 — and thanks for doing the blame-and-cherry-pick work to restore authorship properly. Two of the four reverted fixes are mine (#274 and #276); this PR picks them back up exactly as I wrote them. No intent lost in translation.

Verified from my end:

5d5248d (feat: .ksh extension + shebang-based language detection for extension-less scripts (#235, #237) #276: .ksh + shebang) — SHEBANG_INTERPRETER_TO_LANGUAGE table, _detect_language_from_shebang(), .ksh → bash entry, and 10 regression tests in tests/test_parser.py::TestCodeParser::test_detect_shebang_* + tests/test_multilang.py::TestBashParsing::test_ksh_extension_parses_as_bash — all intact.
1c02194 (fix: resolve all 8 Windows test failures — UTF-8, CRLF, FastMCP API, stop_at (#239, #241) #274: Windows CRLF/UTF-8/stop_at) — _extract_databricks_header CRLF-tolerant detection, encoding="utf-8" on the auto-generated .gitignore, stop_at parameter on find_repo_root / find_project_root, and the 15 regression tests — all intact.

Your conflict resolution notes match what I'd have done too:

Keeping the incremental.py HEAD side is correct — the SVN support added after fix: resolve all 8 Windows test failures — UTF-8, CRLF, FastMCP API, stop_at (#239, #241) #274 is additive and doesn't touch the stop_at branch or get_data_dir encoding branch. I re-read the diff; no interaction.
tests/test_main.py HEAD-side keeps TestApplyToolFilter, which is independent of my work. Good call.
The f092922 trivial merge to keep both ReScript and Julia patterns in _TEST_PATH_PATTERNS is fine.

Root cause is worth flagging for future super-PR resolution: when the base of a stacked feature branch predates a fix that lives in a completely unrelated region, git merge auto-resolves by taking the feature side and silently drops the fix. There are no conflict markers because the feature branch simply doesn't know the fix exists. The only defense is diffing main against the merge result on any file the super-PR touches — not just the hunks the super-PR modified. #361's writeup is the cleanest diagnosis of this class of bug I've seen on the repo.

Happy to approve if reviewers have the bit for it, or land my OK here and wait for @tirth8205. Zero objections from me — restore my two commits as-is.

Not addressed here and I agree they're out of scope:

_tool_manager private-API regression from a3a043b (that's the same ex-issue our fix: resolve all 8 Windows test failures — UTF-8, CRLF, FastMCP API, stop_at (#239, #241) #274 regression tests were meant to guard against, but the guard itself assumes a stable FastMCP surface — worth a separate small PR to make the guard tolerant of the new attribute name once it's decided).
Java / PHP / GDScript regressions — agreed, separate PRs per language.

Thanks again for the careful work.

npkriami18 · 2026-04-21T13:57:24Z

Closing in favour of a single unified restoration PR — combining all reverted parser fixes (#274, #275, #276, #278, #280, #285, #298, #316, #319) plus the _tool_manager FastMCP API drift and stdio banner test into one green PR.

michael-denyer and others added 6 commits April 21, 2026 19:12

parser julia-specific helpers

429d509

(cherry picked from commit f092922)

add parametric constructors

e19f96e

(cherry picked from commit 425810b)

support for @enum and public

a428a9c

(cherry picked from commit 536fd4b)

npkriami18 closed this Apr 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: restore dropped parser fixes from #274 / #276 / #285 (+ Julia impl from #319)#362

fix: restore dropped parser fixes from #274 / #276 / #285 (+ Julia impl from #319)#362
npkriami18 wants to merge 6 commits intotirth8205:mainfrom
npkriami18:fix/restore-reverted-parser-fixes

npkriami18 commented Apr 21, 2026

Uh oh!

azizur100389 commented Apr 21, 2026

Uh oh!

npkriami18 commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

npkriami18 commented Apr 21, 2026

Context

What this PR does

Test results

Quality gate

Non-goals

Uh oh!

azizur100389 commented Apr 21, 2026

Uh oh!

npkriami18 commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants