fix: disambiguate colliding node IDs from same-name files #441

Open
V0v1kkk wants to merge 2 commits into safishamsi:v4 from V0v1kkk:fix/node-id-collisions

V0v1kkk commented Apr 18, 2026

Fixes #438

Summary

Adds a post-extraction pass that detects and resolves node ID collisions caused by files sharing the same stem (e.g. Program.cs in multiple directories).

_make_id(stem, name) uses only the filename stem, so src/App1/Program.cs and src/App2/Program.cs both produce program_program. This causes their methods and type nodes to silently merge into a single node.
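
A minimal illustration of the collision. The real `_make_id` is not shown in this thread, so `make_id_sketch` is a hypothetical stand-in that assumes the stem and name are joined, lowercased, and normalized:

```python
import re
from pathlib import Path

def make_id_sketch(stem: str, name: str) -> str:
    """Hypothetical stand-in for _make_id: join stem and name, lowercase, normalize."""
    raw = f"{stem}_{name}".lower()
    return re.sub(r"[^a-z0-9]+", "_", raw).strip("_")

paths = ["src/App1/Program.cs", "src/App2/Program.cs"]

# Only the filename stem reaches the ID, so the directory is lost.
ids = [make_id_sketch(Path(p).stem, "Program") for p in paths]
collides = ids[0] == ids[1]
```

Both paths yield the same ID here because nothing from the directory part of the path is included.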

The fix detects these collisions by grouping nodes by ID and checking if they come from different source_file paths. When a collision is found, all affected nodes are renamed by prepending the parent directory name (e.g. app1_program_program and app2_program_program).
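
The detect-and-rename pass could be sketched roughly as follows. The node shape (dicts with `id` and `source_file` keys) and the function name `disambiguate_by_parent` are assumptions for illustration, not the code in this PR:

```python
from collections import defaultdict
from pathlib import Path

def disambiguate_by_parent(nodes):
    """Rename nodes whose ID is shared by more than one source file.

    Assumes each node is a dict with "id" and "source_file" keys.
    Returns new dicts; the input nodes are not mutated.
    """
    # Group source files by node ID to find IDs claimed by several files.
    files_by_id = defaultdict(set)
    for node in nodes:
        files_by_id[node["id"]].add(node["source_file"])
    colliding = {nid for nid, files in files_by_id.items() if len(files) > 1}

    renamed = []
    for node in nodes:
        if node["id"] in colliding:
            # Prepend the lowercased parent directory name to the old ID.
            parent = Path(node["source_file"]).parent.name.lower()
            node = {**node, "id": f"{parent}_{node['id']}"}
        renamed.append(node)
    return renamed

nodes = [
    {"id": "program_program", "source_file": "src/App1/Program.cs"},
    {"id": "program_program", "source_file": "src/App2/Program.cs"},
    {"id": "utils_helper", "source_file": "src/App1/Utils.cs"},
]
new_ids = [n["id"] for n in disambiguate_by_parent(nodes)]
```

Nodes whose ID appears in only one file keep their original ID in this sketch.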

Impact

Common in .NET solutions where each project has its own Program.cs, Startup.cs, AssemblyInfo.cs, etc. Also affects any multi-module project with naming conventions that produce duplicate file stems.

Test Plan

  • Verified on a .NET solution with 2 Program.cs files
  • Nodes correctly separated after fix
  • Edges re-pointed to new node IDs
  • Existing tests pass

@GustavoStingelin

I'm not familiar with the codebase, so I asked a generic Copilot agent to review it:

Findings (ordered by severity):

Critical: cross-file edge targets can remain stale after renaming
In the rename pass, edge remapping is keyed only by the edge's source_file and then applied to both the source and target IDs (extract.py:3225).
But cross-file uses edges are created with source_file set to the importing file, while the target points to a node in another file (extract.py:2651).
Result: when a target node from file B is renamed, an edge originating in file A may not update its target, leaving dangling/mismatched IDs.
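
The failure mode can be reproduced on toy data. The edge and rename-map shapes below are assumptions, and `remap_scoped_by_file` only mimics the behavior described in this finding; it is not the code from extract.py:

```python
def remap_scoped_by_file(edges, renames_by_file):
    """Remap edge endpoints using only the renames of the edge's own source_file."""
    out = []
    for edge in edges:
        # Renames recorded for other files are never consulted here.
        local = renames_by_file.get(edge["source_file"], {})
        out.append({
            **edge,
            "source": local.get(edge["source"], edge["source"]),
            "target": local.get(edge["target"], edge["target"]),
        })
    return out

# After disambiguation, only the renamed ID exists for the target node.
node_ids = {"host_host", "app2_program_helper"}
renames_by_file = {
    "src/App2/Program.cs": {"program_helper": "app2_program_helper"},
}
edges = [{
    "source": "host_host",
    "target": "program_helper",         # node defined in src/App2/Program.cs
    "source_file": "src/Host/Host.cs",  # the importing file
    "relation": "uses",
}]

remapped = remap_scoped_by_file(edges, renames_by_file)
stale_targets = [e["target"] for e in remapped if e["target"] not in node_ids]
```

The edge keeps the pre-rename target because the lookup never leaves the importing file's rename table.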

High: unresolved-call edges may reference old caller IDs
Raw calls store caller_nid before the collision-disambiguation step (extract.py:1071, extract.py:1072).
Colliding nodes are later renamed (extract.py:3203).
During cross-file raw call resolution, the caller is read directly from raw_calls with no remap (extract.py:3258, extract.py:3262).
Result: inferred calls can be emitted from non-existent source IDs after rename.
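
One way to close this gap is to rewrite caller_nid before resolution runs. This is a sketch under assumptions: the raw call shape, the `renames` map keyed by `(source_file, old_id)`, and the premise that the caller lives in the raw call's own source_file are all hypothetical:

```python
def remap_callers(raw_calls, renames):
    """Return raw calls with caller_nid rewritten to its post-rename ID.

    `renames` maps (source_file, old_id) -> new_id. The lookup uses the raw
    call's own source_file, assuming that is where the caller is defined.
    """
    remapped = []
    for call in raw_calls:
        key = (call["source_file"], call["caller_nid"])
        # Callers that were never renamed fall through unchanged.
        remapped.append({**call, "caller_nid": renames.get(key, call["caller_nid"])})
    return remapped

renames = {("src/App1/Program.cs", "program_main"): "app1_program_main"}
raw_calls = [
    {"caller_nid": "program_main", "callee": "Run", "source_file": "src/App1/Program.cs"},
    {"caller_nid": "utils_helper", "callee": "Log", "source_file": "src/App1/Utils.cs"},
]
callers = [c["caller_nid"] for c in remap_callers(raw_calls, renames)]
```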

High: replacement ID can still collide and silently drop nodes
New IDs are built from only the parent directory name + old ID (extract.py:3200, extract.py:3201).
If two colliding files have the same parent directory name in different paths (for example, src/app/Program.cs and tests/app/Program.cs), both compute the same replacement ID, and one is dropped by seen_final_ids (extract.py:3202, extract.py:3206).
This reintroduces merge/loss behavior in a common monorepo layout.

The root cause is that ID generation ignores the file's full path relative to the current working directory.

It would also be useful to add unit tests to cover these cases.

V0v1kkk commented Apr 27, 2026

Thanks for the thorough review @GustavoStingelin! All three issues were real bugs. I've pushed a fix:

1. Cross-file edge targets remaining stale — Edge remap was scoped by source_file, so a target node in another file wouldn't be updated. Fixed by using a global old_id → new_id map that checks all files for the old target ID.

2. raw_calls caller_nid not remapped — Now remaps caller_nid in raw_calls before cross-file resolution runs, so inferred edges use the correct post-rename IDs.

3. Parent dir collision — Now uses progressively deeper path components until a unique ID is found. So src/app/Program.cs → src_app_program_program and tests/app/Program.cs → tests_app_program_program.
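
A rough sketch of the "progressively deeper path components" idea from item 3. The helper name `unique_prefixes` is hypothetical, and using the same depth for every file in the colliding group is an assumption; the pushed fix may choose depths differently:

```python
from pathlib import PurePosixPath

def unique_prefixes(paths):
    """Map each path to a prefix built from its trailing directory names.

    Starts with the immediate parent directory and adds one more ancestor
    per round until all prefixes differ (or no ancestors remain).
    """
    dirs = {p: [part.lower() for part in PurePosixPath(p).parent.parts] for p in paths}
    max_depth = max((len(d) for d in dirs.values()), default=0)
    prefixes = {p: "" for p in paths}
    for depth in range(1, max_depth + 1):
        prefixes = {p: "_".join(d[-depth:]) for p, d in dirs.items()}
        if len(set(prefixes.values())) == len(prefixes):
            break  # every colliding file now has a distinct prefix
    return prefixes

paths = ["src/app/Program.cs", "tests/app/Program.cs"]
prefixes = unique_prefixes(paths)
new_ids = [f"{prefixes[p]}_program_program" for p in paths]
```

Directory names containing characters such as dots would still need the same normalization the ID scheme already applies; that step is left out here.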

Also added 5 unit tests with fixture files covering all three scenarios:

  • Same-stem files get unique IDs
  • No duplicate IDs after disambiguation
  • All edge sources reference valid nodes
  • Same parent dir uses deeper path components
  • Cross-file edges are properly remapped
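
The invariants these tests target can be expressed as a small checker. This is an illustrative sketch on assumed node and edge shapes, not one of the 5 tests added in the PR; `graph_problems` is a hypothetical helper:

```python
from collections import Counter

def graph_problems(nodes, edges):
    """Return a list of problems: duplicate node IDs and dangling edge endpoints."""
    counts = Counter(n["id"] for n in nodes)
    problems = [f"duplicate node id: {nid}" for nid, c in counts.items() if c > 1]
    for edge in edges:
        for end in ("source", "target"):
            if edge[end] not in counts:
                problems.append(f"dangling {end}: {edge[end]}")
    return problems

nodes = [
    {"id": "src_app_program_program", "source_file": "src/app/Program.cs"},
    {"id": "tests_app_program_program", "source_file": "tests/app/Program.cs"},
]
good_edges = [{"source": "tests_app_program_program", "target": "src_app_program_program"}]
bad_edges = [{"source": "tests_app_program_program", "target": "program_program"}]

ok = graph_problems(nodes, good_edges)
broken = graph_problems(nodes, bad_edges)
```

An empty list means every ID is unique and every edge endpoint resolves to an existing node.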

V0v1kkk added a commit to V0v1kkk/graphify that referenced this pull request Apr 27, 2026
…r PyPI request

- PR safishamsi#441: addressed 3 bugs from review, added 5 unit tests
- tree-sitter-razor: added issue safishamsi#20 and PR safishamsi#21 for PyPI publishing
- PR safishamsi#19 (scanner.c fix): marked as merged

Made-with: Cursor
V0v1kkk added 2 commits April 27, 2026 09:57
When two files share the same stem (e.g. Program.cs in BKD.Api and
BooksKnowledgeDistillation), _make_id(stem, name) produces identical
node IDs, causing nodes to merge incorrectly.

Detect ID collisions across different source files and disambiguate
by prepending the parent directory name to the ID. This ensures each
Program class gets its own node in the graph.

Made-with: Cursor

Fixes three bugs found during review:

1. Cross-file edge targets could remain stale after renaming: edge remap
   was scoped by source_file, so a target in another file would not be
   updated. Now uses a global rename map that checks all files for the
   old target ID.

2. raw_calls caller_nid was not remapped after disambiguation, causing
   inferred cross-file edges to reference non-existent source IDs. Now
   remaps caller_nid before cross-file resolution.

3. Replacement IDs could still collide when two files share the same
   parent directory name (e.g. src/app/Program.cs and tests/app/Program.cs).
   Now uses progressively deeper path components until a unique ID is found.

Adds 5 unit tests covering:
- Same-stem files get unique IDs
- No duplicate IDs after disambiguation
- All edge sources reference valid nodes
- Same parent dir uses deeper path components
- Cross-file edges are properly remapped

Made-with: Cursor
V0v1kkk force-pushed the fix/node-id-collisions branch from af87c80 to 4eecade April 27, 2026 07:57

Development

Successfully merging this pull request may close these issues.

Node ID collisions when multiple files share the same stem (e.g. Program.cs)
