fix: disambiguate colliding node IDs from same-name files #441
V0v1kkk wants to merge 2 commits into safishamsi:v4
I'm not familiar with the codebase, so I asked a generic Copilot agent to review it.

Findings (ordered by severity):

- Critical: cross-file edge targets can remain stale after renaming
- High: unresolved-call edges may reference old caller IDs
- High: replacement ID can still collide and silently drop nodes

The main issue is that this does not take into account the full path after the current working directory when generating the ID. It would also be useful to add unit tests to cover these cases.
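Thanks for the thorough review @GustavoStingelin! All three issues were real bugs. I've pushed a fix:

1. Cross-file edge targets remaining stale: edge remap was scoped by `source_file`, so a target in another file would not be updated. Now uses a global rename map that checks all files for the old target ID.
2. `raw_calls` `caller_nid` not remapped after disambiguation: inferred cross-file edges could reference non-existent source IDs. Now remaps `caller_nid` before cross-file resolution.
3. Parent dir collision: now uses progressively deeper path components until a unique ID is found.

Also added 5 unit tests with fixture files covering all three scenarios.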
…r PyPI request

- PR safishamsi#441: addressed 3 bugs from review, added 5 unit tests
- tree-sitter-razor: added issue safishamsi#20 and PR safishamsi#21 for PyPI publishing
- PR safishamsi#19 (scanner.c fix): marked as merged

Made-with: Cursor
When two files share the same stem (e.g. Program.cs in BKD.Api and BooksKnowledgeDistillation), _make_id(stem, name) produces identical node IDs, causing nodes to merge incorrectly.

Detect ID collisions across different source files and disambiguate by prepending the parent directory name to the ID. This ensures each Program class gets its own node in the graph.

Made-with: Cursor
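A minimal sketch of the collision and the parent-directory fix described in this commit message. The `make_id` helper is a hypothetical stand-in for `_make_id(stem, name)` (its real normalization is not shown in this PR), and the node dictionaries with `id` and `source_file` keys are an assumed shape rather than the project's actual data model.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def make_id(*parts: str) -> str:
    """Hypothetical stand-in for _make_id: lowercase parts joined by '_'."""
    return "_".join(p.lower() for p in parts)


def disambiguate_by_parent(nodes: list[dict]) -> list[dict]:
    """Prefix the parent directory name onto IDs shared by different files."""
    # Group the source files that produced each node ID.
    files_by_id = defaultdict(set)
    for node in nodes:
        files_by_id[node["id"]].add(node["source_file"])

    result = []
    for node in nodes:
        # Only IDs that appear in more than one file are renamed.
        if len(files_by_id[node["id"]]) > 1:
            parent = PurePosixPath(node["source_file"]).parent.name
            node = {**node, "id": make_id(parent, node["id"])}
        result.append(node)
    return result


nodes = [
    {"id": make_id("Program", "Program"), "source_file": "src/App1/Program.cs"},
    {"id": make_id("Program", "Program"), "source_file": "src/App2/Program.cs"},
]
fixed = disambiguate_by_parent(nodes)
print([n["id"] for n in fixed])  # ['app1_program_program', 'app2_program_program']
```

Note that this version only looks one directory up, so it would still collide for two files whose parent directories share a name.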
Fixes three bugs found during review:

1. Cross-file edge targets could remain stale after renaming: edge remap was scoped by source_file, so a target in another file would not be updated. Now uses a global rename map that checks all files for the old target ID.
2. raw_calls caller_nid was not remapped after disambiguation, causing inferred cross-file edges to reference non-existent source IDs. Now remaps caller_nid before cross-file resolution.
3. Replacement IDs could still collide when two files share the same parent directory name (e.g. src/app/Program.cs and tests/app/Program.cs). Now uses progressively deeper path components until a unique ID is found.

Adds 5 unit tests covering:

- Same-stem files get unique IDs
- No duplicate IDs after disambiguation
- All edge sources reference valid nodes
- Same parent dir uses deeper path components
- Cross-file edges are properly remapped

Made-with: Cursor
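The three fixes above might be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function names, the `(source_file, old_id)` key of the rename map, the `callee` and `source_file` fields on the call record, and especially the tie-break for cross-file references (prefer the renamed file that shares the longest directory prefix with the referencing file) are all hypothetical.

```python
import posixpath
from pathlib import PurePosixPath


def unique_prefixes(paths):
    """Use ever more trailing directory names until every path's prefix is unique."""
    dirs = {p: PurePosixPath(p).parent.parts for p in paths}
    max_depth = max(len(d) for d in dirs.values())
    for depth in range(1, max_depth + 1):
        prefixes = {p: "_".join(d[-depth:]).lower() for p, d in dirs.items()}
        if len(set(prefixes.values())) == len(dirs):
            return prefixes
    raise ValueError("directory components cannot tell these paths apart")


def build_rename_map(nodes):
    """Return {(source_file, old_id): new_id} for IDs that occur in several files."""
    files_by_id = {}
    for node in nodes:
        files_by_id.setdefault(node["id"], set()).add(node["source_file"])
    renames = {}
    for old_id, files in files_by_id.items():
        if len(files) > 1:
            for path, prefix in unique_prefixes(sorted(files)).items():
                renames[(path, old_id)] = f"{prefix}_{old_id}"
    return renames


def remap(ref_id, ref_file, renames):
    """Resolve an ID referenced from ref_file against the global rename map."""
    if (ref_file, ref_id) in renames:  # renamed within the referencing file
        return renames[(ref_file, ref_id)]
    candidates = [(f, new) for (f, old), new in renames.items() if old == ref_id]
    if not candidates:
        return ref_id  # this ID was never renamed

    # Assumed tie-break: prefer the renamed file closest to the referencing file.
    def closeness(item):
        common = posixpath.commonpath([ref_file, item[0]])
        return len(PurePosixPath(common).parts)

    return max(candidates, key=closeness)[1]


nodes = [
    {"id": "program_main", "source_file": "src/app/Program.cs"},
    {"id": "program_main", "source_file": "tests/app/Program.cs"},
    {"id": "helper_run", "source_file": "src/app/Helper.cs"},
]
renames = build_rename_map(nodes)

# A cross-file edge recorded in Helper.cs that targets the colliding ID.
edge = {"source": "helper_run", "target": "program_main",
        "source_file": "src/app/Helper.cs"}
edge["target"] = remap(edge["target"], edge["source_file"], renames)

# An unresolved call whose caller was itself renamed.
call = {"caller_nid": "program_main", "callee": "Run",
        "source_file": "tests/app/Program.cs"}
call["caller_nid"] = remap(call["caller_nid"], call["source_file"], renames)

print(edge["target"])      # src_app_program_main
print(call["caller_nid"])  # tests_app_program_main
```

Keying the map by file as well as by old ID is what lets a same-file reference resolve exactly, while the fallback scan over all files keeps a cross-file target from being left pointing at an ID that no longer exists.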
af87c80 to 4eecade
Fixes #438
Summary
Adds a post-extraction pass that detects and resolves node ID collisions caused by files sharing the same stem (e.g. `Program.cs` in multiple directories).

`_make_id(stem, name)` uses only the filename stem, so `src/App1/Program.cs` and `src/App2/Program.cs` both produce `program_program`. This causes their methods and type nodes to silently merge into a single node.

The fix detects these collisions by grouping nodes by ID and checking if they come from different `source_file` paths. When a collision is found, all affected nodes are renamed by prepending the parent directory name (e.g. `app1_program_program` and `app2_program_program`).

Impact

Common in .NET solutions where each project has its own `Program.cs`, `Startup.cs`, `AssemblyInfo.cs`, etc. Also affects any multi-module project with naming conventions that produce duplicate file stems.

Test Plan

`Program.cs` files
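The kind of invariants such a test plan would check can be sketched as a small graph validator. This is an illustration only: `check_graph` and the `id` / `source` dictionary keys are assumed names, not the project's test helpers.

```python
from collections import Counter


def check_graph(nodes, edges):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []

    # Invariant 1: no two nodes share an ID after disambiguation.
    counts = Counter(node["id"] for node in nodes)
    for node_id, count in counts.items():
        if count > 1:
            problems.append(f"duplicate node id: {node_id} ({count} nodes)")

    # Invariant 2: every edge source refers to a node that exists.
    known = set(counts)
    for edge in edges:
        if edge["source"] not in known:
            problems.append(f"edge source references missing node: {edge['source']}")
    return problems


before = [
    {"id": "program_program", "source_file": "src/App1/Program.cs"},
    {"id": "program_program", "source_file": "src/App2/Program.cs"},
]
after = [
    {"id": "app1_program_program", "source_file": "src/App1/Program.cs"},
    {"id": "app2_program_program", "source_file": "src/App2/Program.cs"},
]
edges = [{"source": "app1_program_program", "target": "app1_program_main"}]

print(check_graph(before, edges))  # two problems: a duplicate ID and a dangling source
print(check_graph(after, edges))   # []
```

Only edge sources are checked here because targets may legitimately point outside the extracted files; whether that holds for this project is an assumption.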