feat: deterministic tested_by edges + dashboard badge (#113) by Lum1104 · Pull Request #122 · Lum1104/Understand-Anything

Lum1104 · 2026-05-06T14:25:10Z

Closes #113.

What's wrong

@Bulkmaker reported in #113 that test-coverage info on the graph is currently unusable for three reasons:

tested_by edge direction is inconsistent. The file-analyzer prompt says edges should be production → test, but production files don't import their tests, so the LLM only sees the relationship while analyzing a test file — and naturally emits test → production (inverted). Same project, same file types, mixed directions across batches.
Massive undercounting. On a real Nuxt 4 + Directus monorepo with ~140 unit + e2e tests, only 17 tested_by edges came through, so ~7% of production files looked tested while true coverage is far higher.
Dashboard treats tested_by like any other edge. Nothing visually distinguishes a tested file from an untested one.

Confirmed reproduction on Google's microservices-demo (real corpus, not Nuxt-specific): 7 LLM-emitted tested_by edges, 3 of them (43%) inverted (shippingservice_test.go → main.go, shippingservice_test.go → tracker.go, product_catalog_test.go → product_catalog.go). 0 production nodes tagged.

Fix

Two pieces:

1. Canonicalize `tested_by` direction in `merge-batch-graphs.py` (two-pass linker)

Path-based linker integrated as Step 5b between node dedup and edge dedup, with a refined two-pass design:

Pass 1 — preserve LLM evidence, fix direction.
The LLM's tested_by pairings are real (it sees import { CartService } from '../src/services/CartService' in the test file). What's wrong is the direction: the LLM emits the file it is analyzing, which is the test, as the source. So we walk every existing tested_by edge and:

canonical (production → test) → keep unchanged
inverted (test → production) → flip in place; description gets [direction corrected] audit marker
semantically broken (test ↔ test, prod ↔ prod, orphan endpoint, duplicate pair) → drop
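The three-way decision above can be sketched as a small classifier plus an in-place flip. This is a hypothetical illustration, not the PR's code: `classify_tested_by` and `swap_in_place` are made-up names (the PR's own helper is `_swap_tested_by_in_place`), the "prod"/"test" class strings are borrowed from the quoted merge-batch-graphs.py excerpt, and duplicate-pair handling is deliberately left out because it needs a weight comparison of its own.

```python
from typing import Any, Dict, Literal, Optional

Action = Literal["keep", "swap", "drop"]


def classify_tested_by(src_class: Optional[str], tgt_class: Optional[str]) -> Action:
    """Decide what Pass 1 does with one LLM-emitted tested_by edge.

    Each class is "prod" or "test" for a known file node, or None when the
    endpoint is unknown (orphan) or not a file node.
    """
    if (src_class, tgt_class) == ("prod", "test"):
        return "keep"  # already canonical: production -> test
    if (src_class, tgt_class) == ("test", "prod"):
        return "swap"  # inverted: emitted while the LLM analysed the test file
    return "drop"  # test<->test, prod<->prod, or an orphan endpoint


def swap_in_place(edge: Dict[str, Any]) -> None:
    """Flip an inverted edge and append an audit marker to its description."""
    edge["source"], edge["target"] = edge["target"], edge["source"]
    edge["direction"] = "forward"
    prev = edge.get("description") or ""
    edge["description"] = f"{prev} [direction corrected]".strip()
```

For the microservices-demo case, an edge emitted as `shippingservice_test.go → tracker.go` classifies as `"swap"` and comes out as `tracker.go → shippingservice_test.go`, with the marker appended to its description.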

Pass 2 — supplement via path conventions.
For tests Pass 1 didn't pair, walk candidate production paths from production_candidates(test_path) and emit a fresh production → test edge for the first match. Pairs already covered by Pass 1 are skipped.
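Pass 2 amounts to a first-match search per unpaired test. A minimal sketch, assuming plain path strings instead of `file:*` node IDs and an injected `candidates` callable standing in for `production_candidates`; `supplement_edges` and `paired_tests` are hypothetical names, and the description string is the one the PR's commits mention.

```python
from typing import Callable, Dict, Iterable, List, Set


def supplement_edges(
    test_paths: Iterable[str],
    prod_paths: Set[str],
    paired_tests: Set[str],
    candidates: Callable[[str], List[str]],
) -> List[Dict[str, str]]:
    """Pass 2: add production -> test edges for tests Pass 1 left unpaired."""
    new_edges: List[Dict[str, str]] = []
    for test in sorted(test_paths):  # sorted for deterministic output
        if test in paired_tests:
            continue  # Pass 1 already kept an LLM-backed edge for this test
        for cand in candidates(test):  # ordered: most likely production file first
            if cand in prod_paths:
                new_edges.append({
                    "source": cand,
                    "target": test,
                    "type": "tested_by",
                    "direction": "forward",
                    "description": "Path-based pairing (deterministic)",
                })
                break  # first existing candidate wins
    return new_edges
```

Sorting the tests keeps the emitted edge order stable across runs, which suits a step whose whole point is determinism.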

Path-convention coverage (production_candidates):

JS/TS: sibling de-infix (.test./.spec.); walk-out from __tests__/, test/, spec/, tests/ subdirs; mirrored tests/ ↔ {src,app,lib,""} tree
Go: sibling <name>_test.go ↔ <name>.go (multi-source-per-test cases handled by Pass 1 swap, not by path heuristic)
Python: sibling test_<name>.py / <name>_test.py; in-package <pkg>/tests/test_<name>.py walk-out (Django apps); top-level tests/ mirror
Java/Kotlin: Maven/Gradle src/test/<lang>/... ↔ src/main/<lang>/...; sibling fallback
C#: sibling fallback; <svc>/tests/X.cs ↔ <svc>/X.cs and <svc>/src/.../X.cs (microservices-demo cartservice layout); .NET sibling-project mirror <App>.Tests/X.cs ↔ <App>/X.cs
C/C++: sibling de-prefix/de-suffix
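A simplified take on the candidate generator for a few of the conventions listed above: JS/TS sibling de-infix plus walk-out, Go and Python siblings, and the Maven/Gradle mirror with a sibling fallback. It is a hypothetical reimplementation for illustration only. It borrows the `production_candidates` and `_add_unique` names from the PR but omits the mirrored `tests/` trees, the Python in-package walk-out, C#, and C/C++, and it assumes forward-slash relative paths.

```python
import posixpath
import re
from typing import List

# JS/TS infix: foo.test.ts, foo.spec.tsx, foo.test.mjs, ...
_JS_INFIX = re.compile(r"^(?P<stem>.+)\.(?:test|spec)(?P<ext>\.[cm]?[jt]sx?)$")
# Java/Kotlin suffix: FooTest.java, FooTests.kt, FooIT.java
_JVM_SUFFIX = re.compile(r"^(?P<stem>.+?)(?:Tests|Test|IT)(?P<ext>\.(?:java|kt))$")
_JS_TEST_DIRS = ("__tests__", "test", "spec", "tests")


def _add_unique(out: List[str], path: str) -> None:
    """Append path unless it is already a candidate (keeps priority order)."""
    if path not in out:
        out.append(path)


def production_candidates(test_path: str) -> List[str]:
    """Return production paths to try for a test file, most likely first."""
    out: List[str] = []
    directory, name = posixpath.split(test_path)

    m = _JS_INFIX.match(name)
    if m:  # JS/TS: sibling de-infix, then walk out of a test directory
        prod_name = m["stem"] + m["ext"]
        _add_unique(out, posixpath.join(directory, prod_name))
        parent, leaf = posixpath.split(directory)
        if leaf in _JS_TEST_DIRS:
            _add_unique(out, posixpath.join(parent, prod_name))
        return out

    if name.endswith("_test.go"):  # Go: <name>_test.go -> <name>.go
        _add_unique(out, posixpath.join(directory, name[: -len("_test.go")] + ".go"))
        return out

    if name.endswith(".py"):  # Python: test_<name>.py or <name>_test.py
        stem = name[: -len(".py")]
        if stem.startswith("test_"):
            _add_unique(out, posixpath.join(directory, stem[len("test_"):] + ".py"))
        elif stem.endswith("_test"):
            _add_unique(out, posixpath.join(directory, stem[: -len("_test")] + ".py"))
        return out

    m = _JVM_SUFFIX.match(name)
    if m:  # Maven/Gradle mirror first, then sibling fallback
        prod_name = m["stem"] + m["ext"]
        padded = f"/{directory}/"
        if "/src/test/" in padded:
            main_dir = padded.replace("/src/test/", "/src/main/", 1).strip("/")
            _add_unique(out, posixpath.join(main_dir, prod_name))
        _add_unique(out, posixpath.join(directory, prod_name))
    return out
```

Returning an ordered list rather than a single guess lets the caller try each path against the set of known file nodes and stop at the first one that exists.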

Tagging is consolidated into a final pass over all canonical edges, so production nodes get the "tested" tag whether the edge came from Pass 1 (canonical / swapped) or Pass 2 (supplement). tags is coerced to a fresh list when it arrives malformed (None / string / int / dict from raw LLM batch JSON), since the TypeScript autoFixGraph normalizer runs downstream of this script.
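A sketch of the consolidated tagging pass with the defensive `tags` coercion, assuming nodes are indexed by ID in a dict and edges carry `type`, `source`, and `target` keys. `tag_tested` and `_coerce_tags` are hypothetical names; the behaviour (replace any non-list `tags` with a fresh `[]` before the membership check) follows the description above.

```python
from typing import Any, Dict, List


def _coerce_tags(node: Dict[str, Any]) -> List[Any]:
    """Return node["tags"] as a list, replacing a missing or malformed value with []."""
    tags = node.get("tags")
    if not isinstance(tags, list):
        tags = []  # None, "a,b", "x", 5, {...}: none of these is safe to append to
        node["tags"] = tags
    return tags


def tag_tested(nodes_by_id: Dict[str, Dict[str, Any]], edges: List[Dict[str, Any]]) -> int:
    """Tag the production endpoint of every canonical tested_by edge as "tested".

    Runs once after both passes, so it does not matter whether an edge was kept,
    swapped, or supplemented. Returns how many nodes newly gained the tag.
    """
    tagged = 0
    for edge in edges:
        if edge.get("type") != "tested_by":
            continue
        prod = nodes_by_id.get(edge.get("source", ""))
        if prod is None:
            continue
        tags = _coerce_tags(prod)
        if "tested" not in tags:
            tags.append("tested")
            tagged += 1
    return tagged
```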

link_tests returns (added, dropped, tagged, swapped); the merge report distinguishes the four counters.

The file-analyzer prompt keeps the tested_by row in the schema table — we now actively use the LLM-emitted edges as Pass 1 evidence. The note explains direction will be auto-canonicalized so the LLM doesn't need to be defensive about it.

2. Dashboard "tested" badge

Small green dot (bg-node-function, 6×6, subtle 4px glow) next to the existing complexity badge in CustomNode.tsx. Renders only when data.tags?.includes("tested") — older graphs without the tag look identical to before.

GraphView.tsx had two CustomNodeData construction sites (the layer-detail topology builder and the buildCustomFlowNode helper) that did not pass `tags: node.tags`. Both now do. KnowledgeGraphView.tsx already passes tags; no change there.

Forward / backward compatibility

Pure additive, no shim needed:

| | Old graph + new dashboard | New graph + old dashboard |
|---|---|---|
| `tags` field | Already in schema; auto-fix sets `[]` if missing. Old nodes have `tags=[]`, badge code uses `?.includes(...)`, so it's a no-op. | Old dashboard ignores the extra string in the `tags` array. |
| `tested_by` edge type | Already in schema. Inverted edges from old graphs render in the wrong direction until you re-run `/understand`, at which point Pass 1 swaps them in place. | Same edge type, just canonical direction. Old dashboard renders fine. |
| `tested` tag visibility | Badge does not render. | The tag chip in NodeInfo / NodeTooltip already shows it as a gold pill, so nothing breaks. |

No schema changes. No new edge types. No new node types. No new dashboard state.

Files

 understand-anything-plugin/agents/file-analyzer.md                                |   3 +-
 understand-anything-plugin/packages/dashboard/src/components/CustomNode.tsx       |  16 +-
 understand-anything-plugin/packages/dashboard/src/components/GraphView.tsx        |   2 +
 understand-anything-plugin/skills/understand/SKILL.md                             |   2 +-
 understand-anything-plugin/skills/understand/merge-batch-graphs.py                | 500+++
 understand-anything-plugin/skills/understand/test_merge_batch_graphs.py           | 800+++ (new file)
 .gitignore                                                                        |   2 +

Test plan

cd understand-anything-plugin/skills/understand && python3 -m unittest test_merge_batch_graphs.py — 47/47 pass (16 production_candidates + 11 is_test_path + 19 link_tests end-to-end + 1 merge_and_normalize integration)
pnpm --filter @understand-anything/core test — 654/654 pass (untouched)
pnpm --filter @understand-anything/dashboard test — 42/42 pass (untouched)
pnpm --filter @understand-anything/dashboard build — clean
Real-world validation on Google microservices-demo:
- Before this PR: 7 tested_by edges, 3 inverted, 0 production nodes tagged
- After early commits (strip-and-rederive): 4 edges, 0 inverted, 4 tagged; dropped 3 LLM signals the path-convention pass couldn't recover (Go multi-source-per-test, the cartservice .NET tests/ layout)
- After swap-then-supplement (latest commit): 7 edges, 0 inverted, 7 tagged — full coverage signal preserved, all 3 inverted edges flipped in place
link_tests regression on the shippingservice case: one Go _test.go covering main.go + tracker.go + quote.go is preserved by Pass 1 swap (no path-convention pair exists; Pass 2 finds none, by design)
Codex P1 (malformed tags): verified non-crashing for all five cases (None, comma-separated string, single string, int, dict)
Manual end-to-end on a real project: rerun /understand, confirm dashboard renders the green dot on tested files only, edges all flow production → test

Out of scope (deliberately, per #113 "minimal valuable thing")

"Show only untested" filter / per-layer coverage % — issue called these optional follow-ons.
Rust (tests are usually inline #[cfg(test)], no separate file).
C/C++ project-style mirrors (project structure varies wildly).
Auto-flipping inverted tested_by edges in already-loaded old graphs — Pass 1 handles this on the next merge run instead.

Pre-existing, not from this branch

pnpm lint errors with eslint: command not found (no eslint installed at root, no eslint.config.*). Same on main. Out of scope.

🤖 Generated with Claude Code

The file-analyzer LLM only sees the production↔test relationship when analyzing a test file (production files don't import their tests), so its emitted direction was unreliable across batches and coverage was massively undercounted (~7% on a real Nuxt 4 + Directus repo). Move tested_by edge generation entirely into the merge step.

The linker:
- Strips every tested_by edge from batch input (LLM direction unreliable).
- Indexes file:* nodes and classifies each path as test or production.
- For each test, walks ordered candidate production paths (sibling de-infix, __tests__/ walk-out, mirrored tests/ → {src,app,lib,<root>} tree, Maven/Gradle src/test/... → src/main/...).
- Emits canonical production → test edges and tags production nodes "tested".

Supported conventions: JS/TS family (.test/.spec), Go (_test.go), Python (test_*.py, *_test.py), Java (*Test/*Tests/*IT.java), Kotlin (*Test/*Tests.kt), C# (*Test/*Tests.cs), C/C++ (test_*, *_test).

Stdlib only, type-hinted in existing style. Hooked into merge_and_normalize between node dedup (Step 5) and edge dedup (Step 6). Reports drops under "Fixed" and additions under a new "Tested-by linker" section.

Tests cover path classification, candidate generation, full link_tests behaviour (forward direction, idempotence, LLM-edge stripping, test-to-test rejection), and the merge integration. 31 cases, stdlib unittest, runnable with `python -m unittest test_merge_batch_graphs.py`.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Now that the merge script produces tested_by edges deterministically from path conventions, the LLM should not emit them: its direction is unreliable across batches and any emitted edges are stripped on merge.

- Remove tested_by row from file-analyzer's edge table.
- Add a note pointing to the deterministic linker.
- Document the new behaviour in the merge section of SKILL.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

- is_test_path: collapse 7 per-language conditional blocks into a data-driven _TEST_NAME_PATTERNS table; JS/TS infix stays inline.
- production_candidates: extract _join + module-level _add_unique to drop the nested closure and the repeated trailing-slash idiom.
- Drop dead _TEST_DIR_SEGMENTS constant and the local _splitext reimplementation; use os.path.splitext.
- link_tests: drop the impossible-malformed-tags guard, tighten the docstring, change edge description to "Path-based pairing (deterministic)", drop redundant break comment.
- Trim Step 5b inline block that duplicated the module-level header.
- Convert file-analyzer Note from blockquote to bold paragraph to match surrounding prompt style.

Tests: split the strip-edges test from the unrelated-edges-survive test, add empty-input and missing-filePath cases, pin sibling-before-walkup and sibling-before-mirror priority order, drop brittle report text assertion. 36 tests, all passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
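The data-driven classifier this commit describes might look roughly like the following. The regexes are a hypothetical reading of the conventions listed in the first commit (JS/TS `.test`/`.spec` infix kept inline, then one filename pattern per language family), classifying by basename only; the PR's actual `_TEST_NAME_PATTERNS` table may be structured differently.

```python
import posixpath
import re

# JS/TS stays inline: foo.test.ts, foo.spec.jsx, foo.test.mjs, ...
_JS_TEST_INFIX = re.compile(r"\.(?:test|spec)\.[cm]?[jt]sx?$")

# One filename regex per language family (hypothetical table).
_TEST_NAME_PATTERNS = (
    re.compile(r"_test\.go$"),                                # Go
    re.compile(r"^test_\w+\.py$|_test\.py$"),                 # Python
    re.compile(r"\w(?:Test|Tests|IT)\.java$"),                # Java
    re.compile(r"\w(?:Test|Tests)\.kt$"),                     # Kotlin
    re.compile(r"\w(?:Test|Tests)\.cs$"),                     # C#
    re.compile(r"^test_\w+\.(?:c|cc|cpp|cxx)$|\w_test\.(?:c|cc|cpp|cxx)$"),  # C/C++
)


def is_test_path(path: str) -> bool:
    """Classify a file path as a test by its basename alone."""
    name = posixpath.basename(path)
    if _JS_TEST_INFIX.search(name):
        return True
    return any(p.search(name) for p in _TEST_NAME_PATTERNS)
```

Adding a language then means appending one regex rather than another conditional block.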

Render a small green dot next to the complexity badge whenever a node's tags contain "tested" — surfacing the deterministic linker's signal so users can see at a glance which files have paired tests. Plumb node.tags through both CustomNodeData construction sites in GraphView.tsx; KnowledgeGraphView.tsx already passes tags. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c257a55f0


chatgpt-codex-connector · 2026-05-06T14:28:58Z

+            tags = prod_node.setdefault("tags", [])
+            if "tested" not in tags:
+                tags.append("tested")


Normalize tags type before adding "tested" tag

When a matched production file has malformed tags (for example null, a string, or any non-list value from an LLM batch), this block assumes list semantics and can raise (TypeError on membership or AttributeError on append), which aborts the whole merge. This regression is introduced by the new linker path and can break /understand on otherwise recoverable batch data; coerce non-list tags to [] (or another safe default) before checking/appending.


Codex flagged that prod_node.setdefault("tags", []) returns the existing value when the key is present, so a raw LLM batch with tags=None or tags="some string" would crash the whole merge: None raises TypeError on the "tested" not in tags membership check, and a string typically passes that (substring) check but then raises AttributeError on append. The TypeScript autoFixGraph normalizer that handles this case runs downstream of merge-batch-graphs.py, not before it, so the Python side has to defend itself. Coerce non-list tags to a fresh [] before the membership/append.

Regression test exercises None / comma-string / single-string / int / dict inputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Strip-and-rederive (current PR behaviour) drops real coverage signal on projects whose test layout doesn't match a naming convention. On the Google microservices-demo the LLM had emitted 7 valid tested_by edges (3 with inverted direction); the strip pass dropped them and the path-convention rederive could only re-pair 4 of them. Net: 7 → 4 edges, and 3 production files lost their tested signal.

Replace strip-and-rederive with two-pass swap-then-supplement:

Pass 1: walk LLM tested_by edges. Canonical (production → test) edges pass through unchanged. Inverted (test → production) edges are flipped in place; description gets a `[direction corrected]` audit marker. Edges with no recoverable meaning (test↔test, prod↔prod, orphan endpoint, duplicate pair) are dropped.

Pass 2: for tests not yet paired by Pass 1, walk path-convention candidates and emit a fresh production → test edge for the first match. Pairs already covered by Pass 1 are skipped.

Tagging is consolidated into a final pass over all canonical edges so production nodes get the "tested" tag whether the edge came from Pass 1 (canonical / swapped) or Pass 2 (supplement).

A multi-language audit of production_candidates, re-checking microservices-demo and common project layouts, surfaced three real-world gaps:
- JS/TS walk-out only handled `__tests__/`. Extended to also walk out of `<dir>/test/`, `<dir>/spec/`, and `<dir>/tests/` (some JS/TS projects use these instead of __tests__/).
- Python walk-out only handled top-level `tests/`. Added in-package `<pkg>/tests/test_<name>.py` → `<pkg>/<name>.py` (Django app style and any project that colocates tests with the package).
- C# only had sibling fallback. Added two new mirrors:
  * `<svc>/tests/X.cs` ↔ `<svc>/X.cs` and `<svc>/src/.../X.cs` (microservices-demo cartservice exact layout).
  * `<App>.Tests/Foo/BarTests.cs` ↔ `<App>/Foo/Bar.cs` (.NET sibling-project convention).

Go is intentionally not changed: the "one _test.go covers several .go files in the same package" pattern is now solved by Pass 1 (swapping LLM edges), not by trying to invent multi-pair path heuristics.

The file-analyzer prompt is updated: the `tested_by` row is restored in the schema table because we now use those edges as evidence (Pass 1 canonicalizes the direction). The note explains direction will be auto-corrected so the LLM doesn't need to be defensive about it.

link_tests now returns a 4-tuple (added, dropped, tagged, swapped); the merge_and_normalize report distinguishes "edges produced (supplement)" from "edges flipped" from "edges dropped".

Real-world validation on microservices-demo:
  before:   7 tested_by edges, 3 inverted, 0 tagged
  after PR: 4 tested_by edges, 0 inverted, 4 tagged  ← strip-and-rederive
  this:     7 tested_by edges, 0 inverted, 7 tagged  ← swap-then-supplement

Tests: 47 pass (was 37). New cases cover all swap branches, the shippingservice "one test, many sources" regression, and each new language pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Lum1104 · 2026-05-09T02:48:17Z

@codex review this

chatgpt-codex-connector

💡 Codex Review


Reviewed commit: 4bb22fd9af


chatgpt-codex-connector · 2026-05-09T02:52:01Z

+            if pair in covered:
+                # Duplicate canonical edge — drop the dup, keep the first.
+                dropped += 1
+                continue


Keep max-weight duplicate tested_by edge

When duplicate tested_by pairs appear, this branch drops later edges purely by arrival order (covered), so the stronger edge can be discarded before the generic Step 6 deduper (which normally keeps the highest weight) ever runs. In multi-batch runs where the same pair is emitted with different confidences, the resulting graph can retain a lower-confidence edge solely because its batch was processed first, which degrades ranking/visual confidence semantics for tested_by links.


Addressed in a4bdc1c. Pass 1 now mirrors Step 6's weight-aware dedup locally:

- pair_to_idx tracks each kept pair's slot in the compacted edges list.
- On a duplicate, compare weights and replace the slot only when the new edge is strictly heavier (a tie keeps the first arrival, matching Step 6).
- The swap operation is deferred until an edge is known to survive, so no work is spent canonicalizing a doomed duplicate.
- A swapped_pairs set replaces the raw swapped counter so the reported number reflects the final output (replacing a swapped edge with a heavier canonical one drops it from the count, and vice versa).

Five new unit tests cover all four weight × direction combinations.

Lum1104 · 2026-05-09T02:58:59Z

Code review

Found 1 issue:

link_tests Pass 1 drops duplicate (production, test) pairs by arrival order before Step 6's weight-based dedup runs. When two batches both emit a tested_by edge for the same pair with different confidences (e.g. 0.3 vs 0.9), whichever edge is iterated first wins, so the higher-weight one can be silently discarded. The general Step 6 deduper at merge-batch-graphs.py line 762 (_num(edge.get("weight", 0)) > _num(existing.get("weight", 0))) only sees one copy because the second was already discarded inside link_tests. The same bug affects both the canonical-dup branch (line 583) and the inverted-dup branch (line 592). Independently flagged by Codex in a review comment on this PR (#122). Suggested fix: in both `pair in covered` branches, look up the existing kept edge and replace it when the new edge has a higher weight (mirroring Step 6).

Understand-Anything/understand-anything-plugin/skills/understand/merge-batch-graphs.py

Lines 578 to 600 in 4bb22fd

    
        # Both endpoints must be known file nodes; one test, one production.
        # Anything else (orphan, test↔test, prod↔prod, non-file endpoint)
        # has no recoverable meaning — drop it.
        if (src_class, tgt_class) == ("prod", "test"):
            pair = (src, tgt)
            if pair in covered:
                # Duplicate canonical edge — drop the dup, keep the first.
                dropped += 1
                continue
            covered.add(pair)
            edges[write_idx] = edge
            write_idx += 1
        elif (src_class, tgt_class) == ("test", "prod"):
            pair = (tgt, src)
            if pair in covered:
                dropped += 1
                continue
            covered.add(pair)
            # Flip in place; mark provenance so reviewers can audit.
            edge["source"] = tgt
            edge["target"] = src
            edge["direction"] = "forward"
            prev = edge.get("description")

🤖 Generated with Claude Code


Codex P2: link_tests Pass 1 dropped duplicate (production, test) pairs purely by arrival order. When two batches both emitted a tested_by edge for the same pair with different confidences (0.3 vs 0.9), the edge that happened to iterate first won. The general Step 6 deduper at line 762 applies `weight > existing.weight` semantics, but it only ever saw one of the duplicates, so it couldn't rescue the heavier one.

Refactor Pass 1 to mirror Step 6's weight comparison locally:
- Track `pair_to_idx` mapping each kept (prod, test) pair to its slot in the compacted edges list. On a duplicate, look up the existing kept edge and compare weights; if the new edge is strictly heavier, swap (if needed) and replace the slot. Tie or lighter → drop the new edge.
- Defer the swap operation until we know an edge will survive; there is no point canonicalizing a doomed duplicate.
- Track surviving swap pairs in a separate `swapped_pairs` set so the `swapped` counter reflects the FINAL output, not the wasted work on edges that were later replaced. This means replacing a swapped edge with a heavier canonical one drops the swap from the count, and replacing a canonical edge with a heavier swapped one adds it.
- Extract the swap-in-place mutation into `_swap_tested_by_in_place` so it can be invoked from both code paths.

Five new unit tests cover all four weight-vs-direction combinations plus a tie case. The existing test_drops_duplicate_canonical_edges still passes (tie → keep first, no swap counted).

microservices-demo regression check unchanged: 7 → 7 edges, 3 swapped, 0 dropped, 7 tagged.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Lum1104 and others added 4 commits May 6, 2026 22:00

chatgpt-codex-connector Bot reviewed May 6, 2026


Lum1104 and others added 2 commits May 7, 2026 10:25

chatgpt-codex-connector Bot reviewed May 9, 2026


Lum1104 merged commit 3eb7700 into main May 9, 2026
1 check passed

Lum1104 deleted the feat/issue-113-tested-by-coverage branch May 9, 2026 03:18

Lum1104 merged 7 commits into main from feat/issue-113-tested-by-coverage
