feat: add graphify dry-run command#157

Open
nuthalapativarun wants to merge 2 commits into safishamsi:v3 from nuthalapativarun:feat/dry-run-command
Conversation

@nuthalapativarun nuthalapativarun commented Apr 9, 2026

Summary

Adds a graphify dry-run [path] CLI command that scans the corpus and prints a file-count/health summary without writing any output files or building the graph.

This is a safe preview step — useful for validating what graphify sees before committing to a full extraction run that may consume LLM tokens.

Usage

$ graphify dry-run ./my-project
Corpus scan: /abs/path/my-project

  Code files          23
  Documents            7
  Total               30  (~84,200 words)

Corpus looks healthy — no warnings.

No files were written. Run without dry-run to build the graph.

With a large corpus:

warning: Large corpus: 312 files · ~620,000 words. Semantic extraction
will be expensive (many Claude tokens). Consider running on a subfolder,
or use --no-semantic to run AST-only.

Implementation

  • graphify/__main__.py — new elif cmd == "dry-run" branch + help text entry
  • Reuses detect.detect() entirely — no new detection logic
  • graphify-out/ is never created or touched
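A minimal sketch of what such a dry-run branch could look like, not the code in this PR. The real return shape of detect.detect() is not shown here, so `detect_stub`, its dictionary keys, and the extension sets are all assumptions made for illustration; only the printed layout follows the usage example above.

```python
import sys
from pathlib import Path


def detect_stub(root: Path) -> dict:
    """Hypothetical stand-in for graphify.detect.detect(); read-only scan."""
    code_ext = {".py", ".js", ".ts"}   # assumed extension sets
    doc_ext = {".md", ".txt"}
    code = docs = words = 0
    for p in sorted(root.rglob("*")):
        if not p.is_file():
            continue
        if p.suffix in code_ext:
            code += 1
        elif p.suffix in doc_ext:
            docs += 1
        else:
            continue
        # Rough word count: whitespace-separated tokens
        words += len(p.read_text(errors="ignore").split())
    return {"code": code, "documents": docs, "words": words}


def dry_run(argv: list[str]) -> int:
    """Print a corpus summary; never create or write any file."""
    root = Path(argv[0] if argv else ".").resolve()
    if not root.exists():
        print(f"error: path not found: {root}", file=sys.stderr)
        return 1
    result = detect_stub(root)
    total = result["code"] + result["documents"]
    print(f"Corpus scan: {root}\n")
    print(f"  {'Code files':<18}{result['code']:>4}")
    print(f"  {'Documents':<18}{result['documents']:>4}")
    print(f"  {'Total':<18}{total:>4}  (~{result['words']:,} words)")
    print("\nNo files were written. Run without dry-run to build the graph.")
    return 0


if __name__ == "__main__":
    sys.exit(dry_run(sys.argv[1:]))
```

Returning an exit code from `dry_run` rather than calling `sys.exit` inside it keeps the branch easy to test: a test can call it directly and assert on the return value and on the absence of graphify-out/.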

Test plan

  • test_dry_run_prints_summary — file-count table appears in output
  • test_dry_run_no_files_written: graphify-out/ is not created
  • test_dry_run_default_path — defaults to current directory when path omitted
  • test_dry_run_missing_path — exits non-zero for a missing path
  • test_dry_run_no_graphify_out_written — "No files were written" in output

graphify dry-run [path] scans the corpus with detect() and prints a
file-count table with corpus health warnings without writing any
output files or building the graph.
@nuthalapativarun
Author

Hey @safishamsi — just checking in on this one. Happy to rebase or make any adjustments if needed. Let me know!

@qodo-ai-reviewer

Hi, graphify dry-run calls graphify.detect.detect(), but detect() can create graphify-out/converted/*.md sidecar files when it encounters .docx/.xlsx files. This violates the dry-run promise and can cause unexpected filesystem writes during what is advertised as a no-write preview step.

Severity: action required | Category: correctness

How to fix: Make detect side-effect-free

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

graphify dry-run must not write any files, but it currently calls graphify.detect.detect() which may write office conversion sidecars into graphify-out/converted/.

Issue Context

  • graphify/__main__.py dry-run branch calls _detect(root) and prints “No files were written”.
  • graphify/detect.py converts .docx/.xlsx by writing markdown sidecars.

Fix Focus Areas

  • Add a dry_run/write_sidecars/convert_office boolean parameter to graphify.detect.detect() (default preserving current behavior).
  • Ensure that when the flag is disabled, detect() does not create directories or write any files (skip conversion, and optionally count words directly from the office file or as 0).
  • Call detect(..., write_sidecars=False) (or equivalent) from the dry-run CLI branch.
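The fix outlined in the bullets above can be sketched as follows. This is an illustration under stated assumptions, not the contents of graphify/detect.py: the `convert_office_file` signature, the sidecar naming, and the returned dictionary are hypothetical, and the flag name `write_sidecars` is one of the three options the review suggests.

```python
from pathlib import Path

OFFICE_EXT = {".docx", ".xlsx"}


def convert_office_file(src: Path, out_dir: Path) -> Path:
    """Hypothetical converter: writes a markdown sidecar and returns its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    sidecar = out_dir / (src.name + ".md")
    sidecar.write_text(f"# {src.name}\n(converted placeholder)\n")
    return sidecar


def detect(root: Path, *, write_sidecars: bool = True) -> dict:
    """Count office files; convert them only when write_sidecars is True.

    The default keeps the pre-existing behaviour (conversion enabled).
    """
    out_dir = root / "graphify-out" / "converted"
    result = {"office": 0, "sidecars": []}
    # sorted() materialises the listing before any sidecar is written
    for p in sorted(root.rglob("*")):
        if "graphify-out" in p.relative_to(root).parts:
            continue  # never rescan our own output directory
        if p.is_file() and p.suffix.lower() in OFFICE_EXT:
            result["office"] += 1
            if write_sidecars:
                result["sidecars"].append(convert_office_file(p, out_dir))
    return result
```

Because the directory is created inside the converter rather than at the top of `detect`, disabling the flag is enough to guarantee that no path under graphify-out/ is touched.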

References

  • graphify/__main__.py[794-823]
  • graphify/detect.py[347-376]
  • graphify/detect.py[187-213]

Found by Qodo code review

detect() now accepts write_sidecars=False; when disabled, office files
are counted directly without calling convert_office_file() or touching
graphify-out/converted/. The dry-run CLI branch passes this flag so the
no-write promise holds even for .docx/.xlsx corpora.

Adds test_dry_run_office_no_sidecar_written to assert convert_office_file
is never called during dry-run.
@nuthalapativarun
Author

Good catch @qodo-ai-reviewer — fixed in 0b3e6eb.

detect() now accepts a write_sidecars keyword argument. When it is set to False, office files (.docx/.xlsx) are counted directly without calling convert_office_file() or touching graphify-out/converted/. The dry-run CLI branch passes write_sidecars=False, so the no-write promise holds even for corpora containing office files.

Added test_dry_run_office_no_sidecar_written which mocks convert_office_file and asserts it is never called during a dry-run invocation.
