
Making the CLI args for train and inference consistent #57

Open
vratins wants to merge 9 commits into main from dev_prs

Conversation

@vratins
Contributor

@vratins vratins commented Mar 19, 2026

For issue #50

Summary by CodeRabbit

  • New Features
    • Added options to configure logging output (log level and log file).
  • Chores
    • Renamed CLI options for clarity (geometry cache, self-conditioning, RK4 steps).
  • Bug Fixes
    • Improved handling of geometry cache resolution and metrics output to prefer the new cache name and record renamed fields.
    • Tightened metrics handling and numeric accumulation to avoid incorrect aggregates.

@vratins vratins linked an issue Mar 19, 2026 that may be closed by this pull request
@stephaniewankowicz stephaniewankowicz marked this pull request as draft March 19, 2026 23:57
@stephaniewankowicz stephaniewankowicz marked this pull request as ready for review March 19, 2026 23:58
@vratins vratins requested a review from Copilot March 20, 2026 00:23

Copilot AI left a comment


Pull request overview

This PR addresses issue #50 by aligning scripts/train.py and scripts/inference.py command-line flags and logging configuration so the two entrypoints use consistent naming and offer the same logging controls.

Changes:

  • Renamed --rk4_steps to --num_steps in scripts/train.py and updated the corresponding usage sites.
  • Updated scripts/inference.py CLI flags to match training (--geometry_cache_name, --use_self_cond) and added --log_level / --log_file, removing the early top-level logging setup in favor of configuring logging in main() (sketched after this list).
  • Added several # ty: ignore[...] suppressions around model/dataset construction and metric handling.
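
A minimal sketch of that logging wiring, using plain logging.basicConfig as a stand-in for the PR's setup_logging_for_tqdm() helper (whose real signature is not shown in this thread):

    import argparse
    import logging

    def parse_args() -> argparse.Namespace:
        p = argparse.ArgumentParser()
        p.add_argument("--log_level", type=str, default="INFO")
        p.add_argument("--log_file", type=str, default=None)
        return p.parse_args()

    def main() -> None:
        args = parse_args()
        # Configured here rather than at module import time,
        # so the CLI flags actually take effect.
        handlers: list[logging.Handler] = [logging.StreamHandler()]
        if args.log_file is not None:
            handlers.append(logging.FileHandler(args.log_file))
        logging.basicConfig(
            level=getattr(logging, args.log_level.upper()),
            handlers=handlers,
        )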

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File summaries:
  • scripts/train.py: Renames the integration-steps flag and updates eval sampling usage; adds ty ignore suppressions around model construction and training-loop typing.
  • scripts/inference.py: Renames CLI flags for consistency, adds logging CLI options, and moves logging setup into main(); adds ty ignore suppressions and updates self-conditioning plumbing.
Comments suppressed due to low confidence (1)

scripts/inference.py:604

  • The metrics.json config block now stores values under keys like "use_sc" and "geometry_cache", while the CLI flags are --use_self_cond and --geometry_cache_name. This makes the output config inconsistent with the CLI and can confuse downstream consumers. Consider renaming these keys, or writing both old and new keys for backward compatibility, to match the new flag names (a sketch follows the snippet below).
                        "checkpoint": args.checkpoint,
                        "method": args.method,
                        "num_steps": args.num_steps,
                        "use_sc": args.use_self_cond,
                        "threshold": args.threshold,
                        "include_mates": include_mates,
                        "water_ratio": args.water_ratio,
                        "geometry_cache": geometry_cache_name,
                    },
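
One hedged way to act on this comment is to write both the renamed and the legacy keys; a sketch, with the enclosing "config" key name assumed from "config block" above:

                    "config": {
                        "checkpoint": args.checkpoint,
                        "method": args.method,
                        "num_steps": args.num_steps,
                        # New keys matching the renamed CLI flags.
                        "use_self_cond": args.use_self_cond,
                        "geometry_cache_name": geometry_cache_name,
                        # Legacy keys kept for older downstream consumers.
                        "use_sc": args.use_self_cond,
                        "geometry_cache": geometry_cache_name,
                        "threshold": args.threshold,
                        "include_mates": include_mates,
                        "water_ratio": args.water_ratio,
                    },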


@vratins
Contributor Author

vratins commented Mar 20, 2026

Will update this PR and remove the ty ignore comments once #43 and #38 are merged

@coderabbitai

coderabbitai Bot commented Mar 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a2f9271a-ff1e-4436-9e59-95eaffa2fd01

📥 Commits

Reviewing files that changed from the base of the PR and between 8bfa060 and ab2c538.

📒 Files selected for processing (1)
  • scripts/inference.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/inference.py

📝 Walkthrough

Walkthrough

CLI flags renamed and logging initialization moved into inference.py's main; added log options; type annotation tightened; geometry cache resolution and metric keys updated; train.py RK4 flag renamed and metrics accumulation/conditioning logic hardened. (48 words)

Changes

  • Inference CLI & behavior (scripts/inference.py): Renamed flags --geometry_cache → --geometry_cache_name and --use_sc → --use_self_cond; updated downstream usage and metric JSON keys. Geometry-cache resolution now prefers args.geometry_cache_name, then config["geometry_cache_name"], then the legacy config["geometry_cache"].
  • Inference logging init & options (scripts/inference.py): Moved setup_logging_for_tqdm() from module level into main(), after args = parse_args(); added CLI args --log_level (default "INFO") and --log_file (default None) and wired them into logging initialization.
  • Inference type annotation (scripts/inference.py): Updated the run_inference_batch signature from water_ratio: float = None to water_ratio: float | None = None (sketched below).
  • Train CLI & evaluation (scripts/train.py): Renamed CLI arg --rk4_steps → --num_steps and updated evaluation/sampling calls to pass num_steps=args.num_steps.
  • Train metrics & accumulation (scripts/train.py): Tightened train_epoch() handling of metrics["per_sample_info"] by introducing a local per_sample_info and guarding dict access; added explicit cast(float, ...) when accumulating total_loss and total_rmsd (also sketched below).
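
For concreteness, a minimal sketch of those two typing changes; the names run_inference_batch, total_loss, and metrics come from the walkthrough, while the bodies and the "loss" key are placeholders:

    from typing import cast

    def run_inference_batch(water_ratio: float | None = None) -> None:
        # was: water_ratio: float = None, which type checkers reject
        # because None is not a float
        ...

    def accumulate_loss(per_batch_metrics: list[dict[str, object]]) -> float:
        total_loss = 0.0
        for metrics in per_batch_metrics:
            # cast() narrows the dict value to float so the checker
            # accepts the numeric accumulation
            total_loss += cast(float, metrics["loss"])
        return total_loss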

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I renamed flags and tuned the logs just right,
Hopped types into unions, kept metrics tight,
Geometry caches now find their way,
Steps renamed for training’s day,
A tiny rabbit cheers the tidy flight.

🚥 Pre-merge checks | ✅ 3 passed
  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled).
  • Title Check: ✅ Passed. The title accurately reflects the main objective of the PR: making CLI arguments consistent between train.py and inference.py through renames and refactoring.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 94.87%, above the required threshold of 80.00%.

Copilot AI review requested due to automatic review settings March 23, 2026 16:55

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
src/utils.py (1)

187-193: Use the shared RBF defaults here too.

This helper still hardcodes 16 and 8.0, so callers that rely on its defaults will drift the next time NUM_RBF or RBF_CUTOFF changes.

♻️ Suggested change
 def compute_edge_features(
     pos: Tensor,
     edge_index: Tensor,
     pos_dst: Tensor | None = None,
-    num_gaussians: int = 16,
-    cutoff: float = 8.0,
+    num_gaussians: int = NUM_RBF,
+    cutoff: float = RBF_CUTOFF,
     clamp_min: float = 1e-5,
 ) -> tuple[Tensor, Tensor]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils.py` around lines 187 - 193, compute_edge_features currently
hardcodes num_gaussians=16 and cutoff=8.0; change its signature to use the
shared defaults NUM_RBF and RBF_CUTOFF (e.g. num_gaussians: int = NUM_RBF,
cutoff: float = RBF_CUTOFF) and add the necessary import of NUM_RBF and
RBF_CUTOFF into this module so callers follow the global RBF defaults; keep
clamp_min as-is.
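
The import side of that suggestion might look like this; that NUM_RBF and RBF_CUTOFF live in src/constants.py is an assumption based on the files listed in this review:

    # src/constants.py (assumed home of the shared defaults)
    NUM_RBF: int = 16
    RBF_CUTOFF: float = 8.0

    # src/utils.py
    from src.constants import NUM_RBF, RBF_CUTOFF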
src/dataset.py (1)

1115-1127: Consider referencing NUM_RBF constant in docstring.

The docstring hardcodes edge_rbf: (E, 16) but this dimension is defined by NUM_RBF constant. Consider documenting as edge_rbf: (E, NUM_RBF) for consistency, especially if this value might change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dataset.py` around lines 1115 - 1127, The docstring for
Dataset.__getitem__ documents edge_rbf as "(E, 16)" but that dimension is
defined by the NUM_RBF constant; update the docstring to reference NUM_RBF
(e.g., "edge_rbf: (E, NUM_RBF)") so the HeteroData description stays consistent
with the actual feature size used by the code (refer to __getitem__, NUM_RBF,
and the 'edge_rbf' field).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/inference.py`:
- Around line 428-431: The current logic for determining geometry_cache_name
only checks args.geometry_cache_name and config.get("geometry_cache_name"),
which ignores older saved runs using the legacy "geometry_cache" key; update the
resolution in scripts/inference.py so that geometry_cache_name =
args.geometry_cache_name or config.get("geometry_cache_name") or
config.get("geometry_cache") or "geometry" (i.e., fall back to the legacy
"geometry_cache" key before defaulting to "geometry"), and ensure any references
to geometry_cache_name (e.g., where it's used to locate cache directories)
continue to use the resolved value.

In `@scripts/train.py`:
- Around line 666-670: The call to flow_matcher.training_step uses a
now-mismatched signature: FlowMatcher.training_step(batch, optimizer,
grad_clip=1.0, use_self_conditioning=True). Remove the unsupported
accumulation_steps kwarg and pass the optimizer instance (e.g., the same
optimizer used for training) as the second positional argument; keep or pass
grad_clip and use_self_conditioning explicitly if needed (use args.use_self_cond
and args.grad_clip). If you still need gradient accumulation, implement
accumulation outside training_step by looping/accumulating gradients across
args.grad_accum_steps and only calling optimizer.step()/optimizer.zero_grad()
after that loop rather than relying on an accumulation_steps parameter.

In `@src/dataset.py`:
- Around line 662-669: Move the encoder_type validation to run before any
preprocessing to avoid wasted work: in src/dataset.py, check self.encoder_type
against {"gvp","slae","esm"} and raise the ValueError prior to calling
self._preprocess_all(); specifically, reorder the block so the validation of
encoder_type occurs before the call to self._preprocess_all(), keeping the same
error message and allowed set. (A sketch of this reordering follows these prompt blocks.)

In `@tests/test_dataset.py`:
- Around line 49-88: The fixtures pdb_base_dir, pdb_6eey, pdb_2b5w, pdb_8dzt,
and pdb_1deu in tests/test_dataset.py shadow the portable fixtures defined in
tests/conftest.py (which use ENV_PDB_DIR / tests/test_files); remove these
duplicate fixture definitions from tests/test_dataset.py so the module uses the
conftest-provided fixtures instead (or if special behavior is required, rename
them to avoid shadowing and/or delegate to the conftest fixtures inside the new
fixture implementations such as calling the conftest pdb_base_dir fixture).

---

Nitpick comments:
In `@src/dataset.py`:
- Around line 1115-1127: The docstring for Dataset.__getitem__ documents
edge_rbf as "(E, 16)" but that dimension is defined by the NUM_RBF constant;
update the docstring to reference NUM_RBF (e.g., "edge_rbf: (E, NUM_RBF)") so
the HeteroData description stays consistent with the actual feature size used by
the code (refer to __getitem__, NUM_RBF, and the 'edge_rbf' field).

In `@src/utils.py`:
- Around line 187-193: compute_edge_features currently hardcodes
num_gaussians=16 and cutoff=8.0; change its signature to use the shared defaults
NUM_RBF and RBF_CUTOFF (e.g. num_gaussians: int = NUM_RBF, cutoff: float =
RBF_CUTOFF) and add the necessary import of NUM_RBF and RBF_CUTOFF into this
module so callers follow the global RBF defaults; keep clamp_min as-is.
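
As referenced above, a sketch of the encoder_type reordering: validate the cheap precondition before any preprocessing runs. The class name and __init__ shape here are hypothetical:

    class WaterDataset:
        def __init__(self, encoder_type: str) -> None:
            # Fail fast on an invalid encoder_type before doing any work.
            if encoder_type not in {"gvp", "slae", "esm"}:
                raise ValueError(
                    f"encoder_type must be one of 'gvp', 'slae', 'esm', "
                    f"got {encoder_type!r}"
                )
            self.encoder_type = encoder_type
            self._preprocess_all()

        def _preprocess_all(self) -> None:
            ...  # expensive preprocessing, now only reached with valid input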

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b8c4a5d4-291a-4ea1-953b-8adacf86c6bc

📥 Commits

Reviewing files that changed from the base of the PR and between ea8ba8b and 5936135.

📒 Files selected for processing (10)
  • .github/workflows/build.yml
  • scripts/inference.py
  • scripts/train.py
  • src/constants.py
  • src/dataset.py
  • src/utils.py
  • tests/conftest.py
  • tests/test_dataset.py
  • tests/test_embedding_generation.py
  • tests/test_utils.py
💤 Files with no reviewable changes (1)
  • tests/test_embedding_generation.py

Comment thread scripts/train.py
Comment on lines 666 to 670
 metrics = flow_matcher.training_step(
     batch,
     use_self_conditioning=args.use_self_cond,
-    accumulation_steps=args.grad_accum_steps,
+    accumulation_steps=args.grad_accum_steps,  # ty: ignore[unknown-argument]
 )

@coderabbitai coderabbitai Bot Mar 23, 2026


⚠️ Potential issue | 🔴 Critical

This call no longer matches FlowMatcher.training_step().

The current signature in src/flow.py (Lines 418-424) is training_step(batch, optimizer, grad_clip=1.0, use_self_conditioning=True). Line 669 adds an unsupported accumulation_steps keyword, and the required optimizer argument is no longer passed, so training will raise on the first batch instead of entering the accumulation loop.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/train.py` around lines 666 - 670, The call to
flow_matcher.training_step uses a now-mismatched signature:
FlowMatcher.training_step(batch, optimizer, grad_clip=1.0,
use_self_conditioning=True). Remove the unsupported accumulation_steps kwarg and
pass the optimizer instance (e.g., the same optimizer used for training) as the
second positional argument; keep or pass grad_clip and use_self_conditioning
explicitly if needed (use args.use_self_cond and args.grad_clip). If you still
need gradient accumulation, implement accumulation outside training_step by
looping/accumulating gradients across args.grad_accum_steps and only calling
optimizer.step()/optimizer.zero_grad() after that loop rather than relying on an
accumulation_steps parameter.
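
If the reported signature is accurate, a corrected call would look roughly like this (treating args.grad_clip as an existing CLI flag is an assumption):

    metrics = flow_matcher.training_step(
        batch,
        optimizer,
        grad_clip=args.grad_clip,  # assumed flag; or keep the 1.0 default
        use_self_conditioning=args.use_self_cond,
    )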

Collaborator


@vratins please check whether this is true, and if so fix it.



Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copilot AI review requested due to automatic review settings March 23, 2026 21:23
@vratins vratins requested a review March 23, 2026 21:23
Comment thread scripts/inference.py
 )
 p.add_argument(
-    "--use_sc",
+    "--use_self_cond",
Collaborator


Are these new names reflected in the Dockerfile?

Contributor Author


The Dockerfile only exposes the input and output arguments; everything else is passed in as arguments by the user (and by hydra configs soon).

Comment thread scripts/train.py
p.add_argument("--eval_every", type=int, default=5)
p.add_argument("--n_eval_samples", type=int, default=3)
p.add_argument("--rk4_steps", type=int, default=100)
p.add_argument("--num_steps", type=int, default=100)
Collaborator


@vratins please add an issue to add help strings to each argument.

Contributor Author

@vratins vratins Mar 26, 2026


Actually, is this worth an issue, given that we will be switching to hydra configs as in #70?

Collaborator


Well, probably not the same issue, but the config items need to be documented somewhere.

Copilot AI review requested due to automatic review settings March 26, 2026 23:11

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

scripts/inference.py:107

  • Changing the flag name from --geometry_cache to --geometry_cache_name will break existing inference invocations that still pass the old flag. You already handle backward compatibility for the config key later; consider also accepting --geometry_cache as an alias that maps to the same destination (geometry_cache_name) and (optionally) mark it deprecated.
    p.add_argument(
        "--geometry_cache_name",
        type=str,
        default=None,
        help="Subdirectory name within processed_dir specifying which water coordinate set to use. "
        "Options include 'geometry' (filtered waters meeting quality criteria) or "
        "'geometry_unfiltered' (all crystallographic waters). Overrides the model's config if specified.",
    )
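
One way to keep the old flag working, as the comment suggests: register the legacy name as an extra option string for the same destination (argparse maps additional option strings onto one dest). A sketch:

    p.add_argument(
        "--geometry_cache_name",
        "--geometry_cache",  # deprecated alias for existing invocations
        dest="geometry_cache_name",
        type=str,
        default=None,
        help="Subdirectory name within processed_dir specifying which water "
        "coordinate set to use. (--geometry_cache is a deprecated alias.)",
    )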




Development

Successfully merging this pull request may close these issues.

Rename CLI flags to be consistent between train.py and inference.py

4 participants