
Making the CLI args for train and inference consistent #57

Open
vratins wants to merge 9 commits into main from dev_prs

Conversation

@vratins
Contributor

@vratins vratins commented Mar 19, 2026

For issue #50

Summary by CodeRabbit

  • New Features
    • Added options to configure logging output (log level and log file).
  • Chores
    • Renamed CLI options for clarity (geometry cache, self-conditioning, RK4 steps).
  • Bug Fixes
    • Improved handling of geometry cache resolution and metrics output to prefer the new cache name and record renamed fields.
    • Tightened metrics handling and numeric accumulation to avoid incorrect aggregates.

@vratins vratins linked an issue Mar 19, 2026 that may be closed by this pull request
@stephaniewankowicz stephaniewankowicz marked this pull request as draft March 19, 2026 23:57
@stephaniewankowicz stephaniewankowicz marked this pull request as ready for review March 19, 2026 23:58
@vratins vratins requested a review from Copilot March 20, 2026 00:23

Copilot AI left a comment


Pull request overview

This PR addresses issue #50 by aligning scripts/train.py and scripts/inference.py command-line flags and logging configuration so the two entrypoints use consistent naming and offer the same logging controls.

Changes:

  • Renamed --rk4_steps to --num_steps in scripts/train.py and updated the corresponding usage sites.
  • Updated scripts/inference.py CLI flags to match training (--geometry_cache_name, --use_self_cond) and added --log_level / --log_file, removing the early top-level logging setup in favor of configuring logging in main() (sketched after this list).
  • Added several # ty: ignore[...] suppressions around model/dataset construction and metric handling.
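
A minimal sketch of that logging wiring, using plain logging.basicConfig as a stand-in for the PR's setup_logging_for_tqdm() helper (whose real signature is not shown in this thread):

    import argparse
    import logging

    def parse_args() -> argparse.Namespace:
        p = argparse.ArgumentParser()
        p.add_argument("--log_level", type=str, default="INFO")
        p.add_argument("--log_file", type=str, default=None)
        return p.parse_args()

    def main() -> None:
        args = parse_args()
        # Configured here rather than at module import time,
        # so the CLI flags actually take effect.
        handlers: list[logging.Handler] = [logging.StreamHandler()]
        if args.log_file is not None:
            handlers.append(logging.FileHandler(args.log_file))
        logging.basicConfig(
            level=getattr(logging, args.log_level.upper()),
            handlers=handlers,
        )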

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File summaries:
  • scripts/train.py: Renames the integration-steps flag and updates eval sampling usage; adds ty ignore suppressions around model construction and training-loop typing.
  • scripts/inference.py: Renames CLI flags for consistency, adds logging CLI options, and moves logging setup into main(); adds ty ignore suppressions and updates self-conditioning plumbing.
Comments suppressed due to low confidence (1)

scripts/inference.py:604

  • The metrics.json config block now stores values under keys like "use_sc" and "geometry_cache", while the CLI flags are --use_self_cond and --geometry_cache_name. This makes the output config inconsistent with the CLI and can confuse downstream consumers. Consider renaming these keys, or writing both old and new keys for backward compatibility, to match the new flag names (a sketch follows the snippet below).
                        "checkpoint": args.checkpoint,
                        "method": args.method,
                        "num_steps": args.num_steps,
                        "use_sc": args.use_self_cond,
                        "threshold": args.threshold,
                        "include_mates": include_mates,
                        "water_ratio": args.water_ratio,
                        "geometry_cache": geometry_cache_name,
                    },
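
One hedged way to act on this comment is to write both the renamed and the legacy keys; a sketch, with the enclosing "config" key name assumed from "config block" above:

                    "config": {
                        "checkpoint": args.checkpoint,
                        "method": args.method,
                        "num_steps": args.num_steps,
                        # New keys matching the renamed CLI flags.
                        "use_self_cond": args.use_self_cond,
                        "geometry_cache_name": geometry_cache_name,
                        # Legacy keys kept for older downstream consumers.
                        "use_sc": args.use_self_cond,
                        "geometry_cache": geometry_cache_name,
                        "threshold": args.threshold,
                        "include_mates": include_mates,
                        "water_ratio": args.water_ratio,
                    },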


@vratins
Contributor Author

vratins commented Mar 20, 2026

Will update this PR and remove the ty ignore comments once #43 and #38 are merged

@coderabbitai

coderabbitai Bot commented Mar 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a2f9271a-ff1e-4436-9e59-95eaffa2fd01

📥 Commits

Reviewing files that changed from the base of the PR and between 8bfa060 and ab2c538.

📒 Files selected for processing (1)
  • scripts/inference.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/inference.py

📝 Walkthrough

Walkthrough

CLI flags renamed and logging initialization moved into inference.py's main; added log options; type annotation tightened; geometry cache resolution and metric keys updated; train.py RK4 flag renamed and metrics accumulation/conditioning logic hardened. (48 words)

Changes

  • Inference CLI & behavior (scripts/inference.py): Renamed flags --geometry_cache → --geometry_cache_name and --use_sc → --use_self_cond; updated downstream usage and metric JSON keys. Geometry-cache resolution now prefers args.geometry_cache_name, then config["geometry_cache_name"], then the legacy config["geometry_cache"].
  • Inference logging init & options (scripts/inference.py): Moved setup_logging_for_tqdm() from module level into main(), after args = parse_args(); added CLI args --log_level (default "INFO") and --log_file (default None) and wired them into logging initialization.
  • Inference type annotation (scripts/inference.py): Updated the run_inference_batch signature from water_ratio: float = None to water_ratio: float | None = None (sketched below).
  • Train CLI & evaluation (scripts/train.py): Renamed CLI arg --rk4_steps → --num_steps and updated evaluation/sampling calls to pass num_steps=args.num_steps.
  • Train metrics & accumulation (scripts/train.py): Tightened train_epoch() handling of metrics["per_sample_info"] by introducing a local per_sample_info and guarding dict access; added explicit cast(float, ...) when accumulating total_loss and total_rmsd (also sketched below).
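
For concreteness, a minimal sketch of those two typing changes; the names run_inference_batch, total_loss, and metrics come from the walkthrough, while the bodies and the "loss" key are placeholders:

    from typing import cast

    def run_inference_batch(water_ratio: float | None = None) -> None:
        # was: water_ratio: float = None, which type checkers reject
        # because None is not a float
        ...

    def accumulate_loss(per_batch_metrics: list[dict[str, object]]) -> float:
        total_loss = 0.0
        for metrics in per_batch_metrics:
            # cast() narrows the dict value to float so the checker
            # accepts the numeric accumulation
            total_loss += cast(float, metrics["loss"])
        return total_loss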

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I renamed flags and tuned the logs just right,
Hopped types into unions, kept metrics tight,
Geometry caches now find their way,
Steps renamed for training’s day,
A tiny rabbit cheers the tidy flight.

🚥 Pre-merge checks | ✅ 3 passed
  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled).
  • Title Check: ✅ Passed. The title accurately reflects the main objective of the PR: making CLI arguments consistent between train.py and inference.py through renames and refactoring.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 94.87%, above the required threshold of 80.00%.

Copilot AI review requested due to automatic review settings March 23, 2026 16:55

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
src/utils.py (1)

187-193: Use the shared RBF defaults here too.

This helper still hardcodes 16 and 8.0, so callers that rely on its defaults will drift the next time NUM_RBF or RBF_CUTOFF changes.

♻️ Suggested change
 def compute_edge_features(
     pos: Tensor,
     edge_index: Tensor,
     pos_dst: Tensor | None = None,
-    num_gaussians: int = 16,
-    cutoff: float = 8.0,
+    num_gaussians: int = NUM_RBF,
+    cutoff: float = RBF_CUTOFF,
     clamp_min: float = 1e-5,
 ) -> tuple[Tensor, Tensor]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils.py` around lines 187 - 193, compute_edge_features currently
hardcodes num_gaussians=16 and cutoff=8.0; change its signature to use the
shared defaults NUM_RBF and RBF_CUTOFF (e.g. num_gaussians: int = NUM_RBF,
cutoff: float = RBF_CUTOFF) and add the necessary import of NUM_RBF and
RBF_CUTOFF into this module so callers follow the global RBF defaults; keep
clamp_min as-is.
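
The import side of that suggestion might look like this; that NUM_RBF and RBF_CUTOFF live in src/constants.py is an assumption based on the files listed in this review:

    # src/constants.py (assumed home of the shared defaults)
    NUM_RBF: int = 16
    RBF_CUTOFF: float = 8.0

    # src/utils.py
    from src.constants import NUM_RBF, RBF_CUTOFF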
src/dataset.py (1)

1115-1127: Consider referencing NUM_RBF constant in docstring.

The docstring hardcodes edge_rbf: (E, 16) but this dimension is defined by NUM_RBF constant. Consider documenting as edge_rbf: (E, NUM_RBF) for consistency, especially if this value might change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dataset.py` around lines 1115 - 1127, The docstring for
Dataset.__getitem__ documents edge_rbf as "(E, 16)" but that dimension is
defined by the NUM_RBF constant; update the docstring to reference NUM_RBF
(e.g., "edge_rbf: (E, NUM_RBF)") so the HeteroData description stays consistent
with the actual feature size used by the code (refer to __getitem__, NUM_RBF,
and the 'edge_rbf' field).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/inference.py`:
- Around line 428-431: The current logic for determining geometry_cache_name
only checks args.geometry_cache_name and config.get("geometry_cache_name"),
which ignores older saved runs using the legacy "geometry_cache" key; update the
resolution in scripts/inference.py so that geometry_cache_name =
args.geometry_cache_name or config.get("geometry_cache_name") or
config.get("geometry_cache") or "geometry" (i.e., fall back to the legacy
"geometry_cache" key before defaulting to "geometry"), and ensure any references
to geometry_cache_name (e.g., where it's used to locate cache directories)
continue to use the resolved value.

In `@scripts/train.py`:
- Around line 666-670: The call to flow_matcher.training_step uses a
now-mismatched signature: FlowMatcher.training_step(batch, optimizer,
grad_clip=1.0, use_self_conditioning=True). Remove the unsupported
accumulation_steps kwarg and pass the optimizer instance (e.g., the same
optimizer used for training) as the second positional argument; keep or pass
grad_clip and use_self_conditioning explicitly if needed (use args.use_self_cond
and args.grad_clip). If you still need gradient accumulation, implement
accumulation outside training_step by looping/accumulating gradients across
args.grad_accum_steps and only calling optimizer.step()/optimizer.zero_grad()
after that loop rather than relying on an accumulation_steps parameter.

In `@src/dataset.py`:
- Around line 662-669: Move the encoder_type validation to run before any
preprocessing to avoid wasted work: in src/dataset.py, check self.encoder_type
against {"gvp","slae","esm"} and raise the ValueError prior to calling
self._preprocess_all(); specifically, reorder the block so the validation of
encoder_type occurs before the call to self._preprocess_all(), keeping the same
error message and allowed set. (A sketch of this reordering follows these prompt blocks.)

In `@tests/test_dataset.py`:
- Around line 49-88: The fixtures pdb_base_dir, pdb_6eey, pdb_2b5w, pdb_8dzt,
and pdb_1deu in tests/test_dataset.py shadow the portable fixtures defined in
tests/conftest.py (which use ENV_PDB_DIR / tests/test_files); remove these
duplicate fixture definitions from tests/test_dataset.py so the module uses the
conftest-provided fixtures instead (or if special behavior is required, rename
them to avoid shadowing and/or delegate to the conftest fixtures inside the new
fixture implementations such as calling the conftest pdb_base_dir fixture).

---

Nitpick comments:
In `@src/dataset.py`:
- Around line 1115-1127: The docstring for Dataset.__getitem__ documents
edge_rbf as "(E, 16)" but that dimension is defined by the NUM_RBF constant;
update the docstring to reference NUM_RBF (e.g., "edge_rbf: (E, NUM_RBF)") so
the HeteroData description stays consistent with the actual feature size used by
the code (refer to __getitem__, NUM_RBF, and the 'edge_rbf' field).

In `@src/utils.py`:
- Around line 187-193: compute_edge_features currently hardcodes
num_gaussians=16 and cutoff=8.0; change its signature to use the shared defaults
NUM_RBF and RBF_CUTOFF (e.g. num_gaussians: int = NUM_RBF, cutoff: float =
RBF_CUTOFF) and add the necessary import of NUM_RBF and RBF_CUTOFF into this
module so callers follow the global RBF defaults; keep clamp_min as-is.
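
As referenced above, a sketch of the encoder_type reordering: validate the cheap precondition before any preprocessing runs. The class name and __init__ shape here are hypothetical:

    class WaterDataset:
        def __init__(self, encoder_type: str) -> None:
            # Fail fast on an invalid encoder_type before doing any work.
            if encoder_type not in {"gvp", "slae", "esm"}:
                raise ValueError(
                    f"encoder_type must be one of 'gvp', 'slae', 'esm', "
                    f"got {encoder_type!r}"
                )
            self.encoder_type = encoder_type
            self._preprocess_all()

        def _preprocess_all(self) -> None:
            ...  # expensive preprocessing, now only reached with valid input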

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b8c4a5d4-291a-4ea1-953b-8adacf86c6bc

📥 Commits

Reviewing files that changed from the base of the PR and between ea8ba8b and 5936135.

📒 Files selected for processing (10)
  • .github/workflows/build.yml
  • scripts/inference.py
  • scripts/train.py
  • src/constants.py
  • src/dataset.py
  • src/utils.py
  • tests/conftest.py
  • tests/test_dataset.py
  • tests/test_embedding_generation.py
  • tests/test_utils.py
💤 Files with no reviewable changes (1)
  • tests/test_embedding_generation.py

Comment thread scripts/train.py
Comment on lines 666 to 670
 metrics = flow_matcher.training_step(
     batch,
     use_self_conditioning=args.use_self_cond,
-    accumulation_steps=args.grad_accum_steps,
+    accumulation_steps=args.grad_accum_steps,  # ty: ignore[unknown-argument]
 )

@coderabbitai coderabbitai Bot Mar 23, 2026


⚠️ Potential issue | 🔴 Critical

This call no longer matches FlowMatcher.training_step().

The current signature in src/flow.py (Lines 418-424) is training_step(batch, optimizer, grad_clip=1.0, use_self_conditioning=True). Line 669 adds an unsupported accumulation_steps keyword, and the required optimizer argument is no longer passed, so training will raise on the first batch instead of entering the accumulation loop.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/train.py` around lines 666 - 670, The call to
flow_matcher.training_step uses a now-mismatched signature:
FlowMatcher.training_step(batch, optimizer, grad_clip=1.0,
use_self_conditioning=True). Remove the unsupported accumulation_steps kwarg and
pass the optimizer instance (e.g., the same optimizer used for training) as the
second positional argument; keep or pass grad_clip and use_self_conditioning
explicitly if needed (use args.use_self_cond and args.grad_clip). If you still
need gradient accumulation, implement accumulation outside training_step by
looping/accumulating gradients across args.grad_accum_steps and only calling
optimizer.step()/optimizer.zero_grad() after that loop rather than relying on an
accumulation_steps parameter.
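
If the reported signature is accurate, a corrected call would look roughly like this (treating args.grad_clip as an existing CLI flag is an assumption):

    metrics = flow_matcher.training_step(
        batch,
        optimizer,
        grad_clip=args.grad_clip,  # assumed flag; or keep the 1.0 default
        use_self_conditioning=args.use_self_cond,
    )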

Collaborator


@vratins please check whether this is true, and if so fix it.



Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copilot AI review requested due to automatic review settings March 23, 2026 21:23
@vratins vratins requested a review March 23, 2026 21:23
Comment thread scripts/inference.py
 )
 p.add_argument(
-    "--use_sc",
+    "--use_self_cond",
Collaborator


Are these new names reflected in the Dockerfile?

Contributor Author


The Dockerfile only exposes the input and output arguments; everything else is passed in as arguments by the user (and by hydra configs soon).

Comment thread scripts/train.py
p.add_argument("--eval_every", type=int, default=5)
p.add_argument("--n_eval_samples", type=int, default=3)
p.add_argument("--rk4_steps", type=int, default=100)
p.add_argument("--num_steps", type=int, default=100)
Collaborator


@vratins please add an issue to add help strings to each argument.

Contributor Author

@vratins vratins Mar 26, 2026


Actually, is this worth an issue, given that we will be switching to hydra configs as in #70?

Collaborator


Well, probably not the same issue, but the config items need to be documented somewhere.

Copilot AI review requested due to automatic review settings March 26, 2026 23:11

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

scripts/inference.py:107

  • Changing the flag name from --geometry_cache to --geometry_cache_name will break existing inference invocations that still pass the old flag. You already handle backward compatibility for the config key later; consider also accepting --geometry_cache as an alias that maps to the same destination (geometry_cache_name) and (optionally) mark it deprecated.
    p.add_argument(
        "--geometry_cache_name",
        type=str,
        default=None,
        help="Subdirectory name within processed_dir specifying which water coordinate set to use. "
        "Options include 'geometry' (filtered waters meeting quality criteria) or "
        "'geometry_unfiltered' (all crystallographic waters). Overrides the model's config if specified.",
    )
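
One way to keep the old flag working, as the comment suggests: register the legacy name as an extra option string for the same destination (argparse maps additional option strings onto one dest). A sketch:

    p.add_argument(
        "--geometry_cache_name",
        "--geometry_cache",  # deprecated alias for existing invocations
        dest="geometry_cache_name",
        type=str,
        default=None,
        help="Subdirectory name within processed_dir specifying which water "
        "coordinate set to use. (--geometry_cache is a deprecated alias.)",
    )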




Development

Successfully merging this pull request may close these issues.

Rename CLI flags to be consistent between train.py and inference.py

4 participants