cli: add option to not get the all-altloc selection string from find_altloc_selections.py #221
k-chrispens wants to merge 3 commits into main
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Pull request overview
Adds a CLI-controlled option to suppress the “all altloc residues per chain” selection emitted by find_altloc_selections(), enabling workflows that only want span-based selections.
Changes:
- Extend `find_altloc_selections()` with `include_all_altlocs` to optionally omit the final per-chain “all altlocs” selection.
- Add `--no-all-altlocs` to `scripts/eval/find_altloc_selections.py` to expose the behavior via CLI.
- Minor formatting/maintenance updates (docs whitespace, `ty` rule ordering, lockfile hash, trailing whitespace).
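A negative flag such as `--no-all-altlocs` is commonly wired so that the Python-side name stays positive. The sketch below is only an illustration of that pattern, assuming `argparse`; this page does not show which parser the real script uses, and `build_parser` is a hypothetical helper, not code from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; the real script's argument handling is not shown here."""
    parser = argparse.ArgumentParser(description="Toy altloc selection CLI (illustrative only)")
    # store_false makes the destination default to True, so the aggregate
    # selection stays enabled unless the user explicitly opts out.
    parser.add_argument(
        "--no-all-altlocs",
        dest="include_all_altlocs",
        action="store_false",
        help="Do not emit the final per-chain selection of all altloc residues.",
    )
    return parser


default_args = build_parser().parse_args([])
opt_out_args = build_parser().parse_args(["--no-all-altlocs"])
```

With this wiring, `default_args.include_all_altlocs` is `True` and `opt_out_args.include_all_altlocs` is `False`, so the value can be passed straight through to a function parameter named `include_all_altlocs`.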
Reviewed changes
Copilot reviewed 3 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/sampleworks/utils/cif_utils.py` | Adds `include_all_altlocs` flag and gates emission of the final per-chain selection. |
| `scripts/eval/find_altloc_selections.py` | Wires CLI flag through to `find_altloc_selections()` and updates row processing. |
| `scripts/eval/EVALUATION.md` | Whitespace/formatting cleanup only. |
| `pyproject.toml` | Reorders `tool.ty.rules` entries (no functional behavior change expected). |
| `pixi.lock` | Updates local package hash due to changes. |
| `docker-entrypoint.sh` | Removes trailing whitespace in help text. |
    @@ -38,6 +41,9 @@ def find_altloc_selections(
             Minimum number of consecutive residues to consider an altloc selection.
             Spans of altlocs shorter than this are not yielded as selection strings, but ARE
             included in the final selections which includes all residues with altlocs in each chain.
    +    include_all_altlocs : bool
    +        If True (default), yield a final per-chain selection string containing all residues
    +        with altlocs regardless of span length.
     
         Yields
         ------
    @@ -72,12 +78,13 @@ def find_altloc_selections(
                 # FIXME use new style selection https://github.com/diff-use/sampleworks/issues/56
                 yield f"chain {chain} and resi {start}-{end}"  # old style, more compact, selection
     
    -        if chain not in all_altloc_selections:
    -            all_altloc_selections[chain] = []
    -        if start == end:
    -            all_altloc_selections[chain].append(f"(res_id == {start})")
    -        else:
    -            all_altloc_selections[chain].append(f"(res_id >= {start} and res_id <= {end})")
    +        if include_all_altlocs:
    +            if chain not in all_altloc_selections:
    +                all_altloc_selections[chain] = []
    +            if start == end:
    +                all_altloc_selections[chain].append(f"(res_id == {start})")
    +            else:
    +                all_altloc_selections[chain].append(f"(res_id >= {start} and res_id <= {end})")
            find_altloc_selections(cif_file, altloc_label, min_span, include_all_altlocs)
        )
        if not selections:
            logger.warning(f"No altlocs found for {cif_file}")
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/sampleworks/utils/cif_utils.py (1)
40-46: ⚠️ Potential issue | 🟡 Minor
Docstring now overstates short-span inclusion behavior.
The `min_span` description still reads as unconditional inclusion in final selections, but this is now conditional on `include_all_altlocs=True`. Please align this text to prevent API confusion.
📝 Proposed docstring fix

         min_span : int
             Minimum number of consecutive residues to consider an altloc selection.
    -        Spans of altlocs shorter than this are not yielded as selection strings, but ARE
    -        included in the final selections which includes all residues with altlocs in each chain.
    +        Spans shorter than this are not yielded as individual span selections.
    +        When ``include_all_altlocs`` is True, they are still included in the final
    +        per-chain aggregate selections.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sampleworks/utils/cif_utils.py` around lines 40 - 46, Update the docstring for the parameters min_span and include_all_altlocs in src/sampleworks/utils/cif_utils.py to clarify behavior: state that spans of altlocs shorter than min_span are not yielded as selection strings, and that those short spans will only be included in the final per-chain selection string if include_all_altlocs is True; mention both parameter names (min_span, include_all_altlocs) so the maintainer can locate the docstring to adjust the wording accordingly.
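To make the interaction between `min_span` and `include_all_altlocs` concrete, here is a toy sketch of the semantics described in this comment. It is not the sampleworks implementation: `toy_altloc_selections`, its `spans` input, and the format of the aggregate string are assumptions made for illustration; only the per-span string and the `res_id` clauses follow the diff shown on this page.

```python
from collections.abc import Iterator


def toy_altloc_selections(
    spans: dict[str, list[tuple[int, int]]],
    min_span: int = 5,
    include_all_altlocs: bool = True,
) -> Iterator[str]:
    """Toy stand-in for find_altloc_selections(); takes precomputed spans per chain."""
    per_chain: dict[str, list[str]] = {}
    for chain, chain_spans in spans.items():
        for start, end in chain_spans:
            # Only spans of at least min_span residues get their own selection.
            if end - start + 1 >= min_span:
                yield f"chain {chain} and resi {start}-{end}"
            # Every span, short or long, feeds the aggregate when it is enabled.
            if include_all_altlocs:
                if start == end:
                    clause = f"(res_id == {start})"
                else:
                    clause = f"(res_id >= {start} and res_id <= {end})"
                per_chain.setdefault(chain, []).append(clause)
    # Assumed aggregate format; the real function may combine clauses differently.
    for chain, clauses in per_chain.items():
        yield f"chain_id == '{chain}' and ({' or '.join(clauses)})"


spans = {"A": [(10, 16), (30, 30)]}
with_aggregate = list(toy_altloc_selections(spans, min_span=3))
spans_only = list(toy_altloc_selections(spans, min_span=3, include_all_altlocs=False))
```

In this toy run the single-residue span at 30 never gets its own selection. It appears only inside the aggregate string, so it vanishes entirely when `include_all_altlocs=False`, which is the behavior the proposed docstring wording is meant to capture.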
🧹 Nitpick comments (1)
scripts/eval/find_altloc_selections.py (1)
9-11: Add a NumPy-style docstring to `_process_row()`.
This function is modified in this PR but still lacks a NumPy-style docstring, and it has an observable side effect (warning log when selections are empty).
📚 Proposed docstring addition

     def _process_row(
         row: pd.Series, altloc_label: str, min_span: int, include_all_altlocs: bool
     ) -> pd.Series:
    +    """Convert one input row into the output selection schema.
    +
    +    Parameters
    +    ----------
    +    row : pd.Series
    +        Input row with structure and map metadata.
    +    altloc_label : str
    +        CIF altloc field name.
    +    min_span : int
    +        Minimum span length for yielded altloc segments.
    +    include_all_altlocs : bool
    +        Whether to include per-chain aggregate altloc selections.
    +
    +    Returns
    +    -------
    +    pd.Series
    +        Output row used by downstream evaluation scripts.
    +
    +    Notes
    +    -----
    +    Logs a warning when no altloc selection is found.
    +    """

As per coding guidelines, "Always include NumPy-style docstrings for every function and class."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval/find_altloc_selections.py` around lines 9 - 11, Add a NumPy-style docstring to the function _process_row describing its purpose, parameters (row: pd.Series, altloc_label: str, min_span: int, include_all_altlocs: bool), return type (pd.Series) and behavior; explicitly document the observable side effect that it may emit a warning log when selections are empty and any exceptions or edge cases (e.g., empty inputs or filtered results). Keep the docstring in NumPy style with short summary, Parameters, Returns, and Notes/Warnings sections and reference the function's behavior on empty selections so callers know about the logging side effect.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8f537b4e-5d11-48a4-8fdc-9852fde03f03
⛔ Files ignored due to path filters (1)
- `pixi.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (5)
- `docker-entrypoint.sh`
- `pyproject.toml`
- `scripts/eval/EVALUATION.md`
- `scripts/eval/find_altloc_selections.py`
- `src/sampleworks/utils/cif_utils.py`
I will add some tests, converting to draft
17e624f to 146d29c
Summary by CodeRabbit
New Features
- `--no-all-altlocs` CLI flag to control inclusion of per-chain altloc residue selections (inclusion is enabled by default).
Documentation
Chores