test(msa): add MSAManager unit tests (#178)#220

Merged
k-chrispens merged 2 commits into main from msa-tests
Apr 14, 2026

Conversation

Contributor

@Abdelsalam-Abbas Abdelsalam-Abbas commented Apr 13, 2026

Summary

Adds tests/utils/test_msa.py covering the five tests requested in #178:

  1. _hash_arguments is deterministic for identical inputs and sensitive to sequence content, pairing strategy, and value order.
  2. Explicit msa_cache_dir is used (both Path and str forms); default None creates ~/.sampleworks/msa (with Path.home monkeypatched to tmp_path so the real home is untouched).
  3. get_msa calls _compute_msa when cache files are missing, and skips it on the second call when they exist. Cache-hit/api-call counters are asserted.
  4. _compute_msa forwards the expected arguments to run_mmseqs2 for both the paired and unpaired branches (2-sequence input so both fire), including auth_headers built from api_key_header / api_key_value.
  5. _compute_msa writes matching .csv and .a3m files: the second column of the csv equals the sequence lines of the a3m (the odd-indexed lines, counting from 0).
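
The properties listed in item 1 can be sketched as follows. This is only an illustration: `hash_arguments` is a hypothetical stand-in whose signature and hashing scheme are assumptions, not the `_hash_arguments` implementation in sampleworks, and the pairing names `"greedy"` and `"complete"` are placeholder values.

```python
import hashlib
import json


def hash_arguments(data: dict, pairing: str) -> str:
    """Hypothetical stand-in for _hash_arguments (assumed signature)."""
    # Serialize the sequences in the order given, so reordering the values
    # changes the payload and therefore the hash.
    payload = json.dumps({"sequences": list(data.values()), "pairing": pairing})
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = {"A": "MKTAYIAK", "B": "GSHMLEDP"}
reference = hash_arguments(base, "greedy")

# Deterministic: an equal copy of the input yields the same key.
is_deterministic = reference == hash_arguments(dict(base), "greedy")

# Sensitive to sequence content (one residue changed).
differs_on_sequence = reference != hash_arguments(
    {"A": "MKTAYIAR", "B": "GSHMLEDP"}, "greedy"
)

# Sensitive to the pairing strategy.
differs_on_pairing = reference != hash_arguments(base, "complete")

# Sensitive to value order (same sequences, swapped between chains).
differs_on_order = reference != hash_arguments(
    {"A": "GSHMLEDP", "B": "MKTAYIAK"}, "greedy"
)
```

In a pytest suite each boolean would become its own assertion, so a failure points at the specific property that regressed.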

Notes

  • Mocks patch sampleworks.utils.msa.run_mmseqs2 and sampleworks.utils.msa._compute_msa (not the source modules) so the actual call sites in msa.py are intercepted — run_mmseqs2 is imported at module load, and _compute_msa is module-level.
  • Test 3 uses a side_effect that writes a valid csv/a3m pair so the unconditional _validate_msa_cache_contents call inside get_msa passes, doubling as a smoke check that the cache layout _compute_msa writes matches what get_msa expects.
  • Protenix branch is intentionally not tested per the issue; no new dependency on PROTENIX_AVAILABLE.
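
The first note relies on a general `unittest.mock` rule: patch the name in the module that looks it up, not in the module that defines it. A minimal, self-contained sketch, using two throwaway in-memory modules (`fake_source` and `fake_consumer` are hypothetical names, not sampleworks modules):

```python
import sys
import types
from unittest.mock import patch

# Build a "source" module that defines the real function.
source = types.ModuleType("fake_source")
exec("def run_tool(x):\n    return 'real'\n", source.__dict__)
sys.modules["fake_source"] = source

# Build a "consumer" module that binds the function at import time,
# the same pattern as `from ... import run_mmseqs2` at module load.
consumer = types.ModuleType("fake_consumer")
sys.modules["fake_consumer"] = consumer
exec(
    "from fake_source import run_tool\n"
    "\n"
    "def compute(x):\n"
    "    return run_tool(x)\n",
    consumer.__dict__,
)

# Patching the defining module leaves the consumer's own binding untouched.
with patch("fake_source.run_tool", return_value="mocked"):
    patched_at_source = consumer.compute(1)

# Patching the consumer's name intercepts the actual call site.
with patch("fake_consumer.run_tool", return_value="mocked"):
    patched_at_use = consumer.compute(1)

# After both context managers exit, the original function is restored.
after_patching = consumer.compute(1)
```

Here `patched_at_source` is still `'real'` while `patched_at_use` is `'mocked'`, which is why the mocks in this PR target the names inside `sampleworks.utils.msa`.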

Test plan

  • pixi run -e boltz-dev python3 -m pytest tests/utils/test_msa.py -v → 7 passed in 0.03s
  • pixi run -e boltz-dev all-tests → 748 passed, 509 skipped, no regressions

Summary by CodeRabbit

  • Tests
    • Added a comprehensive test suite for MSA utilities covering deterministic input hashing, cache behavior and directory initialization, correct triggering of MSA computation when cache entries are missing, and subsequent cache hits. Also validates integration with the external alignment tool (correct call patterns and flags), temp-directory usage, and output consistency between generated alignment files.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f5044a80-6084-483c-9c8a-75906ee1a36b

📥 Commits

Reviewing files that changed from the base of the PR and between cdefe47 and 2450caf.

📒 Files selected for processing (1)
  • tests/utils/test_msa.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/utils/test_msa.py

📝 Walkthrough

Walkthrough

Adds a new pytest suite that tests MSA utilities: argument hashing, MSAManager cache-dir handling and cache hit/miss behavior, _compute_msa forwarding to run_mmseqs2, and writing/validation of .csv and .a3m outputs.
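
The .csv/.a3m consistency check mentioned above might look roughly like this. It is a sketch under assumptions: the csv is taken to have a header row with the sequence in the second column, the a3m to alternate one header line and one sequence line, and `write_pair` and `sequences_match` are hypothetical helpers rather than sampleworks functions.

```python
import csv
import tempfile
from pathlib import Path


def write_pair(directory, stem, entries):
    """Write an assumed csv/a3m layout for a list of (key, sequence) entries."""
    csv_path = Path(directory) / f"{stem}.csv"
    a3m_path = Path(directory) / f"{stem}.a3m"
    with csv_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["key", "sequence"])  # assumed header row
        writer.writerows(entries)
    with a3m_path.open("w") as handle:
        for index, (_, sequence) in enumerate(entries):
            handle.write(f">{index}\n{sequence}\n")
    return csv_path, a3m_path


def sequences_match(csv_path, a3m_path):
    """Compare the csv's second column with the a3m's sequence lines."""
    with csv_path.open(newline="") as handle:
        rows = list(csv.reader(handle))[1:]  # skip the header row
    csv_sequences = [row[1] for row in rows]
    # Header lines sit at even 0-based indices, sequences at odd ones.
    a3m_sequences = a3m_path.read_text().splitlines()[1::2]
    return csv_sequences == a3m_sequences


with tempfile.TemporaryDirectory() as tmp:
    csv_file, a3m_file = write_pair(tmp, "target_0", [(0, "MKTAYIAK"), (1, "MKT-YIAK")])
    consistent = sequences_match(csv_file, a3m_file)
    # Corrupt the a3m to confirm the check can actually fail.
    a3m_file.write_text(">0\nMKTAYIAK\n>1\nXXXXXXXX\n")
    inconsistent = sequences_match(csv_file, a3m_file)
```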

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| MSA Test Suite: tests/utils/test_msa.py | New test file covering _hash_arguments determinism and sensitivity to input/order; MSAManager cache-dir handling for Path, str, and None (default Path.home()); get_msa cache miss/hit behavior with mocked _compute_msa; _compute_msa forwarding to run_mmseqs2 for paired/unpaired flows; and verification that <target>_0.csv and <target>_0.a3m are written and parsable. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I hop through hashes, cache and file,
Pairing sequences, testing all the while.
A3Ms and CSVs tucked neat in rows,
Millisecond leaps where the mmseqs flow.
🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'test(msa): add MSAManager unit tests (#178)' directly and clearly describes the main change: adding unit tests for the MSAManager class, covering five comprehensive test scenarios as detailed in the PR objectives. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/utils/test_msa.py (1)

90-175: Add at least one black-box, no-mock path for MSAManager.get_msa

Most tests here are implementation-coupled (private methods + patched internals). Keeping these unit tests is fine, but add one public-interface test that pre-seeds cache files and calls get_msa without patching internals.

As per coding guidelines: **/test_*.py: Write black-box tests that verify behavior, not implementation. Use realistic inputs and avoid mocks. Test public interfaces with expected behaviors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_msa.py` around lines 90 - 175, Add a black-box test that
exercises the public MSAManager.get_msa without mocking internals: create a
realistic input dict and compute the expected hash_key the manager would use,
pre-seed the msa cache files (both CSV and A3M as used by _compute_msa's output
naming) under tmp_path using that hash_key, instantiate MSAManager (or use
existing manager fixture) pointing to tmp_path, call manager.get_msa(data,
pairing, structure_predictor=...) and assert the returned mapping points to the
pre-seeded files and that manager._cache_hits increments, avoiding any
patch.object on _compute_msa or run_mmseqs2 so the test verifies the public
behavior only.
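
The black-box approach suggested here can be illustrated with a toy cache. `ToyMSACache` is a hypothetical stand-in that models only the lookup step; its constructor, file naming, and counter names are assumptions and do not reflect the real MSAManager interface.

```python
import hashlib
import tempfile
from pathlib import Path


class ToyMSACache:
    """Hypothetical stand-in for MSAManager; it models only the cache lookup."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.cache_hits = 0
        self.api_calls = 0

    def cache_path(self, sequence):
        # Assumed naming scheme: a truncated SHA-256 of the sequence.
        digest = hashlib.sha256(sequence.encode("utf-8")).hexdigest()[:16]
        return self.cache_dir / f"{digest}.a3m"

    def get_msa(self, sequence):
        path = self.cache_path(sequence)
        if path.exists():
            self.cache_hits += 1
            return path
        self.api_calls += 1
        raise RuntimeError("cache miss: a real manager would query a server here")


def run_black_box_check(root):
    """Pre-seed the cache on disk, then call only the public method."""
    sequence = "MKTAYIAK"
    manager = ToyMSACache(root)
    seeded = manager.cache_path(sequence)
    seeded.write_text(f">query\n{sequence}\n")  # no patching anywhere
    returned = manager.get_msa(sequence)
    return returned == seeded, manager.cache_hits, manager.api_calls


with tempfile.TemporaryDirectory() as tmp:
    same_file, hits, calls = run_black_box_check(tmp)
```

Because the miss path raises instead of silently computing, a wrong cache key in the seeding step would fail the test loudly rather than trigger a network call.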
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/utils/test_msa.py`:
- Around line 149-155: The unpaired call assertions are missing checks that auth
and environment flags are forwarded like the paired call; update the assertions
for unpaired_call in tests/utils/test_msa.py to assert that unpaired_call.kwargs
contains the same auth_headers and env-related flags as the paired call (e.g.,
assert unpaired_call.kwargs["auth_headers"] == <expected_auth_headers> and
assert unpaired_call.kwargs["env"] == <expected_env_flags> or equivalent keys
used elsewhere), mirroring the paired-call assertions so unpaired forwarding
cannot regress silently; locate the test by the symbol unpaired_call and add
matching assertions for the auth and env keys used in the existing paired-call
checks.
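
One way to mirror the assertions across both calls is to loop over `call_args_list`. In this sketch `run_tool` is a hypothetical Mock standing in for the patched run_mmseqs2, and `compute` is toy code under test; the `use_pairing` keyword and the header value are assumptions made for illustration.

```python
from unittest.mock import Mock

# Hypothetical stand-in for the patched run_mmseqs2.
run_tool = Mock(return_value=None)
EXPECTED_HEADERS = {"X-API-Key": "secret"}


def compute(sequences):
    """Toy code under test: one paired and one unpaired call with shared options."""
    shared = {"use_env": True, "auth_headers": EXPECTED_HEADERS}
    run_tool(sequences, use_pairing=True, **shared)
    run_tool(sequences, use_pairing=False, **shared)


compute(["MKTAYIAK", "GSHMLEDP"])

# Unpacking also asserts that exactly two calls were made.
paired_call, unpaired_call = run_tool.call_args_list

checked = []
for name, call in (("paired", paired_call), ("unpaired", unpaired_call)):
    # The same forwarding checks run against both branches.
    assert call.kwargs["use_env"] is True
    assert call.kwargs["auth_headers"] == EXPECTED_HEADERS
    checked.append(name)
```

Sharing the loop body keeps the two branches from drifting apart when a new forwarded keyword is added later.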


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 19ebbd21-8f22-4916-9999-7a77a5ec7c0f

📥 Commits

Reviewing files that changed from the base of the PR and between f57044e and 87a005b.

📒 Files selected for processing (1)
  • tests/utils/test_msa.py

Comment thread tests/utils/test_msa.py Outdated
Closes #178. Covers hash determinism/sensitivity, default vs explicit
cache directory handling, get_msa cache hit/miss wiring, run_mmseqs2
argument forwarding (paired and unpaired branches), and csv/a3m file
content consistency. All mocks patch the names as imported into
sampleworks.utils.msa so the module-level call sites are intercepted.

Mirror the paired-call assertions on the unpaired call so forwarding of
use_env and auth_headers can't regress silently.
Collaborator

@k-chrispens k-chrispens left a comment


This looks correct to me, though I don't see where the original issue said not to add Protenix-branch tests. I believe the tests requested in the issue behave similarly for either branch (the only differences are which server is called and how things are linked up for Protenix), so I will approve here.

@k-chrispens k-chrispens linked an issue Apr 14, 2026 that may be closed by this pull request
@k-chrispens k-chrispens merged commit a777440 into main Apr 14, 2026
8 of 11 checks passed
@k-chrispens k-chrispens deleted the msa-tests branch April 14, 2026 04:34

Development

Successfully merging this pull request may close these issues.

Add tests for MSAManager

2 participants