Add MPS support for Apple Mac (M series chips) #145
Open
fnachon wants to merge 7 commits into
Changes made to run without errors on the Mac MPS device: device-agnostic torch.autocast, the number of devices and workers to use on M1 to M5 chips, workarounds for CUDA-specific code, and handling of float64 incompatibilities on MPS.
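As a rough illustration of the device and float64 handling summarized above, here is a minimal sketch. The helper names (`pick_device_type`, `mps_safe_dtype`, `detect_device_type`) are hypothetical and are not taken from this PR; the only PyTorch calls assumed are `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device_type(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def mps_safe_dtype(dtype_name: str, device_type: str) -> str:
    """Map float64 to float32 on MPS, which does not support float64 tensors."""
    if device_type == "mps" and dtype_name == "float64":
        return "float32"
    return dtype_name


def detect_device_type() -> str:
    """Ask PyTorch which backends are available (hypothetical wrapper)."""
    import torch  # imported lazily so the pure helpers above work without torch

    return pick_device_type(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

A caller could then resolve the dtype with something like `getattr(torch, mps_safe_dtype("float64", device_type))` before moving a tensor to the device.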
This was referenced Jan 10, 2026
Thank you for your work on the Mac! Here is a benchmark on my Jan 2025 M5-MBP
Linux: 24 CPU i9, Blackwell RTX6000Pro, 128 GB RAM
Author
Thank you. Is this the same 1g13prot.yaml example?
Yes, it's the same .yaml file. Perhaps the upcoming M5 Pro or M5 Max will do better. I also have an M1 Pro that I can test.
Author
I did not expect an M1 Max to perform better than an M5 on steps 1, 3, and 4. But the M1 Max reportedly has higher unified memory bandwidth, and its larger memory (64 GB vs 32 GB) reduces memory pressure. That makes the difference.
Replace hardcoded torch.autocast("cuda") with device-agnostic
device_type=tensor.device.type in confidence_utils, inverse_fold,
and writer modules introduced in the upstream merge.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
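The device-agnostic autocast pattern described in this commit could look roughly like the sketch below. The helper names are hypothetical and not from the PR. It assumes a PyTorch build recent enough to accept `"mps"` as an autocast `device_type`; the stand-in object at the end only mimics the `tensor.device.type` attribute so the lookup can be shown without a GPU.

```python
from types import SimpleNamespace


def device_type_of(tensor) -> str:
    """Return the device type string ("cuda", "mps" or "cpu") of a tensor-like object."""
    return tensor.device.type


def autocast_for(tensor, enabled: bool = True):
    """Build an autocast context that follows the tensor's device instead of hardcoding "cuda"."""
    import torch  # imported lazily: only needed when a context is actually created

    return torch.autocast(device_type=device_type_of(tensor), enabled=enabled)


# Stand-in for a tensor living on MPS (illustration only, not a real tensor).
fake_tensor = SimpleNamespace(device=SimpleNamespace(type="mps"))
print(device_type_of(fake_tensor))  # prints: mps
```

With a real tensor `x`, the call site would be `with autocast_for(x): ...` in place of `with torch.autocast("cuda"): ...`.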
Python pickle does not preserve RDKit atom-level SetProp values. When PyTorch DataLoader spawns worker processes (default num_workers=1 on macOS), self.canonicals is pickled and all atom 'name' properties are lost, causing KeyError in process_atom_features.
Fix: load all required molecules directly from the moldir zip inside each get_sample() / get_feat() call instead of using the pickled self.canonicals. The moldir zip handle is cached per-process by _get_zipfile(), so there is no repeated I/O overhead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
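The per-process caching idea in this commit can be sketched with the standard library alone. This is not the PR's `_get_zipfile()` implementation: the names `get_zipfile` and `load_member` and the cache layout are assumptions made for illustration. The point is that each worker process opens its own handle once and reads members on demand, rather than relying on state that was pickled in the parent.

```python
import os
import zipfile

# Maps (process id, archive path) to an open handle, so a worker that inherits
# this dict from its parent never reuses the parent's file handle.
_ZIP_CACHE = {}


def get_zipfile(path: str) -> zipfile.ZipFile:
    """Return a ZipFile handle opened by the current process, reusing it across calls."""
    key = (os.getpid(), path)
    if key not in _ZIP_CACHE:
        _ZIP_CACHE[key] = zipfile.ZipFile(path, "r")
    return _ZIP_CACHE[key]


def load_member(path: str, name: str) -> bytes:
    """Read one archive member on demand instead of using pickled in-memory objects."""
    return get_zipfile(path).read(name)
```

Keying the cache on the process id keeps the handle private to each DataLoader worker, while repeated calls within a worker skip reopening the archive.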
…ders
- Disable pin_memory on MPS (unsupported, causes UserWarning)
- Enable persistent_workers when num_workers > 0 (avoids repeated worker init overhead and the PL suggestion warning)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
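A minimal sketch of how these two DataLoader settings might be derived from the device and worker count. The function name `loader_kwargs` is hypothetical; `num_workers`, `pin_memory`, and `persistent_workers` are existing `torch.utils.data.DataLoader` parameters. Enabling `pin_memory` only for CUDA is an assumption of this sketch; the commit itself only states that it is disabled on MPS.

```python
def loader_kwargs(device_type: str, num_workers: int) -> dict:
    """Build DataLoader keyword arguments suited to the target device."""
    return {
        "num_workers": num_workers,
        # Pinned host memory speeds up host-to-GPU copies on CUDA; it is
        # left off elsewhere, including MPS, where it only triggers a warning.
        "pin_memory": device_type == "cuda",
        # DataLoader rejects persistent_workers=True when num_workers == 0.
        "persistent_workers": num_workers > 0,
    }
```

The result can be passed straight through, for example `DataLoader(dataset, batch_size=1, **loader_kwargs("mps", 2))`.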
Hello,
I made a few minor modifications for full MPS compatibility (GPU).
The modifications preserve CUDA and CPU compatibility and can be merged into the main branch without issues.
On my old Apple M1 Max (64 GB), the 1g13prot.yaml example (10 designs) runs just fine on the GPU with the following performance:
[1] design : 803 s
[2] inverse_folding : 38 s
[3] folding : 1097 s
[4] design_folding : 327 s
[5] analysis : 30 s
[6] filtering : 9 s
Curious to see how it performs on an M5.
Let me know if you find any bugs.
Florian