Add MPS support for Apple Mac (M series chips)#145

Open
fnachon wants to merge 7 commits into
HannesStark:mainfrom
fnachon:main

fnachon commented Jan 10, 2026

Hello,

I made a few minor modifications for full MPS (Apple GPU) compatibility.
They preserve CUDA and CPU compatibility and can be merged into the main branch without issues.

On my old Apple M1 Max (64 GB), the 1g13prot.yaml example (10 designs) runs fine on the GPU, with the following timings:
[1] design : 803 s
[2] inverse_folding : 38 s
[3] folding : 1097 s
[4] design_folding : 327 s
[5] analysis : 30 s
[6] filtering : 9 s

Curious to see how it performs on an M5.
Let me know if you find any bugs.

Florian

fnachon and others added 2 commits January 10, 2026 15:54
Changes made to run without errors on the Mac MPS device: torch.autocast, number of devices and workers to use on M1-5 chips, workaround for CUDA-specific code, handling of float64 incompatibilities for MPS.
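
As an illustration of the kind of change this commit describes, here is a minimal sketch of device selection with a float64 guard. The helper names `pick_device` and `to_device_safe` are hypothetical and not taken from the PR; the sketch only assumes that MPS tensors cannot be float64, so such tensors are downcast to float32 before the move.

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then MPS, then CPU (hypothetical helper, not from the PR)."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is absent in very old PyTorch builds, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def to_device_safe(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Move a tensor to `device`, downcasting float64 to float32 on MPS only."""
    if device.type == "mps" and t.dtype == torch.float64:
        t = t.to(torch.float32)
    return t.to(device)


if __name__ == "__main__":
    device = pick_device()
    x = torch.ones(3, dtype=torch.float64)
    print(device, to_device_safe(x, device).dtype)
```

On CUDA and CPU the tensor keeps its original dtype, so existing behavior on those backends is unchanged.
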

21tesla commented Jan 26, 2026

Thank you for your work on the Mac! Here is a benchmark on my Jan 2025 M5-MBP:

Linux: 24 CPU i9, Blackwell RTX6000Pro, 128 GB RAM
Mac: 10 CPU M5, 32 GB RAM

          Linux       Mac
Step 1     75.9    1126.1
Step 2      7.7      20.4
Step 3     99.1    1780.2
Step 4     69.5     468.6
Step 5     17.0      18.1
Step 6      5.0       5.1

fnachon commented Jan 26, 2026

Thank you. Is this the same 1g13prot.yaml example?

21tesla commented Jan 26, 2026

Yes, it's the same .yaml file. Perhaps the upcoming M5 Pro or M5 Max will be better. I have an M1 Pro that I can test as well.

fnachon commented Jan 28, 2026

I did not expect an M1 Max to outperform an M5 on steps 1, 3, and 4. But the M1 Max reportedly has higher unified memory bandwidth, and its larger memory (64 GB vs 32 GB) reduces memory pressure. That probably makes the difference.
Can't wait for an M5 Max :)
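
For reference, the comparison can be reproduced from the numbers posted in this thread. The sketch below assumes that "Step 1" to "Step 6" in the second benchmark map to the six named stages of the first one, in order, and that all values are in seconds (only the M1 Max figures were explicitly labelled with a unit).

```python
# Timings copied from the comments above; seconds are assumed for all three machines.
steps = ["design", "inverse_folding", "folding",
         "design_folding", "analysis", "filtering"]
m1_max_s = [803, 38, 1097, 327, 30, 9]
m5_s = [1126.1, 20.4, 1780.2, 468.6, 18.1, 5.1]
linux_s = [75.9, 7.7, 99.1, 69.5, 17.0, 5.0]


def ratios(numerators, denominators):
    """Element-wise ratio; a value above 1 means the numerator machine was slower."""
    return [n / d for n, d in zip(numerators, denominators)]


m5_vs_m1 = dict(zip(steps, ratios(m5_s, m1_max_s)))
m5_vs_linux = dict(zip(steps, ratios(m5_s, linux_s)))

# Stages where the M5 took longer than the M1 Max.
slower_on_m5 = [name for name, r in m5_vs_m1.items() if r > 1.0]

if __name__ == "__main__":
    for name in steps:
        print(f"{name:16s} M5/M1 Max: {m5_vs_m1[name]:5.2f}   "
              f"M5/Linux: {m5_vs_linux[name]:5.2f}")
    print("slower on M5:", slower_on_m5)
```
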

fnachon and others added 5 commits March 31, 2026 15:18
Replace hardcoded torch.autocast("cuda") with device-agnostic
device_type=tensor.device.type in confidence_utils, inverse_fold,
and writer modules introduced in the upstream merge.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

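
The pattern described in this commit message can be sketched as follows. The function `scaled_matmul` is a hypothetical stand-in for the code in the modules named above; the only point illustrated is that the autocast device type is read from the input tensor rather than hardcoded. Whether autocast is available for a given backend depends on the installed PyTorch version.

```python
import torch


def scaled_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Matrix product under autocast, using the device type of the input.

    Hypothetical example function: replaces a hardcoded
    torch.autocast("cuda") with a device-agnostic call.
    """
    with torch.autocast(device_type=a.device.type):
        return a @ b


if __name__ == "__main__":
    a = torch.ones(2, 3)
    b = torch.ones(3, 4)
    out = scaled_matmul(a, b)
    print(out.shape, out.dtype)
```

The same call then works unchanged for tensors on `cuda`, `mps`, or `cpu`, as long as the backend supports autocast.
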
Python pickle does not preserve RDKit atom-level SetProp values. When
PyTorch DataLoader spawns worker processes (default num_workers=1 on
macOS), self.canonicals is pickled and all atom 'name' properties are
lost, causing KeyError in process_atom_features.

Fix: load all required molecules directly from the moldir zip inside
each get_sample() / get_feat() call instead of using the pickled
self.canonicals. The moldir zip handle is cached per-process by
_get_zipfile(), so there is no repeated I/O overhead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ders

- Disable pin_memory on MPS (unsupported, causes UserWarning)
- Enable persistent_workers when num_workers > 0 (avoids repeated
  worker init overhead and the PL suggestion warning)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
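
The two DataLoader settings listed in this commit message can be expressed as a small helper. `loader_kwargs` is a hypothetical name, and pinning memory only for CUDA is an assumption of this sketch (the commit itself only says pinning is disabled on MPS). `persistent_workers` is tied to `num_workers > 0` because PyTorch rejects it when there are no worker processes.

```python
def loader_kwargs(device_type: str, num_workers: int) -> dict:
    """Build DataLoader keyword arguments for a given accelerator type.

    Hypothetical helper: pin_memory is enabled only for CUDA (assumption),
    and persistent_workers only when worker processes exist.
    """
    return {
        "num_workers": num_workers,
        "pin_memory": device_type == "cuda",
        "persistent_workers": num_workers > 0,
    }


if __name__ == "__main__":
    # Intended use: torch.utils.data.DataLoader(dataset, **loader_kwargs("mps", 2))
    print(loader_kwargs("mps", 2))
    print(loader_kwargs("cuda", 0))
```
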