Fix a crash in NeMo 2.0 during module._apply(lambda t: t.cpu()) #1502
Conversation
Signed-off-by: Guyue Huang <[email protected]>
Can you explain the race condition that this is fixing? From what I can tell, Float8Tensor.cpu
should already synchronize the GPU:
TransformerEngine/transformer_engine/pytorch/tensor/quantized_tensor.py, lines 321 to 323 in e70f913:

    def cpu(self, memory_format=torch.preserve_format) -> torch.Tensor:
        # pylint: disable=missing-function-docstring
        return self.dequantize().cpu(memory_format=memory_format)
The actual problem is later in Float8Tensor._set_data:

    self.data = self._quantizer.quantize(tensor)

We are passing a CPU tensor into the quantize kernel, and I don't think we ever move it to GPU. This doesn't explain why this PR fixes the IMA, so I could have missed something.
If my interpretation is the actual root cause, the quickest fix is to modify Float8Tensor._set_data with:

    self.data = self._quantizer.quantize(tensor.to(device=self.device))

More long-term fixes are to handle CPU tensors in the quantize function or to support CPU Float8Tensors.
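For reference, here is a minimal sketch of the quick fix suggested above. The attribute names (self._quantizer, self.device) are taken from the snippets quoted in this thread and are assumptions about the Float8Tensor internals rather than the actual implementation:

```python
import torch

# Hypothetical sketch of the suggested change to Float8Tensor._set_data.
def _set_data(self, tensor: torch.Tensor) -> None:
    # `tensor` may live on the CPU (e.g. after module._apply(lambda t: t.cpu())),
    # while the quantize kernel expects data on this tensor's device, so move
    # it to the destination device before quantizing.
    tensor = tensor.to(device=self.device)
    self.data = self._quantizer.quantize(tensor)
```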
@timmoon10 I also don't fully understand the race condition; it just happened to work. I will dig into it and reply with my findings here.
Findings:
Signed-off-by: Guyue Huang <[email protected]>
I decided to revert the …
/te-ci pytorch
LGTM
* Fix a crash with module._apply(lambda t: t.cpu())
* Add comments
* Make sure tensor is moved to dst device before quantizer quantizes

Signed-off-by: Guyue Huang <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Description
In NeMo 2.0, during job exit Lightning calls module._apply(lambda t: t.cpu()) on the GPT model, which triggers an illegal memory access error in the TE dequantize kernel. This PR fixes the issue.

Fixes # (issue)
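For context, a rough sketch of the failing pattern is below. It is not a verbatim NeMo repro; the fp8_model_init usage and layer sizes are assumptions used only to obtain Float8Tensor parameters, and the real trigger is Lightning's teardown:

```python
# Requires a CUDA device and Transformer Engine installed.
import transformer_engine.pytorch as te

# Assumption: fp8_model_init is one way to get FP8 (Float8Tensor) parameters;
# the NeMo setup may differ.
with te.fp8_model_init():
    model = te.Linear(1024, 1024)

# At job exit, Lightning effectively does the following, which replaces each
# parameter's .data with its CPU copy and goes through Float8Tensor._set_data:
model._apply(lambda t: t.cpu())
```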
Type of change

* Bug fix

Changes

Please list the changes introduced in this PR:

* Fix a crash with module._apply(lambda t: t.cpu())
* Make sure the tensor is moved to the destination device before the quantizer quantizes it
Checklist: