-
Couldn't load subscription status.
- Fork 6.8k
[doc][rdt] Add the limitations of rdt #58063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds a documentation note about a limitation of Ray Direct Transport (RDT) with NIXL. The change is clear and helps inform users about a known issue. I have one minor suggestion to improve the wording for better clarity and professionalism.
|
|
||
| For NIXL: | ||
|
|
||
| * Due to an issue with our implementation of memory deregistration, we currently do not support repeated transfers of tensors that share the same memory space but belong to different objects. We will fix this problem soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The phrase "We will fix this problem soon" is informal and vague. It's better to remove it for conciseness and professionalism, especially since the introduction to this section already states that limitations may be addressed in future releases.
| * Due to an issue with our implementation of memory deregistration, we currently do not support repeated transfers of tensors that share the same memory space but belong to different objects. We will fix this problem soon. | |
| * Due to an issue with our implementation of memory deregistration, we currently do not support repeated transfers of tensors that share the same memory space but belong to different objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give like 2 small examples -
-
sending 2 lists of tensors that overlap
[a, b, c], [c, d, e] -
sending the same tensor twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might as well do the error detection in the same PR so we can provide a code sample showing what error to expect.
Also, I think this assumes too much system knowledge from the user.
Due to an issue with our implementation of memory deregistration,
-> "Due to a known issue"
repeated transfers of tensors that share the same memory space but belong to different objects.
Technically this is possible, but you have to make sure to free the first object before the second. It's probably clearer with code examples, like dhyey said.
Signed-off-by: Dhyey Shah <[email protected]>
Signed-off-by: Qiaolin-Yu <[email protected]>
Signed-off-by: Qiaolin-Yu <[email protected]>
Signed-off-by: Qiaolin-Yu <[email protected]>
Signed-off-by: Qiaolin-Yu <[email protected]>
Co-authored-by: Stephanie Wang <[email protected]> Signed-off-by: Qiaolin Yu <[email protected]>
Signed-off-by: Qiaolin-Yu <[email protected]>
Signed-off-by: Dhyey Shah <[email protected]>
Signed-off-by: Dhyey Shah <[email protected]> Signed-off-by: Qiaolin-Yu <[email protected]> Signed-off-by: Qiaolin Yu <[email protected]> Co-authored-by: Dhyey Shah <[email protected]> Co-authored-by: Stephanie Wang <[email protected]>
## Description Cherry-picking #58063 to throw an exception when trying to double send the same ref before gc because it can trigger a NIXL error. Also adding documentation for this. Signed-off-by: Dhyey Shah <[email protected]> Signed-off-by: Qiaolin-Yu <[email protected]> Signed-off-by: Qiaolin Yu <[email protected]> Co-authored-by: Qiaolin Yu <[email protected]> Co-authored-by: Stephanie Wang <[email protected]>
Signed-off-by: Dhyey Shah <[email protected]> Signed-off-by: Qiaolin-Yu <[email protected]> Signed-off-by: Qiaolin Yu <[email protected]> Co-authored-by: Dhyey Shah <[email protected]> Co-authored-by: Stephanie Wang <[email protected]> Signed-off-by: xgui <[email protected]>
Description
rdt currently has some limitations. update it in the doc to clarify. Disable some tests for the new assertion.
Related issues
Additional information