@Qiaolin-Yu
Member

@Qiaolin-Yu Qiaolin-Yu commented Oct 23, 2025

Description

Ray Direct Transport (RDT) currently has some limitations. This PR documents them for clarity and disables some tests that conflict with the new assertion.

Related issues

Additional information

@Qiaolin-Yu Qiaolin-Yu requested review from a team as code owners October 23, 2025 22:02
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a documentation note about a limitation of Ray Direct Transport (RDT) with NIXL. The change is clear and helps inform users about a known issue. I have one minor suggestion to improve the wording for better clarity and professionalism.


For NIXL:

* Due to an issue with our implementation of memory deregistration, we currently do not support repeated transfers of tensors that share the same memory space but belong to different objects. We will fix this problem soon.
Contributor


medium

The phrase "We will fix this problem soon" is informal and vague. It's better to remove it for conciseness and professionalism, especially since the introduction to this section already states that limitations may be addressed in future releases.

Suggested change
- * Due to an issue with our implementation of memory deregistration, we currently do not support repeated transfers of tensors that share the same memory space but belong to different objects. We will fix this problem soon.
+ * Due to an issue with our implementation of memory deregistration, we currently do not support repeated transfers of tensors that share the same memory space but belong to different objects.

Contributor


Can you give two small examples?

  1. Sending two lists of tensors that overlap, e.g. [a, b, c] and [c, d, e]

  2. Sending the same tensor twice
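
The two cases requested above can be illustrated without any transport at all. The following is a minimal sketch, not Ray or NIXL code: `FakeTensor` and `shared_buffers` are hypothetical names invented for illustration, and a tensor is modeled as nothing more than a name plus the address of its underlying memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor: a name and a buffer address."""
    name: str
    data_ptr: int  # address of the underlying memory (made up for this sketch)


def shared_buffers(first, second):
    """Return names of tensors in `second` whose memory also appears in `first`."""
    seen = {t.data_ptr for t in first}
    return [t.name for t in second if t.data_ptr in seen]


# Five tensors, each with its own distinct (fake) address.
a, b, c, d, e = (FakeTensor(n, 0x1000 * i) for i, n in enumerate("abcde", start=1))

# Case 1: two lists that overlap on `c`.
overlap_case = shared_buffers([a, b, c], [c, d, e])

# Case 2: the same tensor sent twice, as part of two different objects.
repeat_case = shared_buffers([a], [a])
```

In both cases the second list touches memory that the first list already covers, which is the situation the documented limitation describes.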

Contributor


We might as well do the error detection in the same PR so we can provide a code sample showing what error to expect.

Also, I think this assumes too much system knowledge from the user.

> Due to an issue with our implementation of memory deregistration,

-> "Due to a known issue"

> repeated transfers of tensors that share the same memory space but belong to different objects.

Technically this is possible, but you have to make sure to free the first object before the second. It's probably clearer with code examples, like dhyey said.
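
As a rough illustration of the "free the first object before the second" point, here is a toy model of a registration table. It is a sketch under stated assumptions, not the RDT or NIXL implementation: `ToyRegistry`, its methods, the object names, and the error message are all hypothetical, and the real error raised by Ray may look different.

```python
class ToyRegistry:
    """Hypothetical model of a transport's memory registration table."""

    def __init__(self):
        # Maps a buffer address to the id of the object that registered it.
        self._owner_by_ptr = {}

    def register(self, object_id, ptrs):
        """Register every buffer of an object, rejecting memory owned by another live object."""
        for ptr in ptrs:
            owner = self._owner_by_ptr.get(ptr)
            if owner is not None and owner != object_id:
                raise ValueError(
                    f"buffer {ptr:#x} is still registered by object {owner!r}; "
                    f"free it before sending it as part of {object_id!r}"
                )
        for ptr in ptrs:
            self._owner_by_ptr[ptr] = object_id

    def free(self, object_id):
        """Drop all registrations held by one object."""
        self._owner_by_ptr = {
            ptr: owner for ptr, owner in self._owner_by_ptr.items() if owner != object_id
        }

    def owner_of(self, ptr):
        return self._owner_by_ptr.get(ptr)


# Made-up addresses standing in for tensors a..e.
A, B, C, D, E = 0x1000, 0x2000, 0x3000, 0x4000, 0x5000

registry = ToyRegistry()
registry.register("first", [A, B, C])

# Sending [c, d, e] while "first" is still alive is rejected because of `c`.
try:
    registry.register("second", [C, D, E])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# After freeing the first object, the same transfer goes through.
registry.free("first")
registry.register("second", [C, D, E])
```

The ordering is the whole point of the sketch: the overlapping transfer only fails while the earlier object still holds its registration.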

@ray-gardener ray-gardener bot added docs An issue or change related to documentation core Issues that should be addressed in Ray Core data Ray Data-related issues labels Oct 24, 2025
cursor[bot]

This comment was marked as outdated.

Signed-off-by: Qiaolin-Yu <[email protected]>
Signed-off-by: Qiaolin-Yu <[email protected]>

@Qiaolin-Yu Qiaolin-Yu changed the title [doc][rdt] Add the limitations of rdt with nixl [doc][rdt] Add the limitations of rdt Oct 24, 2025
Signed-off-by: Qiaolin-Yu <[email protected]>

@dayshah dayshah added the go add ONLY when ready to merge, run all tests label Oct 24, 2025
Signed-off-by: Qiaolin-Yu <[email protected]>
Qiaolin-Yu and others added 2 commits October 24, 2025 15:13
Co-authored-by: Stephanie Wang <[email protected]>
Signed-off-by: Qiaolin Yu <[email protected]>
Signed-off-by: Qiaolin-Yu <[email protected]>

@dayshah dayshah merged commit f38b9d2 into ray-project:master Oct 25, 2025
6 checks passed
dayshah added a commit to dayshah/ray that referenced this pull request Oct 25, 2025
Signed-off-by: Dhyey Shah <[email protected]>
Signed-off-by: Qiaolin-Yu <[email protected]>
Signed-off-by: Qiaolin Yu <[email protected]>
Co-authored-by: Dhyey Shah <[email protected]>
Co-authored-by: Stephanie Wang <[email protected]>
aslonnie pushed a commit that referenced this pull request Oct 25, 2025
## Description
Cherry-picking #58063 to throw an exception when trying to double send
the same ref before gc because it can trigger a NIXL error. Also adding
documentation for this.
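
To make "double send the same ref before gc" concrete, here is a small sketch that ties a registration's lifetime to garbage collection. Everything in it is an assumption for illustration: `Payload`, `send`, `live_ptrs`, and the `RuntimeError` are hypothetical and do not reflect the exception type or bookkeeping that Ray actually uses. It also relies on CPython collecting the object once the last reference is dropped.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a transferred object that owns one buffer."""

    def __init__(self, ptr):
        self.ptr = ptr


# Addresses the toy transport currently considers registered.
live_ptrs = set()


def send(payload):
    """Register the payload's buffer, refusing a second send before cleanup."""
    if payload.ptr in live_ptrs:
        raise RuntimeError(
            f"buffer {payload.ptr:#x} was already sent and has not been collected yet"
        )
    live_ptrs.add(payload.ptr)
    # Drop the registration once the payload is garbage-collected.
    weakref.finalize(payload, live_ptrs.discard, payload.ptr)


first = Payload(0x1000)
send(first)

# A second send while the first registration is still alive raises.
try:
    send(first)
    second_send_failed = False
except RuntimeError:
    second_send_failed = True

# Once the object is collected, its registration disappears.
del first
gc.collect()
registration_cleared = 0x1000 not in live_ptrs

# A new object backed by the same address can now be sent.
replacement = Payload(0x1000)
send(replacement)
```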

Signed-off-by: Dhyey Shah <[email protected]>
Signed-off-by: Qiaolin-Yu <[email protected]>
Signed-off-by: Qiaolin Yu <[email protected]>
Co-authored-by: Qiaolin Yu <[email protected]>
Co-authored-by: Stephanie Wang <[email protected]>
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 27, 2025
Signed-off-by: Dhyey Shah <[email protected]>
Signed-off-by: Qiaolin-Yu <[email protected]>
Signed-off-by: Qiaolin Yu <[email protected]>
Co-authored-by: Dhyey Shah <[email protected]>
Co-authored-by: Stephanie Wang <[email protected]>
Signed-off-by: xgui <[email protected]>

Labels

core Issues that should be addressed in Ray Core data Ray Data-related issues docs An issue or change related to documentation go add ONLY when ready to merge, run all tests


3 participants