
[FEA]: Add a release() method to cuda.core.Buffer to side-step the Python garbage collector #567


Open
leofang opened this issue Apr 22, 2025 · 2 comments · May be fixed by #649
Assignees: leofang
Labels: cuda.core (Everything related to the cuda.core module), feature (New feature or request), triage (Needs the team's attention)

Comments

leofang (Member) commented Apr 22, 2025

Is this a duplicate?

Area

cuda.core

Is your feature request related to a problem? Please describe.

@realarnavgoel brought up offline that we currently register a fallback weak finalizer

weakref.finalize(buffer_obj, self.close)

in case users forget to explicitly call .close() to free the memory. This could interfere with projects such as nvmath-python and nvSHMEM, which require explicit control over the buffer lifetime and deallocation timing.
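For reference, here is a minimal sketch of the fallback-finalizer pattern described above, using a simplified stand-in for Buffer (SimpleBuffer, mr, deallocate, and ptr are illustrative names, not the actual cuda.core implementation):

    import weakref

    class SimpleBuffer:
        def __init__(self, mr, ptr):
            self._mr = mr    # memory resource that owns the allocation
            self._ptr = ptr  # device pointer (an integer or handle)
            # Fallback: if the user never calls close(), this runs when the
            # object is garbage collected. The callback must not reference
            # self, or the object could never be collected.
            self._finalizer = weakref.finalize(self, mr.deallocate, ptr)

        def close(self):
            # Explicit, deterministic deallocation path.
            self._finalizer.detach()  # the fallback is no longer needed
            if self._ptr is not None:
                self._mr.deallocate(self._ptr)
                self._ptr = None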

Describe the solution you'd like

We discussed offline and came up with this proposal: add a .release() method to the buffer object. If it is called, the weak finalizer is unregistered, so the only way to free the buffer afterwards is to call .close() explicitly.
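One way the proposed .release() could look, continuing the simplified SimpleBuffer sketch above (a hypothetical illustration, not the implementation in #649):

        def release(self):
            # Unregister the fallback finalizer without invoking it. From this
            # point on, only an explicit close() returns the memory to the
            # memory resource; dropping all references would silently leak it.
            self._finalizer.detach()

In this sketch, close() still works after release() because it deallocates directly rather than invoking the finalizer.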

Describe alternatives you've considered

No response

Additional context

No response

leofang added the cuda.core (Everything related to the cuda.core module), feature (New feature or request), and P0 (High priority - Must do!) labels Apr 22, 2025
leofang added this to the cuda.core beta 4 milestone Apr 22, 2025
leofang self-assigned this Apr 26, 2025
leofang linked a pull request May 20, 2025 that will close this issue
rwgk (Collaborator) commented May 22, 2025

I'm reviewing #649. I'm coming back here to learn more about the exact requirements, based on high-level concerns related to this comment added in #649:

After this method is called, the caller is responsible for calling :meth:close when done using this buffer, otherwise there would be a memory leak!

  • Do we really have to open this hole? In complex applications, it's only a matter of time before accidents happen.

  • If we give away ownership of mr, then ptr could become a dangling pointer.

Quoting from the issue description here:

This could interfere with projects such as nvmath-python and nvSHMEM who require explicit control over the buffer lifetime and deallocation timing.

What exactly are the requirements?

explicit control over the buffer lifetime

For keeping the buffer alive: Keeping a (Python) reference to the Buffer object will do that.

deallocation timing

Calling buf.close() will do that.

I'm trying to guess: Do you have a situation in which you cannot keep the Python reference around, and therefore cannot keep the Buffer object alive, or call any methods?

leofang (Member, Author) commented May 23, 2025

We discussed this in the Thursday design meeting this week; I'm summarizing here with more context.

Today we have two ways to trigger the destructor of Buffer, deallocating it and returning the memory to the MR:

  1. Explicitly call the .close() method, or
  2. When the refcount goes to zero, the finalizer kicks in automatically
    • This is meant to be a fallback because we do want users to call .close()

The semantics of the proposed .release() are: once it is called, the second way of deallocation is no longer available. If users don't call .close(), the memory leaks and is never returned to the MR (see the usage sketch after the next bullet).

  • In the Tuesday team meeting, I referred to this as a "std::shared_ptr::reset()-like behavior," where we have a backdoor for users to explicitly manage the object lifetime themselves, without being bound by the language constructs (C++ or Python destructors). This viewpoint was discussed and agreed upon in an internal Python/C++ meeting prior to opening this issue.
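A usage sketch of that backdoor under the proposal (use_with_explicit_lifetime is a hypothetical helper; only .release() and .close() are the methods under discussion):

    def use_with_explicit_lifetime(buf):
        # 'buf' is assumed to be a Buffer supporting the proposed API.
        buf.release()  # opt out of the GC fallback finalizer
        # ... use the buffer; dropping references no longer frees it ...
        buf.close()    # explicit close() is now the only deallocation path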

The use cases that this proposal can help with include:

  • buffer deallocation that must not rely on refcounts, since the garbage collector's triggering time can be non-deterministic
  • libraries requiring users to explicitly call a custom, library-provided free() function that wraps Buffer.close(), in which case the library's expectation is that its free() must be called

Originally nvshmem4py was meant to be the first adopter of this proposal, because its use case meets both scenarios above:

  • nvSHMEM buffers must have precise, deterministic deallocation timing due to its collective nature
  • nvshmem4py has a library-wide free() function that its users must call

However, during the design meeting we found that nvshmem4py internally holds (strong) references to the allocated buffers, and its free() is implemented by simply removing the internal reference. Therefore, it is unlikely that the proposed .release() could make any difference given the current nvshmem4py implementation, as there is no way for the buffers to reach a zero refcount and thus be garbage collected (sketched below).
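A rough sketch of the pattern just described (StrongRegistry and its methods are hypothetical names, not actual nvshmem4py code):

    class StrongRegistry:
        def __init__(self):
            self._buffers = {}  # strong references keep every Buffer alive

        def register(self, key, buf):
            self._buffers[key] = buf

        def free(self, key):
            # free() only drops the library's reference; deallocation still
            # happens through the normal refcount/finalizer (or an explicit
            # close()) path once the user's own references are gone, so the
            # proposed .release() would not change the behavior here.
            self._buffers.pop(key, None)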

I believe the use cases listed above are legitimate (for example, nvshmem4py could choose to hold weak instead of strong references, like shmem4py). However, we agreed to put this proposal on hold until other user requests appear.
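For contrast, a hypothetical weak-reference variant (loosely in the spirit of the shmem4py comparison, not actual code from either library) in which the proposed .release() would matter: dropping user references must not deallocate behind the library's back, and only the library's free() tears the buffer down.

    import weakref

    class WeakRegistry:
        def __init__(self):
            # Weak references: the registry does not keep buffers alive.
            self._buffers = weakref.WeakValueDictionary()

        def register(self, key, buf):
            buf.release()  # proposed API: disable the GC fallback
            self._buffers[key] = buf

        def free(self, key):
            buf = self._buffers.pop(key, None)
            if buf is not None:
                buf.close()  # deterministic, library-controlled teardown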

leofang added the triage (Needs the team's attention) label and removed the P0 (High priority - Must do!) label May 23, 2025