
[FEA]: Add a release() method to cuda.core.Buffer to side-step the Python garbage collector #567


Open
leofang opened this issue Apr 22, 2025 · 2 comments · May be fixed by #649
Assignees: leofang
Labels: cuda.core (Everything related to the cuda.core module), feature (New feature or request), triage (Needs the team's attention)

Comments

leofang (Member) commented Apr 22, 2025

Is this a duplicate?

Area

cuda.core

Is your feature request related to a problem? Please describe.

@realarnavgoel brought up offline that we currently register a fallback weak finalizer

weakref.finalize(buffer_obj, self.close)

in case users forget to explicitly call .close() to free the memory. This could interfere with projects such as nvmath-python and nvSHMEM, which require explicit control over the buffer lifetime and deallocation timing.
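For reference, here is a minimal sketch of the fallback-finalizer pattern described above, using a simplified stand-in for Buffer (SimpleBuffer, mr, deallocate, and ptr are illustrative names, not the actual cuda.core implementation):

    import weakref

    class SimpleBuffer:
        def __init__(self, mr, ptr):
            self._mr = mr    # memory resource that owns the allocation
            self._ptr = ptr  # device pointer (an integer or handle)
            # Fallback: if the user never calls close(), this runs when the
            # object is garbage collected. The callback must not reference
            # self, or the object could never be collected.
            self._finalizer = weakref.finalize(self, mr.deallocate, ptr)

        def close(self):
            # Explicit, deterministic deallocation path.
            self._finalizer.detach()  # the fallback is no longer needed
            if self._ptr is not None:
                self._mr.deallocate(self._ptr)
                self._ptr = None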

Describe the solution you'd like

We discussed offline and came up with this proposal: add a .release() method to the buffer object. If it is called, the weak finalizer is unregistered, so the only way to free the buffer afterwards is to call .close() explicitly.
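One way the proposed .release() could look, continuing the simplified SimpleBuffer sketch above (a hypothetical illustration, not the implementation in #649):

        def release(self):
            # Unregister the fallback finalizer without invoking it. From this
            # point on, only an explicit close() returns the memory to the
            # memory resource; dropping all references would silently leak it.
            self._finalizer.detach()

In this sketch, close() still works after release() because it deallocates directly rather than invoking the finalizer.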

Describe alternatives you've considered

No response

Additional context

No response

leofang added the cuda.core (Everything related to the cuda.core module), feature (New feature or request), and P0 (High priority - Must do!) labels Apr 22, 2025
leofang added this to the cuda.core beta 4 milestone Apr 22, 2025
leofang self-assigned this Apr 26, 2025
leofang linked a pull request May 20, 2025 that will close this issue
rwgk (Collaborator) commented May 22, 2025

I'm reviewing #649. I'm coming back here to learn more about the exact requirements, based on high-level concerns related to this comment added in #649:

After this method is called, the caller is responsible for calling :meth:close when done using this buffer, otherwise there would be a memory leak!

  • Do we really have to open this hole? In complex applications, it's only a matter of time before accidents happen.

  • If we give away ownership of mr, then ptr could become a dangling pointer.

Quoting from the issue description here:

This could interfere with projects such as nvmath-python and nvSHMEM who require explicit control over the buffer lifetime and deallocation timing.

What exactly are the requirements?

explicit control over the buffer lifetime

For keeping the buffer alive: Keeping a (Python) reference to the Buffer object will do that.

deallocation timing

Calling buf.close() will do that.

I'm trying to guess: Do you have a situation in which you cannot keep the Python reference around, and therefore cannot keep the Buffer object alive, or call any methods?

leofang (Member, Author) commented May 23, 2025

We discussed this in the Thursday design meeting this week; I'm summarizing here with more context.

Today we have two ways to trigger the destructor of Buffer, deallocating it and returning the memory to the MR:

  1. Explicitly call the .close() method, or
  2. When the refcount goes to zero, the finalizer kicks in automatically
    • This is meant to be a fallback because we do want users to call .close()

The semantics of the proposed .release() are: once it is called, the second way of deallocation is no longer available. If users don't call .close(), the memory leaks and is never returned to the MR (see the usage sketch after the next bullet).

  • In the Tuesday team meeting, I referred to this as a "std::shared_ptr::reset()-like behavior," where we have a backdoor for users to explicitly manage the object lifetime themselves, without being bound by the language constructs (C++ or Python destructors). This viewpoint was discussed and agreed upon in an internal Python/C++ meeting prior to opening this issue.
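A usage sketch of that backdoor under the proposal (use_with_explicit_lifetime is a hypothetical helper; only .release() and .close() are the methods under discussion):

    def use_with_explicit_lifetime(buf):
        # 'buf' is assumed to be a Buffer supporting the proposed API.
        buf.release()  # opt out of the GC fallback finalizer
        # ... use the buffer; dropping references no longer frees it ...
        buf.close()    # explicit close() is now the only deallocation path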

The use cases that this proposal can help with include:

  • buffer deallocation that must not rely on refcounts, since the garbage collector's triggering time can be non-deterministic
  • libraries requiring users to explicitly call a custom, library-provided free() function that wraps Buffer.close(), in which case the library's expectation is that its free() must be called

Originally nvshmem4py was meant to be the first adopter of this proposal, because its use case meets both scenarios above:

  • nvSHMEM buffers must have precise, deterministic deallocation timing due to its collective nature
  • nvshmem4py has a library-wide free() function that its users must call

However, during the design meeting we found that nvshmem4py internally holds (strong) references to the allocated buffers, and its free() is implemented by simply removing the internal reference. Therefore, it is unlikely that the proposed .release() could make any difference given the current nvshmem4py implementation, as there is no way for the buffers to reach a zero refcount and thus be garbage collected (sketched below).
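A rough sketch of the pattern just described (StrongRegistry and its methods are hypothetical names, not actual nvshmem4py code):

    class StrongRegistry:
        def __init__(self):
            self._buffers = {}  # strong references keep every Buffer alive

        def register(self, key, buf):
            self._buffers[key] = buf

        def free(self, key):
            # free() only drops the library's reference; deallocation still
            # happens through the normal refcount/finalizer (or an explicit
            # close()) path once the user's own references are gone, so the
            # proposed .release() would not change the behavior here.
            self._buffers.pop(key, None)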

I believe the use cases listed above are legitimate (for example, nvshmem4py could choose to hold weak instead of strong references, like shmem4py). However, we agreed to put this proposal on hold until other user requests appear.
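For contrast, a hypothetical weak-reference variant (loosely in the spirit of the shmem4py comparison, not actual code from either library) in which the proposed .release() would matter: dropping user references must not deallocate behind the library's back, and only the library's free() tears the buffer down.

    import weakref

    class WeakRegistry:
        def __init__(self):
            # Weak references: the registry does not keep buffers alive.
            self._buffers = weakref.WeakValueDictionary()

        def register(self, key, buf):
            buf.release()  # proposed API: disable the GC fallback
            self._buffers[key] = buf

        def free(self, key):
            buf = self._buffers.pop(key, None)
            if buf is not None:
                buf.close()  # deterministic, library-controlled teardown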

leofang added the triage (Needs the team's attention) label and removed the P0 (High priority - Must do!) label May 23, 2025