Conversation

@YiXR
Collaborator

@YiXR YiXR commented Dec 8, 2025

Description

Split the shm allocation logic into a helper class, so that applications such as SGLang can use the helper directly to create shared memory.

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

Checklist

  • I have performed a self-review of my own code.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

@gemini-code-assist
Contributor

Summary of Changes

Hello @YiXR, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the shared memory management within the Mooncake Store by introducing a dedicated ShmHelper class. This helper centralizes the logic for allocating and managing shared memory, making it more robust and reusable. A key benefit is the exposure of this functionality to Python, allowing external applications like SGLang to directly interact with the shared memory system. The changes also streamline the DummyClient's internal shared memory handling and update the IPC mechanism to align with the use of anonymous shared memory.

Highlights

  • Shared Memory Helper Class: Introduced a ShmHelper class to centralize and encapsulate shared memory allocation and management logic, promoting reusability and cleaner code.
  • Python Binding for SHM Allocation: Exposed the ShmHelper functionality to Python through a new MooncakeHostMemAllocatorPyWrapper class, enabling Python applications like SGLang to directly allocate shared memory.
  • Simplified DummyClient SHM Management: The DummyClient now leverages the ShmHelper for its shared memory operations, removing redundant allocation/deallocation logic from the client itself and simplifying its internal state.
  • Anonymous Shared Memory Adoption: Transitioned to using anonymous shared memory via memfd_create, which removes the need for explicit shared memory names in the IPC registration process, streamlining communication.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a ShmHelper singleton to centralize shared memory allocation, which is a good step towards code reuse. However, the current implementation of the singleton pattern has several critical flaws. It leaks the singleton object and the underlying shared memory resources. The allocation method is not thread-safe against multiple calls, which can lead to resource leaks. Most critically, client code (DummyClient) incorrectly manages the lifecycle of the global singleton, leading to potential use-after-free bugs. I've provided detailed comments and suggestions to address these issues by adopting a modern C++ singleton pattern and correcting the resource management logic.

LOG(ERROR) << "Failed to register SHM via IPC";
munmap(shm_base_addr_, shm_size_);
close(shm_fd_);
shm_helper_->cleanup();
Contributor

critical

Calling shm_helper_->cleanup() from DummyClient::setup_dummy on registration failure is incorrect. ShmHelper is a process-wide singleton, and one client failing should not tear down a global resource that other components (like MooncakeHostMemAllocatorPyWrapper or other DummyClient instances) might be using. This can lead to use-after-free bugs and crashes. The singleton should manage its own lifecycle. Please remove this call.

Collaborator Author

fixed

close(shm_fd_);
shm_fd_ = -1;
}
shm_helper_->cleanup();
Contributor

critical

Calling shm_helper_->cleanup() from DummyClient::tearDownAll is incorrect. ShmHelper is a process-wide singleton, and tearing down one client should not destroy a global resource that other components might still be using. This can lead to use-after-free bugs and crashes. The singleton should manage its own lifecycle and clean up its resources only when the process exits. Please remove this call.

Collaborator Author

fixed

Comment on lines 29 to 31
ShmHelper* ShmHelper::getInstance() {
    if (instance_ == nullptr) {
        std::lock_guard<std::mutex> lock(instance_mutex_);
        if (instance_ == nullptr) {
            instance_ = new ShmHelper();
        }
    }
    return instance_;
}
Contributor

high

The current singleton implementation for ShmHelper leaks memory because the instance created with new is never deleted. This also prevents the destructor from running, which would leak the shared memory file descriptor and mapping.

A better approach in modern C++ is to use a static local variable (Meyers' Singleton). This is thread-safe since C++11 and guarantees proper cleanup at program exit.

To apply this, you should also:

  1. Remove the instance_ and instance_mutex_ static members from the ShmHelper class definition in dummy_client.h.
  2. Remove their definitions at the top of dummy_client.cpp (lines 18-19).
ShmHelper* ShmHelper::getInstance() {
    static ShmHelper instance;
    return &instance;
}

Collaborator Author

fixed

}

void* ShmHelper::allocate(size_t size) {
shm_size_ = size;
Contributor

high

The ShmHelper::allocate method is not safe to be called more than once. If it's called a second time, it will overwrite the internal shm_fd_ and shm_base_addr_ without cleaning up the old resources, leading to leaked file descriptors and memory mappings. The check in DummyClient::setup_dummy is not sufficient to prevent this in a multi-threaded environment, as it creates a race condition.

To make this thread-safe and prevent resource leaks, you should add a guard inside allocate to prevent re-allocation.

    if (shm_fd_ != -1) {
        throw std::runtime_error("Shared memory has already been allocated.");
    }
    shm_size_ = size;

Collaborator Author

fixed

  #else
- errno = ENOSYS;
- return -1;
+ return -1; // Or appropriate fallback/error
Contributor

medium

The memfd_create_wrapper function no longer sets errno when __NR_memfd_create is not available. This can lead to misleading error messages in ShmHelper::allocate, as strerror(errno) will report an unrelated or old error. Please restore setting errno to ENOSYS in the else branch to ensure correct error reporting on systems that do not support memfd_create.

Suggested change
- return -1; // Or appropriate fallback/error
+ errno = ENOSYS;
+ return -1;

Collaborator Author

Keep this; it's OK.
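
The errno concern raised in this thread can be sketched as follows. This is a minimal illustration, not the PR's code: the name `memfd_create_compat` and its signature are assumptions, and the syscall path is Linux-only. The point is that the fallback branch sets `errno` itself, so a later `strerror(errno)` does not report a stale error.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical wrapper (name and signature are assumptions for illustration).
// Returns a file descriptor on success, or -1 with errno set on failure.
inline int memfd_create_compat(const char* name, unsigned int flags) {
    assert(name != nullptr);
#ifdef __NR_memfd_create
    // Use the raw syscall so this also builds against a libc that lacks
    // a memfd_create() wrapper.
    return static_cast<int>(syscall(__NR_memfd_create, name, flags));
#else
    // No syscall number at build time: report "function not implemented"
    // explicitly instead of leaving whatever errno held before.
    (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller can then build its error message from `strerror(errno)` on either branch.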

@YiXR changed the title from "[Store]: Introduce shm helper for dummy" to "feat[Store]: Introduce shm helper for dummy" on Dec 8, 2025
}

ShmHelper* ShmHelper::getInstance() {
static ShmHelper instance;
Collaborator

Why make ShmHelper a singleton?

Collaborator Author

The shm is a global memory pool that can only be initialized once. Once an application such as SGLang has called the shm alloc, the dummy client reuses that shm and does not need to allocate a new one.
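
The allocate-once, reuse-afterwards behavior described in this reply can be sketched roughly as below. Everything here is hypothetical: `PoolSketch` and `attach_or_allocate` are illustrative names, heap memory stands in for the real memfd-backed mapping, and locking is omitted for brevity. The size check on reuse is an extra assumption, not something the reply states.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for the process-wide pool. Heap memory replaces the
// real shared mapping so the sketch stays portable. Not thread-safe.
class PoolSketch {
public:
    static PoolSketch& instance() {
        static PoolSketch pool;  // created on first use, destroyed at exit
        return pool;
    }

    bool allocated() const { return base_ != nullptr; }
    void* base() const { return base_.get(); }
    std::size_t size() const { return size_; }

    // Only the first caller may allocate; later calls are rejected.
    void* allocate(std::size_t size) {
        if (allocated()) throw std::runtime_error("pool already allocated");
        base_ = std::make_unique<unsigned char[]>(size);
        size_ = size;
        return base_.get();
    }

private:
    PoolSketch() = default;
    std::unique_ptr<unsigned char[]> base_;
    std::size_t size_ = 0;
};

// What a later component might do: allocate if it is first, otherwise attach
// to the existing region and verify that its size matches the request.
inline void* attach_or_allocate(std::size_t requested) {
    assert(requested > 0);
    PoolSketch& pool = PoolSketch::instance();
    if (!pool.allocated()) return pool.allocate(requested);
    if (pool.size() != requested) throw std::runtime_error("pool size mismatch");
    return pool.base();
}
```

With this shape, an application that allocates first and a client that attaches later both end up with the same base address.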

py::gil_scoped_release release;
if (!self.shm_helper_) {
LOG(ERROR) << "Shared memory allocator is not initialized";
return static_cast<uintptr_t>(0);
Collaborator

Raising a fatal error may be better.

Collaborator Author

fixed


py::class_<MooncakeHostMemAllocatorPyWrapper>(m, "MooncakeHostMemAllocator")
.def(py::init<>())
.def("alloc", [](MooncakeHostMemAllocatorPyWrapper &self, size_t size) {
Collaborator

A free API is also needed.

Collaborator Author

fixed

shm_size_ = 0;
}

void* ShmHelper::allocate(size_t size) {
Collaborator

This needs a mutex guard.

Collaborator Author

fixed

Comment on lines +31 to +33
int shm_fd_ = -1;
void *shm_base_addr_ = nullptr;
size_t shm_size_ = 0;
Collaborator

Do we need an optional shm_name here?

Collaborator Author

Here we use anonymous shm, so the shm name can no longer serve as the identifier and we don't use it anymore.


void ShmHelper::cleanup() {
if (shm_base_addr_) {
munmap(shm_base_addr_, shm_size_);
Collaborator

munmap failure requires error handling.

Collaborator Author

fixed
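
One possible shape for the munmap error handling requested in this thread is sketched below. The helper name `unmap_region` is an assumption, and `std::cerr` stands in for the project's `LOG(ERROR)` macro so the example compiles on its own; how the merged code actually reports the failure is not shown in this conversation.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <iostream>
#include <sys/mman.h>

// Hypothetical helper: unmaps a region and reports a failure instead of
// ignoring it. Returns true on success; on failure it logs the reason and
// leaves errno set to the munmap error.
inline bool unmap_region(void* addr, std::size_t size) {
    assert(addr != MAP_FAILED);
    if (munmap(addr, size) == -1) {
        const int err = errno;  // save before any I/O can overwrite it
        std::cerr << "munmap failed: " << std::strerror(err) << '\n';
        errno = err;
        return false;
    }
    return true;
}
```

A cleanup routine could use the return value to decide whether to reset its stored address and size.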

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a new ShmHelper singleton class to extract and centralize shared memory allocation logic from DummyClient, making it reusable by external applications like SGLang. The refactoring simplifies the IPC protocol by removing the shm_name parameter (now generated internally by RealClient) and exposes the helper through Python bindings via MooncakeHostMemAllocator.

Key Changes

  • Introduced ShmHelper singleton class for shared memory allocation using memfd_create
  • Refactored DummyClient::setup_dummy() to use ShmHelper instead of inline allocation
  • Removed shm_name from ShmRegisterRequest struct and map_shm_internal() signature
  • Added Python bindings for MooncakeHostMemAllocatorPyWrapper to expose shared memory allocation

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| mooncake-store/include/dummy_client.h | Defines new ShmHelper singleton class with allocation/cleanup methods and updates DummyClient to use ShmHelper* instead of shm_name |
| mooncake-store/src/dummy_client.cpp | Implements ShmHelper with memfd-based allocation, refactors setup_dummy() to use singleton, simplifies tearDownAll() cleanup |
| mooncake-store/include/real_client.h | Updates map_shm_internal() signature to remove shm_name parameter |
| mooncake-store/src/real_client.cpp | Generates shm_name internally in map_shm_internal(), removes formatting-only changes to log statements |
| mooncake-store/include/pyclient.h | Removes shm_name field from ShmRegisterRequest struct |
| mooncake-integration/store/store_py.cpp | Adds MooncakeHostMemAllocatorPyWrapper class to expose ShmHelper to Python with alloc() method |


close(shm_fd_);
shm_fd_ = -1;
shm_base_addr_ = nullptr;
shm_size_ = 0;
Copilot AI Dec 10, 2025

When IPC registration fails, the code resets the local member variables but doesn't clean up the allocated shared memory in the singleton ShmHelper. Since ShmHelper is a singleton with process-wide lifetime, the allocated memory will remain until process termination. Consider calling shm_helper_->cleanup() on failure, or document this behavior if intentional.

Suggested change
- shm_size_ = 0;
+ shm_size_ = 0;
+ if (shm_helper_) {
+     shm_helper_->cleanup();
+ }

Collaborator Author

We do not clean up here, because shm_helper_ is still valid.

Comment on lines 47 to 90
void* ShmHelper::allocate(size_t size) {
    if (shm_fd_ != -1) {
        throw std::runtime_error("Shared memory has already been allocated.");
    }

    shm_size_ = size;

    // Create memfd
    shm_fd_ = memfd_create_wrapper("mooncake_shm", MFD_CLOEXEC);
    if (shm_fd_ == -1) {
        throw std::runtime_error("Failed to create anonymous shared memory: " +
                                 std::string(strerror(errno)));
    }

    // Set size
    if (ftruncate(shm_fd_, shm_size_) == -1) {
        close(shm_fd_);
        throw std::runtime_error("Failed to set shared memory size: " +
                                 std::string(strerror(errno)));
    }

    // Map memory
    shm_base_addr_ = mmap(nullptr, shm_size_, PROT_READ | PROT_WRITE,
                          MAP_SHARED, shm_fd_, 0);
    if (shm_base_addr_ == MAP_FAILED) {
        close(shm_fd_);
        throw std::runtime_error("Failed to map shared memory: " +
                                 std::string(strerror(errno)));
    }

    return shm_base_addr_;
}
Copilot AI Dec 10, 2025

The allocate method is not thread-safe. Multiple threads could concurrently call this method on the singleton instance, leading to race conditions when checking shm_fd_ != -1 and when modifying member variables. Consider adding a mutex to protect the allocation process, or document that this method should only be called once from a single thread.

Collaborator Author

fixed
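
The thread-safety and error-path comments on `allocate` can be combined into one rough sketch. The class below is hypothetical (`ShmRegionSketch` and its members are not the PR's code) and assumes Linux with glibc 2.27 or newer for `memfd_create`. It holds a mutex for the whole allocation and writes the descriptor, address, and size to the members only after every step has succeeded, so a failed call leaves the object unchanged.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical illustration only; names do not come from the PR.
class ShmRegionSketch {
public:
    ShmRegionSketch() = default;
    ShmRegionSketch(const ShmRegionSketch&) = delete;
    ShmRegionSketch& operator=(const ShmRegionSketch&) = delete;

    ~ShmRegionSketch() {
        if (base_ != nullptr) munmap(base_, size_);
        if (fd_ != -1) close(fd_);
    }

    void* allocate(std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);  // serialize concurrent callers
        if (fd_ != -1) {
            throw std::runtime_error("Shared memory has already been allocated.");
        }
        assert(base_ == nullptr && size_ == 0);  // state is all-or-nothing

        int fd = memfd_create("sketch_shm", MFD_CLOEXEC);
        if (fd == -1) fail("memfd_create", errno);

        if (ftruncate(fd, static_cast<off_t>(size)) == -1) {
            const int err = errno;  // close() may overwrite errno
            close(fd);
            fail("ftruncate", err);
        }

        void* addr = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (addr == MAP_FAILED) {
            const int err = errno;
            close(fd);
            fail("mmap", err);
        }

        // Commit only after every step succeeded.
        fd_ = fd;
        base_ = addr;
        size_ = size;
        return base_;
    }

private:
    [[noreturn]] static void fail(const char* step, int err) {
        throw std::runtime_error(std::string(step) + " failed: " + std::strerror(err));
    }

    std::mutex mutex_;
    int fd_ = -1;
    void* base_ = nullptr;
    std::size_t size_ = 0;
};
```

Working on local variables until the final commit is one way to avoid resetting members in each error branch; resetting `shm_fd_` and `shm_size_` in place, as suggested in the review, is an equally valid choice.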

shm_base_addr_ = mmap(nullptr, shm_size_, PROT_READ | PROT_WRITE,
MAP_SHARED, shm_fd_, 0);
if (shm_base_addr_ == MAP_FAILED) {
close(shm_fd_);
Copilot AI Dec 10, 2025

When mmap fails, the function closes shm_fd_ but doesn't reset it to -1 or reset shm_size_ before throwing. This leaves the object in an inconsistent state. Add shm_fd_ = -1; and shm_size_ = 0; after close(shm_fd_); to ensure proper cleanup on error.

Suggested change
- close(shm_fd_);
+ close(shm_fd_);
+ shm_fd_ = -1;
+ shm_size_ = 0;

Collaborator Author

fixed

Comment on lines 677 to 680
if (!self.shm_helper_) {
LOG(ERROR) << "Shared memory allocator is not initialized";
return static_cast<uintptr_t>(0);
}
Copilot AI Dec 10, 2025

Similar to the previous issue, the null check if (!self.shm_helper_) will never be true because ShmHelper::getInstance() always returns a non-null pointer. This check can be removed, or if you want to handle initialization failures, this should be checked after the getInstance() call in the constructor.

Suggested change
- if (!self.shm_helper_) {
-     LOG(ERROR) << "Shared memory allocator is not initialized";
-     return static_cast<uintptr_t>(0);
- }

Collaborator Author

fixed


// Set size
if (ftruncate(shm_fd_, shm_size_) == -1) {
close(shm_fd_);
Copilot AI Dec 10, 2025

When ftruncate fails, the function closes shm_fd_ but doesn't reset it to -1 before throwing. This leaves shm_fd_ in an invalid state (pointing to a closed file descriptor). Add shm_fd_ = -1; after close(shm_fd_); to maintain consistency with cleanup() pattern.

Suggested change
- close(shm_fd_);
+ close(shm_fd_);
+ shm_fd_ = -1;

Collaborator Author

fixed

Comment on lines 301 to 304
if (!shm_helper_) {
LOG(ERROR) << "Failed to get shared memory allocator";
return -1;
}
Copilot AI Dec 10, 2025

The null check if (!shm_helper_) is unnecessary because ShmHelper::getInstance() always returns a non-null pointer to a static instance. This check will never be true and can be removed.

Suggested change
- if (!shm_helper_) {
-     LOG(ERROR) << "Failed to get shared memory allocator";
-     return -1;
- }

Collaborator Author

fixed

Comment on lines 309 to 311
shm_base_addr_ = shm_helper_->allocate(shm_size_);
if (shm_base_addr_ == MAP_FAILED) {
LOG(ERROR) << "Failed to allocate shared memory";
Copilot AI Dec 10, 2025

The check if (shm_base_addr_ == MAP_FAILED) will never be true because allocate() throws an exception on failure and never returns MAP_FAILED. Either handle the exception with a try-catch block, or remove this check. Currently, if allocation fails, the exception will propagate and the error log won't be printed.

Suggested change
- shm_base_addr_ = shm_helper_->allocate(shm_size_);
- if (shm_base_addr_ == MAP_FAILED) {
-     LOG(ERROR) << "Failed to allocate shared memory";
+ try {
+     shm_base_addr_ = shm_helper_->allocate(shm_size_);
+ } catch (const std::exception& e) {
+     LOG(ERROR) << "Failed to allocate shared memory: " << e.what();

Collaborator Author

fixed

Comment on lines 675 to 683
.def("alloc", [](MooncakeHostMemAllocatorPyWrapper &self, size_t size) {
    py::gil_scoped_release release;
    if (!self.shm_helper_) {
        LOG(ERROR) << "Shared memory allocator is not initialized";
        return static_cast<uintptr_t>(0);
    }
    void *ptr = self.shm_helper_->allocate(size);
    return reinterpret_cast<uintptr_t>(ptr);
});
Copilot AI Dec 10, 2025

The alloc method should handle exceptions from shm_helper_->allocate(size), which can throw std::runtime_error. Currently, if allocation fails, an exception will propagate to Python. Consider wrapping the call in a try-catch block to log the error and return 0, or let the exception propagate but document this behavior.

Collaborator Author

The Python side cannot proceed without this memory, so the call must throw an error on failure.

} else {
// Shared memory already allocated, just get the base address and size
shm_base_addr_ = shm_helper_->get_base_addr();
shm_size_ = shm_helper_->get_size();
Copilot AI Dec 10, 2025

The branch logic assumes that if shared memory is already allocated, the existing allocation size (shm_helper_->get_size()) matches the requested size (local_buffer_size + mem_pool_size). However, if setup_dummy is called with different parameters on subsequent calls, this could lead to mismatches between the actual buffer size and the expected size. Consider validating that shm_size_ matches the requested size, or document that the first allocation parameters are permanent.

Suggested change
- shm_size_ = shm_helper_->get_size();
+ shm_size_ = shm_helper_->get_size();
+ size_t requested_size = local_buffer_size + mem_pool_size;
+ if (shm_size_ != requested_size) {
+     LOG(ERROR) << "Shared memory size mismatch: existing size (" << shm_size_
+                << ") does not match requested size (" << requested_size << ")";
+     return -1;
+ }

Collaborator Author

fixed

@stmatengss stmatengss merged commit 5c3d04f into kvcache-ai:main Dec 11, 2025
14 checks passed
3 participants