Contributor

@AscendTransport AscendTransport commented Nov 27, 2025

Description

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

Checklist

  • I have performed a self-review of my own code.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

@gemini-code-assist
Contributor

Summary of Changes

Hello @AscendTransport, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the Mooncake Transfer Engine by introducing a new HeterogeneousTcpTransport layer. This addition enables data transfer between Ascend NPUs and GPUs over TCP, complementing existing transport methods. The changes involve updating the build configuration, integrating the new transport into the multi-transport system, and implementing the necessary memory management and transfer logic for the TCP-based heterogeneous communication.

Highlights

  • New Heterogeneous TCP Transport: Introduced HeterogeneousTcpTransport to facilitate data transfer between Ascend NPUs and GPUs using TCP.
  • Build System Integration: Added a new CMake option USE_ASCEND_HETEROGENEOUS_TCP and integrated it into the build system for conditional compilation and linking.
  • Transfer Engine Updates: Modified the core TransferEngine and MultiTransport components to recognize and utilize the new TCP-based heterogeneous transport.
  • Memory Management for Transfers: Implemented mechanisms within HeterogeneousTcpTransport for managing host and device memory, including asynchronous memory copies, to optimize data movement.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new HeterogeneousTcpTransport to enable communication between Ascend NPUs and GPUs over TCP. The changes are extensive, touching numerous CMake files to integrate the new build option and source files, and adding the new transport's header and implementation. My review has identified several critical issues, including syntax errors in C++ code and logical flaws in CMake scripts that are likely to cause build failures. Additionally, I've noted a high-severity performance issue related to busy-waiting and multiple medium-severity concerns regarding code maintainability, style, and potential copy-paste errors. I have provided specific code suggestions to address these findings.

Comment on lines +241 to +245
#ifdef USE_ASCEND_HETEROGENEOUS_TCP
    else if (std::string(proto) == "ascend") {
        transport = new HeterogeneousTcpTransport();
    }
#endif
Contributor

critical

This block continues a pattern of broken if/else if chains due to the use of separate #ifdef blocks. If both USE_ASCEND_HETEROGENEOUS and USE_ASCEND_HETEROGENEOUS_TCP are defined, this will cause a compile error. The logic for selecting the 'ascend' transport should be consolidated into a single else if block with nested #if/#elif/#endif preprocessor directives to correctly handle mutual exclusivity and prevent build failures.
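
A minimal sketch of the consolidated form, assuming the RDMA variant takes precedence when both macros are defined. The transport classes below are hypothetical stand-ins, `makeTransport` is an illustrative name rather than the engine's actual function, and the macro is defined inline only so the sketch compiles on its own; a real build would set it from CMake.

```cpp
#include <memory>
#include <string>

// Defined here only so the sketch is self-contained; normally set by the build.
#define USE_ASCEND_HETEROGENEOUS_TCP 1

// Hypothetical stand-ins for the real transport classes.
struct Transport {
    virtual ~Transport() = default;
    virtual const char* name() const = 0;
};
struct TcpTransport : Transport {
    const char* name() const override { return "tcp"; }
};
struct HeterogeneousRdmaTransport : Transport {
    const char* name() const override { return "ascend-rdma"; }
};
struct HeterogeneousTcpTransport : Transport {
    const char* name() const override { return "ascend-tcp"; }
};

// Illustrative factory: one `else if` per protocol string, with the
// build-flag choice nested inside it.
std::unique_ptr<Transport> makeTransport(const std::string& proto) {
    if (proto == "tcp") {
        return std::make_unique<TcpTransport>();
    }
#if defined(USE_ASCEND_HETEROGENEOUS) || defined(USE_ASCEND_HETEROGENEOUS_TCP)
    else if (proto == "ascend") {
#if defined(USE_ASCEND_HETEROGENEOUS)
        // Assumption: the RDMA variant wins when both flags are set.
        return std::make_unique<HeterogeneousRdmaTransport>();
#else
        return std::make_unique<HeterogeneousTcpTransport>();
#endif
    }
#endif
    return nullptr;  // unknown or disabled protocol
}
```

With a single `else if` per protocol string, the chain stays well formed whichever combination of build flags is enabled.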

Contributor Author

This is kept consistent with the preceding branch.

  free_list_.resize(kSlabSizeKBTabLen);
  #if !defined(USE_ASCEND) && !defined(USE_ASCEND_DIRECT) && \
-     !defined(USE_ASCEND_HETEROGENEOUS)
+     !defined(USE_ASCEND_HETEROGENEOUS) && && !defined(USE_ASCEND_HETEROGENEOUS_TCP)
Contributor

critical

There is a syntax error in the preprocessor directive. The && && is a typo and will cause a compilation failure. It should be a single &&.

Suggested change
- !defined(USE_ASCEND_HETEROGENEOUS) && && !defined(USE_ASCEND_HETEROGENEOUS_TCP)
+ !defined(USE_ASCEND_HETEROGENEOUS) && !defined(USE_ASCEND_HETEROGENEOUS_TCP)

Contributor Author

fixed

Comment on lines 51 to 55
if (transferQueues_.empty()) {
transfer_cond_.wait(lock, [this] { return !transferQueues_.empty() || !running_; });
if (transferQueues_.empty()) {
continue;
}
Contributor

critical

This block of code for waiting on the transfer queue is syntactically incorrect and will not compile. The if statement is unnecessary and its structure is malformed. The correct pattern is to call wait on the condition variable directly with a predicate. This also correctly handles spurious wakeups.

        transfer_cond_.wait(lock, [this] { return !transferQueues_.empty() || !running_; });
        if (transferQueues_.empty()) {
            continue;
        }
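
For context, the predicate form of `wait` can be sketched in a small worker loop. `TransferWorker`, `submit`, `stop`, and `processed` are hypothetical names, the `int` tasks are placeholders, and draining the queue on shutdown is an assumption, not necessarily what the transport does.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical worker that mirrors the member names in the snippet above.
class TransferWorker {
public:
    TransferWorker() : worker_([this] { run(); }) {}
    ~TransferWorker() { stop(); }

    void submit(int task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            transferQueues_.push(task);
        }
        transfer_cond_.notify_one();
    }

    // Lets the worker drain queued tasks, then joins it.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            running_ = false;
        }
        transfer_cond_.notify_all();
        if (worker_.joinable()) worker_.join();
    }

    int processed() {
        std::lock_guard<std::mutex> lock(mutex_);
        return processed_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (true) {
            // The predicate is re-checked after every wakeup, so spurious
            // wakeups simply go back to sleep.
            transfer_cond_.wait(lock, [this] {
                return !transferQueues_.empty() || !running_;
            });
            if (transferQueues_.empty()) break;  // only reachable when stopping
            transferQueues_.pop();
            lock.unlock();
            // The actual transfer work would run here, outside the lock.
            lock.lock();
            ++processed_;
        }
    }

    std::mutex mutex_;
    std::condition_variable transfer_cond_;
    std::queue<int> transferQueues_;
    bool running_ = true;
    int processed_ = 0;
    std::thread worker_;  // declared last so it starts after the other members
};
```

Both the queue and the `running_` flag are only touched while holding the mutex, which is what makes the predicate check safe.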

Comment on lines +32 to 35
if (USE_ASCEND_HETEROGENEOUS OR USE_ASCEND_HETEROGENEOUS_TCP)
  add_subdirectory(ascend_transport)
  target_sources(transport PUBLIC $<TARGET_OBJECTS:ascend_transport>)
endif()
Contributor

critical

This if block is redundant. The if block on lines 27-30 already handles adding the ascend_transport subdirectory for other USE_ASCEND flags. If any of those flags are enabled along with USE_ASCEND_HETEROGENEOUS or USE_ASCEND_HETEROGENEOUS_TCP, add_subdirectory(ascend_transport) will be called twice, causing a CMake error. These conditions should be combined into a single if block.

Comment on lines 413 to 415
while (transfer_counter_.load() < usedHugeDevNum) {
    std::this_thread::yield();
}
Contributor

high

This while loop implements a busy-wait using std::this_thread::yield(). This is inefficient and can consume significant CPU resources, especially if the wait is long. A more efficient approach would be to use a std::condition_variable or a C++20 std::counting_semaphore to wait for the transfer tasks to complete without busy-looping.
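
One possible shape for the condition-variable approach is sketched below. `CompletionCounter`, `markDone`, `waitFor`, and `runAndWait` are hypothetical names introduced for illustration; the real code would call the equivalent of `markDone` wherever it currently increments `transfer_counter_`.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: counts finished tasks and lets a caller block
// until a target count is reached, without spinning.
class CompletionCounter {
public:
    void markDone() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++done_;
        }
        cond_.notify_all();
    }

    void waitFor(std::size_t expected) {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [&] { return done_ >= expected; });
    }

    std::size_t done() {
        std::lock_guard<std::mutex> lock(mutex_);
        return done_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::size_t done_ = 0;
};

// Illustrative driver: starts `taskCount` trivial tasks and sleeps until
// all of them have reported completion.
inline std::size_t runAndWait(std::size_t taskCount) {
    CompletionCounter counter;
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < taskCount; ++i) {
        workers.emplace_back([&counter] { counter.markDone(); });
    }
    counter.waitFor(taskCount);  // blocks instead of yielding in a loop
    const std::size_t finished = counter.done();
    for (auto& t : workers) t.join();
    return finished;
}
```

The waiting thread is descheduled until a worker signals, so it consumes no CPU time while transfers are in flight.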

Comment on lines +185 to 197
+ if (USE_ASCEND_HETEROGENEOUS OR USE_ASCEND_HETEROGENEOUS_TCP)
    file(GLOB ASCEND_TOOLKIT_ROOT "/usr/local/Ascend/ascend-toolkit/latest/*-linux")
    set(ASCEND_LIB_DIR "${ASCEND_TOOLKIT_ROOT}/lib64")
    set(ASCEND_INCLUDE_DIR "${ASCEND_TOOLKIT_ROOT}/include")
-   add_compile_definitions(USE_ASCEND_HETEROGENEOUS)
+   if (USE_ASCEND_HETEROGENEOUS)
+     add_compile_definitions(USE_ASCEND_HETEROGENEOUS)
+   endif()
+   if (USE_ASCEND_HETEROGENEOUS_TCP)
+     add_compile_definitions(USE_ASCEND_HETEROGENEOUS_TCP)
+   endif()
    include_directories(/usr/local/include /usr/include ${ASCEND_INCLUDE_DIR})
    link_directories(${ASCEND_LIB_DIR})
  endif()
Contributor

medium

There is duplicated logic for setting up the Ascend toolkit across multiple CMake files (e.g., mooncake-transfer-engine/CMakeLists.txt, mooncake-transfer-engine/src/CMakeLists.txt). To improve maintainability and avoid inconsistencies, this configuration should be centralized in this common file. The other files should not repeat this setup.

return -1;
}

hostAddr_ = static_cast<char*>(aligned_alloc(64, HUGE_HOST_SIZE));
Contributor

medium

The aligned_alloc function is part of the C11 standard. Since this is a C++20 project, it would be more idiomatic to use std::aligned_alloc from the <cstdlib> header. Note that memory allocated with std::aligned_alloc must also be deallocated with std::free.
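
As a rough illustration, the allocation can be paired with its deallocation through a `std::unique_ptr` with a custom deleter. `FreeDeleter`, `AlignedBuffer`, and `allocateAligned` are hypothetical names; the sketch rounds the size up because the standard has required `size` to be a multiple of `alignment`, and it assumes a toolchain that provides `std::aligned_alloc` (MSVC does not).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Releases memory obtained from std::aligned_alloc.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

using AlignedBuffer = std::unique_ptr<char, FreeDeleter>;

// Hypothetical helper: `alignment` must be a power of two supported by the
// implementation. Returns an empty pointer if the allocation fails.
inline AlignedBuffer allocateAligned(std::size_t alignment, std::size_t size) {
    // Round the size up to the next multiple of the alignment.
    const std::size_t rounded = (size + alignment - 1) / alignment * alignment;
    return AlignedBuffer(
        static_cast<char*>(std::aligned_alloc(alignment, rounded)));
}
```

Tying `std::free` to the pointer type means the buffer is released on every exit path, including the early `return -1` branches.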

Comment on lines 187 to 188
LOG(ERROR) << "rdma transport registerLocalMemory error, ret: "
<< ret;
Contributor

medium

The log message mentions 'rdma transport', but this is the HeterogeneousTcpTransport. This appears to be a copy-paste error and should be corrected to 'tcp transport' for clarity.

            LOG(ERROR) << "tcp transport registerLocalMemory error, ret: "
                       << ret;

Comment on lines 207 to 208
LOG(ERROR) << "rdma transport unregisterLocalMemory error, ret: "
<< ret;
Contributor

medium

The log message mentions 'rdma transport', but this is the HeterogeneousTcpTransport. This appears to be a copy-paste error and should be corrected to 'tcp transport' for clarity.

            LOG(ERROR) << "tcp transport unregisterLocalMemory error, ret: "
                       << ret;

Comment on lines 25 to 27
#define HUGE_HOST_SIZE (3ULL * 1024 * 1024 * 1024)
#define HUGE_DEVICE_SIZE (8 * 1024 * 1024)
#define HUGE_DEVICE_NUM 4
Contributor

medium

In modern C++, it is preferred to use constexpr for compile-time constants instead of #define macros. constexpr provides type safety and is scoped, which avoids potential issues with macro expansion.

Suggested change
- #define HUGE_HOST_SIZE (3ULL * 1024 * 1024 * 1024)
- #define HUGE_DEVICE_SIZE (8 * 1024 * 1024)
- #define HUGE_DEVICE_NUM 4
+ constexpr uint64_t HUGE_HOST_SIZE = 3ULL * 1024 * 1024 * 1024;
+ constexpr uint64_t HUGE_DEVICE_SIZE = 8 * 1024 * 1024;
+ constexpr int HUGE_DEVICE_NUM = 4;
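
One side benefit of typed constants is that they can be checked at compile time. The sketch below is illustrative only: the namespace name is hypothetical, and the 64-byte check is an assumption based on the `aligned_alloc(64, HUGE_HOST_SIZE)` call elsewhere in this change.

```cpp
#include <cstdint>

namespace hetero_tcp_sketch {  // hypothetical namespace for this illustration

constexpr std::uint64_t HUGE_HOST_SIZE = 3ULL * 1024 * 1024 * 1024;
constexpr std::uint64_t HUGE_DEVICE_SIZE = 8ULL * 1024 * 1024;
constexpr int HUGE_DEVICE_NUM = 4;

// Assumed invariant: the host buffer size is a multiple of the 64-byte
// alignment passed to aligned_alloc.
static_assert(HUGE_HOST_SIZE % 64 == 0,
              "host buffer size must be a multiple of the alignment");

}  // namespace hetero_tcp_sketch
```

A macro would expand textually at each use site, whereas these constants carry a fixed type and respect namespace scope.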

@AscendTransport force-pushed the ascend-tcp-agg branch 3 times, most recently from d513b3a to 7608bd5 on November 27, 2025 08:08

Status getTransferStatus(BatchID batch_id, size_t task_id,
TransferStatus &status) override;

private:
Collaborator

What is this change for?

Contributor Author

Similar to rdma.h, for heterogeneous computing cards, some interfaces need to be exposed for external calls.

What is this change for?

Collaborator

What's the difference between ascend_transport and tcp_transport?

Contributor Author

  1. This transport supports heterogeneous devices, whereas ascend_transport does not.
  2. This pull request only targets environments without RDMA hardware, which is why it is placed in this directory.
