TOOLS/DEVICE: support channel id in perftest #10993
base: master
Conversation
Walkthrough
Added a device endpoint channel count parameter and propagated it through perf test initialization into the CUDA context and kernel code. The CUDA runtime now computes a per-thread channel_id (threadIdx.x % num_channels) and uses it in device_put_* calls; device-side arrays are copied via a new device_vector helper.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Config as Config (libperf)
    participant Test as Test Init
    participant Host as Host runtime
    participant GPU as CUDA Kernel
    participant NIC as Device ops
    Config->>Test: device_ep_channel_count
    Test->>Host: populate params (num_channels)
    Host->>Host: device_vector(copy indices/offsets/lengths)
    Host->>GPU: launch kernel (params, device arrays)
    GPU->>GPU: channel_id = threadIdx.x % num_channels
    GPU->>NIC: device_put_* (with channel_id)
    NIC-->>GPU: completion
```
Actionable comments posted: 3
📒 Files selected for processing (9)
- contrib/test_jenkins.sh (1 hunk)
- contrib/ucx_perftest_config/test_types_ucp_device_cuda (2 hunks)
- src/tools/perf/api/libperf.h (3 hunks)
- src/tools/perf/cuda/cuda_kernel.cuh (4 hunks)
- src/tools/perf/cuda/ucp_cuda_kernel.cu (7 hunks)
- src/tools/perf/perftest.c (2 hunks)
- src/tools/perf/perftest.h (1 hunk)
- src/tools/perf/perftest_params.c (3 hunks)
- test/gtest/common/test_perf.cc (1 hunk)
src/tools/perf/cuda/cuda_kernel.cuh (outdated)
```cpp
if ((threadIdx.x == 0) && ucs_unlikely(completed >= m_next_report_iter)) {
    ucx_perf_cuda_time_t cur_time  = ucx_perf_cuda_get_time_ns();
    ucx_perf_cuda_time_t iter_time = (cur_time - m_last_report_time) /
                                     (completed - m_ctx.completed_iters);
    m_last_report_time    = cur_time;
    m_ctx.completed_iters = completed;
    __threadfence_system();
    m_next_report_iter = ucs_min(completed + (m_report_interval_ns / iter_time),
                                 m_max_iters);
}
```
Guard against zero deltas before dividing.
Here we divide by both (completed - m_ctx.completed_iters) and iter_time. If the progress callback fires twice with the same completed (e.g. because m_report_interval_ns / iter_time rounded to zero) or if the GPU clock hasn’t advanced yet, both denominators become zero and the SM traps with a divide-by-zero. Please bail out when delta == 0 and clamp iter_time to at least 1 before the final division.
```diff
@@
-    if ((threadIdx.x == 0) && ucs_unlikely(completed >= m_next_report_iter)) {
-        ucx_perf_cuda_time_t cur_time  = ucx_perf_cuda_get_time_ns();
-        ucx_perf_cuda_time_t iter_time = (cur_time - m_last_report_time) /
-                                         (completed - m_ctx.completed_iters);
+    if ((threadIdx.x == 0) && ucs_unlikely(completed >= m_next_report_iter)) {
+        ucx_perf_counter_t delta_iters = completed - m_ctx.completed_iters;
+        if (delta_iters == 0) {
+            return;
+        }
+
+        ucx_perf_cuda_time_t cur_time   = ucx_perf_cuda_get_time_ns();
+        ucx_perf_cuda_time_t delta_time = cur_time - m_last_report_time;
+        if (delta_time == 0) {
+            delta_time = 1;
+        }
+
+        ucx_perf_cuda_time_t iter_time = ucs_max<ucx_perf_cuda_time_t>(delta_time / delta_iters, 1);
         m_last_report_time    = cur_time;
         m_ctx.completed_iters = completed;
         __threadfence_system();
-        m_next_report_iter = ucs_min(completed + (m_report_interval_ns / iter_time),
-                                     m_max_iters);
+        ucx_perf_cuda_time_t next_interval = m_report_interval_ns / iter_time;
+        if (next_interval == 0) {
+            next_interval = 1;
+        }
+        m_next_report_iter = ucs_min(completed + next_interval, m_max_iters);
     }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
In src/tools/perf/cuda/cuda_kernel.cuh around lines 52 to 61, the code can
divide by zero when computing per-iteration times because both delta =
(completed - m_ctx.completed_iters) and iter_time can be zero; compute and store
delta first, if delta == 0 bail out/skip the reporting block to avoid dividing
by zero, then compute iter_time and clamp it to at least 1 (e.g., use a max with
1) before using it in any division or in calculating m_next_report_iter so no
division by zero occurs.
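The guard logic the review asks for can be sketched in isolation as a small host-side helper; the types and names below are illustrative stand-ins for the perftest CUDA types, not the actual patch:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t perf_time_t;

/* Sketch of the guarded per-iteration time computation. Returns 0 when no
 * iterations completed since the last report (the caller should skip the
 * reporting block entirely); otherwise returns iter_time clamped to >= 1 ns
 * so a later division by it can never trap. */
static perf_time_t safe_iter_time(perf_time_t cur_time, perf_time_t last_time,
                                  uint64_t completed, uint64_t last_completed)
{
    uint64_t delta_iters = completed - last_completed;
    if (delta_iters == 0) {
        return 0; /* nothing new: bail out before dividing */
    }

    perf_time_t delta_time = cur_time - last_time;
    if (delta_time == 0) {
        delta_time = 1; /* clock has not advanced yet */
    }

    perf_time_t iter_time = delta_time / delta_iters;
    return (iter_time != 0) ? iter_time : 1;
}
```

Both denominators from the original code (iteration delta and iter_time) are thus guaranteed non-zero before any division.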
```cpp
if (fc && (m_pending_count < m_size) && (m_pending[index] < m_fc_window)) {
    req   = nullptr;
    flags = static_cast<ucp_device_flags_t>(0);
} else {
    req = &m_requests[index];
    m_pending_map |= UCS_BIT(index);
}
```
Do not hand out a null request under flow control
When get_request() returns req == nullptr, the very next call to ucp_perf_cuda_send_async() passes that pointer straight into ucp_device_put_single/multi/multi_partial, all of which dereference req unconditionally. With device_fc_window > 1, we therefore crash every time this branch fires. Please always return a valid ucp_device_request_t* here and keep the existing bitmap semantics for the FC “prefetch” case.
```diff
- if (fc && (m_pending_count < m_size) && (m_pending[index] < m_fc_window)) {
-     req = nullptr;
-     flags = static_cast<ucp_device_flags_t>(0);
- } else {
-     req = &m_requests[index];
-     m_pending_map |= UCS_BIT(index);
- }
+ req = &m_requests[index];
+ if (fc && (m_pending_count < m_size) && (m_pending[index] < m_fc_window)) {
+     flags = static_cast<ucp_device_flags_t>(0);
+ } else {
+     m_pending_map |= UCS_BIT(index);
+ }
```
```cpp
case 'F':
    return parse_int(opt_arg, &params->super.device_fc_window,
                     "device flow control window size", 1, INT_MAX);
case 'y':
```
Fix the -F parser’s target type.
parse_int() expects an int*, but we are handing it an unsigned*. With -Werror (the default in CI) this is an incompatible pointer type error, so the whole build breaks. Please parse into a local int and assign it back once validation succeeds.
```diff
@@
- case 'F':
-     return parse_int(opt_arg, &params->super.device_fc_window,
-                      "device flow control window size", 1, INT_MAX);
+ case 'F': {
+     int fc_window;
+
+     status = parse_int(opt_arg, &fc_window,
+                        "device flow control window size", 1, INT_MAX);
+     if (status == UCS_OK) {
+         params->super.device_fc_window = fc_window;
+     }
+     return status;
+ }
```
+ }🤖 Prompt for AI Agents
In src/tools/perf/perftest_params.c around lines 728 to 731, the -F option
passes an unsigned int pointer to parse_int which expects an int*, causing an
incompatible pointer type error under -Werror; fix it by parsing into a local
int variable (call parse_int with &local_int and the same bounds), check
parse_int succeeded, then assign local_int to params->super.device_fc_window
(cast/convert to the unsigned field) after validation; ensure you preserve the
same bounds and error handling behavior.
Force-pushed: 7355ab4 to 84da1e1
```cpp
void init_params(const ucx_perf_context_t &perf)
{
    m_params.num_channels = perf.params.device_ep_channel_count;
```
Maybe just set it in ucp_perf_cuda_params_handler and remove this func.
src/tools/perf/perftest.c (outdated)
```cpp
params->super.ucp.send_datatype       = UCP_PERF_DATATYPE_CONTIG;
params->super.ucp.recv_datatype       = UCP_PERF_DATATYPE_CONTIG;
params->super.ucp.am_hdr_size         = 0;
params->super.device_ep_channel_count = 1;
```
IMO the default should be some large value like UINT_MAX, so by default each thread would use a different channel, since this is also controlled by the NUM_CHANNELS configuration of the GDAKI transport
```cpp
                       ucx_perf_counter_t idx, ucp_device_request_t *req,
                       ucp_device_flags_t flags = UCP_DEVICE_FLAG_NODELAY)
{
    const unsigned channel_id = threadIdx.x % params.num_channels;
```
IMO the channel id should be some random value (or at least have a "mode" where the channel is generated randomly), e.g. channel_mode = enum { single, random, per-thread }, because in a real scenario the "expert" index is random.
yosefe left a comment:
@iyastreb can you pls review as well?
```cpp
typedef enum {
    UCX_PERF_CHANNEL_MODE_SINGLE,     /* Use a single fixed channel ID (0) */
    UCX_PERF_CHANNEL_MODE_RANDOM,     /* Use random channel ID per operation */
    UCX_PERF_CHANNEL_MODE_PER_THREAD, /* Use thread ID modulo num_channels */
    UCX_PERF_CHANNEL_MODE_LAST
} ucx_perf_channel_mode_t;
```
nice
src/tools/perf/perftest_params.c (outdated)
```cpp
printf("                     request is sent.\n");
printf("  -N <mode>      channel selection mode for device tests (single)\n");
printf("                     single - use a single fixed channel (channel 0, default)\n");
printf("                     random - use random channel per operation\n");
```
maybe add the seed as part of the random mode string, e.g. `random:1245`, to not consume another argument letter ('S')
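Parsing a combined `random[:<seed>]` argument, as suggested, could look like the sketch below; the function name and return convention are illustrative, not the PR's actual code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: parse "random" or "random:<seed>" so the seed does not need its
 * own option letter. Returns 0 on success; *seed keeps its default value
 * when no ":<seed>" suffix is present. */
static int parse_random_mode(const char *arg, unsigned long *seed)
{
    static const char prefix[] = "random";
    const char *rest;
    char *end;
    unsigned long v;

    if (strncmp(arg, prefix, sizeof(prefix) - 1) != 0) {
        return -1; /* not the random mode at all */
    }

    rest = arg + sizeof(prefix) - 1;
    if (*rest == '\0') {
        return 0; /* plain "random": keep the default seed */
    }
    if (*rest != ':') {
        return -1; /* e.g. "randomx" is invalid */
    }

    v = strtoul(rest + 1, &end, 10);
    if ((end == rest + 1) || (*end != '\0')) {
        return -1; /* empty or trailing-garbage seed */
    }

    *seed = v;
    return 0;
}
```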
```cpp
unsigned reqs_count = ucs_div_round_up(ctx.max_outstanding,
                                       ctx.device_fc_window);
ucp_device_request_t *reqs = &shared_requests[reqs_count * thread_index];
curandState rand_state;
```
maybe add rand_state inside ucp_perf_cuda_request_manager, to not pass another parameter to all functions
```cpp
m_params.num_threads  = perf.params.device_thread_count;
m_params.num_channels = perf.params.device_num_channels;
m_params.channel_mode = perf.params.device_channel_mode;
m_params.random_seed  = perf.params.random_seed;
```
The idea was that params represent a request parameters.
Now it's a mixture of static config + request params. Since perf context is already passed to the kernel, what's the point of duplicating config in params?
```cpp
    uint64_t *counter_recv;
};

template<ucs_device_level_t level>
```
IMO making the entire class a template is not worth it:
- it's needed for just 2 functions; expanding the scope just makes compilation longer for no reason
- the caller's code becomes ugly: `ucs_status_t status = req_mgr.template get_request<fc>(req, flags);`

What's the rationale for this change?
right, if it was needed just because of adding m_rand_state, it is not worth it (or add it as a pointer)
src/tools/perf/perftest_params.c (outdated)
```cpp
printf("                     request is sent.\n");
printf("  -N <mode>      channel selection mode for device tests (single)\n");
printf("                     single - use a single fixed channel (channel 0, default)\n");
printf("                     random:<seed> - use random channel per operation with the given seed\n");
```
```diff
- printf("                     random:<seed> - use random channel per operation with the given seed\n");
+ printf("                     random[:<seed>] - use random channel per operation with optional random seed\n");
```
src/tools/perf/perftest.c (outdated)
```cpp
params->super.ucp.am_hdr_size      = 0;
params->super.device_num_channels  = UINT_MAX;
params->super.device_channel_mode  = UCX_PERF_CHANNEL_MODE_SINGLE;
params->super.random_seed          = time(0) ^ getpid();;
```
```diff
- params->super.random_seed = time(0) ^ getpid();;
+ params->super.random_seed = ucs_generate_uuid((uintptr_t)params);
```
```cpp
unsigned channel_id;

switch (ctx.channel_mode) {
case UCX_PERF_CHANNEL_MODE_SINGLE:
    channel_id = 0;
    break;
case UCX_PERF_CHANNEL_MODE_RANDOM:
    channel_id = curand(rand_state) % ctx.num_channels;
    break;
case UCX_PERF_CHANNEL_MODE_PER_THREAD:
default:
    channel_id = (blockIdx.x *
                  ucx_perf_cuda_thread_index<level>(ctx.num_threads) +
                  ucx_perf_cuda_thread_index<level>(threadIdx.x)) %
                 ctx.num_channels;
    break;
}
```
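The three selection modes above can be emulated on the host for a quick sanity check; `rand_fn` stands in for `curand()`, the CUDA builtins become explicit arguments, and all names are illustrative:

```c
#include <assert.h>

typedef enum {
    CHANNEL_MODE_SINGLE,
    CHANNEL_MODE_RANDOM,
    CHANNEL_MODE_PER_THREAD
} channel_mode_t;

/* Host-side emulation of the kernel's channel selection switch:
 * single  -> always channel 0
 * random  -> rand_fn() modulo num_channels (rand_fn emulates curand)
 * per-thread -> global thread index modulo num_channels */
static unsigned select_channel(channel_mode_t mode, unsigned num_channels,
                               unsigned thread_idx, unsigned block_idx,
                               unsigned block_dim, unsigned (*rand_fn)(void))
{
    switch (mode) {
    case CHANNEL_MODE_SINGLE:
        return 0;
    case CHANNEL_MODE_RANDOM:
        return rand_fn() % num_channels;
    case CHANNEL_MODE_PER_THREAD:
    default:
        return (block_idx * block_dim + thread_idx) % num_channels;
    }
}

/* Deterministic stand-in for curand(), for testing only */
static unsigned fake_rand(void)
{
    return 12345;
}
```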
This change adds 2 more dependencies to the function, and makes it violate the single responsibility principle.
I propose 3 things:
- This function should take only a precomputed channel_id
- channel_id should be calculated in the request manager, so that the manager returns the valid channel_id, not `req_mgr.get_rand_state();`
- Inside the manager we may cache `num_threads` and `num_channels` to avoid expensive reads from the shared memory (ctx)
```cpp
const char *channel_str = getenv("UCX_RC_GDA_NUM_CHANNELS");
if (channel_str) {
    params->super.device_num_channels = atoi(channel_str);
}
```
seems very weird, why is this needed?
we can pass arbitrarily large channel id to the UCP/UCT device APIs, and the transport should anyway do % operation to select the right channel according to the number of QPs
UCP/UCT doesn't do %. Perftest needs to know how many channels were created and do % on the user-app side.
Maybe add an option for perftest and adjust the ucp config in perftest? Or any better suggestions?
UCT should do %.
because from API perspective, there is no function that returns number of channels. So any number should work.
```cpp
return (blockIdx.x *
        ucx_perf_cuda_thread_index<level>(m_num_threads) +
        ucx_perf_cuda_thread_index<level>(threadIdx.x)) %
       m_num_channels;
```
can we do smth like `threadIdx.x + blockIdx.x * blockDim.x`?
IMO, no need to calculate the modulo by number of channels
then we can remove m_num_threads/m_num_channels member variables
Removed m_num_threads. We may still need ucx_perf_cuda_thread_index<level>(..) to get thread_id, because the perftest might run at warp level.
can we simplify the calculation to be `ucx_perf_cuda_thread_index<level>(threadIdx.x + blockIdx.x * blockDim.x)`?
```cpp
case UCX_PERF_CHANNEL_MODE_SINGLE:
    return 0;
case UCX_PERF_CHANNEL_MODE_RANDOM:
    return curand(m_rand_state) % m_num_channels;
```
maybe

```diff
- return curand(m_rand_state) % m_num_channels;
+ return curand(m_rand_state) % (gridDim.x * blockDim.x);
```
do we really need to save num_channels?
can we calculate ucx_perf_cuda_thread_index<level>(gridDim.x * blockDim.x) ?
```cpp
unsigned thread_index = ucx_perf_cuda_thread_index<level>(threadIdx.x);
unsigned num_threads  = ucx_perf_cuda_thread_index<level>(ctx.num_threads);
unsigned global_thread_id = blockIdx.x * num_threads + thread_index;
```
IMO the global thread id can be calculated in a much simpler way: `threadIdx.x + blockIdx.x * blockDim.x`, so the helper function `ucp_perf_cuda_init_rand_state` is not needed.
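The simpler global thread id proposed here is the standard CUDA linearization; a host-side sketch with the builtins replaced by explicit arguments:

```c
#include <assert.h>

/* Sketch of the simplified global thread id:
 * threadIdx.x + blockIdx.x * blockDim.x
 * (CUDA builtins passed as arguments for a host-side check) */
static unsigned global_thread_id(unsigned thread_idx, unsigned block_idx,
                                 unsigned block_dim)
{
    return thread_idx + block_idx * block_dim;
}
```

With this form, neither `num_threads` nor a separate helper is needed: the id is unique across the whole grid as long as `thread_idx < block_dim`.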
What?
Support channel id in ucx perftest.
Why?
Improve performance by distributing requests across QPs.
How?
Select the channel by thread ID modulo the number of channels.