Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dummy memory key from TCP, CMA, Cuda #10470

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yosefe
Copy link
Contributor

@yosefe yosefe commented Feb 2, 2025

Why

  • Reduce size of rkey and remove stub functions
  • Make UCP more consistent - don't require registration if not needed

@yosefe yosefe force-pushed the topic/uct-remove-dummy-mkey branch 7 times, most recently from 2ddcf14 to 2446293 Compare February 3, 2025 07:24
@yosefe yosefe added the WIP-DNM Work in progress / Do not review label Feb 3, 2025
@@ -628,8 +619,8 @@ UCS_TEST_P(test_ucp_proto_mock_tcp, am_send_1_lane)
check_ep_config(sender(), {
{0, 8184, "short", "tcp/mock"},
{8185, 65528, "zero-copy", "tcp/mock"},
{65529, 366985, "multi-frag zero-copy", "tcp/mock"},
{366986, INF, "rendezvous zero-copy fenced write to remote", "tcp/mock"},
{65529, 366864, "multi-frag zero-copy", "tcp/mock"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering why this has changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe because we don't need registration anymore. though that perf estimation does not seem correct since we do rcache lookup anyway.

@yosefe yosefe force-pushed the topic/uct-remove-dummy-mkey branch 2 times, most recently from 3fdd934 to 15eb94c Compare February 4, 2025 07:57
@yosefe yosefe removed the WIP-DNM Work in progress / Do not review label Feb 4, 2025
src/uct/base/uct_mem.c Outdated Show resolved Hide resolved
src/uct/ib/base/ib_md.c Outdated Show resolved Hide resolved
src/uct/ib/mlx5/gga/gga_mlx5.c Outdated Show resolved Hide resolved
iyastreb
iyastreb previously approved these changes Feb 4, 2025
@@ -2563,6 +2563,31 @@ void ucp_memory_detect_slowpath(ucp_context_h context, const void *address,
ucs_memory_info_set_host(mem_info);
}

void ucp_context_memaccess_tl_bitmap(ucp_context_t *context,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor, for consistency:

Suggested change
void ucp_context_memaccess_tl_bitmap(ucp_context_t *context,
void ucp_context_memaccess_tl_bitmap(ucp_context_h context,

void
ucp_context_dev_tl_bitmap(ucp_context_h context, const char *dev_name,
ucp_tl_bitmap_t *tl_bitmap);
void ucp_context_memaccess_tl_bitmap(ucp_context_t *context,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void ucp_context_memaccess_tl_bitmap(ucp_context_t *context,
void ucp_context_memaccess_tl_bitmap(ucp_context_h context,

@@ -877,6 +873,17 @@ static void ucp_wireup_unset_tl_by_md(const ucp_wireup_select_params_t *sparams,
}
}

static void ucp_wireup_memaccess_bitmap(ucp_context_t *context,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static void ucp_wireup_memaccess_bitmap(ucp_context_t *context,
static void ucp_wireup_memaccess_bitmap(ucp_context_h context,

rakhmets
rakhmets previously approved these changes Feb 5, 2025
@yosefe yosefe force-pushed the topic/uct-remove-dummy-mkey branch from 14c9dce to 03d591c Compare February 5, 2025 16:33
Comment on lines 3715 to 3716
* compared to attr->transports.entry_size before the field is
* updated. If the field's ending offset is greater than the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems unrelated

src/ucp/wireup/select.c Show resolved Hide resolved
@@ -900,13 +908,15 @@ static UCS_F_NOINLINE ucs_status_t ucp_wireup_add_memaccess_lanes(
/* Select best transport which can reach registered memory */
snprintf(title, sizeof(title), criteria->title, "registered");
mem_criteria.title = title;
mem_criteria.local_md_flags = UCT_MD_FLAG_REG | criteria->local_md_flags;
mem_criteria.local_md_flags = criteria->local_md_flags;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like criteria->local_md_flags are not set by any of the calling functions and looks redundant here

src/ucp/wireup/select.c Show resolved Hide resolved
@@ -1315,7 +1315,7 @@ ucs_status_t uct_ib_md_open_common(uct_ib_md_t *md,

/* Check for GPU-direct support */
if (md_config->enable_gpudirect_rdma != UCS_NO) {
/* Check peer memory driver is loaded, different driver versions use
/* Check peer memory driver is loaded, different driver versions use
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated

@@ -1326,7 +1326,6 @@ ucs_status_t uct_ib_md_open_common(uct_ib_md_t *md,
uct_ib_check_gpudirect_driver(
md, "/sys/module/nv_peer_mem/version",
UCS_MEMORY_TYPE_CUDA);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated

@@ -3208,7 +3209,7 @@ static void uct_ib_mlx5dv_check_dc(uct_ib_device_t *dev)

static uct_ib_md_ops_t uct_ib_mlx5_md_ops;

static ucs_status_t
static ucs_status_t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated

test/gtest/ucp/test_ucp_mmap.cc Show resolved Hide resolved
test/gtest/ucp/test_ucp_proto_mock.cc Show resolved Hide resolved
Comment on lines +207 to +214

/* Some transports may send larger buffers or can be slower, so limit
also by time */
if (ucs_get_accurate_time() > (start_time + timeout)) {
UCS_TEST_MESSAGE << "Stopped after " << i
<< " iterations due to time limit";
break;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it unrelated fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is needed because cuda_copy tests are being run with up to 1GB buffer now and take much more time to complete

}
};

UCS_TEST_P(test_ucp_proto_mock_gpu, cuda_managed_ppln_host_frag,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, is it possible to add test for the changes in this PR? I.e. to check that cma can still be selected for rndv?
AFAIU, this test flow is not affected by the changes in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a CMA test:

class test_ucp_proto_mock_cma : public test_ucp_proto_mock {

- Update lane selection to not require REG flag when memh/rkey are not
  needed
- Remove memory registration support from cma, tcp
- Update mock test - TCP rndv is now selected earlier
@yosefe yosefe force-pushed the topic/uct-remove-dummy-mkey branch from 12c907a to 0813925 Compare February 13, 2025 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants