Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/RKEY: Acquire context lock when calling ucp_rkey_pack_memh #10462

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

iyastreb
Copy link
Contributor

@iyastreb iyastreb commented Jan 29, 2025

What?

UCP memh objects are shared between multiple workers, and therefore mutable operations must be protected by the same mutex. Currently we acquire context->mt_lock when calling rkey_pack from user application, but do it without locking when packing UCP requests internally.

Why?

UCT memh caches rkeys created on first mkey_pack -> concurrent calls may cause memory leak -> must be protected by the same mutex (context->mt_lock)

How?

Added new MT mode for context: UCP_MT_TYPE_WORKER_ASYNC.
This new mode is meaningful only in MULTI_THREAD_WORKER case, when mt-single context is used by a single mt-shared worker. In this case the worker progress flow is already protected by worker mutex, and we don't need to lock inside that flow. This is to protect certain API calls that can be triggered from the user thread without holding a worker mutex. Essentially this context mutex inherits a worker mutex.

Added critical sections where appropriate (ucp_rkey_pack_memh)
Added unit test reproducing the memory leak when lock is not used

@@ -367,16 +368,20 @@ ucp_proto_request_pack_rkey(ucp_request_t *req, ucp_md_map_t md_map,
ucp_memh_disable_gva(memh, md_map);
}

/* TODO: context lock is not scalable. Consider fine-grained lock per memh,
* immutable memh with rkey cache, RCU/COW */
UCP_THREAD_CS_ENTER(&context->mt_lock);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move locks to ucp_rkey_pack_memh? or create a wrapper which would use the lock

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this, but finally decided not to go this way for 2 reasons:

  1. ucp_rkey_pack_memh is also called from ucp_memh_pack->ucp_memh_pack_internal->ucp_memh_do_pack, where lock is already taken in ucp_memh_pack_internal with some extra scope.
    Since we don't want to have reentrant lock (currently not supported by mt_lock pthread impl), we still need to keep the ucp_rkey_pack_memh without lock acquired
  2. That would be just a second function with a large argument list, the only purpose of which is to take a lock.. so I thought that adding explicit locks is actually better approach

src/ucp/proto/proto_common.inl Show resolved Hide resolved
Comment on lines 371 to 372
/* TODO: context lock is not scalable. Consider fine-grained lock per memh,
* immutable memh with rkey cache, RCU/COW */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this comment is redundant since immutable memh probably not possible because of the need to extend the registration md_map

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe there is a way to determine when the memh is fully constructed? In cases of repeated operations, this should always be true except for the first operation. If so, maybe we could skip taking the lock for pack? md_map we modify from ucp_memh_get_slow() under the lock.

Copy link
Contributor Author

@iyastreb iyastreb Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good question(s)..

Regarding immutable memh - I believe it's possible, let me explain the idea, it's not that trivial.
Basically the idea is to have a sort of MVCC. Let's say, we have a mutable ucp_memh like we have today, and some immutable ucp_memh_view, that we can use during the entire transaction (from ucp_memh_get till ucp_memh_put). Essentially this view is a copy of master, but it does not reflect the changes made within a transaction:

  • ucp_memh_get
    takes lock, returns a valid immutable view to the caller

  • during the "transaction" we can safely use view without acquiring locks. Maybe we still need a double-check locking when calling mkey_pack, but this will be one-time action, not in the fast-path

  • ucp_memh_put
    takes lock, checks whether view is still valid, destroys it otherwise

BTW, for this purpose we can adapt/reuse the concept of memory window.
If we assume that the only change that can happen with ucp_memh is extension, then this must work

determine when the memh is fully constructed?

memh is "almost" immutable, but still can be extended with extra registrations, which is a very rare operation. I'm not aware of approach to make it thread-safe without some form of concurrency control:

  1. We either acquire a mutex (coarse-grained or fine-grained)
  2. Use some form of RCU/COW - but this requires memory reclamation mechanism & reference counting
  3. Make the object immutable / MVCC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we can assume MD mkey_pack is thread safe. so IMO we can skip local for pack only if we have cache for packed rkey_buffer per memh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we can assume MD mkey_pack is thread safe

I'm ok with this statement, that means we don't want fine grained lock solution

IMO we can skip local for pack only if we have cache for packed rkey_buffer per memh

Yes, exactly, this is what I'm actually proposing with immutable memh view/window/derived/whatever
If we have a rkey_buffer cache per memh - that can work without mutex acquisition in hot-path (besides the first time when cache is NULL). BUT this approach only works if memh does never change after cache is created for the first time. If memh might be modified/extended afterwards - then cache approach only works with lock - and this is not good. That's the whole point.

And actually derived memh can be exactly this immutable view of UCP memh, and we can benefit from it for invalidation and packing purposes.

src/ucp/proto/proto_common.inl Outdated Show resolved Hide resolved
src/ucp/rndv/rndv.c Show resolved Hide resolved
src/ucp/rndv/rndv_rtr.c Show resolved Hide resolved
test/gtest/ucp/test_ucp_rma_mt.cc Show resolved Hide resolved
yosefe
yosefe previously approved these changes Jan 30, 2025
brminich
brminich previously approved these changes Jan 30, 2025
@yosefe
Copy link
Contributor

yosefe commented Jan 30, 2025

@iyastreb pls squash

@iyastreb iyastreb force-pushed the ucp/rkey-pack-memh-lock branch from 750c69b to 8c692ca Compare January 30, 2025 13:27
@yosefe yosefe enabled auto-merge January 31, 2025 07:48
@iyastreb iyastreb disabled auto-merge January 31, 2025 08:04
@iyastreb iyastreb added the WIP-DNM Work in progress / Do not review label Jan 31, 2025
@iyastreb
Copy link
Contributor Author

@yosefe
This MT test fails on BF3 devices, I'm investigating, please DNM yet

@iyastreb
Copy link
Contributor Author

Actually this test reveals the real issue.
Basically there are 2 kinds of MT support in UCX:

        MULTI_THREAD_CONTEXT, /* workers are single-threaded, context is mt-shared */
context->mt_lock is a central lock, it does the job correctly

        MULTI_THREAD_WORKER   /* workers are multi-threaded, context is mt-single */
Here the expectation is that all the operations are protected implicitly by the worker mutex.
context->mt_lock is no-op

So the issue happens with MULTI_THREAD_WORKER, because we have an API that can be called directly by user without taking a worker mutex: ucp_rkey_pack.

I also need to update the test to mimic worker behavior

@iyastreb iyastreb dismissed stale reviews from yosefe and brminich via 8e5530a February 4, 2025 15:22
@iyastreb iyastreb removed the WIP-DNM Work in progress / Do not review label Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants