Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/PROTO: Handle request status correctly and refactor #10473

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ofirfarjun7
Copy link
Contributor

What?

  • Handle request progress status properly.
  • Refactor Code.

Why?

In some flows of send operations we can reach ucs_fatal because status is not handled properly.

How?

Instead of returning unexpected status code we should abort operation and return UCS_OK

@@ -134,8 +134,8 @@ static UCS_F_ALWAYS_INLINE ucs_status_t ucp_am_eager_multi_bcopy_send_func(
packed_size = uct_ep_am_bcopy(uct_ep, UCP_AM_ID_AM_FIRST,
ucp_am_eager_multi_bcopy_pack_args_first,
&pack_ctx, 0);
status = ucp_proto_bcopy_send_func_status(packed_size);
status = ucp_proto_am_handle_user_header_send_status(req, status);
status = ucp_am_handle_user_header_send_status_common(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to reproduce the original failure with unit test?
If so it would be helpful to add it

Copy link
Contributor Author

@ofirfarjun7 ofirfarjun7 Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea but not for this PR because for now it's not practical.
To do this we need to add some hint to UCX mpool object that will cause allocation to fail, I don't think this is something this PR suppose to do but we can consider it for the future to test this case and other use cases that involve memory allocation.

Comment on lines +137 to +139
ucp_am_handle_user_header_send_status_common(ucp_request_t *req,
ucs_status_t status, int abort,
int release)
Copy link
Contributor

@evgeny-leksikov evgeny-leksikov Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#10452 (comment) reference to initial conversation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants