Add shutdown API to JDS along with task handler #1376

Shourya742
Contributor

@Shourya742 Shourya742 commented Jan 22, 2025

Partially Closes: #1320

This PR introduces a shutdown API, similar to the implementation in JDC and TProxy. Additionally, to facilitate a graceful shutdown, we have added a task handler that manages the lifecycle of all tasks spawned within the start method.
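
For illustration only (this is not the code from this PR): a minimal sketch of the general idea of a handler that tracks every spawned worker so that a single shutdown call can stop and wait for all of them. It deliberately uses OS threads and a shared stop flag in place of async tasks so that it runs with the standard library alone; the names `TaskHandler`, `spawn`, `shutdown`, and `demo_shutdown` are hypothetical and do not come from the JDS code base.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handler that remembers every worker spawned through it.
pub struct TaskHandler {
    stop: Arc<AtomicBool>,
    workers: Vec<JoinHandle<()>>,
}

impl TaskHandler {
    pub fn new() -> Self {
        Self {
            stop: Arc::new(AtomicBool::new(false)),
            workers: Vec::new(),
        }
    }

    /// Spawn a worker; it receives the shared stop flag and is expected to poll it.
    pub fn spawn<F>(&mut self, work: F)
    where
        F: FnOnce(Arc<AtomicBool>) + Send + 'static,
    {
        let stop = Arc::clone(&self.stop);
        self.workers.push(thread::spawn(move || work(stop)));
    }

    /// Signal every worker to stop, wait for each one, and return how many
    /// finished without panicking.
    pub fn shutdown(self) -> usize {
        self.stop.store(true, Ordering::SeqCst);
        self.workers
            .into_iter()
            .map(|handle| handle.join())
            .filter(|result| result.is_ok())
            .count()
    }
}

/// Start three looping workers, then shut all of them down together.
pub fn demo_shutdown() -> usize {
    let mut handler = TaskHandler::new();
    for _ in 0..3 {
        handler.spawn(|stop| {
            // Each worker loops until the shared flag is set.
            while !stop.load(Ordering::SeqCst) {
                thread::sleep(Duration::from_millis(5));
            }
        });
    }
    handler.shutdown()
}
```

In an async runtime the same shape would presumably hold task handles instead of thread handles; that part is an assumption and is not shown here.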

codecov bot commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 9.52381% with 19 lines in your changes missing coverage. Please review.

Project coverage is 18.81%. Comparing base (0f6d89b) to head (bfa7b81).
Report is 362 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| roles/jd-server/src/lib/mod.rs | 9.52% | 19 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1376      +/-   ##
==========================================
- Coverage   19.09%   18.81%   -0.28%     
==========================================
  Files         166      166              
  Lines       11062    11235     +173     
==========================================
+ Hits         2112     2114       +2     
- Misses       8950     9121     +171     
| Flag | Coverage Δ |
|---|---|
| binary_codec_sv2-coverage | 0.00% <ø> (ø) |
| binary_serde_sv2-coverage | 3.55% <ø> (ø) |
| binary_sv2-coverage | 5.34% <ø> (ø) |
| bip32_derivation-coverage | 0.00% <ø> (ø) |
| buffer_sv2-coverage | 25.02% <ø> (ø) |
| codec_sv2-coverage | 0.01% <ø> (ø) |
| common_messages_sv2-coverage | 0.13% <ø> (ø) |
| const_sv2-coverage | 0.00% <ø> (ø) |
| error_handling-coverage | 0.00% <ø> (ø) |
| framing_sv2-coverage | 0.28% <ø> (ø) |
| jd_client-coverage | 0.00% <ø> (ø) |
| jd_server-coverage | 6.63% <9.52%> (-1.16%) ⬇️ |
| job_declaration_sv2-coverage | 0.00% <ø> (ø) |
| key-utils-coverage | 2.39% <ø> (ø) |
| mining-coverage | 2.44% <ø> (ø) |
| mining_device-coverage | 0.00% <ø> (ø) |
| mining_proxy_sv2-coverage | 0.70% <ø> (ø) |
| noise_sv2-coverage | 4.44% <ø> (ø) |
| pool_sv2-coverage | 2.05% <ø> (ø) |
| protocols | 24.57% <ø> (ø) |
| roles | 6.29% <9.52%> (-0.27%) ⬇️ |
| roles_logic_sv2-coverage | 7.93% <ø> (ø) |
| sv2_ffi-coverage | 0.00% <ø> (ø) |
| template_distribution_sv2-coverage | 0.00% <ø> (ø) |
| translator_sv2-coverage | 9.60% <ø> (ø) |
| utils | 25.13% <ø> (ø) |
| v1-coverage | 2.41% <ø> (ø) |

Contributor

@jbesraa jbesraa left a comment

My gut feeling is that bfa7b81 is somewhat over-engineered, or is the result of some code debt. I would prefer that we move all code to tokio first and only then start handling forced shutdown of async tasks. If this is a blocker for a "shutdown" function in any role, I would hold off on that until then.

We might eventually end up with the same design, but we would be more confident in it if all the networking setup were handled first.

@Shourya742
Contributor Author

> My gut feeling is that bfa7b81 is somewhat over-engineered, or is the result of some code debt. I would prefer that we move all code to tokio first and only then start handling forced shutdown of async tasks. If this is a blocker for a "shutdown" function in any role, I would hold off on that until then.
>
> We might eventually end up with the same design, but we would be more confident in it if all the networking setup were handled first.

Yes, I agree this adds a significant amount of code and feels a bit like overengineering, particularly in creating a construct to manage the lifecycle of child tasks. During the roles refactor, we should aim to design a mechanism that simplifies task lifecycle management as much as possible. That said, it’s critical to merge this now because, without it, shutting down the start method within a different runtime would leave child tasks lingering in the executor. These tasks would continue consuming CPU cycles and holding resources such as ports, potentially causing performance regressions. In the future, we can work on creating a cleaner and more robust abstraction for managing task lifecycles. For now, though, this approach is necessary to ensure proper shutdown behavior for the APIs and to prevent resource leaks.
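
For illustration only (not code from this PR): a small sketch of the resource-holding concern described above. It uses an OS thread rather than an async task, which is a deliberate simplification so that it needs only the standard library, and the function name `port_held_until_worker_stops` is hypothetical. The worker owns a listening socket; while the worker is alive, a second bind to the same address is expected to fail, and it should succeed again once the worker has been signalled and joined. Exact socket behaviour can vary by platform, so treat this as a sketch rather than a guarantee.

```rust
use std::net::TcpListener;
use std::sync::mpsc;
use std::thread;

/// Returns (bind succeeded while the worker was alive,
///          bind succeeded after the worker was shut down).
pub fn port_held_until_worker_stops() -> std::io::Result<(bool, bool)> {
    // Bind to an OS-assigned port on loopback and remember the address.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // The worker owns the listener and lives until it is told to stop.
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let worker = thread::spawn(move || {
        let _listener = listener; // dropped only when the worker returns
        let _ = stop_rx.recv(); // block until shutdown is signalled
    });

    // The worker is still running, so the port is still taken.
    let while_alive = TcpListener::bind(addr).is_ok();

    // Explicit shutdown: signal the worker, then wait for it to finish.
    let _ = stop_tx.send(());
    worker.join().expect("worker panicked");

    // The listener has been dropped, so the port can be bound again.
    let after_shutdown = TcpListener::bind(addr).is_ok();
    Ok((while_alive, after_shutdown))
}
```

If the signal and join were skipped, the worker would keep the socket open for as long as the process runs, which is the kind of leak the comment above is concerned about.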

@jbesraa
Contributor

jbesraa commented Jan 22, 2025

Dunno, really. I understand the urgency, but at the same time we have lived with this for a while now. Let's see what others think.

@plebhash plebhash self-requested a review January 22, 2025 17:04
@GitGab19
Collaborator

Do we still need this PR?

Don't we have a way to shut down JDS?

@Shourya742
Contributor Author

> Do we still need this PR?
>
> Don't we have a way to shut down JDS?

We don't.

@GitGab19
Collaborator

But it seems to me that we were able to migrate all tests without needing this.

Shouldn't we close this for now and open specific issues when we are gonna need them?

@Shourya742
Contributor Author

> But it seems to me that we were able to migrate all tests without needing this.
>
> Shouldn't we close this for now and open specific issues when we are gonna need them?

Yes, that also sounds good to me, considering we need to refactor how we currently manage the lifecycle of our tasks. Even the shutdown methods we have in other roles are pretty suboptimal in my opinion, so those need to be considered as well.

@Shourya742 Shourya742 closed this Mar 31, 2025
Development

Successfully merging this pull request may close these issues.

Killing a running process(role) in integration tests