This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

feat: Add background-task for kernel-pull-progress #477

Open · wants to merge 15 commits into main

Conversation

youngjun0627

@youngjun0627 youngjun0627 commented Sep 1, 2021

refs backend.ai-issue#227

  • Manager TODOs
    • Spawn a background task when creating a new session, returning the task ID to the client after max_wait seconds.
    • Let the background task relay the kernel_pull_progress events.

About Background-Task

The bgtask contains two asynchronous functions: one selects the kernel's status from the 'kernels' table, and the other updates the progress. To keep them synchronized, I use an asyncio lock. The update_progress function updates the progress status in reaction to the KernelPullProgress event, and it updates the ProgressReporter in turn. The manager then sends the progress reporter's updates to the client, and the client displays the kernel pull progress using the reporter.

@youngjun0627 youngjun0627 force-pushed the feature/select-and-update-kernelpullprogress branch 3 times, most recently from 022c866 to c4a8e16 Compare September 2, 2021 16:47
@youngjun0627 youngjun0627 changed the title feat: Select and update kernel-pull-progress feat: Add background-task for kernel-pull-progress Sep 2, 2021
@youngjun0627 youngjun0627 force-pushed the feature/select-and-update-kernelpullprogress branch 3 times, most recently from 9801126 to 6322ca5 Compare September 6, 2021 15:03
@youngjun0627 youngjun0627 force-pushed the feature/select-and-update-kernelpullprogress branch from 6322ca5 to 38c3f51 Compare September 6, 2021 15:10
@codecov

codecov bot commented Sep 8, 2021

Codecov Report

Merging #477 (60bfc69) into main (15acec7) will not change coverage.
The diff coverage is n/a.

❗ Current head 60bfc69 differs from pull request most recent head e2f01f4. Consider uploading reports for the commit e2f01f4 to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##             main     #477   +/-   ##
=======================================
  Coverage   48.89%   48.89%           
=======================================
  Files          54       54           
  Lines        8908     8908           
=======================================
  Hits         4356     4356           
  Misses       4552     4552           


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15acec7...e2f01f4.

@youngjun0627 youngjun0627 force-pushed the feature/select-and-update-kernelpullprogress branch from 4b5c9cd to b5deb3a Compare September 8, 2021 06:47

task_id = await root_ctx.background_task_manager.start(
    kernelpullprogress,
    name='kernel-pull-progress'
Member


Suggested change
- name='kernel-pull-progress'
+ name='kernel-pull-progress',

@youngjun0627
Author

I have finished fixing the code to reflect your feedback.
The following images show the difference between when max_wait is 0 and when it is not.

  • Value of max_wait equal to 0
    image

  • max_wait set to some seconds
    image

Also, the code in client-py is fixed. Formally, I don't check 'background_task'; after the fix, I do. If it is set, the kernel pull progress is displayed to the client while the image is being pulled (downloaded); if not, the client cannot see the progress and just waits.

@achimnol
Member

achimnol commented Sep 9, 2021

> Also, the code in client-py is fixed. Formally, I don't check 'background_task'.

Just a note: Formally -> Formerly / Previously
This confused me for a moment. ;)

Member

@achimnol achimnol left a comment


A few more minor fixes please!

@youngjun0627 youngjun0627 force-pushed the feature/select-and-update-kernelpullprogress branch from 05f19e0 to 45026a2 Compare September 9, 2021 08:33
@youngjun0627 youngjun0627 force-pushed the feature/select-and-update-kernelpullprogress branch from 45026a2 to e150d63 Compare September 9, 2021 08:38
…nto feature/select-and-update-kernelpullprogress
@@ -513,7 +515,62 @@ async def _create(request: web.Request, params: Any) -> web.Response:
    resp['servicePorts'] = []
    resp['created'] = True

    if not params['enqueue_only']:
        async def monitor_kernel_preparation(reporter: ProgressReporter) -> None:
Member

@achimnol achimnol Oct 18, 2021


Could we extract this as a separate function for better testability? (Let's avoid too many nested functions!)

@youngjun0627
Author

I had no idea how to pass data (e.g., kernel_id, root_ctx, ...) to it as a separate function, so I changed it to a callable class and separated it from the '_create' function.

@achimnol achimnol added this to the 21.09 milestone Oct 30, 2021
@achimnol
Member

A callable class is generally considered an anti-pattern unless you are writing a kind of low-level library.
Just pass any required information as additional arguments.

@achimnol
Member

achimnol commented Oct 31, 2021

To handle multi-node scenarios (cluster sessions) where multiple agents generate the same KernelPullProgressEvent with their own progress, I think there should be logic to aggregate them at the session level and calculate the "total" progress across the multiple agents. Maybe this could be handled on the client side, by observing multiple bgtasks concurrently.

@achimnol
Member

To handle multi-node scenarios (cluster sessions) where multiple agents generate the same KernelPullProgressEvent with their own progress, I think there should be logic to aggregate them at the session level and calculate the "total" progress across the multiple agents. Maybe this could be handled on the client side, by observing multiple bgtasks concurrently.

To implement this on the client side, we need to:

  • manager: Return the multiple kernel IDs and per-kernel bgtask IDs from api.session._create()
  • client-py: Open multiple SSE connections to the bgtask IDs and show the per-kernel progress like the multi-layer progress view in Docker CLI's pull command, or aggregate the per-kernel progress into a single progress bar.

To implement this on the manager side:

  • manager: Return a single bgtask ID just like now.
  • manager: In the bgtask, aggregate the multiple kernel pull progress events that belong to the current session (which may have multiple kernels) and return the aggregated result.

Personally, I think it's better to implement this on the client side: depending on the client-side implementation, we would have runtime options for whether to expose more per-kernel details or show only the overall progress.

@achimnol achimnol self-assigned this Nov 25, 2021
@youngjun0627
Author

youngjun0627 commented Nov 27, 2021

Sorry for the late reply.
I have a question.

  • command 1: backend.ai run --cluster-mode multi-node --cluster-size 2 'print("hello")' python
  • command 2: backend.ai run -c 'import os; print("Hello world, {}".format(os.environ["CASENO"]))' -r cpu=1 -r mem=320m -e 'CASENO=$X' --env-range=X=case:1,2 python

Which one do "multi-node scenarios (cluster sessions)" mean, command 1 or command 2?

And if we implement the multi-node scenario on the client side, does anything else still need to be fixed?

@mindgitrwx mindgitrwx self-assigned this Dec 15, 2021
@achimnol achimnol modified the milestones: 21.09, 22.03 Mar 2, 2022