[Feature Request] Sample request: ProcessPoolExecutor for workflow worker multiprocessing #255

@cretz

Description

Describe the solution you'd like

Workflow tasks are CPU bound, so within a single Python process they effectively can't run in parallel because of the global interpreter lock (GIL). Activities, on the other hand, often do IO and can therefore yield so that other threads can do Python work.
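
As a rough illustration of the second point (a sketch only, not SDK code): threads blocked on IO release the interpreter, so their waits overlap. `fake_io_activity` and `run_overlapped` are hypothetical names, and `time.sleep` stands in for a real network or disk call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_activity(delay: float) -> float:
    # Hypothetical stand-in for an IO-bound activity. time.sleep releases
    # the GIL while waiting, much as a blocking network call would.
    time.sleep(delay)
    return delay


def run_overlapped(count: int, delay: float) -> float:
    # Run `count` fake activities on threads and return the wall-clock time.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=count) as pool:
        list(pool.map(fake_io_activity, [delay] * count))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_overlapped(count=4, delay=0.2)
    # The waits overlap, so this should be close to one delay, not four.
    print(f"4 overlapping waits took {elapsed:.2f}s")
```

CPU-bound workflow code gets no such benefit from threads, which is the motivation for the multiprocessing approach requested here.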

While potential larger-effort improvements are being discussed for the SDK itself, one of the most obvious ways to scale workflow workers is to use multiprocessing via ProcessPoolExecutor. Specifically this sample should:

  • Have at least one workflow defined, and one activity defined that the workflow calls
  • Have the normal parent worker process start and run an activity-only worker as normal
    • Comment that most users run activity workers in a separate process
  • Have the parent worker process use ProcessPoolExecutor to run many workflow-only workers
    • Make sure that the first line of code in the function run by each child process is Runtime.set_default(Runtime(), error_if_already_set=False). Alternatively, do not let the outer parent set a default runtime (even lazily) by only using explicit runtimes everywhere
    • Make sure the worker options are set for small workflow task processing (e.g. maybe only 1 concurrent workflow task and only 1 poller; how this affects sticky would need to be investigated)
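
The process layout described in the list above could be sketched roughly as follows. This is a stdlib-only skeleton under stated assumptions, not the requested sample: `WORKFLOW_WORKER_COUNT`, `run_worker_until_done`, and `workflow_worker_main` are hypothetical names, and the Temporal client and worker setup is represented only by comments, since the exact worker options still need investigating.

```python
import asyncio
import os
from concurrent.futures import ProcessPoolExecutor

WORKFLOW_WORKER_COUNT = 4  # hypothetical value; tune to available CPU cores


async def run_worker_until_done(index: int) -> str:
    # Placeholder: the real sample would connect a client here and await a
    # workflow-only worker configured for small workflow task processing.
    await asyncio.sleep(0)
    return f"workflow-worker-{index} pid={os.getpid()}"


def workflow_worker_main(index: int) -> str:
    # Entry point executed inside each child process.
    # In the real sample, the first line here would be:
    #   Runtime.set_default(Runtime(), error_if_already_set=False)
    # Each child then runs its own event loop.
    return asyncio.run(run_worker_until_done(index))


def main() -> None:
    # The parent would also start the activity-only worker here; most users
    # run activity workers in a separate process instead.
    with ProcessPoolExecutor(max_workers=WORKFLOW_WORKER_COUNT) as pool:
        for line in pool.map(workflow_worker_main, range(WORKFLOW_WORKER_COUNT)):
            print(line)


if __name__ == "__main__":
    main()
```

The child entry point is a module-level function so that it can be pickled and sent to the pool, and each child starts its own event loop instead of sharing the parent's.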

Use the README to explain Python limitations and why this approach may help people

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
