
Conversation

@tphung3 (Contributor) commented Dec 16, 2024

Description

This PR introduces TaskVine's function context feature to the TaskVineExecutor. In short, a traditional function can now specify a computational context to be shared across multiple invocations of the same function, which can drastically improve execution performance.

For example, machine learning models, especially LLMs, incur a large model-creation overhead for a single inference. Instead of coupling model creation and inference in the same function, a user can now specify the model creation as the context of the actual inference function, de-duplicating the model-creation cost.
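To make the pattern concrete, here is a minimal self-contained sketch of the idea; the decorator and parameter names the PR actually exposes may differ, so treat this as illustrative only:

```python
import time

# Hedged sketch of the idea only; the actual API the PR exposes may
# differ. "Context" here means expensive state built once and reused
# across invocations.

def create_context():
    time.sleep(5)                      # stands in for loading LLM weights
    return {"weights": [0.1, 0.2, 0.3]}

def infer(x, context):
    # cheap per-invocation work against the shared context
    return sum(w * x for w in context["weights"])

# Without contexts, every invocation would pay the 5-second setup.
# With contexts, the library builds the state once and serves many calls.
ctx = create_context()                          # paid once
results = [infer(i, ctx) for i in range(100)]   # cheap per call
```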

Helpful blog: https://cclnd.blogspot.com/2025/10/reducing-overhead-of-llm-integrated.html.

Tests are added to make sure the feature works as intended.

Changed Behaviour

TaskVineExecutor now has a new feature allowing functions to specify computational contexts to be shared.

Type of change

  • New feature

A review thread was opened on this code:

```python
            while written < len(serialized_obj):
                written += f_out.write(serialized_obj[written:])

    def _cloudpickle_serialize_object_to_file(self, path, obj):
```
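For readers without the diff open, a plausible shape for this helper, assuming it simply wraps cloudpickle.dumps (the authoritative version is in the PR):

```python
import cloudpickle

def _cloudpickle_serialize_object_to_file(self, path, obj):
    # Sketch: serialize by value (cloudpickle embeds code objects, so
    # functions defined in __main__ survive the trip to the library
    # process), then flush the whole buffer to disk.
    serialized_obj = cloudpickle.dumps(obj)
    with open(path, "wb") as f_out:
        written = 0
        while written < len(serialized_obj):
            written += f_out.write(serialized_obj[written:])
```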
Collaborator:

we talked about this somewhere before but I can't remember where: you should be using the parsl serialization libraries, not cloudpickle, unless you have a specific reason that needs different serialization.

Contributor Author (@tphung3):

The object I serialize is a list containing a function and other Python objects. https://github.com/Parsl/parsl/pull/3724/files#diff-c5ce2bce42f707d31639e986d8fea5c00d31b5eead8fa510f7fe7e3181e67ccfR458-R461

Because it is a list, parsl.serialize uses methods_for_data to serialize it, which eventually uses pickle, and pickle can't serialize a function by value. So I'm using cloudpickle serialization only for this case. What do you think?
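A standalone illustration of the reference-versus-value distinction (not code from the PR):

```python
import pickle
import cloudpickle

def f(x):
    return x + 1

payload = [f, "other", "objects", 42]

# pickle stores only the reference "__main__.f" for the function; a
# receiving process that cannot import __main__.f fails to deserialize.
by_reference = pickle.dumps(payload)

# cloudpickle embeds f's code object in the stream, so the receiver can
# rebuild the function without importing anything.
by_value = cloudpickle.dumps(payload)
```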

Collaborator:

what is meant to happen is that parsl.serialize tries pickle and, if that fails, tries dill -- and dill does similar serialization of functions to cloudpickle. I just tried swapping these cloudpickle references for parsl.serialize to validate that.

If you're seeing instances where this doesn't work, that's a problem with parsl serialization in general that I'm interested in, distinct from taskvine.
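The fallback being described, reduced to a sketch (Parsl's real implementation lives in parsl.serialize and does more bookkeeping, such as recording which serializer produced the bytes):

```python
import pickle
import dill

def serialize_with_fallback(obj):
    # Simplified sketch of "try pickle, then dill"; not Parsl's
    # actual implementation.
    try:
        return pickle.dumps(obj)
    except Exception:
        return dill.dumps(obj)
```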

Contributor Author (@tphung3):

That makes sense to me. From my point of view, the problem is that pickle doesn't fail during serialization (no error is returned), but the output is unusable for TaskVine. That is, this line doesn't raise an exception, yet the result is unusable for TaskVine. Had it raised an exception, and had dill been tried next, parsl.serialize probably would have worked for my use case.

In my case the function was serialized "successfully" by pickle via parsl.serialize, so I had to drop it in favor of cloudpickle (reference versus value, as you pointed out).

Contributor Author (@tphung3), Dec 8, 2025:

Just to brainstorm, would adding a parameter to parsl.serialize that allows users to choose the serialization method solve the problem?
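One hypothetical shape such a parameter could take (purely illustrative; this is not an existing parsl.serialize option):

```python
import pickle
import dill
import cloudpickle

# Hypothetical API sketch for the brainstormed parameter: let callers
# pick the serializer rather than relying on the automatic fallback.
_SERIALIZERS = {
    "pickle": pickle.dumps,
    "dill": dill.dumps,
    "cloudpickle": cloudpickle.dumps,
}

def serialize(obj, method="pickle"):
    return _SERIALIZERS[method](obj)
```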

Collaborator:

I looked at that code last week and it's possible that parsl.serialize would be fine using only dill and not pickle: trying out multiple options comes from a time when all the options were very poorly understood and we just kept trying loads of random methods hoping one of them would be magic.

But I would like to see the failing example so I can understand why it's failing, because I don't want to still be fiddling around hoping for magic: the test suite passed OK using parsl.serialize when I tried a week or so ago.

Contributor Author (@tphung3):

I think I found the root of the problem, which is pickle and the importability of functions.

I tried using parsl.serialize-deserialize instead of cloudpickle and ran pytest parsl/tests/ -k "not cleannet and taskvine" --config parsl/tests/configs/taskvine_ex.py --random-order --durations 10 locally, and the tests passed, but for the wrong reason.

I inspected the content of the serialization output and confirmed that pickle did the serialization. As you know, pickle serializes functions by reference, i.e., by "fully qualified name" (reference here, in the "Note that functions" bit). When pickle deserializes these functions, it tries to import them by those names. The names of the test functions are fully importable because they live in the parsl directory (e.g., parsl.tests.test_vineex.test_function_context.f_context) and the TaskVine worker and library processes share the directory tree. This importability makes pickle pass the regular test cases. So the magic is purely because all relevant processes are on the same machine (or use a shared filesystem), giving pickle an easy time importing functions by name at deserialization time.

On another local test setup of mine, where test functions are defined in a plain Python script (so their names look like "__main__.f_context"), they are serialized with the "__main__" prefix, so when they are reconstructed elsewhere, the other process can't find the functions in its own main module, causing an error like this: AttributeError: Can't get attribute 'f_serverless_context' on <module '__main__' from '/tmp/worker-1000-8577/libr.51/library_code.py'>.

So the moral of the story is that for pickle to work, functions need importable names. cloudpickle (and maybe dill) sidesteps this requirement by serializing functions by value.
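The by-reference behavior is easy to inspect with the standard library's pickletools (illustrative, independent of the PR):

```python
import pickle
import pickletools

def f_context():
    return "model"

blob = pickle.dumps(f_context)
# The disassembly shows opcodes naming "__main__" and "f_context" with
# no embedded code object: the receiver must be able to import that name.
pickletools.dis(blob)
```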

Collaborator:

yeah the __main__ module is one (maybe the only?) place where pickle doesn't behave as a strict subset of dill.

I'm kinda inclined to change the parsl.serialize behaviour to not try pickle first - because behaviour like you report above with __main__ is usually wrong anyway, as far as a Parsl-style distributed environment is concerned.

@benclifford (Collaborator) commented:

This runs serverless functions several times faster than current Parsl master when I measure with parsl-perf: 722 tasks per second vs 240 tasks per second on a 10000-task batch. I'm not clear why, though.

@tphung3 (Contributor Author) commented Nov 20, 2025

> This runs serverless functions several times faster than current Parsl master when I measure with parsl-perf: 722 tasks per second vs 240 tasks per second on a 10000-task batch. I'm not clear why, though.

This bypasses the overhead of run_parsl_function, and the library hosts a given function in its address space on the remote node. So now a function is serialized, shipped to the remote node, and deserialized once, then invoked multiple times, instead of paying one serialization/deserialization per invocation.

https://github.com/Parsl/parsl/pull/3724/files#diff-394c24a1ea1b5e8b91de1f0725846f311d12ed8ef0dd496360335078855b72acL288-R336

This also caches some of the serialization cost (a standalone sketch of the idea follows the link below).

https://github.com/Parsl/parsl/pull/3724/files#diff-c5ce2bce42f707d31639e986d8fea5c00d31b5eead8fa510f7fe7e3181e67ccfL413-R476
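The caching idea, reduced to a sketch; the PR's actual bookkeeping lives in the linked diff, and the names here are illustrative:

```python
import cloudpickle

# Illustrative sketch only: serialize each function at most once and
# reuse the bytes across invocations, instead of re-serializing per task.
_serialized_functions = {}

def serialize_function_once(fn):
    key = (fn.__module__, fn.__qualname__)
    if key not in _serialized_functions:
        _serialized_functions[key] = cloudpickle.dumps(fn)
    return _serialized_functions[key]
```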

@tphung3 tphung3 requested a review from benclifford November 20, 2025 17:47
Copilot AI review requested due to automatic review settings December 8, 2025 21:31
Copilot AI left a comment:

Pull request overview

This PR introduces a function context feature to the TaskVineExecutor, enabling functions to specify computational contexts that are shared across multiple invocations. This significantly reduces overhead for operations like machine learning model initialization by separating one-time setup (context creation) from repeated execution (inference calls). The feature is implemented in serverless execution mode, with one library per function storing the shared context.

Key changes:

  • Added function context support in serverless execution mode with context serialization, input file handling, and variable loading utilities
  • Modified TaskVineExecutor to deduplicate function serialization and manage per-function libraries
  • Enhanced test coverage with parametrized tests for the new feature

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| parsl/tests/test_vineex/test_function_context.py | New test file validating function context computation with single and multiple tasks |
| parsl/tests/configs/taskvine_ex.py | Updated test config to enable shared filesystem and use the factory worker launch method |
| parsl/executors/taskvine/utils.py | Added function context parameters to ParslTaskToVine and a new load_variable_in_serverless helper |
| parsl/executors/taskvine/manager.py | Implemented per-function library creation with context support and double serialization handling |
| parsl/executors/taskvine/executor.py | Added function context file handling, deduplication logic, and staging directory options |


@tphung3 (Contributor Author) commented Dec 9, 2025

Copilot PR review is enabled by default on my account, sorry for the unneeded clutter :)

@tphung3 (Contributor Author) commented Dec 9, 2025

The new test is being ignored at the moment and only ThreadPoolExecutor is running it. For example, pytest parsl/tests/ -k "not cleannet" --config parsl/tests/configs/taskvine_ex.py --random-order --durations 10 reports this. Removing @pytest.mark.local causes other GitHub Actions jobs to run this test, and they fail. Do you have any suggestions?

@tphung3 tphung3 requested a review from benclifford December 9, 2025 20:57