Feature Proposal: Extend Python UDF operators to allow users to modify parameters without editing the code #4154

zyratlo · 2026-01-13T04:40:53Z

Problem Summary

Currently, modifying values used inside a Python UDF (for example, hyperparameters in an ML training algorithm) requires users to directly modify the UDF's source code. This presents a barrier for non-coding users, who often work with workflows created by other developers and may not feel comfortable editing code. We have received this feedback from many collaborators already.

It would be beneficial to allow workflow developers to expose configurable parameters outside of the UDF source code. This would enable non-technical users to adjust these values through a separate interface without needing to modify or understand the source code.

Possible Solutions

1. Introduce a dedicated input operator for users to provide values, which are passed into the UDF operator.
2. Extend the UDF properties panel to include a space for users to provide parameter values.

chenlica · 2026-01-13T04:44:06Z

Collaborator

@zyratlo Please use a real example with screenshots to show the first solution.

1 reply

zyratlo Jan 13, 2026
Author

Sure. Here is a simple workflow to demonstrate how we can expose parameters using a Text Input Operator.

The Python UDF takes a CSV file as input and runs an ML training algorithm. In this example, the developer has chosen to make the test_size and random_seed parameters configurable. To support this, an additional Text Input operator has been added as an input to the UDF. The UDF's source code must handle both input ports, as shown in the screenshot below:

In the code, the if block executes for port 0, which is the parameter input. It sets the class variable parameters to hold the values for test_size and random_seed. The else block executes for any other port, so it runs on the CSV file input and uses the supplied values for the two parameters.
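A minimal sketch of the pattern described above, written as a plain-Python stand-in rather than the actual UDF from the screenshot. The class name `TrainingUDF`, the `process_tuple(tuple_, port)` signature, the `line` field, and the fixed order (test_size first, random_seed second) are assumptions for illustration; tuples are modeled as dicts.

```python
from typing import Any, Dict, Iterator, List


class TrainingUDF:
    """Hypothetical stand-in for a two-port Python UDF (not the real engine API)."""

    def __init__(self) -> None:
        # Raw parameter strings collected from port 0, in arrival order.
        self.parameters: List[str] = []

    def process_tuple(self, tuple_: Dict[str, Any], port: int) -> Iterator[Dict[str, Any]]:
        if port == 0:
            # Parameter port: remember each value, emit nothing downstream.
            self.parameters.append(str(tuple_["line"]).strip())
        else:
            # Data port: assumes port 0 has already finished.
            test_size = float(self.parameters[0])
            random_seed = int(self.parameters[1])
            # Real training code would go here; this sketch only echoes the settings.
            yield {**tuple_, "test_size": test_size, "random_seed": random_seed}


udf = TrainingUDF()
for text in ["0.2", "42"]:
    # process_tuple is a generator, so it must be consumed to run.
    list(udf.process_tuple({"line": text}, port=0))
result = list(udf.process_tuple({"file": "data.csv"}, port=1))
```

The sketch only works if every port-0 tuple arrives before the first data tuple, which is why the port dependency matters.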

To ensure the parameters are processed before the rest of the code, we make the File Scan port depend on port 0, the parameter port (input ports are numbered top-down, beginning at 0). As a result, the File Scan port is not processed until the parameter port has finished:

Now, when the non-coding user wants to modify the parameters, they can simply edit the text in the Text Input operator instead of directly changing the UDF code.

In this example, the developer assumes the format and order of the parameters stay consistent. However, it is easy to switch to dynamic parsing instead.
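One possible way to do the dynamic parsing, assuming the workflow author asks users to write one `name=value` pair per line in the Text Input operator. The function name `parse_parameters` and the line format are hypothetical, not something the workflow above prescribes.

```python
from typing import Dict, Iterable, Union

Number = Union[int, float]


def parse_parameters(lines: Iterable[str]) -> Dict[str, Number]:
    """Parse 'name=value' lines into a dict, ignoring blank lines and order."""
    params: Dict[str, Number] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'name=value', got {line!r}")
        value = value.strip()
        # Try int first so "42" stays an integer; fall back to float.
        try:
            params[name.strip()] = int(value)
        except ValueError:
            params[name.strip()] = float(value)
    return params


params = parse_parameters(["random_seed = 42", "", "test_size=0.2"])
```

With this approach the UDF looks parameters up by name, so users can reorder or space the lines freely without breaking the workflow.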

aglinxinyuan · 2026-01-13T04:46:52Z

aglinxinyuan
Jan 13, 2026
Collaborator

For the first option, we can solve this by composing two existing operators. A Text Input operator is used to capture parameter values as strings (one per line, or in any format the workflow author defines). Its output is then fed into a Python UDF operator, which has two input ports:

Port 0 receives the parameter string from the Text Input operator
Port 1 receives the actual data from upstream operators

Port 1 is configured to depend on Port 0, so the parameters are always available before the main UDF logic runs. This lets us reuse the current execution model without introducing any new engine features. If needed, we can wrap the Text Input + UDF pair into a macro so that it appears as a single operator (similar to what we did for HashJoin).
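The ordering guarantee can be modeled as a small dependency graph. This is only an illustration of "Port 1 depends on Port 0", not how the engine schedules ports; `port_dependencies` is a hypothetical structure, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Map each input port to the set of ports that must finish before it starts.
# Port 1 (data) depends on port 0 (parameters), as described above.
port_dependencies = {0: set(), 1: {0}}

# static_order() yields every port only after all of its prerequisites.
processing_order = list(TopologicalSorter(port_dependencies).static_order())
```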

The second option, adding parameter fields directly to the Python UDF operator, would require significantly more work. Today the engine treats the UDF as a raw Python string, so we would need new plumbing to pass values from the UI into the runtime and then inject or rewrite them into that string in a safe and predictable way.

Given that, I recommend going with the first approach. It is simpler, avoids changes to the UDF execution model, and builds on mechanisms we already have.

chenlica · 2026-01-13T05:24:34Z

chenlica
Jan 13, 2026
Collaborator

I also believe option 1 is better. Here are more insights from the perspective of the engine.

In the Amber engine, the UDF's Python code is sent from JVM to PVM as a string to be executed on the PVM.

In option 1, the user-provided parameters are passed from the top-left operator as two data tuples to the Python UDF operator. In the UDF, the Python code for port 0, which runs on PVM, parses the data tuples sent from JVM to retrieve the two parameters. The Python code for port 1, which also runs on the same PVM, uses the two parameters in its computation.

In option 2, the parameters are provided by the user through the frontend fields of the Python UDF (as part of its property panel). For these parameters to be obtained by the Python code on PVM, we need to send them from JVM to PVM as a control message, which is doable but not easy. We could avoid sending a control message by using a "Python code template" on JVM and plugging in the user's parameters, but that's NOT how our engine generates the Python code.

Therefore, option 1 is cleaner and easier to adopt.
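For comparison, the "Python code template" idea mentioned for option 2 could look roughly like the sketch below. It is written in Python purely for illustration (the template would live on the JVM side), and as noted above it is not how the engine generates Python code today. `UDF_TEMPLATE`, `render_udf`, and the placeholder names are all hypothetical.

```python
from string import Template

# Hypothetical UDF template; $test_size and $random_seed are placeholders.
UDF_TEMPLATE = Template(
    "TEST_SIZE = $test_size\n"
    "RANDOM_SEED = $random_seed\n"
)


def render_udf(test_size: float, random_seed: int) -> str:
    """Fill the template, using repr() so values become valid Python literals."""
    return UDF_TEMPLATE.substitute(
        test_size=repr(float(test_size)),
        random_seed=repr(int(random_seed)),
    )


code = render_udf(0.2, 42)
namespace: dict = {}
exec(code, namespace)  # stands in for the Python side running the generated string
```

Converting each value to a known type before calling `repr()` is what keeps arbitrary user text from being spliced into the generated code.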

kunwp1 · 2026-01-14T01:07:28Z

kunwp1
Jan 14, 2026
Collaborator

The current R UDF doesn't support multiple ports, and we will have a separate effort to add this support. The idea should be the same as option 1.
