@zyratlo Please use a real example with screenshots to show the first solution.
For the first option, we can solve this by composing two existing operators. A Text Input operator captures the parameter values as strings (one per line, or in any format the workflow author defines). Its output is then fed into a Python UDF operator, which has two input ports: Port 0 receives the parameter tuples from the Text Input operator, and Port 1 receives the main data.
Port 1 is configured to depend on Port 0, so the parameters are always available before the main UDF logic runs. This lets us reuse the current execution model without introducing any new engine features. If needed, we can wrap the Text Input + UDF pair into a macro so that it appears as a single operator (similar to what we did for HashJoin).
The second option, adding parameter fields directly to the Python UDF operator, would require significantly more work. Today the engine treats the UDF as a raw Python string, so we would need new plumbing to pass values from the UI into the runtime and then inject or rewrite them into that string in a safe and predictable way.
Given that, I recommend going with the first approach. It is simpler, avoids changes to the UDF execution model, and builds on mechanisms we already have.
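To make the first option more concrete, here is a minimal sketch of what the two-port UDF logic could look like. It is not the actual Texera UDF API: `ParameterizedUDF` is a hypothetical stand-in class, tuples are modeled as plain dicts, and the `line` and `score` fields and the `name=value` parameter format are assumptions made only for illustration.

```python
from typing import Any, Dict, Iterator


class ParameterizedUDF:
    """Hypothetical stand-in for a two-port Python UDF (not the real Texera base class)."""

    PARAM_PORT = 0  # parameter tuples coming from the Text Input operator
    DATA_PORT = 1   # main data; assumed to be consumed only after port 0 is done

    def __init__(self) -> None:
        self.params: Dict[str, str] = {}

    def process_tuple(self, tuple_: Dict[str, Any], port: int) -> Iterator[Dict[str, Any]]:
        if port == self.PARAM_PORT:
            # Assumed format: each parameter tuple carries one "name=value" string.
            name, _, value = tuple_["line"].partition("=")
            self.params[name.strip()] = value.strip()
            return  # parameter tuples produce no output
        # Port 1: the parameters were collected earlier because port 1 depends on port 0.
        threshold = float(self.params.get("threshold", "0.5"))
        if tuple_["score"] >= threshold:
            yield tuple_


# Simulate the engine: feed all of port 0 first, then port 1.
udf = ParameterizedUDF()
for line in ["threshold=0.8", "label=positive"]:
    list(udf.process_tuple({"line": line}, port=ParameterizedUDF.PARAM_PORT))

rows = [{"score": 0.9}, {"score": 0.3}]
kept = [out for row in rows for out in udf.process_tuple(row, port=ParameterizedUDF.DATA_PORT)]
```

With this shape, a non-coding user would only edit the text in the Text Input operator (for example, changing `threshold=0.8`), and the UDF source stays untouched.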
I also believe option 1 is better. Here are more insights from the perspective of the engine.
In the Amber engine, the UDF's Python code is sent from the JVM to the PVM as a string and executed on the PVM. In option 1, the user-provided parameters are passed from the top-left operator as two data tuples to the Python UDF operator. In the UDF, the Python code for port 0, which runs on the PVM, parses the data tuples sent from the JVM to retrieve the two parameters. The Python code for port 1, which runs on the same PVM, uses the two parameters in its computation.
In option 2, the parameters are provided by the user through the frontend fields of the Python UDF (as part of its property panel). For the Python code on the PVM to obtain these parameters, we need to send them from the JVM to the PVM as a control message, which is doable but not easy. We could avoid sending a control message by using a "Python code template" on the JVM and plugging in the user's parameters, but that's NOT how our engine generates the Python code.
Therefore, option 1 is cleaner and easier to adopt.
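For comparison, the "Python code template" idea mentioned for option 2 could look roughly like the sketch below. This is purely illustrative and, as noted above, is not how the engine generates Python code. It is written in Python rather than on the JVM side, and `UDF_TEMPLATE`, `render_udf`, and the `keep` function are hypothetical names. Values are converted with `repr()` so that they are embedded as Python literals instead of raw text, which is the kind of care a template approach would need.

```python
from string import Template

# Hypothetical UDF source with placeholders for user-provided parameters.
UDF_TEMPLATE = Template(
    "def keep(row):\n"
    "    return row['score'] >= $threshold and row['label'] == $label\n"
)


def render_udf(params):
    # repr() produces a Python literal, so string values arrive quoted and escaped
    # instead of being pasted into the source as raw code.
    literals = {name: repr(value) for name, value in params.items()}
    return UDF_TEMPLATE.substitute(literals)


source = render_udf({"threshold": 0.8, "label": "positive"})

# Execute the rendered string, similar in spirit to running the UDF code as a string.
namespace = {}
exec(source, namespace)
keep = namespace["keep"]
```

Even in this small form, the template has to know every parameter name in advance and must quote values correctly, which hints at why passing parameters as data tuples (option 1) is the lighter path.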
The current R UDF doesn't support multiple ports, and we will have a separate effort to add this support. The idea should be the same as option 1.
Problem Summary
Currently, modifying values used inside a Python UDF (for example, hyperparameters in an ML training algorithm) requires users to directly modify the UDF's source code. This presents a barrier for non-coding users, who often work with workflows created by other developers and may not feel comfortable editing code. We have received this feedback from many collaborators already.
It would be beneficial to allow workflow developers to expose configurable parameters outside of the UDF source code. This would enable non-technical users to adjust these values through a separate interface without needing to modify or understand the source code.
Possible Solutions