initial update of UDF sections #12

Merged
merged 12 commits into from
May 2, 2025
14 changes: 13 additions & 1 deletion doc/distributed_python/advanced.rst
@@ -1,2 +1,14 @@
Advanced
--------

From a performance perspective, which programming language you should use in a UDF script depends on the purpose and context of the script, because each language has different strengths. For example, string processing can be faster in one language while XML parsing can be faster in another, so no single language performs best in all circumstances. However, if overall performance is the most important criterion, we recommend using Lua. Lua is integrated in Exasol in the most native way, and therefore has the smallest process overhead.

During the processing of a SELECT statement, multiple virtual machines are started for each script and node. These virtual machines process the data independently. For SCALAR functions, the input rows are distributed across those virtual machines to achieve maximum parallelism. For SET functions, the input is distributed per group if you specify a GROUP BY clause. Without a GROUP BY clause there is only one group, which means only one node and one virtual machine can process the data.
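
The grouping behaviour described above can be illustrated with a small pure-Python simulation. It does not call any Exasol API: ``split_into_groups`` and the sample rows are hypothetical names chosen for illustration. The sketch only shows why SET input without GROUP BY leaves a single unit of work, while a GROUP BY yields one independent unit per group.

```python
from collections import defaultdict


def split_into_groups(rows, group_by=None):
    """Simulate how SET input is split into independently processed groups.

    rows: list of dicts, one per input row.
    group_by: column name, or None to model a call without GROUP BY.
    """
    if group_by is None:
        # Without GROUP BY, every row falls into the same single group.
        return {None: list(rows)}
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row)
    return dict(groups)


rows = [
    {"region": "north", "amount": 10},
    {"region": "south", "amount": 5},
    {"region": "north", "amount": 7},
]

single = split_into_groups(rows)                        # no GROUP BY
per_region = split_into_groups(rows, group_by="region")  # GROUP BY region
print(len(single), len(per_region))
```

Each key in the returned dictionary stands for one group that could be handed to a separate virtual machine, so the number of keys is an upper bound on the parallelism available to a SET script in this simplified model.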

The following pages contain information about more advanced UDF functionality:

* `UDF Instance Limiting <https://docs.exasol.com/db/latest/database_concepts/udf_scripts/udf_instance_limit.htm>`_

* `Hiding Access tokens and secrets <https://docs.exasol.com/db/latest/database_concepts/udf_scripts/hide_access_keys_passwords.htm>`_

* `Managing Script Language Containers <https://docs.exasol.com/db/latest/database_concepts/udf_scripts/adding_new_packages_script_languages.htm>`_
4 changes: 3 additions & 1 deletion doc/distributed_python/debugging.rst
@@ -1,2 +1,4 @@
Debugging
---------

For Python 3.x, we recommend using `pyexasol <https://exasol.github.io/pyexasol/master/index.html>`_ and its `script output functionality <https://exasol.github.io/pyexasol/master/user_guide/udf_script_output.html>`_ to debug your UDFs.
10 changes: 9 additions & 1 deletion doc/distributed_python/intro.rst
@@ -1,2 +1,10 @@
Intro to UDFs
-------------

UDF scripts allow you to program your own analysis, processing, and generation functions, and to execute these functions in parallel inside an Exasol cluster.
By using UDF scripts you can solve problems that are not possible to solve with SQL statements.

Exasol supports the programming languages Java, Lua, R, and Python in UDF scripts. These languages provide different functionalities (for example, statistical functions in R) and different libraries.

UDFs are the key to unlocking much of Exasol's AI, ML, and data science potential, as well as customizing Exasol to suit your unique use cases.
UDFs are executed by Exasol's massively parallel query engine and scale across available hardware in the same way SQL queries do, which gives them significant performance potential.
33 changes: 32 additions & 1 deletion doc/distributed_python/usage.rst
@@ -1,2 +1,33 @@
Creating and running UDFs
-------------------------

For best performance, we recommend that you create a script using CREATE SCRIPT and then use this script within a SELECT statement. Embedding the script within SQL in this way provides the best performance and scalability.

In the CREATE SCRIPT command, you must define the type of input and output values.
There are two types of UDF inputs (set and scalar) and two types of UDF outputs (returns and emits).
These can be combined as needed to suit your use case.

- Input values

  - **SCALAR** Specifies that the script processes single input rows. The code is therefore called once per input row.

  - **SET** Specifies that the processing refers to a set of input values. Within the code, you can iterate through those values.

- Output values

  - **RETURNS** Specifies that the script returns a single value.

  - **EMITS** Specifies that the script can create (emit) multiple result rows (tuples).

You can define the data types of input and output parameters to specify the conversion between internal data types and the database SQL data types. If you do not specify the data types, the script has to handle the conversion dynamically.

Each UDF script must contain the main function run(). This function is called with a parameter that provides access to the input data from Exasol. If your script processes multiple input tuples (using SET), you can iterate through the individual tuples using this parameter.

You can specify an ORDER BY clause either when creating a script or when calling it. This clause defines the order in which the SET input data of each group is processed. If the algorithm depends on a particular order, specify the clause when creating the script so that a caller cannot get wrong results by omitting it.
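
The role of run() and its context parameter can be sketched in plain Python, outside the database. ``FakeContext`` is a hypothetical stand-in written for this example, not part of Exasol; it mimics only column access by name, ``next()``, and ``emit()``. The column name ``amount`` and the function names ``run_scalar`` and ``run_set`` are also assumptions. In a real script each function would be named ``run``, and the value from ``run_scalar`` would be delivered through RETURNS while ``run_set`` would deliver rows through EMITS.

```python
class FakeContext:
    """Hypothetical stand-in for the context object passed to run().

    Supports only what this sketch needs: reading the current row's
    columns as attributes, next() to advance, and emit() to collect output.
    """

    def __init__(self, rows):
        self._rows = rows
        self._pos = 0
        self.emitted = []

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for column names.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._rows[self._pos][name]
        except KeyError:
            raise AttributeError(name) from None

    def next(self):
        # Advance to the next row; return False once the group is exhausted.
        if self._pos + 1 < len(self._rows):
            self._pos += 1
            return True
        return False

    def emit(self, *values):
        self.emitted.append(values)


def run_scalar(ctx):
    # SCALAR input with RETURNS: called once per row, returns one value.
    return ctx.amount * 2


def run_set(ctx):
    # SET input with EMITS: called once per group, iterates over its rows.
    total = 0
    while True:
        total += ctx.amount
        if not ctx.next():
            break
    ctx.emit(total)


scalar_result = run_scalar(FakeContext([{"amount": 21}]))
group = FakeContext([{"amount": 10}, {"amount": 7}])
run_set(group)
print(scalar_result, group.emitted)
```

Note that ``ctx.amount`` must match the declared input parameter name exactly, including case.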

Input parameter names in scripts are always case sensitive, just like the script code itself. This differs from SQL identifiers, which are case sensitive only if they are delimited.

You can use this `UDF Generator <https://htmlpreview.github.io/?https://github.com/EXASOL/script-languages/blob/master/udf-script-signature-generator/udf-script-signature-generator.html>`_ to help you get started building your own UDFs.

Examples
^^^^^^^^^

You can view examples of UDFs `here <https://docs.exasol.com/db/latest/database_concepts/udf_scripts/udf_examples.htm>`_.