diff --git a/doc/data_science.rst b/doc/data_science.rst
new file mode 100644
index 0000000..2aa4ce0
--- /dev/null
+++ b/doc/data_science.rst
@@ -0,0 +1,15 @@
+Data Science with Exasol
+-------------------------
+
+Exasol has significant capabilities for implementing data science workflows - from classic machine learning to Gen AI and language model solutions.
+
+The best way to get started is with `Exasol's AI Lab `_.
+
+This video walks through `getting started with AI Lab `_.
+
+AI Lab includes various workbooks that you can run to load data into Exasol.
+This video walks through `loading data `_ in more detail.
+
+If you want to leverage Exasol to build Gen AI and LM-based solutions, we recommend starting with the Exasol `Transformers Extension `_.
+
+This video showcases the potential `applications of the Exasol Transformers Extension `_.
\ No newline at end of file
diff --git a/doc/distributed_python/advanced.rst b/doc/distributed_python/advanced.rst
index a378715..de78d0f 100644
--- a/doc/distributed_python/advanced.rst
+++ b/doc/distributed_python/advanced.rst
@@ -1,2 +1,14 @@
 Advanced
---------
\ No newline at end of file
+--------
+
+From a performance perspective, which programming language you should use in a UDF script depends on the purpose and context of the script, as specific elements may have different capacities in each language. For example, string processing can be faster in one language while XML parsing can be faster in another. This means that one language cannot be said to have better performance in all circumstances. However, if overall performance is the most important criterion, we recommend using Lua. Lua is integrated in Exasol in the most native way, and therefore, it has the smallest process overhead.
+
+During the processing of a SELECT statement, multiple virtual machines are started for each script and node. These virtual machines process the data independently. For scalar functions, the input rows are distributed across those virtual machines to achieve maximum parallelism. For SET input tuples, the virtual machines are used per group if you specify a GROUP BY clause. Otherwise, there will be only one group, which means only one node and virtual machine can process the data.
+
+The following pages contain information about more advanced UDF functionality:
+
+* `UDF Instance Limiting `_
+
+* `Hiding Access tokens and secrets `_
+
+* `Managing Script Language Containers `_
\ No newline at end of file
diff --git a/doc/distributed_python/debugging.rst b/doc/distributed_python/debugging.rst
index a3cd0d6..510d244 100644
--- a/doc/distributed_python/debugging.rst
+++ b/doc/distributed_python/debugging.rst
@@ -1,2 +1,4 @@
 Debugging
----------
\ No newline at end of file
+---------
+
+For Python versions 3.x, we recommend using `pyexasol `_ and the `script output functionality `_ to debug your UDFs.
\ No newline at end of file
diff --git a/doc/distributed_python/intro.rst b/doc/distributed_python/intro.rst
index 2089e26..f214b30 100644
--- a/doc/distributed_python/intro.rst
+++ b/doc/distributed_python/intro.rst
@@ -1,2 +1,10 @@
 Intro to UDFs
--------------
\ No newline at end of file
+-------------
+
+UDF scripts allow you to program your own analysis, processing, and generation functions, and to execute these functions in parallel inside an Exasol cluster.
+By using UDF scripts, you can solve problems that are not possible to solve with SQL statements.
+
+Exasol supports the programming languages Java, Lua, R, and Python in UDF scripts. These languages provide different functionalities (for example, statistical functions in R) and different libraries.
+
+UDFs are the key to unlocking much of Exasol's AI, ML and Data Science potential, as well as customizing Exasol to suit your unique use cases.
+UDFs are executed by Exasol's massively parallel query engine and scale across available hardware in the same way SQL queries do - this gives them significant performance potential.
\ No newline at end of file
diff --git a/doc/distributed_python/usage.rst b/doc/distributed_python/usage.rst
index 8cb812b..176325e 100644
--- a/doc/distributed_python/usage.rst
+++ b/doc/distributed_python/usage.rst
@@ -1,2 +1,30 @@
 Creating and running UDFs
--------------------------
\ No newline at end of file
+-------------------------
+
+In the CREATE SCRIPT command, you must define the type of input and output values.
+There are two types of UDF inputs (set and scalar) and two types of UDF outputs (returns and emits).
+These can be combined as needed to suit your use case.
+
+- Input values
+
+  - **SCALAR** Specifies that the script processes single input rows. The code is therefore called once per input row.
+
+  - **SET** Specifies that the processing refers to a set of input rows. Within the code, you can iterate through those rows.
+
+- Output values
+
+  - **RETURNS** Specifies that the script returns a single value.
+
+  - **EMITS** Specifies that the script can create (emit) multiple result rows (tuples).
+
+Each UDF script must contain the main function run(). This function is called with a parameter providing access to the input data of Exasol. If your script processes multiple input tuples (using SET), you can iterate through the single tuples using this parameter.
+You can specify an ORDER BY clause either when creating a script or when calling it. This clause sorts the processing of the groups of SET input data. If it is necessary for the algorithm, you should specify this clause when creating the script to avoid wrong results due to misuse.
+
+Input parameters in scripts are always case sensitive, similar to the script code. This is different to SQL identifiers, which are only case sensitive if they are delimited.
+
+You can use this `UDF Generator `_ to help you get started building your own UDFs.
+
+Examples
+^^^^^^^^^
+
+You can view examples of UDFs `here `_.
\ No newline at end of file
diff --git a/doc/index.rst b/doc/index.rst
index 147bf17..67a174b 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -10,6 +10,7 @@ Documentation and resources for data scientists and programmatic users to perfor
    getting_started
    data_ingestion
    distributed_python/index.rst
+   data_science
    examples
    environments
    integrations
diff --git a/doc/integrations.rst b/doc/integrations.rst
index 9922dd5..28baf1a 100644
--- a/doc/integrations.rst
+++ b/doc/integrations.rst
@@ -60,4 +60,4 @@ Ibis
 
 Please refer to the `IBIS documentation `_.
 
-
+You can also watch `this video `_ for a step-by-step walkthrough of using Ibis with Exasol via AI Lab.