diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 318f9d810..6b2a9e7d1 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -11,7 +11,9 @@ jobs:
if: github.event_name != 'push'
runs-on: ubuntu-latest
steps:
- - uses: actions/checkout@v1
+ - uses: actions/checkout@v2
+ with:
+ submodules: true
- uses: actions/setup-node@v1
with:
node-version: '16.x'
diff --git a/.gitignore b/.gitignore
index 4c3563ba5..8623faf48 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,3 +3,6 @@ yarn.lock
.idea
.docusaurus
node_modules
+
+# These files are copied from the sqrl submodule
+docs/sqrl/*
\ No newline at end of file
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 000000000..801b22039
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "sqrl"]
+ path = sqrl
+ url = git@github.com:DataSQRL/sqrl.git
diff --git a/docs/getting-started/concepts/datasqrl.md b/docs/getting-started/concepts/datasqrl.md
index 1a3ae40a0..825f5e74c 100644
--- a/docs/getting-started/concepts/datasqrl.md
+++ b/docs/getting-started/concepts/datasqrl.md
@@ -26,7 +26,7 @@ DataSQRL compiles the SQL script and output specification into a data pipeline t
DataSQRL has a pluggable engine architecture which allows it to support various stream processors, databases, data warehouses, data streams, and API servers. Feel free to contribute your favorite data technology as a DataSQRL engine to the open-source, wink wink.
-DataSQRL can generate data pipelines with multiple topologies. Take a look at the [types of data products](/docs/reference/concepts/data-product#types) that DataSQRL can build. You can further customize those pipeline topologies in the DataSQRL [package configuration](/docs/reference/sqrl/datasqrl-spec/) which defines the data technologies at each stage of the resulting data pipeline.
+DataSQRL can generate data pipelines with multiple topologies. Take a look at the [types of data products](/docs/reference/concepts/data-product#types) that DataSQRL can build. You can further customize those pipeline topologies in the DataSQRL [package configuration](/docs/sqrl/datasqrl-spec) which defines the data technologies at each stage of the resulting data pipeline.
DataSQRL compiles executables for each engine in the pipeline which can be deployed on the data technologies and cloud services you already use.
In addition, DataSQRL provides development tooling that makes it easy to run and test data pipelines locally to speed up the development cycle.
diff --git a/docs/getting-started/concepts/when-datasqrl.md b/docs/getting-started/concepts/when-datasqrl.md
index d56dc2f20..75a126661 100644
--- a/docs/getting-started/concepts/when-datasqrl.md
+++ b/docs/getting-started/concepts/when-datasqrl.md
@@ -65,7 +65,7 @@ That said, if you have multiple orange data products, DataSQRL is still worth it
DataSQRL is a compiler that generates data pipelines using proven data technologies like Apache Flink, Kafka, or Postgres. DataSQRL has a pluggable engine architecture that supports various types of databases, stream processors, and API servers. Hence, the other thing to consider is whether DataSQRL supports the data technologies you are already using or plan to use in your data pipelines.
-Take a look at the [engines documentation](/docs/reference/sqrl/datasqrl-spec#engines) to see whether your favorite data technologies are supported. And if not, [reach out to us](/contact) to see if and when DataSQRL will support the data technology of your choice. DataSQRL is still young and we are continuously expanding support.
+Take a look at the [engines documentation](/docs/sqrl/datasqrl-spec#engines) to see whether your favorite data technologies are supported. And if not, [reach out to us](/contact) to see if and when DataSQRL will support the data technology of your choice. DataSQRL is still young and we are continuously expanding support.
### Expressivity
diff --git a/docs/getting-started/intro/deploy.md b/docs/getting-started/intro/deploy.md
index 35406fad9..c1fc6c072 100644
--- a/docs/getting-started/intro/deploy.md
+++ b/docs/getting-started/intro/deploy.md
@@ -96,7 +96,7 @@
[//]: # (The package configuration specifies what engines DataSQRL compiles to. DataSQRL calls the data technologies that execute the components of a data pipeline "**engines**". For example, DataSQRL supports [Apache Flink](https://flink.apache.org/) as a stream engine, [Apache Kafka](https://kafka.apache.org/) as a log engine, [Postgres](https://www.postgresql.org/) as a database engine, and [Vert.x](https://vertx.io/) as a server engine.)
[//]: # ()
-[//]: # (Check out [all the engines](/docs/reference/operations/engines/overview) that DataSQRL supports and how to configure them in the [package configuration](/docs/reference/sqrl/datasqrl-spec/#packagejson). )
+[//]: # (Check out [all the engines](/docs/reference/operations/engines/overview) that DataSQRL supports and how to configure them in the [package configuration](/docs/sqrl/datasqrl-spec#packagejson). )
[//]: # ()
[//]: # (That concludes our introductory tutorial! Great job and enjoy building with data(sqrl)!)
diff --git a/docs/intro.md b/docs/intro.md
index 0a26eecc9..ba45ea188 100644
--- a/docs/intro.md
+++ b/docs/intro.md
@@ -111,7 +111,7 @@ You can also learn more about [DataSQRL](../getting-started/concepts/datasqrl),
- Jump into our [Getting Started Guide](../getting-started) for a quick introduction to the basics of DataSQRL.
- Explore the Interactive Tutorial and [Quickstart](../getting-started/quickstart) to see DataSQRL in action.
-- Check out the [reference documentation](../reference/sqrl/datasqrl-spec) covers all aspects of DataSQRL in detail. If you want more information on how to use DataSQRL or are looking for comprehensive documentation, this is your place to go.
+- Check out the [reference documentation](/docs/sqrl/datasqrl-spec), which covers all aspects of DataSQRL in detail. If you want more information on how to use DataSQRL or are looking for comprehensive documentation, this is your place to go.
- Learn more about advanced features and customization in our How-to Guides.
## Join the DataSQRL community
diff --git a/docs/reference/sqrl/bak.md b/docs/reference/sqrl/bak.md
index a2f4cd0fe..ebc5f3b04 100644
--- a/docs/reference/sqrl/bak.md
+++ b/docs/reference/sqrl/bak.md
@@ -151,7 +151,7 @@
[//]: # (### Dependency)
[//]: # ()
-[//]: # (Dependencies are declared in the [package configuration](/docs/reference/sqrl/datasqrl-spec/#packagejson#dependency). Dependencies can point to local package folders or be downloaded from a [repository](../../operations/repository) at compile time.)
+[//]: # (Dependencies are declared in the [package configuration](/docs/sqrl/datasqrl-spec#dependencies). Dependencies can point to local package folders or be downloaded from a [repository](../../operations/repository) at compile time.)
[//]: # ()
[//]: # (By default, DataSQRL looks up any missing packages in the [repository](https://dev.datasqrl.com). A package is missing if it is not declared as dependencies and cannot be resolved locally. If the missing package can be located in the repository, a dependency on the most recent version of that package is added to the package configuration.)
diff --git a/docs/reference/sqrl/cli.md b/docs/reference/sqrl/cli.md
deleted file mode 100644
index 3239fb590..000000000
--- a/docs/reference/sqrl/cli.md
+++ /dev/null
@@ -1,181 +0,0 @@
-import Tabs from '@theme/Tabs';
-import TabItem from '@theme/TabItem';
-
-
-# DataSQRL Command
-
-The DataSQRL command compiles, runs, and tests SQRL scripts. It also provides utilities for managing data sources and sinks, uploading packages to the repository, and other convenience features.
-
-You invoke the DataSQRL command in your terminal or command line. Choose your operating system below or use Docker which works on any machine that has Docker installed.
-
-## Installation
-
-
-
-
-```bash
-brew tap datasqrl/sqrl
-brew install sqrl-cli
-```
-
-:::note
-Check that you're on the current version of DataSQRL by running `sqrl --version`
-To update an existing installation:
-
-```bash
-brew upgrade sqrl-cli
-```
-:::
-
-
-
-Always pull the latest Docker image to ensure you have the most recent updates:
-
-```bash
-docker pull datasqrl/cmd:latest
-```
-
-:::note
-The Docker version of DataSQRL has limited functionality and does not support the development and testing runtime that ships with the DataSQRL command, which speeds up development cycles from minutes to seconds.
-:::
-
-
-
-
-### Global Options
-All commands support the following global options:
-
-|Option/Flag Name |Description|
-|--------------|---------------|
-|-c or --config| Specifies the path to one or more package configuration files. Contents of multiple files are merged in the specified order. Defaults to package.json in the current directory, generating a default configuration if none exists.|
-
-Note that most commands require that you either specify the SQRL script (and, optionally, a GraphQL schema) as command line arguments or use the
-`-c` option to specify a project configuration file that configures the SQRL script (and, optionally, a GraphQL schema).
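-
-For example, assuming the flag can be repeated and using hypothetical file names, a base configuration can be combined with an environment-specific override:
-
-```bash
-sqrl compile myscript.sqrl -c package.json -c package-prod.json
-```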
-
-## Compile Command
-The compile command processes an SQRL script and, optionally, an API specification, into a deployable data pipeline. The outputs are saved in the specified target directory.
-
-
-
-
-
-```bash
-sqrl compile myscript.sqrl myapischema.graphqls
-```
-
-
-
-
-```bash
-docker run --rm -v $PWD:/build datasqrl/cmd compile myscript.sqrl myapischema.graphqls
-```
-
-
-
-|Option/Flag Name| Description|
-|--------------|---------------|
-|-a or --api |Generates an API specification (GraphQL schema) in the file schema.graphqls. Overwrites any existing file with the same name.|
-|-t or --target |Directory to write deployment artifacts, defaults to build/deploy.|
-|--profile| Selects a specific set of configuration values that override the default package settings.|
-
-
-The command compiles the script and API specification into an integrated data product. It creates a `build` directory with all the artifacts used during the compilation and build process (e.g. dependencies) and writes the deployment artifacts for the compiled data product into the `build/deploy` directory. Read more about deployment artifacts in the deployment documentation.
-
-
-## Test Command
-
-The test command executes the provided test queries and all tables annotated with `/*+test */` and snapshots the results.
-
-When you first run the test command, it creates the snapshots and fails. All subsequent runs compare the results to the previously snapshotted results and succeed if the results are identical, or fail otherwise.
-
-
-
-
-```bash
-sqrl test myscript.sqrl myapischema.graphqls
-```
-
-
-
-
-```bash
-docker run --rm -v $PWD:/build datasqrl/cmd test
-```
-
-
-
-Options for the Test Command:
-
-|Option/Flag Name| Description |
-|--------------|------------------------------------------------------|
-|-s or --snapshot| Path to the snapshot files. Defaults to `snapshot`. |
-|--tests| Path to test query files. Defaults to `tests`. |
-
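-For example, to run the tests against custom snapshot and test directories (directory names are hypothetical):
-
-```bash
-sqrl test myscript.sqrl --snapshot mysnapshots --tests mytests
-```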
-
-## Publish Command
-Publishes a local package to the repository. It is executed from the root directory of the package, archiving all contents and submitting them under the specified package configuration. The package must have a main `package.json` that contains the package information:
-
-```json
-{
- "version": "1",
- "package": {
- "name": "myorg.mypackage",
- "version": "0.1.2",
- "variant": "dev",
- "description": "This is my profile",
- "homepage": "http://www.mypackage.myorg.com",
- "documentation": "More information on my package",
- "topics": [ "mytag" ]
- }
-}
-```
-
-
-
-
-
-```bash
-sqrl publish --local
-```
-
-
-
-
-```bash
-docker run --rm -v $PWD:/build datasqrl/cmd publish --local
-```
-
-
-
-|Option/Flag Name| Description|
-|--------------|---------------|
-|--local |Publishes the package to the local repository only.|
-
-### How repository resolution works
-
-A repository contains DataSQRL packages. When compiling an SQRL script, the DataSQRL compiler retrieves dependencies declared in the [package configuration](/docs/reference/sqrl/datasqrl-spec) and unpacks them in the build directory.
-
-The remote DataSQRL repository is hosted at [https://dev.datasqrl.com](https://dev.datasqrl.com). Packages in the remote repository can be retrieved from any machine running the DataSQRL compiler with access to the internet.
-
-DataSQRL keeps a local repository in the hidden `~/.datasqrl/` directory in the user's home directory. The local repository is only accessible from the local machine. It caches packages downloaded from the remote repository and contains packages that are only published locally.
-
-
-## Login Command
-
-Authenticates a user against the repository. A user needs to be authenticated to access private packages in the repository or to publish a package.
-
-
-
-
-```bash
-sqrl login
-```
-
-
-
-
-```bash
-docker run --rm -v $PWD:/build datasqrl/cmd login
-```
-
-
diff --git a/docs/reference/sqrl/connectors.md b/docs/reference/sqrl/connectors.md
deleted file mode 100644
index 1a94bae82..000000000
--- a/docs/reference/sqrl/connectors.md
+++ /dev/null
@@ -1,246 +0,0 @@
-# Connectors (Source & Sink)
-
-To resolve `IMPORT` and `EXPORT` statements that ingest data from and write data to external systems, DataSQRL reads connector configuration files from a package. Packages are either local directories or downloaded from the repository, with the former taking precedence. Packages contain connector configurations.
-
-For example, the statement `IMPORT mypackage.MyTable;` imports the table `MyTable` from the package `mypackage`. For this import to resolve, the connector configuration for `MyTable` needs to be present in the package.
-
-To connect to an external system like streaming platforms (e.g. Apache Kafka), lake houses (e.g. Apache Iceberg), or databases (e.g. PostgreSQL), you create two configuration files:
-
-1. **Table Configuration**: A configuration file that specifies how to ingest the data into a table and how to connect to the external system. This file is named `MyTable.table.json`.
-2. **Schema Definition**: A schema that defines the structure of the data. DataSQRL supports multiple schema languages. Pick the schema language that the external system uses to make sure the data is aligned. The filename depends on the schema language - see below.
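-
-For example, a local package directory for `MyTable` that uses an Avro schema might look like this (the package name is illustrative):
-
-```
-mypackage/
-├── MyTable.table.json
-└── MyTable.avsc
-```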
-
-## Table Configuration
-
-The table configuration is a JSON file that has the name `MyTable.table.json` where `MyTable` is the name of the table that's used in the `IMPORT` statement.
-
-The table configuration has three sections for defining the table properties, connector configuration, and - optionally - metadata columns.
-
-An example table configuration file is shown below followed by the documentation for each section.
-
-```json
-{
- "version": 1,
- "table" : {
- "type" : "source",
- "primary-key" : ["id", "time"],
- "timestamp" : "_source_time",
- "watermark-millis" : "0"
- },
- "flink" : {
- "format" : "avro",
- "bootstrap.servers": "${BOOTSTRAP_SERVERS}",
- "group.id": "datasqrl-orders",
- "connector" : "kafka"
- },
- "metadata" : {
- "_source_time" : {
- "attribute" : "timestamp",
- "type": "TIMESTAMP_WITH_LOCAL_TIME_ZONE(3)"
- }
- }
-}
-```
-
-### Table Properties
-
-The `table` section of the table configuration specifies the table properties that are used by the DataSQRL planner to validate SQRL scripts and generate efficient data pipelines.
-
-The `type` of a table can be either `source`, `sink`, or `source_and_sink` if it can be used as both.
-
-The `primary-key` specifies the column or list of columns that uniquely identifies a single record in the table. Note that when the table is a changelog or CDC stream for an entity table, the primary key should uniquely identify each record in the stream and not the underlying table. For example, if you consume a CDC stream for a `Customer` entity table with primary key `customerid`, the primary key for the resulting CDC stream should include the timestamp of the change, e.g. `[customerid, lastUpdated]`.
-
-The `timestamp` field specifies the (single) timestamp column for a source stream which has the event time of a single stream record. `watermark-millis` defines the number of milliseconds that events/records can arrive late for consistent processing. Set this to `1` if events are perfectly ordered in time and to `0` if the timestamp is monotonically increasing (i.e. it's perfectly ordered and no two events have the same timestamp).
-Alternatively, you can also use processing time for event processing by removing the `watermark-millis` field and adding the processing time as metadata (see below), which means using the system clock of the machine processing the data and not the timestamp of the record. We highly recommend you use event time and not processing time for consistent, reproducible results.
-Timestamp and watermark are only used for sources.
-
-### Connector Configuration
-
-The connector configuration specifies how the stream engine connects to the source or sink and how it reads or writes the data. The connector configuration is specific to the configured stream processing engine that DataSQRL compiles to, and the section of the configuration is named after the engine. In the example above, the connector configuration is for the `flink` engine.
-
-The connector configuration is passed through to the stream engine. Check the documentation for the stream processing engine you are using for how to configure the connector:
-
-* [**Flink Connector Configuration**](https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/overview/): Make sure you use the connector configuration that's compatible with the version of Flink you are compiling to.
-
-### Metadata Columns
-
-Some connectors expose optional metadata that can be included with the data as additional columns. Those are defined in the `metadata` section of the table configuration.
-
-The fields under `metadata` are the column names that are added to the table. Those need to be unique and not clash with the columns in the data. The `attribute` defines either a) the name of an attribute that the connector exposes or b) a function call.
-
-For a connector attribute, look up the name of the attribute in the connector configuration. Additionally, you need to specify the type of the attribute as `type`.
-In the example above, we add the Kafka timestamp as the column `_source_time` to the table with a timestamp `type`. The Kafka timestamp is exposed by the Flink Kafka connector under the `timestamp` attribute.
-
-For a function call, you specify the full function invocation as the attribute. You don't need to specify a type.
-For example, to use processing time you need to add the processing time as a column with the following metadata definition:
-
-```json
- "metadata" : {
- "_source_time" : {
- "attribute" : "proctime()"
- }
- }
-```
-
-## Schema Definition
-
-The schema defines the structure of the data and is used to derive the row type of the imported or exported table.
-
-If the source system has a schema, it is best to use the source schema directly to avoid mismappings. For example, if you use Avro schema registry with Kafka, download the Avro schema and place it into the package.
-
-### Avro
-
-To use Avro schema, take the Avro schema file and place it into the package with the table configuration file and name it `MyTable.avsc` where `MyTable` is the name of your table.
-
-You don't need to make any modifications to your Avro schema file.
-
-### SQL
-
-:::note
-SQL schema is currently experimental, behind a feature-flag, and does not support nested data.
-:::
-
-To use SQL schema, place the `CREATE TABLE` statement for the table data in a file named `MyTable.sql`.
-
-```sql title=MyTable.sql
-CREATE TABLE Customer (
- id INT NOT NULL,
- name VARCHAR(100),
- birthdate DATE,
- email VARCHAR(100),
- lastUpdated TIMESTAMP_WITH_LOCAL_TIME_ZONE(3)
-);
-```
-
-Note that the primary key must be defined in the table configuration and not in the schema SQL file. Any primary key definition in the SQL file will be ignored.
-
-### YAML
-
-DataSQRL supports a flexible schema definition in YAML format.
-DataSQRL schema is simple, accommodates semi-structured data, supports schema evolution, and provides testing capabilities.
-
-YAML schema files end in `.schema.yml`, e.g. `MyTable.schema.yml`. To get flexible schema capabilities in DataSQRL for JSON source data, you also need to configure the format in the connector to `flexible-json`. The `flexible-json` format is more robust to input data and supports JSON natively, unlike Flink's default `json` format.
-
-DataSQRL schema is the default schema used for schema-less sources like JSON.
-
-
-#### Example DataSQRL Schema
-
-```yml
-name: "orders"
-schema_version: "1"
-columns:
-- name: "id"
- type: "INTEGER"
- tests:
- - "not_null"
-- name: "customerid"
- type: "INTEGER"
- tests:
- - "not_null"
-- name: "time"
- type: "DATETIME"
- tests:
- - "not_null"
-- name: "items"
- cardinality:
- min: 1
- max: 1000
- columns:
- - name: "productid"
- type: "INTEGER"
- tests:
- - "not_null"
- - name: "quantity"
- type: "INTEGER"
- tests:
- - "not_null"
- - name: "unit_price"
- type: "FLOAT"
- tests:
- - "not_null"
- - name: "discount"
- type: "FLOAT"
- tests:
- - "not_null"
-```
-
-#### Schema Definition
-
-DataSQRL schema supports the following attributes to define the data structure:
-
-| Field Name | Description | Required? |
-|----------------|-------------------------------------------------------------------------------------------------------------------------------|-----------|
-| name | Name of the table that this schema applies to | Yes |
-| schema_version | Version of DataSQRL schema for this schema file | Yes |
-| description | Description of the table | No |
-
-A table is defined by a list of columns. A column has a `name`. A column is either a scalar field or a nested table.
-
-A column is defined by the following attributes:
-
-| Field Name | Description | Required? |
-|---------------|--------------------------------------------------------------------|-------------------------------------|
-| name | Name of the column. Must be unique per table at any nesting level. | Yes |
-| description | Description of the column | No |
-| default_value | Value to use when column is missing in input data. | No |
-| type | Type for a scalar field | One of `type`, `columns` or `mixed` |
-| columns | Columns for a nested table | One of `type`, `columns` or `mixed` |
-| mixed | A mix of scalar fields and nested tables for unstructured data | One of `type`, `columns` or `mixed` |
-| tests | A set of constraints that the column satisfies | No |
-
-A column must either have a type (for scalar field) or a list of columns (for nested table). For unstructured data (i.e. data that does not conform to a schema), there is a third option to define a *mixed column* which can be a combination of multiple scalar fields or nested tables.
-
-A mixed column is defined by the attribute `mixed` which is a map of multiple column definitions that are identified by a unique name.
-
-```yml
-- name: "time"
- mixed:
- - "epoch":
- type: INTEGER
- - "timestamp":
- type: DATETIME
-```
-
-This defines the column `time` to be a mixed column that is either a scalar field called `epoch` with an `INTEGER` type or a scalar field called `timestamp` with a `DATETIME` type. We would use such a mixed column definition for data where `time` is either represented as seconds since epoch or a timestamp.
-
-Each individual column of a mixed column definition gets mapped onto a separate column in the resulting SQRL table with the column name being a combination of the mixed column name and the map key. For our example above, the SQRL `orders` table would contain a column `time_epoch` and `time_timestamp` for each of the respective scalar fields.
-
-#### Scalar Types
-
-DataSQRL schema supports these scalar types:
-
-* **INTEGER**: for whole numbers
-* **FLOAT**: for floating point numbers
-* **BOOLEAN**: true or false
-* **DATETIME**: point in time
-* **STRING**: character sequence
-* **UUID**: unique identifier
-* **INTERVAL**: for periods of time
-
-To define arrays of scalar types, wrap the type in square brackets. For instance, an integer array is defined as `[INTEGER]`.
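-
-For example, a column holding an array of integers could be declared as follows (the column name is illustrative):
-
-```yml
-- name: "ratings"
-  type: "[INTEGER]"
-```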
-
-#### Data Constraints
-
-The `tests` attribute specifies data constraints for columns, whether scalar field or nested table. These constraints are validated when data is ingested to filter out invalid or unneeded data. The constraints are also used to validate statements in SQRL scripts. In addition, the DataSQRL [optimizer](/docs/reference/sqrl/learn#datasqrl-optimizer) analyzes the constraints to build more efficient data pipelines.
-
-DataSQRL schema supports the following test constraints:
-
-* **not_null**: the column cannot be missing or have a null value.
-* **unique**: the column values are unique across all records in this table.
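-
-For example, a key-like column could declare both constraints (the column name is illustrative):
-
-```yml
-- name: "id"
-  type: "UUID"
-  tests:
-  - "not_null"
-  - "unique"
-```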
-
-
-## Static Data
-
-For development and testing, it is useful to use static data files as source data. To use static data as a source, convert the data to one of the following formats:
-
-* [JSON Lines format](https://jsonlines.org/): A text file of newline-separated JSON documents that ends in `.jsonl`.
-* [CSV Format](https://en.wikipedia.org/wiki/Comma-separated_values): A text file that contains comma-separated data and a header line. The file must have the `.csv` extension.
-
-Give the file the name of the table and place it in a local package folder. You can import the data without any additional configuration since DataSQRL will automatically infer the schema and create the schema and table configuration files in the package folder.
-
-For example, if we place the file `MyTable.jsonl` in the folder `mypackage`, we can import the data with:
-```sql
-IMPORT mypackage.MyTable;
-```
-
-When you compile, run, or test your SQRL script, DataSQRL automatically discovers the schema and creates the table configuration and YAML schema files in the package folder. You can then adjust both to suit your needs. To update the configuration and schema after making changes to the source data, delete those two files.
\ No newline at end of file
diff --git a/docs/reference/sqrl/datasqrl-spec.md b/docs/reference/sqrl/datasqrl-spec.md
deleted file mode 100644
index 76c0343d2..000000000
--- a/docs/reference/sqrl/datasqrl-spec.md
+++ /dev/null
@@ -1,321 +0,0 @@
-# DataSQRL Configuration
-
-You control the topology, execution characteristics, and API of the DataSQRL generated data pipeline/microservice through configuration files.
-
-* **Package.json:** Controls all aspects of the compiler.
-* **GraphQL Schema:** Specifies the resulting GraphQL API.
-
-## Package.json
-
-The package configuration is the central configuration file used by DataSQRL. The package configuration declares dependencies, configures the engines in the data pipeline, sets compiler options, and provides package information.
-
-You can pass a configuration file to the compiler via the `-c` or `--config` flag. You can specify a single configuration file or multiple files. If multiple files are specified, they are merged in the order they are specified (i.e. fields - including arrays - are replaced and objects are merged).
-
-DataSQRL allows environment variables: `${VAR}`.
-
-A minimal `package.json` contains the following:
-```json
-{
- "version": "1"
-}
-```
-
-### Engines
-
-`engines` is a map of engine configurations by engine name that the compiler uses to instantiate the engines in the data pipeline. The DataSQRL compiler produces an integrated data pipeline against those engines. At a minimum, DataSQRL expects that a stream processing engine is configured.
-
-Engines can be enabled with the `enabled-engines` property. The default set of engines is listed below:
-```json
-{
- "enabled-engines": ["vertx", "postgres", "kafka", "flink"]
-}
-```
-
-#### Flink
-
-Apache Flink is the default stream processing engine.
-
-The physical plan that DataSQRL generates for the Flink engine includes:
-* FlinkSQL table descriptors for the sources and sinks
-* FlinkSQL view definitions for the data processing
-* A list of connector dependencies needed for the sources and sinks.
-
-Flink reads data from and writes data to the engines in the generated data pipeline. DataSQRL uses connector configuration templates to instantiate those connections.
-These templates are configured under the `connectors` property.
-
-Connectors that link flink to other engines and external systems can be configured in the `connectors` property. Connectors use the [flink configuration options](https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/overview/) and are directly passed through to flink without modification.
-
-Variables that start with the `sqrl:` prefix are template variables that the DataSQRL compiler instantiates. For example: `${sqrl:table}` provides the table name for a connector that writes to a table.
-
-```json
-{
- "engines" : {
- "flink" : {
- "connectors": {
- "postgres": {
- "connector": "jdbc",
- "password": "${JDBC_PASSWORD}",
- "driver": "org.postgresql.Driver",
- "username": "${JDBC_USERNAME}",
- "url": "${JDBC_URL}",
- "table-name": "${sqrl:table}"
- },
- "kafka": {
- "connector" : "kafka",
- "format" : "flexible-json",
- "properties.bootstrap.servers": "${PROPERTIES_BOOTSTRAP_SERVERS}",
- "properties.group.id": "${PROPERTIES_GROUP_ID}",
- "scan.startup.mode" : "group-offsets",
- "properties.auto.offset.reset" : "earliest",
- "topic" : "${sqrl:topic}"
- },
- "iceberg" : {
- "warehouse":"s3://daniel-iceberg-table-test",
- "catalog-impl":"org.apache.iceberg.aws.glue.GlueCatalog",
- "io-impl":"org.apache.iceberg.aws.s3.S3FileIO",
- "catalog-name": "mydatabase"
- }
- }
- }
- }
-}
-```
-
-Flink runtime configuration can be specified in the [`values` configuration](#values) section.
-
-
-#### Postgres
-
-Postgres is the default database engine.
-
-The physical plan that DataSQRL generates for the Postgres engine includes:
-* Table DDL statements for the physical tables.
-* Index DDL statements for the index structures on those tables.
-* View DDL statements for the logical tables. Views are only created when no server engine is enabled.
-
-#### Vertx
-
-Vertx is the default server engine: a high-performance GraphQL server implemented in [Vertx](https://vertx.io/). The GraphQL endpoint is configured through the [GraphQL Schema](#graphql-schema).
-
-The physical plan that DataSQRL generates for Vertx includes:
-* The connection configuration for the database(s) and log engine
-* A mapping of GraphQL endpoints to queries for execution against the database(s) and log engine.
-
-
-#### Kafka
-
-Apache Kafka is the default `log` engine.
-
-The physical plan that DataSQRL generates for Kafka includes:
-* A list of topics with configuration and (optional) Avro schema.
-
-#### Iceberg
-
-Apache Iceberg is a table format that can be used as a database engine with DataSQRL.
-
-The `iceberg` engine requires an enabled query engine to execute queries against it.
-
-The physical plan that DataSQRL generates for Iceberg includes:
-* Table DDL statements for the physical tables
-* Catalog registration for registering the tables in the associated catalog, e.g. AWS Glue.
-
-#### Snowflake
-
-Snowflake is a query engine that can be used in combination with a table format as a database in DataSQRL.
-
-The physical plan that DataSQRL generates for Snowflake includes:
-* External table registration through catalog integration. The Snowflake connector currently supports AWS Glue.
-* View definitions for the logical tables.
-
-To define the catalog integration for Snowflake:
-```json
-{
- "snowflake" : {
- "catalog-name": "MyCatalog",
- "external-volume": "iceberg_storage_vol"
- }
-}
-```
-
-### Compiler
-
-The `compiler` section of the configuration controls elements of the core compiler and DAG Planner.
-
-```json
-{
- "compiler" : {
- "addArguments": true,
- "logger": "print",
- "explain": {
- "visual": true,
- "text": true,
- "extended": false
- }
- }
-}
-```
-
-* `addArguments` specifies whether to include table columns as filters in the generated GraphQL schema. This only applies if the GraphQL schema is generated by the compiler.
-* `logger` configures the logging framework used for logging statements like `EXPORT MyTable TO logger.MyTable;`. It is `print` by default which logs to STDOUT. Set it to the configured log engine for logging output to be sent to that engine, e.g. `"logger": "kafka"`. Set it to `none` to suppress logging output.
-* `explain` configures how the DAG plan compiled by DataSQRL is presented in the `build` directory. If `visual` is true, a visual representation of the DAG is written to the `pipeline_visual.html` file which you can open in any browser. If `text` is true, a textual representation of the DAG is written to the `pipeline_explain.txt` file. If `extended` is true, the DAG outputs include more information like the relational plan which may be very verbose.
-
-
-### Profiles
-
-```json
-{
- "profiles": ["myprofile"]
-}
-```
-
-The deployment profile determines the deployment assets that are generated by the DataSQRL compiler. For example, the default profile generates docker images and a docker compose template for orchestrating the entire pipeline compiled by DataSQRL.
-
-You can configure a single deployment profile or multiple profiles, which are merged together.
-
-A deployment profile can also be downloaded from the repository when its fully qualified package name is specified:
-
-```json
-{
- "profiles" : ["datasqrl.profile.default"]
-}
-```
-
-Learn more about [deployment profiles](../deployments).
-
-### Values
-
-The `values` section of the [DataSQRL configuration](../datasqrl-spec) allows you to specify configuration values that are passed through to the deployment profile and can be referenced in the deployment profile templates. See [deployment profiles](../deployments) for more information.
-
-The default deployment profile supports a `flink-config` section to allow injecting additional Flink runtime configuration. You can use this section of the configuration to specify any [Flink configuration option](https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/config/).
-
-```json
-{
- "values" : {
- "flink-config" : {
- "taskmanager.memory.network.max": "800m",
- "execution.checkpointing.mode" : "EXACTLY_ONCE",
- "execution.checkpointing.interval" : "1000ms"
- }
- }
-}
-```
-
-The `values` configuration settings take precedence over identical configuration settings in the compiled physical plans.
-
-### Dependencies
-
-`dependencies` is a map of all packages that a script depends on. The name of the dependency is the key in the map and the associated value defines the dependency by `version` and `variant`.
-
-While explicit package dependencies are encouraged, DataSQRL will automatically look up packages referenced in the SQRL script in the [repository](https://dev.datasqrl.com).
-
-```json
-{
- "dependencies" : {
- "datasqrl.seedshop" : {
- "name": "datasqrl.seedshop",
- "version" : "0.1.0",
- "variant" : "dev"
- }
- }
-}
-```
-
-This example declares a single dependency `datasqrl.seedshop`. The DataSQRL packager retrieves the `datasqrl.seedshop` package from the repository for the given version "0.1.0" and "dev" variant and makes it available for the compiler. The `variant` is optional and defaults to `default`.
-Note that specifying the name in this case is optional. You only have to specify the name if the package name in the SQRL script is different from the name in the repository (i.e. you want to rename the package).
-
-**Variants**
-Packages can have multiple variants for a given version. A variant might be a subset, a static snapshot, or point to an alternate data system for development and testing.
-
-**Dependency Aliasing**:
-
-We can also rename dependencies, which makes it easy to dynamically swap out dependencies for different environments and testing.
-```json
-{
- "dependencies":
- {
- "datasqrl.tutorials.seedshop": {
- "name": "local-seedshop",
- "version": "1.0.0",
- "variant": "dev" }
- }
-}
-```
-In the above example, the `local-seedshop` directory will be looked up and renamed to `datasqrl.tutorials.seedshop`.
-
-### Script
-
-The main SQRL script and GraphQL schema for the project can be configured in the project configuration under the `script` section:
-
-```json
- {
- "script": {
- "main": "mainScript.sqrl",
- "graphql": "apiSchema.graphqls"
- }
-}
-```
-
-
-### Package Information
-
-The `package` section of the configuration provides information about the package or script. The whole section can be omitted when compiling or running a script. It is required when publishing a package to the repository.
-
-:::info
-To publish a package to the remote repository, the first component of the package name path has to match your DataSQRL account name or the name of an organization you are part of.
-:::
-
-Learn more about publishing in the CLI documentation.
-
-```json
-{
- "package": {
- "name": "datasqrl.tutorials.Quickstart",
- "version": "0.0.1",
- "variant": "dev",
- "description": "A docker compose datasqrl profile",
- "homepage": "https://www.datasqrl.com/docs/getting-started/quickstart",
- "documentation": "Quickstart tutorial for datasqrl.com",
- "topics": "tutorial"
- }
-}
-```
-
-| Field Name | Description | Required? |
-|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|
-| name | Name of the package. The package name should start with the name of the individual or organization that provides the package. | Yes |
-| version       | The version of the package. We recommend using [semantic versioning](https://semver.org/).                                                                       | Yes       |
-| variant | The variant of the package if multiple variants are available. Defaults to `default`. | No |
-| latest | If this is the latest version of this package. DataSQRL uses the latest version when looking up missing packages on import. Defaults to `false`. | No |
-| description | A description of the package. | No |
-| license | The license used by this package. | No |
-| documentation | Link that points to documentation for this package | No |
-| homepage | Link that points to the homepage for this package | No |
-
-
-### Testing
-
-Testing related configuration is found in the `test-runner` section.
-
-```json
-{
- "test-runner": {
- "delay-sec": 30
- }
-}
-```
-
-* `delay-sec`: The number of seconds to wait between starting the processing of data and snapshotting the data.
-
-
-## GraphQL Schema
-
-
-
-
diff --git a/docs/reference/sqrl/deployments.md b/docs/reference/sqrl/deployments.md
deleted file mode 100644
index 030e51a58..000000000
--- a/docs/reference/sqrl/deployments.md
+++ /dev/null
@@ -1,51 +0,0 @@
-# Deployment
-
-DataSQRL can target any deployment infrastructure through the concept of "deployment profiles". A deployment profile is a collection of deployment assets for each engine in the data pipeline/microservice that can be templated using [Freemarker templates](https://freemarker.apache.org/). The templates are instantiated with the output from the DataSQRL compiler to produce the deployment assets. Those deployment assets can then be deployed manually or through CI/CD pipelines and other automation tools.
-
-DataSQRL provides default deployment profiles for its internal test and development runtime, Docker, and Kubernetes. Deployment profiles can also target Terraform or any other IaC (infrastructure-as-code) platform.
-
-## Deployment Profiles Process
-
-The deployment profile is configured in the project's [package.json](../datasqrl-spec) file. If the profile is not explicitly configured, DataSQRL's default Docker profile is used.
-
-A deployment profile contains a root `package.json` configuration file that contains all the default configuration for the engines that the profile supports and default configuration for the compiler. That package.json is merged with the project's configuration file, with the latter taking precedence.
-
-During compilation, DataSQRL produces physical plans for all configured engines in the targeted data pipeline/microservice topology as configured in the (potentially merged) project configuration files.
-The physical plans are written as JSON documents to the `build/plan` directory with one file per engine with the name of the engine.
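-
-For example, with the default engines enabled, the plan directory would look roughly like this:
-
-```
-build/plan/
-├── flink.json
-├── kafka.json
-├── postgres.json
-└── vertx.json
-```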
-
-As the final step of the compilation process, the deployment assets from the deployment profile are copied to the `build/deploy` folder and all Freemarker templates (i.e. files ending in `.ftl`) are instantiated with the values from the physical plans for each engine.
-
-Once the compilation completes, the deployment assets in the `build/deploy` folder are ready to be deployed or executed locally.
-
-## Custom Deployment Values
-
-The `values` section of the [DataSQRL configuration](../datasqrl-spec) allows you to specify configuration values that are passed through to the deployment profile and can be referenced in the deployment profile templates.
-
-This allows you to specify runtime configuration in the project configuration file.
-
-## Creating Deployment Profiles
-
-Creating a custom deployment profile allows you to:
-
-1. Target your specific data infrastructure
-2. Customize the deployment to your needs
-3. Standardize DataSQRL project development and deployment in your organization.
-
-A deployment profile consists of:
-
-* A `package.json` configuration file that is used as the basis for all DataSQRL projects that use this profile. This file is placed in the root of the deployment profile.
-* A folder for each engine supported by the profile, containing engine-specific deployment artifacts.
-* Any other "shared" deployment assets that are used across the engines or that pull the individual engines together into one deployment (e.g. a docker compose template). Those are placed in the root of the deployment profile.
-
-The templates are instantiated with the values from the physical plan documents and any values specified in the project configuration under `values`. In addition, the templating engine also provides these variables:
-- `config`: the full package.json file
-- `environment`: The current system environment.
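-
-For illustration, a template fragment could reference these variables as follows; this is a minimal sketch with hypothetical file and key names, not the actual default profile:
-
-```
-<#-- hypothetical fragment of an *.ftl deployment asset -->
-# project: ${(config.package.name)!"unnamed"}
-checkpoint.interval: ${(config.values["flink-config"]["execution.checkpointing.interval"])!"1000ms"}
-deploy.user: ${(environment["USER"])!"unknown"}
-```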
-
-When building your own deployment profile, it is best to start with an existing profile and iterate from there.
-Take a look at the [default DataSQRL profiles](https://github.com/DataSQRL/sqrl/tree/main/profiles).
-
-Deployment profiles can be merged. This is useful to overwrite the (templated) deployment assets for one engine without affecting the other engines. Deployment profiles are merged on a per-folder level.
-Often it is sufficient to overwrite just one engine instead of defining a completely new deployment profile.
-To use such incremental deployment profiles, you specify the `profiles` field in the `package.json` as an array of deployment profiles that contains the base/default deployment profile and the incremental one.
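-
-For example (the override profile name is hypothetical):
-
-```json
-{
-  "profiles": ["datasqrl.profile.default", "myorg.profile.flink-override"]
-}
-```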
-
-Deployment profiles can be uploaded to the repository and resolved as dependencies in DataSQRL projects with versioning for consistency, reuse, and collaboration.
\ No newline at end of file
diff --git a/docs/reference/sqrl/learn.md b/docs/reference/sqrl/learn.md
index bc7f5a308..11b0e5d58 100644
--- a/docs/reference/sqrl/learn.md
+++ b/docs/reference/sqrl/learn.md
@@ -99,7 +99,7 @@ Note, that `now()` is different from the standard SQL function `CURRENT_TIMESTAM
The timestamp of a stream table determines how stream records are associated with a point on the timeline and how now advances in the data pipeline.
-For stream tables that are imported from a data source, the timestamp is configured explicitly in the [source configuration](/docs/reference/sqrl/datasqrl-spec#tablejson).
+For stream tables that are imported from a data source, the timestamp is configured explicitly in the [source configuration](/docs/sqrl/datasqrl-spec#tablejson).
### Time Synchronization
@@ -115,7 +115,7 @@ In DataSQRL, sources and sinks represent the endpoints of the data pipeline, int
Data sources and sinks are defined in configuration files that are contained in packages. The configuration specifies how to connect to and read from (or write to) the source (or sink).
-DataSQRL supports a lot of different data systems as sources and sinks, including Kafka, file system, object storage, Iceberg, Postgres, etc. Check out the [connectors](/docs/reference/sqrl/datasqrl-spec#tablejson) for all the data systems that DataSQRL can connect to.
+DataSQRL supports a lot of different data systems as sources and sinks, including Kafka, file system, object storage, Iceberg, Postgres, etc. Check out the [connectors](/docs/sqrl/datasqrl-spec#tablejson) for all the data systems that DataSQRL can connect to.
When you are first getting started on a new project with DataSQRL, the easiest way to add a data source is to export your data (or a subset) to a [JSON Lines](https://jsonlines.org/) (i.e. line delimited json) or CSV files.
@@ -168,7 +168,7 @@ OrdersByMonth := SELECT endOfmonth(p.time) AS month,
FROM Orders GROUP BY month;
```
-The annotation `EXEC(streams)` instructs the optimizer to compute the `OrdersByMonth` table in the `stream` engine. An engine with the name `stream` must be configured in the engines section of the [package configuration](/docs/reference/sqrl/datasqrl-spec).
+The annotation `EXEC(streams)` instructs the optimizer to compute the `OrdersByMonth` table in the `stream` engine. An engine with the name `stream` must be configured in the engines section of the [package configuration](/docs/sqrl/datasqrl-spec).
Similarly, the `EXEC(database)` annotation instructs the optimizer to choose the engine with the name `database`:
@@ -182,7 +182,7 @@ OrdersByMonth := SELECT endOfmonth(p.time) AS month,
## Overview of Integrated Engines
An **engine** is a system or technology that executes part of the data pipeline compiled by DataSQRL.
-Which engines DataSQRL compiles to is configured in the [package configuration](/docs/reference/sqrl/datasqrl-spec) which also defines the data pipeline architecture.
+Which engines DataSQRL compiles to is configured in the [package configuration](/docs/sqrl/datasqrl-spec) which also defines the data pipeline architecture.
DataSQRL supports 4 types of engines that play distinct roles in a data pipeline: stream engines, database engines, server engines, log engines, query engines.
@@ -257,7 +257,7 @@ DataSQRL supports multiple engines and data pipeline architectures. That means,
The figure shows a data pipeline architecture that consists of a Apache Kafka, Apache Flink, a database engine, and API server. Kafka holds the input and streaming data. Flink ingests the data, processes it, and writes the results to the database. The API server translates incoming requests into database queries and assembles the response from the returned query results.
-The data pipeline architecture and engines are configured in the [package configuration](/docs/reference/sqrl/datasqrl-spec). The DataSQRL command looks for a `package.json` configuration file in the directory where it is executed. Alternatively, the package configuration file can be provided as an argument via the `-c` option. Check out the [command line reference](../cli) for all command line options.
+The data pipeline architecture and engines are configured in the [package configuration](/docs/sqrl/datasqrl-spec). The DataSQRL command looks for a `package.json` configuration file in the directory where it is executed. Alternatively, the package configuration file can be provided as an argument via the `-c` option. Check out the [command line reference](/docs/sqrl/cli) for all command line options.
If no package configuration file is provided or found, DataSQRL generates a default package configuration with the example data pipeline architecture shown above and the following engines:
diff --git a/docs/reference/sqrl/sqrl-spec.md b/docs/reference/sqrl/sqrl-spec.md
deleted file mode 100644
index a2dff53fc..000000000
--- a/docs/reference/sqrl/sqrl-spec.md
+++ /dev/null
@@ -1,372 +0,0 @@
-import Tabs from '@theme/Tabs';
-import TabItem from '@theme/TabItem';
-
-# SQRL Specification
-This is the specification for SQRL, a declarative SQL query language developed at DataSQRL for describing data pipelines. SQRL stands for *"**S**tructured **Q**uery and **R**eaction **L**anguage"* because it extends SQL with support for streaming data and the ability to react to data in realtime. In addition, SQRL adds a number of convenience features that make it development-friendly.
-
-## Table Type System
-In SQRL, every table is assigned a specific type that influences how queries interact with the data, the semantic validity of those queries, and how data is processed by different engines.
-
-SQRL recognizes several distinct table types, each with unique characteristics and use cases:
-- **STREAM**: Comprises a stream of immutable records, each identified by a synthetic primary key and timestamp. These tables are ideal for representing events or actions over time.
-- **VERSIONED_STATE**: Contains records with a natural primary key and a timestamp, tracking changes over time to each record, thereby creating a change-stream.
-- **STATE**: Similar to VERSIONED_STATE but without tracking the history of changes. Each record is uniquely identified by its natural primary key.
-- **LOOKUP**: Supports lookup operations using a primary key but does not allow further processing of the data.
-- **RELATION**: Represents relational data that lacks a primary key, timestamp, or explicit streaming semantics. It is used primarily for static relational data integration.
-- **STATIC**: Consists of data that does not change over time, such as constants, table functions, or nested data structures. This type is treated as universally valid across all time points.
-
-These table types will be used throughout this specification to further describe the semantics of SQL queries.
-
-## Functions
-Functions in SQRL are designed to be engine-agnostic, ensuring that their implementation is consistent across different platforms and execution environments. This uniformity is crucial for maintaining the semantic integrity of functions when executed under various systems.
-
-**Characteristics of Functions**
-- **Engine Agnosticism**: Functions are defined in a way that does not depend on the specifics of the underlying engine.
-- **Semantic Consistency**: Regardless of the engine used, functions should preserve their semantic meaning.
-- **Mixed Engine Support**: While functions are designed to be widely supported, some may have mixed support depending on the engine's capabilities.
-- **Nullability Awareness**: Functions in SQRL retain nullability information. This feature is vital for correct schema generation downstream, ensuring that data integrity is maintained through the potential propagation of null values.
-- **Time-Preserving Capabilities**: A significant feature of SQRL functions is their ability to handle time-based data efficiently. Time-preserving functions can manipulate and interpret timestamps in a way that aligns with the temporal dynamics of data streams.
-
-For example, a time-preserving function called 'endOfWeek' could be implemented to aggregate timestamps into time windows. Time windows are a means to divide time into discrete buckets and aggregate all stream records within each bucket to produce a new stream table that contains one row for each aggregate.
-```sql
-Users.spending := SELECT endOfWeek(p.time) AS week,
- sum(t.price) AS spend, sum(t.saving) AS saved
- FROM @.purchases p JOIN p.totals t
- GROUP BY week ORDER BY week DESC;
-```
-
-## Import
-*IMPORT qualifiedName (AS? alias=identifier)?*
-
-An import in SQRL describes a **table**, **function**, or other **SQRL script** to be added to the schema. Import paths use the dot character `.` to separate path components.
-
-```sql
-IMPORT datasqrl.seedshop.Orders;
-```
-Imports are intended to act much like language dependencies. The SQRL specification does not describe how imports are resolved; resolution is up to the implementation.
-
-Imports can end with a `*` to import all items on that level of the qualified path.
-```sql
-IMPORT mypackage.*;
-```
-
-Imports can be aliased using the `AS` keyword. Imports that end with a `*` cannot be aliased.
-```sql
-IMPORT datasqrl.seedshop.Orders AS MyOrders;
-```
-
-## Export
-
-*EXPORT table=tablePath TO sink=qualifiedName;*
-
-The `EXPORT` statement is an explicit sink to a data system, like a Kafka topic or database table. Import paths and export sink paths should be resolved the same way.
-
-```sql
-EXPORT UserPromotion TO mysink.promotion;
-```
-
-Export statements are most commonly used to export data to an external system, but it could refer to other components such as console log print statements or to a log engine. It is up to the underlying implementation to determine what modules are available by default for import and export paths.
-```sql
-EXPORT UserPromotion TO print.promotion;
-EXPORT UserPromotion TO log.promotion;
-```
-
-:::note
-Exports do not describe the connector mapping when an optimizer splits the workload between multiple engines.
-:::
-
-## Create Table
-```sql
-CREATE TABLE MyTable(
- myCol bigint,
- myCol2 bigint
-);
-```
-SQRL allows a 'create table' statement. Create table statements describe a table that the SQRL implementation provides storage for. The implementation can optionally add additional fields, such as a primary key and timestamp.
-
-## Assignment operator
-The assignment operator `:=` assigns the query or table definition on the right-hand side to the table name on the left-hand side. This operation is akin to the 'CREATE VIEW' statement in conventional SQL.
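-
-For example, the following statement defines a new table from a query, much like `CREATE VIEW HighValueOrders AS ...` would in standard SQL (table and column names are illustrative):
-
-```sql
-HighValueOrders := SELECT * FROM Orders WHERE total > 100;
-```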
-
-## Distinct
-```sql
-Products := DISTINCT Products ON id ORDER BY updated DESC;
-```
-
-Distinct statements in SQRL are designed to select the most recent version of each row based on a specified key, effectively implementing deduplication in streaming data or ensuring data uniqueness in database systems.
-
-## Queries
-```sql
-MyTable := SELECT * FROM Table;
-```
-
-**Naming and Selecting Columns**
-
-In SQRL, all columns must be explicitly named.
-```sql
-// Invalid
-MyTable := SELECT COUNT() FROM Table;
-
-// Valid
-MyTable := SELECT COUNT() AS cnt FROM Table;
-```
-
-**Table Shadowing**
-
-Tables can be shadowed, meaning a new table can be created with the same name as an existing one.
-
-```sql
-MyTable := SELECT * FROM MyTable;
-```
-
-**Hidden Tables and Columns**
-
-When the name of a table or column starts with the underscore character _, it is considered hidden. Hidden tables and columns are not exposed in the API or imported by other scripts.
-
-```sql
-_MyHiddenTable := SELECT * FROM MyTable WHERE ...;
-```
-
-Example of Hidden Column:
-
-```sql
-MyTable := SELECT id, _hiddenColumn FROM Table;
-```
-In this example, _hiddenColumn will not be exposed in the API.
-
-**Usage of Hidden Tables**
-
-Hidden tables are useful for intermediate calculations or data transformations that should not be accessible externally.
-
-```sql
-_tempData := SELECT * FROM MyTable WHERE condition;
-```
-In this example, _tempData is used for an internal operation and is not exposed.
-
-## Nested Query
-Nested tables represent parent-child relationships and simplify aggregations by parent rows.
-```sql
-MyTable.query := SELECT * FROM x;
-```
-
-We can query a nested table globally, i.e., over all rows in the table, or locally, i.e., only the rows associated with a given parent row.
-
-**Global Aggregation**
-```sql
-Order_totals := SELECT sum(total) as price,
- sum(coalesce(discount, 0.0)) as saving FROM Orders.items;
-```
-In this example, the Order_totals table contains a single aggregate that sums up the total and discount over all items in all orders. The result is one global aggregation over all order items.
-
-**Local Aggregation**
-```sql
-Orders.totals := SELECT sum(total) as price,
- sum(coalesce(discount, 0.0)) as saving FROM @.items;
-```
-This statement aggregates all items for each order. The result is one local aggregate for each row in the Orders table.
-
-**Difference Between Global and Local Aggregation**
-
-The difference between the two statements lies in the FROM clause. The first statement references the Orders.items table globally, meaning it considers all rows in the Orders.items table without any specific parent context.
-
-The second statement references the Orders.items table locally by accessing the items relationship column on Orders. This makes it a localized query, defining a new nested table totals under the Orders table. The query on the right-hand side of the statement is interpreted in the context of each row in the parent table. The at-sign @ is used to refer to the parent row in a localized query. Hence, @.items means "all items that are associated with the current order record through the items relationship".
-
-**Usage of Nested Table Definitions**
-
-Nested table definitions are a convenient way to express GROUP BY and WINDOW queries by grouping on the rows in the parent table. This allows for more intuitive and organized data aggregation, making it easier to manage complex data relationships and calculations.
-
-## Join types
-SQRL provides additional join types outside of standard SQL:
-- Default join
-- Temporal join
-- Interval join
-
-**Default Join**
-
-A join without a qualifier is a default join. It is up to the implementation to decide the best join type given the conditions of the join.
-```sql
-DefaultJoinExample := SELECT * FROM TableA JOIN TableB ON TableA.id = TableB.id;
-```
-
-**Inner Join**
-
-An INNER JOIN qualifier specifies an explicit inner join. In stream processing contexts, this can mean maintaining state on both sides of the join to guarantee correct semantics, which can be expensive.
-```sql
-InnerJoinExample := SELECT * FROM TableA INNER JOIN TableB ON TableA.id = TableB.id;
-```
-
-**Left Join and Left Outer Join**
-
-In SQRL, a left join and a left outer join are distinct. A LEFT JOIN lets the implementation decide the best join strategy, whereas a LEFT OUTER JOIN specifies an explicit left outer join.
-
-```sql
-LeftJoinExample := SELECT * FROM TableA LEFT JOIN TableB ON TableA.id = TableB.id;
-LeftOuterJoinExample := SELECT * FROM TableA LEFT OUTER JOIN TableB ON TableA.id = TableB.id;
-```
-
-**Temporal Join**
-
-SQRL supports temporal joins between stream and state tables when joining on the state table's key. Temporal joins use the row from the state table at the timestamp of the stream row.
-
-```sql
-TemporalJoinExample :=
- SELECT l.login_time, t.transaction_time, t.amount
- FROM Logins l
- TEMPORAL JOIN Transactions t
- ON l.account_id = t.account_id;
-```
-
-**Interval Join**
-
-Interval joins are defined by specifying lower and upper time bounds in the join condition. In the example below, the condition requires that a transaction occur within one hour after a login to be included in the result.
-```sql
-CustomerActivity :=
- SELECT l.login_time, t.transaction_time, t.amount
- FROM Logins l
- INTERVAL JOIN Transactions t
- ON l.account_id = t.account_id
- AND t.transaction_time BETWEEN l.login_time AND l.login_time + INTERVAL '1' HOUR;
-```
-This correlates logins with transactions that happen shortly after, capturing a critical timeframe for activity analysis.
-
-**Example: Temporal Joins in E-commerce**
-
-Temporal joins are essential in SQRL because they define the semantics of joining stream tables with state tables. For example, in an e-commerce scenario, if the price of a product changes, you do not want to retroactively update orders that have already been placed.
-```sql
-IMPORT ecommerce.Orders; // is a stream
-IMPORT ecommerce.Products; // is a stream
-
-VersionedProducts := DISTINCT Products ON productid; // converts to a versioned state table
-
-OrdersWithPrice :=
- SELECT *
- FROM Orders
- JOIN VersionedProducts; // join on stream and state becomes a temporal join
-```
-In this example, VersionedProducts becomes a versioned state table, and the join with Orders ensures that each order reflects the product price at the time of the order, not the current product price.
-
-## Expressions
-Expressions in SQRL allow you to define new columns based on calculations or transformations of existing columns.
-```sql
-Products.weight_in_oz := weight_in_gram / 28.35;
-```
-
-**Defining New Columns**
-
-This statement adds a new column weight_in_oz to the existing Products table; the column converts the product weight from grams to ounces. The column name is the last element of the table path.
-
-**Expression Constraints**
-
-Expressions cannot be shadowed. Once an expression is defined, it cannot be overridden or redefined in the same context.
-
-Once a table is queried, new columns cannot be added. Tables become immutable once referenced in a query.
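-
-A sketch of the second constraint, reusing the weight conversion example from above (the ProductSummary table and weight_in_lb column are hypothetical):
-```sql
-Products.weight_in_oz := weight_in_gram / 28.35;
-ProductSummary := SELECT * FROM Products;
-
-// Invalid: Products has already been referenced by the query above,
-// so no further columns can be added to it.
-Products.weight_in_lb := weight_in_gram / 453.6;
-```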
-
-**Nested Expressions and Window Functions**
-
-Nested expressions can evaluate window functions, allowing for calculations over a set of rows related to the current row.
-
-```sql
-MyRow.num := RANK();
-```
-In this example, MyRow.num is defined using the RANK() window function, which assigns a rank to each row within the partition of the dataset.
-
-## Relationships
-Relationships in SQRL make data relationships explicit, simplify joins, and allow API consumers to navigate through the data efficiently.
-
-**Defining Relationships**
-
-A relationship is defined using a JOIN expression, interpreted for each row of the table on which the relationship is defined. The at-sign @ refers to each row.
-
-```sql
-Users.purchases := JOIN Orders ON Orders.customerid = @.id;
-```
-This statement defines a purchases column in the Users table, relating each user to their corresponding orders where customerid matches the user's id.
-
-**Benefits**
-- **Simplifies Joins**: Avoids repetitive join statements.
-- **Explicit Relationships**: Makes data relationships clear and easy to follow.
-
-**Using Relationships in Queries**
-```sql
-Users.spending :=
- SELECT endOfWeek(p.time) AS week,
- sum(t.price) AS spend,
- sum(t.saving) AS saved
- FROM @.purchases p
- JOIN p.totals t
- GROUP BY week
- ORDER BY week DESC;
-```
-This example defines a `spending` nested table under `Users`, aggregating order totals for all purchases of each user. `FROM @.purchases` expands to `FROM @ JOIN Orders p ON p.customerid = @.id`.
-
-## Parameters
-Join declarations and tables in SQRL support parameters, allowing for dynamic queries.
-
-Parameters use the SQL variable syntax: an `@` sign followed by the variable name. This is not to be confused with the `@` that references the parent row in localized queries.
-
-```sql
-MyTable.byId(@val: BIGINT) := JOIN Table t ON t.id = @.id AND @val > 50;
-MyTableById(@id: STRING) := SELECT * FROM Table t WHERE t.id = @id;
-```
-
-**Arguments**
-
-Arguments may be provided in any order without changing the semantics of the query.
-
-
-Parameterized queries are useful when describing different views of a table.
-```sql
-ProductById(@ProductID: STRING) :=
- SELECT * FROM UniqueInventory
- WHERE ProductID = @ProductID;
-```
-
-## Table Paths
-In SQRL, table paths are traversed like a graph, not as subschemas.
-
-```sql
-Users.spending := SELECT endOfWeek(p.time) AS week,
- sum(t.price) AS spend, sum(t.saving) AS saved
- FROM @.purchases p JOIN p.totals t
- GROUP BY week ORDER BY week DESC;
-```
-This statement defines a nested table `spending` underneath `Users` which aggregates over the nested order `totals` for all purchases of each user. Relationships used in `FROM` and `JOIN` are expanded to their original definition. That means, `FROM @.purchases` gets expanded to `FROM @ JOIN Orders p ON p.customerid = @.id`.
-
-
-## Comments
-SQRL supports the use of comments within the code to provide hints, enhance readability, provide documentation, and explain the logic of complex queries or operations.
-
-SQRL supports two types of comments:
-
-**Single-line Comments**: These are initiated with double forward slashes (//). Everything following the // on the same line is considered part of the comment.
-```sql
-// This is a single-line comment explaining the next SQL command
-IMPORT data.SalesRecords;
-```
-
-**Multi-line Comments**: These are enclosed between /* and */. Everything within these markers is treated as a comment, regardless of the number of lines.
-```sql
-/*
- * This is a multi-line comment.
- * It can span multiple lines and is often used to comment out
- * chunks of code or to provide detailed documentation.
- */
-IMPORT data.StockAdjustments;
-```
-
-## Hints
-Hints are included within multi-line comments and are prefixed with a plus sign (+) followed by the hint type and optional parameters. These hints do not alter the SQL syntax but suggest how the underlying engine should treat the subsequent SQL commands. Hints are placed above assignment statements.
-
-```sql
-// The below hint suggests that the following query should be executed on the stream engine.
-/* +exec(streams) */
-MyTable := SELECT * FROM InventoryStream;
-```
diff --git a/package.json b/package.json
index 59f3a2bbc..eaceaadb5 100644
--- a/package.json
+++ b/package.json
@@ -3,6 +3,8 @@
"version": "0.0.0",
"private": true,
"scripts": {
+ "prestart": "mkdir -p docs/sqrl && cp -f -R sqrl/docs/* docs/sqrl/",
+ "prebuild": "mkdir -p docs/sqrl && cp -f -R sqrl/docs/* docs/sqrl/",
"docusaurus": "docusaurus",
"start": "docusaurus start",
"build": "docusaurus build",
diff --git a/sidebars.js b/sidebars.js
index 02165ec0e..296e12ab0 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -98,13 +98,13 @@ const sidebars = {
collapsed: false,
link: {
type: 'doc',
- id: 'reference/sqrl/sqrl-spec',
+ id: 'sqrl/sqrl-spec',
},
items: [
- 'reference/sqrl/sqrl-spec',
- 'reference/sqrl/cli',
- 'reference/sqrl/connectors',
- 'reference/sqrl/datasqrl-spec',
+ 'sqrl/sqrl-spec',
+ 'sqrl/cli',
+ 'sqrl/connectors',
+ 'sqrl/datasqrl-spec',
// 'reference/sources/add-source',
// 'reference/sources/schema',
@@ -131,7 +131,7 @@ const sidebars = {
'reference/sqrl/functions/custom-functions',
],
},
- 'reference/sqrl/deployments',
+ 'sqrl/deployments',
// 'dev/roadmap',
diff --git a/sqrl b/sqrl
new file mode 160000
index 000000000..6b938bb43
--- /dev/null
+++ b/sqrl
@@ -0,0 +1 @@
+Subproject commit 6b938bb430c87bdeff4bc6af0313bc62ea47fd6a