The goal of this project is to assess which query engines can realistically run inside cloud functions (in particular AWS Lambda) and to get a first impression of their performance in this highly constrained environment.
We want to provide an accurate and interactive representation of our experimental results, and we believe this is best achieved through open interactive dashboards. This is still a work in progress, so feel free to play with it and give us your feedback!
- NYC Taxi Parquet GROUP BY duration of various engines in AWS Lambda
- AWS Lambda scale up duration by payload and function size
The l12n-shell provides a way to run all commands in an isolated Docker
environment. It is not strictly necessary, but it simplifies collaboration on
the project. To set it up:
- you must have a recent version (v20+) of Docker installed; it is the only dependency
- clone this repository:
git clone https://github.com/cloudfuse-io/lambdatization
- add the l12n-shell to your path (optional)
sudo ln -s $(pwd)/lambdatization/l12n-shell /usr/local/bin/l12n-shell
- run `L12N_BUILD=1 l12n-shell`:
  - the `L12N_BUILD` environment variable indicates to the `l12n-shell` script that it needs to build the image
  - `l12n-shell` operates in the current directory to:
    - look for a `.env` file to source configurations from (see configuration section below)
    - store the terraform state if the local backend is used
    - store the terraform data, i.e. the cache data generated by `terraform init`
- `l12n-shell` without any argument runs an interactive bash terminal in the CLI container. Note that the `.env` file is loaded only once, when `l12n-shell` is started.
- `l12n-shell cmd` and `echo "cmd" | l12n-shell` both run `cmd` in the `l12n-shell`.

Note:
- `l12n-shell` only supports amd64 for now
- it is actively tested on Linux only
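The two invocation forms above are equivalent. As a minimal sketch of the pattern, here is the same duality with plain `bash` standing in for `l12n-shell` (both read a command either from an argument or from stdin):

```shell
# Argument form: the command is passed directly
bash -c 'echo hello'
# Stdin form: the command is piped into the shell
echo 'echo hello' | bash
# Both print "hello"
```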
l12n-shell can be configured through environment variables or a `.env` file
in the current directory:
- `L12N_PLUGINS` is a comma-separated list of plugins to activate
- `L12N_AWS_REGION` is the region where the stack should run
You can also provide the usual AWS variables:
- `AWS_PROFILE`
- `AWS_SHARED_CREDENTIALS_FILE`
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
You might also want to verify your "Concurrent executions" quota for Lambda in your AWS account and ask for an increase if required.
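Putting the settings above together, a `.env` file might look like the following sketch (the plugin names and all values are placeholders to adapt to your setup):

```shell
# .env — example configuration (placeholder values)
L12N_PLUGINS=core,tfcloud
L12N_AWS_REGION=us-east-1
AWS_PROFILE=my-profile
```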
If you want to use Terraform Cloud as a backend instead of the local one, set
`TF_STATE_BACKEND=cloud`. You should then also configure:
- `TF_ORGANIZATION`, the name of an existing organization in your Terraform Cloud account.
- `TF_API_TOKEN`, a Terraform Cloud user token.
- `TF_WORKSPACE_PREFIX`, a prefix shared by all workspaces. It should contain only alphanumeric or `-` characters (e.g. `TF_WORKSPACE_PREFIX=l12n-dev-`).
- Add the `tfcloud` plugin to the `L12N_PLUGINS` list to enable the `l12n tfcloud.config` command. This will help you automatically configure the workspaces for all your active plugins with the right settings and credentials.
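For example, the corresponding `.env` entries could look like this (the organization name, token value, and prefix are placeholders):

```shell
# Terraform Cloud backend configuration (placeholder values)
TF_STATE_BACKEND=cloud
TF_ORGANIZATION=my-org
TF_API_TOKEN=my-terraform-cloud-user-token
TF_WORKSPACE_PREFIX=l12n-dev-
```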
Note: Environment variables take precedence over the `.env` file.
For better analysis of the proxying components, you can set up any observability backend compatible with the OpenTelemetry Protocol (OTLP) over HTTP. We recommend in particular Grafana Cloud, which has a generous free tier and a nice interface.
L12N_CHAPPY_OPENTELEMETRY_URL=https://otlp-gateway-${grafana_region}.grafana.net/otlp/v1/traces
L12N_CHAPPY_OPENTELEMETRY_AUTHORIZATION="Basic $(echo -n "$instance_id:$api_key" | base64)"

Where:
- `grafana_region` is the region of your Grafana Cloud instance, e.g. `prod-us-east-0`
- `instance_id` can be obtained from the detail page of your Grafana Cloud instance
- `api_key` is a Grafana Cloud API key with the MetricsPublisher role
- `echo -n "$instance_id:$api_key" | base64` computes the base64 encoding of the two variables above separated by `:`
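As a concrete sketch, the authorization value can be computed with `base64` (the credentials below are fake placeholders, not real Grafana Cloud values):

```shell
# Hypothetical Grafana Cloud credentials (placeholders)
instance_id=123456
api_key=abc
# Encode "instance_id:api_key" for the Basic auth header
echo -n "$instance_id:$api_key" | base64
# → MTIzNDU2OmFiYw==
```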
You can also try out [Aspecto](https://www.aspecto.io/), which has pretty similar capabilities and a very easy setup.
L12N_CHAPPY_OPENTELEMETRY_URL=https://otelcol.aspecto.io/v1/traces
L12N_CHAPPY_OPENTELEMETRY_AUTHORIZATION=aspecto_key

Inside the l12n-shell, you can use the following commands:
- `l12n -h` to see all the available commands
- `l12n deploy -a` will run the terraform scripts and deploy the necessary resources (buckets, functions, roles...)
- `l12n destroy -a` to tear down the infrastructure and clean up your AWS account
- `l12n dockerized -e engine_name` runs a preconfigured query in the dockerized version of the specified engine locally. It requires the core module to be deployed to have access to the data.
- `l12n run-lambda -e engine_name -c sql_query` runs the specified SQL query on the given engine
- you can also run pre-configured queries using the examples; run `l12n -h` to see the list of examples
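Putting the commands together, a typical session inside the l12n-shell could look like the following sketch (the engine name and query are illustrative placeholders, not values from this document):

```shell
# Deploy the stack, run a query on an engine, then clean up
l12n deploy -a
l12n run-lambda -e engine_name -c "SELECT 1"
l12n destroy -a
```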
Infrastructure is managed by Terraform.
We use Terragrunt to:
- DRY the Terraform config
- Manage dependencies between modules and allow a plugin based structure.
We are actively monitoring CDK for Terraform and plan to migrate the infrastructure scripts once the tool becomes sufficiently mature (e.g. reaches v1).
- We follow the conventional commits standard with this list of types.
- We use the following linters:
- black for Python
- isort for Python imports
- yamllint
- markdownlint