
Commit fc16dd7

Merge pull request #6043 from gchq/5994-document-cdk-use
Issue 5994 - Document deployment with CDK directly
2 parents: 8149a17 + 975babb

File tree

34 files changed: +685 -510 lines changed

docs/deployment-guide.md

Lines changed: 41 additions & 253 deletions
Large diffs are not rendered by default.

docs/deployment/deploy-with-cdk.md

Lines changed: 136 additions & 0 deletions

Deployment with the CDK
=======================

Sleeper is deployed with the AWS Cloud Development Kit (CDK). This can be done either with scripts as described in the [deployment guide](../deployment-guide.md#scripted-deployment), or by using the CDK directly. This document covers deployment using the CDK CLI directly.

### Uploading artefacts to AWS

Some jars and Docker images must be uploaded to AWS before you can deploy an instance of Sleeper. We have a CDK app `SleeperArtefactsCdkApp` which creates an S3 bucket and ECR repositories to hold these artefacts, but does not upload them. You can also include this in your own CDK app with `SleeperArtefacts`. You can use our tools to upload the artefacts as a separate step, or implement your own way to do this that may be specific to your Maven and Docker repositories.

The scripted deployment uploads the jars from the local `scripts/jars` directory within the Git repository. The Docker images are either built from the local `scripts/docker` directory, or pulled from a remote repository if that is configured. You could replicate that behaviour yourself with the script `scripts/deploy/uploadArtefacts.sh`, use our Java classes `SyncJars` and `UploadDockerImagesToEcr`, or implement your own way to upload these artefacts.

As part of `scripts/build/build.sh`, the jars are built and output to `scripts/jars`, and the Docker builds are prepared in separate directories for each Docker image under `scripts/docker`. You can also use our [publishing tools](../development/publishing.md) to prepare the artefacts.

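As a minimal sketch, assuming you're at the root of the Git repository, this is how you might produce those artefacts locally before uploading them:

```bash
# Build the jars and prepare the Docker build directories
./scripts/build/build.sh

# The outputs to upload
ls scripts/jars
ls scripts/docker
```
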
It's important to upload artefacts from within AWS, as uploads from outside AWS can be lengthy. Usually this is done from an EC2 instance.

#### `uploadArtefacts.sh`

This script can upload artefacts to an existing CDK deployment. You can either pass in the deployment ID that you used for the CDK deployment, or pass in an instance properties file for an instance that is configured to use that artefacts deployment. In the latter case, Docker images will only be uploaded if they are required by your instance configuration. Run `uploadArtefacts.sh --help` for details.

By default, the artefacts deployment ID should match the instance ID. Alternatively, you can set the deployment ID in the instance property [`sleeper.artefacts.deployment`](../usage/properties/instance/user/common.md).

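As a hypothetical sketch, you could point an instance at a separately-named artefacts deployment by appending that property to its instance properties file:

```bash
# Hypothetical example: use the artefacts deployment "my-deployment" for this instance
echo "sleeper.artefacts.deployment=my-deployment" >> /path/to/instance.properties
```
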
Here's an example with a CDK command to create an artefacts deployment, and a call to the script to upload all artefacts to that deployment:

```bash
DEPLOYMENT_ID=my-deployment
cdk deploy --all -c id=$DEPLOYMENT_ID -a "java -cp ./scripts/jars/cdk-<version>.jar sleeper.cdk.SleeperArtefactsCdkApp"
./scripts/deploy/uploadArtefacts.sh --id $DEPLOYMENT_ID
```

#### Direct upload

If you prefer to implement this yourself, details of the Docker images to be uploaded can be found in the [Docker images document](/docs/deployment/docker-images.md). That document includes details of how to build and push the images to ECR, as is done by the automated scripts.

You'll also need to create an S3 bucket for jars, and upload the contents of the `scripts/jars` directory to it. That directory is created during a build, or during installation of a published version. The jars S3 bucket needs to have versioning enabled so we can tie a CDK deployment to specific versions of each jar.

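Here's a minimal sketch with the AWS CLI, assuming an example bucket name and the eu-west-2 region:

```bash
JARS_BUCKET=my-sleeper-jars # An example name, choose your own

# Create the bucket with versioning enabled
aws s3api create-bucket --bucket $JARS_BUCKET --region eu-west-2 \
    --create-bucket-configuration LocationConstraint=eu-west-2
aws s3api put-bucket-versioning --bucket $JARS_BUCKET \
    --versioning-configuration Status=Enabled

# Upload the jars built under scripts/jars
aws s3 sync ./scripts/jars s3://$JARS_BUCKET
```
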
When not using an artefacts CDK deployment, you can set the instance properties `sleeper.jars.bucket` and `sleeper.ecr.repository.prefix` instead of `sleeper.artefacts.deployment`.

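As a hypothetical sketch, with a bucket and repository prefix you created yourself, that could look like this:

```bash
# Hypothetical example values: reference artefacts uploaded without an artefacts CDK deployment
cat >> /path/to/instance.properties << EOF
sleeper.jars.bucket=my-sleeper-jars
sleeper.ecr.repository.prefix=my-ecr-prefix
EOF
```
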
### Including Sleeper in your CDK app

Sleeper supports deployment as part of your own CDK app, either as its own stack or as a nested stack under your stack. If you have published Sleeper to a Maven repository as described in the [publishing guide](../development/publishing.md), you can add the Sleeper CDK module as a Maven dependency like this:

```xml
<dependency>
    <groupId>sleeper</groupId>
    <artifactId>cdk</artifactId>
    <version>version.number.here</version>
</dependency>
```

Use the class `SleeperInstance` to add instances of Sleeper to your app. To load instance and table properties from the local file system you can use `SleeperInstanceConfiguration.fromLocalConfiguration`, as in this example:

```java
Stack stack = Stack.Builder.create(app, "MyStack")
        .stackName("my-stack")
        .env(environment)
        .build();
SleeperInstanceConfiguration myInstanceConfig = SleeperInstanceConfiguration.fromLocalConfiguration(
        workingDir.resolve("my-instance/instance.properties"));
SleeperInstance.createAsNestedStack(stack, "MyInstance",
        NestedStackProps.builder()
                .description("My instance")
                .build(),
        SleeperInstanceProps.builder(myInstanceConfig, s3Client, dynamoClient)
                .deployPaused(false)
                .build());
```

### Using the CDK CLI

To deploy a Sleeper instance to AWS with the CDK, you need an [instance configuration](instance-configuration.md) and a [suitable environment](environment-setup.md). The artefacts will need to be uploaded as described in the section above. You can either use the instance ID as the deployment ID for the artefacts, or you can point to your artefacts deployment with the instance property `sleeper.artefacts.deployment`.

You can use the same CDK apps used by the automated scripts, or your own CDK configuration. We'll give examples with the CDK apps used by the automated scripts. The following commands will deploy a Sleeper instance:

```bash
INSTANCE_PROPERTIES=/path/to/instance.properties
SCRIPTS_DIR=./scripts # This is from the root of the Sleeper Git repository
VERSION=$(cat "$SCRIPTS_DIR/templates/version.txt")
cdk deploy --all -c propertiesfile=$INSTANCE_PROPERTIES -c newinstance=true -a "java -cp $SCRIPTS_DIR/jars/cdk-$VERSION.jar sleeper.cdk.SleeperCdkApp"
```

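If you'd like to check which stacks the app defines before deploying, the CDK CLI can list them without deploying anything. Here's a sketch with the same app and context arguments as above:

```bash
# List the stacks that cdk deploy --all would create
cdk list -c propertiesfile=$INSTANCE_PROPERTIES -c newinstance=true -a "java -cp $SCRIPTS_DIR/jars/cdk-$VERSION.jar sleeper.cdk.SleeperCdkApp"
```
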
To avoid having to explicitly give approval for deploying all the stacks, you can add `--require-approval never` to the command.

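For example, the deploy command above becomes:

```bash
cdk deploy --all --require-approval never -c propertiesfile=$INSTANCE_PROPERTIES -c newinstance=true -a "java -cp $SCRIPTS_DIR/jars/cdk-$VERSION.jar sleeper.cdk.SleeperCdkApp"
```
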
If you'd like to include data generation for system tests, use the system test CDK app instead:

```bash
INSTANCE_PROPERTIES=/path/to/instance.properties
SCRIPTS_DIR=./scripts # This is from the root of the Sleeper Git repository
VERSION=$(cat "$SCRIPTS_DIR/templates/version.txt")
cdk deploy --all -c propertiesfile=$INSTANCE_PROPERTIES -c newinstance=true -a "java -cp $SCRIPTS_DIR/jars/system-test-$VERSION-utility.jar sleeper.systemtest.cdk.SystemTestApp"
```

#### Tear down

If the artefacts and the Sleeper instance are each deployed in their own CDK app, with `SleeperArtefactsCdkApp` and `SleeperCdkApp`, you can tear down an instance of Sleeper either by deleting the CloudFormation stacks, or with the CDK CLI. You may need to delete the Sleeper instance before deleting the artefacts used to deploy it. Here's an example:

```bash
INSTANCE_PROPERTIES=/path/to/instance.properties
ID=my-instance-id
SCRIPTS_DIR=./scripts # From the root of the Sleeper Git repository
VERSION=$(cat "$SCRIPTS_DIR/templates/version.txt")

cdk destroy --all -c propertiesfile=$INSTANCE_PROPERTIES -c validate=false -a "java -cp $SCRIPTS_DIR/jars/cdk-$VERSION.jar sleeper.cdk.SleeperCdkApp"
cdk destroy --all -c id=$ID -a "java -cp $SCRIPTS_DIR/jars/cdk-$VERSION.jar sleeper.cdk.SleeperArtefactsCdkApp"
```

docs/deployment/docker-images.md

Lines changed: 138 additions & 0 deletions

Docker images deployed in Sleeper
=================================

A deployment of Sleeper includes components that run in Docker containers. This document lists the Docker images that are used in Sleeper, how to build them, and how to make them available for deployment.

The easiest way to build and deploy these images is with our automated scripts. See the [deployment guide](../deployment-guide.md) and [deployment with the CDK](./deploy-with-cdk.md) for more information. The information below may be useful if you prefer to replicate this yourself.

## Docker deployment images

A build of Sleeper outputs several directories under `scripts/docker`. Each is a build directory for a Docker image, containing a Dockerfile. Some of these are used for parts of Sleeper that are always deployed from Docker images; those are listed in the table below, with the following columns:

* Deployment Name - Both the name of the image's directory under `scripts/docker`, and the name of the image when it's built and of the repository it's uploaded to.
* Optional Stack - Each image is associated with an optional stack, and will only be used when that optional stack is deployed in an instance of Sleeper.
* Multiplatform - Compaction job execution is built as a multiplatform image, so it can be deployed on both x86 and ARM architectures.

| Deployment Name             | Optional Stack     | Multiplatform |
|-----------------------------|--------------------|---------------|
| ingest                      | IngestStack        | false         |
| bulk-import-runner          | EksBulkImportStack | false         |
| compaction-job-execution    | CompactionStack    | true          |
| bulk-export-task-execution  | BulkExportStack    | false         |

## Lambda images

Most lambdas are deployed from a jar in the jars bucket. Some must be deployed as a Docker container instead, as there's a limit on the size of a jar that can be deployed as a lambda, and container images allow much larger deployments. There is also an option to deploy all lambdas as Docker containers.

All lambda Docker images are built from the Docker build directory that's output during a build of Sleeper at `scripts/docker/lambda`. To build a Docker image for a lambda, we copy its jar file from `scripts/jars` to `scripts/docker/lambda/lambda.jar`, and then run the Docker build for that directory. This results in a separate Docker image for each lambda jar. The table below lists these images, with the following columns:

* Filename - The name of the jar file that's output by the build in `scripts/jars`. It includes the version number you've built, shown as a placeholder here.
* Image Name - The name of the Docker image that's built, and of the repository it's uploaded to.
* Always Docker deploy - Whether the lambda will always be deployed with Docker, usually because the jar is too large to deploy directly.

| Filename                                             | Image Name                        | Always Docker deploy |
|------------------------------------------------------|-----------------------------------|----------------------|
| athena-`<version-number>`.jar                        | athena-lambda                     | true                 |
| bulk-import-starter-`<version-number>`.jar           | bulk-import-starter-lambda        | false                |
| bulk-export-planner-`<version-number>`.jar           | bulk-export-planner               | false                |
| bulk-export-task-creator-`<version-number>`.jar      | bulk-export-task-creator          | false                |
| ingest-taskrunner-`<version-number>`.jar             | ingest-task-creator-lambda        | false                |
| ingest-batcher-submitter-`<version-number>`.jar      | ingest-batcher-submitter-lambda   | false                |
| ingest-batcher-job-creator-`<version-number>`.jar    | ingest-batcher-job-creator-lambda | false                |
| lambda-garbagecollector-`<version-number>`.jar       | garbage-collector-lambda          | false                |
| lambda-jobSpecCreationLambda-`<version-number>`.jar  | compaction-job-creator-lambda     | false                |
| runningjobs-`<version-number>`.jar                   | compaction-task-creator-lambda    | false                |
| lambda-splitter-`<version-number>`.jar               | partition-splitter-lambda         | false                |
| query-`<version-number>`.jar                         | query-lambda                      | true                 |
| cdk-custom-resources-`<version-number>`.jar          | custom-resources-lambda           | false                |
| metrics-`<version-number>`.jar                       | metrics-lambda                    | false                |
| statestore-lambda-`<version-number>`.jar             | statestore-lambda                 | false                |

## Building and pushing

See the [deployment guide](../deployment-guide.md) and [deployment with the CDK](./deploy-with-cdk.md) for information on available scripts and code that automate building these images. This is done automatically in any of the deployment scripts. We'll look at some examples of how to match the behaviour of those scripts.

We'll start by creating some environment variables for convenience:

```bash
INSTANCE_ID=<insert-a-unique-id-for-the-sleeper-instance-here>
ACCOUNT=<your-account-id>
REGION=eu-west-2
DOCKER_REGISTRY=$ACCOUNT.dkr.ecr.$REGION.amazonaws.com
REPO_PREFIX=${DOCKER_REGISTRY}/${INSTANCE_ID}
SCRIPTS_DIR=./scripts # This is from the root of the Sleeper Git repository
VERSION=$(cat "$SCRIPTS_DIR/templates/version.txt")
```

Then log into ECR:

```bash
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $DOCKER_REGISTRY
```

The value of the `REPO_PREFIX` environment variable could later be used as the value of the instance property [`sleeper.ecr.repository.prefix`](../usage/properties/instance/user/common.md).

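For example, here's a sketch that records it against a hypothetical instance properties file path:

```bash
# Hypothetical example: record the repository prefix for this instance
echo "sleeper.ecr.repository.prefix=$REPO_PREFIX" >> /path/to/instance.properties
```
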
### Docker deployments

Here's an example of commands to build and push a non-multiplatform image from the `scripts/docker` directory:

```bash
TAG=$REPO_PREFIX/ingest:$VERSION
aws ecr create-repository --repository-name $INSTANCE_ID/ingest
docker build -t $TAG $SCRIPTS_DIR/docker/ingest
docker push $TAG
```

### Multiplatform images

For a multiplatform image, e.g. to run on AWS Graviton on the ARM64 architecture, we need a Docker builder suitable for this.

These commands will create or recreate a builder:

```bash
docker buildx rm sleeper || true
docker buildx create --name sleeper --use
```

This also requires a slightly different command to build and push. This must be done as a single command, as the builder does not automatically add the image to the Docker Engine image store:

```bash
TAG=$REPO_PREFIX/compaction-job-execution:$VERSION
aws ecr create-repository --repository-name $INSTANCE_ID/compaction-job-execution
docker buildx build --platform linux/amd64,linux/arm64 -t $TAG --push $SCRIPTS_DIR/docker/compaction-job-execution
```

### Lambdas

For a lambda, the jar must be copied into the build directory before the build. Provenance must also be disabled for the image to be supported by AWS Lambda. Here's an example:

```bash
TAG=$REPO_PREFIX/query-lambda:$VERSION
aws ecr create-repository --repository-name $INSTANCE_ID/query-lambda
# All lambda images are built from the shared scripts/docker/lambda directory
cp $SCRIPTS_DIR/jars/query-$VERSION.jar $SCRIPTS_DIR/docker/lambda/lambda.jar
docker build --provenance=false -t $TAG $SCRIPTS_DIR/docker/lambda
docker push $TAG
```

docs/deployment/images-to-upload.md

Lines changed: 0 additions & 44 deletions
This file was deleted.
