cloudwise is Surreal’s cloud infrastructure provisioner based on Terraform (see Surreal’s website and GitHub). It prepares a Kubernetes cluster using Terraform and generates .tf.json files that are also recognized by Symphony.
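For orientation, the sketch below shows what a Terraform JSON file for a GKE cluster can look like when built from Python. It is an illustration under assumptions, not cloudwise's actual output: the function name gke_cluster_tf_json is hypothetical, and the fields (name, location, initial_node_count) are taken from the Google provider's google_container_cluster resource. The file cloudwise generates may contain different or additional settings.

```python
import json


def gke_cluster_tf_json(cluster_name, location, node_count):
    """Build a minimal Terraform JSON document for one GKE cluster.

    Hypothetical helper for illustration; not part of cloudwise.
    """
    return {
        "resource": {
            "google_container_cluster": {
                # Terraform resource label; reusing the cluster name here.
                cluster_name: {
                    "name": cluster_name,
                    "location": location,
                    "initial_node_count": node_count,
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the document; saving it as <cluster_name>.tf.json is what
    # makes terraform pick it up in the working directory.
    print(json.dumps(gke_cluster_tf_json("my-cluster", "us-west1-b", 1), indent=2))
```

Terraform reads any file ending in .tf.json in the working directory the same way it reads .tf files, which is why a generated JSON file can be planned and applied directly.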
- cloudwise runs on Python 3.
- Clone the repository and enter it:
> git clone git@github.com:SurrealAI/cloudwise.git && cd cloudwise
- Run the following in this directory:
> pip install -e .
- Install terraform following the instructions here.
- Install kubectl following the instructions here.
- (Optional, Recommended) Create and work in a clean directory, because running terraform generates additional files there.
> mkdir surreal
> cd surreal
You first need to set up credentials for terraform to access Google Cloud. See guide here. Choose one of the two methods:
- Run the following command:
> gcloud auth application-default login
- Or go to the API key management page https://console.cloud.google.com/apis/credentials/serviceaccountkey and select Create new service account. You need to give the service account sufficient permissions. Project Editor suffices, although it grants more than is strictly needed. You can then generate and download the key (JSON format is fine). Provide the path to the .json file when the command-line tool prompts for it.
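Before passing a downloaded key to the tool, it can help to confirm that the file really is a service account key. The sketch below is one way to do that, under assumptions: the helper names key_problems and load_key are hypothetical, and the checked fields (type, project_id, private_key, client_email) are those normally present in a Google service account key in JSON format. It does not verify that the account has sufficient permissions.

```python
import json

# Fields normally present in a Google service account key (JSON format).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def key_problems(key):
    """Return a list of problems found in a parsed key; empty means it looks fine."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    if "type" in key and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


def load_key(path):
    """Parse the downloaded .json key file at the given path."""
    with open(path) as f:
        return json.load(f)
```

A call such as key_problems(load_key("path/to/key.json")) returns an empty list when all expected fields are present.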
Run the command-line tool and follow its instructions.
> cloudwise-gke
It provides instructions and generates a <cluster_name>.tf.json file that terraform recognizes. If you have generated a .json credential file, provide it when prompted.
- Describe the changes to be made:
> terraform init && terraform plan
- Apply the changes to your cloud project:
> terraform apply
- After cluster creation, obtain credentials for kubectl:
> gcloud container clusters get-credentials <cluster_name>
- If you have GPUs in your cluster, create the daemon set to install drivers (see documentation).
> kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/nvidia-driver-installer/cos/daemonset-preloaded.yaml
- The generated <cluster_name>.tf.json is also recognized by Symphony’s scheduling mechanism and Surreal, so you may want to link to it.
- If you want to remove everything, run:
> terraform destroy
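The provisioning steps above can also be scripted. The following is a minimal sketch, not part of cloudwise: the function names provisioning_commands and run_all are hypothetical, and the commands are exactly the ones listed above. It defaults to a dry run that only prints each command. Depending on your gcloud configuration, get-credentials may need additional flags (for example a zone), which this sketch does not add.

```python
import subprocess


def provisioning_commands(cluster_name):
    """Return the commands from the steps above, in the order they are run."""
    return [
        ["terraform", "init"],
        ["terraform", "plan"],
        ["terraform", "apply"],
        ["gcloud", "container", "clusters", "get-credentials", cluster_name],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(provisioning_commands("my-cluster"))
```

Stopping at the first failure matters here, because fetching kubectl credentials only makes sense after terraform apply has succeeded.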
Stay tuned
- Terraform install fails.
  - If you are seeing the error "... API has not been used in project..." during terraform apply, go to the Kubernetes Engine tab and/or Compute Engine tab on your Google Cloud console to enable their APIs.
- GPU nodes are not scaling up.
  - Check if the driver installation daemon set is running (see documentation).
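
One way to check the driver daemon set is to inspect the JSON printed by kubectl get daemonsets --all-namespaces -o json. The sketch below parses that output. It rests on assumptions: the function name driver_daemonset_status is hypothetical, and the default daemon set name nvidia-driver-installer is assumed to match the manifest applied earlier, so adjust it if your cluster uses a different name.

```python
import json


def driver_daemonset_status(kubectl_json, name="nvidia-driver-installer"):
    """Return (desired, ready) pod counts for the named daemon set, or None if absent.

    kubectl_json is the text printed by:
        kubectl get daemonsets --all-namespaces -o json
    """
    for item in json.loads(kubectl_json).get("items", []):
        if item["metadata"]["name"] == name:
            status = item.get("status", {})
            return (
                status.get("desiredNumberScheduled", 0),
                status.get("numberReady", 0),
            )
    return None
```

A result of None means the daemon set was never created, and a ready count below the desired count means some driver installer pods have not finished starting.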