Software components
The following contains detailed information on the installed cluster components, i.e. Rucio, Reana, JupyterHub, Dask and CVMFS.
Rucio is installed via its Helm charts through IaC (Terraform). The CERN VRE Terraform and values YAML files can be used as a template for the initial setup. The Helm charts should be installed one after the other: first the Rucio server, then the daemons, and then the UI and probes if required. Note that some secrets containing host certificates for the servers need to be applied to the cluster BEFORE installing the Helm charts.
Some secrets need to be created before applying the Rucio Helm charts via Terraform. A script with instructions can be found here.
In order to generate the host certificates, head to the CERN CA website and create new grid host certificates for the main server, the auth server and the webui, specifying your desired SAN (DNS) names (ours are vre-rucio.cern.ch, vre-rucio-auth.cern.ch, vre-rucio-ui.cern.ch).
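To double-check that an issued certificate carries the expected DNS names, you can inspect its SAN field (the file name below is just an example):
> openssl x509 -in hostcert.pem -noout -text | grep -A1 "Subject Alternative Name"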
The secrets will be encrypted with sealed-secrets. The CERN Openstack nginx-ingress-controller pod in the `kube-system` namespace has a `validatingwebhookconfiguration` named `ingress-nginx-admission` that needs to be deleted in order for the nginx ingress controller to be able to reach the K8s API.
The way to install the CA certificates in a persistent and updated way is the following:
> vi /etc/yum.repos.d/linuxsupport7s-stable.repo
> yum install -y CERN-CA-certs
with
> cat linuxsupport7s-stable.repo
# Example modified for cc7 taken from https://gitlab.cern.ch/linuxsupport/rpmci/-/blob/master/kojicli/linuxsupport8s-stable.repo
[linuxsupport7s-stable]
name=linuxsupport [stable]
baseurl=https://linuxsoft.cern.ch/cern/centos/7/cern/$basearch
enabled=1
gpgcheck=False
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-koji file:///etc/pki/rpm-gpg/RPM-GPG-KEY-kojiv2
priority=1
protect=1
which adds a `CERN-bundle.pem` file (among others) into the `/etc/pki/tls/certs/` directory.
RPMs cannot be installed on apt-based systems, so add the "bundle" file manually:
> curl -fsSL 'https://cafiles.cern.ch/cafiles/certificates/CERN%20Root%20Certification%20Authority%202.crt' | openssl x509 -inform DER -out /tmp/cernrootca2.crt
> curl -fsSL 'https://cafiles.cern.ch/cafiles/certificates/CERN%20Grid%20Certification%20Authority(1).crt' -o /tmp/cerngridca.crt
> curl -fsSL 'https://cafiles.cern.ch/cafiles/certificates/CERN%20Certification%20Authority(2).crt' -o /tmp/cernca.crt
> mv /tmp/cernrootca2.crt /tmp/cerngridca.crt /tmp/cernca.crt /usr/local/share/ca-certificates/
> update-ca-certificates
# Move the files anywhere or merge them into a single file. For example;
# cat cernrootca2.crt >> /certs/rucio_ca.crt
# cat cerngridca.crt >> /certs/rucio_ca.crt
# cat cernca.crt >> /certs/rucio_ca.crt
For info, the `update-ca-certificates` command updates the `/etc/ssl/certs` directory (command description).
Both methods provide the same output file; however, the `CERN-ca-bundle.pem` file contains a fourth, extra certificate.
- Request a DBOD instance at CERN (Postgres is preferred over Oracle).
- Configure psql to connect to and manage it.
- In order to pass the database connection string to Rucio, we need to export it as a variable. Therefore, run this command locally:
$export DB_CONNECT_STRING="<set manually>"
Create a secret named `${helm_release}db-secret` in the cluster (see the sketch after this list).
- Bootstrap the database with the Rucio DB init container.
Keep in mind that when installing the daemons chart, the DB will be connected to by several services. In the case of the Database On Demand at CERN, the maximum number of DB connections is limited to 100, so you have to set the limit via the `database.pool_size` setting.
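A minimal sketch of exporting the connection string and creating the db-secret described above (the secret data key and the namespace are assumptions; check the chart values for the exact names expected):
> export DB_CONNECT_STRING="postgresql://<user>:<pwd>@<hostname>.cern.ch:<port>/<db_name>"
# <helm_release> is a placeholder for your Helm release name
> kubectl create secret generic <helm_release>db-secret -n rucio-vre --from-literal=dbconnectstring="$DB_CONNECT_STRING"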
In order to perform a major upgrade (e.g. 1.30.0 --> 1.31.0), you will need to manually run an alembic upgrade of the DB. Go to section 9 of the wiki to know more about this.
Create either a long-lived FTS proxy secret or certificate and key cluster secrets, so that the FTS renewal cronjob can run and create the required x509 proxy.
Look at this issue, which automates the process: you provide the certificate, key and password, and the proxy is created and injected into the cluster.
It will be necessary to request a ROBOT certificate from a service account to delegate the proxy to FTS. This is much better than delegating FTS transfers to a single user account. In order to request a Grid Robot Certificate, follow the instructions at the bottom of https://ca.cern.ch/ca/Help/?kbid=021003, split the certificate into `hostcert.pem` and `hostkey.pem`, and create the `<release>-fts-cert` and `<release>-fts-key` secrets.
They will be used by the fts-cron container to generate the proxy.
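A sketch of creating those two secrets from the split certificate (the key names inside the secrets and the namespace are assumptions; check what the fts-cron container expects):
> kubectl create secret generic <release>-fts-cert --from-file=hostcert.pem -n rucio-vre
> kubectl create secret generic <release>-fts-key --from-file=hostkey.pem -n rucio-vre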
Apply the Rucio helm charts by providing your specific values.yaml files stored in the /rucio repo.
Once the Rucio Helm charts are applied, the rucio-server and rucio-server-auth services will be created. They are services of type LoadBalancer, accessible from outside the cluster. You can inspect them with:
kubectl get service servers-vre-rucio-server -n rucio-vre
The external IP address is created once the chart gets applied to the cluster. Once the IP is created, you can inspect it on CERN's aiadm with the command `openstack loadbalancer list`. You will then have to add a description and a tag to the loadbalancer in order for it to be reachable via a DNS name (e.g. `vre-rucio.cern.ch`).
# backlog: this option uses CERN's loadbalancer as a service
# set a description
openstack loadbalancer set --description "vre-rucio.cern.ch" $LB_ID_MAIN
openstack loadbalancer set --description "vre-rucio-auth.cern.ch" $LB_ID_AUTH
openstack loadbalancer set --description "vre-rucio-ui.cern.ch" $LB_ID_UI
# set a tag
openstack loadbalancer set --tag landb-alias=vre-rucio $LB_ID_MAIN
openstack loadbalancer set --tag landb-alias=vre-rucio-auth $LB_ID_AUTH
openstack loadbalancer set --tag landb-alias=vre-rucio-ui $LB_ID_UI
Afterwards, open the firewall so that the loadbalancer service is accessible from the outside world and not only from within CERN. Use this link: https://landb.cern.ch/portal/firewall.
Connection to a third-party authentication service, in this case the ESCAPE-IAM instance: the authentication is managed through OIDC tokens. To authenticate users to the VRE Rucio instance, a connection between IAM and Rucio needs to be established. We manage the Rucio instance, while the IAM instance is managed by the CNAF admins. We suggest getting familiar with OAuth2 and OIDC tokens by going through this presentation on tokens in the Rucio framework.
The set-up is straightforward. Before starting, you need to have:
- The Rucio servers (main+auth) running, as described in the Rucio server chart. Apply these to your K8s cluster via Terraform and check that the service gets created correctly.
- Your Rucio DB already initialised correctly.
Once you have this ready, register two new clients from the MitreID IAM dashboard, as described in the Rucio documentation.
You can name them however you want; ours are:
- `cern-vre-rucio-admin` is the **ADMIN** client.
- `cern-vre-rucio-auth` is the **AUTH** client.

For each of them, you have a `client-id` and a `client-secret`, and you can generate a 'registration access token'. Create a new `idpsecret.json` file and populate it as follows:
{
"escape": {
"issuer": "https://iam-escape.cloud.cnaf.infn.it/",
"redirect_uris": [
"https://vre-rucio-auth.cern.ch/auth/oidc_code",
"https://vre-rucio-auth.cern.ch/auth/oidc_token"
],
"client_id": "<AUTH-client-id>",
"registration_access_token": "<**AUTH**-client-token>",
"client_secret": "<AUTH-client-secret>",
"SCIM": {
"client_id": "<ADMIN-client-id>",
"grant_type": "client_credentials",
"registration_access_token": "<ADMIN-client-token>",
"client_secret": "<ADMIN-client-secret>"
}
}
}
After having injected this secret as `<helm_release_name>-idpsecrets` as stated here, you need to add the `config.oidc` and the `additionalSecrets.idpsecrets` sections in the values.yaml of the server and daemons, if you haven't already done so.
The last step is to run the iam-rucio-sync.py script, ideally as a cronjob, in a container that has both the Rucio server and client modules installed. This will populate the `accounts` table of the DB with all the IAM ESCAPE accounts. You can test-run it from the Rucio server pod by entering the shell and executing the code as described in the containers repo.
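As a rough sketch only (the pod name and the script location inside the container are assumptions; the containers repo documents the exact invocation), a manual test run could look like:
> kubectl exec -it <rucio-server-pod> -n rucio-vre -- /bin/bash
# inside the pod, with the server's rucio.cfg in place
> python3 iam-rucio-sync.py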
If the synchronisation gives problems, execute the mapping manually:
rucio-admin identity add --account <account_name> --type OIDC --id "SUB=<look-in-IAM-user-info>, ISS=https://iam-escape.cloud.cnaf.infn.it/" --email "<user_email>"
You need root permissions, so authenticate with Rucio from the root account, which has the following config:
[client]
rucio_host = https://vre-rucio.cern.ch:443
auth_host = https://vre-rucio-auth.cern.ch:443
auth_type = userpass
username = ddmlab
password = <password_tbag>
ca_cert = /etc/pki/tls/certs/CERN-bundle.pem
account = root
request_retries = 3
protocol_stat_retries = 6
oidc_issuer = escape
oidc_polling = true
[policy]
permission = escape
schema = escape
lfn2pfn_algorithm_default = hash
If you now run the command `rucio whoami` while providing a rucio.cfg similar to this:
[client]
rucio_host = https://vre-rucio.cern.ch:443
auth_host = https://vre-rucio-auth.cern.ch:443
ca_cert = /etc/pki/tls/certs/CERN-bundle.pem
auth_type = oidc
account = <your_IAM_account>
oidc_audience = rucio
oidc_scope = openid profile wlcg wlcg.groups fts:submit-transfer offline_access
request_retries = 3
oidc_issuer = escape
oidc_polling = true
auth_oidc_refresh_activate = true
[policy]
permission = escape
schema = escape
lfn2pfn_algorithm_default = hash
The server should prompt you with a link to generate the token to authenticate to the instance.
In order to be authenticated by the Rucio auth server with your x509 certificate, you need to periodically update the CA certificates that verify that your personal certificate is valid. You can do that by running the command `fetch-crl` (`yum install fetch-crl && fetch-crl`), or you can safely copy the contents of `/etc/grid-security/certificates/*` from the official institutional computers (in the case of CERN, `lxplus.cern.ch`).
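For example, to copy the CA directory from lxplus (assuming the standard grid-security location on the destination machine):
> rsync -av <username>@lxplus.cern.ch:/etc/grid-security/certificates/ /etc/grid-security/certificates/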
To see whether you are correctly authenticated to the instance, always run the command `rucio -vvv whoami` first (`-vvv` stands for verbose).
curl -vvv --cacert /etc/pki/tls/certs/CERN-bundle.pem --cert <x509_cert_path> --key <x509_key_path> https://<rucio_auth_server>/auth/x509
When you authenticate with tokens, the OAuth method is described in the Bearer token documentation. The token gets stored in the directory `/tmp/root/.rucio_root/`, and you can export it with:
export tokenescape=<content_of_your_auth_token_file>
Inspect it with:
curl -s -H "Authorization: Bearer $tokenescape" https://iam-escape.cloud.cnaf.infn.it/userinfo | jq .
If you want, you can use CRIC to better manage your RSEs, but for now we are setting them up manually.
Before adding any RSE to the RUCIO instance:
- Be sure that you can communicate with the endpoint: download the corresponding client to communicate with the storage and test it (explore the storage, for example).
- Then test if you can interact with the endpoint using Gfal (local-SE).
- The last check is testing the connection/communication between SEs using FTS.
You can either execute the following commands manually, or use this script to make your life easier. Remember to insert your own variables at the start of the file!
First of all you'll need to authenticate with Rucio, for example using a `rucio.cfg` file.
> export RUCIO_CONFIG=/path/to/the/file/rucio.cfg
> rucio whoami
The following script fully sets up an EULAKE RSE from a `root` account:
> rucio-admin rse add <RSE_NAME>
Added new deterministic RSE: RSE_NAME
> rucio-admin rse add-protocol --hostname <HOSTNAME> \
--scheme <SCHEME> \
--prefix <PREFIX> \
--port <PORT> \
--impl <IMPL> \
--domain-json '{"wan": {"read": X, "write": X, "delete": X, "third_party_copy_read": X, "third_party_copy_write": X}, "lan": {"read": X, "write": X, "delete": X}}' <RSE_NAME>
> rucio-admin rse set-attribute --rse <RSE_NAME> --key <KEY> --value <VALUE>
Added new RSE attribute for RSE_NAME: KEY-VALUE
# Do the same for the following attributes:
Attributes:
===========
QOS: X
SITE: X
city: Missing from CRIC
country_name: NULL
fts: X
greedyDeletion: X
latitude: X
lfn2pfn_algorithm: hash
longitude: X
oidc_support: x
region_code: X
source_for_used_space: X
verify_checksum: X
# Defining storage for the RSE
> rucio-admin account set-limits <ACCOUNT> <RSE_NAME> <LIMIT>
> rucio-admin rse set-limit <RSE_NAME> MinFreeSpace <LIMIT>
# Once you have added at least 2 SEs, you can set up the distances between the SEs. This can be done in a single direction, if intended.
> rucio-admin rse add-distance --distance <distance> --ranking <ranking> <RSE-1> <RSE-2>
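Purely as an illustration (all hostnames, prefixes and values below are hypothetical and need to be replaced with those of your storage), a filled-in protocol definition could look like:
> rucio-admin rse add-protocol --hostname storage.example.org \
    --scheme davs \
    --prefix /eos/escape/rucio \
    --port 443 \
    --impl rucio.rse.protocols.gfal.Default \
    --domain-json '{"wan": {"read": 1, "write": 1, "delete": 1, "third_party_copy_read": 1, "third_party_copy_write": 1}, "lan": {"read": 1, "write": 1, "delete": 1}}' EULAKE-EXAMPLE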
- The `lfn2pfn_algorithm` attribute needs to be set to `hash`.
- (Optional attribute, no need to set it up) The country code needs to be a 2-letter code, not more.
- `oidc_support` does not affect user authentication, upload or download, but only transfers and rules; see the documentation for more details. The root account needs to be well configured with it, otherwise the reaper daemon will throw OIDC errors when deleting.
- `greedyDeletion` needs to be studied, but for the moment it is better to keep it set to `True`, so that the reaper actually deletes the files.
- In general, avoid adding `NULL` values to any key attribute.
Files in Rucio can only be deleted by the `reaper` daemon.
The usual way to do this is just by using the Rucio CLI:
> rucio erase <SCOPE>:<DID>
This operation is asynchronous (it should take effect within the next 24h): it first deletes all the replicas associated to the DID until the file completely disappears from the DB. To delete all the replicas associated to one rule, use:
rucio delete-rule --purge-replicas --all <rule_id>
Sometimes, it will be convenient to delete an RSE from the Data Lake. In order to do so, the RSE needs to be completely empty. To check whether it is, run:
rucio list-rse-usage <rse_name>
To see which datasets are on the RSE:
rucio list-datasets-rse <rse_name>
If you know the account that uploaded data on the RSE:
rucio list-rules --account=<account_name>
Access the DB, for example with psql, and execute the query:
SELECT * FROM replicas WHERE rse_id ='<rse_id>';
And check if there are still some files there. In general, if `lock_cnt` is 0 and `tombstone` is in the past, the `reaper` daemon should delete the file! To do so, run the `rucio erase` command as explained above.
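A candidate query for spotting replicas that the reaper should pick up (column names as used above; adapt to your schema and connection string):
> psql "$DB_CONNECT_STRING" -c "SELECT scope, name, lock_cnt, tombstone FROM replicas WHERE rse_id = '<rse_id>' AND lock_cnt = 0 AND tombstone < NOW();"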
Sometimes you might want to delete an RSE from your Rucio DB. This can happen if:
- the site is already gone (a partner institution has decommissioned it)
- you lost access to it
To do so:
1. Use the `--scheme mock` protocol on the RSE (command below).
2. Set greedy deletion to `True`.
3. If the `state` of the file in the DB is `A` (available), you can set the tombstone time to the past so that the reaper acts without the 24h delay.
4. Update the `updated_at` time in the DB to the past.
5. (Use at your own risk; this step worked in our case.) If there are no rules associated to a replica but the replica still shows, for example, a `C` (copying) state, you can update the state of the replica to `U` (unavailable) and update the `updated_at` time to the past as in step 4.
# Step 1. command
> rucio-admin rse add-protocol --hostname example.com --scheme mock --prefix / --port 0 --impl rucio.rse.protocols.mock.Default <rse_name>
# Step 2. command
> rucio-admin rse set-attribute --key greedyDeletion --value True --rse <rse_name>
# Step 3. command
> rucio-admin replicas set-tombstone --rse <rse_name> <scope>:<did>
# Step 4 sql (DB) command
UPDATE replicas SET updated_at = TO_DATE('2023-01-01', 'YYYY-MM-DD') WHERE rse_id = '<rse_id>';
# Step 5 sql (DB) command
UPDATE replicas SET state = 'U' WHERE rse_id = '<rse_id>'; # command not tested, though.
The hermes Rucio daemon takes care of the messages about VRE operations for file transfer, upload and download. These messages (metrics) are useful to pass to monitoring dashboards such as Grafana. Here are some initial useful links:
- dashb queue monitoring: https://monit-grafana.cern.ch/d/client/client?from=now-30d&orgId=32&to=now&var-client=eosc.rucio.events&var-cluster=dashb&var-rp=one_month&refresh=5m&var-host=All
- Monitoring Rucio presentation: https://indico.cern.ch/event/844100/contributions/3560556/attachments/1909111/3153979/monitoring-rucio.pdf
- Monitoring data access Docs and data source Docs
- Public logs integrated in MONIT: monit-timber
  - Change the index pattern (top left corner) by typing `escape` or `eosc`
- Long term selected metrics: monit-opensearch-lt
  - Change the index pattern (top left corner) by typing `eosc` or `escape`
Elastic Search (ES) data sources can be added to Grafana on the configuration tab (if you have the corresponding rights on the Grafana instance).
Follow this tutorial/doc to set up in Grafana the ES data source you are interested in. The user and password for both the `timber` and `monit-opensearch-lt` data sources can be found in `tbag`.
Examples of containers/scripts set up as cronjobs that populate the dashboards can be found here:
- Testing infra with rucio noise production
- Deprecated way
- Population of rucio DID dashboard.
- Population of rucio replicas dashboard.
The VRE pulls Helm charts directly from the Rucio main repository: https://github.com/rucio/helm-charts. When a minor version upgrade (e.g. 1.30.0 --> 1.30.1) is performed, the Helm chart can be edited directly on Github (flux will take care of applying the edits to the K8s cluster) without worries.
In the case of a major version upgrade (e.g. 1.30.0 --> 1.31.0), the logic of the database (DB) tables will change, and the upgrade needs to be performed carefully. Here is an outline of the steps to be taken to migrate the Rucio instance from v1 to v2 (where, for example, v1=1.30.0 and v2=1.31.0):
- create a clone C1 of the C0 DB (if on CERN's DBOD, this is very straightforward); this will leave you with databases C0 and C1.
- go inside a v2 rucio-server Docker container (in a development cluster, for example) and edit `/opt/rucio/etc/rucio.cfg` to connect to the C1 clone of the DB:
[database]
default = postgresql://<user>:<pwd>@<hostname>.cern.ch:<port>/<db_name>
- run the alembic upgrade with `alembic upgrade head --sql`; the `alembic.ini` file should be picked up automatically (see the sketch after this list). Clone C1 is now in v2.
- check that the upgrade has worked well by looking at the DB tables and noticing if anything fishy happened. If all is good, you are good to go!
- create a clone C2 of C0, both still in v1.
- on the production cluster, in v1, stop the connections to the DB by deleting the K8s `db-secret`.
- once the upgrade of C0 is performed, apply the v2 Helm charts to the production cluster, and re-create the `db-secret` in K8s to connect to the C0 database, which is now in v2.
- if all is well, you now have an upgraded version of the DB and of the K8s cluster!
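A sketch of the alembic step inside the v2 rucio-server container (the file locations are the usual ones in the Rucio server image, but double-check them in your container):
# inside the v2 rucio-server container
> vi /opt/rucio/etc/rucio.cfg      # point [database] default to the clone you want to upgrade
> cd /opt/rucio/etc
> alembic upgrade head --sql       # print the SQL that would be applied, for review
> alembic upgrade head             # apply the schema upgrade (assumption: run without --sql to actually apply)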
Most of the VRE operation cronjobs share a BASE image containing the common software used by all the cron containers.
All the VRE containers can be found within the VRE repository.
Starting from a `BASEIMAGE=rucio/rucio-server:release-1.30.0` base image, the base-ops container has the following software installed: see the vre-base-ops Dockerfile.
This image interacts with different Rucio and grid components. Please check the Certificates and secrets section to see how to install certain certificates in a persistent way.
Uses the latest stable version of `vre-base-ops`.
TBC.
Uses the latest stable version of `vre-base-ops`.
TBC.
Uses `BASEIMAGE=rucio/rucio-client:release-1.30.0` as a base image.
TBC.
To enable mounting CVMFS on the cluster, you first need to check which CSI driver version is on the cluster. The driver is installed automatically when using the CERN OpenStack resource provider, unless disabled during the creation of the cluster (either by un-checking the `CVMFS CSI Enabled` box, or through the argument `--labels cvmfs_enabled=false` if using the CLI).
# To check the CVMFS CSI version installed on the cluster
(k9s) [root@kike-dev example]# kubectl describe pod csi-cvmfsplugin-xfnqz -n kube-system | grep Image
Image: registry.cern.ch/magnum/csi-node-driver-registrar:v2.2.0
Image ID: registry.cern.ch/magnum/csi-node-driver-registrar@sha256:2dee3fe5fe861bb66c3a4ac51114f3447a4cd35870e0f2e2b558c7a400d89589
Image: registry.cern.ch/magnum/cvmfsplugin:v1.0.0
Image ID: registry.cern.ch/magnum/cvmfsplugin@sha256:409e1e2a4b3a0a6c259d129355392d918890325acfb4afeba2f21407652d58a5
If the `cvmfsplugin` is at v1.0.0, you will need to upgrade the plugin to v>=2. Follow this tutorial.
In our case we used helm to perform the upgrade:
$ kubectl patch daemonset csi-cvmfsplugin -p '{"spec": {"updateStrategy": {"type": "OnDelete"}}}' -n kube-system
$ vi values-v1-v2-upgrade.yaml
nodeplugin:
# Override DaemonSet name to be the same as the one used in v1 deployment.
fullnameOverride: "csi-cvmfsplugin"
# DaemonSet matchLabels must be the same too.
matchLabelsOverride:
app: csi-cvmfsplugin
# Create a dummy ServiceAccount for compatibility with v1 DaemonSet.
serviceAccount:
create: true
use: true
$ helm upgrade cern-magnum cern/cvmfs-csi --version 2.0.0 --values values-v1-v2-upgrade.yaml --namespace kube-system
Once the upgrade has happened, you will need to set up a storage class and a PVC to mount CVMFS. Following the above tutorial plus the examples found in the same repo, the K8s manifest to be applied should look as shown below. On the ESCAPE cluster this was the manifest applied (to be updated to the VRE one):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: cvmfs
provisioner: cvmfs.csi.cern.ch
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: cvmfs
namespace: jupyterhub
spec:
accessModes:
- ReadOnlyMany
resources:
requests:
# Volume size value has no effect and is ignored
# by the driver, but must be non-zero.
storage: 1
storageClassName: cvmfs
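After applying the manifest, a quick sanity check that the StorageClass and the PVC were created:
> kubectl get storageclass cvmfs
> kubectl get pvc cvmfs -n jupyterhub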
The CSI DaemonSet pods may not get scheduled on the JupyterHub user nodes because of the `jupyter-role=singleuser:NoSchedule` taint (see using-a-dedicated-node-pool-for-users and Assigning Pods to Nodes). This was solved thanks to ticket INC3384022.
- How to solve it: add the following toleration to the DaemonSet deployed by the CSI-cvmfs helm chart (v2.0.0):
tolerations:
- key: "jupyter-role"
  operator: "Equal"
  value: "singleuser"
  effect: "NoSchedule"
Alternatively, patching the DaemonSet directly should work (note that tolerations live under the pod template spec):
kubectl patch daemonset csi-cvmfsplugin -n kube-system -p '{"spec": {"template": {"spec": {"tolerations": [{"key": "jupyter-role", "operator": "Equal", "value": "singleuser", "effect": "NoSchedule"}]}}}}'
(or with {"key": "jupyter-role", "operator": "Exists", "effect": "NoSchedule"} to tolerate any value of the taint).
- Apply the `reana-release.yaml` helm chart via flux, keeping the ingress disabled, as the default Reana ingress is Traefik while CERN Openstack already deploys `nginx` as the ingress controller.
- If you are using your own DB instance, change the configuration with the DB name, host and port in the helm chart, delete the secret `<your-reana-helm-release>-db-secrets` (which contains the username and password) and re-apply your own as in `infrastructure/scripts/create-reana-secrets.sh`. NOTE THAT THE SECRET GETS UPDATED EVERY TIME YOU APPLY A RELEASE, so make sure the DB secret is your own and not the default one whenever you do development that requires re-applying changes. Initialise the DB as described in the helm chart. If you are using k9s, type `:helm` and press enter on the `<your-reana-helm-release>` name for instructions.
- Once the helm chart is applied correctly, add the DNS name (`reana-vre.cern.ch`) as a label to the ingress nodes, alongside the existing `jhub` label:
openstack server set --property landb-alias=jhub-vre--load-1-,reana-vre--load-1- cern-vre-bl53fcf4f77h-node-0
openstack server set --property landb-alias=jhub-vre--load-2-,reana-vre--load-2- cern-vre-bl53fcf4f77h-node-1
openstack server set --property landb-alias=jhub-vre--load-3-,reana-vre--load-3- cern-vre-bl53fcf4f77h-node-2
- Apply the `reana-ingress.yaml` manually: the `letsencrypt` annotation should create the secret `cert-manager-tls-ingress-secret-reana` automatically.
- Configure your identity provider. For this, follow the initial instructions on https://github.com/reanahub/docs.reana.io/pull/151/files. For the IAM ESCAPE idP the OpenID configuration is the following: https://iam-escape.cloud.cnaf.infn.it/.well-known/openid-configuration. The secrets of the IAM client acting on behalf of the application are stored in `reana-vre-iam-client`. You can then see that the users get created in the DB, and the release has a way to specify email notifications whenever a new user requests a token. A couple of useful commands to deal with users are:
$ export REANA_ACCESS_TOKEN=$(kubectl get secret reana-admin-access-token -n reana -o json | jq -r '.data | map_values(@base64d) | .ADMIN_ACCESS_TOKEN')
$ echo $REANA_ACCESS_TOKEN
# LIST USERS
$ kubectl exec -i -t deployment/reana-server -n reana -- flask reana-admin user-list --admin-access-token $REANA_ACCESS_TOKEN
# CREATING USER
kubectl exec -i -t deployment/reana-server -n reana -- flask reana-admin user-create --email <user-email> --admin-access-token $REANA_ACCESS_TOKEN
# GRANTING TOKEN TO NEW USER
kubectl exec -i -t deployment/reana-server -n reana -- flask reana-admin token-grant -e <user-email> --admin-access-token $REANA_ACCESS_TOKEN
- Navigate to `reana-vre.cern.ch` and log in with your IAM credentials.
JupyterHub is installed through the Z2JH Helm Chart. The domain is https://nb-vre.cern.ch/, which uses a Sectigo certificate.
The chart values are adjusted to use:
- `LoadBalancer` as a service
- IAM ESCAPE OAuth authentication
- SSL/HTTPS with the domain and Sectigo certificate
- DBOD postgres database
Secrets for IAM and Sectigo are stored in the same namespace.
This chart combines a JupyterHub deployment, a Dask deployment and a Dask Gateway to distribute workloads across all the nodes of the cloud cluster. It can be accessed via nb-vre.cern.ch. Here are the instructions to set it up via the Helm chart in the repository.
- Apply the namespace and chart: https://github.com/vre-hub/vre/tree/main/infrastructure/cluster/flux-v2/dask.
- Label the ingress nodes with the correct URL (nb-vre, as well as the already existing ones for jhub and reana):
openstack server set --property landb-alias=jhub-vre--load-1-,reana-vre--load-1-,nb-vre--load-1- cern-vre-bl53fcf4f77h-node-0
openstack server set --property landb-alias=jhub-vre--load-2-,reana-vre--load-2-,nb-vre--load-2- cern-vre-bl53fcf4f77h-node-1
openstack server set --property landb-alias=jhub-vre--load-3-,reana-vre--load-3-,nb-vre--load-3- cern-vre-bl53fcf4f77h-node-2
- Create the IAM client (`nb-vre-iam-client`) with redirect URI `https://nb-vre.cern.ch/hub/oauth_callback`.
- Apply the daskhub secrets: https://github.com/vre-hub/vre/tree/main/infrastructure/secrets/dask.
- For the DB connection, create a new DB and user in your DB instance:
CREATE DATABASE dask;
CREATE USER dask WITH ENCRYPTED PASSWORD '<password>';
GRANT ALL PRIVILEGES ON DATABASE dask TO dask;
- Apply the release, which will request a certificate (with the `letsencrypt` service at CERN) for the service name `nb-vre.cern.ch`. If the certificate does not get issued, look into the errors with `kubectl describe -n daskhub` on, in order, all of these resources, which depend on one another: certificate < certificaterequest < issuer/clusterissuer < orders/challenges.
- Navigate to the URL nb-vre.cern.ch and test in a notebook:
import dask
from dask_gateway import Gateway
from dask_gateway import GatewayCluster
import dask.array as da
# create cluster
cluster = GatewayCluster() # see the pod dask-scheduler being created in other nodes of the cluster
cluster.scale(4) # see the pods dask-worker being created in other nodes of the cluster
cluster.adapt(minimum=2, maximum=20)
# inspect the active cluster
gateway=Gateway()
gateway.list_clusters()
# execute a Dask computation
x = da.random.random((10000, 10000, 10), chunks=(1000, 1000, 5))
y = da.random.random((10000, 10000, 10), chunks=(1000, 1000, 5))
z = (da.arcsin(x) + da.arccos(y)).sum(axis = (1,2))
z.compute()
# shut down the Dask cluster
cluster.shutdown()
- CSI: Container Storage Interface
- PV: Persistent Volume
- PVC: Persistent Volume Claim
- SE: Storage Element
CERN VRE Technical Documentation ©CERN 2023