
Commit cf86e0e

Merge pull request #3 from lexming/2023.02
Update to 2023.02 implementation
2 parents: 5816861 + bad7fa5

14 files changed (+595, -327 lines)

README.md

Lines changed: 11 additions & 6 deletions

```diff
@@ -3,13 +3,18 @@
 The goal of our notebook platform is to provide a web-based interface to our
 tier-2 HPC cluster. This alternative interface to the standard shell interface
 is based on [computational notebooks](https://en.wikipedia.org/wiki/Notebook_interface).
-The notebook platform must be capable to handle user authentication, launch
-notebooks leveraging the computational resources of our HPC infrastructure and
-allow users to manage a library of notebooks.
+The notebook platform must be able to:
+* handle VSC user authentication
+* allow selection of computational resources
+* launch notebooks leveraging the computational resources of our HPC infrastructure
+* allow users to manage a library of notebooks
+* integrate with the software module system of our HPC clusters
 
 ## JupyterHub
 
-[JupyterHub](https://jupyter.org/hub) from the Jupyter Project fulfills all the
-requirements of this platform. The details of its integration in the HPC
-cluster of VUB are available in [notebook-platform/jupyterhub](jupyterhub).
+[JupyterHub](https://jupyter.org/hub) from the Jupyter Project fulfills all
+requirements of this platform. Moreover, the modular architecture of JupyterHub
+makes it easy to implement solutions for those requirements that are not
+covered natively. The details of its integration in the HPC cluster of VUB are
+available in [notebook-platform/jupyterhub](jupyterhub).
```

jupyterhub/README.md

Lines changed: 83 additions & 112 deletions

```diff
@@ -3,16 +3,42 @@
 ![JupyterHub integration in HPC cluster](jupyterhub-diagram.png "JupyterHub integration in HPC cluster")
 
 The hub and its HTTP proxy are run by a non-root user in a rootless container.
-The container is managed by a service in [systemd](https://systemd.io/) with
-[podman](https://podman.io/).
+The container is managed on the host system by a service in
+[systemd](https://systemd.io/) with [podman](https://podman.io/).
 
-Notebooks are run remotely, in any available compute node in the HPC cluster.
+Notebooks are launched remotely, on the compute nodes of our HPC cluster.
 The allocation of hardware resources for the notebook is done on-demand by
-[Slurm](https://slurm.schedmd.com/). JupyterHub can submit jobs to Slurm to
-launch new notebooks thanks to [batchspawner](https://github.com/jupyterhub/batchspawner).
+the resource manager [Slurm](https://slurm.schedmd.com/). Users can select the
+resources for their notebooks from the JupyterHub interface thanks to the
+[JupyterHub MOdular Slurm Spawner](https://github.com/silx-kit/jupyterhub_moss),
+which leverages [batchspawner](https://github.com/jupyterhub/batchspawner) to
+submit jobs to Slurm on the user's behalf that launch the single-user server.
+
 The main particularity of our setup is that such jobs are not submitted to
 Slurm from the host running JupyterHub, but from the login nodes of the HPC
-cluster.
+cluster via an SSH connection. This approach has the advantage that the system
+running JupyterHub can be very minimal, avoiding the need for local users,
+special file-system mounts and the complexity of provisioning a Slurm
+installation capable of submitting jobs to the HPC cluster.
+
+## Rootless
+
+JupyterHub is run by a non-root user in a rootless container. Setting up a
+rootless container is well described in the [podman rootless
+tutorial](https://github.com/containers/podman/blob/main/docs/tutorials/rootless_tutorial.md).
+
+We use a [system service](host/etc/systemd/system/jupyterhub.service) to
+execute `podman` as the non-root user `jupyterhub` (*aka* the JupyterHub operator).
+This service relies on a [custom shell script](host/usr/local/bin/jupyterhub-init.sh)
+to automatically initialize a new image of the rootless container or start an
+existing one.
+
+The container [bind-mounts a few files with sensitive
+configuration](host/usr/local/bin/jupyterhub-init.sh#L59-L66): the JupyterHub
+configuration, SSL certificates for the web server and SSH keys to connect to
+the login nodes. Provisioning these files through bind-mounts keeps the
+container images free of secrets and allows updates to the configuration of
+the hub to be deployed seamlessly.
 
 ## Network
```
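
A minimal sketch of what the systemd unit driving this setup could look like is shown below. Only the operator user `jupyterhub` and the init script path come from the README above; the unit structure, description and restart policy are assumptions.

```
# Hypothetical sketch of host/etc/systemd/system/jupyterhub.service.
# Only the jupyterhub user and the init script path come from the README;
# everything else is an assumption.
[Unit]
Description=JupyterHub rootless container
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=jupyterhub
ExecStart=/usr/local/bin/jupyterhub-init.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
```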
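
The initialization logic and the bind-mounted secrets could look roughly as follows. The image name and container-side paths are placeholders; the start-or-create behaviour, the published ports and the kinds of files mounted (hub configuration, SSL certificates, SSH keys) follow the README.

```
# Hypothetical sketch of the container start-up in jupyterhub-init.sh.
# Image name and paths are placeholders; ports and the bind-mounted files
# (hub config, SSL certificates, SSH keys) follow the README.
if podman container exists jupyterhub; then
    podman start jupyterhub
else
    podman run --detach --name jupyterhub \
        --publish 8000:8000 --publish 8081:8081 \
        --volume "$HOME/conf/jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py:ro" \
        --volume "$HOME/conf/ssl:/srv/jupyterhub/ssl:ro" \
        --volume "$HOME/.ssh:/srv/jupyterhub/.ssh:ro" \
        jupyterhub-vub:latest
fi
```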

```diff
@@ -21,13 +47,13 @@ have a routable IP address, so they rely on the network interfaces of the host
 system. The hub must be able to talk to the notebooks being executed on the
 compute nodes in the internal network, as well as serve the HTTPS requests
 (through its proxy) from users on the external network. Therefore, ports 8000
-(HTTP proxy) and 8081 (REST API) in the
-[container are forwarded to the host system](usr/local/bin/jupyterhub-init.sh#L54).
+(HTTP proxy) and 8081 (REST API) in the [container are forwarded to the host
+system](host/usr/local/bin/jupyterhub-init.sh#L53-L57).
 
 The firewall on the host systems blocks all connections through the external
 network interface and forwards port 8000 on the internal interface (HTTP proxy)
-to port 443 on the external one. This setup allows accessing the web interface
-of the hub/notebooks from both the internal and external networks. The REST API
+to port 443 on the external one. This setup renders the web interface of the
+hub/notebooks accessible from both the internal and external networks. The REST API
 of the hub is only available on port 8081 of the internal network.
 
 ## Authentication
```
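
As an illustration of the firewall setup described in this hunk, a firewalld-based sketch is shown below. The zone names, and the use of firewalld itself, are assumptions; only the port numbers (443 externally, 8000 for the proxy, 8081 for the REST API) come from the README.

```
# Hypothetical firewalld sketch of the forwarding described above.
# Zone names (and firewalld itself) are assumptions; ports follow the README.
firewall-cmd --permanent --zone=external --add-forward-port=port=443:proto=tcp:toport=8000
firewall-cmd --permanent --zone=internal --add-port=8000/tcp
firewall-cmd --permanent --zone=internal --add-port=8081/tcp
firewall-cmd --reload
```
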
```diff
@@ -36,112 +62,57 @@ User authentication is handled through delegation via the
 [OAuth](https://en.wikipedia.org/wiki/OAuth) service of the
 [VSC](https://www.vscentrum.be/) accounts used by our users.
 
-We made a custom
-[VSCGenericOAuthenticator](etc/jupyterhub/jupyterhub_config.py#L73-L77) which
-is heavily based on `LocalGenericOAuthenticator` from
-[OAuthenticator](https://github.com/jupyterhub/oauthenticator/):
-
-* entirely relies on OAuthenticator to carry out a standard OAuth delegation
-with the VSC account page, the [URLs of the VSC OAuth are defined in the
-environment of the container](container/Dockerfile#L59-L61) and the [secrets
-to connect to it are defined in JupyterHub's configuration
-file](etc/jupyterhub/jupyterhub_config.py#L83-L88)
-* automatically creates local users in the container for any VSC account logged
-in to JupyterHub and ensures correct UID mapping to allow local VSC users to
-[access their home directories](usr/local/bin/jupyterhub-init.sh#L80),
-which is needed to securely connect to the login nodes in the HPC cluster
-with their SSH keys
+We use the [GenericOAuthenticator](https://github.com/jupyterhub/oauthenticator/)
+from JupyterHub:
 
-## Rootless
+* it carries out a standard OAuth delegation with the VSC account page
 
-JupyterHub is run by a non-root user in a rootless container. Setting up a
-rootless container is well described in the [podman rootless
-tutorial](https://github.com/containers/podman/blob/main/docs/tutorials/rootless_tutorial.md).
-
-We use a [system service](etc/systemd/system/jupyterhub.service) to execute
-`podman` by a non-root user `jupyterhub` (*aka* JupyterHub operator). This
-service relies on a [custom shell script](usr/local/bin/jupyterhub-init.sh) to
-automatically initialize a new image of the rootless container or start an
-existing one.
-
-### Extra permissions
-
-In the current setup, running JupyterHub fully non-root is not possible because
-the hub needs superuser permissions for two specific tasks:
-
-* `VSCGenericOAuthenticator` creates local users in the container
-* `SlurmSpawner` switches to VSC users (other non-root users) to launch their
-notebooks through Slurm
+* the [URLs of the VSC OAuth](container/Dockerfile#L72-L76) are defined in the
+environment of the container
 
-These additional permissions are granted to the hub user discretely by means of
-`sudo`. Each extra permission is defined in the
-[sudoers](container/sudoers.conf) file of the container.
+* the [OAuth secrets](container/.config/jupyterhub_config.py#L40-L45) are
+defined in JupyterHub's configuration file
 
-### Container namespace
-
-Users logging in JupyterHub have to access their home directories to be able to
-connect to the login nodes of the HPC cluster with their SSH keys. Since home
-directories are bound to the mounts in the host system, it is critical to
-properly define the namespace used by the rootless container to cover the real
-UIDs of the users in the host system.
-
-The UID/GIDs of VSC users are all in the 2500000-2599999 range. We can
-easily create a [mapping for the container](etc/subuid) with a straightforward
-relationship between the UIDs inside and outside the container:
-
-```
-$ podman unshare cat /proc/self/uid_map
-         0       4009          1
-         1    2500001      65536
-```
-
-Therefore, the non-root user executing the rootless container will be mapped to
-the root user of the container, as usual, while, for instance, the user with UID 1
-in the container will be able to access the files of UID 2500001 outside.
-The custom method [`vsc_user_uid_home`](etc/jupyterhub/jupyterhub_config.py#L43)
-ensures that VSC users created inside the container have the correct UID with
-regard to this mapping.
-
-The namespace used by the container must be available in the host system (*i.e.*
-not assigned to any user or group in the system), which means that the VSC
-users must not exist in the host system of the container. This requirement does
-not hinder mounting the home directories of those VSC users in the system
-though, as any existing files owned by those UID/GIDs of the VSC users will be
-simply not assigned to any known user/group.
+* local users beyond the non-root user running JupyterHub are **not needed**
 
 ## Slurm
 
-Integration with Slurm is leveraged by `SlurmSpawner` of
-[batchspawner](https://github.com/jupyterhub/batchspawner).
-
-We modified the submission command to execute `sbatch` in the login nodes of
-the HPC cluster through SSH. The login nodes already run Slurm and are the sole
-systems handling job submission in our cluster. Delegating job submission to
-them avoids having to install and configure Slurm in the container running
-JupyterHub.
-
-The user's environment in the hub is passed through the SSH connection by
-selectively passing the needed environment variables to launch the user's
-notebook:
-
-```
-sudo -E -u vscXXXXX ssh -o 'StrictHostKeyChecking no' login.host.domain \
-env JUPYTERHUB_API_TOKEN="${JUPYTERHUB_API_TOKEN@Q}" \
-JPY_API_TOKEN="${JPY_API_TOKEN@Q}" \
-JUPYTERHUB_CLIENT_ID="${JUPYTERHUB_CLIENT_ID@Q}" \
-JUPYTERHUB_HOST="${JUPYTERHUB_HOST@Q}" \
-JUPYTERHUB_API_URL="${JUPYTERHUB_API_URL@Q}" \
-JUPYTERHUB_OAUTH_CALLBACK_URL="${JUPYTERHUB_OAUTH_CALLBACK_URL@Q}" \
-JUPYTERHUB_OAUTH_SCOPES="${JUPYTERHUB_OAUTH_SCOPES@Q}" \
-JUPYTERHUB_USER="${JUPYTERHUB_USER@Q}" \
-JUPYTERHUB_SERVER_NAME="${JUPYTERHUB_SERVER_NAME@Q}" \
-JUPYTERHUB_ACTIVITY_URL="${JUPYTERHUB_ACTIVITY_URL@Q}" \
-JUPYTERHUB_BASE_URL="${JUPYTERHUB_BASE_URL@Q}" \
-JUPYTERHUB_SERVICE_PREFIX="${JUPYTERHUB_SERVICE_PREFIX@Q}" \
-JUPYTERHUB_SERVICE_URL="${JUPYTERHUB_SERVICE_URL@Q}" \
-sbatch --parsable
-```
-
-Note: the expansion operator `${var@Q}` is available in bash 4.4+ and returns a
-quoted string with escaped special characters.
-
+Integration with Slurm is leveraged through a custom spawner called
+[VSCSlurmSpawner](container/.config/jupyterhub_config.py#L60), based on
+[MOSlurmSpawner](https://github.com/silx-kit/jupyterhub_moss).
+`VSCSlurmSpawner` allows JupyterHub to generate the user's environment needed
+to spawn their single-user server without any local users. All user settings
+are taken from `vsc-config`.
+
+We modified the [submission command](container/.config/jupyterhub_config.py#L295)
+to execute `sbatch` on the login nodes of the HPC cluster through SSH.
+The login nodes already run Slurm and are the sole systems handling job
+submission in our cluster. Delegating job submission to them avoids having to
+install and configure Slurm in the container running JupyterHub. The hub
+environment is passed over SSH with strict control over the variables that
+are [sent](container/.ssh/config) and [accepted](slurm_login/etc/ssh/sshd_config)
+on both ends.
+
+The SSH connection is established by the non-root user running JupyterHub (the
+hub container does not have other local users). This jupyterhub user has
+special `sudo` permissions on the login nodes to submit jobs to Slurm as other
+users. The specific group of users and the list of commands allowed to the
+jupyterhub user are defined in the [sudoers file](slurm_login/etc/sudoers).
+
+Single-user server spawn process:
+
+1. the user selects the computational resources for the notebook in the
+[web interface of the hub](https://github.com/silx-kit/jupyterhub_moss)
+
+2. `VSCSlurmSpawner` generates the environment for the user without any local
+users in the system of the hub
+
+3. the jupyterhub user connects to a login node with SSH and the environment is
+passed over the connection
+
+4. the jupyterhub user submits a new job to the Slurm cluster as the target
+user, keeping the hub environment
+
+5. the single-user server job fully [resets the environment and
+re-generates](container/.config/jupyterhub_config.py#L264-L285) the specific
+environment variables for the single-user server
```
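
The sketches below illustrate how some of the pieces referenced in this diff could look; anything not explicitly mentioned in the README should be read as an assumption. For the OAuth delegation, the endpoint URLs living in the container environment might be declared in `container/Dockerfile` roughly as follows; the URL paths are placeholders and the variable names assume the defaults read by recent OAuthenticator releases.

```
# Hypothetical sketch of the OAuth-related environment in container/Dockerfile.
# URL paths are placeholders; the variable names assume OAuthenticator defaults
# and should be checked against the installed oauthenticator version.
ENV OAUTH2_AUTHORIZE_URL="https://account.vscentrum.be/django/oauth/authorize/"
ENV OAUTH2_TOKEN_URL="https://account.vscentrum.be/django/oauth/token/"
ENV OAUTH2_USERDATA_URL="https://account.vscentrum.be/django/oauth/current_vsc_user"
```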
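
The submission path described in the Slurm section can be pictured as a single remote command. The host name and command details below are illustrative only, while the ordering (SSH as the hub's own unprivileged user first, `sudo` to the target VSC user only on the login node) follows the README.

```
# Hedged sketch of what the modified submission command boils down to:
# the hub's unprivileged user opens the SSH connection and only switches to
# the target VSC user on the login node. Host name and user are placeholders.
ssh login.hpc.example.org sudo -u vscXXXXX /usr/bin/sbatch --parsable
```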
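
The strict control over which variables cross the SSH connection maps naturally onto matching `SendEnv`/`AcceptEnv` directives on the two ends; a sketch with an abbreviated, assumed variable list:

```
# Hypothetical sketch of the environment passing over SSH.
# Host name and the exact variable list are assumptions.

# container/.ssh/config (hub side):
Host login.hpc.example.org
    SendEnv JUPYTERHUB_* JPY_API_TOKEN

# slurm_login/etc/ssh/sshd_config (login node side):
AcceptEnv JUPYTERHUB_* JPY_API_TOKEN
```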
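
Finally, the sudoers rule on the login nodes could take roughly this shape. The group name, command list and kept variables are assumptions; the restriction to a specific group of target users and a fixed list of commands is what the README describes.

```
# Hypothetical sketch of slurm_login/etc/sudoers.
# Group name, command list and kept environment variables are assumptions.
Defaults:jupyterhub env_keep += "JUPYTERHUB_* JPY_API_TOKEN"
jupyterhub ALL=(%vsc_users) NOPASSWD: /usr/bin/sbatch, /usr/bin/scancel
```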
