Update Dockerfile for new CPU base images #112

Open: wants to merge 2 commits into base: main

55 changes: 29 additions & 26 deletions CHANGELOG.md
@@ -1,4 +1,7 @@
# pending
## [unreleased]

- #112 upgrades the CPU base image to Python 3.10. The full build matrix isn't
generated yet, so test at your own peril.

- #109:

@@ -13,13 +16,13 @@

- fixes failure when GCP credentials aren't configured running in local mode.

# 0.4.1
## 0.4.1

Small release to archive for JOSS acceptance.

- Move `cloud_sql_proxy` installation before code copy (https://github.com/google/caliban/pull/87)

# 0.4.0
## 0.4.0

The biggest feature in this new release is native support for logging to an
MLFlow tracking server using the [UV
@@ -42,7 +45,7 @@ documentation soon.
match the current state here):
https://cloud.google.com/ai-platform/training/docs/regions

# 0.3.0
## 0.3.0

- @ramasesh Added a fix that prevented `pip` git dependencies from working in
`caliban shell` mode (https://github.com/google/caliban/pull/55) This adds a
@@ -130,7 +133,7 @@ dlvm:tf2-gpu-2.2
Format strings work here as well! So, `"dlvm:pytorch-{}-1.4"` is a totally valid
base image.

# 0.2.6
## 0.2.6

- Prepared for a variety of base images by setting up a cloud build matrix:
https://github.com/google/caliban/pull/25
@@ -146,7 +149,7 @@ base image.

![2020-06-26 09 48 50](https://user-images.githubusercontent.com/69635/85877300-2a3e7300-b794-11ea-9792-4cf3ae5e4263.gif)

# 0.2.5
## 0.2.5

- fixes the python binary that caliban notebook points to (now that we use
conda)
@@ -156,23 +159,23 @@ base image.
This makes it easy to add, for example, npm and latex support to your caliban
notebook invocations.

# 0.2.4
## 0.2.4

- fixes a bug with `parse_region` not handling a lack of default.
- converts the build to Github Actions.
- Rolls Caliban back to requiring only python 3.6 support.
- Removes some unused imports from a few files.

# 0.2.3
## 0.2.3

- Added fix for an issue where large user IDs would crash Docker during the
build phase. https://github.com/google/caliban/pull/8

# 0.2.2
## 0.2.2

- Fix for bug with requirements.txt files.

# 0.2.1
## 0.2.1

- Added support for Conda dependencies
(https://github.com/google/caliban/pull/5). If you include `environment.yml`
@@ -181,7 +184,7 @@ notebook invocations.
- Base images for CPU and GPU modes now use Conda to manage the container's
virtual environment instead of `virtualenv`.

# 0.2.0
## 0.2.0

- Caliban now caches the service account key and ADC file; you should see faster
builds, BUT you might run into trouble if you try to run multiple Caliban
@@ -228,7 +231,7 @@ This works too:
}
```

# 0.1.15
## 0.1.15

- `caliban notebook` now attempts to search for the first free port instead of
failing due to an already-occupied port.
@@ -240,7 +243,7 @@ This works too:
using its build cache. This is helpful to use if you want to, say, force new
dependencies to get installed without bumping their versions explicitly.

# 0.1.14
## 0.1.14

- JSON experiment configuration files can now handle arguments which are varied
together, by supplying a compound key, of the form e.g. `[arg1,arg2]`.
@@ -252,7 +255,7 @@ This works too:
The colon in the project name separating domain and project ID is handled
properly.

# 0.1.13
## 0.1.13

- 'caliban run' and 'caliban shell' now take an --image_id argument; if
provided, these commands will skip their 'docker build' phase and use the
@@ -274,7 +277,7 @@ This works too:
- `caliban shell` has a new `--shell` argument that you can use to override the
container's default shell.

# 0.1.12
## 0.1.12

- consolidated gke tpu/gpu spec parsing with cloud types
- modified all commands to accept as the module argument paths to arbitrary
@@ -297,26 +300,26 @@ This works too:
- if ADC credentials are NOT present but a service account key is we write a
placeholder. this is required to get ctpu working inside containers.

# 0.1.11
## 0.1.11

- added tpu driver specification for gke jobs
- added query for getting available tpu drivers for cluster/project

# 0.1.10
## 0.1.10

- set host_ipc=True for cluster jobs

# 0.1.9
## 0.1.9

- moved cluster constants to separate file
- moved cluster gpu validation to separate file
- added test for gpu limits validation

# 0.1.8
## 0.1.8

- TPU and GPU spec now accept validate_count arg to disable count validation.

# 0.1.7
## 0.1.7

- Fixed a bug where the label for the job name wasn't getting properly
sanitized - this meant that if you provided an upper-cased job name job
@@ -325,7 +328,7 @@ This works too:
- experiment config parsing now performs the full expansion at CLI-parse-time
and validates every expanded config.

# 0.1.6
## 0.1.6

- `--docker_run_args` allows you to pass a string of arguments directly through
to `docker run`. This command works for `caliban run`, `caliban notebook` and
@@ -339,7 +342,7 @@ This works too:

- fixed a bug in `caliban.util.TempCopy` where a `None`-valued path would fail. This affected environments where `GOOGLE_APPLICATION_CREDENTIALS` wasn't set.

# 0.1.5
## 0.1.5

- `--experiment_config` can now take experiment configs via stdin (pipes, yay!);
specify `--experiment_config stdin`, or any-cased version of that, and the
@@ -376,7 +379,7 @@ This works too:
image ID. Useful for checking if your image can build at all with the current
settings.

# 0.1.4
## 0.1.4

- the CLI will now error if you pass any caliban keyword arguments AFTER the
python module name, but before `--`. In previous versions, if you did something like
@@ -404,11 +407,11 @@ Instead of
pip install .
```

# 0.1.3.1
## 0.1.3.1

- Minor bugfix; I was calling "len" on an iterator, not a list.

# 0.1.3
## 0.1.3

This version:

@@ -421,7 +424,7 @@ This version:

If you like you can set `-v 1` to see the full spec output.

# 0.1.2
## 0.1.2

- `caliban.cloud.types` has lots of enums and types that make it easier to code
well against AI Platform.
6 changes: 3 additions & 3 deletions caliban/docker/build.py
@@ -38,9 +38,9 @@

t = Terminal()

DEV_CONTAINER_ROOT = "gcr.io/blueshift-playground/blueshift"
DEV_CONTAINER_ROOT = "probcomp/caliban"
DEFAULT_GPU_TAG = "gpu-ubuntu1804-py37-cuda101"
DEFAULT_CPU_TAG = "cpu-ubuntu1804-py37"
DEFAULT_CPU_TAG = "cpu-ubuntu2204-py310"
TF_VERSIONS = {"2.2.0", "1.12.3", "1.14.0", "1.15.0"}
DEFAULT_WORKDIR = "/usr/app"
CREDS_DIR = "/.creds"
@@ -731,7 +731,7 @@ def build_image(
path=".", caliban_config=caliban_config) as launcher_config:

cache_args = ["--no-cache"] if no_cache else []
cmd = ["docker", "build", "--platform", "linux/amd64"] + cache_args + \
cmd = ["docker", "build", "--platform", "linux/x86_64"] + cache_args + \
["--iidfile", id_file.name] + \
["--rm", "-f-", build_path]

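As a rough illustration of how the constants and the `docker build` invocation changed in `caliban/docker/build.py` fit together, the sketch below assembles an image reference and a build command. The helper names `base_image` and `build_cmd`, and the `root:tag` joining convention, are assumptions made for illustration; they are not taken from the repository.

```python
from typing import List

# Values as they appear on the new side of the diff.
DEV_CONTAINER_ROOT = "probcomp/caliban"
DEFAULT_CPU_TAG = "cpu-ubuntu2204-py310"


def base_image(tag: str = DEFAULT_CPU_TAG) -> str:
    """Join the repository root and a tag (hypothetical helper)."""
    return f"{DEV_CONTAINER_ROOT}:{tag}"


def build_cmd(id_file: str, build_path: str, no_cache: bool = False) -> List[str]:
    """Assemble the argument list in the same order as the diff (hypothetical helper)."""
    cache_args = ["--no-cache"] if no_cache else []
    return (["docker", "build", "--platform", "linux/x86_64"] + cache_args +
            ["--iidfile", id_file] +
            ["--rm", "-f-", build_path])


if __name__ == "__main__":
    print(base_image())
    print(" ".join(build_cmd("/tmp/image.id", ".", no_cache=True)))
```

Keeping the command as a list of arguments, as the diff does, avoids shell quoting issues when the list is later handed to a subprocess call.
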
2 changes: 1 addition & 1 deletion caliban/platform/run.py
@@ -61,7 +61,7 @@ def _run_cmd(job_mode: c.JobMode,
run_args = []

runtime = ["--runtime", "nvidia"] if c.gpu(job_mode) else []
return ["docker", "run", "--platform", "linux/amd64"
return ["docker", "run", "--platform", "linux/x86_64"
] + runtime + ["--ipc", "host"] + run_args


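The changed return value in `caliban/platform/run.py` can be read as a command prefix that optionally includes the NVIDIA runtime. The sketch below mirrors that shape under simplifying assumptions: the function name `run_cmd_prefix` is hypothetical, and a plain boolean `use_gpu` stands in for the `c.gpu(job_mode)` check in the diff.

```python
from typing import List, Optional


def run_cmd_prefix(use_gpu: bool, run_args: Optional[List[str]] = None) -> List[str]:
    """Build the leading `docker run` arguments (hypothetical helper).

    `use_gpu` is a stand-in for `c.gpu(job_mode)` in the original code.
    """
    run_args = run_args or []
    # Only GPU jobs request the nvidia runtime.
    runtime = ["--runtime", "nvidia"] if use_gpu else []
    return (["docker", "run", "--platform", "linux/x86_64"]
            + runtime + ["--ipc", "host"] + run_args)


if __name__ == "__main__":
    print(run_cmd_prefix(use_gpu=False))
    print(run_cmd_prefix(use_gpu=True, run_args=["-it"]))
```
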
113 changes: 55 additions & 58 deletions dockerfiles/Dockerfile
@@ -1,4 +1,4 @@
# Copyright 2020 Google LLC
# Copyright 2020-2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -12,84 +12,78 @@
# See the License for the specific language governing permissions and
# limitations under the License.

# This builds the base images that we can use for development at Blueshift.
# Tensorflow 2.1 by default, but we can override the image when we call docker.
#
# docker build -t gcr.io/blueshift-playground/blueshift:cpu -f- . <Dockerfile
#
# docker push gcr.io/blueshift-playground/blueshift:cpu
#
# docker build --build-arg BASE_IMAGE=tensorflow/tensorflow:2.1.0-gpu-py3 -t gcr.io/blueshift-playground/blueshift:gpu -f- . <Dockerfile
#
# docker push gcr.io/blueshift-playground/blueshift:gpu
#
# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/dockerfiles/assembler.py

ARG BASE_IMAGE=ubuntu:18.04
ARG BASE_IMAGE=ubuntu:22.04

FROM $BASE_IMAGE
MAINTAINER Sam Ritchie <[email protected]>
MAINTAINER Sam Ritchie <[email protected]>

ARG GCLOUD_ARCHIVE=google-cloud-cli-441.0.0-linux-x86_64.tar.gz
ARG GCLOUD_LOC=/usr/local/gcloud
ARG PYTHON_VERSION=3.7
ARG JULIA_LOC=/usr/local/julia
ARG PYTHON_VERSION=3.10
ARG JULIA_VERSION=1.9.2
ARG ARCH=x86_64

# minicoda release archive is here: https://repo.anaconda.com/miniconda
# see the docs here for managing python versions with conda:
# https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-python.html
ARG MINICONDA_VERSION=py37_4.8.2

LABEL maintainer="[email protected]"
ARG MINICONDA_VERSION=py310_23.5.2-0

# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
LABEL maintainer="[email protected]"

# Install git so that users can declare git dependencies, and python3 plus
# python3-virtualenv so we can generate an isolated Python environment inside
# the container.
# Install git so that users can declare git dependencies, and python3 so
# miniconda we can generate an isolated Python environment inside the container.
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
python3 \
python3-virtualenv \
wget && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
build-essential \
git \
cmake \
ca-certificates \
wget \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# Some tools expect a "python" binary.
RUN ln -s $(which python3) /usr/local/bin/python
RUN ln -s $(which python${PYTHON_VERSION}) /usr/local/bin/python

# install the google cloud SDK.
RUN wget -nv \
https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz && \
mkdir -m 777 ${GCLOUD_LOC} && \
tar xvzf google-cloud-sdk.tar.gz -C ${GCLOUD_LOC} && \
rm google-cloud-sdk.tar.gz && \
${GCLOUD_LOC}/google-cloud-sdk/install.sh --usage-reporting=false \
--path-update=false --bash-completion=false \
--disable-installation-options && \
rm -rf /root/.config/* && \
ln -s /root/.config /config && \
# Remove the backup directory that gcloud creates
rm -rf ${GCLOUD_LOC}/google-cloud-sdk/.install/.backup

# Path configuration
https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/${GCLOUD_ARCHIVE} && \
mkdir -m 777 ${GCLOUD_LOC} && \
tar xvzf ${GCLOUD_ARCHIVE} -C ${GCLOUD_LOC} && \
rm ${GCLOUD_ARCHIVE} && \
${GCLOUD_LOC}/google-cloud-sdk/install.sh --usage-reporting=false \
--path-update=false --bash-completion=false \
--disable-installation-options && \
rm -rf /root/.config/* && \
ln -s /root/.config /config && \
# Remove the backup directory that gcloud creates
rm -rf ${GCLOUD_LOC}/google-cloud-sdk/.install/.backup

# Add the Cloud SDK to the path:
ENV PATH $PATH:${GCLOUD_LOC}/google-cloud-sdk/bin

COPY scripts/bashrc /etc/bash.bashrc

# Install Miniconda and prep the system to activate our custom environment.
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -O ~/miniconda.sh && \
/bin/bash ~/miniconda.sh -b -p /opt/conda && \
rm ~/miniconda.sh && \
/opt/conda/bin/conda clean -tipsy && \
ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/conda/etc/profile.d/conda.sh" >> /etc/bash.bashrc && \
echo "conda activate caliban" >> /etc/bash.bashrc

RUN yes | /opt/conda/bin/conda create --name caliban python=${PYTHON_VERSION} && /opt/conda/bin/conda clean --all

# This allows a user to install packages in the conda environment once it
# launches.
RUN chmod -R 757 /opt/conda/envs/caliban && mkdir /.cache && chmod -R 757 /.cache
RUN wget -nv https://repo.anaconda.com/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-${ARCH}.sh -O ~/miniconda.sh && \
/bin/bash ~/miniconda.sh -u -b -p /opt/conda && \
rm ~/miniconda.sh && \
/opt/conda/bin/conda clean -afy && \
ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/conda/etc/profile.d/conda.sh" >> /etc/bash.bashrc && \
echo "conda activate caliban" >> /etc/bash.bashrc

RUN yes | /opt/conda/bin/conda create \
--name caliban python=${PYTHON_VERSION} \
&& /opt/conda/bin/conda clean --all

## This allows a user to:
# - read the system-wide bashrc file
# - install packages into the conda environment once it launches.
RUN chmod -R 644 /etc/bash.bashrc \
&& chmod -R 757 /opt/conda/envs/caliban \
&& mkdir /.cache \
&& chmod -R 757 /.cache

# This is equivalent to activating the env.
ENV PATH /opt/conda/envs/caliban/bin:$PATH
@@ -98,3 +92,6 @@ ENV PATH /opt/conda/envs/caliban/bin:$PATH
# as a virtual environment, so it installs editables properly
# See https://github.com/conda/conda/issues/5861 for details
ENV PIP_SRC /opt/conda/envs/caliban/pipsrc

# introduced in pip 22.1, silences the warning since we are NOT using root.
ENV PIP_ROOT_USER_ACTION ignore
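
The new Dockerfile builds its download URLs from build arguments (`GCLOUD_ARCHIVE`, `MINICONDA_VERSION`, `ARCH`). As a quick way to see what those `wget` calls resolve to with the default values, the sketch below performs the same string substitution in Python. The function names are hypothetical, and whether these URLs are still being served is not checked here.

```python
def miniconda_url(version: str = "py310_23.5.2-0", arch: str = "x86_64") -> str:
    """Mirror the Miniconda installer URL pattern used in the Dockerfile."""
    return f"https://repo.anaconda.com/miniconda/Miniconda3-{version}-Linux-{arch}.sh"


def gcloud_url(archive: str = "google-cloud-cli-441.0.0-linux-x86_64.tar.gz") -> str:
    """Mirror the Cloud SDK archive URL pattern used in the Dockerfile."""
    return f"https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/{archive}"


if __name__ == "__main__":
    print(miniconda_url())
    print(gcloud_url())
```

Overriding an argument at build time (for example a different `MINICONDA_VERSION`) changes only the substituted portion of the URL, which is why the defaults are declared as `ARG` values near the top of the file.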