Replace Launcher Script with Justfile & Standalone Scripts + Instructions for External (#239)

Full support for `just` is slated for the next release. While this PR adds the `justfile`, it also adds stand-alone scripts under `internal/scripts/` that external users can use for the development lifecycle. Instructions for these have been added to the README.md. Documentation for the `justfile` has been moved to `internal/README_justfile.md`.

--

Replaces the `launch.sh` script with a new `justfile` with commands that support the development lifecycle. New commands ('recipes' in `just`) allow for building the development or release image, checking the setup (checking installed programs & their versions, creating the `.env` file or reading it, etc.), running interactive programs on either image, or running `pytest` in the release image. These new recipes are:
```
    build-dev              # Builds the development image.
    build-release          # Builds the release image.
    run-dev cmd='bash'     # Runs an interactive program in the development bionemo image.
    run-release cmd='bash' # Runs an interactive program in the release bionemo image.
    setup                  # Checks for installed programs (docker, git, etc.), their versions, and grabs the latest cache image.
    test                   # Executes pytest in the release image.
```

The image building commands are executed as either `just build-dev` or `just build-release` while their corresponding container-running commands are `just run-{dev,release} (cmd)`, where `cmd` is the path to an executable program in the image (it defaults to `bash`, like `launch.sh` did before). The `just test` recipe runs `pytest` on all of the bionemo code & produces an HTML code coverage report, which is written to `htmlcov/` on the host machine (this directory is volume-mounted). This test command also runs the notebook tests. Note that `just setup` is a dependency of all of these commands, meaning the setup will _always_ be performed.
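
The default-argument behaviour of `run-dev` and `run-release` can be pictured with a plain shell function. This is only a sketch: `run_in_image` is a hypothetical helper that does not exist in the repository, the image name is a placeholder, and the function prints the command instead of calling Docker.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): mimics a recipe parameter
# with a default value, as in `run-dev cmd='bash'`.
run_in_image() {
  local image="$1"
  # Fall back to `bash` when no program is given, like the recipes' `cmd` parameter.
  local cmd="${2:-bash}"
  # Dry run: print the command rather than executing it.
  echo "docker run --rm -it ${image} ${cmd}"
}

run_in_image "example/bionemo:dev"           # no program given, so `bash` is used
run_in_image "example/bionemo:dev" "python"  # the second argument overrides the default
```

With `just`, a recipe parameter is overridden positionally, so the equivalent call would look like `just run-dev python`, assuming that program is available in the image.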

To support building a development image, the `Dockerfile` has been modified to have a new `development` target. This builds off of `dev`, but keeps the `3rdparty/` and `sub-packages/` code in `/workspace/bionemo2`. Additionally, it installs all of the local code (`bionemo-*` and `3rdparty/*`) as **editable** installs. This enables development work as the code can be mounted and modified _outside_ of the `dist-packages` folder, which is owned by root. 
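
The editable-install step can be sketched as a loop over the local package directories. The function below is an illustration only: `print_editable_installs` is a hypothetical name, and it prints the `uv pip install` commands rather than running them, so it can be tried outside of the image.

```bash
#!/usr/bin/env bash
# Sketch: list the editable-install command for every local package under a repo root.
# `print_editable_installs` is a hypothetical helper; it prints instead of installing.
print_editable_installs() {
  local root="$1"
  local sub
  for sub in "${root}"/3rdparty/* "${root}"/sub-packages/bionemo-*; do
    # Skip patterns that matched nothing (the glob stays unexpanded in that case).
    [ -d "${sub}" ] || continue
    echo "uv pip install --no-deps --no-build-isolation --editable ${sub}"
  done
}
```

Running `print_editable_installs .` from the repository root would list one command per directory under `3rdparty/` and per `bionemo-*` sub-package.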

Note that `just build-dev` builds this new `development` target while `just build-release` builds the `release` target, which is equivalent to building the entire image as `release` is the final stage.
malcolmgreaves authored Oct 5, 2024
1 parent d16c495 commit 1d7602e
Showing 9 changed files with 475 additions and 302 deletions.
3 changes: 2 additions & 1 deletion CODEOWNERS
@@ -59,7 +59,8 @@ docs/CONTRIBUTING.md @dorotat-nv @jstjohn @malcolmgreaves @jomitchellnv @pstjohn
.devcontainer @dorotat-nv @malcolmgreaves @pstjohn
CODEOWNERS @dorotat-nv @jomitchellnv @jstjohn @malcolmgreaves @pstjohn @trvachov
Dockerfile @dorotat-nv @jomitchellnv @jstjohn @malcolmgreaves @ohadmo @pstjohn @trvachov
launch.sh @dorotat-nv @jomitchellnv @jstjohn @malcolmgreaves @ohadmo @pstjohn @trvachov
justfile @dorotat-nv @jomitchellnv @jstjohn @malcolmgreaves @ohadmo @pstjohn @trvachov
internal/ @dorotat-nv @jomitchellnv @jstjohn @malcolmgreaves @ohadmo @pstjohn @trvachov
3rdparty @jomitchellnv @jstjohn @malcolmgreaves @ohadmo @pstjohn @trvachov
pyproject.toml @jomitchellnv @jstjohn @malcolmgreaves @ohadmo @pstjohn @trvachov
requirements-cve.txt @dorotat-nv @jomitchellnv @jstjohn @malcolmgreaves @ohadmo @pstjohn @trvachov
41 changes: 31 additions & 10 deletions Dockerfile
@@ -95,7 +95,7 @@ COPY ./sub-packages /workspace/bionemo2/sub-packages
RUN --mount=type=bind,source=./.git,target=./.git \
--mount=type=bind,source=./requirements-test.txt,target=/requirements-test.txt \
--mount=type=bind,source=./requirements-cve.txt,target=/requirements-cve.txt \
<<EOT
<<EOF
set -eo pipefail
uv pip install --no-build-isolation \
./3rdparty/* \
@@ -104,7 +104,7 @@ uv pip install --no-build-isolation \
-r /requirements-test.txt
rm -rf ./3rdparty
rm -rf /tmp/*
EOT
EOF

# In the devcontainer image, we just copy over the finished `dist-packages` folder from the build image back into the
# base pytorch container. We can then set up a non-root user and uninstall the bionemo and 3rd-party packages, so that
@@ -114,13 +114,13 @@ FROM ${BASE_IMAGE} AS dev

RUN --mount=type=cache,id=apt-cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,id=apt-lib,target=/var/lib/apt,sharing=locked \
<<EOT
<<EOF
set -eo pipefail
apt-get update -qy
apt-get install -qyy \
sudo
rm -rf /tmp/* /var/tmp/*
EOT
EOF

# Create a non-root user to use inside a devcontainer.
ARG USERNAME=bionemo
@@ -133,13 +133,13 @@ RUN groupadd --gid $USER_GID $USERNAME \

# Here we delete the dist-packages directory from the pytorch base image, and copy over the dist-packages directory from
# the build image. This ensures we have all the necessary dependencies installed (megatron, nemo, etc.).
RUN <<EOT
RUN <<EOF
set -eo pipefail
rm -rf /usr/local/lib/python3.10/dist-packages
mkdir -p /usr/local/lib/python3.10/dist-packages
chmod 777 /usr/local/lib/python3.10/dist-packages
chmod 777 /usr/local/bin
EOT
EOF

USER $USERNAME

@@ -153,28 +153,49 @@ ENV UV_LINK_MODE=copy \
UV_SYSTEM_PYTHON=true

RUN --mount=type=bind,source=./requirements-dev.txt,target=/workspace/bionemo2/requirements-dev.txt \
--mount=type=cache,id=uv-cache,target=/root/.cache,sharing=locked <<EOT
--mount=type=cache,id=uv-cache,target=/root/.cache,sharing=locked <<EOF
set -eo pipefail
uv pip install -r /workspace/bionemo2/requirements-dev.txt
rm -rf /tmp/*
EOT
EOF

RUN <<EOT
RUN <<EOF
set -eo pipefail
rm -rf /usr/local/lib/python3.10/dist-packages/bionemo*
pip uninstall -y nemo_toolkit megatron_core
EOT
EOF

FROM dev AS development

WORKDIR /workspace/bionemo2
COPY --from=bionemo2-base /workspace/bionemo2/ .
# because of the `rm -rf ./3rdparty` in bionemo2-base
COPY ./3rdparty ./3rdparty
USER root
RUN <<EOF
set -eo pipefail
find . -name __pycache__ -type d -print | xargs rm -rf
for sub in ./3rdparty/* ./sub-packages/bionemo-*; do
uv pip install --no-deps --no-build-isolation --editable $sub
done
EOF
ARG USERNAME=bionemo
USER $USERNAME

# The 'release' target needs to be last so that it's the default build target. In the future, we could consider a setup
# similar to the devcontainer above, where we copy the dist-packages folder from the build image into the release image.
# This would reduce the overall image size by reducing the number of intermediate layers. In the meantime, we match the
# existing release image build by copying over remaining files from the repo into the container.
FROM bionemo2-base AS release

RUN mkdir -p /workspace/bionemo2/.cache/

COPY VERSION .
COPY ./scripts ./scripts
COPY ./README.md ./

# Copy over folders so that the image can run tests in a self-contained fashion.
COPY ./ci/scripts ./ci/scripts
COPY ./docs ./docs

RUN chmod 777 -R /workspace/bionemo2/
75 changes: 45 additions & 30 deletions README.md
@@ -1,30 +1,7 @@
# BioNeMo2 Repo
To get started, please build the docker container using
```bash
./launch.sh build
```

Launch a container from the build image by executing
```bash
./launch.sh dev
```

All `bionemo2` code is partitioned into independently installable namespace packages. These live under the `sub-packages/` directory.


## Downloading artifacts
Set the AWS access info in your `.env` in the host container prior to running docker:

```bash
AWS_ACCESS_KEY_ID="team-bionemo"
AWS_SECRET_ACCESS_KEY=$(grep aws_secret_access_key ~/.aws/config | cut -d' ' -f 3)
AWS_REGION="us-east-1"
AWS_ENDPOINT_URL="https://pbss.s8k.io"
```
then, running tests should download the test data to a cache location when first invoked.

For more information on adding new test artifacts, see the documentation in [bionemo.testing.data.load](sub-packages/bionemo-testing/src/bionemo/testing/data/README.md)

All `bionemo2` code is partitioned into independently installable namespace packages.
These live under the `sub-packages/` directory.

## Initializing 3rd-party dependencies as git submodules

@@ -44,18 +21,56 @@ To download the pinned versions of these submodules within an existing git repos
git submodule update --init --recursive
```

Different branches of the repo can have different pinned versions of these third-party submodules. To update submodules
after switching branches (or pulling recent changes), run
Different branches of the repo can have different pinned versions of these third-party submodules. Make sure you
update submodules after switching branches or pulling recent changes!

To configure git to automatically update submodules when switching branches, run
```bash
git submodule update
git config submodule.recurse true
```
**NOTE**: this setting will not download **new** or remove **old** submodules with the branch's changes.
You will have to run the full `git submodule update --init --recursive` command in these situations.

To configure git to automatically update submodules when switching branches, run
## First Time Setup
After cloning the repository, you need to run the setup script **first**:
```bash
./internal/scripts/setup_env_file.sh
```
This will return an exit code of 1 on a first time run.

## Release Image Building
To build the release image, run the following script:
```bash
git config submodule.recurse true
DOCKER_BUILDKIT=1 ./ci/scripts/build_docker_image.sh \
-regular-docker-builder \
-image-name "nvcr.io/nvidian/cvai_bnmo_trng/bionemo:bionemo2-$(git rev-parse HEAD)"
```

## Development Image Building
To build the development image, run the following script:
```bash
./internal/scripts/build_dev_image.sh
```

## Interactive Shell in Development Image
After building the development image, you can start a container from it and open a bash shell in it by executing:
```bash
./internal/scripts/run_dev.sh
```

## Downloading artifacts
Set the AWS access info in environment prior to running the dev-container launch script:
```bash
AWS_ACCESS_KEY_ID="team-bionemo"
AWS_SECRET_ACCESS_KEY=$(grep aws_secret_access_key ~/.aws/config | cut -d' ' -f 3)
AWS_REGION="us-east-1"
AWS_ENDPOINT_URL="https://pbss.s8k.io"
```
then, running tests should download the test data to a cache location when first invoked.

For more information on adding new test artifacts, see the documentation in
[`bionemo.testing.data.load`](sub-packages/bionemo-testing/src/bionemo/testing/data/README.md).


### Updating pinned versions of NeMo / Megatron-LM

24 changes: 24 additions & 0 deletions internal/README_justfile.md
@@ -0,0 +1,24 @@
To get started, first download [`just`](https://github.com/casey/just). You can use [Homebrew](https://brew.sh/) on OS X & Linux:
```bash
brew install just
```

**Once you have `just`, you need to run the `just setup` command once _before_ you can run any other command.**
Thus, if it's your first time, you will need to do this first:
```bash
just setup
just <command you want to run>
```

You can see all of the commands for the development cycle by running `just`. These commands are executable as
`just X` for each command `X` listed:
```
build-dev # Builds the development image.
build-release # Builds the release image.
run-dev cmd='bash' # Runs an interactive program in the development bionemo image.
run-release cmd='bash' # Runs an interactive program in the release bionemo image.
setup # Checks for installed programs (docker, git, etc.), their versions, and grabs the latest cache image.
test # Executes pytest in the release image.
```

You can combine `just` commands together. For example, run `just build-dev build-release` to build both images.
17 changes: 17 additions & 0 deletions internal/scripts/build_dev_image.sh
@@ -0,0 +1,17 @@
#!/usr/bin/env bash

set -euo pipefail

COMMIT=$(git rev-parse HEAD)
DATE=$(date --iso-8601=seconds -u)

set -x
DOCKER_BUILDKIT=1 docker buildx build \
-t "nvcr.io/nvidian/cvai_bnmo_trng/bionemo:dev-bionemo2-${COMMIT}" \
--target="development" \
--load \
--cache-to type=inline \
--label com.nvidia.bionemo.git_sha=${COMMIT} \
--label com.nvidia.bionemo.created_at=${DATE} \
-f ./Dockerfile \
.
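
A note on the `DATE` line in `build_dev_image.sh`: `--iso-8601=seconds` is a GNU coreutils option and may not be available in other `date` implementations, such as the BSD one shipped with macOS. The following is a minimal sketch of a more portable alternative, assuming a UTC timestamp of the form `YYYY-MM-DDTHH:MM:SSZ` is acceptable for the image label. `utc_timestamp` is a hypothetical helper name, not something the script defines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: UTC timestamp built from strftime-style format specifiers,
# which are supported by both GNU and BSD `date`.
utc_timestamp() {
  date -u +"%Y-%m-%dT%H:%M:%SZ"
}

DATE="$(utc_timestamp)"
echo "com.nvidia.bionemo.created_at=${DATE}"
```

The GNU form ends the timestamp with `+00:00` where this sketch writes `Z`; both denote UTC.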
61 changes: 61 additions & 0 deletions internal/scripts/run_dev.sh
@@ -0,0 +1,61 @@
#!/usr/bin/env bash

set -euo pipefail

source .env

LOCAL_REPO_PATH="$(realpath $(pwd))"

if [[ "$(basename ${LOCAL_REPO_PATH})" != "bionemo-fw-ea" ]]; then
echo "ERROR: must run this script from the bionemo repository root!"
exit 1
fi

COMMIT=$(git rev-parse HEAD)

DOCKER_REPO_PATH="/workspace/bionemo2"

DOCKER_VERSION=$(docker version | grep -i version | head -1 | awk '{print $2}')
DOCKER_VERSION_WITH_GPU_SUPPORT='19.03.0'
if [ "$DOCKER_VERSION_WITH_GPU_SUPPORT" == "$(echo -e "$DOCKER_VERSION\n$DOCKER_VERSION_WITH_GPU_SUPPORT" | sort -V | head -1)" ]; then
PARAM_RUNTIME="--gpus all"
else
PARAM_RUNTIME="--runtime=nvidia"
fi

echo "docker run ... nvcr.io/nvidian/cvai_bnmo_trng/bionemo:dev-bionemo2-${COMMIT} bash"
echo '---------------------------------------------------------------------------------------------'
# DO NOT set -x: we **DO NOT** want to leak credentials to STDOUT! (API_KEY)
docker run \
--rm \
-it \
--network host \
${PARAM_RUNTIME} \
-p ${JUPYTER_PORT}:8888 \
--shm-size=4g \
-e TMPDIR=/tmp/ \
-e NUMBA_CACHE_DIR=/tmp/ \
-e BIONEMO_HOME=$DOCKER_REPO_PATH \
-e WANDB_API_KEY=$WANDB_API_KEY \
-e NGC_CLI_API_KEY=$NGC_CLI_API_KEY \
-e NGC_CLI_ORG=$NGC_CLI_ORG \
-e NGC_CLI_TEAM=$NGC_CLI_TEAM \
-e NGC_CLI_FORMAT_TYPE=$NGC_CLI_FORMAT_TYPE \
-e AWS_ENDPOINT_URL \
-e AWS_REGION \
-e AWS_SECRET_ACCESS_KEY \
-e AWS_ACCESS_KEY_ID \
-e HOME=${DOCKER_REPO_PATH} \
-w ${DOCKER_REPO_PATH} \
-v ${LOCAL_RESULTS_PATH}:${DOCKER_RESULTS_PATH} \
-v ${LOCAL_DATA_PATH}:${DOCKER_DATA_PATH} \
-v ${LOCAL_MODELS_PATH}:${DOCKER_MODELS_PATH} \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-v /etc/shadow:/etc/shadow:ro \
-v ${HOME}/.ssh:${DOCKER_REPO_PATH}/.ssh:ro \
-v ${LOCAL_REPO_PATH}/htmlcov:/${DOCKER_REPO_PATH}/htmlcov \
-u $(id -u):$(id -g) \
-v ${LOCAL_REPO_PATH}:${DOCKER_REPO_PATH} \
"nvcr.io/nvidian/cvai_bnmo_trng/bionemo:dev-bionemo2-${COMMIT}" \
bash
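
The Docker version check in `run_dev.sh` relies on `sort -V` to decide between `--gpus all` and `--runtime=nvidia`. The same comparison can be isolated into small functions, which makes it easier to try against sample version strings. This is a sketch, not the script's code: `version_ge` and `gpu_flag` are hypothetical names, and `sort -V` is assumed to be available (it is in GNU coreutils).

```bash
#!/usr/bin/env bash
# Succeeds (exit status 0) when version $1 is greater than or equal to version $2.
version_ge() {
  local have="$1" need="$2"
  # `sort -V` puts the smaller version first; if that is $need, then $have >= $need.
  [ "$(printf '%s\n%s\n' "$have" "$need" | sort -V | head -n 1)" = "$need" ]
}

# Prints the GPU-related `docker run` flag for a given Docker version string.
gpu_flag() {
  if version_ge "$1" "19.03.0"; then
    echo "--gpus all"
  else
    echo "--runtime=nvidia"
  fi
}

gpu_flag "18.09.9"   # older than 19.03.0, so the runtime flag is chosen
gpu_flag "19.03.0"   # meets the threshold, so `--gpus all` is chosen
```

Keeping the comparison in a function also means the threshold appears in exactly one place.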