8 changes: 8 additions & 0 deletions .devcontainer/devcontainer.json
@@ -37,6 +37,14 @@
"-p", "6006:6006"
],

"containerEnv": {
// We always want to manage CUDA_VISIBLE_DEVICES ourselves.
"RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES": "0",

// We set CUDA_VISIBLE_DEVICES here, as each container will need to set visible GPUs independently.
"CUDA_VISIBLE_DEVICES": "0,1"
Owner commented:
is it possible to set this dynamically? what if someone has like, e.g., 10 GPUs, or only 1 GPU?

@jacksonjacobs1 (Collaborator, Author) commented on Sep 2, 2025:

The hardcoded values were set for simplicity. This PR is not yet ready for review; I still need to test whether QA works with a multi-node, multi-GPU cluster.

That said, there are ways to set CUDA_VISIBLE_DEVICES dynamically:

1. Run an `export` command after container setup (can be added to the Dockerfile, `devcontainer.json`, or docker compose file):

   ```bash
   export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=uuid --format=csv,noheader | paste -sd "," -)
   ```
2. Use nvidia-container-toolkit's API. In `devcontainer.json`:

   ```json
   {
     "name": "My GPU Dev Container",
     "runArgs": ["--gpus=all"],
     "workspaceFolder": "/workspace"
   }
   ```

   In a docker compose YAML:

   ```yaml
   services:
     app:
       image: your-image
       runtime: nvidia
       environment:
         - NVIDIA_VISIBLE_DEVICES=all
   ```

But it's unclear to me whether option 2 avoids the bug you noticed with Ray requiring `CUDA_VISIBLE_DEVICES` to be explicitly set:
ray-project/ray#49985 (comment)

If not, we could run the following command within the container to ensure `CUDA_VISIBLE_DEVICES` is set:

```bash
export CUDA_VISIBLE_DEVICES=$NVIDIA_VISIBLE_DEVICES
```
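The same fallback could also be applied at process startup in Python rather than in the shell. A minimal sketch (the helper names here are hypothetical, not part of QA): it mirrors `NVIDIA_VISIBLE_DEVICES` when present, and otherwise queries `nvidia-smi` for GPU UUIDs, leaving `CUDA_VISIBLE_DEVICES` empty on GPU-less machines.

```python
import os
import subprocess


def visible_gpu_ids() -> str:
    """Return a comma-separated list of GPU UUIDs reported by nvidia-smi,
    or an empty string when nvidia-smi is missing or fails."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=uuid", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return ""
    return ",".join(line.strip() for line in out.splitlines() if line.strip())


def ensure_cuda_visible_devices() -> None:
    """Set CUDA_VISIBLE_DEVICES only if unset: prefer NVIDIA_VISIBLE_DEVICES,
    else fall back to querying nvidia-smi directly."""
    if "CUDA_VISIBLE_DEVICES" not in os.environ:
        os.environ["CUDA_VISIBLE_DEVICES"] = (
            os.environ.get("NVIDIA_VISIBLE_DEVICES") or visible_gpu_ids()
        )
```

This would run once before `ray.init()`, so every container derives its own device list instead of relying on a hardcoded value.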

Owner commented:
coo coo coo

},

"postCreateCommand": "ln -sf /opt/QuickAnnotator/quickannotator/client/package.json /opt/package.json && ln -sf /opt/QuickAnnotator/quickannotator/client/package-lock.json /opt/package-lock.json && uv pip install -e ."

}
15 changes: 10 additions & 5 deletions README.md
@@ -54,21 +54,26 @@ By default, QuickAnnotator uses a SQLite database. If you would like to use a po
git checkout v2.0
```

2. Modify `devcontainer.json` to suit your use case. In particular, change the value of `CUDA_VISIBLE_DEVICES` to your desired GPU IDs.

Owner commented:

i see - are folks likely to read the readme in detail though? or perhaps we should have some explicit messages appear on the screen/log during bootup to draw their attention to these components?
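The boot-time notice suggested here could be a one-line startup banner. A minimal sketch, assuming nothing about QA's actual logging setup (the function name is hypothetical):

```python
import logging
import os


def log_gpu_banner(logger: logging.Logger) -> str:
    """Build and log a prominent startup message showing which GPUs
    QuickAnnotator will use, so the setting is hard to miss."""
    devices = os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>")
    msg = (f"QuickAnnotator will use CUDA_VISIBLE_DEVICES={devices}; "
           "edit devcontainer.json to change this.")
    logger.warning(msg)  # warning level so it stands out in default logs
    return msg
```

Calling this during server startup would surface the GPU configuration without depending on users reading the README in detail.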

2. Within VS Code, open the cloned repository and click on the "Reopen in Container" button to build the devcontainer. This will create a docker container with all the necessary dependencies to run QuickAnnotator.
![image](https://github.com/user-attachments/assets/b776577f-a4c2-4eb8-858c-c603ac20cc6d)


### Usage
1. Connect to a Ray cluster. Ray is used to run operations which require asynchronous processing. There are three ways to connect to a Ray cluster:
- **Default**: By default QA will initialize a local Ray cluster within the docker container.
- Note: The default ray cluster does not host the Ray dashboard.
Once the devcontainer is built, run the following commands within the container terminal to use QuickAnnotator:

1. Connect to a Ray cluster. Ray is used to run operations which require asynchronous processing. There are two ways to connect to a Ray cluster:
- **Manual local cluster**: Run the following command to start a Ray cluster with the Ray dashboard:
```bash
ray start --head --dashboard-host 0.0.0.0
```
- **Pre-existing cluster**: If you would like QA to connect to an existing Ray cluster, use the `--cluster_address` argument.
- **Pre-existing cluster**: To add the container to an existing cluster, use the `--address` argument.
```bash
ray start --address <cluster_address>
```

2. Once the devcontainer is built, you can run the following command to start the QuickAnnotator server:
2. Run the following command to start the QuickAnnotator server:
```
(venv) root@e4392ecdd8ef:/opt/QuickAnnotator# quickannotator
* Serving Flask app '__main__'
6 changes: 4 additions & 2 deletions quickannotator/dl/ray_jackson.py
@@ -1,4 +1,5 @@
import logging
import os
from quickannotator.db.logging import LoggingManager
import ray
from ray.train import ScalingConfig
@@ -49,12 +50,13 @@ def start_dlproc(self, allow_pred=True):
self.setProcRunningSince()

total_gpus = ray.cluster_resources().get("GPU", 0)
self.logger.info(f"Total GPUs available: {total_gpus}")
self.logger.info(f"{os.environ['RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES']=}")
self.logger.info(f"{os.environ['CUDA_VISIBLE_DEVICES']=}")
scaling_config = ray.train.ScalingConfig(
num_workers=int(total_gpus),
use_gpu=True,
resources_per_worker={"GPU": .01},
placement_strategy="STRICT_SPREAD"
# placement_strategy="STRICT_SPREAD" #TODO: remove
)

trainer = ray.train.torch.TorchTrainer(
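The `ScalingConfig` in this hunk assumes at least one GPU. One way to address the reviewer's question about heterogeneous setups (10 GPUs, or only 1, or none) is to derive the scaling arguments from the cluster's reported GPU count. A sketch with a hypothetical pure helper, not code from this PR:

```python
def worker_scaling(total_gpus: float) -> dict:
    """Map the Ray cluster's reported GPU count to Ray Train ScalingConfig
    keyword arguments, falling back to one CPU worker when no GPU is seen."""
    n = int(total_gpus)
    if n == 0:
        return {"num_workers": 1, "use_gpu": False}
    # Fractional requests let several logical workers share one physical GPU,
    # matching the resources_per_worker={"GPU": .01} pattern above.
    return {"num_workers": n, "use_gpu": True,
            "resources_per_worker": {"GPU": 0.01}}
```

In `start_dlproc` this could then be used as `ray.train.ScalingConfig(**worker_scaling(total_gpus))`.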