diff --git a/auto3dseg/README.md b/auto3dseg/README.md
index e13996b07d..64c68dac62 100644
--- a/auto3dseg/README.md
+++ b/auto3dseg/README.md
@@ -56,13 +56,44 @@ We provide [a two-minute example](notebooks/auto3dseg_hello_world.ipynb) for use
 
 To further demonstrate the capabilities of **Auto3DSeg**, [here](./tasks/instance22/README.md) is the detailed performance of the algorithm in **Auto3DSeg**, which won 2nd place in the MICCAI 2022 challenge **[INSTANCE22: The 2022 Intracranial Hemorrhage Segmentation Challenge on Non-Contrast Head CT (NCCT)](https://instance.grand-challenge.org/)**
 
+## Running With Your Own Data
+
+To run Auto3DSeg on your own dataset, you need to build a `datalist.json` file and pass it to the AutoRunner.
+
+The datalist format is based on the datasets released by the [Medical Segmentation Decathlon](http://medicaldecathlon.com).
+See the function `load_decathlon_datalist` in `monai/data/decathlon_datalist.py` for a description of the format.
+
+For the AutoRunner, we only need the `training` list in the JSON; it does not use any other fields.
+The `fold` key for each image is not required, as the AutoRunner will automatically create cross-validation folds (the number of folds is hard-coded to 5).
+If you do add the cross-validation folds beforehand, the AutoRunner will use these by default.
+You can also choose to include a `validation` list in the JSON file, in which case the AutoRunner will disable cross-validation and use the specified validation set.
+Any other metadata, such as `modality`, `numTraining`, `name`, etc. will not be used by the AutoRunner, but we do recommend using metadata fields to keep track of names and versions of your dataset. If you are using multi-modal scans, it is possible to enter lists of image paths for both the `image` and `label` keys; MONAI will stack them into channels.
+In short, your `datalist.json` file should look like this:
+
+```
+{
+    "name": "Example datalist.json",
+    "training":
+    [
+        {"image": "/path/to/image_1.nii.gz", "label": "/path/to/label_1.nii.gz"},
+        {"image": "/path/to/image_2.nii.gz", "label": "/path/to/label_2.nii.gz"},
+        ...
+    ]
+}
+
+```
+
+The AutoRunner will create a `work_dir` folder in the directory from which it is run, which will contain the resulting models and the copied datalist file _with_ cross-validation folds. This allows you to keep track of which datalist file the models are trained on.
+
+See the description below or the file [run_with_minimal_input.md](docs/run_with_minimal_input.md) to use your datalist with the AutoRunner.
+
 ## Reference Python APIs for Auto3DSeg
 
 **Auto3DSeg** offers users different levels of APIs to run pipelines that suit their needs.
 
 ### 1. Run with Minimal Input using ```AutoRunner```
 
-The user needs to provide a data list (".json" file) for the new task and data root. A typical data list is as this [example](tasks/msd/Task05_Prostate/msd_task05_prostate_folds.json). A sample datalist for an existing MSD formatted dataset can be created using [this notebook](notebooks/msd_datalist_generator.ipynb). After creating the data list, the user can create a simple "task.yaml" file (shown below) as the minimum input for **Auto3DSeg**.
+The user needs to provide a data list (".json" file) for the new task and data root. A typical data list looks like this [example](tasks/msd/Task05_Prostate/msd_task05_prostate_folds.json). [This notebook](notebooks/msd_crossval_datalist_generator.ipynb) shows how to create a datalist with cross-validation folds from an existing MSD dataset. After creating the data list, the user can create a simple "task.yaml" file (shown below) as the minimum input for **Auto3DSeg**.
 
 ```
 modality: CT
diff --git a/auto3dseg/docs/run_with_minimal_input.md b/auto3dseg/docs/run_with_minimal_input.md
index 0ec8d82872..8a43e3fd19 100644
--- a/auto3dseg/docs/run_with_minimal_input.md
+++ b/auto3dseg/docs/run_with_minimal_input.md
@@ -18,55 +18,32 @@ if os.path.exists(root):
     download_and_extract(resource, compressed_file, root)
 ```
 
-**Step 1.** Provide the following data list (a ".json" file) for a new task and the data root. The typical data list is shown as follows.
+**Step 1.** Provide a `datalist.json` file.
+See the documentation under the `load_decathlon_datalist` function in `monai.data.decathlon_datalist` for details on the file format.
+For the AutoRunner, you only need the `training` field with its list of training files:
 
 ```
 {
-    "training": [
-        {
-            "fold": 0,
-            "image": "image_001.nii.gz",
-            "label": "label_001.nii.gz"
-        },
-        {
-            "fold": 0,
-            "image": "image_002.nii.gz",
-            "label": "label_002.nii.gz"
-        },
-        {
-            "fold": 1,
-            "image": "image_003.nii.gz",
-            "label": "label_001.nii.gz"
-        },
-        {
-            "fold": 2,
-            "image": "image_004.nii.gz",
-            "label": "label_002.nii.gz"
-        },
-        {
-            "fold": 3,
-            "image": "image_005.nii.gz",
-            "label": "label_003.nii.gz"
-        },
-        {
-            "fold": 4,
-            "image": "image_006.nii.gz",
-            "label": "label_004.nii.gz"
-        }
-    ],
-    "testing": [
-        {
-            "image": "image_010.nii.gz"
-        }
-    ]
+    "training":
+    [
+        {"image": "/path/to/image_1.nii.gz", "label": "/path/to/label_1.nii.gz"},
+        {"image": "/path/to/image_2.nii.gz", "label": "/path/to/label_2.nii.gz"},
+        ...
+    ]
 }
+
 ```
+In each training item, you can add a `fold` field (with an integer starting at 0) to pre-specify the cross-validation folds; otherwise the AutoRunner will generate its own folds (always 5). All trained algorithms will use the same generated or pre-specified folds; the datalist file containing them can be found in the `work_dir` folder that the AutoRunner generates.
+If you have a validation set, you can include it under a `validation` key with the same format as the `training` list. This will disable cross-validation.
+It is recommended to add a `name` field and any other metadata fields that allow you to track which version of your dataset the models are trained on.
+
+Save the file to `./datalist.json`.
 
 **Step 2.** Prepare "task.yaml" with the necessary information as follows.
 
 ```
-modality: CT
-datalist: "./task.json"
+modality: CT # or MRI
+datalist: "./datalist.json"
 dataroot: "/workspace/data/task"
 ```
 
diff --git a/auto3dseg/notebooks/auto_runner.ipynb b/auto3dseg/notebooks/auto_runner.ipynb
index 0c1bc5ba48..2d3bc1b266 100644
--- a/auto3dseg/notebooks/auto_runner.ipynb
+++ b/auto3dseg/notebooks/auto_runner.ipynb
@@ -273,13 +273,9 @@
     "\n",
     "`set_training_params` in `AutoRunner` provides an interface to change all algorithms' training parameters in one line. \n",
     "\n",
-    "NOTE: \n",
-    "**Auto3DSeg** uses MONAI bundle templates to perform training, validation, and inference.\n",
-    "The number of epochs/iterations of training is specified by the config files in each template.\n",
-    "Users can override these these values in the bundle templates.\n",
-    "But users should consider that some bundle templates may use `num_iterations` and other may use `num_epochs` to iterate.\n",
+    "As an example, see the code block below, which specifies e.g. the number of epochs used for training. Note that some algorithms may treat this as a maximum number of epochs.\n",
     "\n",
-    "For demo purposes, below is a code block to convert num_epoch to iteration style and override all algorithms with the same training parameters.\n",
+    "NOTE: \n",
     "The setup works fine for a machine that has GPUs less than or equal to 8.\n",
     "The datalist in this example is only using a subset of the original dataset.\n",
     "Users need to ensure the number of GPUs is not greater than the number that the training dataset can be partitioned.\n",
diff --git a/auto3dseg/notebooks/msd_datalist_generator.ipynb b/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb
similarity index 97%
rename from auto3dseg/notebooks/msd_datalist_generator.ipynb
rename to auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb
index 3b9be29e8c..7a76409603 100644
--- a/auto3dseg/notebooks/msd_datalist_generator.ipynb
+++ b/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb
@@ -19,7 +19,15 @@
     "See the License for the specific language governing permissions and \n",
     "limitations under the License. \n",
     "\n",
-    "# Datalist Generator"
+    "# Datalist Cross-Validation Folds Generator"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This notebook contains an example of adding cross-validation folds to an existing Medical Segmentation Decathlon datalist, in this case the one for Task09_Spleen. \n",
+    "When running repeated experiments, it can be beneficial to create cross-validation folds beforehand."
    ]
   },
   {