
2015 improve explanation of datalist format #2019

Draft: wants to merge 15 commits into main
33 changes: 32 additions & 1 deletion auto3dseg/README.md
@@ -56,13 +56,44 @@ We provide [a two-minute example](notebooks/auto3dseg_hello_world.ipynb) for use

To further demonstrate the capabilities of **Auto3DSeg**, [here](./tasks/instance22/README.md) is the detailed performance of the algorithm in **Auto3DSeg**, which won 2nd place in the MICCAI 2022 challenge **[INSTANCE22: The 2022 Intracranial Hemorrhage Segmentation Challenge on Non-Contrast Head CT (NCCT)](https://instance.grand-challenge.org/)**

## Running With Your Own Data

To run Auto3DSeg on your own dataset, you need to build a `datalist.json` file and pass it to the AutoRunner.

The datalist format is based on the datasets released by the [Medical Segmentation Decathlon](http://medicaldecathlon.com).
See the function `load_decathlon_datalist` in `monai/data/decathlon_datalist.py` for a description of the format.

For the AutoRunner, only the `training` list in the JSON is required.
The `fold` key for each image is optional: if it is omitted, the AutoRunner automatically creates cross-validation folds (the number of folds is hard-coded to 5).
If you do assign cross-validation folds beforehand, the AutoRunner will use them by default.
You can also include a `validation` list in the JSON file, in which case the AutoRunner disables cross-validation and uses the specified validation set instead.
Any other metadata, such as `modality`, `numTraining`, or `name`, is not used by the AutoRunner, but we recommend using metadata fields to keep track of the names and versions of your datasets. If you are using multi-modal scans, you can enter lists of image paths for both the `image` and `label` keys; MONAI will stack them into channels.
In short, your `datalist.json` file should look like this:

```
{
    "name": "Example datalist.json",
    "training":
    [
        {"image": "/path/to/image_1.nii.gz", "label": "/path/to/label_1.nii.gz"},
        {"image": "/path/to/image_2.nii.gz", "label": "/path/to/label_2.nii.gz"},
        ...
    ]
}
```
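As a sketch, a datalist in this format can also be generated from Python using only the standard library. The helper names `build_datalist` and `write_datalist` are hypothetical (they are not part of MONAI), and the example assumes you already have matching (image, label) path pairs. A list of paths in either position is kept as a list, matching the multi-modal case described above, and no `fold` keys are written, so the AutoRunner would create its own folds.

```python
import json
from pathlib import Path


def _as_entry_value(paths):
    """Keep a single path as a string; copy a list/tuple of paths (multi-modal) into a list."""
    if isinstance(paths, (list, tuple)):
        return [str(p) for p in paths]
    return str(paths)


def build_datalist(pairs, name="Example datalist.json"):
    """Build a minimal datalist dict from (image, label) pairs (hypothetical helper).

    Only the `training` list is filled; no `fold` keys are added.
    """
    training = [
        {"image": _as_entry_value(image), "label": _as_entry_value(label)}
        for image, label in pairs
    ]
    return {"name": name, "training": training}


def write_datalist(datalist, path="datalist.json"):
    """Serialize the datalist as indented JSON (hypothetical helper)."""
    Path(path).write_text(json.dumps(datalist, indent=4))


if __name__ == "__main__":
    example = build_datalist([
        ("/path/to/image_1.nii.gz", "/path/to/label_1.nii.gz"),
        ("/path/to/image_2.nii.gz", "/path/to/label_2.nii.gz"),
    ])
    print(json.dumps(example, indent=4))
```

How the pairs are discovered (glob patterns, naming conventions) depends on your dataset layout and is deliberately left out.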

The AutoRunner will create a `work_dir` folder in the directory from which it is run. This folder will contain the resulting models and a copy of the datalist file _with_ cross-validation folds, so you can keep track of which datalist the models were trained on.

See the description below or the file [run_with_minimal_input.md](docs/run_with_minimal_input.md) to use your datalist with the AutoRunner.
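Before passing a datalist to the AutoRunner, it may help to sanity-check it against the rules above. The following is a minimal sketch; `check_datalist` is a hypothetical helper, not a MONAI function, and it checks only what this section describes (a non-empty `training` list, `image` and `label` on every item, an optional non-negative integer `fold`, and an optional `validation` list in the same item format). It does not check that the files exist.

```python
def check_datalist(datalist):
    """Return a list of problems in a datalist dict; an empty list means none were found.

    Hypothetical helper covering only the rules described in this README section.
    """
    training = datalist.get("training")
    if not isinstance(training, list) or not training:
        return ["missing or empty 'training' list"]

    sections = [("training", training)]
    if "validation" in datalist:
        sections.append(("validation", datalist["validation"]))

    problems = []
    for section, items in sections:
        if not isinstance(items, list):
            problems.append(f"'{section}' must be a list")
            continue
        for i, item in enumerate(items):
            if not isinstance(item, dict):
                problems.append(f"{section}[{i}] is not an object")
                continue
            for key in ("image", "label"):
                if key not in item:
                    problems.append(f"{section}[{i}] is missing '{key}'")
            fold = item.get("fold")
            # bool is a subclass of int in Python, so reject it explicitly.
            if fold is not None and (isinstance(fold, bool) or not isinstance(fold, int) or fold < 0):
                problems.append(f"{section}[{i}] has invalid fold {fold!r}")
    return problems
```

For example, `check_datalist(json.load(open("datalist.json")))` returns an empty list for a datalist shaped like the example above.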

## Reference Python APIs for Auto3DSeg

**Auto3DSeg** offers users different levels of APIs to run pipelines that suit their needs.

### 1. Run with Minimal Input using ```AutoRunner```

The user needs to provide a data list (".json" file) for the new task and data root. A typical data list is as this [example](tasks/msd/Task05_Prostate/msd_task05_prostate_folds.json). A sample datalist for an existing MSD formatted dataset can be created using [this notebook](notebooks/msd_datalist_generator.ipynb). After creating the data list, the user can create a simple "task.yaml" file (shown below) as the minimum input for **Auto3DSeg**.
The user needs to provide a data list (".json" file) for the new task and data root. A typical data list looks like this [example](tasks/msd/Task05_Prostate/msd_task05_prostate_folds.json). [This notebook](notebooks/msd_crossval_datalist_generator.ipynb) shows how to create a datalist with cross-validation folds from an existing MSD dataset. After creating the data list, the user can create a simple "task.yaml" file (shown below) as the minimum input for **Auto3DSeg**.

```
modality: CT
57 changes: 17 additions & 40 deletions auto3dseg/docs/run_with_minimal_input.md
@@ -18,55 +18,32 @@ if os.path.exists(root):
download_and_extract(resource, compressed_file, root)
```

**Step 1.** Provide the following data list (a ".json" file) for a new task and the data root. The typical data list is shown as follows.
**Step 1.** Provide a `datalist.json` file.
See the documentation under the `load_decathlon_datalist` function in `monai.data.decathlon_datalist` for details on the file format.

For the AutoRunner, you only need the `training` field with its list of training files:
```
{
"training": [
{
"fold": 0,
"image": "image_001.nii.gz",
"label": "label_001.nii.gz"
},
{
"fold": 0,
"image": "image_002.nii.gz",
"label": "label_002.nii.gz"
},
{
"fold": 1,
"image": "image_003.nii.gz",
"label": "label_001.nii.gz"
},
{
"fold": 2,
"image": "image_004.nii.gz",
"label": "label_002.nii.gz"
},
{
"fold": 3,
"image": "image_005.nii.gz",
"label": "label_003.nii.gz"
},
{
"fold": 4,
"image": "image_006.nii.gz",
"label": "label_004.nii.gz"
}
],
"testing": [
{
"image": "image_010.nii.gz"
}
]
"training":
[
{"image": "/path/to/image_1.nii.gz", "label": "/path/to/label_1.nii.gz"},
{"image": "/path/to/image_2.nii.gz", "label": "/path/to/label_2.nii.gz"},
...
]
}

```
In each training item, you can add a `fold` field (an integer starting at 0) to pre-specify the cross-validation folds; otherwise, the AutoRunner will generate its own folds (always 5). All trained algorithms use the same generated or pre-specified folds, and the datalist containing them can be found in the `work_dir` folder that the AutoRunner creates.
If you have a validation set, you can include it under a `validation` key with the same format as the `training` list. This will disable cross-validation.
It is recommended to add a `name` field and any other metadata fields that allow you to track which version of your dataset the models are trained on.
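If you prefer to pre-specify folds, for example when adding them to an existing MSD datalist, the following is a minimal sketch of one way to do it. It is not necessarily how the AutoRunner generates its own folds: training items are shuffled with a fixed seed and then assigned to folds round-robin, so fold sizes differ by at most one. `add_folds` and its `seed` parameter are hypothetical names, not MONAI API.

```python
import random


def add_folds(datalist, num_folds=5, seed=0):
    """Return a copy of `datalist` with an integer `fold` (0..num_folds-1) on each training item.

    Hypothetical helper: a seeded shuffle followed by round-robin assignment.
    The input datalist is not modified.
    """
    if num_folds < 1:
        raise ValueError("num_folds must be at least 1")
    training = [dict(item) for item in datalist["training"]]
    order = list(range(len(training)))
    random.Random(seed).shuffle(order)
    for position, index in enumerate(order):
        training[index]["fold"] = position % num_folds
    result = dict(datalist)  # keep `name` and any other metadata fields
    result["training"] = training
    return result
```

Because the seed is fixed, rerunning the script yields the same folds, which is what makes pre-specified folds useful across repeated experiments.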

Save the file to `./datalist.json`.

**Step 2.** Prepare "task.yaml" with the necessary information as follows.

```
modality: CT
datalist: "./task.json"
modality: CT # or MRI
datalist: "./datalist.json"
dataroot: "/workspace/data/task"
```
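If you set up many experiments, the `task.yaml` above can also be generated from Python. The sketch below uses only the standard library; `task_yaml` is a hypothetical helper, and restricting `modality` to `CT` or `MRI` mirrors the comment in the example rather than a documented AutoRunner rule. Values are written with `json.dumps`, whose double-quoted strings YAML parsers read as plain strings; a YAML library would be the more robust choice for anything beyond these three scalar fields.

```python
import json

# Values taken from the example task.yaml; treating them as the only valid ones is an assumption.
ALLOWED_MODALITIES = ("CT", "MRI")


def task_yaml(modality, datalist, dataroot):
    """Render a minimal task.yaml as text (hypothetical helper, stdlib only)."""
    if modality not in ALLOWED_MODALITIES:
        raise ValueError(f"modality must be one of {ALLOWED_MODALITIES}, got {modality!r}")
    fields = {"modality": modality, "datalist": datalist, "dataroot": dataroot}
    # json.dumps quotes and escapes each string as a YAML double-quoted scalar.
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in fields.items())
```

The returned text can be saved with, for example, `Path("task.yaml").write_text(task_yaml("CT", "./datalist.json", "/workspace/data/task"))` and then passed to the AutoRunner.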

8 changes: 2 additions & 6 deletions auto3dseg/notebooks/auto_runner.ipynb
@@ -273,13 +273,9 @@
"\n",
"`set_training_params` in `AutoRunner` provides an interface to change all algorithms' training parameters in one line. \n",
"\n",
"NOTE: \n",
"**Auto3DSeg** uses MONAI bundle templates to perform training, validation, and inference.\n",
"The number of epochs/iterations of training is specified by the config files in each template.\n",
"Users can override these values in the bundle templates.\n",
"But users should consider that some bundle templates may use `num_iterations` and others may use `num_epochs` to iterate.\n",
"As an example, see the code block below, which specifies e.g. the number of epochs used for training. Note that some algorithms may treat this as a maximum number of epochs.\n",
"\n",
"For demo purposes, below is a code block to convert num_epoch to iteration style and override all algorithms with the same training parameters.\n",
"NOTE: \n",
"The setup works for machines with 8 or fewer GPUs.\n",
"The datalist in this example is only using a subset of the original dataset.\n",
"Users need to ensure that the number of GPUs does not exceed the number of partitions the training dataset can be split into.\n",
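The notebook text in this hunk refers to converting an epoch budget to iteration style and warns that the number of GPUs must not exceed the number of partitions of the training data. The arithmetic can be sketched as follows. This is an illustration under stated assumptions (one batch per GPU per iteration, training images split evenly across GPUs), not how any particular Auto3DSeg bundle template counts iterations, and `epochs_to_iterations` is a hypothetical name.

```python
import math


def epochs_to_iterations(num_epochs, num_train_images, batch_size, num_gpus=1):
    """Convert an epoch budget into an iteration budget (hypothetical helper).

    Assumes every iteration processes one batch on each GPU, so one epoch takes
    ceil(num_train_images / (batch_size * num_gpus)) iterations.
    """
    if min(num_epochs, num_train_images, batch_size, num_gpus) < 1:
        raise ValueError("all arguments must be positive integers")
    if num_gpus > num_train_images:
        # Each GPU needs at least one image, so the dataset cannot be partitioned.
        raise ValueError("more GPUs than training images")
    iterations_per_epoch = math.ceil(num_train_images / (batch_size * num_gpus))
    return num_epochs * iterations_per_epoch
```

For instance, 2 epochs over 30 images with a batch size of 2 give 30 iterations on one GPU, but only 8 on four GPUs, since each iteration then consumes 8 images.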
@@ -19,7 +19,15 @@
"See the License for the specific language governing permissions and \n",
"limitations under the License. \n",
"\n",
"# Datalist Generator"
"# Datalist Cross-Validation Folds Generator"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook shows how to add cross-validation folds to an existing Medical Segmentation Decathlon datalist, in this case the one for Task09_Spleen. \n",
"When running repeated experiments, it can be beneficial to create cross-validation folds beforehand, so that every run uses the same splits."
]
},
{