Polish benchmarks page
philippmwirth committed Nov 22, 2023
1 parent 514ffd7 commit b7661f0
Showing 1 changed file with 85 additions and 89 deletions.
174 changes: 85 additions & 89 deletions docs/source/getting_started/benchmarks.rst
@@ -1,35 +1,31 @@
Benchmarks
===================================
We show benchmarks of the different models for self-supervised learning
and their performance on public datasets.
Implemented models and their performance on various datasets. Hyperparameters are not tuned for maximum accuracy.

List of available benchmarks:

We have benchmarks we regularly update for these datasets:

- `Imagenet`_
- `Imagenet100`_
- `ImageNette`_
- `ImageNet1k`_
- `ImageNet100`_
- `Imagenette`_
- `CIFAR-10`_

ImageNet
--------
ImageNet1k
----------

We use the ImageNet1k ILSVRC2012 split provided here: https://image-net.org/download.php.
- `Dataset <https://image-net.org/download.php>`_
- `Code <https://github.com/lightly-ai/lightly/tree/master/benchmarks/imagenet/resnet50>`_

Self-supervised training of a SimCLR model for 100 epochs with total batch size 256
takes about four days including evaluation on two GeForce RTX 4090 GPUs. You can reproduce the results with
the code at `benchmarks/imagenet/resnet50 <https://github.com/lightly-ai/lightly/tree/master/benchmarks/imagenet/resnet50>`_.
The following experiments have been conducted on a system with 2x4090 GPUs.
Training a model takes around four days for 100 epochs (35 min per epoch), including kNN, linear probing, and fine-tuning evaluation.

Evaluation settings are based on these papers:
Evaluation settings are based on the following papers:

- Linear: `SimCLR <https://arxiv.org/abs/2002.05709>`_
- Finetune: `SimCLR <https://arxiv.org/abs/2002.05709>`_
- KNN: `InstDisc <https://arxiv.org/abs/1805.01978>`_

See the `benchmarking scripts <https://github.com/lightly-ai/lightly/tree/master/benchmarks/imagenet/resnet50>`_ for details.
- kNN: `InstDisc <https://arxiv.org/abs/1805.01978>`_

.. csv-table:: Imagenet benchmark results.
:header: "Model", "Backbone", "Batch Size", "Epochs", "Linear Top1", "Linear Top5", "Finetune Top1", "Finetune Top5", "KNN Top1", "KNN Top5", "Tensorboard", "Checkpoint"
:header: "Model", "Backbone", "Batch Size", "Epochs", "Linear Top1", "Linear Top5", "Finetune Top1", "Finetune Top5", "kNN Top1", "kNN Top5", "Tensorboard", "Checkpoint"
:widths: 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20

"BarlowTwins", "Res50", "256", "100", "62.9", "84.3", "72.6", "90.9", "45.6", "73.9", "`link <https://tensorboard.dev/experiment/NxyNRiQsQjWZ82I9b0PvKg/>`_", "`link <https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_barlowtwins_2023-08-18_00-11-03/pretrain/version_0/checkpoints/epoch%3D99-step%3D500400.ckpt>`_"
@@ -41,20 +37,62 @@ See the `benchmarking scripts <https://github.com/lightly-ai/lightly/tree/master
"SwAV", "Res50", "256", "100", "67.2", "88.1", "75.4", "92.7", "49.5", "78.6", "`link <https://tensorboard.dev/experiment/Ipx4Oxl5Qkqm5Sl5kWyKKg>`_", "`link <https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_swav_2023-05-25_08-29-14/pretrain/version_0/checkpoints/epoch%3D99-step%3D500400.ckpt>`_"
"VICReg", "Res50", "256", "100", "63.0", "85.4", "73.7", "91.9", "46.3", "75.2", "`link <https://tensorboard.dev/experiment/qH5uywJbTJSzgCEfxc7yUw>`_", "`link <https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_vicreg_2023-09-11_10-53-08/pretrain/version_0/checkpoints/epoch%3D99-step%3D500400.ckpt>`_"

*\*We use square root learning rate scaling instead of linear scaling as it yields better results for smaller batch sizes. See Appendix B.1 in SimCLR paper.*
*\*We use square root learning rate scaling instead of linear scaling as it yields better results for smaller batch sizes. See Appendix B.1 in the SimCLR paper.*
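The difference between the two scaling rules can be made concrete with a small sketch. The base values below (0.3 for linear scaling, 0.075 for square root scaling) are the ones given in Appendix B.1 of the SimCLR paper; whether the benchmark scripts use exactly these values is an assumption, and the function names are hypothetical.

```python
import math


def linear_scaled_lr(batch_size: int, base_lr: float = 0.3) -> float:
    """Linear scaling: lr = base_lr * batch_size / 256."""
    return base_lr * batch_size / 256


def sqrt_scaled_lr(batch_size: int, base_lr: float = 0.075) -> float:
    """Square root scaling: lr = base_lr * sqrt(batch_size)."""
    return base_lr * math.sqrt(batch_size)


if __name__ == "__main__":
    # Compare both rules for a few batch sizes, including the 256 used here.
    for batch_size in (256, 1024, 4096):
        print(
            batch_size,
            round(linear_scaled_lr(batch_size), 3),
            round(sqrt_scaled_lr(batch_size), 3),
        )
```

With these base values the two rules coincide at a batch size of 4096, while square root scaling gives a larger learning rate for smaller batches such as the 256 used in this benchmark.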

Found a missing model? Track the progress of our planned benchmarks on `GitHub <https://github.com/lightly-ai/lightly/issues/1197>`_.

Imagenet100
-----------

- `Dataset <https://image-net.org/download.php>`_
- :download:`Code <benchmarks/imagenet100_benchmark.py>`

Imagenet100 is a subset of the popular ImageNet-1k dataset. It consists of 100 classes
with 1300 training and 50 validation images per class. We train the
self-supervised models from scratch on the training data. At the end of every
epoch we embed all training images and use the features for a kNN classifier
with k=20 on the test set. The reported kNN Top 1 is the max accuracy
over all epochs the model reached. All experiments use the same ResNet-18 backbone and
the default ImageNet-1k training parameters from the respective papers.
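The kNN evaluation described above can be sketched in plain Python. This is a simplified illustration, not the benchmark implementation: it assumes cosine similarity and an unweighted majority vote, whereas the actual scripts may weight the votes (as in InstDisc). All function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def knn_predict(query, train_features, train_labels, k=20):
    """Predict a label by majority vote over the k most similar training features."""
    scored = [
        (cosine_similarity(query, feature), label)
        for feature, label in zip(train_features, train_labels)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def knn_top1(train_features, train_labels, test_features, test_labels, k=20):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(
        knn_predict(query, train_features, train_labels, k) == label
        for query, label in zip(test_features, test_labels)
    )
    return correct / len(test_labels)


def best_knn_top1(per_epoch_accuracies):
    """The reported number: the maximum kNN Top 1 over all epochs."""
    return max(per_epoch_accuracies)
```

In a training loop, ``knn_top1`` would be called once per epoch on freshly computed embeddings, and ``best_knn_top1`` would be applied to the collected values at the end.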

The following experiments have been conducted on a system with a single A6000 GPU.
Training a model takes between 20 and 30 hours, including kNN evaluation.

.. csv-table:: Imagenet100 benchmark results
:header: "Model", "Batch Size", "Epochs", "kNN Top 1", "Runtime", "GPU Memory"
:widths: 20, 20, 20, 20, 20, 20

"BarlowTwins", "256", "200", "0.465", "1319.3 Min", "11.3 GByte"
"BYOL", "256", "200", "0.439", "1315.4 Min", "12.9 GByte"
"DINO", "256", "200", "0.518", "1868.5 Min", "17.4 GByte"
"FastSiam", "256", "200", "0.559", "1856.2 Min", "22.0 GByte"
"Moco", "256", "200", "0.560", "1314.2 Min", "13.1 GByte"
"NNCLR", "256", "200", "0.453", "1198.6 Min", "11.8 GByte"
"SimCLR", "256", "200", "0.469", "1207.7 Min", "11.3 GByte"
"SimSiam", "256", "200", "0.534", "1175.0 Min", "11.1 GByte"
"SwaV", "256", "200", "0.678", "1569.2 Min", "16.9 GByte"


Imagenette
----------

ImageNette
-----------------------------------
- `Dataset <https://github.com/fastai/imagenette>`_
- :download:`Code <benchmarks/imagenette_benchmark.py>`

We use the ImageNette dataset provided here: https://github.com/fastai/imagenette
For our benchmarks we use the 160px version of the Imagenette dataset and
resize the input images to 128 pixels during training.
We train the self-supervised models from scratch on the training data. At the end of every
epoch we embed all training images and use the features for a kNN classifier
with k=20 on the test set. The reported kNN Top 1 is the max accuracy
over all epochs the model reached. All experiments use the same ResNet-18 backbone and
the default ImageNet-1k training parameters from the respective papers.

For our benchmarks we use the 160px version and resize the input images to 128 pixels.
Training a single model for 800 epochs on a A6000 GPU takes about 3-5 hours.
The following experiments have been conducted on a system with a single A6000 GPU.
Training a model takes three to five hours, including kNN evaluation.


.. csv-table:: ImageNette benchmark results using kNN evaluation on the test set using 128x128 input resolution.
:header: "Model", "Batch Size", "Epochs", "KNN Test Accuracy", "Runtime", "GPU Memory"
.. csv-table:: Imagenette benchmark results
:header: "Model", "Batch Size", "Epochs", "kNN Top 1", "Runtime", "GPU Memory"
:widths: 20, 20, 20, 20, 20, 20

"BarlowTwins", "256", "800", "0.852", "298.5 Min", "4.0 GByte"
@@ -78,31 +116,32 @@ Training a single model for 800 epochs on a A6000 GPU takes about 3-5 hours.
"VICReg", "256", "800", "0.845", "205.6 Min", "4.0 GByte"
"VICRegL", "256", "800", "0.778", "218.7 Min", "4.0 GByte"

You can reproduce the benchmarks using the following script:
:download:`imagenette_benchmark.py <benchmarks/imagenette_benchmark.py>`


CIFAR-10
-----------------------------------
--------

- `Dataset <https://www.cs.toronto.edu/~kriz/cifar.html>`_
- :download:`Code <benchmarks/cifar10_benchmark.py>`

Cifar10 consists of 50k training images and 10k testing images. We train the
CIFAR-10 consists of 50k training images and 10k testing images. We train the
self-supervised models from scratch on the training data. At the end of every
epoch we embed all training images and use the features for a kNN classifier
with k=200 on the test set. The reported kNN test accuracy is the max accuracy
with k=200 on the test set. The reported kNN Top 1 is the max accuracy
over all epochs the model reached.
All experiments use the same ResNet-18 backbone, and we disable the Gaussian blur
augmentation due to the small image sizes.

.. note:: The ResNet-18 backbone in this benchmark is slightly different from
the torchvision variant as it starts with a 3x3 convolution and has no
stride and no `MaxPool2d`. This is a typical variation used for cifar10
stride and no `MaxPool2d`. This is a typical variation used for CIFAR-10
benchmarks of SSL methods.

.. role:: raw-html(raw)
:format: html

.. csv-table:: Cifar10 benchmark results showing kNN test accuracy, runtime and peak GPU memory consumption for different training setups.
:header: "Model", "Batch Size", "Epochs", "KNN Test Accuracy", "Runtime", "GPU Memory"
.. csv-table:: CIFAR-10 benchmark results
:header: "Model", "Batch Size", "Epochs", "kNN Top 1", "Runtime", "GPU Memory"
:widths: 20, 20, 20, 30, 20, 20

"BarlowTwins", "128", "200", "0.842", "375.9 Min", "1.7 GByte"
@@ -123,8 +162,8 @@ augmentation due to the small image sizes.
"DCLW", "512", "200", "0.824", "87.9 Min", "4.9 GByte"
"DINO", "512", "200", "0.813", "108.6 Min", "5.0 GByte"
"FastSiam", "512", "200", "0.788", "146.9 Min", "9.5 GByte"
"Moco (*)", "512", "200", "0.847", "112.2 Min", "5.6 GByte"
"NNCLR (*)", "512", "200", "0.815", "88.1 Min", "5.0 GByte"
"Moco*", "512", "200", "0.847", "112.2 Min", "5.6 GByte"
"NNCLR*", "512", "200", "0.815", "88.1 Min", "5.0 GByte"
"SimCLR", "512", "200", "0.848", "87.1 Min", "4.9 GByte"
"SimSiam", "512", "200", "0.764", "87.8 Min", "5.0 GByte"
"SwaV", "512", "200", "0.842", "88.7 Min", "4.9 GByte"
@@ -135,27 +174,23 @@ augmentation due to the small image sizes.
"DCLW", "512", "800", "0.871", "333.3 Min", "4.9 GByte"
"DINO", "512", "800", "0.848", "405.2 Min", "5.0 GByte"
"FastSiam", "512", "800", "0.902", "582.0 Min", "9.5 GByte"
"Moco (*)", "512", "800", "0.899", "417.8 Min", "5.4 GByte"
"NNCLR (*)", "512", "800", "0.892", "335.0 Min", "5.0 GByte"
"Moco*", "512", "800", "0.899", "417.8 Min", "5.4 GByte"
"NNCLR*", "512", "800", "0.892", "335.0 Min", "5.0 GByte"
"SimCLR", "512", "800", "0.879", "331.1 Min", "4.9 GByte"
"SimSiam", "512", "800", "0.904", "333.7 Min", "5.1 GByte"
"SwaV", "512", "800", "0.884", "330.5 Min", "5.0 GByte"
"SMoG", "512", "800", "0.800", "415.6 Min", "3.2 GByte"

(*): Increased size of memory bank from 4096 to 8192 to avoid too quickly
changing memory bank due to larger batch size.
*\*Increased size of memory bank from 4096 to 8192 to avoid
changing the memory bank too quickly due to larger batch size.*
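A rough way to see why the larger memory bank helps: assuming the bank is a first-in-first-out queue that receives one batch of embeddings per training step (an assumption about the implementation, with a hypothetical helper name), the number of steps until every entry has been replaced is the bank size divided by the batch size.

```python
import math


def steps_to_replace_bank(bank_size: int, batch_size: int) -> int:
    """Training steps until a FIFO memory bank is fully overwritten."""
    return math.ceil(bank_size / batch_size)


if __name__ == "__main__":
    # Batch size 128 with the default bank, then batch size 512 with both sizes.
    for bank_size, batch_size in ((4096, 128), (4096, 512), (8192, 512)):
        print(bank_size, batch_size, steps_to_replace_bank(bank_size, batch_size))
```

Under this assumption, a bank of 4096 entries is fully replaced after 8 steps at batch size 512, compared to 32 steps at batch size 128. Doubling the bank to 8192 brings this back up to 16 steps.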

We make the following observations running the benchmark:

- Self-Supervised models benefit from larger batch sizes and longer training.
- All models need around 3-4h to complete the 200 epoch benchmark and 11-13h
for the 800 epoch benchmark.
- Memory consumption is roughly the same for all models.
- Some models, like MoCo or SwaV, learn quickly in the beginning and then
plateau. Other models, like SimSiam or NNCLR, take longer to warm up but then
catch up when training for 800 epochs. This can also be seen in the
figure below.

- Training time is roughly the same for all methods (three to four hours for 200 epochs).
- Memory consumption is roughly the same for all methods.
- MoCo and SwAV learn quickly in the beginning and then plateau.
- SimSiam and NNCLR take longer to warm up but then catch up when training for 800 epochs.

.. figure:: images/cifar10_benchmark_knn_accuracy_800_epochs.png
:align: center
@@ -167,48 +202,9 @@ We make the following observations running the benchmark:
Interactive plots of the 800 epoch accuracy and training loss are hosted on
`tensorboard <https://tensorboard.dev/experiment/2XsJe3Y4TWCQSzHyDFaPQA>`__.

You can reproduce the benchmarks using the following script:
:download:`cifar10_benchmark.py <benchmarks/cifar10_benchmark.py>`


Imagenet100
-----------

Imagenet100 is a subset of the popular ImageNet-1k dataset. It consists of 100 classes
with 1300 training and 50 validation images per class. We train the
self-supervised models from scratch on the training data. At the end of every
epoch we embed all training images and use the features for a kNN classifier
with k=20 on the test set. The reported kNN test accuracy is the max accuracy
over all epochs the model reached. All experiments use the same ResNet-18 backbone and
with the default ImageNet-1k training parameters from the respective papers.


.. csv-table:: Imagenet100 benchmark results showing kNN test accuracy, runtime and peak GPU memory consumption for different training setups.
:header: "Model", "Batch Size", "Epochs", "KNN Test Accuracy", "Runtime", "GPU Memory"
:widths: 20, 20, 20, 20, 20, 20

"BarlowTwins", "256", "200", "0.465", "1319.3 Min", "11.3 GByte"
"BYOL", "256", "200", "0.439", "1315.4 Min", "12.9 GByte"
"DINO", "256", "200", "0.518", "1868.5 Min", "17.4 GByte"
"FastSiam", "256", "200", "0.559", "1856.2 Min", "22.0 GByte"
"Moco", "256", "200", "0.560", "1314.2 Min", "13.1 GByte"
"NNCLR", "256", "200", "0.453", "1198.6 Min", "11.8 GByte"
"SimCLR", "256", "200", "0.469", "1207.7 Min", "11.3 GByte"
"SimSiam", "256", "200", "0.534", "1175.0 Min", "11.1 GByte"
"SwaV", "256", "200", "0.678", "1569.2 Min", "16.9 GByte"

You can reproduce the benchmarks using the following script:
:download:`imagenet100_benchmark.py <benchmarks/imagenet100_benchmark.py>`


Next Steps
----------

Now that you understand the performance of the different Lightly SSL methods how about
looking into a tutorial to implement your favorite model?

- :ref:`input-structure-label`
- :ref:`lightly-moco-tutorial-2`
- :ref:`lightly-simclr-tutorial-3`
- :ref:`lightly-simsiam-tutorial-4`
- :ref:`lightly-custom-augmentation-5`
Train your own self-supervised model following our :ref:`examples <models>` or
check out our :ref:`tutorials <input-structure-label>`.
