Polish benchmarks page

lightly-ai · Nov 22, 2023 · b7661f0
1 parent 514ffd7
Showing 1 changed file with 85 additions and 89 deletions.
diff --git a/docs/source/getting_started/benchmarks.rst b/docs/source/getting_started/benchmarks.rst
@@ -1,35 +1,31 @@
 Benchmarks 
 ===================================
-We show benchmarks of the different models for self-supervised learning
-and their performance on public datasets.
+Implemented models and their performance on various datasets. Hyperparameters are not tuned for maximum accuracy.
 
+List of available benchmarks:
 
-We have benchmarks we regularly update for these datasets:
-
-- `Imagenet`_
-- `Imagenet100`_
-- `ImageNette`_
+- `ImageNet1k`_
+- `ImageNet100`_
+- `Imagenette`_
 - `CIFAR-10`_
 
-ImageNet
---------
+ImageNet1k
+----------
 
-We use the ImageNet1k ILSVRC2012 split provided here: https://image-net.org/download.php.
+- `Dataset <https://image-net.org/download.php>`__
+- `Code <https://github.com/lightly-ai/lightly/tree/master/benchmarks/imagenet/resnet50>`__
 
-Self-supervised training of a SimCLR model for 100 epochs with total batch size 256
-takes about four days including evaluation on two GeForce RTX 4090 GPUs. You can reproduce the results with
-the code at `benchmarks/imagenet/resnet50 <https://github.com/lightly-ai/lightly/tree/master/benchmarks/imagenet/resnet50>`_.
+The following experiments were conducted on a system with two GeForce RTX 4090 GPUs.
+Training a model for 100 epochs (about 35 min per epoch) takes around four days, including kNN, linear probing, and fine-tuning evaluation.
 
-Evaluation settings are based on these papers:
+Evaluation settings are based on the following papers:
 
 - Linear: `SimCLR <https://arxiv.org/abs/2002.05709>`_
 - Finetune: `SimCLR <https://arxiv.org/abs/2002.05709>`_
-- KNN: `InstDisc <https://arxiv.org/abs/1805.01978>`_
-
-See the `benchmarking scripts <https://github.com/lightly-ai/lightly/tree/master/benchmarks/imagenet/resnet50>`_ for details.
+- kNN: `InstDisc <https://arxiv.org/abs/1805.01978>`_
 
 .. csv-table:: Imagenet benchmark results.
-  :header: "Model", "Backbone", "Batch Size", "Epochs", "Linear Top1", "Linear Top5", "Finetune Top1", "Finetune Top5", "KNN Top1", "KNN Top5", "Tensorboard", "Checkpoint"
+  :header: "Model", "Backbone", "Batch Size", "Epochs", "Linear Top1", "Linear Top5", "Finetune Top1", "Finetune Top5", "kNN Top1", "kNN Top5", "Tensorboard", "Checkpoint"
   :widths: 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20
 
   "BarlowTwins", "Res50", "256", "100", "62.9", "84.3", "72.6", "90.9", "45.6", "73.9", "`link <https://tensorboard.dev/experiment/NxyNRiQsQjWZ82I9b0PvKg/>`_", "`link <https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_barlowtwins_2023-08-18_00-11-03/pretrain/version_0/checkpoints/epoch%3D99-step%3D500400.ckpt>`_"
@@ -41,20 +37,62 @@ See the `benchmarking scripts <https://github.com/lightly-ai/lightly/tree/master
   "SwAV", "Res50", "256", "100", "67.2", "88.1", "75.4", "92.7", "49.5", "78.6", "`link <https://tensorboard.dev/experiment/Ipx4Oxl5Qkqm5Sl5kWyKKg>`_", "`link <https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_swav_2023-05-25_08-29-14/pretrain/version_0/checkpoints/epoch%3D99-step%3D500400.ckpt>`_"
   "VICReg", "Res50", "256", "100", "63.0", "85.4", "73.7", "91.9", "46.3", "75.2", "`link <https://tensorboard.dev/experiment/qH5uywJbTJSzgCEfxc7yUw>`_", "`link <https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_vicreg_2023-09-11_10-53-08/pretrain/version_0/checkpoints/epoch%3D99-step%3D500400.ckpt>`_"
 
-*\*We use square root learning rate scaling instead of linear scaling as it yields better results for smaller batch sizes. See Appendix B.1 in SimCLR paper.*
+*\*We use square root learning rate scaling instead of linear scaling as it yields better results for smaller batch sizes. See Appendix B.1 in the SimCLR paper.*
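For reference, the two scaling rules from Appendix B.1 of the SimCLR paper can be compared in a few lines of Python. This is a minimal sketch: the constants 0.3 (linear rule, per 256 samples) and 0.075 (square root rule) are the values given in that appendix, and the benchmark scripts may use a different base learning rate for each method.

```python
import math


def linear_scaled_lr(batch_size: int, base_lr: float = 0.3) -> float:
    """Linear scaling rule: lr = base_lr * batch_size / 256."""
    return base_lr * batch_size / 256


def sqrt_scaled_lr(batch_size: int, base_lr: float = 0.075) -> float:
    """Square root scaling rule: lr = base_lr * sqrt(batch_size)."""
    return base_lr * math.sqrt(batch_size)


if __name__ == "__main__":
    # Compare both rules for a few batch sizes.
    for batch_size in (128, 256, 512, 4096):
        print(batch_size, linear_scaled_lr(batch_size), sqrt_scaled_lr(batch_size))
```

At batch size 256 the square root rule yields a learning rate of 1.2, compared with 0.3 for the linear rule. The two rules coincide at batch size 4096, so the choice mainly matters for the smaller batch sizes used in these benchmarks.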
+
+Found a missing model? Track the progress of our planned benchmarks on `GitHub <https://github.com/lightly-ai/lightly/issues/1197>`_.
+
+ImageNet100
+-----------
+
+- `Dataset <https://image-net.org/download.php>`__
+- :download:`Code <benchmarks/imagenet100_benchmark.py>`
+
+ImageNet100 is a subset of the popular ImageNet-1k dataset. It consists of 100 classes
+with 1300 training and 50 validation images per class. We train the
+self-supervised models from scratch on the training data. At the end of every
+epoch we embed all training images and use the features to classify the validation
+images with a kNN classifier (k=20). The reported kNN Top 1 is the highest accuracy
+the model reached over all epochs. All experiments use the same ResNet-18 backbone and
+the default ImageNet-1k training parameters from the respective papers.
+
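The kNN evaluation described above can be sketched with NumPy. This is a minimal illustration, not the benchmark code. It assumes cosine similarity on L2-normalized embeddings and a plain majority vote among the k nearest training samples, whereas the actual scripts may weight neighbors by similarity. The helpers `knn_predict`, `top1_accuracy`, and `reported_knn_top1` are hypothetical names.

```python
import numpy as np


def knn_predict(train_features, train_labels, query_features, k=20):
    """Predict labels for query_features by majority vote of the k nearest
    training embeddings under cosine similarity (hypothetical helper)."""
    # L2-normalize so that the dot product equals cosine similarity.
    train = train_features / np.linalg.norm(train_features, axis=1, keepdims=True)
    query = query_features / np.linalg.norm(query_features, axis=1, keepdims=True)
    similarity = query @ train.T  # shape: (num_query, num_train)
    # Indices of the k most similar training samples for every query.
    nearest = np.argsort(-similarity, axis=1)[:, :k]
    neighbor_labels = train_labels[nearest]  # shape: (num_query, k)
    num_classes = int(train_labels.max()) + 1
    # Count the votes per class and return the most frequent label.
    votes = np.stack(
        [np.bincount(row, minlength=num_classes) for row in neighbor_labels]
    )
    return votes.argmax(axis=1)


def top1_accuracy(predictions, targets):
    return float(np.mean(predictions == targets))


def reported_knn_top1(accuracy_per_epoch):
    # The tables report the best kNN Top 1 reached over all epochs.
    return max(accuracy_per_epoch)


if __name__ == "__main__":
    # Toy embeddings: two clusters with labels 0 and 1.
    train_x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    train_y = np.array([0, 0, 1, 1])
    val_x = np.array([[1.0, 0.05], [0.05, 1.0]])
    val_y = np.array([0, 1])
    preds = knn_predict(train_x, train_y, val_x, k=2)
    print(preds, top1_accuracy(preds, val_y))  # [0 1] 1.0
```

In a real run, the training and validation embeddings come from the backbone at the end of each epoch. `reported_knn_top1` is applied to the resulting list of per-epoch accuracies.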
+The following experiments were conducted on a system with a single A6000 GPU.
+Training a model takes between 20 and 30 hours, including kNN evaluation.
+
+.. csv-table:: ImageNet100 benchmark results
+  :header: "Model", "Batch Size", "Epochs", "kNN Top 1", "Runtime", "GPU Memory"
+  :widths: 20, 20, 20, 20, 20, 20
 
+  "BarlowTwins", "256", "200", "0.465", "1319.3 Min", "11.3 GByte"
+  "BYOL", "256", "200", "0.439", "1315.4 Min", "12.9 GByte"
+  "DINO", "256", "200", "0.518", "1868.5 Min", "17.4 GByte"
+  "FastSiam", "256", "200", "0.559", "1856.2 Min", "22.0 GByte"
+  "MoCo", "256", "200", "0.560", "1314.2 Min", "13.1 GByte"
+  "NNCLR", "256", "200", "0.453", "1198.6 Min", "11.8 GByte"
+  "SimCLR", "256", "200", "0.469", "1207.7 Min", "11.3 GByte"
+  "SimSiam", "256", "200", "0.534", "1175.0 Min", "11.1 GByte"
+  "SwAV", "256", "200", "0.678", "1569.2 Min", "16.9 GByte"
+
+
+Imagenette
+----------
 
-ImageNette
------------------------------------
+- `Dataset <https://github.com/fastai/imagenette>`__
+- :download:`Code <benchmarks/imagenette_benchmark.py>`
 
-We use the ImageNette dataset provided here: https://github.com/fastai/imagenette
+For our benchmarks we use the 160px version of the Imagenette dataset and
+resize the input images to 128x128 pixels during training.
+We train the self-supervised models from scratch on the training data. At the end of every
+epoch we embed all training images and use the features to classify the validation
+images with a kNN classifier (k=20). The reported kNN Top 1 is the highest accuracy
+the model reached over all epochs. All experiments use the same ResNet-18 backbone and
+the default ImageNet-1k training parameters from the respective papers.
 
-For our benchmarks we use the 160px version and resize the input images to 128 pixels. 
-Training a single model for 800 epochs on a A6000 GPU takes about 3-5 hours.
+The following experiments were conducted on a system with a single A6000 GPU.
+Training a model takes three to five hours, including kNN evaluation.
 
 
-.. csv-table:: ImageNette benchmark results using kNN evaluation on the test set using 128x128 input resolution.
-  :header: "Model", "Batch Size", "Epochs", "KNN Test Accuracy", "Runtime", "GPU Memory"
+.. csv-table:: Imagenette benchmark results
+  :header: "Model", "Batch Size", "Epochs", "kNN Top 1", "Runtime", "GPU Memory"
   :widths: 20, 20, 20, 20, 20, 20
 
   "BarlowTwins", "256", "800", "0.852", "298.5 Min", "4.0 GByte"
@@ -78,31 +116,32 @@ Training a single model for 800 epochs on a A6000 GPU takes about 3-5 hours.
   "VICReg", "256", "800", "0.845", "205.6 Min", "4.0 GByte"
   "VICRegL", "256", "800", "0.778", "218.7 Min", "4.0 GByte"
 
-You can reproduce the benchmarks using the following script:
-:download:`imagenette_benchmark.py <benchmarks/imagenette_benchmark.py>` 
 
 
 CIFAR-10
------------------------------------
+--------
+
+- `Dataset <https://www.cs.toronto.edu/~kriz/cifar.html>`__
+- :download:`Code <benchmarks/cifar10_benchmark.py>`
 
-Cifar10 consists of 50k training images and 10k testing images. We train the
+CIFAR-10 consists of 50k training images and 10k test images. We train the
 self-supervised models from scratch on the training data. At the end of every
 epoch we embed all training images and use the features for a kNN classifier 
-with k=200 on the test set. The reported kNN test accuracy is the max accuracy
+with k=200 on the test set. The reported kNN Top 1 is the highest accuracy
 over all epochs the model reached.
 All experiments use the same ResNet-18 backbone and we disable the gaussian blur
 augmentation due to the small image sizes.
 
 .. note:: The ResNet-18 backbone in this benchmark is slightly different from 
           the torchvision variant as it starts with a 3x3 convolution and has no
-          stride and no `MaxPool2d`. This is a typical variation used for cifar10
+          stride and no ``MaxPool2d``. This is a common variant used in CIFAR-10
           benchmarks of SSL methods.
 
 .. role:: raw-html(raw)
    :format: html
 
-.. csv-table:: Cifar10 benchmark results showing kNN test accuracy, runtime and peak GPU memory consumption for different training setups.
-  :header: "Model", "Batch Size", "Epochs", "KNN Test Accuracy", "Runtime", "GPU Memory"
+.. csv-table:: CIFAR-10 benchmark results
+  :header: "Model", "Batch Size", "Epochs", "kNN Top 1", "Runtime", "GPU Memory"
   :widths: 20, 20, 20, 30, 20, 20
 
   "BarlowTwins", "128", "200", "0.842", "375.9 Min", "1.7 GByte"
@@ -123,8 +162,8 @@ augmentation due to the small image sizes.
   "DCLW", "512", "200", "0.824", "87.9 Min", "4.9 GByte"
   "DINO", "512", "200", "0.813", "108.6 Min", "5.0 GByte"
   "FastSiam", "512", "200", "0.788", "146.9 Min", "9.5 GByte"
-  "Moco (*)", "512", "200", "0.847", "112.2 Min", "5.6 GByte"
-  "NNCLR (*)", "512", "200", "0.815", "88.1 Min", "5.0 GByte"
+  "MoCo*", "512", "200", "0.847", "112.2 Min", "5.6 GByte"
+  "NNCLR*", "512", "200", "0.815", "88.1 Min", "5.0 GByte"
   "SimCLR", "512", "200", "0.848", "87.1 Min", "4.9 GByte"
   "SimSiam", "512", "200", "0.764", "87.8 Min", "5.0 GByte"
   "SwaV", "512", "200", "0.842", "88.7 Min", "4.9 GByte"
@@ -135,27 +174,23 @@ augmentation due to the small image sizes.
   "DCLW", "512", "800", "0.871", "333.3 Min", "4.9 GByte"
   "DINO", "512", "800", "0.848", "405.2 Min", "5.0 GByte"
   "FastSiam", "512", "800", "0.902", "582.0 Min", "9.5 GByte"
-  "Moco (*)", "512", "800", "0.899", "417.8 Min", "5.4 GByte"
-  "NNCLR (*)", "512", "800", "0.892", "335.0 Min", "5.0 GByte"
+  "Moco*", "512", "800", "0.899", "417.8 Min", "5.4 GByte"
+  "NNCLR*", "512", "800", "0.892", "335.0 Min", "5.0 GByte"
   "SimCLR", "512", "800", "0.879", "331.1 Min", "4.9 GByte"
   "SimSiam", "512", "800", "0.904", "333.7 Min", "5.1 GByte"
   "SwaV", "512", "800", "0.884", "330.5 Min", "5.0 GByte"
   "SMoG", "512", "800", "0.800", "415.6 Min", "3.2 GByte"
 
-(*): Increased size of memory bank from 4096 to 8192 to avoid too quickly 
-changing memory bank due to larger batch size.
+*\*We increased the memory bank size from 4096 to 8192 so that the memory bank does not
+change too quickly with the larger batch size.*
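To see why a larger bank helps with a larger batch size, consider a memory bank that behaves like a first-in, first-out queue, as in MoCo. This is an assumption here, and the exact implementation used in the benchmark may differ. Every training step enqueues one batch of embeddings and evicts the oldest ones, so the whole bank is replaced after `bank_size / batch_size` steps. The hypothetical helpers below compute this and simulate it with plain Python.

```python
from collections import deque


def steps_until_refreshed(bank_size: int, batch_size: int) -> int:
    # Number of steps after which every entry of a FIFO bank has been replaced.
    return bank_size // batch_size


def oldest_step_in_bank(bank_size: int, batch_size: int, num_steps: int) -> int:
    """Simulate a FIFO memory bank and return the step at which its oldest
    remaining entry was added (hypothetical helper)."""
    bank = deque(maxlen=bank_size)  # oldest entries are dropped automatically
    for step in range(num_steps):
        # Tag each enqueued embedding with the step it was added in.
        bank.extend([step] * batch_size)
    return min(bank)


if __name__ == "__main__":
    for bank_size in (4096, 8192):
        print(
            bank_size,
            steps_until_refreshed(bank_size, 512),
            oldest_step_in_bank(bank_size, 512, num_steps=20),
        )
```

With batch size 512, a bank of 4096 entries is fully replaced every 8 steps, while a bank of 8192 entries keeps embeddings from the last 16 steps.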
 
 We make the following observations running the benchmark:
 
 - Self-Supervised models benefit from larger batch sizes and longer training.
-- All models need around 3-4h to complete the 200 epoch benchmark and 11-13h
-  for the 800 epoch benchmark.
-- Memory consumption is roughly the same for all models.
-- Some models, like MoCo or SwaV, learn quickly in the beginning and then 
-  plateau. Other models, like SimSiam or NNCLR, take longer to warm up but then
-  catch up when training for 800 epochs. This can also be seen in the 
-  figure below.
-
+- Training time is roughly the same for all methods (three to four hours for 200 epochs).
+- Memory consumption is roughly the same for all methods.
+- MoCo and SwAV learn quickly in the beginning and then plateau.
+- SimSiam and NNCLR take longer to warm up but catch up when training for 800 epochs.
 
 .. figure:: images/cifar10_benchmark_knn_accuracy_800_epochs.png
     :align: center
@@ -167,48 +202,9 @@ We make the following observations running the benchmark:
 Interactive plots of the 800 epoch accuracy and training loss are hosted on
 `tensorboard <https://tensorboard.dev/experiment/2XsJe3Y4TWCQSzHyDFaPQA>`__.
 
-You can reproduce the benchmarks using the following script:
-:download:`cifar10_benchmark.py <benchmarks/cifar10_benchmark.py>` 
-
-
-Imagenet100
------------
-
-Imagenet100 is a subset of the popular ImageNet-1k dataset. It consists of 100 classes
-with 1300 training and 50 validation images per class. We train the
-self-supervised models from scratch on the training data. At the end of every
-epoch we embed all training images and use the features for a kNN classifier 
-with k=20 on the test set. The reported kNN test accuracy is the max accuracy
-over all epochs the model reached. All experiments use the same ResNet-18 backbone and
-with the default ImageNet-1k training parameters from the respective papers.
-
-
-.. csv-table:: Imagenet100 benchmark results showing kNN test accuracy, runtime and peak GPU memory consumption for different training setups.
-  :header: "Model", "Batch Size", "Epochs", "KNN Test Accuracy", "Runtime", "GPU Memory"
-  :widths: 20, 20, 20, 20, 20, 20
-
-  "BarlowTwins", "256", "200", "0.465", "1319.3 Min", "11.3 GByte"
-  "BYOL", "256", "200", "0.439", "1315.4 Min", "12.9 GByte"
-  "DINO", "256", "200", "0.518", "1868.5 Min", "17.4 GByte"
-  "FastSiam", "256", "200", "0.559", "1856.2 Min", "22.0 GByte"
-  "Moco", "256", "200", "0.560", "1314.2 Min", "13.1 GByte"
-  "NNCLR", "256", "200", "0.453", "1198.6 Min", "11.8 GByte"
-  "SimCLR", "256", "200", "0.469", "1207.7 Min", "11.3 GByte"
-  "SimSiam", "256", "200", "0.534", "1175.0 Min", "11.1 GByte"
-  "SwaV", "256", "200", "0.678", "1569.2 Min", "16.9 GByte"
-
-You can reproduce the benchmarks using the following script:
-:download:`imagenet100_benchmark.py <benchmarks/imagenet100_benchmark.py>` 
-
 
 Next Steps
 ----------
 
-Now that you understand the performance of the different Lightly SSL methods how about
-looking into a tutorial to implement your favorite model?
-
-- :ref:`input-structure-label`
-- :ref:`lightly-moco-tutorial-2`
-- :ref:`lightly-simclr-tutorial-3`  
-- :ref:`lightly-simsiam-tutorial-4`
-- :ref:`lightly-custom-augmentation-5`
+Train your own self-supervised model following our :ref:`examples <models>` or
+check out our :ref:`tutorials <input-structure-label>`.