Add tico model #1398

IgorSusmelj · 2023-09-16T01:27:25Z

Changes

Add TiCo model code for ImageNet benchmark
Run experiments and add results to readme and benchmarks tab in the docs

Note that I'll have to rebase this PR on master, as quite some time has passed since I started it.

codecov · 2023-09-16T01:32:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (67dc269) 85.49% compared to head (d97102b) 85.50%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1398   +/-   ##
=======================================
  Coverage   85.49%   85.50%           
=======================================
  Files         135      135           
  Lines        5655     5657    +2     
=======================================
+ Hits         4835     4837    +2     
  Misses        820      820


IgorSusmelj · 2023-09-26T03:54:02Z

This will be a tougher one. I ran several experiments, and the loss goes up while the accuracy stays down. The training seems unstable. I varied some of the hyperparameters but couldn't get it to train.

IgorSusmelj · 2024-01-05T12:19:27Z

The training on ImageNet does not work as expected. We made several modifications to the loss, but no matter what we do, the loss saturates at its maximum value and the accuracy stays at 0%.

Epoch 3:  63%|██████▎   | 1583/2502 [14:05<08:10,  1.87it/s, v_num=0, train_loss=8.000, val_online_cls_loss=140.0, val_online_cls_top1=0.00106, val_online_cls_top5=0.00414]
...
Epoch 5:  10%|▉         | 238/2502 [02:11<20:47,  1.81it/s, v_num=0, train_loss=8.000, val_online_cls_loss=569.0, val_online_cls_top1=0.00094, val_online_cls_top5=0.00438]
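
One possible reading of the saturated value in the logs above: with L2-normalized embeddings, a TiCo-style objective of the form (1 - mean cosine similarity) + rho * mean(z^T C z) evaluates to exactly rho when every embedding collapses onto the same unit vector. The plain-Python sketch below checks that arithmetic. It is not lightly's implementation: `normalize`, `second_moment`, and `tico_style_loss` are hypothetical helpers, `RHO = 8.0` is an assumption chosen to match the logged `train_loss`, and the covariance estimate C is taken at its steady state (equal to the batch second moment).

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def second_moment(z):
    """Batch estimate (1/n) * sum_i z_i z_i^T for a list of unit-norm vectors."""
    n, d = len(z), len(z[0])
    return [[sum(v[i] * v[j] for v in z) / n for j in range(d)] for i in range(d)]


def tico_style_loss(z_a, z_b, C, rho):
    """(1 - mean cosine similarity) + rho * mean(z_a^T C z_a), no autograd."""
    n = len(z_a)
    cosine = sum(sum(a * b for a, b in zip(va, vb)) for va, vb in zip(z_a, z_b)) / n
    quad = sum(
        sum(v[i] * C[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))
        for v in z_a
    ) / n
    return (1.0 - cosine) + rho * quad


RHO = 8.0  # assumption: matches the saturated train_loss in the logs

# Fully collapsed batch: every embedding points in the same direction.
collapsed = [normalize([3.0, 4.0]) for _ in range(4)]
loss_collapsed = tico_style_loss(collapsed, collapsed, second_moment(collapsed), RHO)

# Spread-out batch: embeddings alternate between two orthogonal directions.
spread = [normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0])]
loss_spread = tico_style_loss(spread, spread, second_moment(spread), RHO)

print(loss_collapsed, loss_spread)
```

Under these assumptions the collapsed batch evaluates to rho (8.0 up to floating-point error) and the spread-out batch to 4.0, so a loss pinned at 8.0 would be consistent with representation collapse rather than with a numerical overflow.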

Things we tried:

Detaching the auxiliary matrix to prevent backprop through it (suggestion from Guarin):

lightly/lightly/loss/tico_loss.py

Line 104 in efc559c

B = torch.mm(z_a.T, z_a) / z_a.shape[0]
Toggling gather_distributed (suggestion from Guarin): we get the same results with and without it
With the new default settings (this PR) the loss saturates at 8.0
Variations of the hyperparameters (beta, rho of the loss and learning rate)
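
The first item in the list above (detaching the auxiliary matrix B) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `tico_loss.py`: `tico_loss_sketch` and its `detach_b` flag are hypothetical, and the defaults `beta=0.9` and `rho=8.0` are assumed values. Only the line computing `B` is taken from the snippet quoted above.

```python
import torch
import torch.nn.functional as F


def tico_loss_sketch(z_a, z_b, C_prev, beta=0.9, rho=8.0, detach_b=True):
    """TiCo-style loss; returns (loss, updated covariance estimate)."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)

    # Auxiliary matrix from the quoted snippet.
    B = torch.mm(z_a.T, z_a) / z_a.shape[0]
    if detach_b:
        # Treat B as a constant so no gradient flows back through it.
        B = B.detach()

    # Exponential moving average of the covariance estimate.
    C = beta * C_prev + (1 - beta) * B

    invariance = 1.0 - (z_a * z_b).sum(dim=1).mean()
    contrast = rho * (torch.mm(z_a, C) * z_a).sum(dim=1).mean()

    # The stored estimate never carries a graph into the next step.
    return invariance + contrast, C.detach()


torch.manual_seed(0)
z_a = torch.randn(16, 8, requires_grad=True)
z_b = torch.randn(16, 8)
C_prev = torch.zeros(8, 8)

loss_detached, C_new = tico_loss_sketch(z_a, z_b, C_prev, detach_b=True)
(grad_detached,) = torch.autograd.grad(loss_detached, z_a)

loss_attached, _ = tico_loss_sketch(z_a, z_b, C_prev, detach_b=False)
(grad_attached,) = torch.autograd.grad(loss_attached, z_a)
```

Detaching leaves the loss value unchanged and only alters the gradient with respect to `z_a`, which is why the two variants can report identical `train_loss` values while still training differently.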

I'll do one more run with more in-depth logs to better isolate which parts of the loss go out of control :)

guarin · 2024-01-05T12:25:57Z

@IgorSusmelj could you quickly summarize the changes you tried?

IgorSusmelj · 2024-01-08T10:41:44Z

The latest changes seem promising:

guarin

LGTM! Let's update the benchmarks once we figure out what is wrong with the model.

I'll also create follow-up issues to update BYOL and MoCo to use the correct backbone for evaluation.

guarin · 2024-01-11T12:47:55Z

README.md

+| SimCLR\* + DCL  | Res50    | 256        | 100    | 65.1        | 73.5          | 49.6     | [link](https://tensorboard.dev/experiment/k4ZonZ77QzmBkc0lXswQlg/) | [link](https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_dcl_2023-07-04_16-51-40/pretrain/version_0/checkpoints/epoch%3D99-step%3D500400.ckpt)         |
+| SimCLR\* + DCLW | Res50    | 256        | 100    | 64.5        | 73.2          | 48.5     | [link](https://tensorboard.dev/experiment/TrALnpwFQ4OkZV3uvaX7wQ/) | [link](https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_dclw_2023-07-07_14-57-13/pretrain/version_0/checkpoints/epoch%3D99-step%3D500400.ckpt)        |
+| SwAV            | Res50    | 256        | 100    | 67.2        | 75.4          | 49.5     | [link](https://tensorboard.dev/experiment/Ipx4Oxl5Qkqm5Sl5kWyKKg)  | [link](https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_swav_2023-05-25_08-29-14/pretrain/version_0/checkpoints/epoch%3D99-step%3D500400.ckpt)        |
+| TiCo            | Res50    | 256        | 100    | 49.7        | 72.7          | 26.6     | -                                                                  | [link](https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_tico_2024-01-07_18-40-57/pretrain/version_0/checkpoints/epoch%3D99-step%3D250200.ckpt)        |


Interesting that the linear accuracy is so low. In the paper they report linear accuracy that is similar to the other methods.

100 epochs vs. 1000 epochs could be the reason for that.

benchmarks/imagenet/resnet50/tico.py

* Add tico model
* Fix view
* Fix wrong hyperparam
* Fix hyperparam and make naming consistent
* Fix wrong loss
* Minor changes for debugging
* Cleanup
* Log individual losses. Detach B.
* Fix issues in code. Remove debugging logs.
* Add TiCo benchmarks results and checkpoints
* Update codebase and use naming from paper.
* Remove misleading comment about parameters for lr.

IgorSusmelj added 9 commits January 10, 2024 14:14

Add tico model

d5701ec

Fix view

bd95ea7

Fix wrong hyperparam

92bed23

Fix hyperparam and make naming consistent

05de213

Fix wrong loss

f7db194

Minor changes for debugging

6591e47

Cleanup

a88f88c

Log individual losses. Detach B.

596b4f1

Fix issues in code. Remove debugging logs.

5659080

IgorSusmelj marked this pull request as ready for review January 10, 2024 13:21

guarin approved these changes Jan 11, 2024

guarin reviewed Jan 11, 2024

benchmarks/imagenet/resnet50/tico.py

IgorSusmelj added 2 commits January 11, 2024 15:22

Add TiCo benchmarks results and checkpoints

e499a9d

Update codebase and use naming from paper.

d97102b

IgorSusmelj force-pushed the igor-lig-3068-add-tico-imagenet-benchmark branch from a9c4979 to d97102b Compare January 11, 2024 14:27

Remove misleading comment about parameters for lr.

ccfbf24

IgorSusmelj merged commit deb3c31 into master Jan 11, 2024
8 checks passed

IgorSusmelj deleted the igor-lig-3068-add-tico-imagenet-benchmark branch January 11, 2024 14:41

guarin mentioned this pull request Aug 16, 2024

TiCo ImageNet Benchmark #1371

Closed
