Add tico model #1398
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files: coverage diff

| | master | #1398 | +/- |
|----------|--------|--------|-----|
| Coverage | 85.49% | 85.50% | |
| Files | 135 | 135 | |
| Lines | 5655 | 5657 | +2 |
| Hits | 4835 | 4837 | +2 |
| Misses | 820 | 820 | |
This one will be tougher. I ran several experiments, and in all of them the loss goes up while the accuracy stays low. The training seems unstable. I varied some of the hyperparameters but could not get it to train.
Training on ImageNet does not work as expected. We made several modifications to the loss, but no matter what we do, the loss saturates at its maximum value and the accuracy stays at
Things we tried:
I'll do one more run with more in-depth logs to better isolate which parts of the loss go out of control :)
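For anyone following along, here is a minimal sketch of what logging the loss parts separately could look like. It is not the code from this PR: it uses NumPy instead of PyTorch, the function name `tico_loss_terms` is hypothetical, and the formulas follow my understanding of the TiCo paper (an invariance term plus a covariance contrast term built from a moving-average covariance matrix). The `beta` and `rho` defaults are placeholder values, not the benchmark settings.

```python
import numpy as np


def tico_loss_terms(z_a, z_b, C_prev, beta=0.9, rho=8.0):
    """Return (invariance, covariance_contrast, C_new) for one batch.

    Hypothetical helper for debugging; beta and rho are placeholder values.
    """
    # L2-normalize each embedding row.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    # Batch covariance estimate and its exponential moving average.
    B = z_a.T @ z_a / z_a.shape[0]
    C_new = beta * C_prev + (1.0 - beta) * B
    # Term 1: negative mean cosine similarity between the two views.
    invariance = -np.mean(np.sum(z_a * z_b, axis=1))
    # Term 2: penalizes embeddings that align with the covariance matrix.
    covariance_contrast = rho * np.mean(np.sum((z_a @ C_new) * z_a, axis=1))
    return invariance, covariance_contrast, C_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 8
    C = np.zeros((dim, dim))
    for step in range(3):
        z_a = rng.normal(size=(16, dim))
        z_b = z_a + 0.1 * rng.normal(size=(16, dim))  # stand-in for a second view
        inv, cov, C = tico_loss_terms(z_a, z_b, C)
        # Logging both terms separately shows which one drives the total.
        print(f"step={step} invariance={inv:.4f} covariance={cov:.4f} total={inv + cov:.4f}")
```

Tracking the two terms per step, rather than only their sum, should make it visible whether the covariance term or the invariance term is the one that blows up.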
@IgorSusmelj could you quickly summarize the changes you tried?
LGTM! Let's update the benchmarks once we figure out what is wrong with the model.
I'll also create follow-up issues to update BYOL and MoCo to use the correct backbone for evaluation.
| SimCLR\* + DCL | Res50 | 256 | 100 | 65.1 | 73.5 | 49.6 | [link](https://tensorboard.dev/experiment/k4ZonZ77QzmBkc0lXswQlg/) | [link](https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_dcl_2023-07-04_16-51-40/pretrain/version_0/checkpoints/epoch%3D99-step%3D500400.ckpt) |
| SimCLR\* + DCLW | Res50 | 256 | 100 | 64.5 | 73.2 | 48.5 | [link](https://tensorboard.dev/experiment/TrALnpwFQ4OkZV3uvaX7wQ/) | [link](https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_dclw_2023-07-07_14-57-13/pretrain/version_0/checkpoints/epoch%3D99-step%3D500400.ckpt) |
| SwAV | Res50 | 256 | 100 | 67.2 | 75.4 | 49.5 | [link](https://tensorboard.dev/experiment/Ipx4Oxl5Qkqm5Sl5kWyKKg) | [link](https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_swav_2023-05-25_08-29-14/pretrain/version_0/checkpoints/epoch%3D99-step%3D500400.ckpt) |
| TiCo | Res50 | 256 | 100 | 49.7 | 72.7 | 26.6 | - | [link](https://lightly-ssl-checkpoints.s3.amazonaws.com/imagenet_resnet50_tico_2024-01-07_18-40-57/pretrain/version_0/checkpoints/epoch%3D99-step%3D250200.ckpt) |
Interesting that the linear accuracy is so low. In the paper, they report a linear accuracy similar to that of the other methods.
Training for 100 epochs instead of 1000 epochs could be the reason for that.
a9c4979 to d97102b
* Add tico model
* Fix view
* Fix wrong hyperparam
* Fix hyperparam and make naming consistent
* Fix wrong loss
* Minor changes for debugging
* Cleanup
* Log individual losses. Detach B.
* Fix issues in code. Remove debugging logs.
* Add TiCo benchmarks results and checkpoints
* Update codebase and use naming from paper.
* Remove misleading comment about parameters for lr.
Changes
Note that I'll have to rebase this PR on master, as quite some time has passed since I started it.