This repo aims to record papers related to NLP and contrastive learning.
Papers | Conference | Codes |
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval | ICLR 2021 | |
Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-identification | CVPR 2017 | |
Contrastive Learning with Hard Negative Samples | ICLR 2021 | |
Deep Metric Learning: A Survey | Symmetry 2019 | |
Metric Learning: A Survey | FTML 2013 | |
Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency | 2018 | |
Noise-contrastive estimation: A new estimation principle for unnormalized statistical models | AISTATS 2010 | |
Not All Samples Are Created Equal: Deep Learning with Importance Sampling | PMLR 2018 | |
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines | ICLR 2021 | |
Online Learning to Sample | 2015 | |
Optimizing Dense Retrieval Model Training with Hard Negatives | ACM 2021 | |
Representation Learning with Contrastive Predictive Coding | 2018 | |
Rethinking InfoNCE: How Many Negative Samples Do You Need? | 2021 | |
Training Deep Models Faster with Robust, Approximate Importance Sampling | NIPS 2018 | |
Understanding Hard Negatives in Noise Contrastive Estimation | 2021 | |
Variance Reduction in SGD by Distributed Importance Sampling | ICLR 2016 | |
More Robust Dense Retrieval with Contrastive Dual Learning | ACM 2021 | |
Contrastive Representation Learning: A Framework and Review | IEEE 2020 | |
Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere | ICML 2020 | ⭐️ |
A Simple Framework for Contrastive Learning of Visual Representations | ICML 2020 | |
Intriguing Properties of Contrastive Losses | CoRR 2020 | |
Representation Learning with Contrastive Predictive Coding | 2018 | |
On Mutual Information Maximization for Representation Learning | ICLR 2020 | |
SimCSE: Simple Contrastive Learning of Sentence Embeddings | 2021 | |
Understanding the Behaviour of Contrastive Loss | CVPR 2021 | |
A theoretical analysis of contrastive unsupervised representation learning | ICML 2019 | |
Decoupled Contrastive Learning | 2021 | |
Papers | Conference | Comments |
NormFace: L2 Hypersphere Embedding for Face Verification | 2017 | ⭐️ |
ImageNet classification with deep convolutional neural networks | NIPS 2012 | Local Response Normalization and Local Contrast Normalization |
Batch normalization: Accelerating deep network training by reducing internal covariate shift | 2015 | |
Layer normalization | 2016 | |
Weight normalization: A simple reparameterization to accelerate training of deep neural networks | NIPS 2016 | |
Local similarity-aware deep feature embedding | NIPS 2016 | |
Deep metric learning via lifted structured feature embedding | IEEE 2016 | |
Improved deep metric learning with multi-class n-pair loss objective | NIPS 2016 | |
How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings | EMNLP 2019 | |
On the Sentence Embeddings from Pre-trained Language Models | EMNLP 2020 | |
Representation Degeneration Problem in Training Natural Language Generation Models | ICLR 2019 | |
Improving Neural Language Generation with Spectrum Control | ICLR 2020 | |
Universally optimal distribution of points on spheres | 2007 | |
I also cover von Mises-Fisher (vMF) distributions because, from a statistical point of view, softmax loss with L2 normalization can be interpreted as a von Mises-Fisher model.
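The remark above can be checked numerically. The following is a minimal sketch, not taken from any of the listed papers, and it assumes equal class priors and a single shared concentration `kappa` (which plays the role of an inverse temperature). Under those assumptions the vMF normalizing constant cancels, so the class posterior of a vMF mixture equals a softmax over scaled cosine similarities. All names (`vmf_density_3d`, `cosine_softmax`, the toy vectors) are hypothetical, and the example is restricted to 3 dimensions only because the normalizer has a simple closed form there.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def vmf_density_3d(x, mu, kappa):
    """vMF density on the unit sphere in R^3: C_3(kappa) * exp(kappa * mu.x)."""
    c3 = kappa / (4.0 * math.pi * math.sinh(kappa))
    return c3 * math.exp(kappa * dot(mu, x))


def vmf_posterior(x, mus, kappa):
    """Class posterior under an equal-prior, shared-kappa vMF mixture (assumption)."""
    densities = [vmf_density_3d(x, mu, kappa) for mu in mus]
    total = sum(densities)
    return [d / total for d in densities]


def cosine_softmax(x, mus, kappa):
    """Softmax over cosine similarities scaled by kappa."""
    logits = [kappa * dot(mu, x) for mu in mus]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy data: one feature vector and three class weight vectors.
feature = l2_normalize([0.2, 1.0, -0.5])
class_weights = [
    l2_normalize(w)
    for w in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.3, 0.4, -1.0])
]
kappa = 5.0

posterior = vmf_posterior(feature, class_weights, kappa)
softmax_probs = cosine_softmax(feature, class_weights, kappa)

print("vMF posterior :", posterior)
print("cosine softmax:", softmax_probs)
```

The two printed vectors should agree up to floating-point error. If the classes had different concentrations or priors, the normalizers would no longer cancel and the correspondence would need extra bias terms.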
Papers | Conference | Comments |
Clustering on the Unit Hypersphere using von Mises-Fisher Distributions | JMLR 2005 | |
Von Mises-Fisher Clustering Models | PMLR 2014 | |
Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks | 2017 | |
von Mises-Fisher Mixture Model-based Deep learning: Application to Face Verification | 2017 | |
- On mutual information maximization for representation learning. 2019
- Learning deep representations by mutual information estimation and maximization. 2018
- Learning representations by maximizing mutual information across views. 2019
- Contrastive multiview coding. 2019