https://github.com/wangz10/contrastive_loss
Experiments with supervised contrastive learning methods with different loss functions
- Host: GitHub
- URL: https://github.com/wangz10/contrastive_loss
- Owner: wangz10
- Created: 2020-04-29T03:12:35.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T09:42:37.000Z (almost 3 years ago)
- Last Synced: 2024-11-15T05:31:58.445Z (11 months ago)
- Topics: contrastive-learning, deep-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 1.49 MB
- Stars: 215
- Watchers: 7
- Forks: 35
- Open Issues: 17
Metadata Files:
- Readme: README.ipynb
README
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preliminary\n",
"\n",
"Let $\\mathbf{x}$ be the input feature vector and $y$ be its label. Let $f(\\cdot)$ be a encoder network mapping the input space to the latent space and $\\mathbf{z} = f(\\mathbf{x})$ be the latent vector. \n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Types of contrastive loss functions\n",
"\n",
"### 1. Max margin contrastive loss (Hadsell et al. 2006)\n",
"\n",
"$$ \\mathcal{L}(\\mathbf{z_i}, \\mathbf{z_j}) = \n",
"\\mathbb{1}_{y_i=y_j} \\left\\lVert \\mathbf{z_i} - \\mathbf{z_j} \\right\\rVert^2_2 + \n",
"\\mathbb{1}_{y_i \\neq y_j} \\max(0, m - \\left\\lVert \\mathbf{z_i} - \\mathbf{z_j} \\right\\rVert_2)^2$$\n",
"\n",
", where $m > 0$ is a margin. The margin imposes a lower bound on the distance between a pair of samples with different labels. \n",
"\n",
"### 2. Triplet loss (Weinberger et al. 2006)\n",
"\n",
"Triplet loss operates on a triplet of vectors whose labels follow $y_i = y_j$ and $y_i \\neq y_k$. That is to say two of the three ($\\mathbf{z_i}$ and $\\mathbf{z_j}$) shared the same label and a third vector $\\mathbf{z_k}$ has a different label. In triplet learning literatures, they are termed anchor, positive, and negative, respectively. Triplet loss is defined as:\n",
"\n",
"$$ \\mathcal{L}(\\mathbf{z_i}, \\mathbf{z_j}, \\mathbf{z_k}) = \n",
"\\max(0, \\left\\lVert \\mathbf{z_i} - \\mathbf{z_j} \\right\\rVert^2_2 - \n",
" \\left\\lVert \\mathbf{z_i} - \\mathbf{z_k} \\right\\rVert^2_2 + m)\n",
"$$\n",
", where $m$ again is the margin parameter that requires the delta distances between anchor-positive and anchor-negative has to be larger than $m$. The intuition for this loss function is to push negative samples outside of the neighborhood by a margin while keeping positive samples within the neighborhood. Graphically:\n",
"\n",
"\n",
"\n",
"#### Triplet mining\n",
"\n",
"Based on the definition of the triplet loss, a triplet may have the following three scenarios before any training: \n",
"- **easy**: triplets with a loss of 0 because the negative is already more than a margin away from the anchor than the positive, i.e. $ \\left\\lVert \\mathbf{z_i} - \\mathbf{z_j} \\right\\rVert^2_2 + m < \n",
" \\left\\lVert \\mathbf{z_i} - \\mathbf{z_k} \\right\\rVert^2_2 $\n",
"- **hard**: triplets where the negative is closer to the anchor than the positive, i.e. $ \\left\\lVert \\mathbf{z_i} - \\mathbf{z_j} \\right\\rVert^2_2 >\n",
" \\left\\lVert \\mathbf{z_i} - \\mathbf{z_k} \\right\\rVert^2_2$ \n",
"- **semi-hard**: triplets where the negative lies in the margin, i.e. $ \\left\\lVert \\mathbf{z_i} - \\mathbf{z_j} \\right\\rVert^2_2 <\n",
" \\left\\lVert \\mathbf{z_i} - \\mathbf{z_k} \\right\\rVert^2_2 < \\left\\lVert \\mathbf{z_i} - \\mathbf{z_j} \\right\\rVert^2_2 + m$\n",
"\n",
"In the FaceNet (Schroff et al. 2015) paper, which uses triplet loss to learn embeddings for faces, the authors argued that triplet mining is crucial for model performance and convergence. They also found that hardest triplets led to local minima early on in training, specifically resulted in a collapsed model, whereas semi-hard triplets yields more stable results and faster convergence.\n",
"\n",
"\n",
"\n",
"### 3. Multi-class N-pair loss (Sohn 2016)\n",
"\n",
"Multi-class N-pair loss is a generalization of triplet loss allowing joint comparison among more than one negative samples. When applied on a pair of positive samples $\\mathbf{z_i}$ and $\\mathbf{z_j}$ sharing the same label ($y_i = y_j$) from a mini-batch with $2N$ samples, it is computed as:\n",
"\n",
"$$ \\mathcal{L}(\\mathbf{z_i}, \\mathbf{z_j}) = \n",
"\\log(1+\\sum_{k=1}^{2N}{\\mathbb{1}_{k \\neq i} \\exp(\\mathbf{z_i} \\mathbf{z_k} - \\mathbf{z_i} \\mathbf{z_j})})\n",
"$$\n",
", where $z_i z_j$ is the cosine similarity between the two vectors. \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With some algebraic manipulation, multi-class N-pair loss can be written as the following:\n",
"\n",
"\\begin{equation}\n",
"\\begin{split}\n",
"\\mathcal{L}(\\mathbf{z_i}, \\mathbf{z_j}) & = \\log(1+\\sum_{k=1}^{2N}{\\mathbb{1}_{k \\neq i} \\exp(\\mathbf{z_i} \\mathbf{z_k} - \\mathbf{z_i} \\mathbf{z_j})}) \\\\\n",
" & = -\\log \\frac{1}{1+\\sum_{k=1}^{2N}{\\mathbb{1}_{k \\neq i} \\exp(\\mathbf{z_i} \\mathbf{z_k} - \\mathbf{z_i} \\mathbf{z_j})}} \\\\\n",
" & = -\\log \\frac{1}{1+\\sum_{k=1}^{2N}{\\mathbb{1}_{k \\neq i} \\frac{\\exp(\\mathbf{z_i} \\mathbf{z_k})}{\\exp(\\mathbf{z_i} \\mathbf{z_j})}}} \\\\\n",
" & = -\\log \\frac{\\exp(\\mathbf{z_i} \\mathbf{z_j})}{\\exp(\\mathbf{z_i} \\mathbf{z_j}) + \\sum_{k=1}^{2N}\\mathbb{1}_{k \\neq i} \\exp(\\mathbf{z_i} \\mathbf{z_k})}\n",
"\\end{split}\n",
"\\end{equation}\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Supervised NT-Xent loss (Khosla et al. 2020)\n",
"\n",
"- Self-supervised NT-xent loss (Chen et al. 2020 in SimCLR paper) \n",
"\n",
"NT-Xent is coined by Chen et al. 2020 and is short for normalized temperature-scaled cross entropy loss. It is a modification of Multi-class N-pair loss with addition of the temperature parameter ($\\tau$).\n",
"\n",
"$$\n",
"\\mathcal{L}(\\mathbf{z_i}, \\mathbf{z_j}) = \n",
"-\\log \\frac{\\exp(\\mathbf{z_i} \\mathbf{z_j} / \\tau)}{\\sum_{k=1}^{2N}{\\mathbb{1}_{k \\neq i} \\exp(\\mathbf{z_i} \\mathbf{z_k} / \\tau)}}\n",
"$$\n",
"\n",
"- Supervised NT-xent loss\n",
"\n",
"$$\n",
"\\mathcal{L}(\\mathbf{z_i}, \\mathbf{z_j}) = \n",
"\\frac{-1}{2N_{y_i}-1} \\sum_{j=1}^{2N} \\log \\frac{\\exp(\\mathbf{z_i} \\mathbf{z_j} / \\tau)}{\\sum_{k=1}^{2N}{\\mathbb{1}_{k \\neq i} \\exp(\\mathbf{z_i} \\mathbf{z_k} / \\tau)}}\n",
"$$\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# References\n",
"- [Hadsell, R., Chopra, S., & LeCun, Y. (2006, June). Dimensionality reduction by learning an invariant mapping.](http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf) In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (Vol. 2, pp. 1735-1742). IEEE.\n",
"- [Weinberger, K. Q., Blitzer, J., & Saul, L. K. (2006). Distance metric learning for large margin nearest neighbor classification.](https://papers.nips.cc/paper/2795-distance-metric-learning-for-large-margin-nearest-neighbor-classification.pdf) In Advances in neural information processing systems (pp. 1473-1480).\n",
"- [Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering.](https://arxiv.org/abs/1503.03832) In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 815-823).\n",
"- [Sohn, K. (2016). Improved deep metric learning with multi-class n-pair loss objective.](https://papers.nips.cc/paper/6200-improved-deep-metric-learning-with-multi-class-n-pair-loss-objective) In Advances in neural information processing systems (pp. 1857-1865).\n",
"- [Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations.](https://arxiv.org/pdf/2002.05709.pdf) arXiv preprint arXiv:2002.05709.\n",
"- [Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., ... & Krishnan, D. (2020). Supervised Contrastive Learning.](https://arxiv.org/pdf/2004.11362.pdf) arXiv preprint arXiv:2004.11362."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}