https://github.com/lightly-ai/awesome-self-supervised-learning
Curated List of papers on Self-Supervised Representation Learning
- Host: GitHub
- URL: https://github.com/lightly-ai/awesome-self-supervised-learning
- Owner: lightly-ai
- Created: 2024-08-21T11:00:30.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-09-20T13:01:32.000Z (5 months ago)
- Last Synced: 2024-10-11T12:01:18.499Z (4 months ago)
- Topics: awesome-list, awesome-lists, self-supervised-learning
- Size: 11.7 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Awesome Self Supervised Learning [Awesome](https://github.com/sindresorhus/awesome) [Discord](https://discord.gg/xvNJW94)
Check out [Lightly**SSL**](https://github.com/lightly-ai/lightly), a computer vision framework for self-supervised learning by the team at [lightly.ai](https://www.lightly.ai/).
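Many of the contrastive methods listed below (SimCLR, MoCo, NNCLR, DCL) train with a normalized temperature-scaled cross-entropy (NT-Xent) loss. As a quick orientation, here is a minimal, dependency-free sketch of that loss; it is illustrative only and not the LightlySSL implementation:

```python
import math

def nt_xent_loss(z0, z1, temperature=0.5):
    """NT-Xent loss as popularized by SimCLR, standard library only.

    z0, z1: lists of embedding vectors (lists of floats), where z0[i]
    and z1[i] are embeddings of two augmented views of sample i.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    z = [normalize(v) for v in z0 + z1]  # stack both views
    n = len(z0)
    total_views = 2 * n

    def sim(i, j):
        # cosine similarity (vectors are unit-norm) scaled by temperature
        return sum(a * b for a, b in zip(z[i], z[j])) / temperature

    total = 0.0
    for i in range(total_views):
        j = (i + n) % total_views  # positive: same sample, other view
        denom = sum(math.exp(sim(i, k)) for k in range(total_views) if k != i)
        total += -math.log(math.exp(sim(i, j)) / denom)
    return total / total_views
```

Matching views of the same sample drive the loss down, while all other samples in the batch act as negatives, so a batch whose two views agree scores a lower loss than one whose views are shuffled.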
## 2024
| Title | Relevant Links |
|:-----|:--------------|
| [Scalable Pre-training of Large Autoregressive Image Models](https://arxiv.org/abs/2401.08541) | [Paper](https://arxiv.org/abs/2401.08541) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/aim.ipynb) |
| [SAM 2: Segment Anything in Images and Videos](https://arxiv.org/abs/2408.00714) | [Paper](https://arxiv.org/abs/2408.00714) [Google Drive](https://drive.google.com/file/d/1kWvZclajy7z3ize2KNCLzCfvZN2pDien/view?usp=sharing) |
| [Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach](https://arxiv.org/abs/2405.15613) | [Paper](https://arxiv.org/abs/2405.15613) |
| [GLID: Pre-training a Generalist Encoder-Decoder Vision Model](https://arxiv.org/abs/2404.07603) | [Paper](https://arxiv.org/abs/2404.07603) [Google Drive](https://drive.google.com/file/d/1CEaZ00z-0hqGKp5cTN8fxP6tsHiHkFye/view?usp=sharing) |
| [Rethinking Patch Dependence for Masked Autoencoders](https://arxiv.org/abs/2401.14391) | [Paper](https://arxiv.org/abs/2401.14391) [Google Drive](https://drive.google.com/file/d/1LtIPoes3y1ZOHD-UBeKgj9AYBoQ-nO5A/view?usp=sharing) [Code](https://github.com/TonyLianLong/CrossMAE) |
| [You Don't Need Data-Augmentation in Self-Supervised Learning](https://arxiv.org/abs/2406.09294) | [Paper](https://arxiv.org/abs/2406.09294) |
| [Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations?](https://arxiv.org/abs/2406.10743) | [Paper](https://arxiv.org/abs/2406.10743) |
| [Asymmetric Masked Distillation for Pre-Training Small Foundation Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Asymmetric_Masked_Distillation_for_Pre-Training_Small_Foundation_Models_CVPR_2024_paper.pdf) | [Paper](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Asymmetric_Masked_Distillation_for_Pre-Training_Small_Foundation_Models_CVPR_2024_paper.pdf) [Code](https://github.com/MCG-NJU/AMD) |
| [Revisiting Feature Prediction for Learning Visual Representations from Video](https://arxiv.org/abs/2404.08471) | [Paper](https://arxiv.org/abs/2404.08471) [Code](https://github.com/facebookresearch/jepa) |
| [ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning](https://arxiv.org/abs/2405.15160) | [Paper](https://arxiv.org/abs/2405.15160) |

## 2023
| Title | Relevant Links |
|:-----|:--------------|
| [A Cookbook of Self-Supervised Learning](https://arxiv.org/abs/2304.12210) | [Paper](https://arxiv.org/abs/2304.12210) |
| [Masked Autoencoders Enable Efficient Knowledge Distillers](https://arxiv.org/abs/2208.12256) | [Paper](https://arxiv.org/abs/2208.12256) [Google Drive](https://drive.google.com/file/d/1bzuOab5fvKK7jpxv5bMoGk1gW446SCUL/view?usp=sharing) |
| [Understanding and Generalizing Contrastive Learning from the Inverse Optimal Transport Perspective](https://openreview.net/forum?id=DBlWCsOy94) | [Paper](https://openreview.net/forum?id=DBlWCsOy94) [Google Drive](https://drive.google.com/file/d/1hBEy-yh_KtkqY3rjeato-Cuo6ITzhowr/view?usp=sharing) |
| [CycleCL: Self-supervised Learning for Periodic Videos](https://arxiv.org/abs/2311.03402) | [Paper](https://arxiv.org/abs/2311.03402) [Google Drive](https://drive.google.com/file/d/1BDC891HX_JxF84UK_x8RKgHZockJqQFU/view?usp=sharing) |
| [Temperature Schedules for Self-Supervised Contrastive Methods on Long-Tail Data](https://arxiv.org/abs/2303.13664) | [Paper](https://arxiv.org/abs/2303.13664) [Google Drive](https://drive.google.com/file/d/1RabJuwtOevH9hg9wuFTN4z8y4gjQxCT_/view?usp=sharing) |
| [Reverse Engineering Self-Supervised Learning](https://arxiv.org/abs/2305.15614) | [Paper](https://arxiv.org/abs/2305.15614) [Google Drive](https://drive.google.com/file/d/1KsqV9_HE0y0EwlNivUdZPKqqCkdM-4HB/view?usp=sharing) |
| [Improved baselines for vision-language pre-training](https://arxiv.org/abs/2305.08675) | [Paper](https://arxiv.org/abs/2305.08675) [Google Drive](https://drive.google.com/file/d/1CNLvxt1jri7chCGy2ZqXBzDwPko0s6QP/view?usp=sharing) [Code](https://github.com/facebookresearch/clip-rocket) |
| [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193) | [Paper](https://arxiv.org/abs/2304.07193) [Google Drive](https://drive.google.com/file/d/11szszgtsYESO3QF8jkFsLFTVtN797uH2/view?usp=sharing) |
| [Segment Anything](https://arxiv.org/abs/2304.02643) | [Paper](https://arxiv.org/abs/2304.02643) [Google Drive](https://drive.google.com/file/d/18yPuL8J6boi5pB1NRO6VAUbYEwmI3tFo/view?usp=sharing) |
| [Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture](https://arxiv.org/abs/2301.08243) | [Paper](https://arxiv.org/abs/2301.08243) [Google Drive](https://drive.google.com/file/d/1l5nHxqqbv7o3ESw3DLBqgJyXILJ0FgH6/view?usp=sharing) |
| [Self-supervised Object-Centric Learning for Videos](https://arxiv.org/abs/2310.06907) | [Paper](https://arxiv.org/abs/2310.06907) |
| [Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution](https://proceedings.neurips.cc/paper_files/paper/2023/file/06ea400b9b7cfce6428ec27a371632eb-Paper-Conference.pdf) | [Paper](https://proceedings.neurips.cc/paper_files/paper/2023/file/06ea400b9b7cfce6428ec27a371632eb-Paper-Conference.pdf) |
| [An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization](https://proceedings.neurips.cc/paper_files/paper/2023/file/6b1d4c03391b0aa6ddde0b807a78c950-Paper-Conference.pdf) | [Paper](https://proceedings.neurips.cc/paper_files/paper/2023/file/6b1d4c03391b0aa6ddde0b807a78c950-Paper-Conference.pdf) |
| [The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning](https://arxiv.org/abs/2307.10907) | [Paper](https://arxiv.org/abs/2307.10907) [Code](https://github.com/apple/ml-entropy-reconstruction) |
| [Fast Segment Anything](https://arxiv.org/abs/2306.12156) | [Paper](https://arxiv.org/abs/2306.12156) [Code](https://github.com/CASIA-IVA-Lab/FastSAM) |
| [Faster Segment Anything: Towards Lightweight SAM for Mobile Applications](https://arxiv.org/abs/2306.14289) | [Paper](https://arxiv.org/abs/2306.14289) [Code](https://github.com/ChaoningZhang/MobileSAM) |
| [What Do Self-Supervised Vision Transformers Learn?](https://arxiv.org/abs/2305.00729) | [Paper](https://arxiv.org/abs/2305.00729) [Code](https://github.com/naver-ai/cl-vs-mim) |
| [Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need](https://arxiv.org/abs/2303.15256) | [Paper](https://arxiv.org/abs/2303.15256) |
| [EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything](https://arxiv.org/abs/2312.00863) | [Paper](https://arxiv.org/abs/2312.00863) [Code](https://github.com/yformer/EfficientSAM) |
| [DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions](https://arxiv.org/abs/2309.03576) | [Paper](https://arxiv.org/abs/2309.03576) [Code](https://github.com/Haochen-Wang409/DropPos) |
| [VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_VideoMAE_V2_Scaling_Video_Masked_Autoencoders_With_Dual_Masking_CVPR_2023_paper.pdf) | [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_VideoMAE_V2_Scaling_Video_Masked_Autoencoders_With_Dual_Masking_CVPR_2023_paper.pdf) |
| [MGMAE: Motion Guided Masking for Video Masked Autoencoding](https://openaccess.thecvf.com/content/ICCV2023/papers/Huang_MGMAE_Motion_Guided_Masking_for_Video_Masked_Autoencoding_ICCV_2023_paper.pdf) | [Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Huang_MGMAE_Motion_Guided_Masking_for_Video_Masked_Autoencoding_ICCV_2023_paper.pdf) [Code](https://github.com/MCG-NJU/MGMAE) |

## 2022
| Title | Relevant Links |
|:-----|:--------------|
| [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) | [Paper](https://arxiv.org/abs/2204.07141) [Google Drive](https://drive.google.com/file/d/15WGpYpxy4_1a927RWrmlkeJohZDznN8e/view?usp=sharing) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/msn.ipynb) |
| [The Hidden Uniform Cluster Prior in Self-Supervised Learning](https://arxiv.org/abs/2210.07277) | [Paper](https://arxiv.org/abs/2210.07277) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/pmsn.ipynb) |
| [Unsupervised Visual Representation Learning by Synchronous Momentum Grouping](https://arxiv.org/abs/2207.06167) | [Paper](https://arxiv.org/abs/2207.06167) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/smog.ipynb) |
| [TiCo: Transformation Invariance and Covariance Contrast for Self-Supervised Visual Representation Learning](https://arxiv.org/abs/2206.10698) | [Paper](https://arxiv.org/abs/2206.10698) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/tico.ipynb) |
| [VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning](https://arxiv.org/abs/2105.04906) | [Paper](https://arxiv.org/abs/2105.04906) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/vicreg.ipynb) |
| [VICRegL: Self-Supervised Learning of Local Visual Features](https://arxiv.org/abs/2210.01571) | [Paper](https://arxiv.org/abs/2210.01571) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/vicregl.ipynb) |
| [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) | [Paper](https://arxiv.org/abs/2203.12602) [Google Drive](https://drive.google.com/file/d/1F0oyiyyxCKzWS9Gv8TssHxaCMFnAoxfb/view?usp=sharing) |
| [Improving Visual Representation Learning through Perceptual Understanding](https://arxiv.org/abs/2212.14504) | [Paper](https://arxiv.org/abs/2212.14504) [Google Drive](https://drive.google.com/file/d/1n4Y0iiM368RaPxPg6qvsfACguaolFnhf/view?usp=sharing) |
| [RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank](https://arxiv.org/abs/2210.02885) | [Paper](https://arxiv.org/abs/2210.02885) [Google Drive](https://drive.google.com/file/d/1cEP1_G2wMM3-AMMrdntGN6Fq1E5qwPi1/view?usp=sharing) |
| [A Closer Look at Self-Supervised Lightweight Vision Transformers](https://arxiv.org/abs/2205.14443) | [Paper](https://arxiv.org/abs/2205.14443) [Code](https://github.com/wangsr126/mae-lite) |
| [Beyond neural scaling laws: beating power law scaling via data pruning](https://arxiv.org/abs/2206.14486) | [Paper](https://arxiv.org/abs/2206.14486) [Code](https://github.com/rgeirhos/dataset-pruning-metrics) |
| [A simple, efficient and scalable contrastive masked autoencoder for learning visual representations](https://arxiv.org/abs/2210.16870) | [Paper](https://arxiv.org/abs/2210.16870) |
| [Masked Autoencoders are Robust Data Augmentors](https://arxiv.org/abs/2206.04846) | [Paper](https://arxiv.org/abs/2206.04846) |
| [Is Self-Supervised Learning More Robust Than Supervised Learning?](https://arxiv.org/abs/2206.05259) | [Paper](https://arxiv.org/abs/2206.05259) |
| [Can CNNs Be More Robust Than Transformers?](https://arxiv.org/abs/2206.03452) | [Paper](https://arxiv.org/abs/2206.03452) [Code](https://github.com/UCSC-VLAA/RobustCNN) |
| [Patch-level Representation Learning for Self-supervised Vision Transformers](https://arxiv.org/abs/2206.07990) | [Paper](https://arxiv.org/abs/2206.07990) [Code](https://github.com/alinlab/selfpatch) |

## 2021
| Title | Relevant Links |
|:-----|:--------------|
| [Barlow Twins: Self-Supervised Learning via Redundancy Reduction](https://arxiv.org/abs/2103.03230) | [Paper](https://arxiv.org/abs/2103.03230) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/barlowtwins.ipynb) |
| [Decoupled Contrastive Learning](https://arxiv.org/abs/2110.06848) | [Paper](https://arxiv.org/abs/2110.06848) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/dcl.ipynb) |
| [Dense Contrastive Learning for Self-Supervised Visual Pre-Training](https://arxiv.org/abs/2011.09157) | [Paper](https://arxiv.org/abs/2011.09157) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/densecl.ipynb) |
| [Emerging Properties in Self-Supervised Vision Transformers](https://arxiv.org/abs/2104.14294) | [Paper](https://arxiv.org/abs/2104.14294) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/dino.ipynb) |
| [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) | [Paper](https://arxiv.org/abs/2111.06377) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/mae.ipynb) |
| [With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations](https://arxiv.org/abs/2104.14548) | [Paper](https://arxiv.org/abs/2104.14548) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/nnclr.ipynb) |
| [SimMIM: A Simple Framework for Masked Image Modeling](https://arxiv.org/abs/2111.09886) | [Paper](https://arxiv.org/abs/2111.09886) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/simmim.ipynb) |
| [Exploring Simple Siamese Representation Learning](https://arxiv.org/abs/2011.10566) | [Paper](https://arxiv.org/abs/2011.10566) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/simsiam.ipynb) |
| [When Does Contrastive Visual Representation Learning Work?](https://arxiv.org/abs/2105.05837) | [Paper](https://arxiv.org/abs/2105.05837) |
| [Efficient Visual Pretraining with Contrastive Detection](https://arxiv.org/abs/2103.10957) | [Paper](https://arxiv.org/abs/2103.10957) |

## 2020
| Title | Relevant Links |
|:-----|:--------------|
| [Bootstrap your own latent: A new approach to self-supervised Learning](https://arxiv.org/abs/2006.07733) | [Paper](https://arxiv.org/abs/2006.07733) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/byol.ipynb) |
| [A Simple Framework for Contrastive Learning of Visual Representations](https://arxiv.org/abs/2002.05709) | [Paper](https://arxiv.org/abs/2002.05709) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/simclr.ipynb) |
| [Unsupervised Learning of Visual Features by Contrasting Cluster Assignments](https://arxiv.org/abs/2006.09882) | [Paper](https://arxiv.org/abs/2006.09882) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/swav.ipynb) |

## 2019
| Title | Relevant Links |
|:-----|:--------------|
| [Momentum Contrast for Unsupervised Visual Representation Learning](https://arxiv.org/abs/1911.05722) | [Paper](https://arxiv.org/abs/1911.05722) [Colab](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/moco.ipynb) |

## 2018
| Title | Relevant Links |
|:-----|:--------------|
| [Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination](https://arxiv.org/abs/1805.01978) | [Paper](https://arxiv.org/abs/1805.01978) |

## 2016
| Title | Relevant Links |
|:-----|:--------------|
| [Context Encoders: Feature Learning by Inpainting](https://arxiv.org/abs/1604.07379) | [Paper](https://arxiv.org/abs/1604.07379) |
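Several entries above (MoCo, 2019; BYOL, 2020; Masked Siamese Networks, 2022) rely on a target network whose weights are an exponential moving average (EMA) of the online network. The update itself is one line; a minimal, framework-free sketch (operating on plain lists of floats, not real model parameters):

```python
def momentum_update(online_params, target_params, m=0.99):
    """EMA update of target-network parameters, as used by MoCo/BYOL.

    Each target parameter moves a small step (1 - m) toward its online
    counterpart; with m close to 1, the target evolves slowly and
    provides stable training signals.
    """
    return [m * t + (1.0 - m) * o for o, t in zip(online_params, target_params)]
```

The momentum coefficient `m` trades stability for responsiveness: MoCo and BYOL report values around 0.99 to 0.999, and BYOL additionally anneals `m` toward 1 over training.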