https://github.com/rosinality/ml-papers
My collection of machine learning papers
- Host: GitHub
- URL: https://github.com/rosinality/ml-papers
- Owner: rosinality
- License: MIT
- Created: 2021-05-10T13:55:31.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2023-08-10T06:15:19.000Z (almost 2 years ago)
- Last Synced: 2025-04-28T13:08:44.958Z (about 1 month ago)
- Size: 1.14 MB
- Stars: 283
- Watchers: 30
- Forks: 23
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# ML Papers
### Reviews
1. [191210 Thoughts on recent papers](papers/reviews/191210%20Thoughts%20on%20recent%20papers.md)
1. [200323 Thoughts on recent papers](papers/reviews/200323%20Thoughts%20on%20recent%20papers.md)
1. [200326 Thoughts on recent papers](papers/reviews/200326%20Thoughts%20on%20recent%20papers.md)
1. [200403 Thoughts on recent papers](papers/reviews/200403%20Thoughts%20on%20recent%20papers.md)
1. [200411 Thoughts on recent papers](papers/reviews/200411%20Thoughts%20on%20recent%20papers.md)
1. [200708 Thoughts on recent papers](papers/reviews/200708%20Thoughts%20on%20recent%20papers.md)
1. [200717 Thoughts on recent papers](papers/reviews/200717%20Thoughts%20on%20recent%20papers.md)
1. [200726 Thoughts on recent papers](papers/reviews/200726%20Thoughts%20on%20recent%20papers.md)
1. [200802 Thoughts on recent papers](papers/reviews/200802%20Thoughts%20on%20recent%20papers.md)
1. [201118 Thoughts on recent papers](papers/reviews/201118%20Thoughts%20on%20recent%20papers.md)
1. [201120 Thoughts on recent papers](papers/reviews/201120%20Thoughts%20on%20recent%20papers.md)
1. [201125 Thoughts on recent papers](papers/reviews/201125%20Thoughts%20on%20recent%20papers.md)
1. [201126 Thoughts on recent papers 1](papers/reviews/201126%20Thoughts%20on%20recent%20papers%201.md)
1. [201126 Thoughts on recent papers 2](papers/reviews/201126%20Thoughts%20on%20recent%20papers%202.md)
1. [201204 Thoughts on recent papers](papers/reviews/201204%20Thoughts%20on%20recent%20papers.md)
1. [210121 Thoughts on recent papers](papers/reviews/210121%20Thoughts%20on%20recent%20papers.md)
1. [210227 Thoughts on recent papers](papers/reviews/210227%20Thoughts%20on%20recent%20papers.md)
1. [210305 Thoughts on recent papers](papers/reviews/210305%20Thoughts%20on%20recent%20papers.md)
1. [210319 Thoughts on recent papers](papers/reviews/210319%20Thoughts%20on%20recent%20papers.md)
1. [210323 Thoughts on recent papers](papers/reviews/210323%20Thoughts%20on%20recent%20papers.md)
1. [210326 Thoughts on recent papers](papers/reviews/210326%20Thoughts%20on%20recent%20papers.md)
1. [210403 Thoughts on recent papers](papers/reviews/210403%20Thoughts%20on%20recent%20papers.md)
1. [210412 Thoughts on recent papers](papers/reviews/210412%20Thoughts%20on%20recent%20papers.md)
1. [210424 Thoughts on recent papers](papers/reviews/210424%20Thoughts%20on%20recent%20papers.md)
1. [210429 Thoughts on recent papers](papers/reviews/210429%20Thoughts%20on%20recent%20papers.md)
1. [210430 Thoughts on recent papers 1](papers/reviews/210430%20Thoughts%20on%20recent%20papers%201.md)
1. [210430 Thoughts on recent papers 2](papers/reviews/210430%20Thoughts%20on%20recent%20papers%202.md)
1. [210505 Thoughts on recent papers](papers/reviews/210505%20Thoughts%20on%20recent%20papers.md)
1. [210508 Thoughts on recent papers](papers/reviews/210508%20Thoughts%20on%20recent%20papers.md)
1. [230222 A review of datasets needed for LLMs](papers/reviews/llm-dataset.md)
## Table of contents
1. [3d generative model](#3d-generative-model)
1. [activation](#activation)
1. [active learning](#active-learning)
1. [adaptation](#adaptation)
1. [adapter](#adapter)
1. [adversarial training](#adversarial-training)
1. [alignment](#alignment)
1. [antialiasing](#antialiasing)
1. [asr](#asr)
1. [attention](#attention)
1. [audio generation](#audio-generation)
1. [audio source separation](#audio-source-separation)
1. [augmentation](#augmentation)
1. [autoregressive model](#autoregressive-model)
1. [backbone](#backbone)
1. [bayesian](#bayesian)
1. [benchmark](#benchmark)
1. [bert](#bert)
1. [bias](#bias)
1. [calibration](#calibration)
1. [causality](#causality)
1. [channel attention](#channel-attention)
1. [chat](#chat)
1. [classification](#classificiation)
1. [clip](#clip)
1. [computation](#computation)
1. [continual learning](#continual-learning)
1. [contrastive learning](#contrastive-learning)
1. [convolution](#convolution)
1. [dataset](#dataset)
1. [ddpm](#ddpm)
1. [decoding](#decoding)
1. [deep prior](#deep-prior)
1. [detr](#detr)
1. [dewarping](#dewarping)
1. [dialog](#dialog)
1. [differentiable operator](#differentiable-operator)
1. [differentiable tree](#differentiable-tree)
1. [discrete vae](#discrete-vae)
1. [disentangle](#disentangle)
1. [distillation](#distillation)
1. [distributed training](#distributed-training)
1. [domain adaptation](#domain-adaptation)
1. [dropout](#dropout)
1. [efficiency](#efficiency)
1. [efficient attention](#efficient-attention)
1. [efficient training](#efficient-training)
1. [embedding](#embedding)
1. [end2end](#end2end)
1. [energy based model](#energy-based-model)
1. [ensemble](#ensemble)
1. [federated learning](#federated-learning)
1. [few shot](#few-shot)
1. [finetuning](#finetuning)
1. [flow](#flow)
1. [fpn](#fpn)
1. [gan](#gan)
1. [gan inversion](#gan-inversion)
1. [generalization](#generalization)
1. [generative model](#generative-model)
1. [graph](#graph)
1. [hallucination](#hallucination)
1. [hypernetwork](#hypernetwork)
1. [hyperparameter](#hyperparameter)
1. [identifiability](#identifiability)
1. [image editing](#image-editing)
1. [image generation](#image-generation)
1. [img2img](#img2img)
1. [implicit model](#implicit-model)
1. [implicit representation](#implicit-representation)
1. [in context learning](#in-context-learning)
1. [instance segmentation](#instance-segmentation)
1. [instruct](#instruct)
1. [interpolation](#interpolation)
1. [knowledge base](#knowledge-base)
1. [language generation](#language-generation)
1. [language model](#language-model)
1. [layout](#layout)
1. [lightweight](#lightweight)
1. [line](#line)
1. [linear attention](#linear-attention)
1. [llm](#llm)
1. [lm](#lm)
1. [local attention](#local-attention)
1. [loss](#loss)
1. [loss surface](#loss-surface)
1. [matting](#matting)
1. [memory](#memory)
1. [meta learning](#meta-learning)
1. [metric](#metric)
1. [metric learning](#metric-learning)
1. [mixture of experts](#mixture-of-experts)
1. [mixup](#mixup)
1. [mlm](#mlm)
1. [mlops](#mlops)
1. [moe](#moe)
1. [multilingual](#multilingual)
1. [multimodal](#multimodal)
1. [multimodal generation](#multimodal-generation)
1. [multitask](#multitask)
1. [nas](#nas)
1. [nerf](#nerf)
1. [neural computer](#neural-computer)
1. [neural ode](#neural-ode)
1. [neural rendering](#neural-rendering)
1. [nlp](#nlp)
1. [nmt](#nmt)
1. [non autoregressive](#non-autoregressive)
1. [norm free](#norm-free)
1. [normalization](#normalization)
1. [object detection](#object-detection)
1. [ocr](#ocr)
1. [open set recognition](#open-set-recognition)
1. [optimization](#optimization)
1. [optimizer](#optimizer)
1. [oriented object detection](#oriented-object-detection)
1. [out of distribution](#out-of-distribution)
1. [panoptic segmentation](#panoptic-segmentation)
1. [perceptual loss](#perceptual-loss)
1. [point cloud](#point-cloud)
1. [pooling](#pooling)
1. [pose](#pose)
1. [positional encoding](#positional-encoding)
1. [practice](#practice)
1. [pretraining](#pretraining)
1. [probabilistic model](#probabilistic-model)
1. [prompt](#prompt)
1. [pruning](#pruning)
1. [qa](#qa)
1. [quantization](#quantization)
1. [reasoning](#reasoning)
1. [recommender](#recommender)
1. [regularization](#regularization)
1. [reinforcement learning](#reinforcement-learning)
1. [rendering](#rendering)
1. [representation](#representation)
1. [resampling](#resampling)
1. [restoration](#restoration)
1. [retrieval](#retrieval)
1. [review](#review)
1. [rl](#rl)
1. [robustness](#robustness)
1. [saliency](#saliency)
1. [salient object detection](#salient-object-detection)
1. [scale](#scale)
1. [score](#score)
1. [self supervised](#self-supervised)
1. [self supervised discovery](#self-supervised-discovery)
1. [semantic factor](#semantic-factor)
1. [semantic segmentation](#semantic-segmentation)
1. [semi supervised learning](#semi-supervised-learning)
1. [seq2seq](#seq2seq)
1. [sgld](#sgld)
1. [singing voice synthesis](#singing-voice-synthesis)
1. [single image](#single-image)
1. [speech](#speech)
1. [state space model](#state-space-model)
1. [structure learning](#structure-learning)
1. [style transfer](#style-transfer)
1. [stylegan](#stylegan)
1. [super resolution](#super-resolution)
1. [table](#table)
1. [text generation](#text-generation)
1. [text2img](#text2img)
1. [tokenizer](#tokenizer)
1. [topic model](#topic-model)
1. [topology](#topology)
1. [tracking](#tracking)
1. [training](#training)
1. [transducer](#transducer)
1. [transfer](#transfer)
1. [transformer](#transformer)
1. [tropical geometry](#tropical-geometry)
1. [tts](#tts)
1. [uncertainty](#uncertainty)
1. [unsupervised img2img](#unsupervised-img2img)
1. [unsupervised nmt](#unsupervised-nmt)
1. [vae](#vae)
1. [video](#video)
1. [video transformer](#video-transformer)
1. [vision](#vision)
1. [vision language](#vision-language)
1. [vision transformer](#vision-transformer)
1. [visual grounding](#visual-grounding)
1. [vit](#vit)
1. [vocoder](#vocoder)
1. [vq](#vq)
1. [vqa](#vqa)
1. [weak supervision](#weak-supervision)
1. [yolo](#yolo)
1. [uncategorized](#uncategorized)
## 3d generative model
1. [211220 3D-aware Image Synthesis via Learning Structural and Textural Representations](papers/2021/211220%203D-aware%20Image%20Synthesis%20via%20Learning%20Structural%20and%20Textural%20Representations.md)
1. [220615 GRAM-HD](papers/2022/220615%20GRAM-HD.md)
1. [220621 EpiGRAF](papers/2022/220621%20EpiGRAF.md)
1. [221126 AvatarGen](papers/2022/221126%20AvatarGen.md)
1. [230209 In-N-Out](papers/2023/230209%20In-N-Out.md) #gan_inversion
1. [230216 3D-aware Conditional Image Synthesis](papers/2023/230216%203D-aware%20Conditional%20Image%20Synthesis.md)
1. [230302 3D generation on ImageNet](papers/2023/230302%203D%20generation%20on%20ImageNet.md)
1. [230627 Free-style and Fast 3D Portrait Synthesis](papers/2023/230627%20Free-style%20and%20Fast%203D%20Portrait%20Synthesis.md)
1. [230630 Magic123](papers/2023/230630%20Magic123.md)
## activation
1. [201019 Smooth activations and reproducibility in deep networks](papers/2020/201019%20Smooth%20activations%20and%20reproducibility%20in%20deep%20networks.md) #stability
## active learning
1. [200630 Similarity Search for Efficient Active Learning and Search of Rare](papers/2020/200630%20Similarity%20Search%20for%20Efficient%20Active%20Learning%20and%20Search%20of%20Rare.md)
1. [210729 Batch Active Learning at Scale](papers/2021/210729%20Batch%20Active%20Learning%20at%20Scale.md)
## adaptation
1. [200129 Side-Tuning](papers/2020/200129%20Side-Tuning.md)
1. [200130 Once for All](papers/2020/200130%20Once%20for%20All.md) #deploy
## adapter
1. [210608 Compacter](papers/2021/210608%20Compacter.md)
1. [220524 AdaMix](papers/2022/220524%20AdaMix.md) #moe
## adversarial training
1. [200130 Adversarial Examples Improve Image Recognition](papers/2020/200130%20Adversarial%20Examples%20Improve%20Image%20Recognition.md)
1. [200625 Smooth Adversarial Training](papers/2020/200625%20Smooth%20Adversarial%20Training.md)
## alignment
1. [230504 Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision](papers/2023/230504%20Principle-Driven%20Self-Alignment%20of%20Language%20Models%20from%20Scratch%20with%20Minimal%20Human%20Supervision.md)
1. [230517 LeTI](papers/2023/230517%20LeTI.md) #prompt
1. [230517 SLiC-HF](papers/2023/230517%20SLiC-HF.md)
1. [230518 LIMA](papers/2023/230518%20LIMA.md)
1. [230526 Training Socially Aligned Language Models in Simulated Human Society](papers/2023/230526%20Training%20Socially%20Aligned%20Language%20Models%20in%20Simulated%20Human%20Society.md)
1. [230529 Direct Preference Optimization](papers/2023/230529%20Direct%20Preference%20Optimization.md)
1. [230607 How Far Can Camels Go](papers/2023/230607%20How%20Far%20Can%20Camels%20Go.md)
1. [230625 Is RLHF More Difficult than Standard RL](papers/2023/230625%20Is%20RLHF%20More%20Difficult%20than%20Standard%20RL.md) #rl
1. [230628 Towards Measuring the Representation of Subjective Global Opinions in Language Models](papers/2023/230628%20Towards%20Measuring%20the%20Representation%20of%20Subjective%20Global%20Opinions%20in%20Language%20Models.md)
1. [230630 Preference Ranking Optimization for Human Alignment](papers/2023/230630%20Preference%20Ranking%20Optimization%20for%20Human%20Alignment.md)
1. [230705 Jailbroken](papers/2023/230705%20Jailbroken.md)
1. [230711 Secrets of RLHF in Large Language Models Part I](papers/2023/230711%20Secrets%20of%20RLHF%20in%20Large%20Language%20Models%20Part%20I.md) #reinforcement_learning
1. [230717 AlpaGasus](papers/2023/230717%20AlpaGasus.md)
1. [230720 FLASK](papers/2023/230720%20FLASK.md) #benchmark
1. [230727 Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback](papers/2023/230727%20Open%20Problems%20and%20Fundamental%20Limitations%20of%20Reinforcement%20Learning%20from%20Human%20Feedback.md)
1. [230727 PanGu-Coder2](papers/2023/230727%20PanGu-Coder2.md)
1. [230731 ToolLLM](papers/2023/230731%20ToolLLM.md)
1. [230801 Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models](papers/2023/230801%20Tool%20Documentation%20Enables%20Zero-Shot%20Tool-Usage%20with%20Large%20Language%20Models.md)
1. [230807 TPTU](papers/2023/230807%20TPTU.md)
1. [230808 Shepherd](papers/2023/230808%20Shepherd.md)
## antialiasing
1. [201120 An Effective Anti-Aliasing Approach for Residual Networks](papers/2020/201120%20An%20Effective%20Anti-Aliasing%20Approach%20for%20Residual%20Networks.md)
1. [201128 Truly shift-invariant convolutional neural networks](papers/2020/201128%20Truly%20shift-invariant%20convolutional%20neural%20networks.md)
## asr
1. [200220 Imputer](papers/2020/200220%20Imputer.md) #non-autoregressive #ctc
1. [200506 RNN-T Models Fail to Generalize to Out-of-Domain Audio](papers/2020/200506%20RNN-T%20Models%20Fail%20to%20Generalize%20to%20Out-of-Domain%20Audio.md) #transducer #out_of_distribution #domain #regularization
1. [200510 Listen Attentively, and Spell Once](papers/2020/200510%20Listen%20Attentively%2C%20and%20Spell%20Once.md) #non-autoregressive
1. [200516 Large scale weakly and semi-supervised learning for low-resource video ASR](papers/2020/200516%20Large%20scale%20weakly%20and%20semi-supervised%20learning%20for%20low-resource%20video%20ASR.md) #weak_supervision #semi_supervised_learning
1. [200516 Reducing Spelling Inconsistencies in Code-Switching ASR using](papers/2020/200516%20Reducing%20Spelling%20Inconsistencies%20in%20Code-Switching%20ASR%20using.md) #ctc
1. [200516 Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition](papers/2020/200516%20Spike-Triggered%20Non-Autoregressive%20Transformer%20for%20End-to-End%20Speech%20Recognition.md) #non-autoregressive
1. [200518 Attention-based Transducer for Online Speech Recognition](papers/2020/200518%20Attention-based%20Transducer%20for%20Online%20Speech%20Recognition.md) #transducer
1. [200518 Iterative Pseudo-Labeling for Speech Recognition](papers/2020/200518%20Iterative%20Pseudo-Labeling%20for%20Speech%20Recognition.md)
1. [200519 Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition](papers/2020/200519%20Distilling%20Knowledge%20from%20Ensembles%20of%20Acoustic%20Models%20for%20Joint%20CTC-Attention%20End-to-End%20Speech%20Recognition.md) #ctc
1. [200519 Improved Noisy Student Training for Automatic Speech Recognition](papers/2020/200519%20Improved%20Noisy%20Student%20Training%20for%20Automatic%20Speech%20Recognition.md) #semi_supervised_learning
1. [200729 Developing RNN-T Models Surpassing High-Performance Hybrid Models with](papers/2020/200729%20Developing%20RNN-T%20Models%20Surpassing%20High-Performance%20Hybrid%20Models%20with.md) #rnn_t
1. [201021 FastEmit](papers/2020/201021%20FastEmit.md) #transducer #decoding
1. [201027 CASS-NAT](papers/2020/201027%20CASS-NAT.md) #non-autoregressive
1. [201125 Streaming end-to-end multi-talker speech recognition](papers/2020/201125%20Streaming%20end-to-end%20multi-talker%20speech%20recognition.md) #transducer
1. [210524 Unsupervised Speech Recognition](papers/2021/210524%20Unsupervised%20Speech%20Recognition.md) #unsupervised_training
1. [210608 SpeechBrain](papers/2021/210608%20SpeechBrain.md)
1. [211012 Word Order Does Not Matter For Speech Recognition](papers/2021/211012%20Word%20Order%20Does%20Not%20Matter%20For%20Speech%20Recognition.md) #weak_supervision
1. [211030 Pseudo-Labeling for Massively Multilingual Speech Recognition](papers/2021/211030%20Pseudo-Labeling%20for%20Massively%20Multilingual%20Speech%20Recognition.md) #semi_supervised_learning #multilingual
1. [211210 Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition](papers/2021/211210%20Building%20a%20great%20multi-lingual%20teacher%20with%20sparsely-gated%20mixture%20of%20experts%20for%20speech%20recognition.md) #moe
1. [220829 A Language Agnostic Multilingual Streaming On-Device ASR System](papers/2022/220829%20A%20Language%20Agnostic%20Multilingual%20Streaming%20On-Device%20ASR%20System.md) #multilingual
1. [220922 Whisper](papers/2022/220922%20Whisper.md)
1. [230302 Google USM](papers/2023/230302%20Google%20USM.md) #multilingual
## attention
1. [200122 Object Contextual Representations](papers/2020/200122%20Object%20Contextual%20Representations.md) #semantic_segmentation
1. [200129 Empirical Attention](papers/2020/200129%20Empirical%20Attention.md)
1. [200130 Axial Attention](papers/2020/200130%20Axial%20Attention.md) #generative_model
1. [200130 Criss-Cross Attention](papers/2020/200130%20Criss-Cross%20Attention.md) #semantic_segmentation
1. [200212 Capsules with Inverted Dot-Product Attention Routing](papers/2020/200212%20Capsules%20with%20Inverted%20Dot-Product%20Attention%20Routing.md) #capsule
1. [200219 Tree-structured Attention with Hierarchical Accumulation](papers/2020/200219%20Tree-structured%20Attention%20with%20Hierarchical%20Accumulation.md) #parse
1. [200226 Sparse Sinkhorn Attention](papers/2020/200226%20Sparse%20Sinkhorn%20Attention.md) #sparse_attention
1. [200317 Axial-DeepLab](papers/2020/200317%20Axial-DeepLab.md) #panoptic_segmentation
1. [200404 Neural Architecture Search for Lightweight Non-Local Networks](papers/2020/200404%20Neural%20Architecture%20Search%20for%20Lightweight%20Non-Local%20Networks.md)
1. [200421 Attention is Not Only a Weight](papers/2020/200421%20Attention%20is%20Not%20Only%20a%20Weight.md) #bert
1. [200423 Self-Attention Attribution](papers/2020/200423%20Self-Attention%20Attribution.md) #bert
1. [200428 Exploring Self-attention for Image Recognition](papers/2020/200428%20Exploring%20Self-attention%20for%20Image%20Recognition.md)
1. [200510 CTC-synchronous Training for Monotonic Attention Model](papers/2020/200510%20CTC-synchronous%20Training%20for%20Monotonic%20Attention%20Model.md) #asr #ctc
1. [200516 Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory](papers/2020/200516%20Streaming%20Transformer-based%20Acoustic%20Models%20Using%20Self-attention%20with%20Augmented%20Memory.md) #asr #memory
1. [200519 Normalized Attention Without Probability Cage](papers/2020/200519%20Normalized%20Attention%20Without%20Probability%20Cage.md)
1. [200519 Staying True to Your Word](papers/2020/200519%20Staying%20True%20to%20Your%20Word.md)
1. [200626 Object-Centric Learning with Slot Attention](papers/2020/200626%20Object-Centric%20Learning%20with%20Slot%20Attention.md)
1. [201119 On the Dynamics of Training Attention Models](papers/2020/201119%20On%20the%20Dynamics%20of%20Training%20Attention%20Models.md) #training
1. [210223 Linear Transformers Are Secretly Fast Weight Memory Systems](papers/2021/210223%20Linear%20Transformers%20Are%20Secretly%20Fast%20Weight%20Memory%20Systems.md) #linear_attention #efficient_attention
1. [210225 LazyFormer](papers/2021/210225%20LazyFormer.md) #bert
1. [210517 Pay Attention to MLPs](papers/2021/210517%20Pay%20Attention%20to%20MLPs.md) #mlp
1. [210524 Self-Attention Networks Can Process Bounded Hierarchical Languages](papers/2021/210524%20Self-Attention%20Networks%20Can%20Process%20Bounded%20Hierarchical%20Languages.md) #nlp
1. [210826 Train Short, Test Long](papers/2021/210826%20Train%20Short%2C%20Test%20Long.md) #positional_encoding
## audio generation
1. [220220 It's Raw! Audio Generation with State-Space Models](papers/2022/220220%20It%27s%20Raw%21%20Audio%20Generation%20with%20State-Space%20Models.md)
1. [230126 MusicLM](papers/2023/230126%20MusicLM.md)
1. [230208 Noise2Music](papers/2023/230208%20Noise2Music.md)
## audio source separation
1. [211019 The Cocktail Fork Problem](papers/2021/211019%20The%20Cocktail%20Fork%20Problem.md)
## augmentation
1. [200122 FixMatch](papers/2020/200122%20FixMatch.md) #semi_supervised_learning #manifold #mixup
1. [200220 Affinity and Diversity](papers/2020/200220%20Affinity%20and%20Diversity.md)
1. [200621 AdvAug](papers/2020/200621%20AdvAug.md) #mixup #nlp #adversarial_training
1. [200710 Meta-Learning Requires Meta-Augmentation](papers/2020/200710%20Meta-Learning%20Requires%20Meta-Augmentation.md) #metalearning
1. [201117 Sequence-Level Mixed Sample Data Augmentation](papers/2020/201117%20Sequence-Level%20Mixed%20Sample%20Data%20Augmentation.md) #nlp
1. [201213 Simple Copy-Paste is a Strong Data Augmentation Method for Instance](papers/2020/201213%20Simple%20Copy-Paste%20is%20a%20Strong%20Data%20Augmentation%20Method%20for%20Instance.md) #instance_segmentation
1. [201214 Improving Panoptic Segmentation at All Scales](papers/2020/201214%20Improving%20Panoptic%20Segmentation%20at%20All%20Scales.md) #panoptic_segmentation
1. [210318 AlignMix](papers/2021/210318%20AlignMix.md) #mixup
1. [210318 TrivialAugment](papers/2021/210318%20TrivialAugment.md)
1. [210429 Ensembling with Deep Generative Views](papers/2021/210429%20Ensembling%20with%20Deep%20Generative%20Views.md) #ensemble #gan_inversion
1. [220830 Augraphy](papers/2022/220830%20Augraphy.md)
## autoregressive model
1. [200129 Semi Autoregressive Training](papers/2020/200129%20Semi%20Autorgressive%20Training.md)
1. [201027 Scaling Laws for Autoregressive Generative Modeling](papers/2020/201027%20Scaling%20Laws%20for%20Autoregressive%20Generative%20Modeling.md) #scale
1. [211216 Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling](papers/2021/211216%20Characterizing%20and%20addressing%20the%20issue%20of%20oversmoothing%20in%20neural%20autoregressive%20sequence%20modeling.md)
1. [220622 Scaling Autoregressive Models for Content-Rich Text-to-Image Generation](papers/2022/220622%20Scaling%20Autoregressive%20Models%20for%20Content-Rich%20Text-to-Image%20Generation.md) #image_generation
1. [230202 Accelerating Large Language Model Decoding with Speculative Sampling](papers/2023/230202%20Accelerating%20Large%20Language%20Model%20Decoding%20with%20Speculative%20Sampling.md) #decoding
## backbone
1. [190724 MixNet](papers/2019/190724%20MixNet.md) #convolution
1. [200123 Antialiasing](papers/2020/200123%20Antialiasing.md) #invariance
1. [200128 Attentive Normalization](papers/2020/200128%20Attentive%20Normalization.md)
1. [200128 IBN-Net](papers/2020/200128%20IBN-Net.md)
1. [200128 Selective Kernel](papers/2020/200128%20Selective%20Kernel.md)
1. [200128 SpineNet](papers/2020/200128%20SpineNet.md)
1. [200128 Squeeze-Excitation](papers/2020/200128%20Squeeze-Excitation.md)
1. [200128 Switchable Normalization](papers/2020/200128%20Switchable%20Normalization.md)
1. [200128 Switchable Whitening](papers/2020/200128%20Switchable%20Whitening.md)
1. [200129 Assembled Techniques](papers/2020/200129%20Assembled%20Techniques.md) #regularization
1. [200129 DenseNet](papers/2020/200129%20DenseNet.md)
1. [200129 Dual Path Networks](papers/2020/200129%20Dual%20Path%20Networks.md)
1. [200129 HarDNet](papers/2020/200129%20HarDNet.md)
1. [200129 PyramidNet](papers/2020/200129%20PyramidNet.md)
1. [200129 SelecSLS](papers/2020/200129%20SelecSLS.md)
1. [200129 ShuffleNet V2](papers/2020/200129%20ShuffleNet%20V2.md) #efficiency
1. [200129 VoVNet](papers/2020/200129%20VoVNet.md)
1. [200130 FishNet](papers/2020/200130%20FishNet.md)
1. [200130 HRNet](papers/2020/200130%20HRNet.md)
1. [200130 MixConv](papers/2020/200130%20MixConv.md) #convolution
1. [200330 Designing Network Design Spaces](papers/2020/200330%20Designing%20Network%20Design%20Spaces.md) #hypernetwork
1. [200330 TResNet](papers/2020/200330%20TResNet.md) #antialiasing
1. [200419 ResNeSt](papers/2020/200419%20ResNeSt.md)
1. [200630 Deep Isometric Learning for Visual Recognition](papers/2020/200630%20Deep%20Isometric%20Learning%20for%20Visual%20Recognition.md) #normalization #resnet #cnn #norm_free
1. [200712 PSConv](papers/2020/200712%20PSConv.md) #cnn #multiscale
1. [201015 HS-ResNet](papers/2020/201015%20HS-ResNet.md) #multiscale
1. [201221 FcaNet](papers/2020/201221%20FcaNet.md) #channel_attention
1. [210226 Transformer in Transformer](papers/2021/210226%20Transformer%20in%20Transformer.md) #vision_transformer
1. [210304 Barlow Twins](papers/2021/210304%20Barlow%20Twins.md) #self_supervised #contrastive_learning
1. [210310 Involution](papers/2021/210310%20Involution.md) #convolution #attention
1. [210312 Revisiting ResNets](papers/2021/210312%20Revisiting%20ResNets.md) #resnet
1. [210317 Learning to Resize Images for Computer Vision Tasks](papers/2021/210317%20Learning%20to%20Resize%20Images%20for%20Computer%20Vision%20Tasks.md) #resizing
1. [210331 EfficientNetV2](papers/2021/210331%20EfficientNetV2.md)
1. [210408 SI-Score](papers/2021/210408%20SI-Score.md) #robustness #vision_transformer
1. [210505 RepMLP](papers/2021/210505%20RepMLP.md) #mlp
1. [210506 Do You Even Need Attention](papers/2021/210506%20Do%20You%20Even%20Need%20Attention.md) #mlp
1. [210510 ResMLP](papers/2021/210510%20ResMLP.md) #mlp
1. [210617 Layer Folding](papers/2021/210617%20Layer%20Folding.md) #efficiency #pruning
1. [210628 Early Convolutions Help Transformers See Better](papers/2021/210628%20Early%20Convolutions%20Help%20Transformers%20See%20Better.md) #cnn #vit
1. [210718 AS-MLP](papers/2021/210718%20AS-MLP.md) #mlp
1. [210726 Contextual Transformer Networks for Visual Recognition](papers/2021/210726%20Contextual%20Transformer%20Networks%20for%20Visual%20Recognition.md)
1. [211014 Non-deep Networks](papers/2021/211014%20Non-deep%20Networks.md)
1. [211018 HRFormer](papers/2021/211018%20HRFormer.md) #vit
1. [211227 Augmenting Convolutional networks with attention-based aggregation](papers/2021/211227%20Augmenting%20Convolutional%20networks%20with%20attention-based%20aggregation.md) #vit #cnn
1. [220110 A ConvNet for the 2020s](papers/2022/220110%20A%20ConvNet%20for%20the%202020s.md) #cnn #vit
1. [220313 Scaling Up Your Kernels to 31x31](papers/2022/220313%20Scaling%20Up%20Your%20Kernels%20to%2031x31.md)
1. [220318 Three things everyone should know about Vision Transformers](papers/2022/220318%20Three%20things%20everyone%20should%20know%20about%20Vision%20Transformers.md) #vit
1. [220728 HorNet](papers/2022/220728%20HorNet.md) #cnn
1. [230302 Image as Set of Points](papers/2023/230302%20Image%20as%20Set%20of%20Points.md)
## bayesian
1. [200207 Bayes Posterior](papers/2020/200207%20Bayes%20Posterior.md)
1. [200210 Liberty or Depth](papers/2020/200210%20Liberty%20or%20Depth.md) #mean_field
1. [200514 Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors](papers/2020/200514%20Efficient%20and%20Scalable%20Bayesian%20Neural%20Nets%20with%20Rank-1%20Factors.md) #ensemble #variational_inference
## benchmark
1. [230720 SciBench](papers/2023/230720%20SciBench.md)
1. [230807 AgentBench](papers/2023/230807%20AgentBench.md)
## bert
1. [200305 What the [MASK]](papers/2020/200305%20What%20the%20%5BMASK%5D.md)
1. [200405 FastBERT](papers/2020/200405%20FastBERT.md) #distillation #lightweight
1. [200408 DynaBERT](papers/2020/200408%20DynaBERT.md) #distillation #pruning
1. [200412 XtremeDistil](papers/2020/200412%20XtremeDistil.md) #distillation #lightweight
1. [200427 DeeBERT](papers/2020/200427%20DeeBERT.md) #lightweight
1. [200518 Audio ALBERT](papers/2020/200518%20Audio%20ALBERT.md) #audio #representation
1. [200601 Amnesic Probing](papers/2020/200601%20Amnesic%20Probing.md)
1. [200608 On the Stability of Fine-tuning BERT](papers/2020/200608%20On%20the%20Stability%20of%20Fine-tuning%20BERT.md) #finetuning
1. [200610 Revisiting Few-sample BERT Fine-tuning](papers/2020/200610%20Revisiting%20Few-sample%20BERT%20Fine-tuning.md) #finetuning
1. [210906 An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models](papers/2021/210906%20An%20Empirical%20Study%20on%20Few-shot%20Knowledge%20Probing%20for%20Pretrained%20Language%20Models.md) #few_shot #knowledge_base #prompt
1. [210907 Beyond Preserved Accuracy](papers/2021/210907%20Beyond%20Preserved%20Accuracy.md) #lightweight #distillation
## bias
1. [200519 Identifying Statistical Bias in Dataset Replication](papers/2020/200519%20Identifying%20Statistical%20Bias%20in%20Dataset%20Replication.md)
1. [201202 Learning from others' mistakes](papers/2020/201202%20Learning%20from%20others%27%20mistakes.md) #product_of_experts
1. [220919 The Biased Artist](papers/2022/220919%20The%20Biased%20Artist.md) #image_generation
1. [230731 KoBBQ](papers/2023/230731%20KoBBQ.md)
## calibration
1. [200221 Calibrating Deep Neural Networks using Focal Loss](papers/2020/200221%20Calibrating%20Deep%20Neural%20Networks%20using%20Focal%20Loss.md) #loss
1. [200223 Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks](papers/2020/200223%20Being%20Bayesian%2C%20Even%20Just%20a%20Bit%2C%20Fixes%20Overconfidence%20in%20ReLU%20Networks.md) #bayesian
1. [200620 Regression Prior Networks](papers/2020/200620%20Regression%20Prior%20Networks.md)
1. [210730 Soft Calibration Objectives for Neural Networks](papers/2021/210730%20Soft%20Calibration%20Objectives%20for%20Neural%20Networks.md)
## causality
1. [200518 An Analysis of the Adaptation Speed of Causal Models](papers/2020/200518%20An%20Analysis%20of%20the%20Adaptation%20Speed%20of%20Causal%20Models.md)
## channel attention
1. [200129 GCNet](papers/2020/200129%20GCNet.md)
## chat
1. [200630 PLATO-2](papers/2020/200630%20PLATO-2.md) #text_gen #chatbot
## classificiation
1. [220107 Generalized Category Discovery](papers/2022/220107%20Generalized%20Category%20Discovery.md) #open_set_recognition
## clip
1. [230515 Improved baselines for vision-language pre-training](papers/2023/230515%20Improved%20baselines%20for%20vision-language%20pre-training.md)
## computation
1. [200213 Training Large Neural Networks with Constant Memory using a New Execution Algorithm](papers/2020/200213%20Training%20Large%20Neural%20Networks%20with%20Constant%20Memory%20using%20a%20New%20Execution%20Algorithm.md)
1. [201204 Nimble](papers/2020/201204%20Nimble.md)
## continual learning
1. [201124 Energy-Based Models for Continual Learning](papers/2020/201124%20Energy-Based%20Models%20for%20Continual%20Learning.md) #energy_based_model
1. [211103 One Pass ImageNet](papers/2021/211103%20One%20Pass%20ImageNet.md) #online_learning
## contrastive learning
1. [200213 A Simple Framework for Contrastive Learning of Visual Representations](papers/2020/200213%20A%20Simple%20Framework%20for%20Contrastive%20Learning%20of%20Visual%20Representations.md) #augmentation
1. [200309 Improved Baselines with Momentum Contrastive Learning](papers/2020/200309%20Improved%20Baselines%20with%20Momentum%20Contrastive%20Learning.md)
1. [200311 Improved Baselines with Momentum Contrastive Learning](papers/2020/200311%20Improved%20Baselines%20with%20Momentum%20Contrastive%20Learning.md) #review
1. [200423 Supervised Contrastive Learning](papers/2020/200423%20Supervised%20Contrastive%20Learning.md) #metric_learning
1. [200511 Prototypical Contrastive Learning of Unsupervised Representations](papers/2020/200511%20Prototypical%20Contrastive%20Learning%20of%20Unsupervised%20Representations.md)
1. [200520 What Makes for Good Views for Contrastive Learning](papers/2020/200520%20What%20Makes%20for%20Good%20Views%20for%20Contrastive%20Learning.md)
1. [200613 Bootstrap your own latent](papers/2020/200613%20Bootstrap%20your%20own%20latent.md)
1. [200630 Debiased Contrastive Learning](papers/2020/200630%20Debiased%20Contrastive%20Learning.md)
1. [200730 Contrastive Learning for Unpaired Image-to-Image Translation](papers/2020/200730%20Contrastive%20Learning%20for%20Unpaired%20Image-to-Image%20Translation.md) #img2img
1. [200803 LoCo](papers/2020/200803%20LoCo.md)
1. [201020 BYOL works even without batch statistics](papers/2020/201020%20BYOL%20works%20even%20without%20batch%20statistics.md)
1. [201109 Towards Domain-Agnostic Contrastive Learning](papers/2020/201109%20Towards%20Domain-Agnostic%20Contrastive%20Learning.md) #mixup #multimodal
1. [201116 AdCo](papers/2020/201116%20AdCo.md) #adversarial_training
1. [201117 Dense Contrastive Learning for Self-Supervised Visual Pre-Training](papers/2020/201117%20Dense%20Contrastive%20Learning%20for%20Self-Supervised%20Visual%20Pre-Training.md)
1. [201119 Heterogeneous Contrastive Learning](papers/2020/201119%20Heterogeneous%20Contrastive%20Learning.md)
1. [201119 Propagate Yourself](papers/2020/201119%20Propagate%20Yourself.md)
1. [201121 Run Away From your Teacher](papers/2020/201121%20Run%20Away%20From%20your%20Teacher.md)
1. [201123 Boosting Contrastive Self-Supervised Learning with False Negative](papers/2020/201123%20Boosting%20Contrastive%20Self-Supervised%20Learning%20with%20False%20Negative.md)
1. [201126 Beyond Single Instance Multi-view Unsupervised Representation Learning](papers/2020/201126%20Beyond%20Single%20Instance%20Multi-view%20Unsupervised%20Representation%20Learning.md) #self_supervised #mixup
1. [201126 How Well Do Self-Supervised Models Transfer](papers/2020/201126%20How%20Well%20Do%20Self-Supervised%20Models%20Transfer.md) #self_supervised #transfer
1. [201127 Self-EMD](papers/2020/201127%20Self-EMD.md)
1. [201201 Towards Good Practices in Self-supervised Representation Learning](papers/2020/201201%20Towards%20Good%20Practices%20in%20Self-supervised%20Representation%20Learning.md) #self_supervised
1. [201204 Seed the Views](papers/2020/201204%20Seed%20the%20Views.md) #mixup
1. [201212 Contrastive Learning for Label-Efficient Semantic Segmentation](papers/2020/201212%20Contrastive%20Learning%20for%20Label-Efficient%20Semantic%20Segmentation.md) #semantic_segmentation
1. [201221 Online Bag-of-Visual-Words Generation for Unsupervised Representation](papers/2020/201221%20Online%20Bag-of-Visual-Words%20Generation%20for%20Unsupervised%20Representation.md) #self_supervised #discrete_vae
1. [201226 Spatial Contrastive Learning for Few-Shot Classification](papers/2020/201226%20Spatial%20Contrastive%20Learning%20for%20Few-Shot%20Classification.md) #few_shot #attention
1. [210324 A Broad Study on the Transferability of Visual Representations with Contrastive Learning](papers/2021/210324%20A%20Broad%20Study%20on%20the%20Transferability%20of%20Visual%20Representations%20with%20Contrastive%20Learning.md) #review
1. [210325 Contrasting Contrastive Self-Supervised Representation Learning Models](papers/2021/210325%20Contrasting%20Contrastive%20Self-Supervised%20Representation%20Learning%20Models.md) #review
1. [210325 Rethinking Self-Supervised Learning](papers/2021/210325%20Rethinking%20Self-Supervised%20Learning.md) #training
1. [210405 An Empirical Study of Training Self-Supervised Vision Transformers](papers/2021/210405%20An%20Empirical%20Study%20of%20Training%20Self-Supervised%20Vision%20Transformers.md) #vision_transformer
1. [210426 Multimodal Contrastive Training for Visual Representation Learning](papers/2021/210426%20Multimodal%20Contrastive%20Training%20for%20Visual%20Representation%20Learning.md) #multimodal
1. [210429 A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning](papers/2021/210429%20A%20Large-Scale%20Study%20on%20Unsupervised%20Spatiotemporal%20Representation%20Learning.md) #video
1. [210429 Emerging Properties in Self-Supervised Vision Transformers](papers/2021/210429%20Emerging%20Properties%20in%20Self-Supervised%20Vision%20Transformers.md) #saliency #vision_transformer #representation
1. [210429 With a Little Help from My Friends](papers/2021/210429%20With%20a%20Little%20Help%20from%20My%20Friends.md) #knn
1. [210510 Self-Supervised Learning with Swin Transformers](papers/2021/210510%20Self-Supervised%20Learning%20with%20Swin%20Transformers.md) #vision_transformer
1. [210511 VICReg](papers/2021/210511%20VICReg.md)
1. [210512 When Does Contrastive Visual Representation Learning Work](papers/2021/210512%20When%20Does%20Contrastive%20Visual%20Representation%20Learning%20Work.md) #self_supervised #transfer #review
1. [210517 Divide and Contrast](papers/2021/210517%20Divide%20and%20Contrast.md) #self_supervised #dataset #distillation
1. [210601 Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task](papers/2021/210601%20Exploring%20the%20Diversity%20and%20Invariance%20in%20Yourself%20for%20Visual%20Pre-Training%20Task.md)
1. [211018 Understanding Dimensional Collapse in Contrastive Self-supervised Learning](papers/2021/211018%20Understanding%20Dimensional%20Collapse%20in%20Contrastive%20Self-supervised%20Learning.md)
1. [220701 e-CLIP](papers/2022/220701%20e-CLIP.md) #vision-language #retrieval
1. [220727 Contrastive Masked Autoencoders are Stronger Vision Learners](papers/2022/220727%20Contrastive%20Masked%20Autoencoders%20are%20Stronger%20Vision%20Learners.md) #self_supervised #mlm
1. [220804 Fine-Grained Semantically Aligned Vision-Language Pre-Training](papers/2022/220804%20Fine-Grained%20Semantically%20Aligned%20Vision-Language%20Pre-Training.md) #vision-language
1. [221017 Non-Contrastive Learning Meets Language-Image Pre-Training](papers/2022/221017%20Non-Contrastive%20Learning%20Meets%20Language-Image%20Pre-Training.md) #clip
1. [230327 Sigmoid Loss for Language Image Pre-Training](papers/2023/230327%20Sigmoid%20Loss%20for%20Language%20Image%20Pre-Training.md) #clip
1. [230414 DINOv2](papers/2023/230414%20DINOv2.md)
1. [230418 Hyperbolic Image-Text Representations](papers/2023/230418%20Hyperbolic%20Image-Text%20Representations.md) #clip #vision-language
1. [230501 What Do Self-Supervised Vision Transformers Learn](papers/2023/230501%20What%20Do%20Self-Supervised%20Vision%20Transformers%20Learn.md) #self_supervised #mlm
1. [230627 CLIPA-v2](papers/2023/230627%20CLIPA-v2.md) #vision-language #multimodal
## convolution
1. [200316 SlimConv](papers/2020/200316%20SlimConv.md)
1. [210429 Decoupled Dynamic Filter Networks](papers/2021/210429%20Decoupled%20Dynamic%20Filter%20Networks.md)
1. [230221 Hyena Hierarchy](papers/2023/230221%20Hyena%20Hierarchy.md) #state_space_model
## dataset
1. [200218 DivideMix](papers/2020/200218%20DivideMix.md) #mixup #noise #semi_supervised_learning
1. [200509 Building a Manga Dataset](papers/2020/200509%20Building%20a%20Manga%20Dataset.md)
1. [201130 Image Quality Assessment for Perceptual Image Restoration](papers/2020/201130%20Image%20Quality%20Assessment%20for%20Perceptual%20Image%20Restoration.md) #score
1. [201201 Weakly-Supervised Arbitrary-Shaped Text Detection with](papers/2020/201201%20Weakly-Supervised%20Arbitrary-Shaped%20Text%20Detection%20with.md) #ocr #weak_supervision
1. [210601 Comparing Test Sets with Item Response Theory](papers/2021/210601%20Comparing%20Test%20Sets%20with%20Item%20Response%20Theory.md)
1. [210907 Datasets](papers/2021/210907%20Datasets.md)
1. [210927 PASS](papers/2021/210927%20PASS.md)
1. [211103 LAION-400M](papers/2021/211103%20LAION-400M.md)
1. [220704 How Much More Data Do I Need](papers/2022/220704%20How%20Much%20More%20Data%20Do%20I%20Need.md)
1. [230220 Poisoning Web-Scale Training Datasets is Practical](papers/2023/230220%20Poisoning%20Web-Scale%20Training%20Datasets%20is%20Practical.md)
1. [230317 On the De-duplication of LAION-2B](papers/2023/230317%20On%20the%20De-duplication%20of%20LAION-2B.md) #clip
1. [230428 CCpdf](papers/2023/230428%20CCpdf.md)
## ddpm
1. [200619 Denoising Diffusion Probabilistic Models](papers/2020/200619%20Denoising%20Diffusion%20Probabilistic%20Models.md)
1. [201126 Score-Based Generative Modeling through Stochastic Differential](papers/2020/201126%20Score-Based%20Generative%20Modeling%20through%20Stochastic%20Differential.md) #generative_model
1. [201214 Learning Energy-Based Models by Diffusion Recovery Likelihood](papers/2020/201214%20Learning%20Energy-Based%20Models%20by%20Diffusion%20Recovery%20Likelihood.md) #energy_based_model
1. [210302 Fixing Data Augmentation to Improve Adversarial Robustness](papers/2021/210302%20Fixing%20Data%20Augmentation%20to%20Improve%20Adversarial%20Robustness.md) #augmentation #generative_model
1. [210305 Fixing Data Augmentation to Improve Adversarial Robustness 2](papers/2021/210305%20Fixing%20Data%20Augmentation%20to%20Improve%20Adversarial%20Robustness%202.md) #robustness #augmentation #generative_model
1. [210506 DiffSinger](papers/2021/210506%20DiffSinger.md) #singing_voice_synthesis
1. [210511 Diffusion Models Beat GANs on Image Synthesis](papers/2021/210511%20Diffusion%20Models%20Beat%20GANs%20on%20Image%20Synthesis.md)
1. [210528 Gotta Go Fast When Generating Data with Score-Based Models](papers/2021/210528%20Gotta%20Go%20Fast%20When%20Generating%20Data%20with%20Score-Based%20Models.md)
1. [210531 On Fast Sampling of Diffusion Probabilistic Models](papers/2021/210531%20On%20Fast%20Sampling%20of%20Diffusion%20Probabilistic%20Models.md)
1. [210607 Learning to Efficiently Sample from Diffusion Probabilistic Models](papers/2021/210607%20Learning%20to%20Efficiently%20Sample%20from%20Diffusion%20Probabilistic%20Models.md)
1. [210610 Cascaded Diffusion Models for High Fidelity Image Generation](papers/2021/210610%20Cascaded%20Diffusion%20Models%20for%20High%20Fidelity%20Image%20Generation.md)
1. [210610 Score-based Generative Modeling in Latent Space](papers/2021/210610%20Score-based%20Generative%20Modeling%20in%20Latent%20Space.md)
1. [210612 D2C](papers/2021/210612%20D2C.md)
1. [210701 Variational Diffusion Models](papers/2021/210701%20Variational%20Diffusion%20Models.md)
1. [210802 SDEdit](papers/2021/210802%20SDEdit.md)
1. [210819 ImageBART](papers/2021/210819%20ImageBART.md) #vq #autoregressive_model
1. [211129 Blended Diffusion for Text-driven Editing of Natural Images](papers/2021/211129%20Blended%20Diffusion%20for%20Text-driven%20Editing%20of%20Natural%20Images.md) #clip #image_editing
1. [211130 Diffusion Autoencoders](papers/2021/211130%20Diffusion%20Autoencoders.md)
1. [211220 GLIDE](papers/2021/211220%20GLIDE.md) #multimodal
1. [211220 High-Resolution Image Synthesis with Latent Diffusion Models](papers/2021/211220%20High-Resolution%20Image%20Synthesis%20with%20Latent%20Diffusion%20Models.md) #vae #vq
1. [220201 Progressive Distillation for Fast Sampling of Diffusion Models](papers/2022/220201%20Progressive%20Distillation%20for%20Fast%20Sampling%20of%20Diffusion%20Models.md) #distillation
1. [220316 Dual Diffusion Implicit Bridges for Image-to-Image Translation](papers/2022/220316%20Dual%20Diffusion%20Implicit%20Bridges%20for%20Image-to-Image%20Translation.md)
1. [220524 Imagen](papers/2022/220524%20Imagen.md) #conditional_generative_model
1. [220601 Elucidating the Design Space of Diffusion-Based Generative Models](papers/2022/220601%20Elucidating%20the%20Design%20Space%20of%20Diffusion-Based%20Generative%20Models.md)
1. [220803 Pyramidal Denoising Diffusion Probabilistic Models](papers/2022/220803%20Pyramidal%20Denoising%20Diffusion%20Probabilistic%20Models.md)
1. [220808 Analog Bits](papers/2022/220808%20Analog%20Bits.md)
1. [220912 Blurring Diffusion Models](papers/2022/220912%20Blurring%20Diffusion%20Models.md)
1. [220912 Soft Diffusion](papers/2022/220912%20Soft%20Diffusion.md)
1. [220929 DreamFusion](papers/2022/220929%20DreamFusion.md) #3d_generative_model
1. [221017 Imagic](papers/2022/221017%20Imagic.md) #image_editing
1. [221018 Differentially Private Diffusion Models](papers/2022/221018%20Differentially%20Private%20Diffusion%20Models.md)
1. [221102 eDiffi](papers/2022/221102%20eDiffi.md) #text2img
1. [221115 Versatile Diffusion](papers/2022/221115%20Versatile%20Diffusion.md) #vae
1. [221117 Null-text Inversion for Editing Real Images using Guided Diffusion Models](papers/2022/221117%20Null-text%20Inversion%20for%20Editing%20Real%20Images%20using%20Guided%20Diffusion%20Models.md) #image_editing
1. [221118 Magic3D](papers/2022/221118%20Magic3D.md) #3d_generative_model #text2img #nerf
1. [221120 Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models](papers/2022/221120%20Synthesizing%20Coherent%20Story%20with%20Auto-Regressive%20Latent%20Diffusion%20Models.md) #text2img
1. [221124 Fast Sampling of Diffusion Models via Operator Learning](papers/2022/221124%20Fast%20Sampling%20of%20Diffusion%20Models%20via%20Operator%20Learning.md)
1. [230126 On the Importance of Noise Scheduling for Diffusion Models](papers/2023/230126%20On%20the%20Importance%20of%20Noise%20Scheduling%20for%20Diffusion%20Models.md)
1. [230126 simple diffusion](papers/2023/230126%20simple%20diffusion.md)
1. [230131 Attend-and-Excite](papers/2023/230131%20Attend-and-Excite.md) #text2img
1. [230205 Design Booster](papers/2023/230205%20Design%20Booster.md) #image_editing
1. [230206 Zero-shot Image-to-Image Translation](papers/2023/230206%20Zero-shot%20Image-to-Image%20Translation.md) #image_editing
1. [230207 Long Horizon Temperature Scaling](papers/2023/230207%20Long%20Horizon%20Temperature%20Scaling.md) #calibration #lm
1. [230208 Q-Diffusion](papers/2023/230208%20Q-Diffusion.md) #quantization
1. [230212 I$^2$SB](papers/2023/230212%20I%24%5E2%24SB.md) #sde #image_restoration
1. [230215 PRedItOR](papers/2023/230215%20PRedItOR.md) #image_editing
1. [230216 MultiDiffusion](papers/2023/230216%20MultiDiffusion.md) #image_editing
1. [230220 Composer](papers/2023/230220%20Composer.md) #image_editing
1. [230221 Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels](papers/2023/230221%20Diffusion%20Models%20and%20Semi-Supervised%20Learners%20Benefit%20Mutually%20with%20Few%20Labels.md) #semi_supervised_learning #self_supervised
1. [230221 On Calibrating Diffusion Probabilistic Models](papers/2023/230221%20On%20Calibrating%20Diffusion%20Probabilistic%20Models.md)
1. [230223 Controlled and Conditional Text to Image Generation with Diffusion Prior](papers/2023/230223%20Controlled%20and%20Conditional%20Text%20to%20Image%20Generation%20with%20Diffusion%20Prior.md)
1. [230227 ELITE](papers/2023/230227%20ELITE.md) #text2img
1. [230301 Unlimited-Size Diffusion Restoration](papers/2023/230301%20Unlimited-Size%20Diffusion%20Restoration.md) #image_restoration
1. [230302 Consistency Models](papers/2023/230302%20Consistency%20Models.md) #generative_model
1. [230307 TRACT](papers/2023/230307%20TRACT.md) #distillation
1. [230309 Cones](papers/2023/230309%20Cones.md) #image_editing
1. [230316 $P+$](papers/2023/230316%20%24P%2B%24.md) #text2img
1. [230316 Efficient Diffusion Training via Min-SNR Weighting Strategy](papers/2023/230316%20Efficient%20Diffusion%20Training%20via%20Min-SNR%20Weighting%20Strategy.md)
1. [230320 SVDiff](papers/2023/230320%20SVDiff.md) #image_editing
1. [230405 Generative Novel View Synthesis with 3D-Aware Diffusion Models](papers/2023/230405%20Generative%20Novel%20View%20Synthesis%20with%203D-Aware%20Diffusion%20Models.md) #nerf
1. [230405 Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models](papers/2023/230405%20Taming%20Encoder%20for%20Zero%20Fine-tuning%20Image%20Customization%20with%20Text-to-Image%20Diffusion%20Models.md)
1. [230406 Diffusion Models as Masked Autoencoders](papers/2023/230406%20Diffusion%20Models%20as%20Masked%20Autoencoders.md) #representation
1. [230406 InstantBooth](papers/2023/230406%20InstantBooth.md) #image_editing
1. [230501 In-Context Learning Unlocked for Diffusion Models](papers/2023/230501%20In-Context%20Learning%20Unlocked%20for%20Diffusion%20Models.md) #few_shot #text2img
1. [230515 Common Diffusion Noise Schedules and Sample Steps are Flawed](papers/2023/230515%20Common%20Diffusion%20Noise%20Schedules%20and%20Sample%20Steps%20are%20Flawed.md)
1. [230529 RAPHAEL](papers/2023/230529%20RAPHAEL.md)
1. [230601 StyleDrop](papers/2023/230601%20StyleDrop.md) #style_transfer
1. [230706 Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback](papers/2023/230706%20Censored%20Sampling%20of%20Diffusion%20Models%20Using%203%20Minutes%20of%20Human%20Feedback.md)
1. [230707 SDXL](papers/2023/230707%20SDXL.md) #text2img
1. [230710 AnimateDiff](papers/2023/230710%20AnimateDiff.md)
## decoding
1. [200516 Layer-Wise Cross-View Decoding for Sequence-to-Sequence Learning](papers/2020/200516%20Layer-Wise%20Cross-View%20Decoding%20for%20Sequence-to-Sequence%20Learning.md)
1. [200601 Cascaded Text Generation with Markov Transformers](papers/2020/200601%20Cascaded%20Text%20Generation%20with%20Markov%20Transformers.md) #text_generation
1. [210608 FastSeq](papers/2021/210608%20FastSeq.md)
## deep prior
1. [200408 Deep Manifold Prior](papers/2020/200408%20Deep%20Manifold%20Prior.md)
## detr
1. [201201 MaX-DeepLab](papers/2020/201201%20MaX-DeepLab.md) #panoptic_segmentation #end2end
1. [210813 Conditional DETR for Fast Training Convergence](papers/2021/210813%20Conditional%20DETR%20for%20Fast%20Training%20Convergence.md)
1. [211202 Masked-attention Mask Transformer for Universal Image Segmentation](papers/2021/211202%20Masked-attention%20Mask%20Transformer%20for%20Universal%20Image%20Segmentation.md) #panoptic_segmentation
1. [220726 Group DETR](papers/2022/220726%20Group%20DETR.md) #efficient_training
1. [230803 DETR Doesn't Need Multi-Scale or Locality Design](papers/2023/230803%20DETR%20Doesn%27t%20Need%20Multi-Scale%20or%20Locality%20Design.md) #multiscale
## dewarping
1. [211025 DocTr](papers/2021/211025%20DocTr.md)
1. [211028 DocScanner](papers/2021/211028%20DocScanner.md)
## dialog
1. [200129 Meena](papers/2020/200129%20Meena.md) #NLP
1. [210715 Beyond Goldfish Memory](papers/2021/210715%20Beyond%20Goldfish%20Memory.md)
1. [220120 LaMDA](papers/2022/220120%20LaMDA.md)
## differentiable operator
1. [200220 Fast Differentiable Sorting and Ranking](papers/2020/200220%20Fast%20Differentiable%20Sorting%20and%20Ranking.md)
## differentiable tree
1. [200218 The Tree Ensemble Layer](papers/2020/200218%20The%20Tree%20Ensemble%20Layer.md)
## discrete vae
1. [200518 Robust Training of Vector Quantized Bottleneck Models](papers/2020/200518%20Robust%20Training%20of%20Vector%20Quantized%20Bottleneck%20Models.md)
## disentangle
1. [200130 ID-GAN](papers/2020/200130%20ID-GAN.md) #GAN
1. [200130 MixNMatch](papers/2020/200130%20MixNMatch.md) #conditional_generative_model
1. [200515 Face Identity Disentanglement via Latent Space Mapping](papers/2020/200515%20Face%20Identity%20Disentanglement%20via%20Latent%20Space%20Mapping.md)
## distillation
1. [200129 Learning by Cheating](papers/2020/200129%20Learning%20by%20Cheating.md)
1. [200209 Understanding and Improving Knowledge Distillation](papers/2020/200209%20Understanding%20and%20Improving%20Knowledge%20Distillation.md)
1. [200210 Subclass Distillation](papers/2020/200210%20Subclass%20Distillation.md)
1. [200219 Knapsack Pruning with Inner Distillation](papers/2020/200219%20Knapsack%20Pruning%20with%20Inner%20Distillation.md) #pruning #lightweight
1. [200221 Residual Knowledge Distillation](papers/2020/200221%20Residual%20Knowledge%20Distillation.md)
1. [200309 Knowledge distillation via adaptive instance normalization](papers/2020/200309%20Knowledge%20distillation%20via%20adaptive%20instance%20normalization.md) #normalization
1. [200521 Why distillation helps](papers/2020/200521%20Why%20distillation%20helps.md) #calibration
1. [200629 An EM Approach to Non-autoregressive Conditional Sequence Generation](papers/2020/200629%20An%20EM%20Approach%20to%20Non-autoregressive%20Conditional%20Sequence%20Generation.md) #non-autoregressive
1. [200701 Go Wide, Then Narrow](papers/2020/200701%20Go%20Wide%2C%20Then%20Narrow.md) #lightweight
1. [200702 Interactive Knowledge Distillation](papers/2020/200702%20Interactive%20Knowledge%20Distillation.md)
1. [210726 Text is Text, No Matter What](papers/2021/210726%20Text%20is%20Text%2C%20No%20Matter%20What.md) #multitask
## distributed training
1. [210510 GSPMD](papers/2021/210510%20GSPMD.md)
1. [230121 SuperScaler](papers/2023/230121%20SuperScaler.md)
## domain adaptation
1. [200526 Keep it Simple](papers/2020/200526%20Keep%20it%20Simple.md)
## dropout
1. [200701 On Dropout, Overfitting, and Interaction Effects in Deep Neural Networks](papers/2020/200701%20On%20Dropout%2C%20Overfitting%2C%20and%20Interaction%20Effects%20in%20Deep%20Neural%20Networks.md)
## efficiency
1. [230130 Alternating Updates for Efficient Transformers](papers/2023/230130%20Alternating%20Updates%20for%20Efficient%20Transformers.md)
1. [230530 Blockwise Parallel Transformer for Long Context Large Models](papers/2023/230530%20Blockwise%20Parallel%20Transformer%20for%20Long%20Context%20Large%20Models.md)
1. [230624 H$_2$O](papers/2023/230624%20H%24_2%24O.md)
1. [230705 SkipDecode](papers/2023/230705%20SkipDecode.md)
1. [230728 Skeleton-of-Thought](papers/2023/230728%20Skeleton-of-Thought.md)
## efficient attention
1. [200410 Longformer](papers/2020/200410%20Longformer.md)
1. [200412 ProFormer](papers/2020/200412%20ProFormer.md)
1. [200605 Masked Language Modeling for Proteins via Linearly Scalable Long-Context](papers/2020/200605%20Masked%20Language%20Modeling%20for%20Proteins%20via%20Linearly%20Scalable%20Long-Context.md)
1. [200608 Linformer](papers/2020/200608%20Linformer.md)
1. [210324 Finetuning Pretrained Transformers into RNNs](papers/2021/210324%20Finetuning%20Pretrained%20Transformers%20into%20RNNs.md)
1. [210505 Beyond Self-attention](papers/2021/210505%20Beyond%20Self-attention.md)
1. [210510 Poolingformer](papers/2021/210510%20Poolingformer.md)
1. [210603 Luna](papers/2021/210603%20Luna.md)
1. [210623 Stable, Fast and Accurate](papers/2021/210623%20Stable%2C%20Fast%20and%20Accurate.md)
1. [210705 Long-Short Transformer](papers/2021/210705%20Long-Short%20Transformer.md) #local_attention
1. [210712 Combiner](papers/2021/210712%20Combiner.md) #sparse_attention #local_attention
1. [210725 H-Transformer-1D](papers/2021/210725%20H-Transformer-1D.md)
1. [211210 Self-attention Does Not Need $O(n^2)$ Memory](papers/2021/211210%20Self-attention%20Does%20Not%20Need%20%24O%28n%5E2%29%24%20Memory.md)
1. [220527 FlashAttention](papers/2022/220527%20FlashAttention.md)
1. [220726 DETRs with Hybrid Matching](papers/2022/220726%20DETRs%20with%20Hybrid%20Matching.md) #detr
1. [220911 On The Computational Complexity of Self-Attention](papers/2022/220911%20On%20The%20Computational%20Complexity%20of%20Self-Attention.md)
1. [220921 Mega](papers/2022/220921%20Mega.md)
1. [230317 CoLT5](papers/2023/230317%20CoLT5.md)
1. [230705 LongNet](papers/2023/230705%20LongNet.md)
1. [230706 Focused Transformer](papers/2023/230706%20Focused%20Transformer.md)
## efficient training
1. [230216 Decoupled Model Schedule for Deep Learning Training](papers/2023/230216%20Decoupled%20Model%20Schedule%20for%20Deep%20Learning%20Training.md) #distributed_training
1. [230711 Stack More Layers Differently](papers/2023/230711%20Stack%20More%20Layers%20Differently.md)
1. [230712 No Train No Gain](papers/2023/230712%20No%20Train%20No%20Gain.md)
1. [230807 LoRA-FA](papers/2023/230807%20LoRA-FA.md)
## embedding
1. [200424 All Word Embeddings from One Embedding](papers/2020/200424%20All%20Word%20Embeddings%20from%20One%20Embedding.md)
1. [200717 A Unifying Perspective on Neighbor Embeddings along the](papers/2020/200717%20A%20Unifying%20Perspective%20on%20Neighbor%20Embeddings%20along%20the.md)
1. [210907 Rare Words Degenerate All Words](papers/2021/210907%20Rare%20Words%20Degenerate%20All%20Words.md)
## end2end
1. [200605 End-to-End Adversarial Text-to-Speech](papers/2020/200605%20End-to-End%20Adversarial%20Text-to-Speech.md) #tts
1. [200608 FastSpeech 2](papers/2020/200608%20FastSpeech%202.md) #tts
1. [201106 Wave-Tacotron](papers/2020/201106%20Wave-Tacotron.md) #tts
1. [210716 Autonomy 2.0](papers/2021/210716%20Autonomy%202.0.md)
1. [211215 SPTS](papers/2021/211215%20SPTS.md)
## energy based model
1. [200504 How to Train Your Energy-Based Model for Regression](papers/2020/200504%20How%20to%20Train%20Your%20Energy-Based%20Model%20for%20Regression.md)
## ensemble
1. [200217 BatchEnsemble](papers/2020/200217%20BatchEnsemble.md)
## federated learning
1. [210415 See through Gradients](papers/2021/210415%20See%20through%20Gradients.md)
## few shot
1. [200228 AdarGCN](papers/2020/200228%20AdarGCN.md) #graph
1. [210608 Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks](papers/2021/210608%20Parameter-efficient%20Multi-task%20Fine-tuning%20for%20Transformers%20via%20Shared%20Hypernetworks.md) #adapter #multitask
1. [210910 LibFewShot](papers/2021/210910%20LibFewShot.md)
1. [220715 Plex](papers/2022/220715%20Plex.md) #uncertainty #generalization
## finetuning
1. [200214 AutoLR](papers/2020/200214%20AutoLR.md) #pruning
1. [200426 Masking as an Efficient Alternative to Finetuning for Pretrained](papers/2020/200426%20Masking%20as%20an%20Efficient%20Alternative%20to%20Finetuning%20for%20Pretrained.md)
1. [200709 Sample-based Regularization](papers/2020/200709%20Sample-based%20Regularization.md) #transfer
1. [230428 Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs](papers/2023/230428%20Empirical%20Analysis%20of%20the%20Strengths%20and%20Weaknesses%20of%20PEFT%20Techniques%20for%20LLMs.md)
## flow
1. [200220 Regularized Autoencoders via Relaxed Injective Probability Flow](papers/2020/200220%20Regularized%20Autoencoders%20via%20Relaxed%20Injective%20Probability%20Flow.md)
1. [200227 Woodbury Transformations for Deep Generative Flows](papers/2020/200227%20Woodbury%20Transformations%20for%20Deep%20Generative%20Flows.md)
## fpn
1. [200122 CARAFE](papers/2020/200122%20CARAFE.md) #resampling
1. [200129 Mixture FPN](papers/2020/200129%20Mixture%20FPN.md)
1. [200506 Scale-Equalizing Pyramid Convolution for Object Detection](papers/2020/200506%20Scale-Equalizing%20Pyramid%20Convolution%20for%20Object%20Detection.md)
1. [201201 Dynamic Feature Pyramid Networks for Object Detection](papers/2020/201201%20Dynamic%20Feature%20Pyramid%20Networks%20for%20Object%20Detection.md)
1. [201202 Dual Refinement Feature Pyramid Networks for Object Detection](papers/2020/201202%20Dual%20Refinement%20Feature%20Pyramid%20Networks%20for%20Object%20Detection.md)
1. [201202 Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate](papers/2020/201202%20Parallel%20Residual%20Bi-Fusion%20Feature%20Pyramid%20Network%20for%20Accurate.md)
1. [201225 Implicit Feature Pyramid Network for Object Detection](papers/2020/201225%20Implicit%20Feature%20Pyramid%20Network%20for%20Object%20Detection.md) #equilibrium_model #implicit_model
## gan
1. [170629 Do GANs actually learn the distribution](papers/2017/170629%20Do%20GANs%20actually%20learn%20the%20distribution.md)
1. [191022 MelGAN](papers/2019/191022%20MelGAN.md) #tts
1. [200129 Adversarial Lipschitz Regularization](papers/2020/200129%20Adversarial%20Lipschitz%20Regularization.md)
1. [200129 GAN generalization metric](papers/2020/200129%20GAN%20generalization%20metric.md)
1. [200129 OneGAN](papers/2020/200129%20OneGAN.md)
1. [200130 AttentionGAN](papers/2020/200130%20AttentionGAN.md) #attention #img2img
1. [200130 Evaluation metrics of GAN](papers/2020/200130%20Evaluation%20metrics%20of%20GAN.md) #metric #evaluation #generative_model
1. [200130 Local GAN](papers/2020/200130%20Local%20GAN.md) #attention
1. [200130 Noise Robust GAN](papers/2020/200130%20Noise%20Robust%20GAN.md) #robustness
1. [200130 Small-GAN](papers/2020/200130%20Small-GAN.md)
1. [200130 Smoothness and Stability in GANs](papers/2020/200130%20Smoothness%20and%20Stability%20in%20GANs.md)
1. [200206 Unbalanced GANs](papers/2020/200206%20Unbalanced%20GANs.md) #vae
1. [200210 Unsupervised Discovery of Interpretable Directions in the GAN Latent](papers/2020/200210%20Unsupervised%20Discovery%20of%20Interpretable%20Directions%20in%20the%20GAN%20Latent.md) #semantic_factor
1. [200211 Improved Consistency Regularization for GANs](papers/2020/200211%20Improved%20Consistency%20Regularization%20for%20GANs.md) #augmentation #consistency_regularization
1. [200211 Smoothness and Stability in GANs](papers/2020/200211%20Smoothness%20and%20Stability%20in%20GANs.md) #regularization
1. [200212 Image-to-Image Translation with Text Guidance](papers/2020/200212%20Image-to-Image%20Translation%20with%20Text%20Guidance.md) #multimodal #multimodal_generation #img2img
1. [200212 Real or Not Real, that is the Question](papers/2020/200212%20Real%20or%20Not%20Real%2C%20that%20is%20the%20Question.md)
1. [200214 Top-k Training of GANs](papers/2020/200214%20Top-k%20Training%20of%20GANs.md) #regularization
1. [200220 The Benefits of Pairwise Discriminators for Adversarial Training](papers/2020/200220%20The%20Benefits%20of%20Pairwise%20Discriminators%20for%20Adversarial%20Training.md) #regularization
1. [200223 GANHopper](papers/2020/200223%20GANHopper.md) #img2img
1. [200224 When Relation Networks meet GANs](papers/2020/200224%20When%20Relation%20Networks%20meet%20GANs.md) #regularization
1. [200225 Freeze the Discriminator](papers/2020/200225%20Freeze%20the%20Discriminator.md) #finetuning #transfer
1. [200226 On Leveraging Pretrained GANs for Generation with Limited Data](papers/2020/200226%20On%20Leveraging%20Pretrained%20GANs%20for%20Generation%20with%20Limited%20Data.md) #finetuning #transfer
1. [200227 Topology Distance](papers/2020/200227%20Topology%20Distance.md) #topology #score
1. [200228 A U-Net Based Discriminator for Generative Adversarial Networks](papers/2020/200228%20A%20U-Net%20Based%20Discriminator%20for%20Generative%20Adversarial%20Networks.md)
1. [200304 Creating High Resolution Images with a Latent Adversarial Generator](papers/2020/200304%20Creating%20High%20Resolution%20Images%20with%20a%20Latent%20Adversarial%20Generator.md) #generative_model #super_resolution
1. [200308 Perceptual Image Super-Resolution with Progressive Adversarial Network](papers/2020/200308%20Perceptual%20Image%20Super-Resolution%20with%20Progressive%20Adversarial%20Network.md) #super_resolution
1. [200312 Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling](papers/2020/200312%20Your%20GAN%20is%20Secretly%20an%20Energy-based%20Model%20and%20You%20Should%20use%20Discriminator%20Driven%20Latent%20Sampling.md) #energy_based_model #sampling
1. [200317 Blur, Noise, and Compression Robust Generative Adversarial Networks](papers/2020/200317%20Blur%2C%20Noise%2C%20and%20Compression%20Robust%20Generative%20Adversarial%20Networks.md) #noise
1. [200318 OpenGAN](papers/2020/200318%20OpenGAN.md) #metric_learning
1. [200325 Improved Techniques for Training Single-Image GANs](papers/2020/200325%20Improved%20Techniques%20for%20Training%20Single-Image%20GANs.md) #single_image
1. [200326 Image Generation Via Minimizing Fréchet Distance in Discriminator Feature Space](papers/2020/200326%20Image%20Generation%20Via%20Minimizing%20Fr%C3%A9chet%20Distance%20in%20Discriminator%20Feature%20Space.md)
1. [200402 Controllable Orthogonalization in Training DNNs](papers/2020/200402%20Controllable%20Orthogonalization%20in%20Training%20DNNs.md) #regularization
1. [200404 Feature Quantization Improves GAN Training](papers/2020/200404%20Feature%20Quantization%20Improves%20GAN%20Training.md) #discrete_vae
1. [200405 Discriminator Contrastive Divergence](papers/2020/200405%20Discriminator%20Contrastive%20Divergence.md)
1. [200407 Inclusive GAN](papers/2020/200407%20Inclusive%20GAN.md)
1. [200408 Attentive Normalization for Conditional Image Generation](papers/2020/200408%20Attentive%20Normalization%20for%20Conditional%20Image%20Generation.md) #attention
1. [200504 Transforming and Projecting Images into Class-conditional Generative](papers/2020/200504%20Transforming%20and%20Projecting%20Images%20into%20Class-conditional%20Generative.md) #generative_model
1. [200518 Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization](papers/2020/200518%20Unconditional%20Audio%20Generation%20with%20Generative%20Adversarial%20Networks%20and%20Cycle%20Regularization.md) #audio_generation
1. [200519 CIAGAN](papers/2020/200519%20CIAGAN.md)
1. [200519 Regularization Methods for Generative Adversarial Networks](papers/2020/200519%20Regularization%20Methods%20for%20Generative%20Adversarial%20Networks.md) #review #regularization
1. [200604 Image Augmentations for GAN Training](papers/2020/200604%20Image%20Augmentations%20for%20GAN%20Training.md) #augmentation
1. [200611 Training Generative Adversarial Networks with Limited Data](papers/2020/200611%20Training%20Generative%20Adversarial%20Networks%20with%20Limited%20Data.md) #augmentation
1. [200618 Differentiable Augmentation for Data-Efficient GAN Training](papers/2020/200618%20Differentiable%20Augmentation%20for%20Data-Efficient%20GAN%20Training.md) #augmentation
1. [200618 Diverse Image Generation via Self-Conditioned GANs](papers/2020/200618%20Diverse%20Image%20Generation%20via%20Self-Conditioned%20GANs.md) #generative_model
1. [200630 PriorGAN](papers/2020/200630%20PriorGAN.md)
1. [200708 InfoMax-GAN](papers/2020/200708%20InfoMax-GAN.md) #regularization
1. [200713 Closed-Form Factorization of Latent Semantics in GANs](papers/2020/200713%20Closed-Form%20Factorization%20of%20Latent%20Semantics%20in%20GANs.md) #semantic_factor
1. [200729 Instance Selection for GANs](papers/2020/200729%20Instance%20Selection%20for%20GANs.md)
1. [200729 VocGAN](papers/2020/200729%20VocGAN.md) #vocoder
1. [200730 Rewriting a Deep Generative Model](papers/2020/200730%20Rewriting%20a%20Deep%20Generative%20Model.md)
1. [200804 Open-Edit](papers/2020/200804%20Open-Edit.md) #image_editing
1. [200807 Improving the Speed and Quality of GAN by Adversarial Training](papers/2020/200807%20Improving%20the%20Speed%20and%20Quality%20of%20GAN%20by%20Adversarial%20Training.md) #robustness
1. [201028 Training Generative Adversarial Networks by Solving Ordinary](papers/2020/201028%20Training%20Generative%20Adversarial%20Networks%20by%20Solving%20Ordinary.md) #neural_ode
1. [201109 Learning Semantic-aware Normalization for Generative Adversarial Networks](papers/2020/201109%20Learning%20Semantic-aware%20Normalization%20for%20Generative%20Adversarial%20Networks.md) #normalization
1. [201109 Towards a Better Global Loss Landscape of GANs](papers/2020/201109%20Towards%20a%20Better%20Global%20Loss%20Landscape%20of%20GANs.md) #training
1. [201118 Style Intervention](papers/2020/201118%20Style%20Intervention.md) #semantic_factor
1. [201124 Adversarial Generation of Continuous Images](papers/2020/201124%20Adversarial%20Generation%20of%20Continuous%20Images.md) #implicit_representation
1. [201125 How to train your conditional GAN](papers/2020/201125%20How%20to%20train%20your%20conditional%20GAN.md) #img2img #generative_model
1. [201125 Omni-GAN](papers/2020/201125%20Omni-GAN.md) #generative_model
1. [201127 Image Generators with Conditionally-Independent Pixel Synthesis](papers/2020/201127%20Image%20Generators%20with%20Conditionally-Independent%20Pixel%20Synthesis.md) #implicit_representation
1. [201201 Refining Deep Generative Models via Discriminator Gradient Flow](papers/2020/201201%20Refining%20Deep%20Generative%20Models%20via%20Discriminator%20Gradient%20Flow.md) #sampling
1. [201201 pi-GAN](papers/2020/201201%20pi-GAN.md) #implicit_representation
1. [201203 Self-labeled Conditional GANs](papers/2020/201203%20Self-labeled%20Conditional%20GANs.md) #unsupervised_training
1. [201204 A Note on Data Biases in Generative Models](papers/2020/201204%20A%20Note%20on%20Data%20Biases%20in%20Generative%20Models.md) #bias #generative_model
1. [201208 You Only Need Adversarial Supervision for Semantic Image Synthesis](papers/2020/201208%20You%20Only%20Need%20Adversarial%20Supervision%20for%20Semantic%20Image%20Synthesis.md) #img2img
1. [210227 Ultra-Data-Efficient GAN Training](papers/2021/210227%20Ultra-Data-Efficient%20GAN%20Training.md) #augmentation #few_shot
1. [210317 Training GANs with Stronger Augmentations via Contrastive Discriminator](papers/2021/210317%20Training%20GANs%20with%20Stronger%20Augmentations%20via%20Contrastive%20Discriminator.md) #contrastive_learning #augmentation
1. [210318 Drop the GAN](papers/2021/210318%20Drop%20the%20GAN.md) #single_image #generative_model #patch
1. [210330 Dual Contrastive Loss and Attention for GANs](papers/2021/210330%20Dual%20Contrastive%20Loss%20and%20Attention%20for%20GANs.md) #contrastive_learning
1. [210401 Partition-Guided GANs](papers/2021/210401%20Partition-Guided%20GANs.md)
1. [210407 Regularizing Generative Adversarial Networks under Limited Data](papers/2021/210407%20Regularizing%20Generative%20Adversarial%20Networks%20under%20Limited%20Data.md) #regularization
1. [210408 InfinityGAN](papers/2021/210408%20InfinityGAN.md)
1. [210413 DatasetGAN](papers/2021/210413%20DatasetGAN.md) #few_shot
1. [210413 Few-shot Image Generation via Cross-domain Correspondence](papers/2021/210413%20Few-shot%20Image%20Generation%20via%20Cross-domain%20Correspondence.md) #img2img #generative_model #few_shot
1. [210414 Aligning Latent and Image Spaces to Connect the Unconnectable](papers/2021/210414%20Aligning%20Latent%20and%20Image%20Spaces%20to%20Connect%20the%20Unconnectable.md)
1. [210415 GANcraft](papers/2021/210415%20GANcraft.md) #nerf
1. [210422 On Buggy Resizing Libraries and Surprising Subtleties in FID Calculation](papers/2021/210422%20On%20Buggy%20Resizing%20Libraries%20and%20Surprising%20Subtleties%20in%20FID%20Calculation.md) #antialiasing
1. [210426 EigenGAN](papers/2021/210426%20EigenGAN.md) #semantic_factor
1. [210608 Data-Efficient Instance Generation from Instance Discrimination](papers/2021/210608%20Data-Efficient%20Instance%20Generation%20from%20Instance%20Discrimination.md) #contrastive_learning
1. [210614 Improved Transformer for High-Resolution GANs](papers/2021/210614%20Improved%20Transformer%20for%20High-Resolution%20GANs.md) #transformer #efficient_training
1. [210623 Alias-Free Generative Adversarial Networks](papers/2021/210623%20Alias-Free%20Generative%20Adversarial%20Networks.md) #antialiasing
1. [210910 Instance-Conditioned GAN](papers/2021/210910%20Instance-Conditioned%20GAN.md)
1. [210927 WarpedGANSpace](papers/2021/210927%20WarpedGANSpace.md)
1. [211017 AE-StyleGAN](papers/2021/211017%20AE-StyleGAN.md) #gan_inversion
1. [211101 Projected GANs Converge Faster](papers/2021/211101%20Projected%20GANs%20Converge%20Faster.md)
1. [211215 Efficient Geometry-aware 3D Generative Adversarial Networks](papers/2021/211215%20Efficient%20Geometry-aware%203D%20Generative%20Adversarial%20Networks.md) #nerf
1. [211216 GRAM](papers/2021/211216%20GRAM.md) #3d_generative_model #nerf
1. [220201 StyleGAN-XL](papers/2022/220201%20StyleGAN-XL.md)
1. [220219 Truncated Diffusion Probabilistic Models](papers/2022/220219%20Truncated%20Diffusion%20Probabilistic%20Models.md) #generative_model #ddpm
1. [220224 Self-Distilled StyleGAN](papers/2022/220224%20Self-Distilled%20StyleGAN.md)
1. [220311 The Role of ImageNet Classes in Fréchet Inception Distance](papers/2022/220311%20The%20Role%20of%20ImageNet%20Classes%20in%20Fr%C3%A9chet%20Inception%20Distance.md)
1. [220314 InsetGAN for Full-Body Image Generation](papers/2022/220314%20InsetGAN%20for%20Full-Body%20Image%20Generation.md) #pose
1. [220414 Any-resolution Training for High-resolution Image Synthesis](papers/2022/220414%20Any-resolution%20Training%20for%20High-resolution%20Image%20Synthesis.md)
1. [230123 StyleGAN-T](papers/2023/230123%20StyleGAN-T.md) #text2img
1. [230309 Scaling up GANs for Text-to-Image Synthesis](papers/2023/230309%20Scaling%20up%20GANs%20for%20Text-to-Image%20Synthesis.md) #text2img
## gan inversion
1. [200330 Exploiting Deep Generative Prior for Versatile Image Restoration and](papers/2020/200330%20Exploiting%20Deep%20Generative%20Prior%20for%20Versatile%20Image%20Restoration%20and.md) #perceptual_loss
1. [200331 In-Domain GAN Inversion for Real Image Editing](papers/2020/200331%20In-Domain%20GAN%20Inversion%20for%20Real%20Image%20Editing.md)
1. [200703 Collaborative Learning for Faster StyleGAN Embedding](papers/2020/200703%20Collaborative%20Learning%20for%20Faster%20StyleGAN%20Embedding.md)
1. [200803 Encoding in Style](papers/2020/200803%20Encoding%20in%20Style.md) #stylegan
1. [220223 Near Perfect GAN Inversion](papers/2022/220223%20Near%20Perfect%20GAN%20Inversion.md)
## generalization
1. [200130 Fantastic Generalization Measures](papers/2020/200130%20Fantastic%20Generalization%20Measures.md)
1. [200225 Rethinking Bias-Variance Trade-off for Generalization of Neural Networks](papers/2020/200225%20Rethinking%20Bias-Variance%20Trade-off%20for%20Generalization%20of%20Neural%20Networks.md)
## generative model
1. [190325 Implicit Generative and Generalization in Energy-Based Models](papers/2019/190325%20Implicit%20Generative%20and%20Generalization%20in%20Energy-Based%20Models.md) #energy_based_model
1. [200129 Controlling Generative Model](papers/2020/200129%20Controlling%20Generative%20Model.md)
1. [200129 Deep Automodulator](papers/2020/200129%20Deep%20Automodulator.md)
1. [200129 Frechet Joint Distance](papers/2020/200129%20Frechet%20Joint%20Distance.md)
1. [200129 Spot CNN generated image](papers/2020/200129%20Spot%20CNN%20generated%20image.md)
1. [200130 BIVA](papers/2020/200130%20BIVA.md)
1. [200130 Glow](papers/2020/200130%20Glow.md) #flow
1. [200130 IGEBM](papers/2020/200130%20IGEBM.md) #energy_based_model
1. [200130 Neural Spline Flows](papers/2020/200130%20Neural%20Spline%20Flows.md) #flow
1. [200130 VQ-VAE-2](papers/2020/200130%20VQ-VAE-2.md) #autoregressive_model
1. [200217 Augmented Normalizing Flows](papers/2020/200217%20Augmented%20Normalizing%20Flows.md) #flow
1. [200313 Semantic Pyramid for Image Generation](papers/2020/200313%20Semantic%20Pyramid%20for%20Image%20Generation.md) #perceptual_loss #image_editing
1. [200616 Improved Techniques for Training Score-Based Generative Models](papers/2020/200616%20Improved%20Techniques%20for%20Training%20Score-Based%20Generative%20Models.md) #ncsn
1. [201117 DeepNAG](papers/2020/201117%20DeepNAG.md)
1. [201202 Improved Contrastive Divergence Training of Energy Based Models](papers/2020/201202%20Improved%20Contrastive%20Divergence%20Training%20of%20Energy%20Based%20Models.md) #energy_based_model
1. [201204 Few-shot Image Generation with Elastic Weight Consolidation](papers/2020/201204%20Few-shot%20Image%20Generation%20with%20Elastic%20Weight%20Consolidation.md) #few_shot #continual_learning
1. [201209 Positional Encoding as Spatial Inductive Bias in GANs](papers/2020/201209%20Positional%20Encoding%20as%20Spatial%20Inductive%20Bias%20in%20GANs.md) #positional_encoding
1. [201224 Soft-IntroVAE](papers/2020/201224%20Soft-IntroVAE.md) #vae
1. [210223 Zero-Shot Text-to-Image Generation](papers/2021/210223%20Zero-Shot%20Text-to-Image%20Generation.md) #discrete_vae #autoregressive_model #multimodal
1. [210318 Few-shot Semantic Image Synthesis Using StyleGAN Prior](papers/2021/210318%20Few-shot%20Semantic%20Image%20%20Synthesis%20Using%20StyleGAN%20Prior.md) #stylegan #few_shot
1. [210824 SimVLM](papers/2021/210824%20SimVLM.md) #vision-language
1. [211015 MaGNET](papers/2021/211015%20MaGNET.md) #sampling
1. [220208 MaskGIT](papers/2022/220208%20MaskGIT.md) #autoregressive_model #non-autoregressive #vq
## graph
1. [200129 Multi-Graph Transformer](papers/2020/200129%20Multi-Graph%20Transformer.md)
## hallucination
1. [210413 The Curious Case of Hallucinations in Neural Machine Translation](papers/2021/210413%20The%20Curious%20Case%20of%20Hallucinations%20in%20Neural%20Machine%20Translation.md) #mt
## hypernetwork
1. [200722 WeightNet](papers/2020/200722%20WeightNet.md) #channel_attention
## hyperparameter
1. [200425 Learning to Guide Random Search](papers/2020/200425%20Learning%20to%20Guide%20Random%20Search.md)
1. [200521 HyperSTAR](papers/2020/200521%20HyperSTAR.md)
## identifiability
1. [200701 On Linear Identifiability of Learned Representations](papers/2020/200701%20On%20Linear%20Identifiability%20of%20Learned%20Representations.md)
## image editing
1. [200515 Semantic Photo Manipulation with a Generative Image Prior](papers/2020/200515%20Semantic%20Photo%20Manipulation%20with%20a%20Generative%20Image%20Prior.md)
1. [200702 Deep Single Image Manipulation](papers/2020/200702%20Deep%20Single%20Image%20Manipulation.md) #single_image #img2img
1. [201123 HistoGAN](papers/2020/201123%20HistoGAN.md)
1. [201127 Navigating the GAN Parameter Space for Semantic Image Editing](papers/2020/201127%20Navigating%20the%20GAN%20Parameter%20Space%20for%20Semantic%20Image%20Editing.md) #semantic_factor
1. [210318 Using latent space regression to analyze and leverage compositionality](papers/2021/210318%20Using%20latent%20space%20regression%20to%20analyze%20and%20leverage%20compositionality.md)
1. [220531 IDE-3D](papers/2022/220531%20IDE-3D.md) #3d_generative_model
1. [220802 An Image is Worth One Word](papers/2022/220802%20An%20Image%20is%20Worth%20One%20Word.md)
1. [220802 Prompt-to-Prompt Image Editing with Cross Attention Control](papers/2022/220802%20Prompt-to-Prompt%20Image%20Editing%20with%20Cross%20Attention%20Control.md)
1. [230202 Dreamix](papers/2023/230202%20Dreamix.md) #video
1. [230213 3D-aware Blending with Generative NeRFs](papers/2023/230213%203D-aware%20Blending%20with%20Generative%20NeRFs.md) #3d_generative_model
1. [230626 DragDiffusion](papers/2023/230626%20DragDiffusion.md)
1. [230626 Localized Text-to-Image Generation for Free via Cross Attention Control](papers/2023/230626%20Localized%20Text-to-Image%20Generation%20for%20Free%20via%20Cross%20Attention%20Control.md) #text2img
1. [230705 DragonDiffusion](papers/2023/230705%20DragonDiffusion.md)
## image generation
1. [200426 Disentangled Image Generation Through Structured Noise Injection](papers/2020/200426%20Disentangled%20Image%20Generation%20Through%20Structured%20Noise%20Injection.md)
## img2img
1. [200130 FUNIT](papers/2020/200130%20FUNIT.md)
1. [200305 SketchyCOCO](papers/2020/200305%20SketchyCOCO.md)
1. [200315 GMM-UNIT](papers/2020/200315%20GMM-UNIT.md) #multimodal_generation
1. [200319 High-Resolution Daytime Translation Without Domain Labels](papers/2020/200319%20High-Resolution%20Daytime%20Translation%20Without%20Domain%20Labels.md)
1. [200330 Semi-supervised Learning for Few-shot Image-to-Image Translation](papers/2020/200330%20Semi-supervised%20Learning%20for%20Few-shot%20Image-to-Image%20Translation.md) #semi_supervised_learning #few_shot
1. [200406 Rethinking Spatially-Adaptive Normalization](papers/2020/200406%20Rethinking%20Spatially-Adaptive%20Normalization.md) #lightweight
1. [200409 TuiGAN](papers/2020/200409%20TuiGAN.md) #few_shot #single_image
1. [200419 TriGAN](papers/2020/200419%20TriGAN.md) #domain_adaptation
1. [200709 Improving Style-Content Disentanglement in Image-to-Image Translation](papers/2020/200709%20Improving%20Style-Content%20Disentanglement%20in%20Image-to-Image%20Translation.md) #disentangle
1. [200714 COCO-FUNIT](papers/2020/200714%20COCO-FUNIT.md)
1. [200715 Transformation Consistency Regularization- A Semi-Supervised Paradigm](papers/2020/200715%20Transformation%20Consistency%20Regularization-%20A%20Semi-Supervised%20Paradigm.md) #augmentation #semi_supervised_learning
1. [200723 TSIT](papers/2020/200723%20TSIT.md)
1. [200724 The Surprising Effectiveness of Linear Unsupervised Image-to-Image](papers/2020/200724%20The%20Surprising%20Effectiveness%20of%20Linear%20Unsupervised%20Image-to-Image.md)
1. [201203 CoCosNet v2](papers/2020/201203%20CoCosNet%20v2.md) #patch #pose
1. [201205 Spatially-Adaptive Pixelwise Networks for Fast Image Translation](papers/2020/201205%20Spatially-Adaptive%20Pixelwise%20Networks%20for%20Fast%20Image%20Translation.md) #implicit_representation
## implicit model
1. [200615 Multiscale Deep Equilibrium Models](papers/2020/200615%20Multiscale%20Deep%20Equilibrium%20Models.md)
## implicit representation
1. [211026 NeRV](papers/2021/211026%20NeRV.md)
1. [211122 Neural Fields in Visual Computing and Beyond](papers/2021/211122%20Neural%20Fields%20in%20Visual%20Computing%20and%20Beyond.md)
1. [220117 Instant Neural Graphics Primitives with a Multiresolution Hash Encoding](papers/2022/220117%20Instant%20Neural%20Graphics%20Primitives%20with%20a%20Multiresolution%20Hash%20Encoding.md)
1. [220522 ReLU Fields](papers/2022/220522%20ReLU%20Fields.md)
1. [230202 Factor Fields](papers/2023/230202%20Factor%20Fields.md)
## in context learning
1. [220520 Prototypical Calibration for Few-shot Learning of Language Models](papers/2022/220520%20Prototypical%20Calibration%20for%20Few-shot%20Learning%20of%20Language%20Models.md)
1. [220522 Instruction Induction](papers/2022/220522%20Instruction%20Induction.md)
1. [230613 TART](papers/2023/230613%20TART.md)
## instance segmentation
1. [200129 BlendMask](papers/2020/200129%20BlendMask.md)
1. [200129 COCO 2018 Instance Segmentation](papers/2020/200129%20COCO%202018%20Instance%20Segmentation.md) #challenge
1. [200129 Deep Snake](papers/2020/200129%20Deep%20Snake.md)
1. [200130 PointRend](papers/2020/200130%20PointRend.md)
1. [200311 Conditional Convolutions for Instance Segmentation](papers/2020/200311%20Conditional%20Convolutions%20for%20Instance%20Segmentation.md)
1. [200313 PointINS](papers/2020/200313%20PointINS.md) #dynamic_conv
1. [200722 Deep Variational Instance Segmentation](papers/2020/200722%20Deep%20Variational%20Instance%20Segmentation.md)
1. [200730 LevelSet R-CNN](papers/2020/200730%20LevelSet%20R-CNN.md)
1. [201119 DCT-Mask](papers/2020/201119%20DCT-Mask.md)
1. [201119 Unifying Instance and Panoptic Segmentation with Dynamic Rank-1](papers/2020/201119%20Unifying%20Instance%20and%20Panoptic%20Segmentation%20with%20Dynamic%20Rank-1.md) #panoptic_segmentation #dynamic_conv
1. [201126 The Devil is in the Boundary](papers/2020/201126%20The%20Devil%20is%20in%20the%20Boundary.md)
1. [201129 End-to-End Video Instance Segmentation with Transformers](papers/2020/201129%20End-to-End%20Video%20Instance%20Segmentation%20with%20Transformers.md) #end2end #detr #video
1. [201203 BoxInst](papers/2020/201203%20BoxInst.md) #dataset #weak_supervision
1. [210503 ISTR](papers/2021/210503%20ISTR.md) #end2end
1. [210505 QueryInst](papers/2021/210505%20QueryInst.md) #end2end
1. [210604 SOLQ](papers/2021/210604%20SOLQ.md)
1. [210713 Per-Pixel Classification is Not All You Need for Semantic Segmentation](papers/2021/210713%20Per-Pixel%20Classification%20is%20Not%20All%20You%20Need%20for%20Semantic%20Segmentation.md) #panoptic_segmentation #semantic_segmentation #detr
1. [221110 OneFormer](papers/2022/221110%20OneFormer.md) #semantic_segmentation #panoptic_segmentation #detr
## instruct
1. [230131 The Flan Collection](papers/2023/230131%20The%20Flan%20Collection.md)
1. [230210 The Wisdom of Hindsight Makes Language Models Better Instruction Followers](papers/2023/230210%20The%20Wisdom%20of%20Hindsight%20Makes%20Language%20Models%20Better%20Instruction%20Followers.md) #reinforcement_learning
1. [230406 Instruction Tuning with GPT-4](papers/2023/230406%20Instruction%20Tuning%20with%20GPT-4.md)
1. [230704 Instruction Tuning Review](papers/2023/230704%20Instruction%20Tuning%20Review.md)
## interpolation
1. [200804 Autoencoder Image Interpolation by Shaping the Latent Space](papers/2020/200804%20Autoencoder%20Image%20Interpolation%20by%20Shaping%20the%20Latent%20Space.md)
1. [211018 Learning in High Dimension Always Amounts to Extrapolation](papers/2021/211018%20Learning%20in%20High%20Dimension%20Always%20Amounts%20to%20Extrapolation.md) #extrapolation
## knowledge base
1. [200214 Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base](papers/2020/200214%20Scalable%20Neural%20Methods%20for%20Reasoning%20With%20a%20Symbolic%20Knowledge%20Base.md)
## language generation
1. [200712 Do You Have the Right Scissors](papers/2020/200712%20Do%20You%20Have%20the%20Right%20Scissors.md)
1. [200729 Mirostat](papers/2020/200729%20Mirostat.md)
## language model
1. [200128 Scaling Laws for LM](papers/2020/200128%20Scaling%20Laws%20for%20LM.md)
1. [200205 K-Adapter](papers/2020/200205%20K-Adapter.md) #multitask #adapter
1. [200206 Consistency of a Recurrent Language Model With Respect to Incomplete](papers/2020/200206%20Consistency%20of%20a%20Recurrent%20Language%20Model%20With%20Respect%20to%20Incomplete.md) #decoding #hallucination #language_generation
1. [200222 Training Question Answering Models From Synthetic Data](papers/2020/200222%20Training%20Question%20Answering%20Models%20From%20Synthetic%20Data.md) #qa #bert
1. [200225 MiniLM](papers/2020/200225%20MiniLM.md) #distillation #lightweight
1. [200406 Sparse Text Generation](papers/2020/200406%20Sparse%20Text%20Generation.md) #language_generation #sampling
1. [200427 Recall and Learn](papers/2020/200427%20Recall%20and%20Learn.md) #finetuning #continual_learning
1. [200505 Stolen Probability](papers/2020/200505%20Stolen%20Probability.md)
1. [200516 MicroNet for Efficient Language Modeling](papers/2020/200516%20MicroNet%20for%20Efficient%20Language%20Modeling.md) #lightweight
1. [200518 Contextual Embeddings](papers/2020/200518%20Contextual%20Embeddings.md)
1. [201015 Fine-Tuning Pre-trained Language Model with Weak Supervision](papers/2020/201015%20Fine-Tuning%20Pre-trained%20Language%20Model%20with%20Weak%20Supervision.md) #transfer #weak_supervision
1. [201023 Rethinking embedding coupling in pre-trained language models](papers/2020/201023%20Rethinking%20embedding%20coupling%20in%20pre-trained%20language%20models.md) #regularization
1. [201201 How Can We Know When Language Models Know](papers/2020/201201%20How%20Can%20We%20Know%20When%20Language%20Models%20Know.md) #qa #calibration
1. [201228 Universal Sentence Representation Learning with Conditional Masked](papers/2020/201228%20Universal%20Sentence%20Representation%20Learning%20with%20Conditional%20Masked.md) #sentence_embedding #mlm
1. [210216 Non-Autoregressive Text Generation with Pre-trained Language Models](papers/2021/210216%20Non-Autoregressive%20Text%20Generation%20with%20Pre-trained%20Language%20Models.md) #non-autoregressive #text_generation
1. [210318 GPT Understands, Too](papers/2021/210318%20GPT%20Understands%2C%20Too.md) #finetuning #prompt
1. [210407 Revisiting Simple Neural Probabilistic Language Models](papers/2021/210407%20Revisiting%20Simple%20Neural%20Probabilistic%20Language%20Models.md)
1. [210420 Carbon Emissions and Large Neural Network Training](papers/2021/210420%20Carbon%20Emissions%20and%20Large%20Neural%20Network%20Training.md) #nlp
1. [210922 Recursively Summarizing Books with Human Feedback](papers/2021/210922%20Recursively%20Summarizing%20Books%20with%20Human%20Feedback.md) #summarization
## layout
1. [210601 Incorporating Visual Layout Structures for Scientific Text Classification](papers/2021/210601%20Incorporating%20Visual%20Layout%20Structures%20for%20Scientific%20Text%20Classification.md)
1. [210902 Skim-Attention](papers/2021/210902%20Skim-Attention.md)
1. [220418 LayoutLMv3](papers/2022/220418%20LayoutLMv3.md)
1. [220517 MATrIX -- Modality-Aware Transformer for Information eXtraction](papers/2022/220517%20MATrIX%20--%20Modality-Aware%20Transformer%20for%20Information%20eXtraction.md)
1. [220912 PreSTU](papers/2022/220912%20PreSTU.md)
1. [220918 ERNIE-mmLayout](papers/2022/220918%20ERNIE-mmLayout.md)
## lightweight
1. [200624 Neural Architecture Design for GPU-Efficient Networks](papers/2020/200624%20Neural%20Architecture%20Design%20for%20GPU-Efficient%20Networks.md)
1. [201124 MicroNet](papers/2020/201124%20MicroNet.md)
1. [210507 Pareto-Optimal Quantized ResNet Is Mostly 4-bit](papers/2021/210507%20Pareto-Optimal%20Quantized%20ResNet%20Is%20Mostly%204-bit.md) #quantization
1. [220409 Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs](papers/2022/220409%20Searching%20for%20Efficient%20Neural%20Architectures%20for%20On-Device%20ML%20on%20Edge%20TPUs.md)
## line
1. [210601 Towards Real-time and Light-weight Line Segment Detection](papers/2021/210601%20Towards%20Real-time%20and%20Light-weight%20Line%20Segment%20Detection.md)
## linear attention
1. [230717 Retentive Network](papers/2023/230717%20Retentive%20Network.md) #recurrent
## llm
1. [220521 Scaling Laws and Interpretability of Learning from Repeated Data](papers/2022/220521%20Scaling%20Laws%20and%20Interpretability%20of%20Learning%20from%20Repeated%20Data.md)
1. [220522 Memorization Without Overfitting](papers/2022/220522%20Memorization%20Without%20Overfitting.md)
1. [220524 Large Language Models are Zero-Shot Reasoners](papers/2022/220524%20Large%20Language%20Models%20are%20Zero-Shot%20Reasoners.md) #prompt
1. [220711 Exploring Length Generalization in Large Language Models](papers/2022/220711%20Exploring%20Length%20Generalization%20in%20Large%20Language%20Models.md)
1. [220711 Language Models (Mostly) Know What They Know](papers/2022/220711%20Language%20Models%20%28Mostly%29%20Know%20What%20They%20Know.md)
1. [220926 Can Large Language Models Truly Understand Prompts](papers/2022/220926%20Can%20Large%20Language%20Models%20Truly%20Understand%20Prompts.md)
1. [220929 Compositional Semantic Parsing with Large Language Models](papers/2022/220929%20Compositional%20Semantic%20Parsing%20with%20Large%20Language%20Models.md) #semantic_parsing
1. [221017 Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them](papers/2022/221017%20Challenging%20BIG-Bench%20Tasks%20and%20Whether%20Chain-of-Thought%20Can%20Solve%20Them.md) #prompt #reasoning
1. [221020 Transcending Scaling Laws with 0.1% Extra Compute](papers/2022/221020%20Transcending%20Scaling%20Laws%20with%200.1%25%20Extra%20Compute.md) #mlm
1. [221103 Inverse scaling can become U-shaped](papers/2022/221103%20Inverse%20scaling%20can%20become%20U-shaped.md) #prompt
1. [221109 BLOOM](papers/2022/221109%20BLOOM.md)
1. [221109 Efficiently Scaling Transformer Inference](papers/2022/221109%20Efficiently%20Scaling%20Transformer%20Inference.md) #efficiency
1. [221118 PAL](papers/2022/221118%20PAL.md) #prompt
1. [221118 SmoothQuant](papers/2022/221118%20SmoothQuant.md) #quantization
1. [230124 A Watermark for Large Language Models](papers/2023/230124%20A%20Watermark%20for%20Large%20Language%20Models.md)
1. [230126 DetectGPT](papers/2023/230126%20DetectGPT.md)
1. [230131 Faithful Chain-of-Thought Reasoning](papers/2023/230131%20Faithful%20Chain-of-Thought%20Reasoning.md) #prompt
1. [230131 Grounding Language Models to Images for Multimodal Generation](papers/2023/230131%20Grounding%20Language%20Models%20to%20Images%20for%20Multimodal%20Generation.md) #multimodal_generation #vision-language
1. [230131 Large Language Models Can Be Easily Distracted by Irrelevant Context](papers/2023/230131%20Large%20Language%20Models%20Can%20Be%20Easily%20Distracted%20by%20Irrelevant%20Context.md) #in_context_learning
1. [230206 Chain of Hindsight Aligns Language Models with Feedback](papers/2023/230206%20Chain%20of%20Hindsight%20Aligns%20Language%20Models%20with%20Feedback.md) #alignment
1. [230209 Toolformer](papers/2023/230209%20Toolformer.md)
1. [230211 Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models](papers/2023/230211%20Characterizing%20Attribution%20and%20Fluency%20Tradeoffs%20for%20Retrieval-Augmented%20Large%20Language%20Models.md) #retrieval
1. [230215 Learning Performance-Improving Code Edits](papers/2023/230215%20Learning%20Performance-Improving%20Code%20Edits.md) #in_context_learning
1. [230215 The Capacity for Moral Self-Correction in Large Language Models](papers/2023/230215%20The%20Capacity%20for%20Moral%20Self-Correction%20in%20Large%20Language%20Models.md) #instruct #ethics
1. [230216 Pretraining Language Models with Human Preferences](papers/2023/230216%20Pretraining%20Language%20Models%20with%20Human%20Preferences.md) #instruct #alignment
1. [230219 Semantic Uncertainty](papers/2023/230219%20Semantic%20Uncertainty.md) #uncertainty
1. [230221 ChatGPT](papers/2023/230221%20ChatGPT.md) #instruct
1. [230224 Check Your Facts and Try Again](papers/2023/230224%20Check%20Your%20Facts%20and%20Try%20Again.md) #retrieval
1. [230306 PaLM-E](papers/2023/230306%20PaLM-E.md) #robotics #multimodal #3d
1. [230307 Flamingo](papers/2023/230307%20Flamingo.md) #multimodal
1. [230307 Larger language models do in-context learning differently](papers/2023/230307%20Larger%20language%20models%20do%20in-context%20learning%20differently.md) #in_context_learning
1. [230307 The BigScience ROOTS Corpus](papers/2023/230307%20The%20BigScience%20ROOTS%20Corpus.md) #dataset
1. [230313 High-throughput Generative Inference of Large Language Models with a Single GPU](papers/2023/230313%20High-throughput%20Generative%20Inference%20of%20Large%20Language%20Models%20with%20a%20Single%20GPU.md)
1. [230315 A Comprehensive Study on Post-Training Quantization for Large Language Models](papers/2023/230315%20A%20Comprehensive%20Study%20on%20Post-Training%20Quantization%20for%20Large%20Language%20Models.md) #quantization
1. [230316 ART](papers/2023/230316%20ART.md) #in_context_learning #prompt
1. [230317 GPTs are GPTs](papers/2023/230317%20GPTs%20are%20GPTs.md)
1. [230322 MEGA](papers/2023/230322%20MEGA.md) #multilingual
1. [230322 RepoCoder](papers/2023/230322%20RepoCoder.md) #retrieval
1. [230322 Sparks of Artificial General Intelligence](papers/2023/230322%20Sparks%20of%20Artificial%20General%20Intelligence.md)
1. [230407 Generative Agents](papers/2023/230407%20Generative%20Agents.md)
1. [230410 Inference with Reference](papers/2023/230410%20Inference%20with%20Reference.md) #efficiency
1. [230411 RRHF](papers/2023/230411%20RRHF.md) #alignment
1. [230416 Sabiá](papers/2023/230416%20Sabi%C3%A1.md) #multilingual
1. [230417 A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model](papers/2023/230417%20A%20Comparative%20Study%20between%20Full-Parameter%20and%20LoRA-based%20Fine-Tuning%20on%20Chinese%20Instruction%20Data%20for%20Instruction%20Following%20Large%20Language%20Model.md) #instruct
1. [230417 Low-code LLM](papers/2023/230417%20Low-code%20LLM.md) #prompt
1. [230418 UniMax](papers/2023/230418%20UniMax.md) #multilingual
1. [230419 A Theory on Adam Instability in Large-Scale Machine Learning](papers/2023/230419%20A%20Theory%20on%20Adam%20Instability%20in%20Large-Scale%20Machine%20Learning.md) #optimizer
1. [230421 Can GPT-4 Perform Neural Architecture Search](papers/2023/230421%20Can%20GPT-4%20Perform%20Neural%20Architecture%20Search.md) #nas
1. [230421 Inducing anxiety in large language models increases exploration and bias](papers/2023/230421%20Inducing%20anxiety%20in%20large%20language%20models%20increases%20exploration%20and%20bias.md)
1. [230424 Why we need RLHF](papers/2023/230424%20Why%20we%20need%20RLHF.md) #alignment #rl
1. [230428 Causal Reasoning and Large Language Models](papers/2023/230428%20Causal%20Reasoning%20and%20Large%20Language%20Models.md) #causality
1. [230428 Speak, Memory](papers/2023/230428%20Speak%2C%20Memory.md)
1. [230503 Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs](papers/2023/230503%20Cheaply%20Evaluating%20Inference%20Efficiency%20Metrics%20for%20Autoregressive%20Transformer%20APIs.md) #efficiency
1. [230504 Can LLM Already Serve as A Database Interface](papers/2023/230504%20Can%20LLM%20Already%20Serve%20as%20A%20Database%20Interface.md)
1. [230509 Large Language Model Programs](papers/2023/230509%20Large%20Language%20Model%20Programs.md) #prompt
1. [230509 MoT](papers/2023/230509%20MoT.md) #prompt
1. [230511 Chain-of-Dictionary Prompting Elicits Translation in Large Language Models](papers/2023/230511%20Chain-of-Dictionary%20Prompting%20Elicits%20Translation%20in%20Large%20Language%20Models.md)
1. [230511 INGENIOUS](papers/2023/230511%20INGENIOUS.md) #dataset
1. [230511 Not All Languages Are Created Equal in LLMs](papers/2023/230511%20Not%20All%20Languages%20Are%20Created%20Equal%20in%20LLMs.md)
1. [230513 CodeT5+](papers/2023/230513%20CodeT5%2B.md)
1. [230515 Symbol tuning improves in-context learning in language models](papers/2023/230515%20Symbol%20tuning%20improves%20in-context%20learning%20in%20language%20models.md) #prompt
1. [230516 Towards Expert-Level Medical Question Answering with Large Language Models](papers/2023/230516%20Towards%20Expert-Level%20Medical%20Question%20Answering%20with%20Large%20Language%20Models.md)
1. [230517 DoReMi](papers/2023/230517%20DoReMi.md) #dataset #multitask #pretraining
1. [230517 Searching for Needles in a Haystack](papers/2023/230517%20Searching%20for%20Needles%20in%20a%20Haystack.md) #nmt #multilingual
1. [230519 Cross-Lingual Supervision improves Large Language Models Pre-training](papers/2023/230519%20Cross-Lingual%20Supervision%20improves%20Large%20Language%20Models%20Pre-training.md) #nmt #multilingual
1. [230521 A PhD Student's Perspective on Research in NLP in the Era of Very Large Language Models](papers/2023/230521%20A%20PhD%20Student%27s%20Perspective%20on%20Research%20in%20NLP%20in%20the%20Era%20of%20Very%20Large%20Language%20Models.md) #nlp
1. [230522 How Language Model Hallucinations Can Snowball](papers/2023/230522%20How%20Language%20Model%20Hallucinations%20Can%20Snowball.md) #alignment
1. [230522 To Repeat or Not To Repeat](papers/2023/230522%20To%20Repeat%20or%20Not%20To%20Repeat.md)
1. [230523 Aligning Large Language Models through Synthetic Feedback](papers/2023/230523%20Aligning%20Large%20Language%20Models%20through%20Synthetic%20Feedback.md) #alignment
1. [230523 Goat](papers/2023/230523%20Goat.md)
1. [230523 QLoRA](papers/2023/230523%20QLoRA.md) #quantization #alignment #finetuning
1. [230525 Scaling Data-Constrained Language Models](papers/2023/230525%20Scaling%20Data-Constrained%20Language%20Models.md) #scaling
1. [230526 Large Language Models as Tool Makers](papers/2023/230526%20Large%20Language%20Models%20as%20Tool%20Makers.md) #alignment
1. [230614 WizardCoder](papers/2023/230614%20WizardCoder.md)
1. [230615 Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models](papers/2023/230615%20Exploring%20the%20MIT%20Mathematics%20and%20EECS%20Curriculum%20Using%20Large%20Language%20Models.md)
1. [230615 Inverse Scaling](papers/2023/230615%20Inverse%20Scaling.md)
1. [230616 Demystifying GPT Self-Repair for Code Generation](papers/2023/230616%20Demystifying%20GPT%20Self-Repair%20for%20Code%20Generation.md)
1. [230616 Full Parameter Fine-tuning for Large Language Models with Limited Resources](papers/2023/230616%20Full%20Parameter%20Fine-tuning%20for%20Large%20Language%20Models%20with%20Limited%20Resources.md) #finetuning
1. [230619 BayLing](papers/2023/230619%20BayLing.md) #alignment
1. [230620 Learning to Generate Better Than Your LLM](papers/2023/230620%20Learning%20to%20Generate%20Better%20Than%20Your%20LLM.md) #alignment
1. [230620 Textbooks Are All You Need](papers/2023/230620%20Textbooks%20Are%20All%20You%20Need.md)
1. [230621 Deep Language Networks](papers/2023/230621%20Deep%20Language%20Networks.md)
1. [230622 AudioPaLM](papers/2023/230622%20AudioPaLM.md) #audio #speech
1. [230623 Bring Your Own Data! Self-Supervised Evaluation for Large Language Models](papers/2023/230623%20Bring%20Your%20Own%20Data%21%20Self-Supervised%20Evaluation%20for%20Large%20Language%20Models.md) #evaluation
1. [230623 GKD](papers/2023/230623%20GKD.md) #distillation
1. [230624 Beyond Scale](papers/2023/230624%20Beyond%20Scale.md) #dataset
1. [230628 Towards Language Models That Can See](papers/2023/230628%20Towards%20Language%20Models%20That%20Can%20See.md) #multimodal #vision-language
1. [230629 Benchmarking Large Language Model Capabilities for Conditional Generation](papers/2023/230629%20Benchmarking%20Large%20Language%20Model%20Capabilities%20for%20Conditional%20Generation.md) #evaluation
1. [230630 Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting](papers/2023/230630%20Large%20Language%20Models%20are%20Effective%20Text%20Rankers%20with%20Pairwise%20Ranking%20Prompting.md)
1. [230705 Reasoning or Reciting](papers/2023/230705%20Reasoning%20or%20Reciting.md) #evaluation
1. [230706 Style Over Substance](papers/2023/230706%20Style%20Over%20Substance.md) #evaluation
1. [230713 In-context Autoencoder for Context Compression in a Large Language Model](papers/2023/230713%20In-context%20Autoencoder%20for%20Context%20Compression%20in%20a%20Large%20Language%20Model.md)
1. [230717 GEAR](papers/2023/230717%20GEAR.md) #alignment
1. [230803 Scaling Relationship on Learning Mathematical Reasoning with Large Language Models](papers/2023/230803%20Scaling%20Relationship%20on%20Learning%20Mathematical%20Reasoning%20with%20Large%20Language%20Models.md)
## lm
1. [210524 StructuralLM](papers/2021/210524%20StructuralLM.md) #layout
1. [210524 True Few-Shot Learning with Language Models](papers/2021/210524%20True%20Few-Shot%20Learning%20with%20Language%20Models.md) #few_shot
1. [210528 ByT5](papers/2021/210528%20ByT5.md)
1. [210617 LoRA](papers/2021/210617%20LoRA.md) #adapter #finetuning
1. [210623 Charformer](papers/2021/210623%20Charformer.md) #tokenizer
1. [210714 Deduplicating Training Data Makes Language Models Better](papers/2021/210714%20Deduplicating%20Training%20Data%20Makes%20Language%20Models%20Better.md) #corpus
1. [210714 HTLM](papers/2021/210714%20HTLM.md)
1. [210811 DEMix Layers](papers/2021/210811%20DEMix%20Layers.md) #mixture_of_experts
1. [210813 Curriculum Learning](papers/2021/210813%20Curriculum%20Learning.md) #curriculum
1. [210816 On the Opportunities and Risks of Foundation Models](papers/2021/210816%20On%20the%20Opportunities%20and%20Risks%20of%20Foundation%20Models.md)
1. [210902 Do Prompt-Based Models Really Understand the Meaning of their Prompts](papers/2021/210902%20Do%20Prompt-Based%20Models%20Really%20Understand%20the%20Meaning%20of%20their%20Prompts.md) #prompt
1. [210903 Finetuned Language Models Are Zero-Shot Learners](papers/2021/210903%20Finetuned%20Language%20Models%20Are%20Zero-Shot%20Learners.md) #zero-shot
1. [210908 A Recipe For Arbitrary Text Style Transfer with Large Language Models](papers/2021/210908%20A%20Recipe%20For%20Arbitrary%20Text%20Style%20Transfer%20with%20Large%20Language%20Models.md) #prompt
1. [211011 Unsupervised Neural Machine Translation with Generative Language Models Only](papers/2021/211011%20Unsupervised%20Neural%20Machine%20Translation%20with%20Generative%20Language%20Models%20Only.md) #unsupervised_nmt
1. [211015 Multitask Prompted Training Enables Zero-Shot Task Generalization](papers/2021/211015%20Multitask%20Prompted%20Training%20Enables%20Zero-Shot%20Task%20Generalization.md) #zero-shot
1. [211016 Invariant Language Modeling](papers/2021/211016%20Invariant%20Language%20Modeling.md) #irm
1. [211016 MarkupLM](papers/2021/211016%20MarkupLM.md) #layout
1. [211016 Sharpness-Aware Minimization Improves Language Model Generalization](papers/2021/211016%20Sharpness-Aware%20Minimization%20Improves%20Language%20Model%20Generalization.md) #regularization
1. [211020 Shaking the foundations](papers/2021/211020%20Shaking%20the%20foundations.md) #causality
1. [211027 Training Verifiers to Solve Math Word Problems](papers/2021/211027%20Training%20Verifiers%20to%20Solve%20Math%20Word%20Problems.md)
1. [211213 GLaM](papers/2021/211213%20GLaM.md) #moe
1. [211220 Efficient Large Scale Language Modeling with Mixtures of Experts](papers/2021/211220%20Efficient%20Large%20Scale%20Language%20Modeling%20with%20Mixtures%20of%20Experts.md) #mixture_of_experts
1. [220210 Red Teaming Language Models with Language Models](papers/2022/220210%20Red%20Teaming%20Language%20Models%20with%20Language%20Models.md) #safety
1. [220213 A Contrastive Framework for Neural Text Generation](papers/2022/220213%20A%20Contrastive%20Framework%20for%20Neural%20Text%20Generation.md) #decoding
1. [220215 General-purpose, long-context autoregressive modeling with Perceiver AR](papers/2022/220215%20General-purpose%2C%20long-context%20autoregressive%20modeling%20with%20Perceiver%20AR.md) #efficient_attention #autoregressive_model
1. [220314 Efficient Language Modeling with Sparse all-MLP](papers/2022/220314%20Efficient%20Language%20Modeling%20with%20Sparse%20all-MLP.md) #mlp
1. [220329 Training Compute-Optimal Large Language Models](papers/2022/220329%20Training%20Compute-Optimal%20Large%20Language%20Models.md)
1. [220413 METRO](papers/2022/220413%20METRO.md)
1. [220414 GPT-NeoX-20B](papers/2022/220414%20GPT-NeoX-20B.md)
1. [220502 OPT](papers/2022/220502%20OPT.md)
1. [220524 On the Role of Bidirectionality in Language Model Pre-Training](papers/2022/220524%20On%20the%20Role%20of%20Bidirectionality%20in%20Language%20Model%20Pre-Training.md) #bert
1. [220728 Efficient Training of Language Models to Fill in the Middle](papers/2022/220728%20Efficient%20Training%20of%20Language%20Models%20to%20Fill%20in%20the%20Middle.md) #mlm
1. [220805 Branch-Train-Merge](papers/2022/220805%20Branch-Train-Merge.md) #product_of_experts #ensemble
1. [220805 Few-shot Learning with Retrieval Augmented Language Model](papers/2022/220805%20Few-shot%20Learning%20with%20Retrieval%20Augmented%20Language%20Model.md) #retrieval #few_shot
1. [221110 The CRINGE Loss](papers/2022/221110%20The%20CRINGE%20Loss.md) #safety
1. [230131 In-Context Retrieval-Augmented Language Models](papers/2023/230131%20In-Context%20Retrieval-Augmented%20Language%20Models.md) #retrieval
1. [230503 CodeGen2](papers/2023/230503%20CodeGen2.md)
1. [230526 MixCE](papers/2023/230526%20MixCE.md)
1. [230612 Gradient Ascent Post-training Enhances Language Model Generalization](papers/2023/230612%20Gradient%20Ascent%20Post-training%20Enhances%20Language%20Model%20Generalization.md)
## local attention
1. [210323 Scaling Local Self-Attention for Parameter Efficient Visual Backbones](papers/2021/210323%20Scaling%20Local%20Self-Attention%20for%20Parameter%20Efficient%20Visual%20Backbones.md)
## loss
1. [200712 It Is Likely That Your Loss Should be a Likelihood](papers/2020/200712%20It%20Is%20Likely%20That%20Your%20Loss%20Should%20be%20a%20Likelihood.md)
## loss surface
1. [210225 Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling](papers/2021/210225%20Loss%20Surface%20Simplexes%20for%20Mode%20Connecting%20Volumes%20and%20Fast%20Ensembling.md)
## matting
1. [200401 Background Matting](papers/2020/200401%20Background%20Matting.md)
1. [201123 Is a Green Screen Really Necessary for Real-Time Portrait Matting](papers/2020/201123%20Is%20a%20Green%20Screen%20Really%20Necessary%20for%20Real-Time%20Portrait%20Matting.md)
## memory
1. [200206 Product Kanerva Machines](papers/2020/200206%20Product%20Kanerva%20Machines.md)
## meta learning
1. [200221 Learning to Continually Learn](papers/2020/200221%20Learning%20to%20Continually%20Learn.md) #continual_learning
1. [200312 Online Fast Adaptation and Knowledge Accumulation](papers/2020/200312%20Online%20Fast%20Adaptation%20and%20Knowledge%20Accumulation.md)
1. [200401 Editable Neural Networks](papers/2020/200401%20Editable%20Neural%20Networks.md)
1. [200706 Meta-Learning Symmetries by Reparameterization](papers/2020/200706%20Meta-Learning%20Symmetries%20by%20Reparameterization.md) #group_equivariance
## metric
1. [211025 The Efficiency Misnomer](papers/2021/211025%20The%20Efficiency%20Misnomer.md)
1. [230307 Is ChatGPT a Good NLG Evaluator](papers/2023/230307%20Is%20ChatGPT%20a%20Good%20NLG%20Evaluator.md)
## metric learning
1. [200319 A unifying mutual information view of metric learning](papers/2020/200319%20A%20unifying%20mutual%20information%20view%20of%20metric%20learning.md)
## mixture of experts
1. [220202 Unified Scaling Laws for Routed Language Models](papers/2022/220202%20Unified%20Scaling%20Laws%20for%20Routed%20Language%20Models.md)
1. [230220 TA-MoE](papers/2023/230220%20TA-MoE.md)
1. [230310 Towards MoE Deployment](papers/2023/230310%20Towards%20MoE%20Deployment.md)
1. [230311 A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training](papers/2023/230311%20A%20Novel%20Tensor-Expert%20Hybrid%20Parallelism%20Approach%20to%20Scale%20Mixture-of-Experts%20Training.md)
1. [230324 Scaling Expert Language Models with Unsupervised Domain Discovery](papers/2023/230324%20Scaling%20Expert%20Language%20Models%20with%20Unsupervised%20Domain%20Discovery.md)
1. [230524 Mixture-of-Experts Meets Instruction Tuning](papers/2023/230524%20Mixture-of-Experts%20Meets%20Instruction%20Tuning.md)
## mixup
1. [201220 ResizeMix](papers/2020/201220%20ResizeMix.md)
1. [211228 LINDA](papers/2021/211228%20LINDA.md) #interpolation
## mlm
1. [200424 Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order](papers/2020/200424%20Probabilistically%20Masked%20Language%20Model%20Capable%20of%20Autoregressive%20Generation%20in%20Arbitrary%20Word%20Order.md) #language_generation
1. [210502 Larger-Scale Transformers for Multilingual Masked Language Modeling](papers/2021/210502%20Larger-Scale%20Transformers%20for%20Multilingual%20Masked%20Language%20Modeling.md) #multilingual #scale
1. [220216 Should You Mask 15% in Masked Language Modeling](papers/2022/220216%20Should%20You%20Mask%2015%25%20in%20Masked%20Language%20Modeling.md)
1. [220715 Position Prediction as an Effective Pretraining Strategy](papers/2022/220715%20Position%20Prediction%20as%20an%20Effective%20Pretraining%20Strategy.md) #unsupervised_training
1. [220929 Bidirectional Language Models Are Also Few-shot Learners](papers/2022/220929%20Bidirectional%20Language%20Models%20Are%20Also%20Few-shot%20Learners.md) #in_context_learning
1. [221006 XDoc](papers/2022/221006%20XDoc.md) #layoutlm
1. [221114 EVA](papers/2022/221114%20EVA.md) #clip
1. [230204 Representation Deficiency in Masked Language Modeling](papers/2023/230204%20Representation%20Deficiency%20in%20Masked%20Language%20Modeling.md)
## mlops
1. [230203 PyGlove](papers/2023/230203%20PyGlove.md)
## moe
1. [230802 From Sparse to Soft Mixtures of Experts](papers/2023/230802%20From%20Sparse%20to%20Soft%20Mixtures%20of%20Experts.md)
## multilingual
1. [200207 A Multilingual View of Unsupervised Machine Translation](papers/2020/200207%20A%20Multilingual%20View%20of%20Unsupervised%20Machine%20Translation.md) #nmt
1. [211015 Breaking Down Multilingual Machine Translation](papers/2021/211015%20Breaking%20Down%20Multilingual%20Machine%20Translation.md) #nmt
1. [220512 Lifting the Curse of Multilinguality by Pre-training Modular Transformers](papers/2022/220512%20Lifting%20the%20Curse%20of%20Multilinguality%20by%20Pre-training%20Modular%20Transformers.md) #adapter #mixture_of_experts
1. [230219 Scaling Laws for Multilingual Neural Machine Translation](papers/2023/230219%20Scaling%20Laws%20for%20Multilingual%20Neural%20Machine%20Translation.md) #nmt #scaling
1. [230406 On the Pareto Front of Multilingual Neural Machine Translation](papers/2023/230406%20On%20the%20Pareto%20Front%20of%20Multilingual%20Neural%20Machine%20Translation.md) #multitask #scaling
1. [230611 Language Versatilists vs. Specialists](papers/2023/230611%20Language%20Versatilists%20vs.%20Specialists.md)
## multimodal
1. [200401 Pixel-BERT](papers/2020/200401%20Pixel-BERT.md)
1. [200513 INFOTABS](papers/2020/200513%20INFOTABS.md)
1. [200514 Behind the Scene](papers/2020/200514%20Behind%20the%20Scene.md)
1. [201130 Multimodal Pretraining Unmasked](papers/2020/201130%20Multimodal%20Pretraining%20Unmasked.md)
1. [210928 VideoCLIP](papers/2021/210928%20VideoCLIP.md) #video_transformer #retrieval
1. [211103 An Empirical Study of Training End-to-End Vision-and-Language Transformers](papers/2021/211103%20An%20Empirical%20Study%20of%20Training%20End-to-End%20Vision-and-Language%20Transformers.md) #vision-language
1. [220512 A Generalist Agent](papers/2022/220512%20A%20Generalist%20Agent.md) #reinforcement_learning
1. [220527 GIT](papers/2022/220527%20GIT.md)
1. [230110 Scaling Laws for Generative Mixed-Modal Language Models](papers/2023/230110%20Scaling%20Laws%20for%20Generative%20Mixed-Modal%20Language%20Models.md)
1. [230123 Zorro](papers/2023/230123%20Zorro.md) #video #audio
1. [230201 mPLUG-2](papers/2023/230201%20mPLUG-2.md)
1. [230202 Multimodal Chain-of-Thought Reasoning in Language Models](papers/2023/230202%20Multimodal%20Chain-of-Thought%20Reasoning%20in%20Language%20Models.md) #vision-language
1. [230304 Prismer](papers/2023/230304%20Prismer.md) #vision-language
1. [230308 Visual ChatGPT](papers/2023/230308%20Visual%20ChatGPT.md) #chatgpt
1. [230507 X-LLM](papers/2023/230507%20X-LLM.md)
1. [230511 Musketeer (All for One, and One for All)](papers/2023/230511%20Musketeer%20%28All%20for%20One%2C%20and%20One%20for%20All%29.md) #vision-language #multitask
1. [230513 On the Hidden Mystery of OCR in Large Multimodal Models](papers/2023/230513%20On%20the%20Hidden%20Mystery%20of%20OCR%20in%20Large%20Multimodal%20Models.md) #vision-language
1. [230529 PaLI-X](papers/2023/230529%20PaLI-X.md) #vision-language
1. [230613 Image Captioners Are Scalable Vision Learners Too](papers/2023/230613%20Image%20Captioners%20Are%20Scalable%20Vision%20Learners%20Too.md) #vision-language
1. [230626 Kosmos-2](papers/2023/230626%20Kosmos-2.md) #vision-language
## multimodal generation
1. [211122 L-Verse](papers/2021/211122%20L-Verse.md)
1. [211124 NÜWA](papers/2021/211124%20N%C3%9CWA.md)
## multitask
1. [200508 Transforming task representations to perform novel tasks](papers/2020/200508%20Transforming%20task%20representations%20to%20perform%20novel%20tasks.md) #continual_learning
1. [200625 MTAdam](papers/2020/200625%20MTAdam.md)
1. [210825 Multi-Task Self-Training for Learning General Representations](papers/2021/210825%20Multi-Task%20Self-Training%20for%20Learning%20General%20Representations.md)
1. [220520 UViM](papers/2022/220520%20UViM.md)
1. [230207 Exploring the Benefits of Training Expert Language Models over Instruction Tuning](papers/2023/230207%20Exploring%20the%20Benefits%20of%20Training%20Expert%20Language%20Models%20over%20Instruction%20Tuning.md) #instruct
1. [230705 Flacuna](papers/2023/230705%20Flacuna.md)
## nas
1. [200324 BigNAS](papers/2020/200324%20BigNAS.md)
1. [200326 Are Labels Necessary for Neural Architecture Search](papers/2020/200326%20Are%20Labels%20Necessary%20for%20Neural%20Architecture%20Search.md) #unsupervised_training
1. [200406 Network Adjustment](papers/2020/200406%20Network%20Adjustment.md)
1. [200412 FBNetV2](papers/2020/200412%20FBNetV2.md)
1. [200428 Angle-based Search Space Shrinking for Neural Architecture Search](papers/2020/200428%20Angle-based%20Search%20Space%20Shrinking%20for%20Neural%20Architecture%20Search.md)
1. [200506 Local Search is State of the Art for Neural Architecture Search](papers/2020/200506%20Local%20Search%20is%20State%20of%20the%20Art%20for%20Neural%20Architecture%20Search.md)
1. [200507 Noisy Differentiable Architecture Search](papers/2020/200507%20Noisy%20Differentiable%20Architecture%20Search.md)
1. [200602 FBNetV3](papers/2020/200602%20FBNetV3.md) #hyperparameter #training #swa
1. [200720 NSGANetV2](papers/2020/200720%20NSGANetV2.md)
1. [220831 Efficient Sparsely Activated Transformers](papers/2022/220831%20Efficient%20Sparsely%20Activated%20Transformers.md) #moe
## nerf
1. [201014 NeRF++](papers/2020/201014%20NeRF%2B%2B.md)
1. [201125 Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes](papers/2020/201125%20Neural%20Scene%20Flow%20Fields%20for%20Space-Time%20View%20Synthesis%20of%20Dynamic%20Scenes.md)
1. [201127 D-NeRF](papers/2020/201127%20D-NeRF.md)
1. [201203 Learned Initializations for Optimizing Coordinate-Based Neural Representations](papers/2020/201203%20Learned%20Initializations%20for%20Optimizing%20Coordinate-Based%20Neural.md) #implicit_representation
1. [201203 pixelNeRF](papers/2020/201203%20pixelNeRF.md)
1. [201215 Object-Centric Neural Scene Rendering](papers/2020/201215%20Object-Centric%20Neural%20Scene%20Rendering.md)
1. [210225 IBRNet](papers/2021/210225%20IBRNet.md)
1. [210318 FastNeRF](papers/2021/210318%20FastNeRF.md)
1. [210318 GNeRF](papers/2021/210318%20GNeRF.md)
1. [210318 MVSNeRF](papers/2021/210318%20MVSNeRF.md)
1. [210318 NeMI](papers/2021/210318%20NeMI.md)
1. [210324 Mip-NeRF](papers/2021/210324%20Mip-NeRF.md)
1. [210325 KiloNeRF](papers/2021/210325%20KiloNeRF.md)
1. [210325 PlenOctrees for Real-time Rendering of Neural Radiance Fields](papers/2021/210325%20PlenOctrees%20for%20Real-time%20Rendering%20of%20Neural%20Radiance%20Fields.md)
1. [210706 Depth-supervised NeRF](papers/2021/210706%20Depth-supervised%20NeRF.md)
1. [210809 NeuralMVS](papers/2021/210809%20NeuralMVS.md)
1. [211019 CIPS-3D](papers/2021/211019%20CIPS-3D.md) #stylegan
1. [211129 Deblur-NeRF](papers/2021/211129%20Deblur-NeRF.md)
1. [211129 HDR-NeRF](papers/2021/211129%20HDR-NeRF.md)
1. [211129 Urban Radiance Fields](papers/2021/211129%20Urban%20Radiance%20Fields.md)
1. [211210 CityNeRF](papers/2021/211210%20CityNeRF.md)
1. [221010 NerfAcc](papers/2022/221010%20NerfAcc.md)
1. [230204 AV-NeRF](papers/2023/230204%20AV-NeRF.md)
1. [230208 Nerfstudio](papers/2023/230208%20Nerfstudio.md)
1. [230413 Zip-NeRF](papers/2023/230413%20Zip-NeRF.md) #antialiasing
1. [230503 3D Gaussian Splatting for Real-Time Radiance Field Rendering](papers/2023/230503%203D%20Gaussian%20Splatting%20for%20Real-Time%20Radiance%20Field%20Rendering.md) #neural_rendering
## neural computer
1. [200720 Distributed Associative Memory Network with Memory Refreshing Loss](papers/2020/200720%20Distributed%20Associative%20Memory%20Network%20with%20Memory%20Refreshing%20Loss.md)
1. [211130 Show Your Work](papers/2021/211130%20Show%20Your%20Work.md)
## neural ode
1. [200207 How to train your neural ODE](papers/2020/200207%20How%20to%20train%20your%20neural%20ODE.md)
1. [200520 Neural Controlled Differential Equations](papers/2020/200520%20Neural%20Controlled%20Differential%20Equations.md)
1. [200708 Learning Differential Equations that are Easy to Solve](papers/2020/200708%20Learning%20Differential%20Equations%20that%20are%20Easy%20to%20Solve.md)
## neural rendering
1. [200226 Learning to Shadow Hand-drawn Sketches](papers/2020/200226%20Learning%20to%20Shadow%20Hand-drawn%20Sketches.md)
1. [200427 Neural Hair Rendering](papers/2020/200427%20Neural%20Hair%20Rendering.md)
1. [200506 CONFIG](papers/2020/200506%20CONFIG.md)
1. [201116 Stylized Neural Painting](papers/2020/201116%20Stylized%20Neural%20Painting.md)
1. [201119 Creative Sketch Generation](papers/2020/201119%20Creative%20Sketch%20Generation.md)
1. [201130 Animating Pictures with Eulerian Motion Fields](papers/2020/201130%20Animating%20Pictures%20with%20Eulerian%20Motion%20Fields.md) #single_image
1. [210319 Paint by Word](papers/2021/210319%20Paint%20by%20Word.md)
1. [210512 Enhancing Photorealism Enhancement](papers/2021/210512%20Enhancing%20Photorealism%20Enhancement.md)
1. [211013 ADOP](papers/2021/211013%20ADOP.md)
1. [220728 Neural Strands](papers/2022/220728%20Neural%20Strands.md)
## nlp
1. [200518 (Re)construing Meaning in NLP](papers/2020/200518%20%28Re%29construing%20Meaning%20in%20NLP.md)
1. [200715 Towards Debiasing Sentence Representations](papers/2020/200715%20Towards%20Debiasing%20Sentence%20Representations.md) #bias
1. [220826 What Do NLP Researchers Believe](papers/2022/220826%20What%20Do%20NLP%20Researchers%20Believe.md)
## nmt
1. [200427 Lexically Constrained Neural Machine Translation with Levenshtein Transformer](papers/2020/200427%20Lexically%20Constrained%20Neural%20Machine%20Translation%20with%20Levenshtein%20Transformer.md)
1. [200710 Learn to Use Future Information in Simultaneous Translation](papers/2020/200710%20Learn%20to%20Use%20Future%20Information%20in%20Simultaneous%20Translation.md) #simultaneous_translation
1. [201224 Why Neural Machine Translation Prefers Empty Outputs](papers/2020/201224%20Why%20Neural%20Machine%20Translation%20Prefers%20Empty%20Outputs.md) #hallucination
1. [230120 Is ChatGPT A Good Translator](papers/2023/230120%20Is%20ChatGPT%20A%20Good%20Translator.md) #chatgpt
1. [230228 Large Language Models Are State-of-the-Art Evaluators of Translation Quality](papers/2023/230228%20Large%20Language%20Models%20Are%20State-of-the-Art%20Evaluators%20of%20Translation%20Quality.md) #metric
## non autoregressive
1. [200403 Aligned Cross Entropy for Non-Autoregressive Machine Translation](papers/2020/200403%20Aligned%20Cross%20Entropy%20for%20Non-Autoregressive%20Machine%20Translation.md)
1. [200415 Non-Autoregressive Machine Translation with Latent Alignments](papers/2020/200415%20Non-Autoregressive%20Machine%20Translation%20with%20Latent%20Alignments.md) #nmt #ctc
1. [200422 A Study of Non-autoregressive Model for Sequence Generation](papers/2020/200422%20A%20Study%20of%20Non-autoregressive%20Model%20for%20Sequence%20Generation.md)
1. [201022 Parallel Tacotron](papers/2020/201022%20Parallel%20Tacotron.md) #vae
1. [201025 Improved Mask-CTC for Non-Autoregressive End-to-End ASR](papers/2020/201025%20Improved%20Mask-CTC%20for%20Non-Autoregressive%20End-to-End%20ASR.md) #ctc
1. [201125 FBWave](papers/2020/201125%20FBWave.md) #vocoder #lightweight
1. [201207 EfficientTTS](papers/2020/201207%20EfficientTTS.md) #tts
1. [211213 Step-unrolled Denoising Autoencoders for Text Generation](papers/2021/211213%20Step-unrolled%20Denoising%20Autoencoders%20for%20Text%20Generation.md)
1. [220520 Lossless Acceleration for Seq2seq Generation with Aggressive Decoding](papers/2022/220520%20Lossless%20Acceleration%20for%20Seq2seq%20Generation%20with%20Aggressive%20Decoding.md) #efficiency
1. [220909 Improved Masked Image Generation with Token-Critic](papers/2022/220909%20Improved%20Masked%20Image%20Generation%20with%20Token-Critic.md) #mlm
1. [230301 StraIT](papers/2023/230301%20StraIT.md) #image_generation #vq
1. [230516 SoundStorm](papers/2023/230516%20SoundStorm.md) #audio_generation
## norm free
1. [200310 ReZero is All You Need](papers/2020/200310%20ReZero%20is%20All%20You%20Need.md) #initialization
## normalization
1. [200122 Group Norm, Weight Standardization](papers/2020/200122%20Group%20Norm%2C%20Weight%20Standardization.md)
1. [200122 Moving Average Batch Normalization](papers/2020/200122%20Moving%20Average%20Batch%20Normalization.md)
1. [200122 StyleGAN 2](papers/2020/200122%20StyleGAN%202.md) #GAN
1. [200130 Rethinking Normalization](papers/2020/200130%20Rethinking%20Normalization.md)
1. [200130 Weight Standardization](papers/2020/200130%20Weight%20Standardization.md) #weight
1. [200224 Batch Normalization Biases Residual Blocks Towards the Identity Function](papers/2020/200224%20Batch%20Normalization%20Biases%20Residual%20Blocks%20Towards%20the%20Identity%20Function.md) #optimization #norm_free #initialization
1. [200306 TaskNorm](papers/2020/200306%20TaskNorm.md) #meta_learning
1. [200406 Evolving Normalization-Activation Layers](papers/2020/200406%20Evolving%20Normalization-Activation%20Layers.md) #nas #activation
1. [200427 A Batch Normalized Inference Network Keeps the KL Vanishing Away](papers/2020/200427%20A%20Batch%20Normalized%20Inference%20Network%20Keeps%20the%20KL%20Vanishing%20Away.md)
1. [201128 Batch Normalization with Enhanced Linear Transformation](papers/2020/201128%20Batch%20Normalization%20with%20Enhanced%20Linear%20Transformation.md)
1. [211026 Revisiting Batch Normalization](papers/2021/211026%20Revisiting%20Batch%20Normalization.md)
1. [230516 Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation](papers/2023/230516%20Exploring%20the%20Impact%20of%20Layer%20Normalization%20for%20Zero-shot%20Neural%20Machine%20Translation.md)
## object detection
1. [191118 Anchor-Free](papers/2019/191118%20Anchor-Free.md)
1. [191118 CenterMask](papers/2019/191118%20CenterMask.md) #instance_segmentation #backbone #1stage
1. [191121 EfficientDet](papers/2019/191121%20EfficientDet.md)
1. [200103 BlendMask](papers/2020/200103%20BlendMask.md) #instance_segmentation #1stage
1. [200122 SABL](papers/2020/200122%20SABL.md)
1. [200129 AP Loss](papers/2020/200129%20AP%20Loss.md) #loss
1. [200129 Backbone Reallocation for Detection](papers/2020/200129%20Backbone%20Reallocation%20for%20Detection.md) #backbone #nas
1. [200129 Dense RepPoints](papers/2020/200129%20Dense%20RepPoints.md)
1. [200129 DetNAS](papers/2020/200129%20DetNAS.md) #nas #backbone
1. [200129 IOU-aware single stage detector](papers/2020/200129%20IOU-aware%20single%20stage%20detector.md) #1stage
1. [200130 ATSS](papers/2020/200130%20ATSS.md) #anchor #retinanet #fcos
1. [200130 AutoAugment](papers/2020/200130%20AutoAugment.md) #augmentation #search
1. [200130 EfficientDet](papers/2020/200130%20EfficientDet.md) #fpn
1. [200130 Keypoint Triplet](papers/2020/200130%20Keypoint%20Triplet.md) #keypoint
1. [200130 Learning from Noisy Anchors](papers/2020/200130%20Learning%20from%20Noisy%20Anchors.md)
1. [200130 Multiple Anchor Learning](papers/2020/200130%20Multiple%20Anchor%20Learning.md) #anchor
1. [200130 Objects as Points](papers/2020/200130%20Objects%20as%20Points.md) #keypoint
1. [200130 Soft Anchor-Point](papers/2020/200130%20Soft%20Anchor-Point.md) #anchor
1. [200211 Object Detection as a Positive-Unlabeled Problem](papers/2020/200211%20Object%20Detection%20as%20a%20Positive-Unlabeled%20Problem.md) #positive_unlabled #dataset
1. [200212 Solving Missing-Annotation Object Detection with Background](papers/2020/200212%20Solving%20Missing-Annotation%20Object%20Detection%20with%20Background.md) #dataset #noise
1. [200218 Universal-RCNN](papers/2020/200218%20Universal-RCNN.md) #multi_dataset #graph
1. [200316 Frustratingly Simple Few-Shot Object Detection](papers/2020/200316%20Frustratingly%20Simple%20Few-Shot%20Object%20Detection.md) #few_shot
1. [200317 Revisiting the Sibling Head in Object Detector](papers/2020/200317%20Revisiting%20the%20Sibling%20Head%20in%20Object%20Detector.md)
1. [200319 Revisiting the Sibling Head in Object Detector](papers/2020/200319%20Revisiting%20the%20Sibling%20Head%20in%20Object%20Detector.md) #review
1. [200320 CentripetalNet](papers/2020/200320%20CentripetalNet.md) #keypoint
1. [200413 Dynamic R-CNN](papers/2020/200413%20Dynamic%20R-CNN.md)
1. [200423 YOLOv4](papers/2020/200423%20YOLOv4.md)
1. [200511 Scope Head for Accurate Localization in Object Detection](papers/2020/200511%20Scope%20Head%20for%20Accurate%20Localization%20in%20Object%20Detection.md)
1. [200526 End-to-End Object Detection with Transformers](papers/2020/200526%20End-to-End%20Object%20Detection%20with%20Transformers.md) #end2end #matching
1. [200603 DetectoRS](papers/2020/200603%20DetectoRS.md)
1. [200611 Rethinking Pre-training and Self-training](papers/2020/200611%20Rethinking%20Pre-training%20and%20Self-training.md) #semi_supervised_learning #transfer
1. [200706 LabelEnc](papers/2020/200706%20LabelEnc.md) #distillation
1. [200707 AutoAssign](papers/2020/200707%20AutoAssign.md) #anchor_free
1. [200714 AQD](papers/2020/200714%20AQD.md) #quantization
1. [200715 Probabilistic Anchor Assignment with IoU Prediction for Object Detection](papers/2020/200715%20Probabilistic%20Anchor%20Assignment%20with%20IoU%20Prediction%20for%20Object%20Detection.md) #anchor #1stage
1. [200716 RepPoints V2](papers/2020/200716%20RepPoints%20V2.md) #1stage #anchor_free
1. [200723 PP-YOLO](papers/2020/200723%20PP-YOLO.md) #tuning
1. [200723 The Devil is in Classification](papers/2020/200723%20The%20Devil%20is%20in%20Classification.md) #longtail
1. [200727 Corner Proposal Network for Anchor-free, Two-stage Object Detection](papers/2020/200727%20Corner%20Proposal%20Network%20for%20Anchor-free%2C%20Two-stage%20Object%20Detection.md) #anchor_free #2stage
1. [201116 Scaled-YOLOv4](papers/2020/201116%20Scaled-YOLOv4.md)
1. [201118 End-to-End Object Detection with Adaptive Clustering Transformer](papers/2020/201118%20End-to-End%20Object%20Detection%20with%20Adaptive%20Clustering%20Transformer.md) #detr #end2end #efficiency
1. [201121 Rethinking Transformer-based Set Prediction for Object Detection](papers/2020/201121%20Rethinking%20Transformer-based%20Set%20Prediction%20for%20Object%20Detection.md) #detr #end2end #efficiency
1. [201124 Sparse R-CNN](papers/2020/201124%20Sparse%20R-CNN.md)
1. [201128 Class-agnostic Object Detection](papers/2020/201128%20Class-agnostic%20Object%20Detection.md)
1. [201207 End-to-End Object Detection with Fully Convolutional Network](papers/2020/201207%20End-to-End%20Object%20Detection%20with%20Fully%20Convolutional%20Network.md) #end2end
1. [201223 SWA Object Detection](papers/2020/201223%20SWA%20Object%20Detection.md) #swa
1. [201227 Towards A Category-extended Object Detector without Relabeling or](papers/2020/201227%20Towards%20A%20Category-extended%20Object%20Detector%20without%20Relabeling%20or.md) #continual_learning
1. [210225 Simple multi-dataset detection](papers/2021/210225%20Simple%20multi-dataset%20detection.md) #multi_dataset
1. [210316 You Only Look One-level Feature](papers/2021/210316%20You%20Only%20Look%20One-level%20Feature.md)
1. [210325 USB](papers/2021/210325%20USB.md) #dataset
1. [210417 TransVG](papers/2021/210417%20TransVG.md) #visual_grounding
1. [210420 PP-YOLOv2](papers/2021/210420%20PP-YOLOv2.md) #yolo
1. [210426 MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding](papers/2021/210426%20MDETR%20--%20Modulated%20Detection%20for%20End-to-End%20Multi-Modal%20Understanding.md) #detr #visual_grounding
1. [210601 You Only Look at One Sequence](papers/2021/210601%20You%20Only%20Look%20at%20One%20Sequence.md) #vit
1. [210615 Dynamic Head](papers/2021/210615%20Dynamic%20Head.md) #attention
1. [210718 YOLOX](papers/2021/210718%20YOLOX.md) #yolo
1. [210728 SimROD](papers/2021/210728%20SimROD.md) #domain_adaptation #self_supervised
1. [210922 Pix2seq](papers/2021/210922%20Pix2seq.md) #detr #autoregressive_model
1. [210929 Localizing Objects with Self-Supervised Transformers and no Labels](papers/2021/210929%20Localizing%20Objects%20with%20Self-Supervised%20Transformers%20and%20no%20Labels.md) #self_supervised #self_supervised_discovery #salient_object_detection
1. [211101 PP-PicoDet](papers/2021/211101%20PP-PicoDet.md) #lightweight
1. [211122 Benchmarking Detection Transfer Learning with Vision Transformers](papers/2021/211122%20Benchmarking%20Detection%20Transfer%20Learning%20with%20Vision%20Transformers.md) #unsupervised_training #vit
1. [211123 Dynamic DETR](papers/2021/211123%20Dynamic%20DETR.md)
1. [211129 Sparse DETR](papers/2021/211129%20Sparse%20DETR.md) #detr
1. [220107 Detecting Twenty-thousand Classes using Image-level Supervision](papers/2022/220107%20Detecting%20Twenty-thousand%20Classes%20using%20Image-level%20Supervision.md) #weak_supervision
1. [220330 Exploring Plain Vision Transformer Backbones for Object Detection](papers/2022/220330%20Exploring%20Plain%20Vision%20Transformer%20Backbones%20for%20Object%20Detection.md) #vit #instance_segmentation
1. [220615 A Unified Sequence Interface for Vision Tasks](papers/2022/220615%20A%20Unified%20Sequence%20Interface%20for%20Vision%20Tasks.md) #multitask #instance_segmentation #keypoint
## ocr
1. [191231 LayoutLM](papers/2019/191231%20LayoutLM.md)
1. [200217 Text Perceptron](papers/2020/200217%20Text%20Perceptron.md)
1. [210415 Rethinking Text Line Recognition Models](papers/2021/210415%20Rethinking%20Text%20Line%20Recognition%20Models.md)
1. [220107 Data-Efficient Information Extraction from Form-Like Documents](papers/2022/220107%20Data-Efficient%20Information%20Extraction%20from%20Form-Like%20Documents.md) #information_extraction
1. [220328 Towards End-to-End Unified Scene Text Detection and Layout Analysis](papers/2022/220328%20Towards%20End-to-End%20Unified%20Scene%20Text%20Detection%20and%20Layout%20Analysis.md)
1. [220416 Pushing the Performance Limit of Scene Text Recognizer without Human Annotation](papers/2022/220416%20Pushing%20the%20Performance%20Limit%20of%20Scene%20Text%20Recognizer%20without%20Human%20Annotation.md)
## open set recognition
1. [211012 Open-Set Recognition](papers/2021/211012%20Open-Set%20Recognition.md)
## optimization
1. [200221 The Break-Even Point on Optimization Trajectories of Deep Neural Networks](papers/2020/200221%20The%20Break-Even%20Point%20on%20Optimization%20Trajectories%20of%20Deep%20Neural%20Networks.md) #loss #training
1. [200224 The Early Phase of Neural Network Training](papers/2020/200224%20The%20Early%20Phase%20of%20Neural%20Network%20Training.md)
1. [200227 Using a thousand optimization tasks to learn hyperparameter search strategies](papers/2020/200227%20Using%20a%20thousand%20optimization%20tasks%20to%20learn%20hyperparameter%20search%20strategies.md) #optimizer #hyperparameter
1. [200228 A Self-Tuning Actor-Critic Algorithm](papers/2020/200228%20A%20Self-Tuning%20Actor-Critic%20Algorithm.md) #reinforcement_learning #hyperparameter #meta_learning
1. [200316 Weak and Strong Gradient Directions](papers/2020/200316%20Weak%20and%20Strong%20Gradient%20Directions.md)
1. [200403 Gradient Centralization](papers/2020/200403%20Gradient%20Centralization.md) #training
1. [200508 An Investigation of Why Overparameterization Exacerbates Spurious](papers/2020/200508%20An%20Investigation%20of%20Why%20Overparameterization%20Exacerbates%20Spurious.md) #training
1. [200519 One Size Fits All](papers/2020/200519%20One%20Size%20Fits%20All.md)
## optimizer
1. [200130 LAMB](papers/2020/200130%20LAMB.md) #large_batch
1. [211006 8-bit Optimizers via Block-wise Quantization](papers/2021/211006%208-bit%20Optimizers%20via%20Block-wise%20Quantization.md)
1. [221117 VeLO](papers/2022/221117%20VeLO.md)
1. [230118 Learning-Rate-Free Learning by D-Adaptation](papers/2023/230118%20Learning-Rate-Free%20Learning%20by%20D-Adaptation.md)
1. [230213 Symbolic Discovery of Optimization Algorithms](papers/2023/230213%20Symbolic%20Discovery%20of%20Optimization%20Algorithms.md) #search
1. [230523 Sophia](papers/2023/230523%20Sophia.md)
## oriented object detection
1. [200129 Modulated Loss](papers/2020/200129%20Modulated%20Loss.md)
1. [200129 Oriented Objects as Middle Lines](papers/2020/200129%20Oriented%20Objects%20as%20Middle%20Lines.md)
## out of distribution
1. [200509 Generalizing Outside the Training Set](papers/2020/200509%20Generalizing%20Outside%20the%20Training%20Set.md)
1. [200519 Bridging the Gap Between Training and Inference for Spatio-Temporal Forecasting](papers/2020/200519%20Bridging%20the%20Gap%20Between%20Training%20and%20Inference%20for%20Spatio-Temporal%20Forecasting.md)
## panoptic segmentation
1. [200129 Bridging the gap of train/infer in Panoptic Segmentation](papers/2020/200129%20Bridge%20gap%20of%20traininfer%20Panoptic%20Segmentation.md)
1. [200130 Panoptic-DeepLab](papers/2020/200130%20Panoptic-DeepLab.md)
1. [200218 Towards Bounding-Box Free Panoptic Segmentation](papers/2020/200218%20Towards%20Bounding-Box%20Free%20Panoptic%20Segmentation.md) #box_free
1. [200404 Pixel Consensus Voting for Panoptic Segmentation](papers/2020/200404%20Pixel%20Consensus%20Voting%20for%20Panoptic%20Segmentation.md)
1. [200421 Panoptic-based Image Synthesis](papers/2020/200421%20Panoptic-based%20Image%20Synthesis.md) #neural_rendering
1. [201123 Scaling Wide Residual Networks for Panoptic Segmentation](papers/2020/201123%20Scaling%20Wide%20Residual%20Networks%20for%20Panoptic%20Segmentation.md) #scale
1. [201201 Fully Convolutional Networks for Panoptic Segmentation](papers/2020/201201%20Fully%20Convolutional%20Networks%20for%20Panoptic%20Segmentation.md) #dynamic_conv
1. [201202 Single-shot Path Integrated Panoptic Segmentation](papers/2020/201202%20Single-shot%20Path%20Integrated%20Panoptic%20Segmentation.md) #dynamic_conv
1. [210910 Panoptic Narrative Grounding](papers/2021/210910%20Panoptic%20Narrative%20Grounding.md) #visual_grounding
## perceptual loss
1. [200206 Image Fine-grained Inpainting](papers/2020/200206%20Image%20Fine-grained%20Inpainting.md) #inpainting
1. [200515 Enhancing Perceptual Loss with Adversarial Feature Matching for Super-Resolution](papers/2020/200515%20Enhancing%20Perceptual%20Loss%20with%20Adversarial%20Feature%20Matching%20for%20Super-Resolution.md)
1. [200626 A Loss Function for Generative Neural Networks Based on Watson's](papers/2020/200626%20A%20Loss%20Function%20for%20Generative%20Neural%20Networks%20Based%20on%20Watson%27s.md)
1. [201223 Focal Frequency Loss for Image Reconstruction and Synthesis](papers/2020/201223%20Focal%20Frequency%20Loss%20for%20Image%20Reconstruction%20and%20Synthesis.md) #loss
## point cloud
1. [220325 Point2Seq](papers/2022/220325%20Point2Seq.md)
## pooling
1. [200325 What Deep CNNs Benefit from Global Covariance Pooling](papers/2020/200325%20What%20Deep%20CNNs%20Benefit%20from%20Global%20Covariance%20Pooling.md)
1. [200330 Strip Pooling](papers/2020/200330%20Strip%20Pooling.md)
## pose
1. [200729 Unselfie](papers/2020/200729%20Unselfie.md) #inpainting
1. [210913 Pose with Style](papers/2021/210913%20Pose%20with%20Style.md)
## positional encoding
1. [200628 Rethinking Positional Encoding in Language Pre-training](papers/2020/200628%20Rethinking%20Positional%20Encoding%20in%20Language%20Pre-training.md)
1. [210408 Modulated Periodic Activations for Generalizable Local Functional](papers/2021/210408%20Modulated%20Periodic%20Activations%20for%20Generalizable%20Local%20Functional.md) #periodic_activation #implicit_representation
1. [210506 ACORN](papers/2021/210506%20ACORN.md) #implicit_representation
1. [210706 Rethinking Positional Encoding](papers/2021/210706%20Rethinking%20Positional%20Encoding.md)
1. [230531 The Impact of Positional Encoding on Length Generalization in Transformers](papers/2023/230531%20The%20Impact%20of%20Positional%20Encoding%20on%20Length%20Generalization%20in%20Transformers.md)
1. [230627 Extending Context Window of Large Language Models via Positional Interpolation](papers/2023/230627%20Extending%20Context%20Window%20of%20Large%20Language%20Models%20via%20Positional%20Interpolation.md)
## practice
1. [210630 Using AntiPatterns to avoid MLOps Mistakes](papers/2021/210630%20Using%20AntiPatterns%20to%20avoid%20MLOps%20Mistakes.md)
## pretraining
1. [190620 XLNet](papers/2019/190620%20XLNet.md) #language_model
1. [190729 RoBERTa](papers/2019/190729%20RoBERTa.md) #language_model
1. [200128 mBART](papers/2020/200128%20mBART.md) #machine_translation #nlp
1. [200129 ImageBERT](papers/2020/200129%20ImageBERT.md) #multimodal
1. [200129 LM Pretraining](papers/2020/200129%20LM%20Pretraining.md) #nlp
1. [200129 oLMpics](papers/2020/200129%20oLMpics.md) #language_model #nlp
1. [200130 ViLBERT](papers/2020/200130%20ViLBERT.md) #multimodal
1. [200210 Pre-training Tasks for Embedding-based Large-scale Retrieval](papers/2020/200210%20Pre-training%20Tasks%20for%20Embedding-based%20Large-scale%20Retrieval.md) #retrieval
1. [200217 Incorporating BERT into Neural Machine Translation](papers/2020/200217%20Incorporating%20BERT%20into%20Neural%20Machine%20Translation.md) #language_model #bert #nmt
1. [200219 CodeBERT](papers/2020/200219%20CodeBERT.md) #bert
1. [200228 UniLMv2](papers/2020/200228%20UniLMv2.md) #language_model
1. [200317 Calibration of Pre-trained Transformers](papers/2020/200317%20Calibration%20of%20Pre-trained%20Transformers.md) #calibration
1. [200405 Unsupervised Domain Clusters in Pretrained Language Models](papers/2020/200405%20Unsupervised%20Domain%20Clusters%20in%20Pretrained%20Language%20Models.md) #domain
1. [200412 Pre-training Text Representations as Meta Learning](papers/2020/200412%20Pre-training%20Text%20Representations%20as%20Meta%20Learning.md) #meta_learning #finetuning
1. [200413 Pretrained Transformers Improve Out-of-Distribution Robustness](papers/2020/200413%20Pretrained%20Transformers%20Improve%20Out-of-Distribution%20Robustness.md) #out_of_distribution
1. [200419 Are we pretraining it right](papers/2020/200419%20Are%20we%20pretraining%20it%20right.md) #multimodal
1. [200420 Adversarial Training for Large Neural Language Models](papers/2020/200420%20Adversarial%20Training%20for%20Large%20Neural%20Language%20Models.md) #adversarial_training #language_model #finetuning
1. [200420 MPNet](papers/2020/200420%20MPNet.md) #language_model
1. [200423 Don't Stop Pretraining](papers/2020/200423%20Don%27t%20Stop%20Pretraining.md) #domain
1. [200427 LightPAFF](papers/2020/200427%20LightPAFF.md) #distillation #finetuning
1. [200520 Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models](papers/2020/200520%20Pretraining%20with%20Contrastive%20Sentence%20Objectives%20Improves%20Discourse%20Performance%20of%20Language%20Models.md) #contrastive_learning #sentence_embedding
1. [200610 MC-BERT](papers/2020/200610%20MC-BERT.md)
1. [200615 To Pretrain or Not to Pretrain](papers/2020/200615%20To%20Pretrain%20or%20Not%20to%20Pretrain.md) #nlp #finetuning
1. [200626 Pre-training via Paraphrasing](papers/2020/200626%20Pre-training%20via%20Paraphrasing.md) #retrieval
1. [200703 Language-agnostic BERT Sentence Embedding](papers/2020/200703%20Language-agnostic%20BERT%20Sentence%20Embedding.md) #embedding #multilingual
1. [200713 An Empirical Study on Robustness to Spurious Correlations using](papers/2020/200713%20An%20Empirical%20Study%20on%20Robustness%20to%20Spurious%20Correlations%20using.md) #nlp #multitask
1. [200715 InfoXLM](papers/2020/200715%20InfoXLM.md) #nlp #cross_lingual
1. [200804 Taking Notes on the Fly Helps BERT Pre-training](papers/2020/200804%20Taking%20Notes%20on%20the%20Fly%20Helps%20BERT%20Pre-training.md) #nlp
1. [201020 Pushing the Limits of Semi-Supervised Learning for Automatic Speech](papers/2020/201020%20Pushing%20the%20Limits%20of%20Semi-Supervised%20Learning%20for%20Automatic%20Speech.md) #semi_supervised_learning #asr
1. [201021 Self-training and Pre-training are Complementary for Speech Recognition](papers/2020/201021%20Self-training%20and%20Pre-training%20are%20Complementary%20for%20Speech%20Recognition.md) #self_supervised #asr
1. [201022 mT5](papers/2020/201022%20mT5.md) #language_model #multilingual
1. [201109 When Do You Need Billions of Words of Pretraining Data](papers/2020/201109%20When%20Do%20You%20Need%20Billions%20of%20Words%20of%20Pretraining%20Data.md) #language_model
1. [201117 UP-DETR](papers/2020/201117%20UP-DETR.md) #detr #end2end #object_detection
1. [201127 Progressively Stacking 2.0](papers/2020/201127%20Progressively%20Stacking%202.0.md) #efficiency
1. [201201 Pre-Trained Image Processing Transformer](papers/2020/201201%20Pre-Trained%20Image%20Processing%20Transformer.md) #contrastive_learning #vision_transformer #restoration
1. [201201 StructFormer](papers/2020/201201%20StructFormer.md) #parse #attention #mlm
1. [201227 Syntax-Enhanced Pre-trained Model](papers/2020/201227%20Syntax-Enhanced%20Pre-trained%20Model.md) #language_model #syntax
1. [210225 SparseBERT](papers/2021/210225%20SparseBERT.md) #attention #sparse_attention #bert
1. [210318 All NLP Tasks Are Generation Tasks](papers/2021/210318%20All%20NLP%20Tasks%20Are%20Generation%20Tasks.md) #language_model
1. [210324 Can Vision Transformers Learn without Natural Images](papers/2021/210324%20Can%20Vision%20Transformers%20Learn%20without%20Natural%20Images.md) #vision_transformer
1. [210402 Robust wav2vec 2.0](papers/2021/210402%20Robust%20wav2vec%202.0.md) #asr
1. [210407 Pushing the Limits of Non-Autoregressive Speech Recognition](papers/2021/210407%20Pushing%20the%20Limits%20of%20Non-Autoregressive%20Speech%20Recognition.md) #non-autoregressive #asr #ctc
1. [210413 Masked Language Modeling and the Distributional Hypothesis](papers/2021/210413%20Masked%20Language%20Modeling%20and%20the%20Distributional%20Hypothesis.md) #language_model #mlm
1. [210417 mT6](papers/2021/210417%20mT6.md) #language_model
1. [210418 Data-Efficient Language-Supervised Zero-Shot Learning with](papers/2021/210418%20Data-Efficient%20Language-Supervised%20Zero-Shot%20Learning%20with.md) #multimodal
1. [210422 ImageNet-21K Pretraining for the Masses](papers/2021/210422%20ImageNet-21K%20Pretraining%20for%20the%20Masses.md) #backbone
1. [210606 On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation](papers/2021/210606%20On%20the%20Effectiveness%20of%20Adapter-based%20Tuning%20for%20Pretrained%20Language%20Model%20Adaptation.md) #finetuning #adapter
1. [210606 Rethinking Training from Scratch for Object Detection](papers/2021/210606%20Rethinking%20Training%20from%20Scratch%20for%20Object%20Detection.md) #object_detection
1. [210608 DETReg](papers/2021/210608%20DETReg.md) #detr
1. [210614 SAS](papers/2021/210614%20SAS.md)
1. [210615 BEiT](papers/2021/210615%20BEiT.md) #vit #bert
1. [210907 How much pretraining data do language models need to learn syntax](papers/2021/210907%20How%20much%20pretraining%20data%20do%20language%20models%20need%20to%20learn%20syntax.md) #bert
1. [210910 ReasonBERT](papers/2021/210910%20ReasonBERT.md) #bert #reasoning #qa
1. [210913 STraTA](papers/2021/210913%20STraTA.md) #finetuning #semi_supervised_learning #few_shot
1. [210914 Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](papers/2021/210914%20Performance-Efficiency%20Trade-offs%20in%20Unsupervised%20Pre-training%20for%20Speech%20Recognition.md) #asr
1. [210914 Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding](papers/2021/210914%20Task-adaptive%20Pre-training%20and%20Self-training%20are%20Complementary%20for%20Natural%20Language%20Understanding.md) #finetuning #semi_supervised_learning #few_shot
1. [210927 BigSSL](papers/2021/210927%20BigSSL.md) #asr #semi_supervised_learning #unsupervised_training
1. [211005 Exploring the Limits of Large Scale Pre-training](papers/2021/211005%20Exploring%20the%20Limits%20of%20Large%20Scale%20Pre-training.md) #classification #scaling
1. [211018 Unsupervised Finetuning](papers/2021/211018%20Unsupervised%20Finetuning.md) #unsupervised_training #finetuning
1. [211026 WavLM](papers/2021/211026%20WavLM.md) #speech
1. [211103 VLMo](papers/2021/211103%20VLMo.md) #mixture_of_experts #vision-language
1. [211111 Masked Autoencoders Are Scalable Vision Learners](papers/2021/211111%20Masked%20Autoencoders%20Are%20Scalable%20Vision%20Learners.md) #vit
1. [211122 ExT5](papers/2021/211122%20ExT5.md) #multitask
1. [211122 Florence](papers/2021/211122%20Florence.md) #vision-language #transfer
1. [211201 Revisiting the Transferability of Supervised Pretraining](papers/2021/211201%20Revisiting%20the%20Transferability%20of%20Supervised%20Pretraining.md) #transfer
1. [211216 Masked Feature Prediction for Self-Supervised Visual Pre-Training](papers/2021/211216%20Masked%20Feature%20Prediction%20for%20Self-Supervised%20Visual%20Pre-Training.md) #self_supervised
1. [211220 Are Large-scale Datasets Necessary for Self-Supervised Pre-training](papers/2021/211220%20Are%20Large-scale%20Datasets%20Necessary%20for%20Self-Supervised%20Pre-training.md) #self_supervised #transfer
1. [220429 Vision-Language Pre-Training for Boosting Scene Text Detectors](papers/2022/220429%20Vision-Language%20Pre-Training%20for%20Boosting%20Scene%20Text%20Detectors.md)
1. [220914 PaLI](papers/2022/220914%20PaLI.md) #vision-language
1. [230808 Continual Pre-Training of Large Language Models](papers/2023/230808%20Continual%20Pre-Training%20of%20Large%20Language%20Models.md)
## probabilistic model
1. [200413 Einsum Networks](papers/2020/200413%20Einsum%20Networks.md)
1. [200419 Roundtrip](papers/2020/200419%20Roundtrip.md)
## prompt
1. [220118 ZeroPrompt](papers/2022/220118%20ZeroPrompt.md) #zero-shot
1. [220916 Text and Patterns](papers/2022/220916%20Text%20and%20Patterns.md)
1. [230207 Hard Prompts Made Easy](papers/2023/230207%20Hard%20Prompts%20Made%20Easy.md) #text2img
1. [230517 Chain-of-Symbol Prompting Elicits Planning in Large Language Models](papers/2023/230517%20Chain-of-Symbol%20Prompting%20Elicits%20Planning%20in%20Large%20Langauge%20Models.md) #in_context_learning
1. [230517 Tree of Thoughts](papers/2023/230517%20Tree%20of%20Thoughts.md) #in_context_learning
## pruning
1. [200130 Rethinking Pruning](papers/2020/200130%20Rethinking%20Pruning.md)
1. [200218 Picking Winning Tickets Before Training by Preserving Gradient Flow](papers/2020/200218%20Picking%20Winning%20Tickets%20Before%20Training%20by%20Preserving%20Gradient%20Flow.md) #lottery_ticket
1. [200224 HRank](papers/2020/200224%20HRank.md) #rank
1. [200305 Comparing Rewinding and Fine-tuning in Neural Network Pruning](papers/2020/200305%20Comparing%20Rewinding%20and%20Fine-tuning%20in%20Neural%20Network%20Pruning.md)
1. [200424 Convolution-Weight-Distribution Assumption](papers/2020/200424%20Convolution-Weight-Distribution%20Assumption.md)
1. [200514 Bayesian Bits](papers/2020/200514%20Bayesian%20Bits.md) #quantization #variational_inference
1. [200515 Movement Pruning](papers/2020/200515%20Movement%20Pruning.md)
1. [200518 Joint Multi-Dimension Pruning](papers/2020/200518%20Joint%20Multi-Dimension%20Pruning.md)
1. [200706 Lossless CNN Channel Pruning via Decoupling Remembering and Forgetting](papers/2020/200706%20Lossless%20CNN%20Channel%20Pruning%20via%20Decoupling%20Remembering%20and%20Forgetting.md)
1. [200710 To Filter Prune, or to Layer Prune, That Is The Question](papers/2020/200710%20To%20Filter%20Prune%2C%20or%20to%20Layer%20Prune%2C%20That%20Is%20The%20Question.md)
## qa
1. [200222 Unsupervised Question Decomposition for Question Answering](papers/2020/200222%20Unsupervised%20Question%20Decomposition%20for%20Question%20Answering.md)
## quantization
1. [220815 LLM.int8()](papers/2022/220815%20LLM.int8%28%29.md)
1. [230216 Shared Microexponents](papers/2023/230216%20Shared%20Microexponents.md)
1. [230425 Stable and low-precision training for large-scale vision-language models](papers/2023/230425%20Stable%20and%20low-precision%20training%20for%20large-scale%20vision-language%20models.md) #optimizer
1. [230601 AWQ](papers/2023/230601%20AWQ.md)
1. [230719 ZeroQuant-FP](papers/2023/230719%20ZeroQuant-FP.md)
## reasoning
1. [200129 Neural Arithmetic Units](papers/2020/200129%20Neural%20Arithmetic%20Units.md)
1. [200409 Injecting Numerical Reasoning Skills into Language Models](papers/2020/200409%20Injecting%20Numerical%20Reasoning%20Skills%20into%20Language%20Models.md)
## recommender
1. [230510 Do LLMs Understand User Preferences](papers/2023/230510%20Do%20LLMs%20Understand%20User%20Preferences.md)
## regularization
1. [200130 DropAttention](papers/2020/200130%20DropAttention.md) #dropout
1. [200219 Revisiting Training Strategies and Generalization Performance in Deep](papers/2020/200219%20Revisiting%20Training%20Strategies%20and%20Generalization%20Performance%20in%20Deep.md) #metric_learning
1. [200225 On Feature Normalization and Data Augmentation](papers/2020/200225%20On%20Feature%20Normalization%20and%20Data%20Augmentation.md) #normalization #mixup
1. [200228 The Implicit and Explicit Regularization Effects of Dropout](papers/2020/200228%20The%20Implicit%20and%20Explicit%20Regularization%20Effects%20of%20Dropout.md) #dropout
1. [200331 Regularizing Class-wise Predictions via Self-knowledge Distillation](papers/2020/200331%20Regularizing%20Class-wise%20Predictions%20via%20Self-knowledge%20Distillation.md) #distillation #consistency_regularization
1. [200409 Orthogonal Over-Parameterized Training](papers/2020/200409%20Orthogonal%20Over-Parameterized%20Training.md)
1. [200424 Dropout as an Implicit Gating Mechanism For Continual Learning](papers/2020/200424%20Dropout%20as%20an%20Implicit%20Gating%20Mechanism%20For%20Continual%20Learning.md)
1. [200427 Scheduled DropHead](papers/2020/200427%20Scheduled%20DropHead.md)
1. [200513 Implicit Regularization in Deep Learning May Not Be Explainable by Norms](papers/2020/200513%20Implicit%20Regularization%20in%20Deep%20Learning%20May%20Not%20Be%20Explainable%20by%20Norms.md) #training #optimization
1. [200707 RIFLE](papers/2020/200707%20RIFLE.md) #finetuning
1. [200707 Remix](papers/2020/200707%20Remix.md) #imbalanced
1. [200721 Improving compute efficacy frontiers with SliceOut](papers/2020/200721%20Improving%20compute%20efficacy%20frontiers%20with%20SliceOut.md) #efficient_training
1. [201122 Stable Weight Decay Regularization](papers/2020/201122%20Stable%20Weight%20Decay%20Regularization.md)
1. [220527 Sharpness-Aware Training for Free](papers/2022/220527%20Sharpness-Aware%20Training%20for%20Free.md)
1. [230302 Dropout Reduces Underfitting](papers/2023/230302%20Dropout%20Reduces%20Underfitting.md) #dropout
## reinforcement learning
1. [191120 Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model](papers/2019/191120%20Mastering%20Atari%2C%20Go%2C%20Chess%20and%20Shogi%20by%20Planning%20with%20a%20Learned%20Model.md)
1. [200130 Mastering Atari, Go, Chess, Shogi](papers/2020/200130%20Mastering%20Atari%2C%20Go%2C%20Chess%2C%20Shogi.md)
1. [200626 Critic Regularized Regression](papers/2020/200626%20Critic%20Regularized%20Regression.md)
1. [210929 Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization](papers/2021/210929%20Vision-Guided%20Quadrupedal%20Locomotion%20in%20the%20Wild%20with%20Multi-Modal%20Delay%20Randomization.md)
1. [211030 Mastering Atari Games with Limited Data](papers/2021/211030%20Mastering%20Atari%20Games%20with%20Limited%20Data.md)
## rendering
1. [200130 Textured Neural Avatars](papers/2020/200130%20Textured%20Neural%20Avatars.md)
## representation
1. [200220 Neural Bayes](papers/2020/200220%20Neural%20Bayes.md) #bayesian #clustering
1. [200412 Gradients as Features for Deep Representation Learning](papers/2020/200412%20Gradients%20as%20Features%20for%20Deep%20Representation%20Learning.md)
1. [201223 Noisy Labels Can Induce Good Representations](papers/2020/201223%20Noisy%20Labels%20Can%20Induce%20Good%20Representations.md) #noise
## resampling
1. [200512 Invertible Image Rescaling](papers/2020/200512%20Invertible%20Image%20Rescaling.md)
## restoration
1. [200402 Learning to See Through Obstructions](papers/2020/200402%20Learning%20to%20See%20Through%20Obstructions.md)
1. [200404 Deblurring by Realistic Blurring](papers/2020/200404%20Deblurring%20by%20Realistic%20Blurring.md)
1. [200406 Self-Supervised Scene De-occlusion](papers/2020/200406%20Self-Supervised%20Scene%20De-occlusion.md)
1. [201123 Cross-Camera Convolutional Color Constancy](papers/2020/201123%20Cross-Camera%20Convolutional%20Color%20Constancy.md)
1. [201123 Dissecting Image Crops](papers/2020/201123%20Dissecting%20Image%20Crops.md)
## retrieval
1. [210715 Internet-Augmented Dialogue Generation](papers/2021/210715%20Internet-Augmented%20Dialogue%20Generation.md) #dialog
1. [220124 Text and Code Embeddings by Contrastive Pre-Training](papers/2022/220124%20Text%20and%20Code%20Embeddings%20by%20Contrastive%20Pre-Training.md)
## review
1. [200130 Filter Response Normalization](papers/2020/200130%20Filter%20Response%20Normalization.md)
1. [200227 A Primer in BERTology](papers/2020/200227%20A%20Primer%20in%20BERTology.md) #bert
1. [200306 What is the State of Neural Network Pruning](papers/2020/200306%20What%20is%20the%20State%20of%20Neural%20Network%20Pruning.md) #pruning
1. [200318 A Metric Learning Reality Check](papers/2020/200318%20A%20Metric%20Learning%20Reality%20Check.md) #metric_learning
1. [200324 A Systematic Evaluation](papers/2020/200324%20A%20Systematic%20Evaluation.md)
1. [200325 Rethinking Few-Shot Image Classification](papers/2020/200325%20Rethinking%20Few-Shot%20Image%20Classification.md) #meta_learning
1. [200408 State of the Art on Neural Rendering](papers/2020/200408%20State%20of%20the%20Art%20on%20Neural%20Rendering.md) #neural_rendering
1. [200409 EvoNorm](papers/2020/200409%20EvoNorm.md)
1. [200428 Showing Your Work Doesn't Always Work](papers/2020/200428%20Showing%20Your%20Work%20Doesn%27t%20Always%20Work.md)
1. [200619 Augmentation for GANs](papers/2020/200619%20Augmentation%20for%20GANs.md)
1. [200627 Denoising Diffusion Probabilistic Models Implementation](papers/2020/200627%20Denoising%20Diffusion%20Probabilistic%20Models%20Implementation.md)
1. [200717 Semantic factor of GANs](papers/2020/200717%20Semantic%20factor%20of%20GANs.md)
1. [200725 Neighbor Embedding](papers/2020/200725%20Neighbor%20Embedding.md)
1. [200821 Virtual Try On](papers/2020/200821%20Virtual%20Try%20On.md)
1. [201016 Representation Learning via Invariant Causal Mechanisms](papers/2020/201016%20Representation%20Learning%20via%20Invariant%20Causal%20Mechanisms.md)
1. [201021 BYOL works even without batch statistics](papers/2020/201021%20BYOL%20works%20even%20without%20batch%20statistics.md)
1. [201108 Long Range Arena](papers/2020/201108%20Long%20Range%20Arena.md) #attention #efficient_attention
1. [201112 Learning Semantic-aware Normalization for Generative Adversarial Networks](papers/2020/201112%20Learning%20Semantic-aware%20Normalization%20for%20Generative%20Adversarial%20Networks.md)
1. [201112 When Do You Need Billions of Words of Pretraining Data](papers/2020/201112%20When%20Do%20You%20Need%20Billions%20of%20Words%20of%20Pretraining%20Data.md)
## rl
1. [230807 AlphaStar Unplugged](papers/2023/230807%20AlphaStar%20Unplugged.md)
## robustness
1. [200211 Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial](papers/2020/200211%20Fundamental%20Tradeoffs%20between%20Invariance%20and%20Sensitivity%20to%20Adversarial.md) #adversarial_training
1. [200304 A Closer Look at Accuracy vs. Robustness](papers/2020/200304%20A%20Closer%20Look%20at%20Accuracy%20vs.%20Robustness.md) #adversarial_training
1. [200810 Informative Dropout for Robust Representation Learning](papers/2020/200810%20Informative%20Dropout%20for%20Robust%20Representation%20Learning.md)
1. [220607 Can CNNs Be More Robust Than Transformers](papers/2022/220607%20Can%20CNNs%20Be%20More%20Robust%20Than%20Transformers.md)
## saliency
1. [200406 There and Back Again](papers/2020/200406%20There%20and%20Back%20Again.md)
## salient object detection
1. [200518 U$^2$-Net](papers/2020/200518%20U%24%5E2%24-Net.md)
## scale
1. [200712 Learning to Learn Parameterized Classification Networks for Scalable](papers/2020/200712%20Learning%20to%20Learn%20Parameterized%20Classification%20Networks%20for%20Scalable.md) #hypernetwork
1. [201130 Towards Better Accuracy-efficiency Trade-offs](papers/2020/201130%20Towards%20Better%20Accuracy-efficiency%20Trade-offs.md)
## score
1. [200319 GIQA](papers/2020/200319%20GIQA.md)
1. [200426 Evaluation Metrics for Conditional Image Generation](papers/2020/200426%20Evaluation%20Metrics%20for%20Conditional%20Image%20Generation.md)
## self supervised
1. [200213 Automatically Discovering and Learning New Visual Categories with Ranking Statistics](papers/2020/200213%20Automatically%20Discovering%20and%20Learning%20New%20Visual%20Categories%20with%20Ranking%20Statistics.md) #weak_supervision
1. [200218 MAST](papers/2020/200218%20MAST.md) #tracking
1. [200224 Self-Adaptive Training](papers/2020/200224%20Self-Adaptive%20Training.md) #noise #dataset
1. [200408 Improving BERT with Self-Supervised Attention](papers/2020/200408%20Improving%20BERT%20with%20Self-Supervised%20Attention.md) #bert #distillation
1. [200722 CrossTransformers](papers/2020/200722%20CrossTransformers.md) #few_shot
1. [201015 Representation Learning via Invariant Causal Mechanisms](papers/2020/201015%20Representation%20Learning%20via%20Invariant%20Causal%20Mechanisms.md) #causality
1. [201117 Neural Semi-supervised Learning for Text Classification Under](papers/2020/201117%20Neural%20Semi-supervised%20Learning%20for%20Text%20Classification%20Under.md) #nlp
1. [201125 Can Temporal Information Help with Contrastive Self-Supervised Learning](papers/2020/201125%20Can%20Temporal%20Information%20Help%20with%20Contrastive%20Self-Supervised%20Learning.md) #video #augmentation
1. [201224 Self-supervised Pre-training with Hard Examples Improves Visual](papers/2020/201224%20Self-supervised%20Pre-training%20with%20Hard%20Examples%20Improves%20Visual.md) #mixup
1. [210726 Continental-Scale Building Detection from High Resolution Satellite Imagery](papers/2021/210726%20Continental-Scale%20Building%20Detection%20from%20High%20Resolution%20Satellite%20Imagery.md)
1. [210827 Injecting Text in Self-Supervised Speech Pretraining](papers/2021/210827%20Injecting%20Text%20in%20Self-Supervised%20Speech%20Pretraining.md) #asr
1. [210927 Compressive Visual Representations](papers/2021/210927%20Compressive%20Visual%20Representations.md)
1. [211027 Neural Analysis and Synthesis](papers/2021/211027%20Neural%20Analysis%20and%20Synthesis.md) #audio_synthesis
1. [220124 data2vec](papers/2022/220124%20data2vec.md)
1. [220216 Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision](papers/2022/220216%20Vision%20Models%20Are%20More%20Robust%20And%20Fair%20When%20Pretrained%20On%20Uncurated%20Images%20Without%20Supervision.md)
1. [220520 Uniform Masking](papers/2022/220520%20Uniform%20Masking.md)
1. [220526 Green Hierarchical Vision Transformer for Masked Image Modeling](papers/2022/220526%20Green%20Hierarchical%20Vision%20Transformer%20for%20Masked%20Image%20Modeling.md)
1. [220526 MixMIM](papers/2022/220526%20MixMIM.md)
1. [220526 Revealing the Dark Secrets of Masked Image Modeling](papers/2022/220526%20Revealing%20the%20Dark%20Secrets%20of%20Masked%20Image%20Modeling.md) #representation
1. [220715 Is a Caption Worth a Thousand Images](papers/2022/220715%20Is%20a%20Caption%20Worth%20a%20Thousand%20Images.md) #clip
1. [220803 Masked Vision and Language Modeling for Multi-modal Representation Learning](papers/2022/220803%20Masked%20Vision%20and%20Language%20Modeling%20for%20Multi-modal%20Representation%20Learning.md) #mlm
## self supervised discovery
1. [200403 Self-Supervised Viewpoint Learning From Image Collections](papers/2020/200403%20Self-Supervised%20Viewpoint%20Learning%20From%20Image%20Collections.md) #viewpoint
1. [201127 Unsupervised part representation by Flow Capsules](papers/2020/201127%20Unsupervised%20part%20representation%20by%20Flow%20Capsules.md)
1. [210429 MarioNette](papers/2021/210429%20MarioNette.md)
## semantic factor
1. [200307 StyleGAN2 Distillation for Feed-forward Image Manipulation](papers/2020/200307%20StyleGAN2%20Distillation%20for%20Feed-forward%20Image%20Manipulation.md) #stylegan
1. [200308 PULSE](papers/2020/200308%20PULSE.md) #stylegan
1. [200406 GANSpace](papers/2020/200406%20GANSpace.md)
1. [201222 Time-Travel Rephotography](papers/2020/201222%20Time-Travel%20Rephotography.md) #restoration #stylegan
## semantic segmentation
1. [200323 Learning Dynamic Routing for Semantic Segmentation](papers/2020/200323%20Learning%20Dynamic%20Routing%20for%20Semantic%20Segmentation.md)
1. [200516 Single-Stage Semantic Segmentation from Image Labels](papers/2020/200516%20Single-Stage%20Semantic%20Segmentation%20from%20Image%20Labels.md)
1. [200826 EfficientFCN](papers/2020/200826%20EfficientFCN.md)
1. [210512 Segmenter](papers/2021/210512%20Segmenter.md)
1. [220918 SegNeXt](papers/2022/220918%20SegNeXt.md)
## semi supervised learning
1. [200306 Semi-Supervised StyleGAN for Disentanglement Learning](papers/2020/200306%20Semi-Supervised%20StyleGAN%20for%20Disentanglement%20Learning.md) #stylegan #mixup
1. [200323 Meta Pseudo Labels](papers/2020/200323%20Meta%20Pseudo%20Labels.md) #meta_learning
1. [200627 Laplacian Regularized Few-Shot Learning](papers/2020/200627%20Laplacian%20Regularized%20Few-Shot%20Learning.md) #few_shot
1. [200724 Deep Co-Training with Task Decomposition for Semi-Supervised Domain](papers/2020/200724%20Deep%20Co-Training%20with%20Task%20Decomposition%20for%20Semi-Supervised%20Domain.md) #domain_adaptation
1. [201116 On the Marginal Benefit of Active Learning](papers/2020/201116%20On%20the%20Marginal%20Benefit%20of%20Active%20Learning.md) #active_learning #unsupervised_training
1. [201118 FROST](papers/2020/201118%20FROST.md)
1. [220811 Semi-supervised Vision Transformers at Scale](papers/2022/220811%20Semi-supervised%20Vision%20Transformers%20at%20Scale.md)
1. [220829 Open-Set Semi-Supervised Object Detection](papers/2022/220829%20Open-Set%20Semi-Supervised%20Object%20Detection.md) #open_set_recognition
1. [220918 The Geometry of Self-supervised Learning Models and its Impact on Transfer Learning](papers/2022/220918%20The%20Geometry%20of%20Self-supervised%20Learning%20Models%20and%20its%20Impact%20on%20Transfer%20Learning.md)
## seq2seq
1. [230502 Unlimiformer](papers/2023/230502%20Unlimiformer.md)
## sgld
1. [200706 Kernel Stein Generative Modeling](papers/2020/200706%20Kernel%20Stein%20Generative%20Modeling.md) #svgd
## singing voice synthesis
1. [211008 KaraSinger](papers/2021/211008%20KaraSinger.md)
## single image
1. [200405 Structural-analogy from a Single Image Pair](papers/2020/200405%20Structural-analogy%20from%20a%20Single%20Image%20Pair.md)
## speech
1. [200129 Speech Recognition](papers/2020/200129%20Speech%20Recognition.md)
1. [200129 WaveFlow](papers/2020/200129%20WaveFlow.md) #conditional_generative_model
1. [230511 CoMoSpeech](papers/2023/230511%20CoMoSpeech.md) #audio_synthesis
## state space model
1. [211031 Efficiently Modeling Long Sequences with Structured State Spaces](papers/2021/211031%20Efficiently%20Modeling%20Long%20Sequences%20with%20Structured%20State%20Spaces.md)
1. [221017 What Makes Convolutional Models Great on Long Sequence Modeling](papers/2022/221017%20What%20Makes%20Convolutional%20Models%20Great%20on%20Long%20Sequence%20Modeling.md)
1. [230213 Simple Hardware-Efficient Long Convolutions for Sequence Modeling](papers/2023/230213%20Simple%20Hardware-Efficient%20Long%20Convolutions%20for%20Sequence%20Modeling.md)
## structure learning
1. [200518 Large-scale empirical validation of Bayesian Network structure learning](papers/2020/200518%20Large-scale%20empirical%20validation%20of%20Bayesian%20Network%20structure%20learning.md)
## style transfer
1. [200318 A Content Transformation Block For Image Style Transfer](papers/2020/200318%20A%20Content%20Transformation%20Block%20For%20Image%20Style%20Transfer.md)
1. [200324 Deformable Style Transfer](papers/2020/200324%20Deformable%20Style%20Transfer.md)
1. [200710 Geometric Style Transfer](papers/2020/200710%20Geometric%20Style%20Transfer.md)
## stylegan
1. [210318 Labels4Free](papers/2021/210318%20Labels4Free.md) #unsupervised_segmentation
## super resolution
1. [200129 ESRGAN+](papers/2020/200129%20ESRGAN%2B.md)
1. [200323 Deep Unfolding Network for Image Super-Resolution](papers/2020/200323%20Deep%20Unfolding%20Network%20for%20Image%20Super-Resolution.md)
## table
1. [210906 Parsing Table Structures in the Wild](papers/2021/210906%20Parsing%20Table%20Structures%20in%20the%20Wild.md)
1. [220809 TSRFormer](papers/2022/220809%20TSRFormer.md)
## text generation
1. [200130 Unlikelihood Training](papers/2020/200130%20Unlikelihood%20Training.md)
1. [200605 CoCon](papers/2020/200605%20CoCon.md)
## text2img
1. [221125 3DDesigner](papers/2022/221125%203DDesigner.md) #3d_generative_model
1. [221125 SpaText](papers/2022/221125%20SpaText.md)
1. [230502 Pick-a-Pic](papers/2023/230502%20Pick-a-Pic.md)
## tokenizer
1. [211006 How BPE Affects Memorization in Transformers](papers/2021/211006%20How%20BPE%20Affects%20Memorization%20in%20Transformers.md)
1. [230421 Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition](papers/2023/230421%20Evaluating%20Transformer%20Language%20Models%20on%20Arithmetic%20Operations%20Using%20Number%20Decomposition.md)
## topic model
1. [200426 Neural Topic Modeling with Bidirectional Adversarial Training](papers/2020/200426%20Neural%20Topic%20Modeling%20with%20Bidirectional%20Adversarial%20Training.md)
## topology
1. [200413 Topology of deep neural networks](papers/2020/200413%20Topology%20of%20deep%20neural%20networks.md) #theory
## tracking
1. [200402 Tracking Objects as Points](papers/2020/200402%20Tracking%20Objects%20as%20Points.md) #keypoint
1. [200402 Tracking by Instance Detection](papers/2020/200402%20Tracking%20by%20Instance%20Detection.md) #meta_learning
1. [200403 FairMOT](papers/2020/200403%20FairMOT.md)
1. [200506 PeTra](papers/2020/200506%20PeTra.md)
1. [201215 Detecting Invisible People](papers/2020/201215%20Detecting%20Invisible%20People.md)
1. [211013 ByteTrack](papers/2021/211013%20ByteTrack.md)
## training
1. [200702 Beyond Signal Propagation](papers/2020/200702%20Beyond%20Signal%20Propagation.md)
## transducer
1. [200519 A New Training Pipeline for an Improved Neural Transducer](papers/2020/200519%20A%20New%20Training%20Pipeline%20for%20an%20Improved%20Neural%20Transducer.md)
## transfer
1. [200130 BiT ResNet](papers/2020/200130%20BiT%20ResNet.md) #resnet
1. [200512 Neural Architecture Transfer](papers/2020/200512%20Neural%20Architecture%20Transfer.md) #nas
1. [200711 Adversarially-Trained Deep Nets Transfer Better](papers/2020/200711%20Adversarially-Trained%20Deep%20Nets%20Transfer%20Better.md) #adversarial_training
1. [200716 Do Adversarially Robust ImageNet Models Transfer Better](papers/2020/200716%20Do%20Adversarially%20Robust%20ImageNet%20Models%20Transfer%20Better.md) #robust
1. [200721 Adversarial Training Reduces Information and Improves Transferability](papers/2020/200721%20Adversarial%20Training%20Reduces%20Information%20and%20Improves%20Transferability.md) #adversarial_training
1. [201122 Ranking Neural Checkpoints](papers/2020/201122%20Ranking%20Neural%20Checkpoints.md)
1. [211012 Rethinking supervised pre-training for better downstream transferring](papers/2021/211012%20Rethinking%20supervised%20pre-training%20for%20better%20downstream%20transferring.md) #classification #metric_learning
## transformer
1. [200129 Are Transformers universal approximator](papers/2020/200129%20Are%20Transformers%20universal%20approximator.md)
1. [200129 Product Key Memory](papers/2020/200129%20Product%20Key%20Memory.md) #attention
1. [200129 Reformer](papers/2020/200129%20Reformer.md) #attention
1. [200130 RoBERTa](papers/2020/200130%20RoBERTa.md) #pretraining #language_model #nlp
1. [200130 Sparse Transformer](papers/2020/200130%20Sparse%20Transformer.md) #generative_model
1. [200130 Structured Pruning for LM](papers/2020/200130%20Structured%20Pruning%20for%20LM.md) #pruning
1. [200130 T5](papers/2020/200130%20T5.md) #pretraining #nlp #seq2seq
1. [200207 Transformer Transducer](papers/2020/200207%20Transformer%20Transducer.md) #asr #transducer
1. [200211 On Layer Normalization in the Transformer Architecture](papers/2020/200211%20On%20Layer%20Normalization%20in%20the%20Transformer%20Architecture.md) #normalization
1. [200212 GLU Variants Improve Transformer](papers/2020/200212%20GLU%20Variants%20Improve%20Transformer.md) #activation
1. [200214 Transformer on a Diet](papers/2020/200214%20Transformer%20on%20a%20Diet.md) #efficient_attention
1. [200214 Transformers as Soft Reasoners over Language](papers/2020/200214%20Transformers%20as%20Soft%20Reasoners%20over%20Language.md) #language
1. [200215 Fine-Tuning Pretrained Language Models](papers/2020/200215%20Fine-Tuning%20Pretrained%20Language%20Models.md) #bert #finetuning
1. [200221 Addressing Some Limitations of Transformers with Feedback Memory](papers/2020/200221%20Addressing%20Some%20Limitations%20of%20Transformers%20with%20Feedback%20Memory.md) #recurrent
1. [200305 Talking-Heads Attention](papers/2020/200305%20Talking-Heads%20Attention.md) #attention
1. [200424 Lite Transformer with Long-Short Range Attention](papers/2020/200424%20Lite%20Transformer%20with%20Long-Short%20Range%20Attention.md) #lightweight
1. [200515 Finding Experts in Transformer Models](papers/2020/200515%20Finding%20Experts%20in%20Transformer%20Models.md)
1. [200515 JDI-T](papers/2020/200515%20JDI-T.md) #tts
1. [200516 Conformer](papers/2020/200516%20Conformer.md) #asr
1. [200518 Weak-Attention Suppression For Transformer Based Speech Recognition](papers/2020/200518%20Weak-Attention%20Suppression%20For%20Transformer%20Based%20Speech%20Recognition.md) #asr
1. [200605 Funnel-Transformer](papers/2020/200605%20Funnel-Transformer.md) #efficient_attention
1. [200707 Do Transformers Need Deep Long-Range Memory](papers/2020/200707%20Do%20Transformers%20Need%20Deep%20Long-Range%20Memory.md) #lm #attention
1. [200709 Fast Transformers with Clustered Attention](papers/2020/200709%20Fast%20Transformers%20with%20Clustered%20Attention.md) #attention
1. [200715 AdapterHub](papers/2020/200715%20AdapterHub.md) #nlp #finetuning
1. [200727 Big Bird](papers/2020/200727%20Big%20Bird.md) #attention
1. [200802 DeLighT](papers/2020/200802%20DeLighT.md) #nlp
1. [201217 Taming Transformers for High-Resolution Image Synthesis](papers/2020/201217%20Taming%20Transformers%20for%20High-Resolution%20Image%20Synthesis.md) #discrete_vae #generative_model #autoregressive_model
1. [201221 RealFormer](papers/2020/201221%20RealFormer.md) #attention
1. [201227 SG-Net](papers/2020/201227%20SG-Net.md) #syntax #attention
1. [210223 Do Transformer Modifications Transfer Across Implementations and](papers/2021/210223%20Do%20Transformer%20Modifications%20Transfer%20Across%20Implementations%20and.md)
1. [210225 Evolving Attention with Residual Convolutions](papers/2021/210225%20Evolving%20Attention%20with%20Residual%20Convolutions.md) #attention
1. [210318 HiT](papers/2021/210318%20HiT.md) #video #retrieval
1. [210318 Looking Beyond Two Frames](papers/2021/210318%20Looking%20Beyond%20Two%20Frames.md) #tracking
1. [210318 TFPose](papers/2021/210318%20TFPose.md) #pose
1. [210318 TransCenter](papers/2021/210318%20TransCenter.md) #tracking
1. [210318 Transformer Tracking](papers/2021/210318%20Transformer%20Trackin.md) #tracking
1. [210407 Seeing Out of tHe bOx](papers/2021/210407%20Seeing%20Out%20of%20tHe%20bOx.md) #multimodal #vision-language
1. [210409 Efficient Large-Scale Language Model Training on GPU Clusters](papers/2021/210409%20Efficient%20Large-Scale%20Language%20Model%20Training%20on%20GPU%20Clusters.md) #distributed_training
1. [210409 Not All Attention Is All You Need](papers/2021/210409%20Not%20All%20Attention%20Is%20All%20You%20Need.md)
1. [210410 UniDrop](papers/2021/210410%20UniDrop.md) #regularization
1. [210417 Demystifying the Better Performance of Position Encoding Variants for](papers/2021/210417%20Demystifying%20the%20Better%20Performance%20of%20Position%20Encoding%20Variants%20for.md) #positional_encoding
1. [210420 RoFormer](papers/2021/210420%20RoFormer.md) #positional_encoding
1. [210423 M3DeTR](papers/2021/210423%20M3DeTR.md) #3d
1. [210509 FNet](papers/2021/210509%20FNet.md) #efficient_attention #fourier
1. [210510 Are Pre-trained Convolutions Better than Pre-trained Transformers](papers/2021/210510%20Are%20Pre-trained%20Convolutions%20Better%20than%20Pre-trained%20Transformers.md) #pretraining #nlp #convolution
1. [210613 Thinking Like Transformers](papers/2021/210613%20Thinking%20Like%20Transformers.md)
1. [210617 Multi-head or Single-head](papers/2021/210617%20Multi-head%20or%20Single-head.md)
1. [210730 Perceiver IO](papers/2021/210730%20Perceiver%20IO.md)
1. [210809 Making Transformers Solve Compositional Tasks](papers/2021/210809%20Making%20Transformers%20Solve%20Compositional%20Tasks.md)
1. [210812 Mobile-Former](papers/2021/210812%20Mobile-Former.md) #backbone
1. [210830 A Battle of Network Structures](papers/2021/210830%20A%20Battle%20of%20Network%20Structures.md) #cnn #mlp #backbone
1. [210830 Shatter](papers/2021/210830%20Shatter.md) #bert
1. [210908 Panoptic SegFormer](papers/2021/210908%20Panoptic%20SegFormer.md) #panoptic_segmentation #detr
1. [210909 Bag of Tricks for Optimizing Transformer Efficiency](papers/2021/210909%20Bag%20of%20Tricks%20for%20Optimizing%20Transformer%20Efficiency.md) #nmt #lightweight
1. [210917 Primer](papers/2021/210917%20Primer.md) #lm #nas
1. [210922 Scale Efficiently](papers/2021/210922%20Scale%20Efficiently.md)
1. [211018 NormFormer](papers/2021/211018%20NormFormer.md)
1. [211026 Hierarchical Transformers Are More Efficient Language Models](papers/2021/211026%20Hierarchical%20Transformers%20Are%20More%20Efficient%20Language%20Models.md) #lm #efficient_attention
1. [211122 MetaFormer is Actually What You Need for Vision](papers/2021/211122%20MetaFormer%20is%20Actually%20What%20You%20Need%20for%20Vision.md) #vit
1. [211124 Sparse is Enough in Scaling Transformers](papers/2021/211124%20Sparse%20is%20Enough%20in%20Scaling%20Transformers.md) #sparsity #efficiency
1. [220221 Transformer Quality in Linear Time](papers/2022/220221%20Transformer%20Quality%20in%20Linear%20Time.md) #efficient_attention #linear_attention #local_attention
1. [220301 DeepNet](papers/2022/220301%20DeepNet.md) #normalization
1. [220330 Transformer Language Models without Positional Encodings Still Learn Positional Information](papers/2022/220330%20Transformer%20Language%20Models%20without%20Positional%20Encodings%20Still%20Learn%20Positional%20Information.md) #lm #positional_encoding
1. [220924 In-context Learning and Induction Heads](papers/2022/220924%20In-context%20Learning%20and%20Induction%20Heads.md) #in_context_learning
1. [221004 MOAT](papers/2022/221004%20MOAT.md) #backbone
1. [221220 A Length-Extrapolatable Transformer](papers/2022/221220%20A%20Length-Extrapolatable%20Transformer.md) #positional_encoding
1. [230209 In-Context Learning with Many Demonstration Examples](papers/2023/230209%20In-Context%20Learning%20with%20Many%20Demonstration%20Examples.md) #efficient_attention
1. [230311 Stabilizing Transformer Training by Preventing Attention Entropy Collapse](papers/2023/230311%20Stabilizing%20Transformer%20Training%20by%20Preventing%20Attention%20Entropy%20Collapse.md) #stability
1. [230419 Scaling Transformer to 1M tokens and beyond with RMT](papers/2023/230419%20Scaling%20Transformer%20to%201M%20tokens%20and%20beyond%20with%20RMT.md)
1. [230428 ResiDual](papers/2023/230428%20ResiDual.md) #normalization
1. [230504 BranchNorm](papers/2023/230504%20BranchNorm.md) #normalization
1. [230504 On the Expressivity Role of LayerNorm in Transformers' Attention](papers/2023/230504%20On%20the%20Expressivity%20Role%20of%20LayerNorm%20in%20Transformers%27%20Attention.md) #attention #normalization
1. [230507 Vcc](papers/2023/230507%20Vcc.md) #efficient_attention
1. [230512 MEGABYTE](papers/2023/230512%20MEGABYTE.md) #tokenizer
1. [230512 TinyStories](papers/2023/230512%20TinyStories.md) #lm
1. [230522 GQA](papers/2023/230522%20GQA.md)
1. [230530 Grokking of Hierarchical Structure in Vanilla Transformers](papers/2023/230530%20Grokking%20of%20Hierarchical%20Structure%20in%20Vanilla%20Transformers.md)
1. [230612 Augmenting Language Models with Long-Term Memory](papers/2023/230612%20Augmenting%20Language%20Models%20with%20Long-Term%20Memory.md)
1. [230622 Quantizable Transformers](papers/2023/230622%20Quantizable%20Transformers.md)
1. [230627 Length Generalization in Arithmetic Transformers](papers/2023/230627%20Length%20Generalization%20in%20Arithmetic%20Transformers.md)
1. [230706 Lost in the Middle](papers/2023/230706%20Lost%20in%20the%20Middle.md) #lm
1. [230707 Teaching Arithmetic to Small Transformers](papers/2023/230707%20Teaching%20Arithmetic%20to%20Small%20Transformers.md)
1. [230720 L-Eval](papers/2023/230720%20L-Eval.md) #benchmark
1. [230727 Scaling TransNormer to 175 Billion Parameters](papers/2023/230727%20Scaling%20TransNormer%20to%20175%20Billion%20Parameters.md) #efficient_attention
## tropical geometry
1. [200220 On the Decision Boundaries of Neural Networks](papers/2020/200220%20On%20the%20Decision%20Boundaries%20of%20Neural%20Networks.md)
## tts
1. [200512 Flowtron](papers/2020/200512%20Flowtron.md) #flow
1. [210617 WaveGrad 2](papers/2021/210617%20WaveGrad%202.md)
## uncertainty
1. [210727 A Tale Of Two Long Tails](papers/2021/210727%20A%20Tale%20Of%20Two%20Long%20Tails.md)
## unsupervised img2img
1. [200310 Unpaired Image-to-Image Translation using Adversarial Consistency Loss](papers/2020/200310%20Unpaired%20Image-to-Image%20Translation%20using%20Adversarial%20Consistency%20Loss.md)
1. [200611 Rethinking the Truly Unsupervised Image-to-Image Translation](papers/2020/200611%20Rethinking%20the%20Truly%20Unsupervised%20Image-to-Image%20Translation.md)
1. [201201 Unpaired Image-to-Image Translation via Latent Energy Transport](papers/2020/201201%20Unpaired%20Image-to-Image%20Translation%20via%20Latent%20Energy%20Transport.md)
## unsupervised nmt
1. [200422 When and Why is Unsupervised Neural Machine Translation Useless](papers/2020/200422%20When%20and%20Why%20is%20Unsupervised%20Neural%20Machine%20Translation%20Useless.md)
## vae
1. [200420 Bringing Old Photos Back to Life](papers/2020/200420%20Bringing%20Old%20Photos%20Back%20to%20Life.md) #restoration
1. [200707 NVAE](papers/2020/200707%20NVAE.md)
1. [201119 Dual Contradistinctive Generative Autoencoder](papers/2020/201119%20Dual%20Contradistinctive%20Generative%20Autoencoder.md)
1. [201120 Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them](papers/2020/201120%20Very%20Deep%20VAEs%20Generalize%20Autoregressive%20Models%20and%20Can%20Outperform%20Them.md)
## video
1. [210325 An Image is Worth 16x16 Words, What is a Video Worth](papers/2021/210325%20An%20Image%20is%20Worth%2016x16%20Words%2C%20What%20is%20a%20Video%20Worth.md)
## video transformer
1. [210423 VidTr](papers/2021/210423%20VidTr.md)
## vision
1. [200305 Optimizing JPEG Quantization for Classification Networks](papers/2020/200305%20Optimizing%20JPEG%20Quantization%20for%20Classification%20Networks.md)
1. [201127 Field of Junctions](papers/2020/201127%20Field%20of%20Junctions.md)
## vision language
1. [201212 MiniVLM](papers/2020/201212%20MiniVLM.md)
1. [201222 Seeing past words](papers/2020/201222%20Seeing%20past%20words.md)
1. [210407 Multimodal Fusion Refiner Networks](papers/2021/210407%20Multimodal%20Fusion%20Refiner%20Networks.md)
1. [210727 Is Object Detection Necessary for Human-Object Interaction Recognition](papers/2021/210727%20Is%20Object%20Detection%20Necessary%20for%20Human-Object%20Interaction%20Recognition.md) #human-object-interaction
1. [220221 Vision-Language Pre-Training with Triple Contrastive Learning](papers/2022/220221%20Vision-Language%20Pre-Training%20with%20Triple%20Contrastive%20Learning.md)
1. [220504 CoCa](papers/2022/220504%20CoCa.md)
1. [220612 GLIPv2](papers/2022/220612%20GLIPv2.md)
1. [220615 Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone](papers/2022/220615%20Coarse-to-Fine%20Vision-Language%20Pre-training%20with%20Fusion%20in%20the%20Backbone.md)
1. [220617 Bridge-Tower](papers/2022/220617%20Bridge-Tower.md)
1. [220617 Unified-IO](papers/2022/220617%20Unified-IO.md) #multitask
1. [220810 Patching open-vocabulary models by interpolating weights](papers/2022/220810%20Patching%20open-vocabulary%20models%20by%20interpolating%20weights.md) #clip #multitask #domain
1. [220822 Image as a Foreign Language](papers/2022/220822%20Image%20as%20a%20Foreign%20Language.md) #mlm
1. [230209 Re-ViLM](papers/2023/230209%20Re-ViLM.md)
1. [230313 Scaling Vision-Language Models with Sparse Mixture of Experts](papers/2023/230313%20Scaling%20Vision-Language%20Models%20with%20Sparse%20Mixture%20of%20Experts.md) #mixture_of_experts
## vision transformer
1. [201127 General Multi-label Image Classification with Transformers](papers/2020/201127%20General%20Multi-label%20Image%20Classification%20with%20Transformers.md)
1. [201223 A Survey on Visual Transformer](papers/2020/201223%20A%20Survey%20on%20Visual%20Transformer.md)
1. [201223 Training data-efficient image transformers & distillation through](papers/2020/201223%20Training%20data-efficient%20image%20transformers%20%26%20distillation%20through.md) #distillation
1. [210223 Pyramid Vision Transformer](papers/2021/210223%20Pyramid%20Vision%20Transformer.md)
1. [210318 CrossViT](papers/2021/210318%20CrossViT.md)
1. [210318 CvT](papers/2021/210318%20CvT.md)
1. [210318 Multi-Scale Vision Longformer](papers/2021/210318%20Multi-Scale%20Vision%20Longformer.md)
1. [210319 ConViT](papers/2021/210319%20ConViT.md)
1. [210319 Scalable Visual Transformers with Hierarchical Pooling](papers/2021/210319%20Scalable%20Visual%20Transformers%20with%20Hierarchical%20Pooling.md)
1. [210324 Vision Transformers for Dense Prediction](papers/2021/210324%20Vision%20Transformers%20for%20Dense%20Prediction.md) #fpn
1. [210325 Swin Transformer](papers/2021/210325%20Swin%20Transformer.md) #local_attention
1. [210331 Going deeper with Image Transformers](papers/2021/210331%20Going%20deeper%20with%20Image%20Transformers.md)
1. [210402 LeViT](papers/2021/210402%20LeViT.md)
1. [210421 Token Labeling](papers/2021/210421%20Token%20Labeling.md)
1. [210422 Multiscale Vision Transformers](papers/2021/210422%20Multiscale%20Vision%20Transformers.md)
1. [210422 So-ViT](papers/2021/210422%20So-ViT.md)
1. [210426 Improve Vision Transformers Training by Suppressing Over-smoothing](papers/2021/210426%20Improve%20Vision%20Transformers%20Training%20by%20Suppressing%20Over-smoothing.md)
1. [210426 Visformer](papers/2021/210426%20Visformer.md)
1. [210427 ConTNet](papers/2021/210427%20ConTNet.md)
1. [210428 Twins](papers/2021/210428%20Twins.md) #local_attention #positional_encoding
1. [210509 Conformer](papers/2021/210509%20Conformer.md)
1. [210515 Are Convolutional Neural Networks or Transformers more like human vision](papers/2021/210515%20Are%20Convolutional%20Neural%20Networks%20or%20Transformers%20more%20like%20human%20vision.md) #cnn #inductive_bias
1. [210517 Rethinking the Design Principles of Robust Vision Transformer](papers/2021/210517%20Rethinking%20the%20Design%20Principles%20of%20Robust%20Vision%20Transformer.md) #robustness
## visual grounding
1. [210401 Towards General Purpose Vision Systems](papers/2021/210401%20Towards%20General%20Purpose%20Vision%20Systems.md)
1. [210510 Visual Grounding with Transformers](papers/2021/210510%20Visual%20Grounding%20with%20Transformers.md)
## vit
1. [210521 Intriguing Properties of Vision Transformers](papers/2021/210521%20Intriguing%20Properties%20of%20Vision%20Transformers.md) #robustness
1. [210526 Aggregating Nested Transformers](papers/2021/210526%20Aggregating%20Nested%20Transformers.md) #local_attention
1. [210529 Less is More](papers/2021/210529%20Less%20is%20More.md)
1. [210603 DynamicViT](papers/2021/210603%20DynamicViT.md) #sparse_attention
1. [210603 When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations](papers/2021/210603%20When%20Vision%20Transformers%20Outperform%20ResNets%20without%20Pretraining%20or%20Strong%20Data%20Augmentations.md) #regularization
1. [210604 RegionViT](papers/2021/210604%20RegionViT.md) #local_attention
1. [210607 Refiner](papers/2021/210607%20Refiner.md) #attention
1. [210607 Shuffle Transformer](papers/2021/210607%20Shuffle%20Transformer.md)
1. [210608 Scaling Vision Transformers](papers/2021/210608%20Scaling%20Vision%20Transformers.md) #scale
1. [210609 CoAtNet](papers/2021/210609%20CoAtNet.md)
1. [210614 Delving Deep into the Generalization of Vision Transformers under Distribution Shifts](papers/2021/210614%20Delving%20Deep%20into%20the%20Generalization%20of%20Vision%20Transformers%20under%20Distribution%20Shifts.md) #robustness
1. [210615 Revisiting the Calibration of Modern Neural Networks](papers/2021/210615%20Revisiting%20the%20Calibration%20of%20Modern%20Neural%20Networks.md) #mlp #calibration
1. [210617 XCiT](papers/2021/210617%20XCiT.md) #efficient_attention
1. [210624 Exploring Corruption Robustness](papers/2021/210624%20Exploring%20Corruption%20Robustness.md) #robustness #mlp
1. [210624 VOLO](papers/2021/210624%20VOLO.md) #efficient_attention
1. [210624 Video Swin Transformer](papers/2021/210624%20Video%20Swin%20Transformer.md) #local_attention #video #video_transformer
1. [210701 CSWin Transformer](papers/2021/210701%20CSWin%20Transformer.md) #efficient_attention #local_attention
1. [210701 Focal Self-attention for Local-Global Interactions in Vision Transformers](papers/2021/210701%20Focal%20Self-attention%20for%20Local-Global%20Interactions%20in%20Vision%20Transformers.md) #local_attention
1. [210705 What Makes for Hierarchical Vision Transformer](papers/2021/210705%20What%20Makes%20for%20Hierarchical%20Vision%20Transformer.md) #attention #mlp #local_attention
1. [210713 Visual Parser](papers/2021/210713%20Visual%20Parser.md) #local_attention
1. [210731 CrossFormer](papers/2021/210731%20CrossFormer.md)
1. [210811 ConvNets vs. Transformers](papers/2021/210811%20ConvNets%20vs.%20Transformers.md) #robustness #transfer
1. [210819 Do Vision Transformers See Like Convolutional Neural Networks](papers/2021/210819%20Do%20Vision%20Transformers%20See%20Like%20Convolutional%20Neural%20Networks.md) #resnet
1. [210908 Scaled ReLU Matters for Training Vision Transformers](papers/2021/210908%20Scaled%20ReLU%20Matters%20for%20Training%20Vision%20Transformers.md) #cnn
1. [211118 Swin Transformer V2](papers/2021/211118%20Swin%20Transformer%20V2.md)
1. [211202 Improved Multiscale Vision Transformers for Classification and Detection](papers/2021/211202%20Improved%20Multiscale%20Vision%20Transformers%20for%20Classification%20and%20Detection.md)
1. [211210 Deep ViT Features as Dense Visual Descriptors](papers/2021/211210%20Deep%20ViT%20Features%20as%20Dense%20Visual%20Descriptors.md) #self_supervised #semantic_segmentation
1. [211217 A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation](papers/2021/211217%20A%20Simple%20Single-Scale%20Vision%20Transformer%20for%20Object%20Localization%20and%20Instance%20Segmentation.md) #multiscale
1. [220214 How Do Vision Transformers Work](papers/2022/220214%20How%20Do%20Vision%20Transformers%20Work.md) #cnn
1. [220414 DeiT III](papers/2022/220414%20DeiT%20III.md)
1. [220722 An Impartial Take to the CNN vs Transformer Robustness Contest](papers/2022/220722%20An%20Impartial%20Take%20to%20the%20CNN%20vs%20Transformer%20Robustness%20Contest.md) #robustness #cnn
1. [220812 BEiT v2](papers/2022/220812%20BEiT%20v2.md) #self_supervised #mlm
1. [221110 Demystify Transformers & Convolutions in Modern Image Deep Networks](papers/2022/221110%20Demystify%20Transformers%20%26%20Convolutions%20in%20Modern%20Image%20Deep%20Networks.md) #cnn
1. [230202 Dual PatchNorm](papers/2023/230202%20Dual%20PatchNorm.md) #normalization
1. [230712 Patch n' Pack](papers/2023/230712%20Patch%20n%27%20Pack.md)
## vocoder
1. [200512 FeatherWave](papers/2020/200512%20FeatherWave.md)
1. [201118 Universal MelGAN](papers/2020/201118%20Universal%20MelGAN.md)
## vq
1. [230311 Regularized Vector Quantization for Tokenized Image Synthesis](papers/2023/230311%20Regularized%20Vector%20Quantization%20for%20Tokenized%20Image%20Synthesis.md)
## vqa
1. [220914 MUST-VQA](papers/2022/220914%20MUST-VQA.md)
## weak supervision
1. [201126 SelfText Beyond Polygon](papers/2020/201126%20SelfText%20Beyond%20Polygon.md) #ocr
## yolo
1. [230113 YOLOv6 v3.0](papers/2023/230113%20YOLOv6%20v3.0.md)
## uncategorized
1. [09](papers/2016/09.md)
1. [200211 fastai](papers/2020/200211%20fastai.md)
1. [210224 Zero-Shot Text-to-Image Generation](papers/2021/210224%20Zero-Shot%20Text-to-Image%20Generation.md)
1. [210603 The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models](papers/2021/210603%20The%20Case%20for%20Translation-Invariant%20Self-Attention%20in%20Transformer-Based%20Language%20Models.md)
1. [210606 Referring Transformer](papers/2021/210606%20Referring%20Transformer.md)
1. [210607 ViTAE](papers/2021/210607%20ViTAE.md)
1. [210614 Non Gaussian Denoising Diffusion Models](papers/2021/210614%20Non%20Gaussian%20Denoising%20Diffusion%20Models.md)
1. [210909 PIMNet](papers/2021/210909%20PIMNet.md)
1. [211026 Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers](papers/2021/211026%20Combining%20Recurrent%2C%20Convolutional%2C%20and%20Continuous-time%20Models%20with%20Linear%20State-Space%20Layers.md)
1. [211028 Colossal-AI](papers/2021/211028%20Colossal-AI.md)
1. [211215 Value Retrieval with Arbitrary Queries for Form-like Documents](papers/2021/211215%20Value%20Retrieval%20with%20Arbitrary%20Queries%20for%20Form-like%20Documents.md)
1. [221125 Solving math word problems with process- and outcome-based feedback](papers/2021/221125%20Solving%20math%20word%20problems%20with%20process-%20and%20outcome-based%20feedback.md)
1. [221204 Languages You Know Influence Those You Learn](papers/2021/221204%20Languages%20You%20Know%20Influence%20Those%20You%20Learn.md)
1. [221215 Constitutional AI](papers/2021/221215%20Constitutional%20AI.md)
1. [220114 DeepSpeed-MoE](papers/2022/220114%20DeepSpeed-MoE.md)
1. [220203 AlphaCode, Formal Math](papers/2022/220203%20AlphaCode%2C%20Formal%20Math.md)
1. [220204 InstructGPT](papers/2022/220204%20InstructGPT.md)
1. [220316 Memorizing Transformers](papers/2022/220316%20Memorizing%20Transformers.md)
1. [220323 Pathways](papers/2022/220323%20Pathways.md)
1. [220329 Few Could Be Better Than All](papers/2022/220329%20Few%20Could%20Be%20Better%20Than%20All.md)
1. [220405 Text Spotting Transformers](papers/2022/220405%20Text%20Spotting%20Transformers.md)
1. [220416 Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks](papers/2022/220416%20Benchmarking%20Generalization%20via%20In-Context%20Instructions%20on%201%2C600%2B%20Language%20Tasks.md)
1. [220510 UL2](papers/2022/220510%20UL2.md)
1. [220610 A Multi-Task Benchmark for Korean Legal Language Understanding and Judgement Prediction](papers/2022/220610%20A%20Multi-Task%20Benchmark%20for%20Korean%20Legal%20Language%20Understanding%20and%20Judgement%20Prediction.md)
1. [220612 Self-critiquing models for assisting human evaluators](papers/2022/220612%20Self-critiquing%20models%20for%20assisting%20human%20evaluators.md)
1. [220614 RDU](papers/2022/220614%20RDU.md)
1. [220630 DeepSpeed Inference](papers/2022/220630%20DeepSpeed%20Inference.md)
1. [220712 Inner Monologue](papers/2022/220712%20Inner%20Monologue.md)
1. [220720 NUWA-Infinity](papers/2022/220720%20NUWA-Infinity.md)
1. [220722 Multiface](papers/2022/220722%20Multiface.md)
1. [220725 CelebV-HQ](papers/2022/220725%20CelebV-HQ.md)
1. [220725 Neural Generation Meets Real People](papers/2022/220725%20Neural%20Generation%20Meets%20Real%20People.md)
1. [220725 Towards Complex Document Understanding By Discrete Reasoning](papers/2022/220725%20Towards%20Complex%20Document%20Understanding%20By%20Discrete%20Reasoning.md)
1. [220819 FP8 Quantization](papers/2022/220819%20FP8%20Quantization.md)
1. [220823 CLOWER](papers/2022/220823%20CLOWER.md)
1. [220912 FP8 Formats for Deep Learning](papers/2022/220912%20FP8%20Formats%20for%20Deep%20Learning.md)
1. [220923 Diffusion](papers/2022/220923%20Diffusion.md)
1. [220928 Improving alignment of dialogue agents via targeted human judgements](papers/2022/220928%20Improving%20alignment%20of%20dialogue%20agents%20via%20targeted%20human%20judgements.md)
1. [220928 The Change You Want to See](papers/2022/220928%20The%20Change%20You%20Want%20to%20See.md)
1. [221219 MatCha](papers/2022/221219%20MatCha.md)
1. [230203 Measuring The Impact Of Programming Language Distribution](papers/2023/230203%20Measuring%20The%20Impact%20Of%20Programming%20Language%20Distribution.md)
1. [230206 SmoothQuant](papers/2023/230206%20SmoothQuant.md)
1. [230207 Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages](papers/2023/230207%20Efficiently%20Upgrading%20Multilingual%20Machine%20Translation%20Models%20to%20Support%20More%20Languages.md)
1. [230207 FP8](papers/2023/230207%20FP8.md)
1. [230208 Google Configuration System](papers/2023/230208%20Google%20Configuration%20System.md)
1. [230209 Efficient Attention via Control Variates](papers/2023/230209%20Efficient%20Attention%20via%20Control%20Variates.md)
1. [230211 Thoughts on Generative AI](papers/2023/230211%20Generative%20AI%EC%97%90%20%EB%8C%80%ED%95%9C%20%EC%83%9D%EA%B0%81.md)
1. [230213 Lossy Compression](papers/2023/230213%20Lossy%20Compression.md)
1. [230214 Adding Instructions during Pretraining](papers/2023/230214%20Adding%20Instructions%20during%20Pretraining.md)
1. [230214 Score-based Diffusion Models in Function Space](papers/2023/230214%20Score-based%20Diffusion%20Models%20in%20Function%20Space.md)
1. [230216 Aligning Language Models with Preferences through f-divergence Minimization](papers/2023/230216%20Aligning%20Language%20Models%20with%20Preferences%20through%20f-divergence%20Minimization.md)
1. [230220 DSP](papers/2023/230220%20DSP.md)
1. [230221 Anthropic](papers/2023/230221%20Anthropic.md)
1. [230222 AlpaServe](papers/2023/230222%20AlpaServe.md)
1. [230222 FlexGen](papers/2023/230222%20FlexGen.md)
1. [230223 Colossal AI ChatGPT](papers/2023/230223%20Colossal%20AI%20ChatGPT.md)
1. [230223 On the Generalization Ability of Retrieval-Enhanced Transformers](papers/2023/230223%20On%20the%20Generalization%20Ability%20of%20Retrieval-Enhanced%20Transformers.md)
1. [230224 World Models](papers/2023/230224%20World%20Models.md)
1. [230228 SHP](papers/2023/230228%20SHP.md)
1. [230306](papers/2023/230306.md)
1. [230311 Resurrecting Recurrent Neural Networks for Long Sequences](papers/2023/230311%20Resurrecting%20Recurrent%20Neural%20Networks%20for%20Long%20Sequences.md)
1. [230312 ChatGPT Asks, BLIP-2 Answers](papers/2023/230312%20ChatGPT%20Asks%2C%20BLIP-2%20Answers.md)
1. [230314 ViperGPT](papers/2023/230314%20ViperGPT.md)
1. [230315 GPT-4](papers/2023/230315%20GPT-4.md)
1. [230320 Reflexion](papers/2023/230320%20Reflexion.md)
1. [230323 The Quantization Model of Neural Scaling](papers/2023/230323%20The%20Quantization%20Model%20of%20Neural%20Scaling.md)
1. [230327 EVA-CLIP](papers/2023/230327%20EVA-CLIP.md)
1. [230327 unarXive 2022](papers/2023/230327%20unarXive%202022.md)
1. [230328 Improving Code Generation by Training with Natural Language Feedback](papers/2023/230328%20Improving%20Code%20Generation%20by%20Training%20with%20Natural%20Language%20Feedback.md)
1. [230331 Autoregressive Model](papers/2023/230331%20Autoregressive%20Model.md)
1. [230331 Choose Your Weapon](papers/2023/230331%20Choose%20Your%20Weapon.md)
1. [230406 Quantization](papers/2023/230406%20Quantization.md)
1. [230407 RLHF](papers/2023/230407%20RLHF.md)
1. [230414 OpenAssistant Conversations -- Democratizing Large Language Model Alignment](papers/2023/230414%20OpenAssistant%20Conversations%20--%20Democratizing%20Large%20Language%20Model%20Alignment.md)
1. [230416 Open Assistant](papers/2023/230416%20Open%20Assistant.md)
1. [230417 Tool Learning with Foundation Models](papers/2023/230417%20Tool%20Learning%20with%20Foundation%20Models.md)
1. [230418 HCI](papers/2023/230418%20HCI.md)
1. [230420 Stable LM](papers/2023/230420%20Stable%20LM.md)
1. [230428 Are Emergent Abilities of Large Language Models a Mirage](papers/2023/230428%20Are%20Emergent%20Abilities%20of%20Large%20Language%20Models%20a%20Mirage.md)
1. [230502 RedPajama](papers/2023/230502%20RedPajama.md)
1. [230504 ZipIt! Merging Models from Different Tasks without Training](papers/2023/230504%20ZipIt%21%20Merging%20Models%20from%20Different%20Tasks%20without%20Training.md)
1. [230511 An Inverse Scaling Law for CLIP Training](papers/2023/230511%20An%20Inverse%20Scaling%20Law%20for%20CLIP%20Training.md)
1. [230511 InstructBLIP](papers/2023/230511%20InstructBLIP.md)
1. [230511 Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers](papers/2023/230511%20Region-Aware%20Pretraining%20for%20Open-Vocabulary%20Object%20Detection%20with%20Vision%20Transformers.md)
1. [230511 Simple Token-Level Confidence Improves Caption Correctness](papers/2023/230511%20Simple%20Token-Level%20Confidence%20Improves%20Caption%20Correctness.md)
1. [230516 SpecInfer](papers/2023/230516%20SpecInfer.md)
1. [230518 Evidence of Meaning in Language Models Trained on Programs](papers/2023/230518%20Evidence%20of%20Meaning%20in%20Language%20Models%20Trained%20on%20Programs.md)
1. [230518 Google Running](papers/2023/230518%20%EA%B5%AC%EA%B8%80%20%EB%8B%AC%EB%A6%AC%EA%B8%B0.md)
1. [230522 Training Diffusion Models with Reinforcement Learning](papers/2023/230522%20Training%20Diffusion%20Models%20with%20Reinforcement%20Learning.md)
1. [230523 ZeroSCROLLS](papers/2023/230523%20ZeroSCROLLS.md)
1. [230524 Model evaluation for extreme risks](papers/2023/230524%20Model%20evaluation%20for%20extreme%20risks.md)
1. [230525 The False Promise of Imitating Proprietary LLMs](papers/2023/230525%20The%20False%20Promise%20of%20Imitating%20Proprietary%20LLMs.md)
1. [230527 Fine-Tuning Language Models with Just Forward Passes](papers/2023/230527%20Fine-Tuning%20Language%20Models%20with%20Just%20Forward%20Passes.md)
1. [230531 Let's Verify Step by Step](papers/2023/230531%20Let%27s%20Verify%20Step%20by%20Step.md)
1. [230601 Hiera](papers/2023/230601%20Hiera.md)
1. [230601 SnapFusion](papers/2023/230601%20SnapFusion.md)
1. [230602 Fine-Grained Human Feedback Gives Better Rewards for Language Model Training](papers/2023/230602%20Fine-Grained%20Human%20Feedback%20Gives%20Better%20Rewards%20for%20Language%20Model%20Training.md)
1. [230608 SequenceMatch](papers/2023/230608%20SequenceMatch.md)
1. [230611 LAMM](papers/2023/230611%20LAMM.md)
1. [230616 Scaling Open-Vocabulary Object Detection](papers/2023/230616%20Scaling%20Open-Vocabulary%20Object%20Detection.md)
1. [230616 ZeRO++](papers/2023/230616%20ZeRO%2B%2B.md)
1. [230619 RepoFusion](papers/2023/230619%20RepoFusion.md)
1. [230621 Constant Memory Attention Block](papers/2023/230621%20Constant%20Memory%20Attention%20Block.md)
1. [230621 Limits for Learning with Language Models](papers/2023/230621%20Limits%20for%20Learning%20with%20Language%20Models.md)
1. [230626 Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression](papers/2023/230626%20Pretraining%20task%20diversity%20and%20the%20emergence%20of%20non-Bayesian%20in-context%20learning%20for%20regression.md)
1. [230626 Understanding In-Context Learning via Supportive Pretraining Data](papers/2023/230626%20Understanding%20In-Context%20Learning%20via%20Supportive%20Pretraining%20Data.md)
1. [230627 IDOL](papers/2023/230627%20IDOL.md)
1. [230629 An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training](papers/2023/230629%20An%20Efficient%20General-Purpose%20Modular%20Vision%20Model%20via%20Multi-Task%20Heterogeneous%20Training.md)
1. [230629 Generate Anything Anywhere in Any Scene](papers/2023/230629%20Generate%20Anything%20Anywhere%20in%20Any%20Scene.md)
1. [230629 Generative AI for Programming Education](papers/2023/230629%20Generative%20AI%20for%20Programming%20Education.md)
1. [230629 LLaVAR](papers/2023/230629%20LLaVAR.md)
1. [230701 Let Me Teach You](papers/2023/230701%20Let%20Me%20Teach%20You.md)
1. [230701 NTK Aware Scaled RoPE](papers/2023/230701%20NTK%20Aware%20Scaled%20RoPE.md)
1. [230706 Superalignment](papers/2023/230706%20Superalignment.md)
1. [230706 Training Models to Generate, Recognize, and Reframe Unhelpful Thoughts](papers/2023/230706%20Training%20Models%20to%20Generate%2C%20Recognize%2C%20and%20Reframe%20Unhelpful%20Thoughts.md)
1. [230708 DDPO](papers/2023/230708%20DDPO.md)
1. [230710 About Anthropic](papers/2023/230710%20About%20Anthropic.md)
1. [230710 BeaverTails](papers/2023/230710%20BeaverTails.md)
1. [230710 FreeDrag](papers/2023/230710%20FreeDrag.md)
1. [230710 Large Language Models as General Pattern Machines](papers/2023/230710%20Large%20Language%20Models%20as%20General%20Pattern%20Machines.md)
1. [230710 VampNet](papers/2023/230710%20VampNet.md)
1. [230711 GPT-4 FLOPS](papers/2023/230711%20GPT-4%20FLOPS.md)
1. [230711 Objaverse-XL](papers/2023/230711%20Objaverse-XL.md)
1. [230711 Self-consistency for open-ended generations](papers/2023/230711%20Self-consistency%20for%20open-ended%20generations.md)
1. [230712 Claude 2](papers/2023/230712%20Claude%202.md)
1. [230712 Instruction Mining](papers/2023/230712%20Instruction%20Mining.md)
1. [230713 Hassabis](papers/2023/230713%20Hassabis.md)
1. [230714 Code Interpreter](papers/2023/230714%20Code%20Interpreter.md)
1. [230718 Flash Attention 2](papers/2023/230718%20Flash%20Attention%202.md)
1. [230718 How is ChatGPT's behavior changing over time](papers/2023/230718%20How%20is%20ChatGPT%27s%20behavior%20changing%20over%20time.md)
1. [230719 Llama 2](papers/2023/230719%20Llama%202.md)
1. [230724 RLCD](papers/2023/230724%20RLCD.md)
1. [230725 Retentive Network](papers/2023/230725%20Retentive%20Network.md)
1. [230728 Exploring Format Consistency for Instruction Tuning](papers/2023/230728%20Exploring%20Format%20Consistency%20for%20Instruction%20Tuning.md)
1. [230728 The Hydra Effect](papers/2023/230728%20The%20Hydra%20Effect.md)
1. [230729 Configuration System](papers/2023/230729%20Configuration%20System.md)
1. [230803 H100 Supply and Demand](papers/2023/230803%20H100%20Supply%20and%20Demand.md)
1. [230803 Multimodal Neurons in Pretrained Text-Only Transformers](papers/2023/230803%20Multimodal%20Neurons%20in%20Pretrained%20Text-Only%20Transformers.md)
1. [230804 Retroformer](papers/2023/230804%20Retroformer.md)
1. [230807 Intelligent Assistant Language Understanding On Device](papers/2023/230807%20Intelligent%20Assistant%20Language%20Understanding%20On%20Device.md)
1. [230808 Gentopia](papers/2023/230808%20Gentopia.md)
1. [230809 StableCode](papers/2023/230809%20StableCode.md)
1. [230810 ReRoPE](papers/2023/230810%20ReRoPE.md)
1. [210714 Deduplicating Training Data Makes Language Models Better](papers/arXiv/210714%20Deduplicating%20Training%20Data%20Makes%20Language%20Models%20Better.md)
1. [211122 ExT5](papers/arXiv/211122%20ExT5.md)
1. [230523 Aligning Large Language Models through Synthetic Feedback](papers/arXiv/230523%20Aligning%20Large%20Language%20Models%20through%20Synthetic%20Feedback.md)