ATPapers
Worth-reading papers and related resources on attention mechanisms, Transformers, and pretrained language models (PLMs) such as BERT.
https://github.com/ZhengZixiang/ATPapers
Attention
English Blog
- Illustrated: Self-Attention
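The self-attention walkthrough above boils down to three learned projections and a softmax over scaled dot products. Below is a minimal NumPy sketch of single-head scaled dot-product self-attention; the function and variable names are illustrative and not taken from any repository listed here.

```python
# Minimal sketch of single-head scaled dot-product self-attention.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # (seq_len, d_k)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                   # 5 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 4)
```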
Repositories
- thushv89 / Keras Attention Layer - Keras Layer implementation of Attention
Papers
- [paper - ***Global & Local Attention***
- [paper - ***YELP-HAT***
- [paper - ***Hard & Soft Attention***
- [paper - ***Global & Local Attention***
- [paper
- [paper - nonlocal-net)
- [paper
- [paper
- [paper - ***Bi-BloSAN***
- [paper - attention)
- [paper - pytorch)
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper - analysis-of-transformer)
- [paper
Chinese Blog
Survey & Review
Transformer
Repositories
- DongjunLee / transformer-tensorflow - Transformer Tensorflow implementation
- andreamad8 / Universal-Transformer-Pytorch - Universal Transformer PyTorch implementation
- lucidrains / Linear Attention Transformer - Transformer based on an attention variant whose complexity is linear in the sequence length (see the sketch after this list)
- sannykim / transformers - A collection of resources to study Transformers in depth
- PapersWithCode / Attention
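The linear-attention idea behind repositories like lucidrains / Linear Attention Transformer is to replace the softmax over an n×n score matrix with a kernel feature map, so key-value statistics can be summed once and reused by every query. A hedged sketch of that kernelized formulation follows; the elu+1 feature map is the one popularized by the "Transformers are RNNs" line of work, and the actual library differs in many details.

```python
# Kernel-based linear attention: O(n * d^2) instead of O(n^2 * d).
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, a positive feature map commonly used in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """Q, K: (n, d_k); V: (n, d_v). Returns (n, d_v) without forming an n x n matrix."""
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)
    KV = Kf.T @ V                      # (d_k, d_v), summed over all positions
    Z = Qf @ Kf.sum(axis=0)            # (n,), per-query normalizer
    return (Qf @ KV) / (Z[:, None] + eps)

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(16, 8)) for _ in range(3))
print(linear_attention(Q, K, V).shape)   # (16, 8)
```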
Chinese Blog
- 科学空间 / Breaking through the bottleneck to build a stronger Transformer
- 科学空间 / The Transformer upgrade path, part 1: tracing sinusoidal position encoding back to its roots (a sinusoidal-encoding sketch follows this list)
- 科学空间 / The Transformer upgrade path, part 2: rotary position embedding, drawing on the strengths of both schemes
- 量子位 / A roundup of the latest Transformer models, essential for NLP learners, by a Google AI researcher
- 美团 / Applying the Transformer to search ranking at Meituan
- 徐啸 / A brief look at position representations in Transformer-based models
- 张俊林 / Drop the illusions and fully embrace the Transformer: comparing NLP's three feature extractors (CNN/RNN/Transformer)
- 夕小瑶的卖萌屋 / Which Transformer is strongest? Google sorts the good from the bad
- Lilian / The Transformer family!
- Kaiyuan Gao / Transformers Assemble (Part I)
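For reference while reading the position-encoding posts above, here is a small sketch of the sinusoidal encoding they trace back to the original Transformer: position p and dimension pair (2i, 2i+1) receive sin and cos of p / 10000^(2i/d_model). Function and variable names are illustrative.

```python
# Sinusoidal position encoding from "Attention Is All You Need".
import numpy as np

def sinusoidal_position_encoding(max_len, d_model):
    positions = np.arange(max_len)[:, None]                   # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                   # (1, d_model/2), the 2i values
    angles = positions / np.power(10000.0, dims / d_model)     # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

pe = sinusoidal_position_encoding(max_len=50, d_model=64)
print(pe.shape)   # (50, 64)
```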
Papers
- [paper - pytorch)
- [paper
- [paper - han-lab/lite-transformer)
- [paper
- [paper
- [paper - Transformer)
- [paper - research/google-research/tree/master/routing_transformer)
- [paper - han-lab/hardware-aware-transformers)
- [paper
- [paper
- [paper
- [paper
- [paper - transformers)[[project]](https://linear-transformers.com/)
- [paper - Transformer)
- [paper - transformers)
- [paper
- [paper - attention)
- [paper
- [paper
- [paper - research/google-research/tree/master/performer/fast_self_attention)[[pytorch version]](https://github.com/lucidrains/performer-pytorch)[[blog]](https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html)
- [paper - research/long-range-arena)
- [paper - nmt)
- [paper
- [paper
- [paper
- [paper
- [paper - ***MAN***
- [paper - is-all-you-need-pytorch) - ***Transformer***
- [paper
- [paper - aan) - ***AAN***
- [paper - MT/THUMT/blob/d4cb62c215d846093e5357aa17b286506b2df1af/thumt/layers/attention.py)
- [paper - Transformer-Pytorch)
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper - Sparse-Transformer)
- [paper - transformer-pytorch)
English Blog
- Google / Moving Beyond Translation with the Universal Transformer
- Google / Constructing Transformers For Longer Sequences with Sparse Attention Methods
- Harvard NLP / The Annotated Transformer - transformer)
- Hugging Face / Hugging Face Reads, Feb. 2021 - Long-range Transformers
- Jay Alammar / The Illustrated Transformer
- Madison May / A Survey of Long-Term Context in Transformers
- Mohd / How Self-Attention with Relative Position Representations works
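The relative-position post above describes adding learned embeddings of the (clipped) offset j − i into the attention scores. Below is a simplified sketch in the spirit of Shaw et al. (2018); real implementations typically also add relative embeddings on the value path, which is omitted here.

```python
# Self-attention with relative position representations (key path only).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_self_attention(Q, K, V, rel_k, max_dist):
    """Q, K, V: (n, d); rel_k: (2*max_dist+1, d) learned relative-key embeddings."""
    n, d = Q.shape
    # clipped relative distance j - i, shifted to index into rel_k
    rel = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None], -max_dist, max_dist)
    a_k = rel_k[rel + max_dist]                                   # (n, n, d)
    scores = (Q @ K.T + np.einsum('id,ijd->ij', Q, a_k)) / np.sqrt(d)
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 8)) for _ in range(3))
rel_k = rng.normal(size=(2 * 3 + 1, 8))
print(relative_self_attention(Q, K, V, rel_k, max_dist=3).shape)  # (6, 8)
```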
Pretrained Language Model
Models
- [paper - ***SpanBERT***
- [paper - ***XLNet***
- [paper - ***RoBERTa***
- [paper - ***ULMFit***
- [paper - dnn) - ***MT-DNN***
- [paper - ***GPT***
- [paper - 2) - ***GPT-2***
- [paper - ***MASS***
- [paper - ***UNILM***
- [paper - dnn) - ***MT-DNN***
- [paper - ***UDify***
- [paper - ***Grover***
- [paper - ***ERNIE 2.0 (Baidu)***
- [paper - ***Chinese-BERT-wwm***
- [paper - ***SpanBERT***
- [paper - ***XLNet***
- [paper - ***RoBERTa***
- [paper - noah/Pretrained-Language-Model) - ***NEZHA***
- [paper - LM) - ***Megatron-LM***
- [paper - research/text-to-text-transfer-transformer) - ***T5***
- [paper - ***BART***
- [paper - ***ZEN***
- [paper - aig/nlp_baai) - ***BAAI-JDAI-BERT***
- [paper - py) - ***UER***
- [paper - ***ELECTRA***
- [paper - ***StructBERT***
- [paper - ***FreeLB***
- [paper - ***HUBERT***
- [paper - ***ProphetNet***
- [paper - gen) - ***ERNIE-GEN***
- [paper - ***StackingBERT***
- [paper - BERT)
- [paper - ***Meena***
- [paper - ***UNILMv2***
- [paper - ***Optimus***
- [paper
- [paper - ***MPNet***
- [paper - 3) - ***GPT-3***
- [paper - ***SPECTER***
- [paper - of-the-art-open-source-chatbot/)[[code]](https://github.com/facebookresearch/ParlAI) - ***Blender***
- [paper - -kHuAI1V8oLRQ) - ***MacBERT***
- [paper - 2) - ***PLATO-2***
- [paper - ***DeBERTa***
- [paper - opensource/ConvBert)
- [paper
- [paper - uxn38aFvjPNiwWGw)
- [paper
- [paper
- [paper - ***GLM***
Application
Repository
- bojone / bert4keras - bojone's (苏神) BERT Keras implementation
- brightmart / albert_zh - Large-scale Chinese pretrained ALBERT models
- brightmart / roberta_zh - Chinese pretrained RoBERTa models
- CyberZHG / keras-bert - CyberZHG's BERT Keras implementation
- Ethan-yt / GuwenBERT - A Pre-trained Language Model for Classical Chinese (Literary Chinese)
- graykode / gpt-2-Pytorch - Simple text generator built on an OpenAI GPT-2 PyTorch implementation
- heartcored98 / Transformer_Anatomy - Toolkit for finding and analyzing important attention heads in transformer-based models
- Hironsan / bertsearch - Elasticsearch with BERT for advanced document search
- CLUEbenchmark / CLUE - Chinese Language Understanding Evaluation Benchmark
- jessevig / bertviz - BERT Visualization Tool
- Jiakui / awesome-bert - Collect BERT related resources
- legacyai / tf-transformers - State-of-the-art, faster natural language processing in TensorFlow 2.0
- Morizeyao / GPT2-Chinese - Chinese version of GPT-2 training code, using the BERT tokenizer
- Separius / BERT-keras - Separius' BERT Keras implementation
- policeme / roberta-wwm-base-distill - A Chinese RoBERTa-wwm distillation model distilled from roberta-ext-wwm-large
- terrifyzhao / bert-utils - One-line generation of BERT sentence vectors (sent2vec) for classification or matching tasks (see the pooling sketch after this list)
- Tencent / TurboTransformers - A fast and user-friendly runtime for transformer inference on CPU and GPU
- THUNLP / OpenCLaP - Open Chinese Language Pre-trained Model Zoo
- THUNLP / PLMpapers - Must-read Papers on pre-trained language models.
- THUNLP-AIPoet / BERT-CCPoem - A BERT-based pre-trained model particularly for Chinese classical poetry
- tomohideshibata / BERT-related-papers - This is a list of BERT-related papers.
- TsinghuaAI / CPM-Generate - Chinese Pre-Trained Language Models (CPM-LM) Version-I
- valuesimplex / FinBERT
- ymcui / Chinese-XLNet - Pre-trained Chinese XLNet
- ZhuiyiTechnology / pretrained-models - Open Language Pre-trained Model Zoo
- ZhuiyiTechnology / SimBERT - A bert for retrieval and generation
- ZhuiyiTechnology / WoBERT
- ZhuiyiTechnology / t5-pegasus - Chinese generative pretrained model
- bojone / albert_zh - Convert brightmart's ALBERT weights to the Google format
- CLUEbenchmark / CLUEPretrainedModels - A collection of high-quality Chinese pretrained models: state-of-the-art large models, the fastest small models, and dedicated similarity models
- ymcui / Chinese-ELECTRA - Pre-trained Chinese ELECTRA
- hanxiao / bert-as-service - Using BERT model as a sentence encoding service
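Several of the repositories above (for example bert-utils and bert-as-service) expose BERT as a sentence-vector service. The sketch below shows the underlying idea using the Hugging Face transformers API, with mean pooling over non-padding token states; the checkpoint name is only an example, and those projects' own pooling strategies may differ.

```python
# Turn a BERT checkpoint into fixed-size sentence vectors by mean pooling.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")

def sent2vec(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()    # ignore padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)     # (B, H)

vecs = sent2vec(["预训练语言模型", "注意力机制"])
print(vecs.shape)   # torch.Size([2, 768])
```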
English Blog
- Keyur Faldu and Dr. Amit Sheth / Linguistics Wisdom of NLP Models: Analyzing, Designing, and Evaluating Linguistic Probes
- The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)
- Keyur Faldu and Dr. Amit Sheth / Linguistics Wisdom of NLP Models: Analyzing, Designing, and Evaluating Linguistic Probes
- Keyur Faldu and Dr. Amit Sheth / Linguistics Wisdom of NLP Models: Analyzing, Designing, and Evaluating Linguistic Probes
- Keyur Faldu and Dr. Amit Sheth / Linguistics Wisdom of NLP Models: Analyzing, Designing, and Evaluating Linguistic Probes
- Keyur Faldu and Dr. Amit Sheth / Linguistics Wisdom of NLP Models: Analyzing, Designing, and Evaluating Linguistic Probes
- Keyur Faldu and Dr. Amit Sheth / Linguistics Wisdom of NLP Models: Analyzing, Designing, and Evaluating Linguistic Probes
- Keyur Faldu and Dr. Amit Sheth / Linguistics Wisdom of NLP Models: Analyzing, Designing, and Evaluating Linguistic Probes
- A Fair Comparison Study of XLNet and BERT with Large Models
- All The Ways You Can Compress BERT
- Keyur Faldu and Dr. Amit Sheth / Linguistics Wisdom of NLP Models: Analyzing, Designing, and Evaluating Linguistic Probes
- The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)
Multi-Modal
Multilingual
Knowledge
Compression & Accelerating
- [paper
- [paper - ***MKDM***
- [paper
- [paper
- [paper
- [paper
- [paper - ***BERT-PKD***
- [paper
- [paper
- [paper
- [paper - research/ALBERT)
- [paper
- [paper - ***LayerDrop***
- [paper
- [paper
- [paper - ***BERT-PKD***
- [paper
- [paper - ***AdaBERT***
- [paper - of-Theseus)[[tf code]](https://github.com/qiufengyuyi/bert-of-theseus-tf)[[keras code]](https://github.com/bojone/bert-of-theseus)
- [paper
- [paper
- [paper - research/google-research/tree/master/mobilebert)
- [paper - ***BiLSTM-SRA & LTD-BERT***
- [paper
- [paper
- [paper
- [paper - ***Bort***
- [paper
- [paper - EMD)[[blog]](https://mp.weixin.qq.com/s/w1sT126jS_lZ_Q3cRi6fGQ)
- [paper
- [paper - RdGEtwxUdigNeEGJM987Q)
Analysis & Tools
- [paper
- [paper - dnn/tree/master/alum)
- [paper - study)
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper - analysis)
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper - ***TextFooler***
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper - ***RIPPLe***
- [paper - ***Transformer Anatomy***
- [paper
- [paper
- [paper - stop-pretraining)
- [paper
- [paper - Masking)[[keras code]](https://github.com/bojone/perturbed_masking)
- [paper - ***TUPE***
- [paper
- [paper
- [paper - ***PET***
- [paper
- [paper
- [paper
- [paper - Yt91Sg)
- [paper
- [paper - Semi-Supervised-Learning-for-Text-Classification)[[blog]](https://mp.weixin.qq.com/s/t7a_1cf1EFuoTYnm2gAYSw)
- [paper - Chen/RecAdam)[[blog]](https://mp.weixin.qq.com/s/M89mqFxa7_iK3lzlEgLzAQ)
- [paper
- [paper - shot-lm-learning)
- [paper - flow)
- [paper
- [paper
Chinese Blog
- 阿里 / Exploring BERT distillation for spam public-opinion detection
- 科学空间 / From language model to Seq2Seq: with the Transformer, it all comes down to masks
- RUC AI Box / Pretrained language model compression at NeurIPS 2020
- 张俊林 / PTMs riding the wave: two years of technical progress in pretrained models
- 张正 / The evolution of transfer learning in NLP, from the basics to the frontier
- Andy Yang / Slimming down BERT: distillation, quantization, pruning
- RUC AI Box / BERT meets knowledge graphs: research progress on combining pretrained models with knowledge graphs
- Microsoft / Eight papers reviewing progress and reflections on BERT-related models
- NLP有品 / About BERT: the things you don't know
- 知乎问答 / Why does BERT use learned position embeddings rather than sinusoidal position encoding?
- 科学空间 / Drop the constraint, strengthen the model: one line of code improves ALBERT
- 科学空间 / BERT-of-Theseus: model compression via module replacement
- 科学空间 / Faster with no accuracy drop: word-granularity Chinese WoBERT
- 科学空间 / Before using ALBERT and ELECTRA, make sure you really understand them
- 李如 / ELECTRA: beyond BERT, the best NLP pretrained model of 2019
- 李如 / My take on the release of the ELECTRA source code
- 李如 / The complete guide to BERT distillation: principles, tricks, and code
- 李如 / BERT-flow: a simple, easy-to-use new SOTA for text representation from CMU and ByteDance
- 刘群 / Research progress and future trends in pretrained language models
- 美团 / Meituan's BERT exploration experience, shared through business-scenario case studies
- 科学空间 / Do you really need GPT-3? No, BERT's MLM can do few-shot learning too (a minimal cloze-prompt sketch follows this list)
- sliderSun / Q&A on Transformer and BERT fundamentals
- Tobias Lee / The evolution and applications of BERT
- 张俊林 / The astonishing GPT-2.0 model: what it tells us
- 张俊林 / XLNet: how it works and how it compares with BERT
- 李理 / How XLNet works
- 腾讯 / 1/20 the memory and 80x faster: Tencent QQ proposes a new BERT distillation framework, to be open-sourced
- 小米 / Trouble adapting BERT to your business? Xiaomi NLP's hands-on exploration
- 邱震宇 / Model compression in practice, final part: a summary of model distillation and other tricks
- 夕小瑶的卖萌屋 / More than one stream: recent pretrained-model progress seen through XLNet's multi-stream mechanism
- 夕小瑶的卖萌屋 / How to elegantly encode positional information in text: three positional encoding methods in brief
- 夕小瑶的卖萌屋 / Extremely lightweight BERT through both software and hardware tricks: 13x lighter than ALBERT?!
- 许维 / Long read: standing on the shoulders of giants in NLP (part 1)
- 许维 / Long read: standing on the shoulders of giants in NLP (part 2)
- 张俊林 / From word embeddings to BERT: a history of pretraining techniques in NLP
- 张俊林 / Innovation in the BERT era (applications): progress in applying BERT across NLP
- 张俊林 / Innovation in the BERT era: comparing BERT application patterns, and more
- 张俊林 / What have BERT and the Transformer actually learned? | AI ProCon 2019
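The "Do you really need GPT-3?" post above argues that a masked language model can already do few-shot classification when the task is rewritten as a cloze. Here is a minimal sketch of that PET-style prompting pattern; the checkpoint name, the template, and the label words "great"/"terrible" are illustrative assumptions, not taken from the post.

```python
# Zero-shot sentiment via a cloze prompt scored with an MLM head.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def cloze_sentiment(review):
    # Template: "<review> It was [MASK]." with label words great/terrible.
    text = f"{review} It was {tokenizer.mask_token}."
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    label_ids = tokenizer.convert_tokens_to_ids(["great", "terrible"])
    scores = logits[0, mask_pos, label_ids]
    return "positive" if scores[0] > scores[1] else "negative"

print(cloze_sentiment("The plot was engaging and the acting superb."))
```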
Tutorial & Survey