https://github.com/ailln/nlp-roadmap
🗺️ A learning roadmap for natural language processing
natural-language-processing nlp roadmap sequence-labeling word-embedding word-segmentation
- Host: GitHub
- URL: https://github.com/ailln/nlp-roadmap
- Owner: Ailln
- License: mit
- Created: 2019-04-17T15:41:16.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-01-13T10:41:21.000Z (almost 3 years ago)
- Last Synced: 2023-03-03T17:17:26.360Z (over 1 year ago)
- Topics: natural-language-processing, nlp, roadmap, sequence-labeling, word-embedding, word-segmentation
- Homepage:
- Size: 134 KB
- Stars: 45
- Watchers: 4
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Natural Language Processing Roadmap
🗺️ A **learning roadmap** for natural language processing.
> ⚠️ Note:
>
> 1. This project includes a small experiment called `PCB`. Here PCB stands for neither `Printed Circuit Board` nor `Process Control Block`, but for `Paper Code Blog`. I believe these three things, `papers`, `code`, and `blogs`, let us balance theory and practice while picking up each topic quickly.
>
> 2. The number of stars after each paper indicates its importance (*a subjective opinion, for reference only*).
>     1. 🌟: average;
>     2. 🌟🌟: important;
>     3. 🌟🌟🌟: very important.
## 1 Word Segmentation
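**A word is the smallest unit of language that can be used independently.** In natural language processing, the word is usually the basic unit of processing. English has a built-in advantage here: its words are separated by spaces. Chinese has no explicit delimiter between words, so the first task in Chinese language processing is to split a continuous sentence into a "word sequence". This splitting process is called **word segmentation**. [Learn more](https://www.v2ai.cn/2018/04/26/nature-language-processing/2-word-segmentation/)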
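The contrast described above can be sketched in a few lines. This is a minimal illustration, not the method used by any project linked here: it assumes a tiny hand-written dictionary (`TOY_VOCAB`, hypothetical) and uses forward maximum matching, a classic dictionary-based baseline that greedily takes the longest known word at each position.

```python
def forward_max_match(sentence, vocabulary, max_word_len=4):
    """Greedy left-to-right segmentation using the longest dictionary match."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a dictionary word matches.
        for size in range(min(max_word_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Toy dictionary, made up for illustration only.
TOY_VOCAB = {"自然", "语言", "自然语言", "处理", "学习"}

# English: whitespace already marks the word boundaries.
english_words = "natural language processing".split()

# Chinese: boundaries have to be recovered, here with the toy dictionary.
chinese_words = forward_max_match("自然语言处理学习", TOY_VOCAB, max_word_len=4)
```

Greedy matching depends entirely on the dictionary and cannot resolve ambiguous splits, which is one reason the surveys listed below cover statistical and neural approaches.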
### Surveys
- A Survey of Chinese Word Segmentation Techniques (in Chinese) [{Paper}](http://www.lis.ac.cn/CN/article/downloadArticleFile.do?attachType=PDF&id=9402) 🌟
- A Survey of Research on Automatic Chinese Word Segmentation in China (in Chinese) [{Paper}](http://www.lis.ac.cn/CN/article/downloadArticleFile.do?attachType=PDF&id=11361) 🌟
- Automatic Chinese Word Segmentation: Current State of Research and Difficulties (in Chinese) [{Paper}](http://sourcedb.ict.cas.cn/cn/ictthesis/200907/P020090722605434114544.pdf) 🌟🌟
- A Review of Research on Automatic Chinese Word Segmentation (in Chinese) [{Paper}](http://59.108.48.5/course/mining/12-13spring/%E5%8F%82%E8%80%83%E6%96%87%E7%8C%AE/02-01%E6%B1%89%E8%AF%AD%E8%87%AA%E5%8A%A8%E5%88%86%E8%AF%8D%E7%A0%94%E7%A9%B6%E8%AF%84%E8%BF%B0.pdf) 🌟🌟
- Chinese Word Segmentation: Another Decade Review (2007-2017) [{Paper}](https://arxiv.org/pdf/1901.06079.pdf) 🌟🌟🌟
- chinese-word-segmentation [{Code}](https://github.com/Ailln/chinese-word-segmentation)
- A Survey of Deep Learning for Chinese Word Segmentation (in Chinese) [{Blog}](http://www.hankcs.com/nlp/segment/depth-learning-chinese-word-segmentation-survey.html)
## 2 Word Embedding
**Word embedding** means finding a mapping or function that produces a representation of each word in a new space; this representation is called a "word representation". [Learn more](https://www.v2ai.cn/2018/08/27/nature-language-processing/6-word-embedding/)
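The idea of a mapping from words into a new space can be illustrated with a toy lookup table. The sketch below is a hedged example: the three-dimensional vectors in `EMBEDDINGS` are hand-picked assumptions rather than trained values, and `embed` and `cosine_similarity` are hypothetical helper names. In practice the vectors are learned by models such as those listed under the core papers below.

```python
import math

# Hypothetical 3-dimensional vectors, hand-picked for illustration (not trained).
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def embed(word):
    """The mapping itself: a word goes in, a point in the vector space comes out."""
    return EMBEDDINGS[word]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# With these toy vectors, related words end up closer together than unrelated ones.
royal_pair = cosine_similarity(embed("king"), embed("queen"))
mixed_pair = cosine_similarity(embed("king"), embed("apple"))
```

Comparing `royal_pair` with `mixed_pair` shows the point of the new space: distances between vectors can stand in for relatedness between words.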
### Surveys
- Word Embeddings: A Survey [{Paper}](https://arxiv.org/pdf/1901.09069.pdf) 🌟🌟🌟
- Visualizing Attention in Transformer-Based Language Representation Models [{Paper}](https://arxiv.org/pdf/1904.02679.pdf) 🌟🌟
- **PTMs**: Pre-trained Models for Natural Language Processing: A Survey [{Paper}](https://arxiv.org/pdf/2003.08271.pdf) [{Blog}](https://zhuanlan.zhihu.com/p/115014536) 🌟🌟🌟
- Efficient Transformers: A Survey [{Paper}](https://arxiv.org/pdf/2009.06732.pdf) 🌟🌟
- A Survey of Transformers [{Paper}](https://arxiv.org/pdf/2106.04554.pdf) 🌟🌟
- Pre-Trained Models: Past, Present and Future [{Paper}](https://arxiv.org/pdf/2106.07139.pdf) 🌟🌟
- Pretrained Language Models for Text Generation: A Survey [{Paper}](https://arxiv.org/pdf/2105.10311.pdf) 🌟
- A Practical Survey on Faster and Lighter Transformers [{Paper}](https://arxiv.org/pdf/2103.14636.pdf) 🌟
- The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures [{Paper}](https://arxiv.org/pdf/2104.10640.pdf) 🌟🌟
### Core
- **NNLM**: A Neural Probabilistic Language Model [{Paper}](http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf) [{Code}](https://github.com/FuYanzhe2/NNLM) [{Blog}](https://zhuanlan.zhihu.com/p/21240807) 🌟
- **W2V**: Efficient Estimation of Word Representations in Vector Space [{Paper}](https://arxiv.org/abs/1301.3781) 🌟🌟
- **GloVe**: Global Vectors for Word Representation [{Paper}](https://nlp.stanford.edu/pubs/glove.pdf) 🌟🌟
- **CharCNN**: Character-level Convolutional Networks for Text Classification [{Paper}](https://arxiv.org/pdf/1509.01626.pdf) [{Blog}](https://zhuanlan.zhihu.com/p/51698513) 🌟
- **ULMFiT**: Universal Language Model Fine-tuning for Text Classification [{Paper}](https://arxiv.org/pdf/1801.06146.pdf) 🌟
- **SiATL**: An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models [{Paper}](https://www.aclweb.org/anthology/N19-1213.pdf) 🌟
- **FastText**: Bag of Tricks for Efficient Text Classification [{Paper}](https://arxiv.org/pdf/1607.01759.pdf) 🌟🌟
- **CoVe**: Learned in Translation: Contextualized Word Vectors [{Paper}](https://arxiv.org/pdf/1708.00107.pdf) 🌟
- **ELMo**: Deep contextualized word representations [{Paper}](https://arxiv.org/pdf/1802.05365.pdf) 🌟🌟
- **Transformer**: Attention Is All You Need [{Paper}](https://arxiv.org/pdf/1706.03762.pdf) [{Code}](https://github.com/tensorflow/tensor2tensor) [{Blog}](http://jalammar.github.io/illustrated-transformer/) 🌟🌟🌟
- **GPT**: Improving Language Understanding by Generative Pre-Training [{Paper}](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf) 🌟
- **GPT2**: Language Models are Unsupervised Multitask Learners [{Paper}](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf) [{Code}](https://github.com/openai/gpt-2) [{Blog}](https://openai.com/blog/better-language-models/) 🌟🌟
- **GPT3**: Language Models are Few-Shot Learners [{Paper}](https://arxiv.org/pdf/2005.14165.pdf) [{Code}](https://github.com/openai/gpt-3) 🌟🌟🌟
- **GPT4**: GPT-4 Technical Report [{Paper}](https://arxiv.org/pdf/2303.08774.pdf) 🌟🌟🌟
- **BERT**: Pre-training of Deep Bidirectional Transformers for Language Understanding [{Paper}](https://arxiv.org/pdf/1810.04805.pdf) [{Code}](https://github.com/google-research/bert) [{Blog}](https://zhuanlan.zhihu.com/p/49271699) 🌟🌟🌟
- **UniLM**: Unified Language Model Pre-training for Natural Language Understanding and Generation [{Paper}](https://arxiv.org/pdf/1905.03197.pdf) [{Code}](https://github.com/microsoft/unilm) [{Blog}](https://zhuanlan.zhihu.com/p/68327602) 🌟🌟
- **T5**: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [{Paper}](https://arxiv.org/pdf/1910.10683.pdf) [{Code}](https://github.com/google-research/text-to-text-transfer-transformer) [{Blog}](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) 🌟
- **ERNIE**(Baidu): Enhanced Representation through Knowledge Integration [{Paper}](https://arxiv.org/pdf/1904.09223.pdf) [{Code}](https://github.com/PaddlePaddle/ERNIE) 🌟
- **ERNIE**(Tsinghua): Enhanced Language Representation with Informative Entities [{Paper}](https://arxiv.org/pdf/1905.07129.pdf) [{Code}](https://github.com/thunlp/ERNIE) 🌟
- **RoBERTa**: A Robustly Optimized BERT Pretraining Approach [{Paper}](https://arxiv.org/pdf/1907.11692.pdf) 🌟
- **ALBERT**: A Lite BERT for Self-supervised Learning of Language Representations [{Paper}](https://arxiv.org/pdf/1909.11942.pdf) [{Code}](https://github.com/google-research/ALBERT) 🌟🌟
- **TinyBERT**: Distilling BERT for Natural Language Understanding [{Paper}](https://arxiv.org/pdf/1909.10351.pdf) 🌟🌟
- **FastFormers**: Highly Efficient Transformer Models for Natural Language Understanding [{Paper}](https://arxiv.org/pdf/2010.13382.pdf) [{Code}](https://github.com/microsoft/fastformers) 🌟🌟
### Others
- word2vec Parameter Learning Explained [{Paper}](https://arxiv.org/pdf/1411.2738.pdf) 🌟🌟
- Semi-supervised Sequence Learning [{Paper}](https://arxiv.org/pdf/1511.01432.pdf) 🌟🌟
- BERT Rediscovers the Classical NLP Pipeline [{Paper}](https://arxiv.org/pdf/1905.05950.pdf) 🌟
- Pre-trained Language Model Papers [{Blog}](https://github.com/thunlp/PLMpapers)
- HuggingFace Transformers [{Code}](https://github.com/huggingface/transformers)
- Fudan FastNLP [{Code}](https://github.com/fastnlp/fastNLP)
## 3 Text Classification
### Surveys
- A Survey on Text Classification: From Shallow to Deep Learning [{Paper}](https://arxiv.org/pdf/2008.00364.pdf) 🌟🌟🌟
- Deep Learning Based Text Classification: A Comprehensive Review [{Paper}](https://arxiv.org/pdf/2004.03705.pdf) 🌟🌟
### CNN
- **TextCNN**: Convolutional Neural Networks for Sentence Classification [{Paper}](https://arxiv.org/pdf/1408.5882.pdf) [{Code}](https://github.com/dennybritz/cnn-text-classification-tf) 🌟🌟🌟
- Convolutional Neural Networks for Text Categorization: Shallow Word-level vs. Deep Character-level [{Paper}](https://arxiv.org/pdf/1609.00718.pdf) 🌟
- **DPCNN**: Deep Pyramid Convolutional Neural Networks for Text Categorization [{Paper}](https://www.aclweb.org/anthology/P17-1052.pdf) [{Code}](https://github.com/Cheneng/DPCNN) 🌟🌟
## 4 Sequence Labeling
### Surveys
- A History of Sequence Labeling (DNNs+CRF) [{Blog}](https://zhuanlan.zhihu.com/p/34828874)
### Bi-LSTM + CRF
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF [{Paper}](https://www.aclweb.org/anthology/P16-1101) 🌟🌟
- pytorch_NER_BiLSTM_CNN_CRF [{Code}](https://github.com/bamtercelboo/pytorch_NER_BiLSTM_CNN_CRF)
- NN_NER_tensorFlow [{Code}](https://github.com/LopezGG/NN_NER_tensorFlow)
- End-to-end-Sequence-Labeling-via-Bi-directional-LSTM-CNNs-CRF-Tutorial [{Code}](https://github.com/jayavardhanr/End-to-end-Sequence-Labeling-via-Bi-directional-LSTM-CNNs-CRF-Tutorial)
- Bi-directional LSTM-CNNs-CRF [{Code}](https://zhuanlan.zhihu.com/p/30791481)
### Others
- Sequence to Sequence Learning with Neural Networks [{Paper}](https://proceedings.neurips.cc/paper/2014/file/a14ac55a4f27472c5d894ec1c3c743d2-Paper.pdf) 🌟
- Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks [{Paper}](https://arxiv.org/pdf/1506.03099.pdf) 🌟
## 5 Dialogue Systems
### Surveys
- A Survey on Dialogue Systems: Recent Advances and New Frontiers [{Paper}](https://arxiv.org/pdf/1711.01731v1.pdf) [{Blog}](https://zhuanlan.zhihu.com/p/45210996) 🌟🌟
- Hey, Want to Learn About Retrieval-Based Chatbots? (in Chinese) [{Blog}](https://mp.weixin.qq.com/s/yC8uYwti9Meyt83xkmbmcg) 🌟🌟🌟
- Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey [{Paper}](https://arxiv.org/pdf/2011.00564.pdf) 🌟🌟
### Open Domain Dialogue Systems
- **HRED**: Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models [{Paper}](https://arxiv.org/pdf/1507.04808v3.pdf) [{Code}](https://github.com/hsgodhia/hred) 🌟🌟
- Adversarial Learning for Neural Dialogue Generation [{Paper}](https://arxiv.org/pdf/1701.06547.pdf) [{Code}](https://github.com/liuyuemaicha/Adversarial-Learning-for-Neural-Dialogue-Generation-in-Tensorflow) [{Blog}](https://blog.csdn.net/liuyuemaicha/article/details/60581187) 🌟🌟
### Task Oriented Dialogue Systems
- **Joint NLU**: Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling [{Paper}](https://arxiv.org/pdf/1609.01454.pdf) [{Code}](https://github.com/Ailln/chatbot) 🌟🌟
- BERT for Joint Intent Classification and Slot Filling [{Paper}](https://arxiv.org/pdf/1902.10909.pdf) 🌟
- Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures [{Paper}](https://www.aclweb.org/anthology/P18-1133.pdf) [{Code}](https://github.com/WING-NUS/sequicity) 🌟🌟
- Attention with Intention for a Neural Network Conversation Model [{Paper}](https://arxiv.org/pdf/1510.08565.pdf) 🌟
- **REDP**: Few-Shot Generalization Across Dialogue Tasks [{Paper}](https://arxiv.org/pdf/1811.11707.pdf) [{Blog}](http://www.xuwei.io/2019/03/18/%E3%80%8Afew-shot-generalization-across-dialogue-tasks%E3%80%8B%E8%AE%BA%E6%96%87%E7%AC%94%E8%AE%B0/) 🌟🌟
- **TEDP**: Dialogue Transformers [{Paper}](https://arxiv.org/pdf/1910.00486.pdf) [{Code}](https://github.com/RasaHQ/TED-paper) [{Blog}](https://zhuanlan.zhihu.com/p/336977835) 🌟🌟🌟
### Conversational Response Selection
- Multi-view Response Selection for Human-Computer Conversation [{Paper}](https://aclweb.org/anthology/D16-1036.pdf) 🌟🌟
- **SMN**: Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots [{Paper}](https://www.aclweb.org/anthology/P17-1046.pdf) [{Code}](https://github.com/MarkWuNLP/MultiTurnResponseSelection) [{Blog}](https://zhuanlan.zhihu.com/p/65062025) 🌟🌟🌟
- **DUA**: Modeling Multi-turn Conversation with Deep Utterance Aggregation [{Paper}](https://www.aclweb.org/anthology/C18-1317.pdf) [{Code}](https://github.com/cooelf/DeepUtteranceAggregation) [{Blog}](https://zhuanlan.zhihu.com/p/60618158) 🌟🌟
- **DAM**: Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network [{Paper}](https://www.aclweb.org/anthology/P18-1103.pdf) [{Code}](https://github.com/baidu/Dialogue/tree/master/DAM) [{Blog}](https://zhuanlan.zhihu.com/p/65143297) 🌟🌟🌟
- **IMN**: Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots [{Paper}](https://arxiv.org/pdf/1901.01824.pdf) [{Code}](https://github.com/JasonForJoy/IMN) [{Blog}](https://zhuanlan.zhihu.com/p/68590678) 🌟🌟
- Dialogue Transformers [{Paper}](https://arxiv.org/pdf/1910.00486.pdf) 🌟🌟
## 6 Topic Model
### LDA
- Latent Dirichlet Allocation [{Paper}](https://jmlr.org/papers/volume3/blei03a/blei03a.pdf) [{Blog}](https://arxiv.org/pdf/1908.03142.pdf) 🌟🌟🌟
## 7 Knowledge Graph
### Surveys
- Towards a Definition of Knowledge Graphs [{Paper}](http://ceur-ws.org/Vol-1695/paper4.pdf) 🌟🌟🌟
## 8 Prompt Learning
### Surveys
- **PPP**: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing [{Paper}](https://arxiv.org/pdf/2107.13586.pdf) [{Blog}](https://zhuanlan.zhihu.com/p/395115779) 🌟🌟🌟
## 9 Graph Neural Network
### Surveys
- Graph Neural Networks for Natural Language Processing: A Survey [{Paper}](https://arxiv.org/pdf/2106.06090.pdf) 🌟🌟
## 10 Sentence Embedding
### Core
- **InferSent**: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data [{Paper}](https://arxiv.org/pdf/1705.02364.pdf) [{Code}](https://github.com/facebookresearch/InferSent) 🌟🌟
- **Sentence-BERT**: Sentence Embeddings using Siamese BERT-Networks [{Paper}](https://arxiv.org/pdf/1908.10084.pdf) [{Code}](https://github.com/UKPLab/sentence-transformers) 🌟🌟🌟
- **BERT-flow**: On the Sentence Embeddings from Pre-trained Language Models [{Paper}](https://arxiv.org/pdf/2011.05864.pdf) [{Code}](https://github.com/bohanli/BERT-flow) [{Blog}](https://zhuanlan.zhihu.com/p/337134133) 🌟🌟
- **SimCSE**: Simple Contrastive Learning of Sentence Embeddings [{Paper}](https://arxiv.org/pdf/2104.08821.pdf) [{Code}](https://github.com/princeton-nlp/SimCSE) 🌟🌟🌟
## References
- [thunlp/NLP-THU](https://github.com/thunlp/NLP-THU)
- [iwangjian/Paper-Reading](https://github.com/iwangjian/Paper-Reading)
- [thunlp/PromptPapers](https://github.com/thunlp/PromptPapers)