https://github.com/JackHCC/NLP-Bubble

🖨 Natural Language Processing Learning Blog, a Study Bubble for recording learning.
Host: GitHub
URL: https://github.com/JackHCC/NLP-Bubble
Owner: JackHCC
License: mit
Created: 2022-01-24T10:46:05.000Z (over 3 years ago)
Default Branch: master
Last Pushed: 2022-07-01T13:02:34.000Z (almost 3 years ago)
Last Synced: 2024-11-19T01:02:16.226Z (6 months ago)
Topics: awesome, machine-learning, nlp
Homepage:
Size: 2.51 MB
Stars: 5
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

ultimate-awesome - NLP-Bubble - 🖨 Natural Language Processing Learning Blog, a Study Bubble for recording learning. (Other Lists / Julia Lists)
README

# NLP-Bubble

🖨 Natural Language Processing Learning Blog, a Study Bubble for recording learning.

![](image/logo/NLP-Bubble-banner.png)

💡 NLP Learning Record 💡

## Lessons/Books

- [Statistical Learning Method v1](https://blog.creativecc.cn/posts/Lesson-Statistical-Learning-Method.html)

- [CS224N Natural Language Processing 2022](https://github.com/JackHCC/Awesome-DL-Models/tree/master/Docx/CS224N)

- [CS224W Machine Learning with Graphs 2021](https://blog.creativecc.cn/posts/Lesson-CS224W-Machine-Learning-with-Graphs.html)

## Papers

- [Arxiv NLP Reporter](https://github.com/JackHCC/Arxiv-NLP-Reporter)

  - [Web Reader](https://blog.creativecc.cn/Arxiv-NLP-Reporter/)

- Where to read papers

  - [ACL Anthology](https://www.aclweb.org/anthology/)

  - [NeurIPS](https://papers.nips.cc), ICML, ICLR

  - [arXiv (online preprint server)](https://arxiv.org)

## Datasets

### By Category

- Linguistic Data Consortium

  - [Linguistic Data Consortium (upenn.edu)](https://catalog.ldc.upenn.edu/)

  - [Linguistics (stanford.edu)](https://linguistics.stanford.edu/resources/resources-corpora)

- Machine translation

  - [Statistical Machine Translation (statmt.org)](https://statmt.org/)

- Dependency parsing: Universal Dependencies

  - [Universal Dependencies](https://universaldependencies.org/)

### Other

- Awesome

  - [NLPDataSet](https://github.com/liucongg/NLPDataSet)

  - [nlp-datasets](https://github.com/niderhoff/nlp-datasets)

- Platform

  - [千言：中文开源数据集合](https://www.luge.ai/)

  - [Papers With Code](https://paperswithcode.com/datasets)

  - Kaggle 

  - [GLUE](https://gluebenchmark.com/tasks)

- Blogs

  - [Datasets for Natural Language Processing](https://machinelearningmastery.com/datasets-natural-language-processing/)

  - [Sentiment Analysis](https://nlp.stanford.edu/sentiment/)

  - [The bAbI project](https://research.facebook.com/downloads/babi/)

## NLP Task

Mind map:

![](../../../Blog/JackCC.Blog/hexo_blog/source/images/lesson/NLP_Task.png)

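The 32 common NLP tasks below are listed with their evaluation dataset, evaluation metric, state-of-the-art (SOTA) result as of May 2020, and the corresponding paper and code.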

| Task | Dataset | Metric | SOTA | Paper | Code |
| --- | --- | --- | --- | --- | --- |
| Chunking | Penn Treebank | F1 | 95.77 | [A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks](https://arxiv.org/pdf/1611.01587v5.pdf) | [Link](https://github.com/hassyGo/charNgram2vec) |
| Commonsense reasoning | Event2Mind | Cross-entropy | 4.22 | [Event2Mind: Commonsense Inference on Events, Intents, and Reactions](https://www.dialog-21.ru/media/5090/fenogenovaasplusetal-010.pdf) | [Link](https://github.com/Alenush/russian_event2mind) |
| Constituency parsing | Penn Treebank | F1 | 95.13 | [Constituency Parsing with a Self-Attentive Encoder](https://arxiv.org/pdf/1805.01052v1.pdf) | [Link](https://github.com/nikitakit/self-attentive-parser) |
| Coreference resolution | CoNLL 2012 | Average F1 | 73 | [Higher-order Coreference Resolution with Coarse-to-fine Inference](https://arxiv.org/pdf/1804.05392v1.pdf) | [Link](https://github.com/kentonl/e2e-coref) |
| Dependency parsing | Penn Treebank | POS<br>UAS<br>LAS | 97.3<br>95.44<br>93.76 | [Deep Biaffine Attention for Neural Dependency Parsing](https://arxiv.org/pdf/1611.01734v3.pdf) | [Link](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/dependency_parsing/ddparser) |
| Task-oriented dialogue / Intent detection | ATIS<br>Snips | Accuracy | 94.1<br>97.0 | [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](https://aclanthology.org/N18-2118.pdf) | [Link](https://github.com/MiuLab/SlotGated-SLU) |
| Task-oriented dialogue / Slot filling | ATIS<br>Snips | F1 | 95.2<br>88.8 | [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](https://aclanthology.org/N18-2118.pdf) | [Link](https://github.com/MiuLab/SlotGated-SLU) |
| Task-oriented dialogue / Dialogue state tracking | DSTC2 | Area<br>Food<br>Price<br>Joint | 90<br>84<br>92<br>72 | [Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems](https://arxiv.org/pdf/1804.06512v1.pdf) | [Link](https://github.com/google-research-datasets/simulated-dialogue) |
| Domain adaptation | Multi-Domain Sentiment Dataset | Average accuracy | 79.15 | [Strong Baselines for Neural Semi-supervised Learning under Domain Shift](https://arxiv.org/pdf/1804.09530v1.pdf) | [Link](https://github.com/bplank/semi-supervised-baselines) |
| Entity linking | AIDA CoNLL-YAGO | Micro-F1-strong<br>Macro-F1-strong | 86.6<br>89.4 | [End-to-End Neural Entity Linking](https://arxiv.org/pdf/1808.07699v2.pdf) | [Link](https://github.com/dalab/end2end_neural_el) |
| Information extraction | ReVerb45K | Precision<br>Recall<br>F1 | 62.7<br>84.4<br>81.9 | [CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information](https://arxiv.org/pdf/1902.00172v1.pdf) | [Link](https://github.com/malllabiisc/cesi) |
| Grammatical error correction | JFLEG | GLEU | 61.5 | [Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation](https://arxiv.org/pdf/1804.05945v1.pdf) | Link |
| Language modeling | Penn Treebank | Validation perplexity<br>Test perplexity | 48.33<br>47.69 | [Breaking the Softmax Bottleneck: A High-Rank RNN Language Model](https://arxiv.org/pdf/1711.03953v4.pdf) | [Link](https://github.com/zihangdai/mos) |
| Lexical normalization | LexNorm2015 | F1<br>Precision<br>Recall | 86.39<br>93.53<br>80.26 | [MoNoise: Modeling Noise Using a Modular Normalization System](https://arxiv.org/pdf/1710.03476v1.pdf) | [Link](https://bitbucket.org/robvanderg/monoise) |
| Machine translation | WMT 2014 EN-DE | BLEU | 35.0 | [Understanding Back-Translation at Scale](https://arxiv.org/pdf/1808.09381v2.pdf) | [Link](https://github.com/pytorch/fairseq) |
| Multimodal emotion recognition | IEMOCAP | Accuracy | 76.5 | [Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling](https://arxiv.org/pdf/1806.06228v1.pdf) | [Link](https://github.com/SenticNet/hfusion) |
| Multimodal metaphor recognition | Verb-noun pairs<br>Adjective-noun pairs | F1 | 0.75<br>0.79 | [Black Holes and White Rabbits: Metaphor Identification with Visual Features](https://aclanthology.org/N16-1020.pdf) | Link |
| Multimodal sentiment analysis | MOSI | Accuracy | 80.3 | [Context-Dependent Sentiment Analysis in User-Generated Videos](https://aclanthology.org/P17-1081.pdf) | [Link](https://github.com/senticnet/sc-lstm) |
| Named entity recognition | CoNLL 2003 | F1 | 93.09 | [Contextual String Embeddings for Sequence Labeling](https://aclanthology.org/C18-1139.pdf) | [Link](https://github.com/zalandoresearch/flair) |
| Natural language inference | SciTail | Accuracy | 88.3 | [Improving Language Understanding by Generative Pre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf) | [Link](https://github.com/huggingface/transformers) |
| Part-of-speech tagging | Penn Treebank | Accuracy | 97.96 | [Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings](https://arxiv.org/pdf/1805.08237v1.pdf) | [Link](https://github.com/google/meta_tagger) |
| Question answering | CliCR | F1 | 33.9 | [CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension](https://arxiv.org/pdf/1803.09720v1.pdf) | [Link](https://github.com/clips/clicr) |
| Word segmentation | VLSP 2013 | F1 | 97.90 | [A Fast and Accurate Vietnamese Word Segmenter](https://arxiv.org/pdf/1709.06307v2.pdf) | [Link](https://github.com/datquocnguyen/RDRsegmenter) |
| Word sense disambiguation | SemEval 2015 | F1 | 67.1 | [Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison](https://aclanthology.org/E17-1010.pdf) | Link |
| Text classification | AG News | Error rate | 5.01 | [Universal Language Model Fine-tuning for Text Classification](https://arxiv.org/pdf/1801.06146v5.pdf) | [Link](https://github.com/fastai/fastai) |
| Summarization | Gigaword | ROUGE-1<br>ROUGE-2<br>ROUGE-L | 37.04<br>19.03<br>34.46 | [Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization](https://aclanthology.org/P18-1015.pdf) | Link |
| Sentiment analysis | IMDb | Accuracy | 95.4 | [Universal Language Model Fine-tuning for Text Classification](https://arxiv.org/pdf/1801.06146v5.pdf) | [Link](https://github.com/fastai/fastai) |
| Semantic role labeling | OntoNotes | F1 | 85.5 | [Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling](https://arxiv.org/pdf/1805.04787v2.pdf) | [Link](https://github.com/luheng/lsgn) |
| Semantic parsing | LDC2014T12 | F1 Newswire<br>F1 Full | 0.71<br>0.66 | [AMR Parsing with an Incremental Joint Model](https://arxiv.org/pdf/1909.04303v2.pdf) | [Link](https://github.com/jcyk/AMR-parser) |
| Semantic textual similarity | SentEval | MRPC<br>SICK-R<br>SICK-E<br>STS | 78.6/84.4<br>0.888<br>87.8<br>78.9/78.6 | [Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning](https://arxiv.org/pdf/1804.00079v1.pdf) | [Link](https://github.com/facebookresearch/SentEval) |
| Relation extraction | New York Times Corpus | P@10%<br>P@30% | 73.6<br>59.5 | [RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information](https://arxiv.org/pdf/1812.04361v2.pdf) | [Link](https://github.com/malllabiisc/RESIDE) |
| Relation prediction | WN18RR | H@10<br>H@1<br>MRR | 59.02<br>45.37<br>49.83 | [Predicting Semantic Relations using Global Graph Properties](https://arxiv.org/pdf/1808.08644v1.pdf) | [Link](https://github.com/yuvalpinter/m3gm) |
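The table mixes many evaluation metrics. As a rough illustration of what a few of the common ones compute, here is a minimal Python sketch of set-based precision/recall/F1, perplexity, MRR/Hits@k, and UAS/LAS. The function names and input formats are assumptions made for illustration only. The reported SOTA numbers were produced with each benchmark's own official scorer (with conventions such as chunk-level matching, punctuation handling in dependency parsing, or filtered ranking in link prediction), so these functions are not guaranteed to reproduce them. They also return fractions, whereas most rows in the table report percentages (multiply by 100).

```python
import math
from typing import Hashable, Sequence, Set, Tuple


def precision_recall_f1(gold: Set[Hashable], pred: Set[Hashable]) -> Tuple[float, float, float]:
    """Set-based P/R/F1, e.g. over (start, end, label) spans for chunking or NER."""
    tp = len(gold & pred)  # predicted items that exactly match a gold item
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the average per-token negative log-likelihood (natural log).

    The average negative log-likelihood is the cross-entropy in nats,
    so perplexity = exp(cross-entropy).
    """
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


def mrr_and_hits(ranks: Sequence[int], k: int) -> Tuple[float, float]:
    """MRR and Hits@k, given the 1-based rank of the correct answer per query."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits


def uas_las(gold: Sequence[Tuple[int, str]], pred: Sequence[Tuple[int, str]]) -> Tuple[float, float]:
    """Unlabeled / labeled attachment score; each token is (head_index, relation)."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n  # head only
    las = sum(g == p for g, p in zip(gold, pred)) / n         # head and label
    return uas, las


if __name__ == "__main__":
    # Hypothetical chunk spans: two of three predictions match the gold spans.
    gold_spans = {(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP")}
    pred_spans = {(0, 2, "NP"), (2, 3, "VP"), (3, 4, "NP")}
    p, r, f = precision_recall_f1(gold_spans, pred_spans)
    print(round(p, 4), round(r, 4), round(f, 4))  # 0.6667 0.6667 0.6667

    # A model that assigns probability 0.25 to every token has perplexity of about 4.
    print(round(perplexity([math.log(0.25)] * 4), 4))  # 4.0

    # Correct entity ranked 1st, 2nd and 4th for three queries.
    print(mrr_and_hits([1, 2, 4], k=1))

    # Three tokens: all heads correct, one relation label wrong.
    print(uas_las([(2, "nsubj"), (0, "root"), (2, "obj")],
                  [(2, "nsubj"), (0, "root"), (2, "iobj")]))
```

The set-based F1 treats a prediction as correct only when the whole item (span boundaries and label) matches exactly, which is the usual convention for chunking and NER; token-level variants would give different numbers.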

## Resources

- [NLP-Interview-Notes](https://github.com/km1994/NLP-Interview-Notes)

- [Recommendation-Advertisement-Search](https://github.com/km1994/recommendation_advertisement_search)

- [NLPer-Arsenal](https://github.com/TingFree/NLPer-Arsenal)

- [AI-Surveys](https://github.com/KaiyuanGao/AI-Surveys)

## Interview

- [Machine Learning](./docs/interview/machine-learning.md)

- [Deep Learning](./docs/interview/deep-learning.md)

- [Word Embedding](./docs/interview/word-embedding.md)

- [Transformer](./docs/interview/transformer.md)

- [BERT](./docs/interview/bert.md)

- [Reverse Interview](./docs/interview/reverse-interview.md)

© [JackHCC](https://github.com/JackHCC)