awesome-nlp

A curated list of awesome frameworks, libraries, tools, datasets, tutorials, and research papers for Natural Language Processing (NLP). This list covers a variety of NLP tasks, from text processing and tokenization to state-of-the-art language models and applications like sentiment analysis and machine translation.
https://github.com/awesomelistsio/awesome-nlp


Tools and Applications
- Stanford CoreNLP - A suite of NLP tools for linguistic analysis.
- Gensim - A Python library for topic modeling and document similarity.
- FastText - A library for efficient text classification and representation learning.
- LexRank - A text summarization library using graph-based ranking algorithms.
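To illustrate what "graph-based ranking" means for summarization, here is a minimal, standard-library-only sketch in the spirit of LexRank. It is not the API of any library listed above, and it is simplified: it uses plain bag-of-words cosine similarity (the published LexRank method uses TF-IDF weighting and optional thresholding), and the names `cosine` and `rank_sentences` are hypothetical.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by centrality in a similarity graph (PageRank-style)."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    # Edge weights: pairwise cosine similarity, with no self-loops.
    sim = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Row-normalize into transition probabilities; a sentence with no
    # similar neighbours spreads its weight uniformly over all sentences.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([w / total for w in row] if total else [1.0 / n] * n)
    # Power iteration with damping, starting from a uniform distribution.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * trans[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores
```

An extractive summary can then be formed by keeping the highest-scoring sentences. Sentences that share vocabulary with many others score higher than isolated ones.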
NLP Tasks
- spaCy NER
- TextBlob Sentiment Analysis
- Stanford NER
- OpenNMT - A neural machine translation framework.
- Fairseq - A sequence modeling toolkit from Facebook AI Research (FAIR) for sequence-to-sequence tasks such as translation and summarization.
- PEGASUS - A pre-trained model specifically designed for text summarization.
- VADER Sentiment Analysis
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation
Frameworks and Libraries
- NLTK (Natural Language Toolkit) - A comprehensive library for text processing and analysis.
- AllenNLP - An open-source NLP research library built on top of PyTorch.
- spaCy - An open-source library for advanced natural language processing in Python.
Text Processing and Tokenization
- BPE (Byte Pair Encoding) - A subword tokenization technique used by models like GPT-2 and RoBERTa (BERT uses the related WordPiece algorithm).
- Moses Tokenizer - A widely used tokenizer for machine translation tasks.
- SentencePiece - A language-independent tokenization and text processing library.
- RegexpTokenizer (NLTK) - A tokenizer that uses regular expressions to split text into tokens.
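As a rough illustration of how BPE builds a subword vocabulary, the sketch below learns merge rules from a toy word list and applies them to a new word. It uses only the Python standard library and is not the API of any tool listed above; the function names `learn_bpe` and `apply_bpe` and the toy corpus are assumptions made for this example, and production implementations add details such as end-of-word markers and byte-level fallback.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each distinct word becomes a tuple of characters, weighted by frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair (ties go to the first pair seen).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment `word` by applying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


if __name__ == "__main__":
    corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    rules = learn_bpe(corpus, 4)
    print(apply_bpe("lowest", rules))  # ['low', 'est']
```

The word "lowest" never appears in the toy corpus, yet it is segmented into the learned units "low" and "est", which is the property that lets subword tokenizers handle unseen words.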
Pretrained Language Models
- GPT-3 (Generative Pre-trained Transformer 3) - A 175-billion-parameter autoregressive language model by OpenAI.
- RoBERTa - A variant of BERT trained with a more robustly optimized pretraining procedure.
- T5 (Text-to-Text Transfer Transformer) - A model that treats every NLP task as a text-to-text problem.
- XLNet - A generalized autoregressive pretraining model that its authors report outperforms BERT on several benchmarks.
- DistilBERT - A smaller, faster, and lighter version of BERT.
Datasets
- CoNLL-2003 - A dataset for named entity recognition.
- GLUE Benchmark - A collection of resources for evaluating natural language understanding systems.
- SQuAD (Stanford Question Answering Dataset) - A dataset for reading comprehension and question answering tasks.
- IMDB Reviews - A dataset for sentiment analysis.
- WikiText - A collection of high-quality text from Wikipedia for language modeling tasks.
Research Papers
- Attention Is All You Need (2017) - The paper that introduced the Transformer architecture, revolutionizing NLP.
- GloVe: Global Vectors for Word Representation (2014) - A model for generating word embeddings.
- Efficient Estimation of Word Representations in Vector Space (2013) - The paper that introduced Word2Vec, a method for learning word embeddings.
- Deep Contextualized Word Representations (2018) - The paper that introduced ELMo, a model for contextual word embeddings.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018) - The introduction of the BERT model.
Community
- Reddit: r/NLP - A subreddit for discussions on natural language processing.
- Hugging Face Community - A forum for discussing the Hugging Face libraries, such as Transformers.
Learning Resources
- Coursera: Natural Language Processing Specialization - A multi-course specialization on NLP by DeepLearning.AI.
- Stanford CS224N: Natural Language Processing with Deep Learning - A popular university course on NLP.
- Fast.ai NLP Course - A practical course on NLP using the fastai library.