Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-nlp
A curated list of awesome frameworks, libraries, tools, datasets, tutorials, and research papers for Natural Language Processing (NLP). This list covers a variety of NLP tasks, from text processing and tokenization to state-of-the-art language models and applications like sentiment analysis and machine translation.
https://github.com/awesomelistsio/awesome-nlp
Tools and Applications
- Stanford CoreNLP - A suite of NLP tools for linguistic analysis.
- Gensim - A Python library for topic modeling and document similarity.
- FastText - A library for efficient text classification and representation learning.
- LexRank - A text summarization library using graph-based ranking algorithms.
Research Papers
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018) - The introduction of the BERT model.
- Attention Is All You Need (2017) - The paper that introduced the Transformer architecture, revolutionizing NLP.
- GloVe: Global Vectors for Word Representation (2014) - A model for generating word embeddings.
- Word2Vec: Efficient Estimation of Word Representations in Vector Space (2013) - The introduction of Word2Vec, a method for learning word embeddings.
- ELMo: Deep Contextualized Word Representations (2018) - A model for contextual word embeddings.
Frameworks and Libraries
- NLTK (Natural Language Toolkit) - A comprehensive library for text processing and analysis.
- AllenNLP - An open-source NLP research library built on top of PyTorch.
- spaCy - An open-source library for advanced natural language processing in Python.
Text Processing and Tokenization
- BPE (Byte Pair Encoding) - A subword tokenization technique used by models like GPT-2 and RoBERTa (BERT uses the related WordPiece algorithm).
- spaCy Tokenizer - A fast and efficient tokenizer integrated within the spaCy library.
- Moses Tokenizer - A widely used tokenizer for machine translation tasks.
- SentencePiece - A language-independent tokenization and text processing library.
- RegexpTokenizer (NLTK) - A tokenizer that uses regular expressions to split text into tokens.
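
As a rough illustration of the BPE entry above, here is a minimal sketch of how merge rules can be learned and applied. It assumes a toy word list, no end-of-word marker, and no pre-tokenization; the function names (`learn_bpe`, `merge_pair`, `encode`) are hypothetical and do not correspond to the API of any library in this list.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each distinct word starts as a tuple of single characters, with its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge rule to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split `word` into subword symbols by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    rules = learn_bpe(["low", "low", "low", "lower"], num_merges=2)
    print(rules)
    print(encode("lowest", rules))
```

Production tokenizers such as SentencePiece add details this sketch omits, for example whitespace handling and byte-level fallbacks.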
NLP Tasks
- spaCy NER
- TextBlob Sentiment Analysis
- Stanford NER
- OpenNMT - A neural machine translation framework.
- Fairseq - A Facebook AI Research (FAIR) framework for sequence-to-sequence models.
- PEGASUS - A pre-trained model specifically designed for text summarization.
- VADER Sentiment Analysis
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Pretrained Language Models
- GPT-3 (Generative Pre-trained Transformer 3) - A 175-billion-parameter autoregressive language model by OpenAI.
- RoBERTa - A BERT variant trained with a robustly optimized pretraining procedure.
- T5 (Text-to-Text Transfer Transformer) - A model that treats every NLP task as a text-to-text problem.
- XLNet - A generalized autoregressive pretraining model that outperforms BERT on several tasks.
- DistilBERT - A smaller, faster, and lighter version of BERT.
- BERT (Bidirectional Encoder Representations from Transformers) - A Transformer-based model for a variety of NLP tasks.
Datasets
- CoNLL-2003 - A dataset for named entity recognition.
- GLUE Benchmark - A collection of resources for evaluating natural language understanding systems.
- SQuAD (Stanford Question Answering Dataset) - A dataset for reading comprehension and question answering tasks.
- IMDB Reviews - A dataset for sentiment analysis.
- WikiText - A collection of high-quality text from Wikipedia for language modeling tasks.
Community
- Reddit: r/NLP - A subreddit for discussions on natural language processing.
- Hugging Face Community - A forum for discussing the Hugging Face NLP library.
Learning Resources
- Coursera: Natural Language Processing Specialization - A comprehensive course series on NLP by DeepLearning.AI.
- Stanford CS224N: Natural Language Processing with Deep Learning - A popular university course on NLP.
- Fast.ai NLP Course - A practical course on NLP using the fastai library.