Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-nlp
https://github.com/supertopdev/awesome-nlp
- AI Playbook - technical audience. Written by the amazing people over at [a16z - Andreessen Horowitz](https://a16z.com/), this is a great link to forward to your managers or to use as content for your presentations.
- Machine Learning Blog
- Ruder's Blog
- Understand & Implement Natural Language Processing
- Introduction to NLP at Hackernoon - in their own words
- NLP Tutorial by Vik Paruchuri
- Deep Learning for NLP with Pytorch
- Hands-On NLTK Tutorial - The hands-on NLTK tutorial in the form of Jupyter notebooks
- Deep Learning, NLP, and Representations
- Natural Language Processing Blog
- Seq2Seq
- tutorials by Radim Řehůřek
- arXiv: Natural Language Processing (Almost) from Scratch
- karpathy's The Unreasonable Effectiveness of Recurrent Neural Networks
- Udacity's Intro to Artificial Intelligence
- Udacity's Deep Learning
- Deep Natural Language Processing at Oxford
- Deep Learning for Natural Language Processing (cs224*n* Winter 2017)
- Lecture Slides and Reading Material here
- Deep Learning for Natural Language Processing (cs224*d* 2016)
- Deep Learning for Natural Language Processing (cs224*d* 2015)
- Natural Language Processing by Prof. Mike Collins at Columbia
- Statistical Machine Translation - a Machine Translation course with great assignments and slides
- NLTK with Python 3 for Natural Language Processing
- Computational Linguistics I - Graber, Lectures from University of Maryland
- Text Mining in R
- Natural Language Processing with Python
- Twitter-text - A JavaScript implementation of Twitter's text processing library
- Knwl.js - A Natural Language Processor in JS
- Retext - Extensible system for analyzing and manipulating natural language
- NLP Compromise - Natural Language processing in the browser
- Natural - general natural language facilities for node
- TextBlob - Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of [Natural Language Toolkit (NLTK)](http://www.nltk.org/) and [Pattern](https://github.com/clips/pattern), and plays nicely with both :+1:
- spaCy - Industrial strength NLP with Python and Cython :+1:
- textacy - Higher level NLP built on spaCy
- gensim - Python library to conduct unsupervised semantic modelling from plain text :+1:
- scattertext - Python library to produce d3 visualizations of how language differs between corpora
- AllenNLP - An NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.
- PyTorch-NLP - NLP research toolkit designed to support rapid prototyping with better data loaders, word vector loaders, neural network layer representations, common NLP metrics such as BLEU
- Rosetta - Text processing tools and wrappers (e.g. Vowpal Wabbit)
- PyNLPl - Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for [FoLiA](http://proycon.github.io/folia/), but also ARPA language models, Moses phrasetables, GIZA++ alignments.
- jPTDP - A toolkit for joint part-of-speech (POS) tagging and dependency parsing. jPTDP provides pre-trained models for 40+ languages.
- BigARTM - a fast library for topic modelling
- Snips NLU - A production ready library for intent parsing
- MIT Information Extraction Toolkit - C, C++, and Python tools for named entity recognition and relation extraction
- CRF++ - Open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data & other Natural Language Processing tasks.
- CRFsuite - CRFsuite is an implementation of Conditional Random Fields (CRFs) for labeling sequential data.
- BLLIP Parser - BLLIP Natural Language Parser (also known as the Charniak-Johnson parser)
- colibri-core - C++ library, command line tools, and Python binding for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
- ucto - Unicode-aware regular-expression based tokenizer for various languages. Tool and C++ library. Supports FoLiA format.
- libfolia - C++ library for the [FoLiA format](http://proycon.github.io/folia/)
- frog - Memory-based NLP suite developed for Dutch: PoS tagger, lemmatiser, dependency parser, NER, shallow parser, morphological analyzer.
- MeTA - [MeTA : ModErn Text Analysis](https://meta-toolkit.org/) is a C++ Data Sciences Toolkit that facilitates mining big text data.
- Mecab (Japanese)
- Moses
- StarSpace - a library from Facebook for creating word-level, paragraph-level, and document-level embeddings and for text classification
- Stanford NLP
- OpenNLP
- ClearNLP
- Word2vec in Java
- ReVerb - Scale Open Information Extraction
- OpenRegex - based regular expression language and engine.
- CogcompNLP - Core libraries developed in the U of Illinois' Cognitive Computation Group.
- MALLET - MAchine Learning for LanguagE Toolkit - package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
- RDRPOSTagger - A robust POS tagging toolkit available (in both Java & Python) together with pre-trained models for 40+ languages.
- Saul - Library for developing NLP systems, including built in modules like SRL, POS, etc.
- ATR4S - Toolkit with state-of-the-art [automatic term recognition](https://en.wikipedia.org/wiki/Terminology_extraction) methods.
- tm - Implementation of topic modeling based on regularized multilingual [PLSA](https://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis).
- word2vec-scala - Scala interface to word2vec model; includes operations on vectors like word-distance and word-analogy.
- Epic - Epic is a high performance statistical parser written in Scala, along with a framework for building complex structured prediction models.
- text2vec - Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
- wordVectors - An R package for creating and exploring word2vec and other word embedding models
- RMallet - R package to interface with the Java machine learning tool MALLET
- dfr-browser - Creates d3 visualizations for browsing topic models of text in a web browser.
- dfrtopics - R package for exploring topic models of text.
- sentiment_classifier - Sentiment Classification using Word Sense Disambiguation and WordNet Reader
- jProcessing - Japanese Natural Language Processing Libraries, with Japanese sentiment classification
- Clojure-openNLP - Natural Language Processing in Clojure (opennlp)
- Inflections-clj - Rails-like inflection library for Clojure and ClojureScript
- postagga - A library to parse natural language in Clojure and ClojureScript
- A collection of Natural Language Processing (NLP) Ruby libraries, tools and software
- Practical Natural Language Processing done in Ruby
- whatlang
- snips-nlu-rs - A production ready library for intent parsing
- Wit-ai - Natural Language Interface for apps and devices
- IBM Watson's Natural Language Understanding - API and Github demo
- Amazon Comprehend - NLP and ML suite covers most common tasks like NER, tagging, and sentiment analysis
- Google Cloud Natural Language API - Syntax Analysis, NER, Sentiment Analysis, and Content tagging in at least 9 languages, including English and Chinese (Simplified and Traditional).
- ParallelDots - State of the art Text Analysis API Service ranging from Sentiment Analysis to Intent Analysis
- Microsoft Cognitive Service
- TextRazor
- Rosette
- T. Mikolov
- Word2Vec Official Implementation
- Deep Learning, NLP, and Representations
- Efficient Estimation of Word Representations in Vector Space
- Distributed Representations of Words and Phrases and their Compositionality
- Word2Vec Resources on Github
- GloVe: Global vectors for word representation
- GloVe source code and training data
- fastText on Github - for efficient learning of word representations and sentence classification
- Pre-trained Vectors
- arXiv: Enriching Word Vectors with Subword Information
- Unofficial Python Wrapper for fastText on Github
- Pre-trained word embeddings for WSJ corpus - Lab
- HLBL language model
- Real-valued vector "embeddings"
- Improving Word Representations Via Global Context And Multiple Word Prototypes
- Dependency based word embeddings
- sense2vec - on word sense disambiguation
- Infinite Dimensional Word Embeddings - new
- Skip Thought Vectors - word representation method
- Adaptive skip-gram - similar approach, with adaptive properties
- Sequence to Sequence Learning - word vectors for machine translation
- Improving distributional similarity with lessons learned from word embeddings
- Deep Contextualized Word Representations - [PyTorch](https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md) - [TF Implementation](https://github.com/allenai/bilm-tf)
- Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
- Distributed Representations of Sentences and Documents
- Le
- Deep Recursive Neural Networks for Compositionality in Language
- Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
- Semi-supervised Sequence Learning
- blog post - decoder architecture with seq2seq models. [TensorFlow code here](https://github.com/tensorflow/nmt)
- seq2seq TensorFlow tutorial
- tutorial in Perl
- arXiv: Sequence to Sequence Learning with Neural Networks
- arXiv: Neural Machine Translation by jointly learning to align and translate
- arXiv: A Convolutional encoder model for neural machine translation
- Convolutional Sequence to Sequence learning
- Convolutional over Recurrent Encoder for neural machine translation
- OpenNMT - PyTorch (OpenNMT-py), [TensorFlow](https://github.com/OpenNMT/OpenNMT-tf) and the original [LuaTorch](https://github.com/OpenNMT/OpenNMT) implementations
- A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
- Recurrent Neural Network Language Model (RLM) architecture of (Mikolov et al., 2010).
- Implementing RNN Language Models by Denny Britz
- Neural Responding Machine for Short-Text Conversation
- arXiv: A Neural Conversation Model - 2015. Uses LSTM RNNs to generate conversational responses
- andrewt3000/DL4NLP
- Annotated Transformer
- Memory Networks
- End-To-End Memory Networks
- MemNN
- Reasoning, Attention and Memory (RAM) workshop at NIPS 2015, slides included
- Neural Turing Machines
- Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets
- Stack RNN source code
- Neural autoencoder for paragraphs and documents - LSTM representation
- LSTM over tree structures
- Low-Dimensional Embeddings of Logic
- based on this paper
- Distant Supervision for Cancer Pathway Extraction From Text
- A Neural Probabilistic Language Model
- Retrofitting word vectors to semantic lexicons
- Unsupervised Learning of the Morphology of a Natural Language
- Computational Grounded Cognition: a new alliance between grounded cognition and computational modelling
- Learning the Structure of Biomedical Relation Extractions
- Statistical Language Models based on Neural Networks
- A survey of named entity recognition and classification
- Benchmarking the extraction and disambiguation of named entities on the semantic web
- Knowledge base population: Successful approaches and challenges
- SpeedRead: A fast named entity recognition Pipeline
- Markov Logic Networks for Natural Language Question Answering
- Template-Based Information Extraction without the Templates
- Relation extraction with matrix factorization and universal schemas
- Privee: An Architecture for Automatically Analyzing Web Privacy Policies
- Teaching Machines to Read and Comprehend - DeepMind paper
- DrQA: Open Domain Question Answering
- Relation Extraction with Matrix Factorization and Universal Schemas
- Towards a Formal Distributional Semantics: Simulating Logical Calculi with Tensors
- Presentation slides for MLN tutorial
- Presentation slides for QA applications of MLNs
- Presentation slides
- awesome-text-summarization - curated list of resources in text summarization.
- Example blogpost (summarization with Amazon Reviews)
- TextRank - bringing order into text
- Modelling compressions with Discourse constraints
- Deep Recurrent Generative Decoder model for Abstractive Text Summarization - a sequence-to-sequence oriented encoder-decoder model equipped with a deep recurrent generative decoder.
- A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification - decoder for text summarization.
- TextSum
- Brightmart/text_classification
- Facebook's fastText
- Convolutional Neural Networks for Sentence Classification
- Using a CNN for text classification in TensorFlow
- Character-level Convolutional Networks for Text Classification
- nlp-datasets
- KoNLPy - Python package for Korean natural language processing.
- Mecab (Korean) - C++ library for Korean NLP
- KoalaNLP - Scala library for Korean Natural Language Processing.
- KoNLP - R package for Korean Natural language processing
- dsindex's blog
- Kangwon University's NLP course in Korean
- KAIST Corpus - A corpus from the Korea Advanced Institute of Science and Technology in Korean.
- Naver Sentiment Movie Corpus in Korean
- Chosun Ilbo archive - dataset in Korean from one of the major newspapers in South Korea, the Chosun Ilbo.
- LABR - LArge Arabic Book Reviews dataset
- Arabic Stopwords - A list of Arabic stopwords from various resources
- goarabic - Go package for Arabic text processing
- jsastem - Javascript for Arabic stemming
- PyArabic - Python libraries for Arabic
- jieba - Python package for Chinese word segmentation
- SnowNLP - Python package for Chinese NLP
- FudanNLP - Java library for Chinese text processing
- Colombian Political Speeches
- Copenhagen Treebank
- Reuters Corpora RCV2
- Spanish Billion words corpus with Word2Vec embeddings
- Hindi Dependency Treebank - A multi-representational multi-layered treebank for Hindi and Urdu
- Universal Dependencies Treebank in Hindi
- Parallel Universal Dependencies Treebank in Hindi - A smaller part of the above-mentioned treebank.
- PyThaiNLP - Thai NLP package in Python
- JTCC - A character cluster library in Java
- CutKum - Word segmentation with deep learning in TensorFlow
- Thai Language Toolkit - Based on a paper by Wirote Aroonmanakun in 2002 with included dataset
- SynThai - Word segmentation and POS tagging using deep learning in Python
- Inter-BEST - A text corpus with 5 million words with word segmentation
- Prime Minister 29 - Dataset containing speeches of the current Prime Minister of Thailand
- pymorphy2 - a good POS tagger for Russian
- arXiv: BKTreeBank
- ICU Tokenizer
- CLTK
- python-frog - Python binding to Frog, an NLP suite for Dutch. (pos tagging, lemmatisation, dependency parsing, NER)
- NLPH_Resources - A collection of papers, corpora and linguistic resources for NLP in Hebrew
- ai-reading-list
- nlp-reading-group
- awesome-spanish-nlp
- jjangsangy's awesome-nlp
- awesome-machine-learning
- DL4NLP
Keywords
- nlp: 22
- natural-language-processing: 17
- machine-learning: 11
- python: 9
- computational-linguistics: 6
- java: 5
- text-classification: 4
- folia: 4
- text-mining: 4
- nlp-library: 4
- text-processing: 4
- deep-learning: 4
- pos-tagger: 4
- c-plus-plus: 4
- named-entity-recognition: 3
- library: 3
- pos-tagging: 3
- ai: 3
- word-embeddings: 3
- topic-modeling: 3
- natural-language-understanding: 2
- ner: 2
- language: 2
- pos-tag: 2
- text-analysis: 2
- machine-learning-library: 2
- information-extraction: 2
- sentiment-analysis: 2
- word-sense-disambiguation: 2
- wsd: 2
- part-of-speech-tagger: 2
- dependency-parsing: 2
- dependency-parser: 2
- linguistics: 2
- neural-network: 2
- pytorch: 2
- word2vec: 2
- spacy: 2
- ruby: 2
- nltk: 2
- rust: 2
- nlu: 2
- inference: 2
- arabic-language: 2
- embeddings: 1
- dataset: 1
- data-loader: 1
- textrnn: 1
- data-science: 1
- arabic-nlp: 1