text_mining_resources
Resources for learning about Text Mining and Natural Language Processing
https://github.com/stepthom/text_mining_resources
-
Other Curated Lists
-
Lexicons for Sentiment Analysis
- awesome-machine-learning
- Chinese NLP Tools
- Over 150 of the Best Machine Learning, NLP, and Python Tutorials I’ve Found
- Awesome Deep Learning for Natural Language Processing (NLP)
- Association for Computational Linguistics Papers Anthology
-
-
Products
-
Knowledge Graphs
- Amazon Lex
- Apache PDFBox
- Amazon Comprehend
- brat
- Lexalytics Semantria
- Systran - Enterprise Translation Products
- SAS Sentiment Analysis
- STATISTICA
- Text Mining (Big Data, Unstructured Data)
- Gate
- Video: How IBM Watson learns (3 minutes)
- Video: IBM Watson on Jeopardy! (10 minutes)
- Video: IBM Watson: The Science Behind an Answer (7 minutes)
- Stocktwits
- Meltwater
- Alchemy API
- Ask Data by Tableau Software Inc. - A feature that helps existing Tableau users retrieve quick data visualizations for business intelligence insights. Its interface resembles a search engine: Ask Data applies NLP to the user's text input to extract keywords and surface analytics and business insights on the Tableau platform.
- Microsoft Azure Text Analytics
- SO: How to extract text from a PDF?
- Tools for Extracting Data and Text from PDFs - A Review
- How I used NLP (SpaCy) to screen Data Science Resumes
- Lyrebird.ai - A realistic voice cloning and text-to-speech platform. This Canadian start-up's product combines voice cloning with text-to-speech: it learns the intonations and voice patterns in audio recordings, then renders input text as speech in the selected voice.
- Dialogflow
- PDFLayoutTextStripper: Converts a pdf file into a text file while keeping the layout of the original pdf.
- pdftabextract: A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
- Anafora - A web-based raw text annotation tool
- LightTag Annotation Tool
- PyPDF2
- Tabula: A tool for liberating data tables locked inside PDF files.
-
-
APIs and Libraries
-
Knowledge Graphs
- MLlib
- spaCy - Industrial-Strength Natural Language Processing in Python.
- Python modules
- Gensim
- Apache OpenNLP
- textblob
- TextRazor API
- polyglot
- Beautiful Soup
- Apache Spark - A general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.
- R packages
- tm
- lsa
- lda
- textir
- corpora
- tau
- sentimentr - Lexicon-based sentiment analysis.
- cleanNLP - based sentiment analysis.
- RSentiment - Lexicon-based sentiment analysis. Contains support for negation detection and sarcasm.
- text2vec - Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities.
- LDAvis
- keras - R interface to Keras, a high-level neural networks 'API'. ([RStudio Blog: TensorFlow for R](https://blog.rstudio.com/2018/02/06/tensorflow-for-r))
- retweet
- topicmodels
- textmineR
- gtrendsR
- Analyzing Google Trends Data in R
- textstem
- NLPutils
- Udpipe
- Tutorial
- Spark NLP - Provides production-grade, scalable, and trainable versions of the latest research in natural language processing.
- Natural Language Basics with TextBlob
- textmining
- embeddings
- fastText
- LDA
- TFIDF - Term frequency-inverse document frequency
- HDF5
- h5py
- Introduction to StanfordNLP: An Incredible State-of-the-Art NLP Library for 53 Languages (with Python code)
- Stanford Parser
- Stanford POS Tagger - A Part-of-Speech tagger.
- Stanford Named Entity Recognizer
- Stanford Classifier
- Stanford Topic Modeling Toolbox
- Comparison of Top 6 Python NLP Libraries
- PyCaret's NLP Module - PyCaret is an open-source, low-code machine learning library in Python that aims to reduce the cycle time from hypothesis to insights. PyCaret's founder, Moez Ali, is a Smith alumnus (MMA 2020).
- Google Seq2Seq - A general-purpose encoder-decoder framework for TensorFlow that can be used for Machine Translation, Text Summarization, Conversational Modeling, Image Captioning, and more.
- lda2vec
- AllenNLP - An open-source NLP research library, built on PyTorch.
- tidytext
- wordVectors
- Scattertext
- BigARTM
- word_forms - Generates all possible forms of an English word, e.g. "election" -> "elect", "electoral", "electorate" etc.
- Keras-BERT
- Texthero
- Paragraph embedding scripts and Pre-trained models - Pre-trained Doc2Vec and Word2Vec models
- sent2vec
- Glove-Python
- flair - A very simple framework for state-of-the-art Natural Language Processing (NLP)
- scikit-learn
- Video: NLTK with Python 3 for Natural Language Processing
- Sentiment140
- Streamcrab - Real-time Twitter sentiment analyzer engine http://www.streamcrab.com
- Pattern.en - A part-of-speech tagger for English, sentiment analysis, tools for English verb conjugation and noun singularization & pluralization, and a WordNet interface.
- Bert As A Service - Maps a variable-length sentence to a fixed-length vector. Designed to provide a scalable, production-ready service while also letting researchers apply BERT quickly.
- fastTextR
- PyText - A deep-learning based NLP modeling framework built on PyTorch.
- textacy
- Apache Tika
-
-
Blog Articles, Papers, Case Studies
-
Q&A Systems, Chatbots <a id="qa-systems"></a>
- Microsoft Bot Framework
- Task-oriented Dialogue System for Automatic Diagnosis
- Meet Lucy: Creating a Chatbot Prototype
- Training Millions of Personalized Dialogue Agents
- Ultimate Guide to Leveraging NLP & Machine Learning for your Chatbot
- Building a Simple Chatbot from Scratch in Python (Using NLTK) - September 2018
- A Survey on Dialogue Systems: Recent Advances and New Frontiers
- Examining the Impact of an Automated Translation Chatbot on Online Collaborative Dialog for Incidental L2 Learning
- Generative Model Chatbots - May 2017
- A Guide to Building a Multi-Featured Slackbot with Python - March 2017
- The Road to a Conversational Banking Future - February 2019
- Chatbots - Designing intents and entities for NLP Models
- Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
- NLP — Building a Question Answering model
- Create a banking chatbot with FAQ discovery, anger detection and natural language understanding
-
Machine Translation
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding - [PyTorch port](https://github.com/codertimo/BERT-pytorch)
- The Annotated Transformer - A line-by-line implementation of "Attention Is All You Need".
- Blog Post: Found in translation: More accurate, fluent sentences in Google Translate
- NYTimes: The Great A.I. Awakening
- Machine Learning Translation and the Google Translate Algorithm
- Phrase-Based & Neural Unsupervised Machine Translation - Presents unsupervised machine translation with both a neural and a phrase-based model. Received the Best Paper Award at EMNLP 2018. [Implementation code](https://github.com/facebookresearch/UnsupervisedMT).
- Paper Dissected: “Attention is All You Need” Explained
- Neural Machine Translation (seq2seq) Tutorial
-
Document Classification
- Bag of Tricks for Efficient Text Classification
- Towards Explainable NLP: A Generative Explanation Framework for Text Classification
- Naive Bayes and Text Classification - An in-depth overview of both the Naive Bayes algorithm and how it can be used in the document classification process.
- Text Classifier Algorithms in Machine Learning
- Classifying Documents in the Reuters-21578 R8 Dataset
- Multi-Class Text Classification with Scikit-Learn - Covers multi-class problems, such as classifying consumer complaints into one of 12 categories.
- Machine Learning with Text in scikit-learn (PyCon 2016) - Using scikit-learn in the document classification process.
- Text Classification in Python with scikit-learn and nltk
- Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews - Paper with code on GitHub
- Tidy Text Mining Beer Reviews
- Ultimate guide to deal with Text Data (using Python) – for Data Scientists & Engineers
-
Transformers and Language Models
- The Illustrated Transformer
- Understanding Large Language Models
- The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)
- Machines Beat Humans on a Reading Test. But Do They Understand?
- the transformer … “explained”?
- A Primer in BERTology: What we know about how BERT works
- OpenAI: Better Language Models and Their Implications - A pre-trained Transformer-based unsupervised language model that achieves state-of-the-art results on many language benchmarks, with a focus on text generation. Controversial limited release. February 14, 2019.
- ChatGPT User Experience: Implications for Education
- New Modes of Learning Enabled by AI Chatbots: Three Methods and Assignments
- Educators Battle Plagiarism As 89% Of Students Admit To Using OpenAI’s ChatGPT For Homework
- ChatGPT: Educational friend or foe? - Hirsh-Pasek and Blinkoff (Temple University). January 2023.
- Don’t Ban ChatGPT in Schools. Teach With It.
- ChatGPT and the Future of Business Education
- Udemy course (January 2023). ChatGPT for Teachers in Education.
- A review of BERT based models
- BERT Explained - State of the art language model for NLP
- ChatGPT launch blog
- Awesome ChatGPT Prompts
- What Every NLP Engineer Needs to Know About Pre-Trained Language Models
-
Word and Document Embeddings
- Learned in Translation: Contextualized Word Vectors
- Universal Language Model Fine-tuning for Text Classification
- Deep Contextualized Word Representations
- Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
- A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks - A hierarchical multi-task learning approach for a set of interrelated NLP tasks. Presented at the AAAI conference in January 2019. [Implementation code](https://github.com/huggingface/hmtl).
- The Current Best of Universal Word Embeddings and Sentence Embeddings
- The Amazing Power of Word Vectors
- Contextual String Embeddings for Sequence Labeling
- Skip Thought Vectors
- From Word Embeddings To Document Distances
- sense2vec
- An Intuitive Understanding of Word Embeddings: From Count Vectors to Word2Vec
- An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation
- Document Embedding with Paragraph Vectors
- GloVe Word Embeddings Demo
- Text Classification With Word2Vec
- Document Embedding
- Word Embeddings, Bias in ML, Why You Don't Like Math, & Why AI Needs You
- Word Vectors in Natural Language Processing: Global Vectors (GloVe)
- Doc2Vec Tutorial on the Lee Dataset
- Word Embeddings in Python with SpaCy and Gensim
- An Idiot’s Guide to Word2vec Natural Language Processing
- Word2vec: fish + music = bass
- Universal Sentence Encoder Visually Explained
- NLP's ImageNet moment has arrived - Discusses pre-trained NLP language models, drawing parallels to ImageNet's contributions to computer vision.
- Get Busy with Word Embeddings - An Introduction (February 2018)
- Sequence to Sequence Learning with Neural Networks
-
Concept Analysis/Topic Modeling <a id="concept-analysis"></a>
- COTA, Uber’s topic modelling approach to improving customer support
- Probabilistic Topic Models
- Topic models: Past, present, and future
- Word vectors using LSA, Part - 2
- LEGO color themes as topic models
- How our startup switched from Unsupervised LDA to Semi-Supervised GuidedLDA
- Topic Modeling with LSA, PLSA, LDA & lda2Vec
- text2vec's Description of Topic Models
- Topic Modelling Portal
- Applications of Topic Models
- MACS 30500: Text analysis: topic modeling
- Topic Modelling The Legal Subject Matter And Judicial Activity Of The High Court Of Australia, 1903–2015
- Using LDA Topic Models as a Classification Model Input
- NLP: Extracting the main topics from your dataset using LDA in minutes
-
General <a id="general-articles"></a>
- Natural Language Processing Tutorial
- Natural language based financial forecasting: a survey
- Futures of text
- Natural Language Processing is Fun! How computers understand Human Language
- NLP in healthcare
- AI Harvard Business Review
- Why Accuracy in Natural Language Processing is Crucial to the Future of AI in Retail
- WEF Live Campaign - Twitter fed Global News Topics & Sentiment Tracker - Live Jan 2019
- From Natural Language to Calendar Entries, with Clojure
- Ask HN: How Can I Get into NLP (Natural Language Processing)?
- Ask HN: What are the best tools for analyzing large bodies of text?
- Quora: How do I learn Natural Language Processing?
- Quora Topic: Natural Language Processing
- The Definitive Guide to Natural Language Processing
- R or Python on Text Mining
- Where to start in Text Mining
- Text Mining in R and Python: 8 Tips To Get Started
- Mining Twitter Data with Python (Part 1: Collecting Data)
- Why Text Mining May Be The Next Big Thing
- SAS CEO offers analytics over BI, reveals use cases for text analytics
- Value and benefits of text mining
- Text Mining South Park - A text mining blog that covers a variety of topics.
- An Introduction to Text Mining using Twitter Streaming API and Python
- How To Get Into Natural Language Processing
- Comparison of the Most Useful Text Processing APIs
- 5 Heroic Tools for Natural Language Processing
- Natural Language Processing unlocks hidden data to transform healthcare efficiency, quality and cost
- Extracting medical problems from electronic clinical documents
- How to Write a Spelling Corrector - by Peter Norvig
- Using AI to unleash the power of unstructured government data - An easy-to-comprehend primer and background on NLP and its possible applications to unstructured government text data. The article includes many US Government examples of how NLP is currently deployed across domains (e.g. analyzing public feedback through sentiment analysis and topic modelling, improving forensic investigations, and aiding policy-making and regulatory compliance). The key point is to apply different NLP techniques to explore and uncover key government intelligence insights.
- Extracting Features of Entertainment Products: A Guided Latent Dirichlet Allocation Approach Informed by the Psychology of Media Consumption - This academic article provides a framework and managerial implications suggesting that LDA and NLP can be applied to feature extraction for entertainment products, in support of traditional content-based consumer behavior models and marketing models used in the media and entertainment industry.
- Lessons learned building natural language processing systems in health care
- How Algorithms Know What You’ll Type Next
- Natural Language Processing: An Introduction
- Betty: a friendly English-like interface for your command line.
- 100 Must-Read NLP Papers
- Crowdsourcing Ground Truth for Medical Relation Extraction
- Modern Deep Learning Techniques Applied to Natural Language Processing
- An introduction to text analysis with Python, Part 1
- Natural Language Processing blog
- Creating machine learning models to analyze startup news - Part 1. [Part 3](https://monkeylearn.com/blog/analyzing-startup-news-with-machine-learning/).
- Natural Language Processing (NLP) for Machine Learning
-
Biases in NLP
- Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
- AI bias: It is the responsibility of humans to ensure fairness
- Venturebeat Blogpost - Gender biases in datasets - Based on UCLA research paper "Learning Gender Neutral Word Embeddings" Aug 2018.
- Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems
-
Deep Learning
- MATRIX CAPSULES WITH EM ROUTING
- Multi-Task Deep Neural Networks for Natural Language Understanding
- Keras LSTM tutorial – How to easily build a powerful deep learning language model
- Deep Learning for Natural Language Processing: Tutorials with Jupyter Notebooks
- A Survey of the Usages of Deep Learning in Natural Language Processing
- Sequence Classification with Human Attention - Uses eye-tracking corpora to regularize attention in recurrent neural networks (RNN). [Implementation code](https://github.com/coastalcph/Sequence_classification_with_human_attention).
- Tutorial on Text Classification (NLP) using ULMFiT and fastai Library in Python
- Deep Learning for Sentiment Analysis : A Survey
- A STRUCTURED SELF-ATTENTIVE SENTENCE EMBEDDING
- Investigating Capsule Networks with Dynamic Routing for Text Classification
- Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction
- Identifying Aggression and Toxicity in Comments using Capsule Network
- Dynamic Routing Between Capsules
- Microsoft: Multi-Task Deep Neural Network (MT-DNN)
- Natural Language Processing Tutorial for Deep Learning Researchers
- NEURAL READING COMPREHENSION AND BEYOND - Reading comprehension models built on top of deep neural networks.
- TWITTER SENTIMENT ANALYSIS USING CAPSULE NETS AND GRU
-
Text Summarization
-
Cleaning
- Feature Extraction, Basic Pre-processing, and Advanced Processing
- How to solve 90% of NLP problems: a step-by-step guide
- Text Preprocessing in Python: Steps, Tools, and Examples
- How to Clean Text for Machine Learning with Python - A step-by-step guide on how to perform text data pre-processing.
-
Stop Words
-
Stemming
-
Dimensionality Reduction
- Taming Text with the SVD
- Dimensionality Reduction for Bag-of-Words Models: PCA vs LSA
- An introduction to Bag of Words and how to code it in Python for NLP
- Bag of Words and Tf-idf Explained
-
Sarcasm Detection
- CASCADE: Contextual Sarcasm Detection in Online Discussion Forums
- A Deeper Look into Sarcastic Tweets Using Deep Convolutional Neural Networks
- Detecting Sarcasm with Deep Convolutional Neural Networks
- Automatic Sarcasm Detection: A Survey
-
Knowledge Graphs
-
Entity and Information Extraction
- Entity Extraction and Network Analysis
- Natural Language Processing for Information Extraction
- NLP Techniques for Extracting Information - An in-depth exploration of a seven-step framework of NLP data mining tools and techniques.
-
Document Clustering and Document Similarity
- Text Clustering: Get quick insights from Unstructured Data
- Document Clustering
- Document Clustering: A Detailed Review
- Text mining and sentiment analysis on video game user reviews using SAS® Enterprise Miner
- Who wrote the anti-Trump New York Times op-ed? Using tidytext to find document similarity
- Document Clustering with Python
-
Sentiment Analysis
- CACM: Techniques and Applications for Sentiment Analysis
- Lexicon-Based Methods for Sentiment Analysis - Presents SO-CAL (Semantic Orientation CALculator), a measure of subjectivity and opinion for sentiment analysis.
- That Sentimental Feeling
- Unsupervised Sentiment Neuron
- Sentiment Analysis Tools Overview, Part 1. Positive and Negative Words Databases
- Twitter sentiment analysis using combined LSTM-CNN models
- A comparison of Lexicon-based approaches for Sentiment Analysis of microblog posts - A lexicon-based approach for sentiment analysis of Twitter posts, built on lexical resources such as SentiWordNet.
- On the negativity of negation
- Challenges in Sentiment Analysis
- A survey on sentiment analysis challenges - seven papers.
- Sentiment analysis on Trump's tweets using Python
- Donald Trump vs Hillary Clinton: sentiment analysis on Twitter mentions
- Does sentiment analysis work? A tidy analysis of Yelp reviews
- Twitter mood predicts the stock market
- Forbes: How Quant Traders Use Sentiment To Get An Edge On The Market
- Sentdex: Quantifying the Qualitative
- Harry Plotter: Celebrating the 20 year anniversary with tidytext and the tidyverse in R
- Cannes Lions 2017: Hungerithm, Mars Chocolate Australia (Clemenger BBDO, Melbourne)
- What Your Boss Could Learn by Reading the Whole Company’s Emails - Text analytics and NLP are increasingly used to search for clues about the level of employee engagement in the workplace and for potential ‘red flags’ that deserve an organization's particular attention, along with the ethical implications of doing so.
- Aspect Based Sentiment Analysis of Amazon Product Reviews
- Sentiment Analysis of 2.2 million tweets from Super Bowl 51
- Emotion and Sentiment Analysis: A Practitioner’s Guide to NLP
- Streaming Analytics Tutorial on Azure
- How to Analyze sentiment in Azure
- how-to-perform-sentiment-analysis-using-python-tutorial/
- Twitter Sentiment Analysis Overview - A step-by-step walkthrough on how to perform sentiment analysis using TextBlob.
- Twitter Sentiment Analysis in Python using TextBlob
- Sentiment analysis, Concept analysis and Applications
- Sentiment analysis: 10 applications and 4 services
- ELMO embeddings in Keras using Tensorflow Hub
- Current State of Text Sentiment Analysis from Opinion to Emotion Mining
- Breakthrough Research Papers and Models for Sentiment Analysis
- Trump2Cash: A stock trading bot powered by Trump tweets - Watches for Trump tweets that mention publicly traded companies. A [related blog article](https://medium.com/@maxbraun/this-machine-turns-trump-tweets-into-planned-parenthood-donations-4ece8301e722#.3232hx7gx) describes a bot that turns Trump's tweets into Planned Parenthood donations.
- Data Science 101: Sentiment Analysis in R Tutorial
- Unsupervised Sentiment Analysis with Signed Social Networks
- VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text - A rule-based model of sentiment analysis.
- From tweets to polls: Linking text sentiment to public opinion time series
- Lost at Sea: How Social Media is Helping Cruise Lines Attract Millennials
-
Fuzzy Matching, Probabilistic Matching, Record Linkage, Etc. <a id="fuzzy-matching"></a>
- agrep method in R
- fuzzywuzzy package in R
- Fuzzy String Matching – a survival skill to tackle unstructured information
- Fuzzy merge in R
- Learning Text Similarity with Siamese Recurrent Networks
- Dedupe - A Python library for fuzzy matching, record deduplication, and entity resolution.
- recordlinkage
- R package fastLink: Fast Probabilistic Record Linkage
-
Scraping
- Using Scrapy to Build your Own Dataset
- Scraping HTML using Scrapy
- Extract text from any document; no muss, no fuss.
-
-
Online courses
-
Knowledge Graphs
- Udacity: Natural Language Processing Nanodegree
- Coursera: Natural Language Processing
- Coursera: Applied Text Mining in Python
- CMU CS 11-747: Neural Network for NLP
- Machine Translation: Spring 2016
- Stanford CS 224N / Ling 284
- Udemy: Deep Learning and NLP A-Z™: How to create a ChatBot
- Udemy: Natural Language Processing with Deep Learning in Python
- Udemy: NLP - Natural Language Processing with Python
- Udemy: Deep Learning: Advanced NLP and RNNs
- Udemy: Natural Language Processing and Text Mining Without Coding
- Coursera: Sequence Models for Time Series and Natural Language Processing
- Coursera: Clinical Natural Language Processing
- DataCamp: Natural Language Processing Fundamentals in Python
- DataCamp: Sentiment Analysis in R: The Tidy Way
- DataCamp: Text Mining: Bag of Words
- DataCamp: Building Chatbots in Python
- DataCamp: Advanced NLP with spaCy
- Natural Language Processing | Dan Jurafsky, Christopher Manning
- UT CS 388: Natural Language Processing
- Columbia: COMS W4705: Natural Language Processing
- Columbia: COMS E6998: Machine Learning for Natural Language Processing (Spring 2012)
- Big Data University: Advanced Text Analytics – Getting Results with SystemT
- Courses for "natural language processing" on Coursera
- Deep Learning Drizzle
- Deep Learning for NLP
- YSDA NLP course
- CMU Language and Statistics II: (More) Empirical Methods in Natural Language Processing
- Lecture Collection | Natural Language Processing with Deep Learning (Winter 2017)
- Commonlounge: Learn Natural Language Processing: From Beginner to Expert
- edX: Natural Language Processing
-
-
Datasets
-
Knowledge Graphs
- Kaggle: UMICH SI650 - Sentiment Classification
- r/datasets
- Amazon product data
- Wikipedia: List of datasets for ML research
- The Big Bad NLP Database
- data.world's Text Datasets
- Insight Resources Datasets
- Consumer Complaint Database
- Sentiment Labelled Sentences Data Set
- Data is Plural
- FiveThirtyEight's datasets
- R's `datasets` package
- 200,000 Russian Troll Tweets - Released by Congress from Twitter suspended accounts and removed from public view.
- Lee's Similarity Data Sets
- Corpus of Presidential Speeches (CoPS) and a Clinton/Trump Corpus
- 15 Best Chatbot Datasets for Machine Learning
- A Survey of Available Corpora for Building Data-Driven Dialogue Systems
- First Quora Dataset Release: Question Pairs
- MIMIC
- Clinical NLP Dataset Repository - available clinical datasets for use in NLP research.
- Million Song Lyrics - Lyrics in Bag-of-Words (BOW) format.
- Twitter US Airline Sentiment
- DuoRC - Question-answer pairs with an evaluation script for Paraphrased Reading Comprehension.
- EDGAR Financial Statements
- American National Corpus Download
- Awesome Twitter
- Huggingface
- The Multi-Genre NLI Corpus
- nlp-datasets
- Hate-speech-and-offensive-language
- SWAG - A large-scale dataset created for Natural Language Inference (NLI) with common-sense reasoning.
- Leipzig Corpora Collection: Corpora in English, Arabic, French, Russian, German
- UCI's Text Datasets
- Awesome Public Datasets' Natural Language
- CBC News Coronavirus articles
-
Lexicons for Sentiment Analysis
-
-
Online Demos and Tools
-
Knowledge Graphs
- RegexPal
- Stanford Parser
- Stanford CoreNLP
- word2vec demo
- sense2vec: Semantic Analysis of the Reddit Hivemind
- Cognitive Computation Group - Part of Speech Tagging Demo - Part-of-speech tagging, information extraction tasks, etc.
- Another word2vec demo
-
-
Blogs
-
Books
- Speech and Language Processing
- How To Label Data
- Getting Started with Natural Language Processing
- Natural Language Processing with Transformers, Revised Edition
- Neural Network Methods in Natural Language Processing
- Mastering Text Mining with R
- Text Mining in Practice with R
- Blueprints for Text Analytics Using Python: Machine Learning-Based Solutions for Common Real World (NLP) Applications
- Practical Natural Language Processing
- Natural Language Processing with PyTorch
- Python Natural Language Processing
- Natural Language Processing: Python and NLTK
- Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning
- Applied Natural Language Processing With Python
- Taming Text: How to Find, Organize, and Manipulate It - A hands-on guide to innovative tools and techniques for finding, organizing, and manipulating unstructured text.
- Foundations of Statistical Natural Language Processing
- Language Processing with Perl and Prolog: Theories, Implementation, and Application (Cognitive Technologies)
- Handbook of Natural Language Processing
- Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications
- Fundamentals of Predictive Text Mining
- Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More
- Neural Network Methods for Natural Language Processing
- Text Mining: A Guidebook for the Social Sciences
- Practical Text Analytics: Interpreting Text and Unstructured Data for Business Intelligence
- Machine Learning for Text (2018)
- Natural Language Processing in Spanish
- Foundations of Computational Linguistics Human-Computer Communication in Natural Language
- Statistical Methods for Speech Recognition
- An introduction for information retrieval
- Mastering Natural Language Processing with Python
- Deep Learning with Text
-
Major NLP Conferences
-
Knowledge Graphs
- NeurIPS
- Association for Computational Linguistics (ACL)
- Empirical Methods in Natural Language Processing (EMNLP)
- North American Chapter of the Association for Computational Linguistics (NAACL)
- European Chapter of the Association for Computational Linguistics (EACL)
- International Conference on Computational Linguistics (COLING)
-
-
Benchmarks
-
Knowledge Graphs
- SQuAD 1.0 paper
- SQuAD 2.0 paper
- GLUE leaderboard
- GLUE paper - Covers single-sentence tasks (e.g. check if grammar is correct, sentiment analysis), similarity and paraphrase tasks (e.g. determine if two questions are equivalent), and inference tasks (e.g. determine whether a premise contradicts a hypothesis).
- SQuAD leaderboard - Top-performing NLP models on the Stanford Question Answering Dataset (SQuAD).
-
-
Misc
-
Lexicons for Sentiment Analysis
- AskReddit: People with a mother tongue that isn't English, what are the most annoying things about the English language when you are trying to learn it?
- Funny Video: Emotional Spell Check
- Detecting Gang-Involved Escalation on Social Media Using Context
- Reasoning about Actions and State Changes by Injecting Commonsense Knowledge - scale corp
- The Language of Hip Hop
- Using Natural Language Processing for Automatic Detection of Plagiarism
- Probabilistic Graphical Models: Lagrangian Relaxation Algorithms for Natural Language Processing
- Human Emotion
- A Complete Exploratory Data Analysis and Visualization for Text Data
- How to win Kaggle competition based on NLP task, if you are not an NLP expert
-