text_mining_resources
Resources for learning about Text Mining and Natural Language Processing
https://github.com/stepthom/text_mining_resources
Datasets
- Wikipedia: List of datasets for ML research
- Kaggle: UMICH SI650 - Sentiment Classification
- Lee's Similarity Data Sets
- Corpus of Presidential Speeches (CoPS) and a Clinton/Trump Corpus
- 15 Best Chatbot Datasets for Machine Learning
- A Survey of Available Corpora for Building Data-Driven Dialogue Systems
- First Quora Dataset Release: Question Pairs
- MIMIC
- Clinical NLP Dataset Repository - available clinical datasets for use in NLP research.
- Million Song Lyrics - bag-of-words (BOW) format.
- Twitter US Airline Sentiment
- DuoRC - question-answer pairs with evaluation script for Paraphrased Reading Comprehension
- EDGAR Financial Statements
- American National Corpus Download
- Awesome Twitter
- The Big Bad NLP Database
- Huggingface
- The Multi-Genre NLI Corpus
- Insight Resources Datasets
- Amazon product data
- nlp-datasets
- Hate-speech-and-offensive-language
- SWAG - large-scale dataset created for Natural Language Inference (NLI) with common-sense reasoning.
- Leipzig Corpora Collection: Corpora in English, Arabic, French, Russian, German
- UCI's Text Datasets
- Sentiment Labelled Sentences Data Set
- Google Dataset Search
- CBC News Coronavirus articles
Lexicons for Sentiment Analysis
Major NLP Conferences
- NeurIPS
- Association for Computational Linguistics (ACL)
- Empirical Methods in Natural Language Processing (EMNLP)
- North American Chapter of the Association for Computational Linguistics (NAACL)
- European Chapter of the Association for Computational Linguistics (EACL)
- International Conference on Computational Linguistics (COLING)
Misc
- AskReddit: People with a mother tongue that isn't English, what are the most annoying things about the English language when you are trying to learn it?
- Funny Video: Emotional Spell Check
- Detecting Gang-Involved Escalation on Social Media Using Context
- Reasoning about Actions and State Changes by Injecting Commonsense Knowledge
- The Language of Hip Hop
- Using Natural Language Processing for Automatic Detection of Plagiarism
- Probabilistic Graphical Models: Lagrangian Relaxation Algorithms for Natural Language Processing
- Human Emotion
- A Complete Exploratory Data Analysis and Visualization for Text Data
- How to win Kaggle competition based on NLP task, if you are not an NLP expert
Online courses
- Udemy: Deep Learning and NLP A-Z™: How to create a ChatBot
- Udemy: Natural Language Processing with Deep Learning in Python
- Udemy: NLP - Natural Language Processing with Python
- Udemy: Deep Learning: Advanced NLP and RNNs
- Udemy: Natural Language Processing and Text Mining Without Coding
- Stanford CS 224N / Ling 284
- Coursera: Applied Text Mining in Python
- Coursera: Natural Language Processing
- Coursera: Sequence Models for Time Series and Natural Language Processing
- Coursera: Clinical Natural Language Processing
- DataCamp: Natural Language Processing Fundamentals in Python
- DataCamp: Sentiment Analysis in R: The Tidy Way
- DataCamp: Text Mining: Bag of Words
- DataCamp: Building Chatbots in Python
- DataCamp: Advanced NLP with spaCy
- Natural Language Processing | Dan Jurafsky, Christopher Manning
- CMU CS 11-747: Neural Networks for NLP
- UT CS 388: Natural Language Processing
- Columbia: COMS W4705: Natural Language Processing
- Columbia: COMS E6998: Machine Learning for Natural Language Processing (Spring 2012)
- Machine Translation: Spring 2016
- Big Data University: Advanced Text Analytics – Getting Results with SystemT
- Udacity: Natural Language Processing Nanodegree
- Courses for "natural language processing" on Coursera
- Deep Learning Drizzle
- Deep Learning for NLP
- YSDA NLP course
- CMU Language and Statistics II: (More) Empirical Methods in Natural Language Processing
- Lecture Collection | Natural Language Processing with Deep Learning (Winter 2017)
- Commonlounge: Learn Natural Language Processing: From Beginner to Expert
- edX: Natural Language Processing
Online Demos and Tools
- Stanford Parser
- Stanford CoreNLP
- word2vec demo
- sense2vec: Semantic Analysis of the Reddit Hivemind
- RegexPal
- Cognitive Computation Group - Part of Speech Tagging Demo - part-of-speech tagging, information extraction tasks, etc.
- Another word2vec demo
- AllenNLP Demo
Other Curated Lists
- awesome-machine-learning
- Chinese NLP Tools
- Over 150 of the Best Machine Learning, NLP, and Python Tutorials I’ve Found
- Awesome Deep Learning for Natural Language Processing (NLP)
- Association for Computational Linguistics Papers Anthology
- awesome-nlp
Products
- Systran - Enterprise Translation Products
- SAS Sentiment Analysis
- STATISTICA
- Text Mining (Big Data, Unstructured Data)
- Gate
- Video: How IBM Watson learns (3 minutes)
- Video: IBM Watson on Jeopardy! (10 minutes)
- Video: IBM Watson: The Science Behind an Answer (7 minutes)
- Stocktwits
- Meltwater
- Lexalytics Semantria
- Alchemy API
- brat
- Ask Data by Tableau Software Inc. - A feature that helps existing Tableau platform users retrieve quick and easy data visualizations to drive business intelligence insights. With an interface similar to a search engine, Ask Data applies NLP to user text input to extract keywords and find data analytics and business insights quickly on the Tableau platform.
- Microsoft Azure Text Analytics
- Amazon Lex
- Amazon Comprehend
- Apache PDFBox
- SO: How to extract text from a PDF?
- Tools for Extracting Data and Text from PDFs - A Review
- How I used NLP (SpaCy) to screen Data Science Resumes
- Lyrebird.ai - “Realistic Voice Cloning and Text-to-Speech” platform. This Canadian start-up has created a product that combines voice cloning with text-to-speech. Lyrebird recognizes the intonations and voice patterns in audio recordings and overlays text input to produce a text-to-speech audio file in the voice of the selected recording.
- Dialogflow
- LightTag Annotation Tool
- Anafora - web-based raw text annotation tool
- PDFLayoutTextStripper: Converts a pdf file into a text file while keeping the layout of the original pdf.
- pdftabextract: A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
- PyPDF2
- RapidMiner
Sub Categories
- Knowledge Graphs: 270
- Sentiment Analysis: 120
- Q&A Systems, Chatbots: 109
- Concept Analysis/Topic Modeling: 66
- Transformers and Language Models: 62
- General: 60
- Lexicons for Sentiment Analysis: 52
- Word and Document Embeddings: 38
- Document Classification: 35
- Cleaning: 29
- Scraping: 25
- Deep Learning: 24
- Fuzzy Matching, Probabilistic Matching, Record Linkage, Etc.: 11
- Document Clustering and Document Similarity: 10
- Machine Translation: 9
- Biases in NLP: 7
- Dimensionality Reduction: 7
- Sarcasm Detection: 6
- Stemming: 6
- Entity and Information Extraction: 6
- Text Summarization: 5
- Stop Words: 3