text_mining_resources
Resources for learning about Text Mining and Natural Language Processing
https://github.com/stepthom/text_mining_resources
-
Blog Articles, Papers, Case Studies
-
Transformers and Language Models
- A review of BERT based models
- BERT Explained - State of the art language model for NLP
- ChatGPT launch blog
- A Primer in BERTology: What we know about how BERT works
- Awesome ChatGPT Prompts
- Hugging Face's course on Transformer Models
-
Word and Document Embeddings
- The Current Best of Universal Word Embeddings and Sentence Embeddings
- An Intuitive Understanding of Word Embeddings: From Count Vectors to Word2Vec
- An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation
- Document Embedding with Paragraph Vectors
- GloVe Word Embeddings Demo
- Text Classification With Word2Vec
- Document Embedding
- From Word Embeddings To Document Distances
- Word Embeddings, Bias in ML, Why You Don't Like Math, & Why AI Needs You
- Word Vectors in Natural Language Processing: Global Vectors (GloVe)
- Doc2Vec Tutorial on the Lee Dataset
- Word Embeddings in Python with SpaCy and Gensim
- Deep Contextualized Word Representations
- Universal Language Model Fine-tuning for Text Classification
- Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
- Learned in Translation: Contextualized Word Vectors
- sense2vec
- Skip Thought Vectors
- The Amazing Power of Word Vectors
- Contextual String Embeddings for Sequence Labeling
- A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks - A multi-task learning approach for a set of interrelated NLP tasks. Presented at the AAAI conference in January 2019. [Implementation code](https://github.com/huggingface/hmtl).
- An Idiot’s Guide to Word2vec Natural Language Processing
- Word2vec: fish + music = bass
- Universal Sentence Encoder Visually Explained
- NLP's ImageNet moment has arrived - Pre-trained NLP language models, drawing parallels to ImageNet's contributions to computer vision.
- Get Busy with Word Embeddings - An Introduction (February 2018)
- Sequence to Sequence Learning with Neural Networks
-
-
Blogs
-
Books
- Mastering Text Mining with R
- Text Mining in Practice with R
- Natural Language Processing with Transformers, Revised Edition
- Getting Started with Natural Language Processing
- Blueprints for Text Analytics Using Python: Machine Learning-Based Solutions for Common Real World (NLP) Applications
- Practical Natural Language Processing
- Natural Language Processing with PyTorch
- Python Natural Language Processing
- Natural Language Processing: Python and NLTK
- Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning
- Applied Natural Language Processing With Python
- Taming Text: How to Find, Organize, and Manipulate It - A hands-on guide to learning innovative tools and techniques for finding, organizing, and manipulating unstructured text.
- Speech and Language Processing
- Foundations of Statistical Natural Language Processing
- Language Processing with Perl and Prolog: Theories, Implementation, and Application (Cognitive Technologies)
- Handbook of Natural Language Processing
- Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications
- Fundamentals of Predictive Text Mining
- Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More
- Neural Network Methods for Natural Language Processing
- Text Mining: A Guidebook for the Social Sciences
- Practical Text Analytics: Interpreting Text and Unstructured Data for Business Intelligence
- Neural Network Methods in Natural Language Processing
- Machine Learning for Text (2018)
- Natural Language Processing in Spanish
- Foundations of Computational Linguistics: Human-Computer Communication in Natural Language
- Statistical Methods for Speech Recognition
- How To Label Data
- An introduction for information retrieval
- Mastering Natural Language Processing with Python
-
Datasets
-
Knowledge Graphs
- data.world's Text Datasets
- Insight Resources Datasets
- Consumer Complaint Database
- Sentiment Labelled Sentences Data Set
- Amazon product data
- Data is Plural
- FiveThirtyEight's datasets
- r/datasets
- R's `datasets` package
- 200,000 Russian Troll Tweets - Released by Congress from Twitter suspended accounts and removed from public view.
- Wikipedia: List of datasets for ML research
- Kaggle: UMICH SI650 - Sentiment Classification
- Lee's Similarity Data Sets
- Corpus of Presidential Speeches (CoPS) and a Clinton/Trump Corpus
- 15 Best Chatbot Datasets for Machine Learning
- A Survey of Available Corpora for Building Data-Driven Dialogue Systems
- First Quora Dataset Release: Question Pairs
- MIMIC
- Clinical NLP Dataset Repository - Publicly available clinical datasets for use in NLP research.
- Million Song Lyrics - Song lyrics in Bag-of-Words (BOW) format.
- Twitter US Airline Sentiment
- DuoRC - Question-answer pairs with evaluation script for Paraphrased Reading Comprehension
- EDGAR Financial Statements
- American National Corpus Download
- Awesome Twitter
- The Big Bad NLP Database
- Huggingface
- The Multi-Genre NLI Corpus
- nlp-datasets
- Hate-speech-and-offensive-language
- SWAG - A large-scale dataset created for Natural Language Inference (NLI) with common-sense reasoning.
- Leipzig Corpora Collection: Corpora in English, Arabic, French, Russian, German
- UCI's Text Datasets
- Google Dataset Search
- The Best 25 Datasets for Natural Language Processing
- Santa Barbara Corpus of Spoken American English
- CBC News Coronavirus articles
-
Lexicons for Sentiment Analysis
-
-
Major NLP Conferences
-
Knowledge Graphs
- NeurIPS
- Association for Computational Linguistics (ACL)
- Empirical Methods in Natural Language Processing (EMNLP)
- North American Chapter of the Association for Computational Linguistics (NAACL)
- European Chapter of the Association for Computational Linguistics (EACL)
- International Conference on Computational Linguistics (COLING)
-
-
Misc
-
Lexicons for Sentiment Analysis
- AskReddit: People with a mother tongue that isn't English, what are the most annoying things about the English language when you are trying to learn it?
- Funny Video: Emotional Spell Check
- Detecting Gang-Involved Escalation on Social Media Using Context
- Reasoning about Actions and State Changes by Injecting Commonsense Knowledge - scale corp
- The Language of Hip Hop
- Using Natural Language Processing for Automatic Detection of Plagiarism
- Probabilistic Graphical Models: Lagrangian Relaxation Algorithms for Natural Language Processing
- Human Emotion
- A Complete Exploratory Data Analysis and Visualization for Text Data
- How to win Kaggle competition based on NLP task, if you are not an NLP expert
-
Programming Languages
Categories
Sub Categories
- Knowledge Graphs: 253
- Sentiment Analysis: 110
- Q&A Systems, Chatbots: 107
- Concept Analysis/Topic Modeling: 61
- Transformers and Language Models: 57
- Lexicons for Sentiment Analysis: 53
- General: 43
- Word and Document Embeddings: 33
- Document Classification: 31
- Cleaning: 29
- Scraping: 24
- Deep Learning: 22
- Document Clustering and Document Similarity: 10
- Fuzzy Matching, Probabilistic Matching, Record Linkage, Etc.: 9
- Machine Translation: 8
- Stemming: 6
- Biases in NLP: 6
- Dimensionality Reduction: 6
- Sarcasm Detection: 5
- Entity and Information Extraction: 5
- Text Summarization: 4
- Stop Words: 2
Keywords
- natural-language-processing: 10
- nlp: 9
- machine-learning: 8
- python: 6
- text-mining: 4
- pdf: 3
- word-embeddings: 3
- pytorch: 3
- deep-learning: 3
- tensorflow: 3
- text-visualization: 2
- topic-modeling: 2
- computational-social-science: 2
- transformer: 2
- record-linkage: 2
- python-library: 2
- twitter: 2
- dataset: 2
- entity-resolution: 2
- dedupe: 2
- r: 1
- tidy-data: 1
- tidyverse: 1
- dedupe-library: 1
- data-extraction: 1
- extract: 1
- java: 1
- layout: 1
- pdfbox: 1
- text: 1
- attention: 1
- bert: 1
- paper: 1
- tutorial: 1
- bot: 1
- trading: 1
- trump: 1
- leaderboard: 1
- visual-analysis: 1
- bots: 1
- chatbot: 1
- chatgpt: 1
- chatgpt-api: 1
- language: 1
- artificial-intelligence-algorithms: 1
- artificial-neural-networks: 1
- bayesian-statistics: 1
- computer-vision: 1
- deep-neural-networks: 1
- deep-reinforcement-learning: 1