text_mining_resources
Resources for learning about Text Mining and Natural Language Processing
https://github.com/stepthom/text_mining_resources
APIs and Libraries
Knowledge Graphs
- R packages
- tm
- lsa
- lda
- textir
- corpora
- tau
- sentimentr - Sentiment analysis.
- cleanNLP - Sentiment analysis.
- RSentiment - Sentiment analysis. Contains support for negation detection and sarcasm.
- text2vec - Memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), and similarities.
- LDAvis
- keras - High-level neural networks 'API'. ([RStudio Blog: TensorFlow for R](https://blog.rstudio.com/2018/02/06/tensorflow-for-r))
- retweet
- topicmodels
- textmineR
- gtrendsR
- Analyzing Google Trends Data in R
- textstem
- NLPutils
- Udpipe
- Python modules
- Tutorial
- Spark NLP - Production-grade, scalable, and trainable versions of the latest research in natural language processing.
- spaCy - Industrial-Strength Natural Language Processing in Python.
- textblob
- Natural Language Basics with TextBlob
- Gensim
- textmining
- Beautiful Soup
- embeddings
- fastText
- polyglot
- Apache Spark - General-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs.
- MLlib
- LDA
- TFIDF - term frequency-inverse document frequency
- HDF5
- h5py
- Introduction to StanfordNLP: An Incredible State-of-the-Art NLP Library for 53 Languages (with Python code)
- Stanford Parser
- Stanford POS Tagger - Part-of-Speech tagger.
- Stanford Named Entity Recognizer
- Stanford Classifier
- Stanford Topic Modeling Toolbox
- Apache OpenNLP
- TextRazor API
- Comparison of Top 6 Python NLP Libraries
- pyCaret's NLP Module - Low-code machine learning library in Python that aims to reduce the cycle time from hypothesis to insights. PyCaret's founder, Moez Ali, is a Smith alumnus (MMA 2020).
- tidytext
- Sentiment140
- wordVectors
- Video: NLTK with Python 3 for Natural Language Processing
- scikit-learn
- lda2vec
- sent2vec
- flair - State-of-the-art Natural Language Processing (NLP)
- word_forms - Generates related forms of an English word, e.g. "elect", "electoral", "electorate" etc.
- AllenNLP - Open-source NLP research library, built on PyTorch.
- BigARTM
- Scattertext
- Google Seq2Seq - General-purpose encoder-decoder framework for TensorFlow that can be used for machine translation, text summarization, conversational modeling, image captioning, and more.
- Glove-Python
- Keras-BERT
- Paragraph embedding scripts and Pre-trained models - Pre-trained Doc2Vec and Word2Vec models
- Texthero
- Streamcrab - Real-time Twitter sentiment analyzer engine. http://www.streamcrab.com
- Pattern.en - Part-of-speech tagger for English, sentiment analysis, tools for English verb conjugation and noun singularization & pluralization, and a WordNet interface.
- Tutorial
- Bert As A Service - Maps a variable-length sentence to a fixed-length vector. Designed to provide a scalable, production-ready service that also lets researchers apply BERT quickly.
- fastTextR
- NLTK
- PyText - Deep-learning based NLP modeling framework built on PyTorch.
- textacy
- Apache Tika
- Stanford CoreNLP
- Stanford OpenIE
Benchmarks
Knowledge Graphs
- SQuAD 1.0 paper
- SQuAD 2.0 paper
- GLUE leaderboard
- GLUE paper - Covers single-sentence tasks (e.g. check if grammar is correct, sentiment analysis), similarity and paraphrase tasks (e.g. determine if two questions are equivalent), and inference tasks (e.g. determine whether a premise contradicts a hypothesis).
- SQuAD leaderboard - Top-performing NLP models on the Stanford Question Answering Dataset (SQuAD).
Blog Articles, Papers, Case Studies
Biases in NLP
- AI bias: It is the responsibility of humans to ensure fairness
- VentureBeat blog post - Gender biases in datasets. Based on the UCLA research paper "Learning Gender Neutral Word Embeddings" (Aug 2018).
- Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems
- Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Cleaning
- Feature Extraction, Basic Pre-processing, and Advanced Processing
- How to solve 90% of NLP problems: a step-by-step guide
- Text Preprocessing in Python: Steps, Tools, and Examples
- How to Clean Text for Machine Learning with Python - Step-by-step guide to text data pre-processing.
Concept Analysis/Topic Modeling <a id="concept-analysis"></a>
- Topic models: Past, present, and future
- Word vectors using LSA, Part - 2
- Probabilistic Topic Models
- LEGO color themes as topic models
- How our startup switched from Unsupervised LDA to Semi-Supervised GuidedLDA
- Topic Modeling with LSA, PLSA, LDA & lda2Vec
- text2vec's Description of Topic Models
- Topic Modelling Portal
- Applications of Topic Models
- MACS 30500: Text analysis: topic modeling
- COTA, Uber’s topic modelling approach to improving customer support
- Topic Modelling The Legal Subject Matter And Judicial Activity Of The High Court Of Australia, 1903–2015
- Using LDA Topic Models as a Classification Model Input
- NLP: Extracting the main topics from your dataset using LDA in minutes