text_mining_resources
Resources for learning about Text Mining and Natural Language Processing
https://github.com/stepthom/text_mining_resources
APIs and Libraries
Knowledge Graphs
- R packages
- tm
- lsa
- lda
- textir
- corpora
- tau
- sentimentr - Lexicon-based sentiment analysis.
- cleanNLP - based sentiment analysis.
- RSentiment - Lexicon-based sentiment analysis. Supports negation detection and sarcasm.
- text2vec - Memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), and similarities.
- LDAvis
- keras - High-level neural networks 'API'. ([RStudio Blog: TensorFlow for R](https://blog.rstudio.com/2018/02/06/tensorflow-for-r))
- retweet
- topicmodels
- textmineR
- gtrendsR
- Analyzing Google Trends Data in R
- textstem
- NLPutils
- Udpipe
- Python modules
- Tutorial
- Spark NLP - Production-grade, scalable, and trainable versions of the latest research in natural language processing.
- spaCy - Industrial-Strength Natural Language Processing in Python.
- textblob
- Natural Language Basics with TextBlob
- Gensim
- textmining
- Beautiful Soup
- embeddings
- fastText
- polyglot
- Apache Spark - General-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs.
- MLlib
- LDA
- TFIDF - Term frequency-inverse document frequency
- HDF5
- h5py
- Introduction to StanfordNLP: An Incredible State-of-the-Art NLP Library for 53 Languages (with Python code)
- Stanford Parser
- Stanford POS Tagger - Part-of-Speech tagger.
- Stanford Named Entity Recognizer
- Stanford Classifier
- Stanford Topic Modeling Toolbox
- Apache OpenNLP
- TextRazor API
- Comparison of Top 6 Python NLP Libraries
- pyCaret's NLP Module - Low-code machine learning library in Python that aims to reduce the cycle time from hypothesis to insights. PyCaret's founder, Moez Ali, is a Smith alumnus (MMA 2020).
- tidytext
- Sentiment140
- wordVectors
- Video: NLTK with Python 3 for Natural Language Processing
- scikit-learn
- textmining
- lda2vec
- sent2vec
- flair - State-of-the-art Natural Language Processing (NLP)
- word_forms - Generates related forms of an English word, e.g. "elect", "electoral", "electorate".
- AllenNLP - Open-source NLP research library, built on PyTorch.
- BigARTM
- Scattertext
- Google Seq2Seq - General-purpose encoder-decoder framework for TensorFlow that can be used for Machine Translation, Text Summarization, Conversational Modeling, Image Captioning, and more.
- Glove-Python
- Keras-BERT
- Paragraph embedding scripts and Pre-trained models - Pre-trained Doc2Vec and Word2Vec models
- Texthero
- Streamcrab - Real-time Twitter sentiment analyzer engine http://www.streamcrab.com
- fastText
- Pattern.en - Part-of-speech tagger for English, sentiment analysis, tools for English verb conjugation and noun singularization & pluralization, and a WordNet interface.
- Tutorial
- Bert As A Service - Maps a variable-length sentence to a fixed-length vector. Designed to provide a scalable, production-ready service while also letting researchers apply BERT quickly.
Benchmarks
Knowledge Graphs
- SQuAD 1.0 paper
- SQuAD 2.0 paper
- GLUE leaderboard
- GLUE paper - Covers single-sentence tasks (e.g. check if grammar is correct, sentiment analysis), similarity and paraphrase tasks (e.g. determine if two questions are equivalent), and inference tasks (e.g. determine whether a premise contradicts a hypothesis).
- SQuAD leaderboard - Top-performing NLP models on the Stanford Question Answering Dataset (SQuAD).
Blog Articles, Papers, Case Studies
Biases in NLP
- AI bias: It is the responsibility of humans to ensure fairness
- Venturebeat Blogpost - Gender biases in datasets - Based on UCLA research paper "Learning Gender Neutral Word Embeddings" Aug 2018.
- Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems
- Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Cleaning
- Feature Extraction, Basic Pre-processing, and Advanced Processing
- How to solve 90% of NLP problems: a step-by-step guide
- Text Preprocessing in Python: Steps, Tools, and Examples
- How to Clean Text for Machine Learning with Python - Step-by-step guide on how to perform text data pre-processing.
Concept Analysis/Topic Modeling
- Topic models: Past, present, and future
- Word vectors using LSA, Part - 2
- Probabilistic Topic Models
- LEGO color themes as topic models
- How our startup switched from Unsupervised LDA to Semi-Supervised GuidedLDA
- Topic Modeling with LSA, PLSA, LDA & lda2Vec
- text2vec's Description of Topic Models
- Topic Modelling Portal
- Applications of Topic Models
- MACS 30500: Text analysis: topic modeling
- COTA, Uber’s topic modelling approach to improving customer support
- Topic Modelling The Legal Subject Matter And Judicial Activity Of The High Court Of Australia, 1903–2015
- Using LDA Topic Models as a Classification Model Input
- NLP: Extracting the main topics from your dataset using LDA in minutes
Deep Learning
- Multi-Task Deep Neural Networks for Natural Language Understanding
- Keras LSTM tutorial – How to easily build a powerful deep learning language model
- Deep Learning for Natural Language Processing: Tutorials with Jupyter Notebooks
- A Survey of the Usages of Deep Learning in Natural Language Processing
- Sequence Classification with Human Attention - Uses eye-tracking corpora to regularize attention in recurrent neural networks (RNN). [Implementation code](https://github.com/coastalcph/Sequence_classification_with_human_attention).
- Tutorial on Text Classification (NLP) using ULMFiT and fastai Library in Python
- Deep Learning for Sentiment Analysis: A Survey
- A Structured Self-Attentive Sentence Embedding
- Investigating Capsule Networks with Dynamic Routing for Text Classification
- Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction
- Identifying Aggression and Toxicity in Comments using Capsule Network
- Dynamic Routing Between Capsules
- Matrix Capsules with EM Routing
- Microsoft: Multi-Task Deep Neural Network (MT-DNN)
- Natural Language Processing Tutorial for Deep Learning Researchers
Dimensionality Reduction