Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-text-ml
A curated list of awesome ML frameworks and libraries for text data
https://github.com/oskar-j/awesome-text-ml
Frameworks and libraries
🐍 Python
- HanLP - Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification via one unified interface. https://bbs.hankcs.com/
- flair - A powerful library for applying state-of-the-art natural language processing (NLP) models to text, covering named entity recognition (NER), part-of-speech (PoS) tagging, sense disambiguation, and classification, with special support for biomedical data.
- sentencepiece - Unsupervised text tokenizer for Neural Network-based text generation.
- stanza - Official Stanford NLP Python Library for Many Human Languages. https://stanfordnlp.github.io/stanza/
- Transformers - State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch. https://huggingface.co/transformers
- texthero - Text preprocessing, representation and visualization from zero to hero. https://texthero.org/
- spark-nlp - Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. https://nlp.johnsnowlabs.com/
- sklearn - Scikit-learn is a Python module for machine learning built on top of SciPy, including tools for text vectorization and vector space compression. https://scikit-learn.org/stable/
- fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python. https://fairseq.readthedocs.io/en/latest/
- nlpaug - Data augmentation for NLP in your machine learning projects.
- AugLy - A data augmentation library from Facebook Research for audio, image, text, and video.
- Kashgari - A production-ready NLP transfer learning framework for text labeling and text classification; includes Word2Vec, BERT, and GPT-2 language embeddings.
- Snips NLU - Snips Python library to extract meaning from text. https://snips-nlu.readthedocs.io
- IKY - A Python chatbot framework with Natural Language Understanding and Artificial Intelligence.
- rasa - Framework to automate text- and voice-based conversations: NLU, dialogue management, chatbots. https://rasa.com/docs/rasa/
- ParlAI - A framework for training and evaluating AI models on a variety of openly available dialogue datasets. https://parl.ai/
- DeepPavlov - An open source library for deep learning end-to-end dialog systems and chatbots. https://deeppavlov.ai/
- Rhino - On-device speech-to-intent engine powered by deep learning. https://picovoice.ai/
- NeMo - A toolkit for conversational AI. https://nvidia.github.io/NeMo/
- dedupe - A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.
- Scattertext - Beautiful visualizations of how language differs among document types.
- BIG-bench - Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models.
- gensim - Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. https://radimrehurek.com/gensim/
- bert-as-service - Mapping a variable-length sentence to a fixed-length vector using BERT model. https://bert-as-service.readthedocs.io
- langchain - Building applications with LLMs (large language models) through composability. https://langchain.readthedocs.io
Knowledge 📚
Multiple languages
- Awesome Sentiment Analysis - A repository with everything needed for sentiment analysis and related areas.
Learning 101
- Virgilio - An open-source initiative that aims to mentor and guide anyone in the world of data science.
Python (and Python Notebooks)
- nlp-recipes - Comprehensive set of tools and examples that leverage recent advances in NLP algorithms, neural architectures, and distributed machine learning systems.
No longer maintained
Python (and Python Notebooks)
- NeuronBlocks - NLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego.
- artificial-adversary - Tool to generate adversarial text examples and test machine learning models against them.
- EventForecast - Time series prediction and text analysis using Keras LSTM, plus clustering and association rule mining.
- lazynlp - Library to scrape and clean web pages to create massive datasets.
- MeTA: ModErn Text Analysis - A Modern C++ Data Sciences Toolkit. https://meta-toolkit.org
Keywords
- nlp (18), machine-learning (15), natural-language-processing (13), python (11)
- text-classification (8), named-entity-recognition (7)
- data-science (6), pytorch (6), deep-learning (6), artificial-intelligence (6)
- word-embeddings (5), text-mining (5)
- nlu (4), text-analysis (4), bert (4), chatbot (4), tensorflow (4)
- sequence-labeling (3), question-answering (3), bot (3), slot-filling (3), ai (3)
- chatbots (2), clustering (2), text-visualization (2), data-mining (2), entity-extraction (2), statistics (2), natural-language (2), machine-translation (2), onnx (2), text (2), sentiment-analysis (2), intent-classification (2), word2vec (2), natural-language-understanding (2), entity-resolution (2), nlp-machine-learning (2), topic-modeling (2), machine-learning-library (2), language-model (2), ner (2), pretrained-models (2), ml (2), seq2seq (2), speech-recognition (2)
- conversation (1), intent-parser (1), nltk (1), sklearn (1)