
awesome-text-ml

A curated list of awesome ML frameworks & libraries for text data
https://github.com/oskar-j/awesome-text-ml


  • Frameworks and libraries

    • 🐍 Python

      • HanLP - Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification via one unified interface. https://bbs.hankcs.com/
      • flair - A powerful NLP library for applying state-of-the-art models for tasks such as named entity recognition (NER), part-of-speech (PoS) tagging, word sense disambiguation and classification, with special support for biomedical data.
      • sentencepiece - Unsupervised text tokenizer for neural-network-based text generation.
      • stanza - Official Stanford NLP Python Library for Many Human Languages. https://stanfordnlp.github.io/stanza/
      • Transformers - State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch. https://huggingface.co/transformers
      • texthero - Text preprocessing, representation and visualization from zero to hero. https://texthero.org/
      • spark-nlp - Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. https://nlp.johnsnowlabs.com/
      • sklearn - Scikit-learn is a Python module for machine learning built on top of SciPy, including tools for text vectorization and dimensionality reduction. https://scikit-learn.org/stable/
      • fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python. https://fairseq.readthedocs.io/en/latest/
      • nlpaug - Data augmentation for NLP in your machine learning projects.
      • AugLy - A data augmentation library from Facebook Research for audio, image, text, and video.
      • Kashgari - Kashgari is a production-ready NLP transfer-learning framework for text labeling and text classification; it includes Word2Vec, BERT, and GPT-2 language embeddings.
      • Snips NLU - Snips Python library to extract meaning from text. https://snips-nlu.readthedocs.io
      • IKY - A Python chatbot framework with natural language understanding and artificial intelligence.
      • rasa - Framework to automate text- and voice-based conversations: NLU, dialogue management, chatbots. https://rasa.com/docs/rasa/
      • ParlAI - A framework for training and evaluating AI models on a variety of openly available dialogue datasets. https://parl.ai/
      • DeepPavlov - An open source library for deep learning end-to-end dialog systems and chatbots. https://deeppavlov.ai/
      • Rhino - On-device speech-to-intent engine powered by deep learning. https://picovoice.ai/
      • langchain - Building applications with LLMs (large language models) through composability. https://langchain.readthedocs.io
      • NeMo - A toolkit for conversational AI. https://nvidia.github.io/NeMo/
      • dedupe - A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
      • Scattertext - Beautiful visualizations of how language differs among document types.
      • BIG-bench - Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models.
      • gensim - Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Its target audience is the natural language processing (NLP) and information retrieval (IR) community. https://radimrehurek.com/gensim/
      • bert-as-service - Mapping a variable-length sentence to a fixed-length vector using a BERT model. https://bert-as-service.readthedocs.io
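Several entries above (sklearn, gensim, bert-as-service) turn text into vectors that can be compared. As a minimal, dependency-free sketch of one classic approach, TF-IDF weighting plus cosine similarity, the code below is an illustration only and not the API of any listed library. Assumptions: whitespace tokenization, raw counts for term frequency, and the smoothed IDF ln((1 + n) / (1 + df)) + 1, the same formula scikit-learn's TfidfVectorizer uses by default (smooth_idf=True); tokenization differs, so scores will not match that library exactly. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document.

    TF is the raw term count; IDF is smoothed as ln((1 + n) / (1 + df)) + 1.
    Each vector is then L2-normalised.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Normalise to unit length (an empty document stays an empty vector).
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors, i.e. their dot product."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3))  # documents share "the", "cat", "on"
print(round(cosine(vecs[0], vecs[2]), 3))  # no shared terms
```

Because every vector is normalised to unit length, cosine similarity reduces to a sparse dot product, and documents with no terms in common score 0. For real corpora, libraries such as sklearn or gensim add proper tokenization, sparse matrices and scale.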
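The dedupe entry mentions fuzzy matching for record deduplication. The sketch below is a rough stand-in for the idea, not dedupe's method (dedupe uses machine learning trained on labelled record pairs): it normalises each record and flags pairs whose difflib.SequenceMatcher similarity ratio reaches a threshold. The 0.85 threshold, the normalisation rule and the function names are illustrative assumptions.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lower-case and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(record.lower().split())


def find_duplicates(records, threshold=0.85):
    """Return index pairs (i, j), i < j, whose normalised strings are at least `threshold` similar.

    Every pair is compared, so this is O(n^2) and only suitable for small inputs.
    """
    norm = [normalize(r) for r in records]
    pairs = []
    for i in range(len(norm)):
        for j in range(i + 1, len(norm)):
            # ratio() returns 2*M/T, where M is the number of matched characters.
            if SequenceMatcher(None, norm[i], norm[j]).ratio() >= threshold:
                pairs.append((i, j))
    return pairs


records = [
    "John Smith, 123 Main St, Springfield",
    "john  smith, 123 Main Street, Springfield",
    "Jane Doe, 9 Elm Road, Shelbyville",
]
print(find_duplicates(records))  # → [(0, 1)]
```

Comparing every pair grows quadratically with the number of records, which is one reason dedicated tools such as dedupe first narrow down the candidate pairs (blocking) before scoring them.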
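The sentencepiece entry describes an unsupervised subword tokenizer; byte-pair encoding (BPE) is one of the algorithms SentencePiece supports. The sketch below learns BPE merge rules from a toy word list by repeatedly merging the most frequent adjacent symbol pair. It is a simplification under stated assumptions: it works on pre-split words, whereas SentencePiece operates on raw text and treats whitespace as an ordinary symbol, and ties between equally frequent pairs are broken by first occurrence. `learn_bpe` is a hypothetical name, not part of any listed library.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules (pairs of symbols) from a list of words."""
    # Each word starts as a tuple of single characters; identical words are counted together.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # first most-frequent pair wins ties
        merges.append(best)
        # Rewrite every word, replacing occurrences of the best pair with one merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


words = ["hug"] * 10 + ["pug"] * 5 + ["pun"] * 12 + ["bun"] * 4 + ["hugs"] * 5
print(learn_bpe(words, 3))  # → [('u', 'g'), ('u', 'n'), ('h', 'ug')]
```

The learned merges, applied in order, define how new words are split into subword units; a frequent word ends up as a single token while rare words fall back to smaller pieces.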
  • Knowledge 📚

    • Learning 101

      • Virgilio - Virgilio is an open-source initiative aiming to mentor and guide anyone in the world of Data Science.
    • Multiple languages

    • Python (and Python Notebooks)

      • nlp-recipes - Comprehensive set of tools and examples that leverage recent advances in NLP algorithms, neural architectures, and distributed machine learning systems.
  • No longer maintained

    • Python (and Python Notebooks)

      • NeuronBlocks - NLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego.
      • artificial-adversary - Tool to generate adversarial text examples and test machine learning models against them.
      • EventForecast - Time series prediction and text analysis using Keras LSTM, plus clustering and association rule mining.
      • lazynlp - Library to scrape and clean web pages to create massive datasets.
      • MeTA: ModErn Text Analysis - A Modern C++ Data Sciences Toolkit. https://meta-toolkit.org