Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published the article "Computing Machinery and Intelligence", which proposed a measure of machine intelligence now called the Turing test. More recent techniques, such as deep learning, have produced results in language modeling, parsing, and other natural-language tasks.

https://github.com/davidmigloz/langchain_dart

Build LLM-powered Dart/Flutter applications.

ai dart flutter generative-ai llms nlp

Last synced: 01 Aug 2024

https://github.com/wuba/qa_match

A simple, effective toolkit for short-text matching

58 ai deep-learning dssm lstm machine-learning nlp qabot qatools tensorflow

Last synced: 03 Aug 2024

https://github.com/CogStack/OpenGPT

A framework for creating grounded instruction based datasets and training conversational domain expert Large Language Models (LLMs).

chatgpt gpt-4 health healthcare huggingface llm medicine nlp opengpt

Last synced: 03 Aug 2024

https://github.com/discopy/discopy

The Python toolkit for computing with string diagrams.

category-theory diagrams nlp quantum-computing

Last synced: 09 Aug 2024

https://github.com/machine-learning-apps/Issue-Label-Bot

Code For The Issue Label Bot, an App that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also code for the blog article: "How to automate tasks on GitHub with machine learning for fun and profit"

bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow

Last synced: 30 Jul 2024

https://github.com/drahnr/cargo-spellcheck

Checks all your documentation for spelling and grammar mistakes, using hunspell for spelling and an nlprule-based checker for grammar

cargo cargo-plugin cargo-spellcheck grammar grammar-mistakes grammarchecker hacktoberfest hunspell languagetool nlp spellchecker spelling

Last synced: 20 Aug 2024

https://github.com/cli99/llm-analysis

Latency and Memory Analysis of Transformer Models for Training and Inference

analysis deep-learning language-model language-models machine-learning nlp transformers

Last synced: 06 Aug 2024

https://github.com/xiangking/ark-nlp

A private nlp coding package, which quickly implements the SOTA solutions.

bert nlp transfomer

Last synced: 01 Aug 2024

https://github.com/JetRunner/BERT-of-Theseus

⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

bert glue model-compression nlp transformers

Last synced: 01 Aug 2024

https://github.com/SimGus/Chatette

A powerful dataset generator for Rasa NLU, inspired by Chatito

botkit chatbot chatbots chatito cli dataset-generation nlg nlp nlu parsing python rasa rasa-nlu sentence

Last synced: 31 Jul 2024

https://github.com/UKPLab/gpl

Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvements: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577

bert domain-adaptation information-retrieval nlp transformers vector-search

Last synced: 05 Aug 2024

https://github.com/natasha/yargy

Rule-based fact extraction for the Russian language

earley-parser information-extraction morphology nlp python russian tomita tomita-parser

Last synced: 07 Aug 2024

https://github.com/xkzhangsan/xk-time

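xk-time is a toolkit for time conversion, time calculation, time formatting, time parsing, calendars, cron time expressions, and time NLP. It uses Java 8 (JSR-310), is thread-safe and easy to use, offers more than 70 common date formatting templates, supports the Java 8 time classes and Date, and is lightweight with no third-party dependencies.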

calendar cron cron-java8 date datetimeformatter-formatter dateutil formatter java jsr-310 localdate localdatetime nlp time timeconvertion

Last synced: 04 Aug 2024

https://github.com/graykode/ai-docstring

Visual Studio Code extension to quickly generate docstrings for Python functions using AI (NLP) technology.

bert code-summarization docstrings nlp vs-code-extenstion

Last synced: 01 Aug 2024

https://github.com/qiangsiwei/bert_distill

BERT distillation (distillation experiments based on BERT)

bert classification distillation nlp

Last synced: 01 Aug 2024

https://github.com/GaoQ1/rasa_nlu_gq

Turn natural language into structured data (supports Chinese; includes N custom models for different scenarios and tasks)

bert bilstm-idcnn jieba natural-language nlp nlu rasa rasa-nlu rasa-nlu-gao tensorflow

Last synced: 01 Aug 2024

https://github.com/alibaba-edu/simple-effective-text-matching-pytorch

A pytorch implementation of the ACL2019 paper "Simple and Effective Text Matching with Richer Alignment Features".

deep-learning nlp pytorch quora-question-pairs snli

Last synced: 01 Aug 2024

https://github.com/dair-ai/nlp_newsletter

📰Natural language processing (NLP) newsletter

deep-learning machine-learning nlp

Last synced: 03 Sep 2024

https://github.com/phospho-app/phospho

Text analytics for LLM apps. PostHog for prompts. Extract evaluations, intents and events from text messages. phospho leverages LLMs (OpenAI, MistralAI, Ollama, etc.)

ai analytics generative-ai llm nextjs nlp ollama python self-hosted typescript

Last synced: 01 Aug 2024

https://github.com/HIT-SCIR/huozi

Huozi general-purpose large language model

fine-tuning large-language-models llm nlp

Last synced: 01 Aug 2024

https://github.com/kevinlu1248/pyate

PYthon Automated Term Extraction

ai nlp symbolic-ai term-extraction

Last synced: 03 Aug 2024

https://github.com/gagolews/stringi

Fast and portable character string processing in R (with the Unicode ICU)

icu icu4c natural-language-processing nlp r regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode

Last synced: 31 Jul 2024

https://github.com/daac-tools/vibrato

🎤 vibrato: Viterbi-based accelerated tokenizer

japanese morphological-analysis nlp rust segmentation tokenization tokenizer

Last synced: 01 Aug 2024

https://github.com/sekwiatkowski/Komputation

Komputation is a neural network framework for the Java Virtual Machine written in Kotlin and CUDA C.

artificial-intelligence convolutional-neural-networks cuda framework gpu jvm kotlin machine-learning neural-networks nlp nvidia recurrent-neural-networks seq2seq

Last synced: 01 Aug 2024

https://github.com/jameshwade/gpttools

gpttools extends gptstudio for package development to help you document code, write tests, or even explain code

chatgpt nlp openai package-development rstats rstudio-addin

Last synced: 02 Aug 2024

https://github.com/feedly/transfer-nlp

NLP library designed for reproducible experimentation management

framework language-model natural-language-understanding nlp playground pytorch transfer-learning

Last synced: 01 Aug 2024

https://github.com/JamesHWade/gpttools

gpttools extends gptstudio for package development to help you document code, write tests, or even explain code

chatgpt nlp openai package-development rstats rstudio-addin

Last synced: 13 Aug 2024

https://github.com/asahi417/lm-question-generation

Multilingual/multidomain question generation datasets, models, and python library for question generation.

bart nlp pytorch question-answering question-generation t5

Last synced: 01 Aug 2024

https://github.com/jsksxs360/AHANLP

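AHANLP (Aha natural language processing package) provides services including word segmentation, dependency parsing, semantic role labeling, automatic summarization, semantic similarity computation, LDA topic prediction, and word clouds.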

chinese nlp

Last synced: 31 Jul 2024

https://github.com/enoch3712/ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

ai llm nlp ocr openai python

Last synced: 01 Aug 2024

https://github.com/thunlp/NSC

Neural Sentiment Classification

nlp

Last synced: 07 Aug 2024

https://github.com/igorbrigadir/stopwords

Default English stopword lists from many different sources

en-stopwords english-stopwords natural-language-processing nlp stopwords

Last synced: 07 Aug 2024

https://github.com/zhongkaifu/RNNSharp

RNNSharp is a toolkit for deep recurrent neural networks, which are widely used for many kinds of tasks, such as sequence labeling and sequence-to-sequence modeling. It is written in C# and based on .NET Framework 4.6 or later. RNNSharp supports many types of networks, such as forward and bi-directional networks and sequence-to-sequence networks, and many types of layers, such as LSTM, Softmax, sampled Softmax and others.

c-sharp crf deep-learning dotnet lstm machine-learning nlp recurrent-neural-networks rnn rnn-model sequence-labeling

Last synced: 17 Aug 2024

https://github.com/sunzeyeah/RLHF

Implementation of Chinese ChatGPT

chatgpt deep-learning deepspeed glm nlp pangu pytorch

Last synced: 31 Jul 2024

https://github.com/merantix-momentum/squirrel-core

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰

ai cloud-computing collaboration computer-vision cv data-ingestion data-mesh data-science dataops datasets deep-learning distributed jax machine-learning ml natural-language-processing nlp python pytorch tensorflow

Last synced: 01 Aug 2024

https://github.com/extreme-bert/extreme-bert

ExtremeBERT is a toolkit that accelerates the pretraining of customized language models on customized datasets, described in the paper “ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT”.

bert deep-learning language-model language-models machine-learning natural-language-processing nlp python pytorch transformer

Last synced: 03 Aug 2024

https://github.com/natasha/corus

Links to Russian corpora + Python functions for loading and parsing

corpora datasets nlp python russian

Last synced: 02 Aug 2024

https://github.com/bonzanini/nlp-tutorial

Tutorial: Natural Language Processing in Python

natural-language-processing nlp python

Last synced: 07 Aug 2024

https://github.com/RTIInternational/gobbli

Deep learning with text doesn't have to be scary.

deep-learning docker nlp python

Last synced: 01 Aug 2024

https://github.com/google-research/retvec

RETVec is an efficient, multilingual, and adversarially-robust text vectorizer.

deep-learning natural-language-processing nlp python tensorflow text-classification

Last synced: 01 Aug 2024

https://github.com/shineware/KOMORAN

Korean Morphological Analyzer by shineware

komoran korean-nlp korean-text-processing morphological-analysis nlp shineware

Last synced: 02 Aug 2024

https://github.com/LudwigStumpp/llm-leaderboard

A joint community effort to create one central leaderboard for LLMs.

leaderboard llm machine-learning nlp

Last synced: 01 Aug 2024

https://github.com/amirshnll/Persian-Swear-Words

Persian Swear Dataset: a dataset of inappropriate and offensive Persian words that you can use in production to filter unwanted content from text.

dataset datasets farsi farsiswear farsiswearword nlp nlp-dataset persian persiandataset persianswearword swear sweardataset swearword

Last synced: 04 Aug 2024

https://github.com/jenojp/negspacy

spaCy pipeline object for negating concepts in text

negation negation-phrases negex nlp python spacy spacy-extension spacy-pipeline

Last synced: 07 Aug 2024

https://github.com/stanford-oval/genie-server

The home server version of Almond

hacktoberfest nlp raspberrypi voice

Last synced: 01 Aug 2024

https://github.com/polm/cutlet

Japanese to romaji converter in Python

japanese nlp romaji

Last synced: 02 Aug 2024

https://github.com/sakuranew/BERT-AttributeExtraction

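Using BERT for attribute extraction in knowledge graphs, with fine-tuning and feature extraction. Applies BERT-based fine-tuning and feature-extraction methods to extract attributes from Baidu Baike person entries for a knowledge graph.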

ai attribute-extraction bert deeplearning feature-extraction fine-tuning knowledge-graph nlp relation-extraction

Last synced: 01 Aug 2024

https://github.com/lucasxlu/LagouJob

Data Analysis & Mining for lagou.com

data-analysis data-mining lagou machine-learning nlp python3 web-crawler

Last synced: 06 Aug 2024

https://github.com/tensorchord/modelz-llm

OpenAI compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM and many others)

llm nlp openai-api transformer

Last synced: 01 Aug 2024

https://github.com/aryn-ai/sycamore

🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.

ai dataprep etl information-retrieval llm ml nlp opensearch search semantic-search

Last synced: 18 Aug 2024

https://github.com/esteininger/vector-search

The definitive guide to using Vector Search to solve your semantic search production workload needs.

lucene nlp search-engine vector-search

Last synced: 01 Aug 2024

https://github.com/linonetwo/segmentit

A Chinese word segmentation package usable in any JS environment, forked from leizongmin/node-segment

chinese chinese-nlp nlp segmentation

Last synced: 02 Aug 2024

https://github.com/b2ihealthcare/snow-owl

🦉 Snow Owl Terminology Server - production-ready, scalable, supports FHIR R4, FHIR R5, SNOMED CT International and Extensions, LOINC, ICD-10, dm+d, custom code systems and many others

codesystem conceptmap elasticsearch fhir fhir-api fhir-server hpo icd java knowledge-graph loinc nlp owl snomed snomed-ct terminology terminology-server valueset

Last synced: 31 Jul 2024

https://github.com/gabeur/mmt

Multi-Modal Transformer for Video Retrieval

fusion language multimodal nlp video vision

Last synced: 03 Aug 2024

https://github.com/30lm32/ml-projects

ML based projects such as Spam Classification, Time Series Analysis, Text Classification using Random Forest, Deep Learning, Bayesian, Xgboost in Python

ab-testing deep-learning docker gensim geolocation imbalanced-data kdtree keras lstm-neural-networks machine-learning mlflow nlp random-forest spam-classification svm tensorboard tensorflow text-classification timeseries-analysis word2vec

Last synced: 03 Aug 2024

https://github.com/grumpyp/aixplora

AIxplora is an open-source tool that lets you query all kinds of files, with no limit on length or format.

audio chat chatbot chatgpt embeddings embeddings-model generativeai llm llms nlp openai ownfiles pdf question-answering search second-brain vectorstore

Last synced: 01 Aug 2024

https://github.com/oxford-cs-deepnlp-2017/practical-1

Oxford Deep NLP 2017 course - Practical 1: word2vec

deep-learning natural-language-processing nlp oxford word2vec

Last synced: 07 Aug 2024

https://github.com/akanyaani/gpt-2-tensorflow2.0

OpenAI GPT2 pre-training and sequence prediction implementation in Tensorflow 2.0

gpt gpt-2 gpt2 implementation nlp openai pre-training pretraining tensorflow tensorflow2 text-generation transformer

Last synced: 02 Aug 2024

https://github.com/cbaziotis/neat-vision

Neat (Neural Attention) Vision is a framework-agnostic visualization tool for the attention mechanisms of deep-learning models for Natural Language Processing (NLP) tasks.

attention attention-mechanism attention-mechanisms attention-scores attention-visualization deep-learning deep-learning-library deep-learning-visualization natural-language-processing nlp self-attention self-attentive-rnn text-visualization visualization vuejs

Last synced: 01 Aug 2024

https://github.com/natasha/razdel

Rule-based token and sentence segmentation for the Russian language

nlp python russian sentence-boundary-detection sentence-segmentation tokenization

Last synced: 07 Aug 2024

https://github.com/abelriboulot/onnxt5

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

inference nlp nlp-machine-learning onnx onnxruntime sentiment-analysis summarization text-classification text-generation transformer transformers translation

Last synced: 01 Aug 2024

https://github.com/quanteda/spacyr

R wrapper to spaCy NLP

extract-entities nlp r spacy speech-tagging

Last synced: 02 Aug 2024

https://github.com/gmihaila/ml_things

This is where I put things I find useful that speed up my work with Machine Learning. Ever looked in your old projects to reuse those cool functions you created before? Well, this repo is designed to be a Python library of functions I created in my previous projects that can be reused. I also share some notebook tutorials and Python code snippets.

google-colab machine-learning nlp nlp-machine-learning notebooks python-snippets pytorch snippets transformer

Last synced: 01 Aug 2024

https://github.com/neuml/txtchat

💭 Retrieval augmented generation (RAG) and language model powered search applications

large-language-models llm machine-learning nlp python rag retrieval-augmented-generation search txtai

Last synced: 31 Jul 2024

https://github.com/PlanTL-GOB-ES/lm-spanish

Official source for Spanish language models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

benchmarks corpora embeddings language-model nlp transformers

Last synced: 05 Aug 2024

https://github.com/zilliztech/akcio

Akcio is a demonstration project for Retrieval Augmented Generation (RAG). It leverages the power of LLM to generate responses and uses vector databases to fetch relevant documents to enhance the quality and relevance of the output.

artificial-intelligence chatbot chatgpt dolly embeddings ernie-bot fastapi gradio langchain llm milvus minimax nlp openai retrieval-augmented-generation retrieval-chatbot semantic-search towhee

Last synced: 09 Aug 2024

https://webanno.github.io/webanno/

🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The official WebAnno repository has reached the end of the line. -- 🚀 To migrate, export your annotation projects from WebAnno, then import them into INCEpTION and just keep working.

annotation annotation-editor annotation-tool java nlp web-application

Last synced: 31 Jul 2024

https://github.com/webanno/webanno

🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The official WebAnno repository has reached the end of the line. -- 🚀 To migrate, export your annotation projects from WebAnno, then import them into INCEpTION and just keep working.

annotation annotation-editor annotation-tool java nlp web-application

Last synced: 31 Jul 2024

https://github.com/neomatrix369/nlp_profiler

A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.

google-colab grammar-checks hacktoberfest jupyter kaggle-kernels natural-language-processing nlp nlp-keywords-extraction nlp-library nlp-machine-learning nlp-parsing nlp-profiler profiler profiling profiling-datasets text-mining

Last synced: 31 Jul 2024

https://github.com/explosion/spacy-services

💫 REST microservices for various spaCy-related tasks

falcon natural-language-processing nlp rest-api rest-microservice spacy

Last synced: 07 Aug 2024

https://github.com/IBM/transition-amr-parser

SoTA Abstract Meaning Representation (AMR) parsing with word-node alignments in Pytorch. Includes checkpoints and other tools such as statistical significance Smatch.

abstract-meaning-representation amr amr-graphs amr-parser amr-parsing machine-learning nlp semantic-parsing

Last synced: 02 Aug 2024

https://github.com/lucasmccabe/emailgpt

a quick and easy interface to generate emails with ChatGPT

chatgpt gpt nlp openai productivity streamlit

Last synced: 02 Aug 2024

https://github.com/as-ideas/headliner

🏖 Easy training and deployment of seq2seq models.

neural-network nlp python seq2seq tensorflow

Last synced: 01 Aug 2024