Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Natural language processing
Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
- GitHub: https://github.com/topics/nlp
- Wikipedia: https://en.wikipedia.org/wiki/Natural_language_processing
- Created by: Alan Turing
- Aliases: natural-language-processing, nlp-machine-learning, nlp-resources
- Last updated: 2024-11-09 00:20:12 UTC
https://github.com/dccuchile/spanish-word-embeddings
Spanish word embeddings computed with different methods and from different corpora
fasttext-embeddings glove-embeddings nlp spanish word-embeddings word2vec-embeddinngs
Last synced: 05 Aug 2024
https://github.com/MAIF/melusine
📧 Melusine: Use Python to automate your email processing workflow
courriels datascience emails natural-language-processing nlp nlp-machine-learning python python3
Last synced: 03 Nov 2024
https://github.com/momegas/megabots
🤖 State-of-the-art, production ready LLM apps made mega-easy, so you don't have to build them from scratch 🤯 Create a bot, now 🫵
chatbot faiss fastapi gpt-35-turbo gpt-4 information-retrieval langchain llama natural-language-processing nlp pinecone prompt-engineering python question-answering s3
Last synced: 11 Oct 2024
https://github.com/Koziev/NLP_Datasets
My NLP datasets for Russian language
Last synced: 02 Aug 2024
https://github.com/domluna/memn2n
End-To-End Memory Network using Tensorflow
memory-networks nlp tensorflow
Last synced: 26 Oct 2024
https://github.com/izuna385/entity-linking-recent-trends
Recent trends of Entity Linking, Disambiguation, and Representation.
bert entity-disambiguation entity-language-model entity-linking entity-representation entity-resolution natural-language-processing nlp
Last synced: 18 Oct 2024
https://github.com/deepset-ai/covid-qa
API & Webapp to answer questions about COVID-19. Using NLP (Question Answering) and trusted data sources.
api corona covid-19 covid-data faq nlp question-answering search
Last synced: 06 Nov 2024
https://github.com/OpenBMB/BMList
A List of Big Models
ai api code computer-vision deep-learning natural-language-processing nlp paper pretrained-models speech-recognition visualization
Last synced: 03 Aug 2024
https://github.com/alibaba-edu/simple-effective-text-matching
Source code of the ACL2019 paper "Simple and Effective Text Matching with Richer Alignment Features".
deep-learning nlp quora-question-pairs snli tensorflow
Last synced: 06 Nov 2024
https://github.com/graphaware/neo4j-nlp
NLP Capabilities in Neo4j
algorithms graph-database machine-learning neo4j nlp opennlp stanford-corenlp
Last synced: 26 Sep 2024
https://github.com/jacksonllee/pycantonese
Cantonese Linguistics and NLP
cantonese computational-linguistics jyutping linguistics natural-language-processing nlp part-of-speech-tagging pycantonese python stop-words word-segmentation
Last synced: 04 Aug 2024
https://github.com/thisandagain/troll
Language sentiment analysis and neural networks... for trolls.
javascript moderation neural-network nlp sentiment sentiment-analysis
Last synced: 26 Oct 2024
https://github.com/kyzhouhzau/nlpgnn
1. Use BERT, ALBERT and GPT2 as TensorFlow 2.0 layers. 2. Implement GCN, GAN, GIN and GraphSAGE based on message passing.
albert albert-ner bert bert-cls bert-ner bilstm-attention gan gcn gin gnn gpt2 graph-classfication graph-convolutional-networks graphsage message-passing nlp tensorflow2 textcnn textgcn tf2
Last synced: 14 Oct 2024
https://github.com/oswaldoludwig/Seq2seq-Chatbot-for-Keras
This repository contains a new generative model of chatbot based on seq2seq modeling.
chatbot conversational-agents deep-learning dialogue dialogue-generation gan generative-adversarial-network glove keras nlp seq2seq
Last synced: 02 Nov 2024
https://github.com/xplip/pixel
Research code for pixel-based encoders of language (PIXEL)
deep-learning deep-neural-networks language-model machine-learning nlp pytorch
Last synced: 31 Oct 2024
https://github.com/davidmigloz/langchain_dart
Build LLM-powered Dart/Flutter applications.
ai dart flutter generative-ai llms nlp
Last synced: 03 Nov 2024
https://github.com/wuba/qa_match
A simple effective ToolKit for short text matching
58 ai deep-learning dssm lstm machine-learning nlp qabot qatools tensorflow
Last synced: 03 Aug 2024
https://github.com/shibing624/dialogbot
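dialogbot provides search-based, task-based, and generative dialogue models. A dialogue bot built on question-answering, task-oriented, and chit-chat dialogue models; it supports web-retrieval QA, domain-knowledge QA, task-guided QA, and chit-chat QA, and works out of the box.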
chatbot deep-learning dialog dialogbot nlp qa question-answering
Last synced: 30 Oct 2024
https://github.com/machine-learning-apps/Issue-Label-Bot
Code For The Issue Label Bot, an App that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also code for the blog article: "How to automate tasks on GitHub with machine learning for fun and profit"
bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow
Last synced: 25 Oct 2024
https://github.com/yunwei37/covid-19-nlp-vis
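An interactive COVID-19 data visualization and analysis website built with flask + pyecharts, covering epidemic data collection, daily epidemic maps, curve charts, statistical analysis, situational awareness, algorithm design for predicting confirmed case counts, and NLP-based public opinion monitoring (deployed at http://covid.yunwei123.tech/)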
covid-19 flask maps nlp pyecharts visualization
Last synced: 26 Oct 2024
https://github.com/CogStack/OpenGPT
A framework for creating grounded instruction based datasets and training conversational domain expert Large Language Models (LLMs).
chatgpt gpt-4 health healthcare huggingface llm medicine nlp opengpt
Last synced: 03 Aug 2024
https://github.com/discopy/discopy
The Python toolkit for computing with string diagrams.
category-theory diagrams nlp quantum-computing
Last synced: 09 Aug 2024
https://github.com/drahnr/cargo-spellcheck
Checks all your documentation for spelling and grammar mistakes with hunspell and an nlprule-based grammar checker
cargo cargo-plugin cargo-spellcheck grammar grammar-mistakes grammarchecker hacktoberfest hunspell languagetool nlp spellchecker spelling
Last synced: 30 Oct 2024
https://github.com/zjunlp/openue
[EMNLP 2020] OpenUE: An Open Toolkit of Universal Extraction from Text
bert event-extraction intent-classification named-entity-recognition natural-language-processing nlp nlp-extraction-tasks openue pytorch relation-extraction slot-filling triple-extraction
Last synced: 03 Aug 2024
https://github.com/HIT-SCIR/huozi
Huozi general-purpose large language model
fine-tuning large-language-models llm nlp
Last synced: 08 Nov 2024
https://github.com/explosion/prodigy-openai-recipes
✨ Bootstrap annotation with zero- & few-shot learning via OpenAI GPT-3
annotation-tool few-shot-learning gpt-3 nlp openai openai-api prodigy zero-shot-learning
Last synced: 25 Sep 2024
https://github.com/dpressel/dliss-tutorial
Tutorial for International Summer School on Deep Learning, 2019
deep-learning machine-learning nlp
Last synced: 26 Oct 2024
https://github.com/jcrodriguez1989/chatgpt
Interface to ChatGPT from R
assistant chatgpt gpt-3 gpt-4 hacktoberfest llm nlp openai r rstats rstats-package rstatses rstudio rstudio-addin
Last synced: 11 Oct 2024
https://github.com/asahi417/lm-question-generation
Multilingual/multidomain question generation datasets, models, and python library for question generation.
bart nlp pytorch question-answering question-generation t5
Last synced: 04 Nov 2024
https://github.com/hlasse/textdescriptives
A Python library for calculating a large variety of metrics from text
dependency-distance descriptive-statistics nlp python readability readability-scores spacy spacy-extension statistics syntactic-analysis
Last synced: 14 Oct 2024
https://github.com/cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
analysis deep-learning language-model language-models machine-learning nlp transformers
Last synced: 06 Aug 2024
https://github.com/swhl/ai-competition-collections
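A collection of AI competition experience posts and training and testing tips posts (gathering and organizing experience write-ups from various AI competitions)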
competition cv data-discovery graph-neural-networks knowledge-graph nlp recommender-system speech
Last synced: 01 Nov 2024
https://github.com/xiangking/ark-nlp
A private nlp coding package, which quickly implements the SOTA solutions.
Last synced: 06 Nov 2024
https://github.com/JetRunner/BERT-of-Theseus
⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).
bert glue model-compression nlp transformers
Last synced: 03 Nov 2024
https://github.com/mcs07/chemdataextractor
Automatically extract chemical information from scientific documents
chemistry information-extraction natural-language-processing nlp python text-mining
Last synced: 07 Nov 2024
https://github.com/qiangsiwei/bert_distill
BERT distillation (distillation experiments based on BERT)
bert classification distillation nlp
Last synced: 02 Nov 2024
https://github.com/UKPLab/gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
bert domain-adaptation information-retrieval nlp transformers vector-search
Last synced: 05 Aug 2024
https://github.com/CUNY-CL/wikipron
Massively multilingual pronunciation mining
computational-linguistics g2p language linguistics nlp phonetics phonology pronunciation python-api scraped-data speech
Last synced: 04 Nov 2024
https://github.com/natasha/yargy
Rule-based facts extraction for Russian language
earley-parser information-extraction morphology nlp python russian tomita tomita-parser
Last synced: 10 Nov 2024
https://github.com/graykode/ai-docstring
Visual Studio Code extension to quickly generate docstrings for python functions using AI(NLP) technology.
bert code-summarization docstrings nlp vs-code-extenstion
Last synced: 04 Nov 2024
https://github.com/xkzhangsan/xk-time
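xk-time is a tool for time conversion, time calculation, time formatting, time parsing, calendars, time cron expressions, and time NLP. It uses Java 8 (JSR-310), is thread-safe and easy to use, offers more than 70 common date formatting templates, supports the Java 8 time classes and Date, and is lightweight with no third-party dependencies.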
calendar cron cron-java8 date datetimeformatter-formatter dateutil formatter java jsr-310 localdate localdatetime nlp time timeconvertion
Last synced: 04 Aug 2024
https://github.com/TengHu/ActionWeaver
Make function calling with LLM easier
chatgpt-functions nlp openai-api openai-chatgpt openai-function-call openai-functions python
Last synced: 05 Nov 2024
https://github.com/alibaba-edu/simple-effective-text-matching-pytorch
A pytorch implementation of the ACL2019 paper "Simple and Effective Text Matching with Richer Alignment Features".
deep-learning nlp pytorch quora-question-pairs snli
Last synced: 06 Nov 2024
https://github.com/GaoQ1/rasa_nlu_gq
Turn natural language into structured data (supports Chinese; includes N custom models for different scenarios and tasks)
bert bilstm-idcnn jieba natural-language nlp nlu rasa rasa-nlu rasa-nlu-gao tensorflow
Last synced: 02 Nov 2024
https://github.com/abhimishra91/insight
Repository for Project Insight: NLP as a Service
docker fastapi huggingface huggingface-transformer machine-learning microservice natural-language-processing nlp streamlit streamlit-webapp transformer transformers-models
Last synced: 13 Oct 2024
https://github.com/abhijithneilabraham/tableqa
AI Tool for querying natural language on tabular data.
ai csv database machine-learning nl2sql nlp qa querying-natural-language question-answering sql sql-generation sql-query table-qa tableqa tabular-data
Last synced: 30 Oct 2024
https://github.com/dair-ai/nlp_newsletter
📰Natural language processing (NLP) newsletter
deep-learning machine-learning nlp
Last synced: 03 Sep 2024
https://github.com/hankcs/multi-criteria-cws
Simple Solution for Multi-Criteria Chinese Word Segmentation
bi-lstm-crf cws dynet multi-criteria-cws nlp
Last synced: 09 Nov 2024
https://github.com/textpipe/textpipe
Textpipe: clean and extract metadata from text
language-identification named-entities named-entity-recognition nlp text-analysis text-processing
Last synced: 06 Nov 2024
https://github.com/farukalamai/advanced-machine-learning-engineer-roadmap-2024
A Full Stack ML (Machine Learning) Roadmap involves learning the necessary skills and technologies to become proficient in all aspects of machine learning, including data collection and preprocessing, model development, deployment, and maintenance.
aws computer-vision data-analysis data-science data-visualization deep-learning git-github machine-learning machine-learning-roadmap mlops natural-language-processing neural-network nlp opencv pandas python pytorch statistics tensorflow yolo
Last synced: 07 Nov 2024
https://github.com/phospho-app/phospho
Text analytics for LLM apps. PostHog for prompts. Extract evaluations, intents and events from text messages. phospho leverages LLM (OpenAI, MistralAI, Ollama, etc.)
ai analytics generative-ai llm nextjs nlp ollama python self-hosted typescript
Last synced: 13 Oct 2024
https://github.com/charles9n/bert-sklearn
a sklearn wrapper for Google's BERT model
bert conll-2003 language-model named-entity-recognition natural-language-processing ner nlp pytorch scikit-learn transfer-learning
Last synced: 02 Nov 2024
https://github.com/kevinlu1248/pyate
PYthon Automated Term Extraction
ai nlp symbolic-ai term-extraction
Last synced: 28 Sep 2024
https://github.com/gagolews/stringi
Fast and portable character string processing in R (with the Unicode ICU)
icu icu4c natural-language-processing nlp r regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode
Last synced: 26 Oct 2024
https://github.com/hankcs/hanlp-lucene-plugin
HanLP Chinese word segmentation plugin for Lucene, supporting Lucene-based systems including Solr
chinese-text-segmentation hanlp lucene nlp solr traditional-chinese
Last synced: 26 Oct 2024
https://github.com/daac-tools/vibrato
🎤 vibrato: Viterbi-based accelerated tokenizer
japanese morphological-analysis nlp rust segmentation tokenization tokenizer
Last synced: 07 Nov 2024
https://github.com/jackdh/RasaTalk
A chatbot framework for Rasa NLU
bot botkit bots chatbot chatbot-framework conversational-ai nlp nodejs rasa rasa-nlu react
Last synced: 30 Oct 2024
https://github.com/gentaiscool/code-switching-papers
A curated list of research papers and resources on code-switching
bilingual code-mixed code-mixing code-switch code-switching language nlp papers research speech
Last synced: 08 Nov 2024
https://github.com/jameshwade/gpttools
gpttools extends gptstudio for package development to help you document code, write tests, or even explain code
chatgpt nlp openai package-development rstats rstudio-addin
Last synced: 09 Nov 2024
https://github.com/jsksxs360/AHANLP
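AHANLP ("Aha") natural language processing package, providing word segmentation, dependency parsing, semantic role labeling, automatic summarization, semantic similarity computation, LDA topic prediction, word clouds, and other services.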
Last synced: 30 Oct 2024
https://github.com/sekwiatkowski/Komputation
Komputation is a neural network framework for the Java Virtual Machine written in Kotlin and CUDA C.
artificial-intelligence convolutional-neural-networks cuda framework gpu jvm kotlin machine-learning neural-networks nlp nvidia recurrent-neural-networks seq2seq
Last synced: 02 Nov 2024
https://github.com/primaryobjects/lda
LDA topic modeling for node.js
ai artificial-intelligence javascript keywords language lda machine-learning natural-language-processing nlp node node-js nodejs topic-modeling topics
Last synced: 13 Oct 2024
https://github.com/feedly/transfer-nlp
NLP library designed for reproducible experimentation management
framework language-model natural-language-understanding nlp playground pytorch transfer-learning
Last synced: 13 Oct 2024
https://github.com/caiyinqiong/semantic-retrieval-models
A curated list of awesome papers for Semantic Retrieval (TOIS Accepted: Semantic Models for the First-stage Retrieval: A Comprehensive Review).
dense-retrieval information-retrieval nlp paper-list semantic-retrieval
Last synced: 10 Nov 2024
https://github.com/artpar/languagecrunch
LanguageCrunch NLP server docker image
coreference-resolution natural-language-processing nlp relation-extraction sentiment-analysis spacy-nlp word2vec wordnet
Last synced: 26 Oct 2024
https://github.com/igorbrigadir/stopwords
Default English stopword lists from many different sources
en-stopwords english-stopwords natural-language-processing nlp stopwords
Last synced: 05 Nov 2024
https://github.com/zhongkaifu/RNNSharp
RNNSharp is a toolkit for deep recurrent neural networks that is widely used for many kinds of tasks, such as sequence labeling and sequence-to-sequence. It is written in C# and based on .NET Framework 4.6 or above. RNNSharp supports many types of networks, such as forward and bi-directional networks and sequence-to-sequence networks, and different types of layers, such as LSTM, Softmax, sampled Softmax and others.
c-sharp crf deep-learning dotnet lstm machine-learning nlp recurrent-neural-networks rnn rnn-model sequence-labeling
Last synced: 17 Aug 2024
https://github.com/artitw/text2text
Text2Text: Crosslingual NLP/G toolkit
backtranslation chatgpt cross-lingual embeddings information-retrieval levenshtein-distance llama llm multi-lingual natural-language-generation natural-language-processing nlp question-answering question-generation search summarization tf-idf tokenizer transformers translator
Last synced: 11 Oct 2024
https://github.com/liyucheng09/selective_context
Compress your input to ChatGPT or other LLMs, to let them process 2x more content and save 40% memory and GPU time.
Last synced: 30 Oct 2024
https://github.com/sunzeyeah/RLHF
Implementation of Chinese ChatGPT
chatgpt deep-learning deepspeed glm nlp pangu pytorch
Last synced: 31 Oct 2024
https://github.com/merantix-momentum/squirrel-core
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰
ai cloud-computing collaboration computer-vision cv data-ingestion data-mesh data-science dataops datasets deep-learning distributed internal machine-learning ml natural-language-processing nlp python pytorch tensorflow
Last synced: 02 Nov 2024
https://github.com/boat-group/fancy-nlp
NLP for humans. A fast and easy-to-use natural language processing (NLP) toolkit, satisfying your imagination about NLP.
bert bert-chinese bert-classifier bert-embeddings bert-ner bilstm-crf bimpm chinese-nlp crf esim keras named-entity-recognition nlp python-library semantic-similarity tensorflow text-classification tf2
Last synced: 30 Oct 2024
https://github.com/google-research/retvec
RETVec is an efficient, multilingual, and adversarially-robust text vectorizer.
deep-learning natural-language-processing nlp python tensorflow text-classification
Last synced: 13 Oct 2024
https://github.com/extreme-bert/extreme-bert
ExtremeBERT is a toolkit that accelerates the pretraining of customized language models on customized datasets, described in the paper “ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT”.
bert deep-learning language-model language-models machine-learning natural-language-processing nlp python pytorch transformer
Last synced: 03 Aug 2024
https://github.com/LudwigStumpp/llm-leaderboard
A joint community effort to create one central leaderboard for LLMs.
leaderboard llm machine-learning nlp
Last synced: 02 Nov 2024
https://github.com/jboynyc/textnets
Text analysis with networks.
computational-social-science network-analysis nlp sociology text-analysis text-as-data visualization
Last synced: 07 Aug 2024
https://github.com/amirshnll/persian-swear-words
Persian Swear Dataset - you can use it in production to filter unwanted content. A dataset of inappropriate and offensive Persian words for filtering text.
dataset datasets farsi farsiswear farsiswearword nlp nlp-dataset persian persiandataset persianswearword swear sweardataset swearword
Last synced: 30 Oct 2024
https://github.com/bonzanini/nlp-tutorial
Tutorial: Natural Language Processing in Python
natural-language-processing nlp python
Last synced: 07 Aug 2024
https://github.com/RTIInternational/gobbli
Deep learning with text doesn't have to be scary.
deep-learning docker nlp python
Last synced: 04 Nov 2024
https://github.com/shineware/KOMORAN
Korean Morphological Analyzer by shineware
komoran korean-nlp korean-text-processing morphological-analysis nlp shineware
Last synced: 02 Aug 2024
https://github.com/krishnap25/mauve
Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.
deep-learning huggingface-transformers nlp pytorch text-generation
Last synced: 13 Oct 2024
https://github.com/tirthajyoti/web-database-analytics
Web scraping and related analytics using Python tools
analytics beautifulsoup4 data-science data-wrangling database json json-parser natural-language-processing nlp python regular-expression sql sqlite3 web-scraping xml-parser
Last synced: 31 Oct 2024
https://github.com/phantominsights/summarizer
A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom built algorithm to rank words and sentences.
nlp praw python3 reddit-bot spacy web-scraper wordcloud
Last synced: 31 Oct 2024
https://github.com/stanford-oval/genie-server
The home server version of Almond
hacktoberfest nlp raspberrypi voice
Last synced: 11 Oct 2024
https://github.com/affjljoo3581/gpt2
PyTorch Implementation of OpenAI GPT-2
gpt2 language-model natural-language-generation natural-language-processing nlp pytorch transformer
Last synced: 09 Nov 2024
https://github.com/ikegami-yukino/neologdn
Japanese text normalizer for mecab-neologd
japanese-language mecab-ipadic-neologd nlp preprocessing text-normalization
Last synced: 12 Oct 2024
https://github.com/quadrismegistus/prosodic
Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.
finnish-language-analysis linguistics metrical-parser nlp poetry rhythm
Last synced: 30 Oct 2024
https://github.com/neuml/txtchat
💭 Retrieval augmented generation (RAG) and language model powered search applications
large-language-models llm machine-learning nlp python rag retrieval-augmented-generation search txtai
Last synced: 28 Oct 2024