Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Natural language processing
Natural language processing (NLP) is a field of computer science that studies how computers and humans interact. In 1950, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
- GitHub: https://github.com/topics/nlp
- Wikipedia: https://en.wikipedia.org/wiki/Natural_language_processing
- Created by: Alan Turing
- Aliases: natural-language-processing, nlp-machine-learning, nlp-resources
- Last updated: 2024-11-15 00:20:20 UTC
https://github.com/hyunjoonbok/Python-Projects
Portfolio in Python
augmentation cnn-classification data data-visualization dataanalytics datascience deep-learning forecasting gan lightgbm machine-learning nlp rnn rnn-pytorch textclassification timeseries xgboost
Last synced: 08 Nov 2024
https://github.com/yozuk/yozuk
Chatbot for Programmers
bot chatbot command-line-tool developer-tools nlp rust telegram telegram-bot text-based
Last synced: 12 Oct 2024
https://github.com/leoneversberg/llm-chatbot-rag
A local LLM chatbot with RAG for PDF input files
Last synced: 08 Aug 2024
https://github.com/ahmedbesbes/audiolizr
A BentoML-powered API to transcribe audio and make sense of it
bentoml bentoml-service docker nlp openai openai-whisper pytube speech-recognition t5 torch transformers
Last synced: 07 Aug 2024
https://github.com/Flight-School/sentences
A command-line utility that splits natural language text into sentences.
cli macos nlp sentence-tokenizer swift
Last synced: 05 Aug 2024
https://github.com/gentaiscool/indonesian-nlp
A curated list of research papers and resources on Indonesian languages
deep-learning indonesian javanese local local-languages machine-learning nlp papers research speech sundanese survey
Last synced: 08 Nov 2024
https://github.com/nlpodyssey/gotokenizers
Go implementation of today's most used tokenizers
bert language-model natural-language-processing natural-language-understanding nlp transformers
Last synced: 15 Nov 2024
https://github.com/stanfordnlp/stanza-train
Model training tutorials for the Stanza Python NLP Library
natural-language-processing nlp stanza
Last synced: 08 Nov 2024
https://github.com/aashrafh/anees
Multi-turn open-domain Arabic chatbot with a wide set of features.
anees arabic-dialects arabic-nlp chatbot chatbots dialogue-generation gpt-2 multi-turn-dialogue natural-language-understanding nlp
Last synced: 22 Oct 2024
https://github.com/famished-tiger/rley
An Earley parser written in Ruby
earley-parser natural-language-processing nlp parser ruby rubynlp
Last synced: 14 Oct 2024
https://github.com/amazon-science/recode
Releasing code for "ReCode: Robustness Evaluation of Code Generation Models"
code-generation large-language-models nlp robustness
Last synced: 12 Nov 2024
https://github.com/dair-ai/notebooks
🔬 Sharing your data science notebooks with the community has never been this easy.
artificial-intelligence deep-learning machine-learning nlp
Last synced: 10 Nov 2024
https://github.com/robmch/cyk-parser
A CYK parser written in Python 3.
cyk-parser natural-language-processing nlp nlp-parsing parser parsing python-3-6
Last synced: 12 Oct 2024
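The CYK (Cocke-Younger-Kasami) algorithm behind parsers like the one above fills a triangular table bottom-up: each cell records which nonterminals can derive a given span of words. A minimal recognizer sketch, assuming a hypothetical toy grammar already in Chomsky normal form (it is not taken from the repository and does not reflect its API):

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (CNF), for illustration only.
# Binary rules: (B, C) -> set of A such that A -> B C.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Lexical rules: word -> set of A such that A -> word.
LEXICAL_RULES = {
    "the": {"Det"},
    "a": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "saw": {"V"},
}


def cyk_recognize(words, start="S"):
    """Return True if `words` is derivable from `start` under the toy CNF grammar."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i : i + j + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][0] = set(LEXICAL_RULES.get(word, set()))
    for span in range(2, n + 1):          # length of the span being filled
        for i in range(n - span + 1):     # start position of the span
            for split in range(1, span):  # length of the left sub-span
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for pair in product(left, right):
                    table[i][span - 1] |= BINARY_RULES.get(pair, set())
    return start in table[0][n - 1]


print(cyk_recognize("the dog saw a cat".split()))  # True
print(cyk_recognize("dog the saw".split()))        # False
```

Restricting rules to the CNF forms A → B C and A → word is what lets every cell be built from exactly two smaller cells, giving cubic time in sentence length.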
https://github.com/ropensci-archive/geoparser
⛔ ARCHIVED ⛔
geocoding geoparser nlp peer-reviewed r r-package rstats
Last synced: 05 Aug 2024
https://github.com/cocoa-ai/namescoremldemo
🏷 iOS11 demo application for predicting gender from first names.
classification coreml coreml-models gender-classification ios machine-learning nlp swift swift4
Last synced: 07 Nov 2024
https://github.com/promptable/Promptable-web-sdk
Web SDK for the Promptable website.
ai chaining compose gpt-3 llm model nlp prompt promptable promptengineering prompting prompts
Last synced: 03 Sep 2024
https://github.com/seanlee97/clfzoo
A deep text classifiers library.
nlp tensorflow text-classification
Last synced: 27 Oct 2024
https://github.com/chrismattmann/lucene-geo-gazetteer
Uses Apache Lucene, OpenNLP, and GeoNames to extract locations from text and geocode them.
allcountries apache gazetteer geoindex geonames irds lucene nlp nlp-machine-learning opennlp
Last synced: 30 Oct 2024
https://github.com/aws-solutions/content-localization-on-aws
Automatically generate multi-language subtitles using AWS AI/ML services. Machine generated subtitles can be edited to improve accuracy and downstream tracks will automatically be regenerated based on the edits. Built on Media Insights Engine (https://github.com/awslabs/aws-media-insights-engine)
amazon-comprehend amazon-polly amazon-transcribe amazon-translate audio aws-media-insights-engine captions content-analysis localisation localization media mie nlp nlp-machine-learning speech-to-text subtitles video video-on-demand vod
Last synced: 08 Nov 2024
https://github.com/macournoyer/utterance_parser
Extract intent and entities from natural language utterances
extracts-intent nlp slot-filling
Last synced: 09 Nov 2024
https://github.com/mchmarny/tsignal
Analyzing social media sentiment and its impact on stock market
analytics golang nasdaq nlp sentiment-analysis twitter
Last synced: 08 Nov 2024
https://github.com/psolbach/metadoc
Aviation-grade news article metadata extraction
extraction metadata news nlp perceptron
Last synced: 08 Nov 2024
https://github.com/hiyouga/pban-pytorch
A Position-aware Bidirectional Attention Network for Aspect-level Sentiment Analysis, PyTorch implementation.
aspect-based-sentiment-analysis attention-model deep-learning natural-language-processing nlp pytorch sentiment-analysis
Last synced: 27 Oct 2024
https://github.com/kudoai/chatgpt.js-greasemonkey-starter
🙈 A starting point for developing your own Greasemonkey userscript using chatgpt.js
ai artificial-intelligence chatgpt gpt gpt-3 gpt-4 greasemonkey greasemonkey-script greasemonkey-userscript javascript javascript-library kudoai nlp nlp-machine-learning openai template userscript userscripts ux ux-design
Last synced: 14 Oct 2024
https://github.com/liebeck/spacy-sentiws
German sentiment scores with SentiWS as extension for spaCy
nlp spacy spacy-extension spacy-pipeline
Last synced: 14 Oct 2024
https://github.com/bastienbot/nlp-js-tools-french
POS tagger, lemmatizer, and stemmer for the French language in JavaScript
lemmatization lemmatizer nlp postagging postgresql stemmer stemming tokenization tokenizer
Last synced: 28 Aug 2024
https://github.com/xxjwxc/gohanlp
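Golang RESTful client for HanLP: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, semantic dependency parsing, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing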
ai dependency-parser hanlp named-entity-recognition natural-language-processing nlp pos-tagging semantic-parsing text-classification
Last synced: 28 Oct 2024
https://github.com/bangla-rag/porag
Fully configurable RAG pipeline for Bengali-language RAG applications. Supports both local and Hugging Face models; built with LangChain.
ai bengali bengali-nlp chromadb langchain llama3 llm nlp rag transformers
Last synced: 10 Oct 2024
https://github.com/syzer/sentiment-analyser
Machine learning tool that extracts sentiment from German and English text
english german nlp nlp-library node-js nodejs sentiment-analyser sentiment-analysis
Last synced: 28 Oct 2024
https://github.com/thisisiron/nmt-attention-tf2
👫 Effective Approaches to Attention-based Neural Machine Translation, implemented in TensorFlow 2.0
attention lstm natural-language-processing neural-machine-translation nlp nmt tensorflow tensorflow2 tf2 translation
Last synced: 08 Nov 2024
https://github.com/MiuLab/FlowDelta
FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension
machine-comprehension nlp pytorch question-answering
Last synced: 07 Aug 2024
https://github.com/kasnerz/reffix
A tool for fixing a BibTeX reference list using DBLP API
arxiv arxiv-org arxiv-papers bibtex bibtex-entry bibtex-references bibtexparser dblp dblp-api dblp-bibliography natural-language-processing nlp nlproc research-paper
Last synced: 28 Oct 2024
https://github.com/anjum48/commonlitreadabilityprize
4th-place solution for the Kaggle CommonLit Readability Prize
huggingface kaggle nlp pytorch transformers
Last synced: 14 Oct 2024
https://github.com/adirthaborgohain/ner-re
A Named Entity Recognition + Entity Linker + Relation Extraction pipeline built using spaCy v3. Given a text, the pipeline extracts the entities it was trained to recognize, disambiguates them to their normalized forms through an Entity Linker connected to a Knowledge Base, and assigns a relation between the entities, if any.
named-entity-recognition nlp relation-extraction spacy transformers
Last synced: 09 Nov 2024
https://github.com/GermanT5/wikipedia2corpus
Wikipedia text corpus for self-supervised NLP model training
corpus german-nlp machine-learning nlp somajo wikipedia wikipedia-corpus
Last synced: 31 Oct 2024
https://github.com/vinitra-zz/neural-text-style-transfer
Style Transfer for non-parallel text
autoencoder deep-neural-networks nlp style-transfer
Last synced: 22 Oct 2024
https://github.com/Lingkai-Kong/Calibrated-BERT-Fine-Tuning
Code for Paper: Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data
bert calibration deep-learning language-model nlp nlp-machine-learning ood-detection open-world-classification robustness text-classification uncertainty-estimation uncertainty-quantification
Last synced: 13 Nov 2024
https://github.com/kbogas/medknow
Medical Relations and Entities Extraction
biomedical metamap neo4j nlp relation-extraction semrep umls
Last synced: 28 Oct 2024
https://github.com/nikhilbarhate99/char-rnn-pytorch
Minimal implementation of Multi-layer Recurrent Neural Networks (LSTM) for character-level language modelling in PyTorch
char-rnn deep-learning lstm natural-language-generation natural-language-processing nlp pytorch pytorch-implementation pytorch-nlp pytorch-tutorial rnn
Last synced: 13 Nov 2024
https://github.com/rainmaker712/nlp_ryan
Study materials for natural language processing and deep learning frameworks
chatbot deep-learning machine-comprehension machine-learning nlp python pytorch scala spark tensorflow
Last synced: 13 Nov 2024
https://github.com/neomatrix369/chatbot-conversations
Chatbot conversations: a demo application showing how two (or more) chatbots can talk to each other. The logic used to build ELIZA (along with an NLP model) powers the chatbots.
ai chat-application chatbot eliza eliza-chatbot graalvm helidon helidon-example java ml nlp python quarkus text
Last synced: 14 Oct 2024
https://github.com/datawhalechina/whale-paper
Datawhale paper sharing: reading cutting-edge papers and sharing technical innovations
cv nlp papers recommendation-system
Last synced: 09 Nov 2024
https://github.com/johncmunson/react-taggy
A simple zero-dependency React component for tagging user-defined entities within a block of text.
component entities named-entity-recognition natural-language ner nlp react react-component
Last synced: 28 Aug 2024
https://github.com/mirusu400/clova-x
Unofficial API for CLOVA X
api clova clovaai hacktoberfest llm naver naver-api nlp
Last synced: 06 Nov 2024
https://github.com/bnosac/rdrpostagger
R package for Ripple Down Rules-based part-of-speech tagging (RDRPOS) in more than 45 languages.
java multi-language natural-language-processing nlp pos pos-tagging r r-package tagging
Last synced: 11 Nov 2024
https://github.com/alexeyev/keras-generating-sentences-from-a-continuous-space
Text variational autoencoder inspired by the paper "Generating Sentences from a Continuous Space" (Bowman et al.), https://arxiv.org/abs/1511.06349
deep-learning deeplearning keras keras-implementations nlp text-generation vae variational-autoencoder
Last synced: 11 Nov 2024
https://github.com/michaelaquilina/hashedindex
Python package providing an Inverted Index implementation using dictionaries
indexing nlp nlp-machine-learning numpy pandas python2 python3 text-processing
Last synced: 28 Oct 2024
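An inverted index, as described in the entry above, maps each term to the documents that contain it, which makes keyword lookup a dictionary access instead of a scan over every document. A minimal sketch of the idea using plain dictionaries; the `InvertedIndex` class and its methods are hypothetical illustrations, not the hashedindex package's API, and the regex tokenizer is a deliberately simple assumption:

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical index mapping term -> {document_id: term_frequency}."""

    def __init__(self):
        self._postings = defaultdict(dict)

    def add_document(self, doc_id, text):
        # Count each occurrence of each term in this document.
        for term in tokenize(text):
            self._postings[term][doc_id] = self._postings[term].get(doc_id, 0) + 1

    def documents(self, term):
        """Return the set of document ids that contain `term`."""
        # .get() avoids inserting empty entries into the defaultdict.
        return set(self._postings.get(term.lower(), {}))

    def frequency(self, term, doc_id):
        """Return how many times `term` occurs in `doc_id` (0 if absent)."""
        return self._postings.get(term.lower(), {}).get(doc_id, 0)

    def search_all(self, *terms):
        """Boolean AND query: ids of documents that contain every term."""
        if not terms:
            return set()
        result = self.documents(terms[0])
        for term in terms[1:]:
            result &= self.documents(term)
        return result


index = InvertedIndex()
index.add_document("d1", "The cat sat on the mat")
index.add_document("d2", "The dog chased the cat")
print(sorted(index.documents("cat")))    # ['d1', 'd2']
print(index.search_all("cat", "dog"))    # {'d2'}
```

Storing term frequencies in the inner dictionary, rather than a bare set of ids, keeps the data needed for TF-IDF-style scoring without a second pass over the corpus.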
https://github.com/hyperparticle/lemmatag
A neural network that jointly part-of-speech tags and lemmatizes sentences, boosting accuracy for morphologically-rich languages (Czech, Arabic, etc.)
deep-learning lemmatization machine-learning natural-language-processing neural-network nlp pos-tagging tensorflow
Last synced: 14 Nov 2024
https://github.com/selimfirat/bilkent-turkish-writings-dataset
Turkish writings dataset that promotes creativity, content, composition, grammar, spelling and punctuation.
bilkent-university creative-writing dataset nlp nlp-datasets pdf-conversion turkish turkish-language
Last synced: 10 Oct 2024
https://github.com/koichiyasuoka/unidic2ud
Tokenizer, POS tagger, lemmatizer, and dependency parser for modern and contemporary Japanese
dependency-parser japanese-language nlp
Last synced: 16 Nov 2024
https://github.com/sea-snell/calm-dialogue
Official code for the paper "Context-Aware Language Modeling for Goal-Oriented Dialogue Systems"
deep-learning language-model nlp python pytorch reinforcement-learning
Last synced: 27 Oct 2024
https://github.com/cyclecycle/spacy-pattern-builder
Reverse engineer patterns for use with spaCy's DependencyMatcher
Last synced: 10 Oct 2024
https://github.com/nlppln/nlppln
NLP pipeline software using the Common Workflow Language
cwl nlp pipeline text-mining workflow
Last synced: 23 Oct 2024
https://github.com/livingbio/syntaxnet_wrapper
A Python Wrapper for Google SyntaxNet
google-syntaxnet nlp python python-wrapper syntaxnet
Last synced: 09 Nov 2024
https://github.com/omarsar/clinical_nlp_elastic
Clinical NLP Analysis with Elasticsearch and Kibana
elastic elasticsearch kibana linguistics machine-learning mental-health nlp
Last synced: 28 Oct 2024
https://github.com/x-lance/mobile-env
A Universal Platform for Training and Evaluation of Mobile Interaction
decision-making information-ui infoui interaction-platform nlp rl-environments rl-platform
Last synced: 12 Nov 2024
https://github.com/ivan-bilan/nlp-and-data-science-spotlights
Regular spotlights of underrated NLP and Data Science GitHub repositories
data-science deep-learning natural-language-processing nlp spotlight
Last synced: 08 Nov 2024
https://github.com/paulbricman/semantica
Extending conceptual thinking with semantic embeddings.
creativity embeddings nlp tools-for-thought wordembeddings
Last synced: 17 Nov 2024
https://github.com/wit-ai/android-voice-demo
Example of how to build a voice-enabled Android app with Wit.ai
android machine-learning nlp nlu voice wit witai
Last synced: 15 Nov 2024
https://github.com/writer/replacy
spaCy match and replace, maintaining conjugation
Last synced: 01 Nov 2024
https://github.com/omarsar/nlp_pycon
Material for PyCon 2019 NLP Tutorial
deep machine-learning nlp pytorch
Last synced: 28 Oct 2024
https://github.com/pyunits/pyunit-ner
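NER (named entity recognition) model: fast, efficient, and simple, with one-click Docker deployment for serving and calling the model. It recognizes address, person-name, and organization-name entities.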
Last synced: 12 Nov 2024
https://github.com/georgezouq/awosome-ai-in-social-media
💻 A collection of AI and bot applications in social media (WeChat, Facebook, Twitter, Instagram, Weibo, TikTok, etc.)
facebook ins nlp social-media social-network social-network-analysis twitter wechat
Last synced: 10 Nov 2024
https://github.com/wri-dssg-omdena/policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
active-learning bert data-science document-classification environmental huggingface incentives landscape-restoration lda machine-learning nlp policy sbert scraping scrapy sentence-transformers spyder text-classification topic transformers
Last synced: 30 Oct 2024
https://github.com/Ermlab/PoLitBert
Polish RoBERTa model trained on Polish literature, Wikipedia, and OSCAR. The main assumption is that high-quality text will yield a good model.
nlp polish roberta text-corpus
Last synced: 11 Nov 2024
https://github.com/miroozyx/BERT_with_keras
A Keras version of Google's BERT model
bert deep-learning nlp tensorflow
Last synced: 02 Nov 2024
https://github.com/riccorl/transformers-embedder
A Word Level Transformer layer based on PyTorch and 🤗 Transformers.
allennlp bert deep-learning embeddings hidden-states huggingface huggingface-transformers language-model natural-language-processing nlp preprocess pretrained-models python pytorch sentences tokenizer transformer transformer-embedder transformers transformers-embedder
Last synced: 08 Nov 2024
https://github.com/sayakpaul/bert-for-mobile
Compares the DistilBERT and MobileBERT architectures for mobile deployments.
bert distilbert mobile mobile-bert nlp tensorflow-lite
Last synced: 23 Oct 2024
https://github.com/nitotm/efficient-language-detector-js
Fast and accurate natural language detection. Detector written in Javascript. Nito-ELD, ELD.
javascript language language-detection language-detector language-identification natural-language natural-language-processing nlp nodejs
Last synced: 12 Oct 2024
https://github.com/pszemraj/ai-msgbot
Training & Implementation of chatbots leveraging GPT-like architecture with the aitextgen package to enable dynamic conversations.
ai aitextgen chat-application chatbot deep-learning deepspeed deployment gpt-2 gpt-j gpt-j-6b gradio huggingface huggingface-transformers natural-language-processing nlp nlp-parsing telegram telegram-bot text-generation transformers
Last synced: 03 Oct 2024
https://github.com/alan-turing-institute/robots-in-disguise
Information and materials for the Turing's "robots-in-disguise" reading group on fundamental AI research.
deep-learning diffusion-models foundation-model hut23 language-models large-language-models machine-learning nlp transformers
Last synced: 13 Nov 2024
https://github.com/dzieciou/pystempel
Python port of Stempel, an algorithmic stemmer for Polish language.
Last synced: 26 Oct 2024
https://github.com/nlpir-team/nlpir-python
NLPIR-python: a Python wrapper and toolkit for NLPIR
Last synced: 14 Nov 2024
https://github.com/google-research-datasets/swim-ir
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask prompting.
cross-lingual datasets deep-learning information-retrieval machine-learning multilingual natural-language-processing neural-information-retrieval nlp training-data
Last synced: 08 Nov 2024
https://github.com/zlsh80826/msmarco
Machine comprehension model trained on MSMARCO with an S-NET extraction modification
cntk extraction-net machine-comprehension msmarco nlp question-answering s-net
Last synced: 28 Oct 2024
https://github.com/aliosm/simplerepresentations
Easy-to-use text representations extraction library based on the Transformers library.
Last synced: 27 Oct 2024
https://github.com/JackHCC/Chinese-Tokenization
Chinese word segmentation implemented with traditional methods (n-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-trained methods (BERT, etc.)
bert-crf bilstm-crf hmm-viterbi-algorithm ngram nlp tokenization
Last synced: 18 Nov 2024
https://github.com/hellohaptik/HINT3
This repository contains datasets and code for the paper "HINT3: Raising the bar for Intent Detection in the Wild", accepted at the EMNLP 2020 Insights Workshop (https://insights-workshop.github.io/). A preprint of the paper is available at https://arxiv.org/abs/2009.13833.
conversational-ai datasets dialogue-systems nlp
Last synced: 16 Nov 2024
https://github.com/centre-for-humanities-computing/tweetopic
Blazing fast topic modelling for short texts.
dirichlet-process-mixtures dmm gibbs-sampling gsdmm machine-learning mcmc nlp python scikit-learn topic-modeling tweet tweet-analysis visualization
Last synced: 16 Nov 2024
https://github.com/uminosachi/open-llm-webui
This repository contains a web application designed to execute relatively compact, locally-operated Large Language Models (LLMs).
chatbot ggml gradio huggingface language-model llama llama2 llama3 llava llava-llama3 llm nlp transformers
Last synced: 10 Oct 2024
https://github.com/staticdev/human-readable
Library to make data intended for machines readable to humans.
formatting humanizable humanization natural-language-processing nlp readable
Last synced: 16 Nov 2024
https://github.com/vasilescur/parse_context
Use GPT-3 to process human conversations and extract context, identify information that would be useful, and suggest data sources to get that information. Intended for a voice assistant.
ai assistants gpt-3 natural-language nlp semantic-analysis
Last synced: 16 Nov 2024
https://github.com/thisiscetin/textoken
Simple and customizable text tokenization gem.
Last synced: 07 Nov 2024
https://github.com/oneoffcoder/docker-containers
A collection of pedantic Docker containers.
deep-learning docker-containers docker-images jupyter nlp object-detection python raspberry-pi yolo
Last synced: 05 Nov 2024
https://github.com/hyunwoongko/bert2bert-summarization
Abstractive summarization using Bert2Bert framework.
Last synced: 28 Oct 2024
https://github.com/princeton-vl/attach-juxtapose-parser
Code for the paper "Strongly Incremental Constituency Parsing with Graph Neural Networks"
machine-learning neurips-2020 nlp parsing
Last synced: 09 Nov 2024
https://github.com/eimg/myanmar-text-breaker
Syllable and word breaking (boundary segmentation) for Myanmar text in JavaScript
Last synced: 25 Oct 2024
https://github.com/peaceiris/actions-suggest-related-links
A GitHub Action to suggest related or similar issues, documents, and links. Based on the power of NLP and fastText.
actions fasttext github-actions issue-management nlp
Last synced: 31 Oct 2024
https://github.com/proycon/analiticcl
An approximate string matching (fuzzy matching) system for spelling correction, normalisation, or post-OCR correction
approximate-string-matching fuzzy-matching nlp normalization spelling-correction
Last synced: 14 Nov 2024
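A common building block for fuzzy matching in spelling correction is Levenshtein edit distance: the minimum number of single-character insertions, deletions, and substitutions that turn one string into another. A minimal sketch of a generic edit-distance matcher; it is not necessarily the approach analiticcl itself uses, and `best_matches` and its `max_distance` threshold are hypothetical:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning `a` into `b`."""
    # Dynamic programming, keeping only the previous row of the table.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def best_matches(word, lexicon, max_distance=2):
    """Return lexicon entries within `max_distance` edits of `word`, closest first."""
    scored = sorted((levenshtein(word, entry), entry) for entry in lexicon)
    return [entry for dist, entry in scored if dist <= max_distance]


print(levenshtein("kitten", "sitting"))                          # 3
print(best_matches("speling", ["spelling", "spewing", "apple"]))  # ['spelling', 'spewing']
```

Comparing a query against every lexicon entry is linear in the lexicon size; practical systems usually add an index or hashing step to prune candidates before scoring them.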
https://github.com/Smat26/Roman-Urdu-Dataset
Compilation of Manually Tagged Roman Urdu Dataset (Urdu written in Latin/Roman Script), along with other helpful Roman Urdu NLP resources
data-science dataset hindi hindi-language natural-language-processing nlp urdu urdu-language urdu-nlp
Last synced: 04 Aug 2024
https://github.com/codewithzichao/deepclassifier
DeepClassifier aims to be a general-purpose library of text classification models. It makes building any text classification task easy and user-friendly.
deep-learning deepclassifier nlp pytorch text-classification torch
Last synced: 07 Nov 2024