Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Natural language processing
Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
- GitHub: https://github.com/topics/nlp
- Wikipedia: https://en.wikipedia.org/wiki/Natural_language_processing
- Created by: Alan Turing
- Aliases: natural-language-processing, nlp-machine-learning, nlp-resources
- Last updated: 2024-07-29 13:51:14 UTC
https://github.com/s-nlp/rudetoxifier
Code and data for the paper "Methods for Detoxification of Texts for the Russian Language"
nlp russian-language style-transfer
Last synced: 07 Aug 2024
https://github.com/natasha/naeval
Comparing quality and performance of NLP systems for Russian language
evaluation nlp performance-analysis python russian
Last synced: 07 Aug 2024
https://github.com/jonathanbratt/RBERTviz
Visualization tools to use with RBERT
bert htmlwidgets natural-language-processing nlp rstats rstudio tensorflow
Last synced: 05 Aug 2024
https://github.com/ai-forever/model-zoo
NLP model zoo for Russian
bert nlp pytorch roberta roberta-model russian russian-language t5 t5-model transformers
Last synced: 07 Aug 2024
https://github.com/coosto/dutch-word-embeddings
Dutch word embeddings, trained on a large collection of Dutch social media messages and news/blog/forum posts.
coosto dutch nlp word2vec word2vec-model wordembeddings
Last synced: 03 Aug 2024
https://github.com/aphp/eds-pseudo
EDS-Pseudo is a hybrid model for detecting personally identifying entities in clinical reports
Last synced: 03 Sep 2024
https://github.com/Lipairui/textgo
Text preprocessing, representation, similarity calculation, text search and classification. Let's go and play with text!
bert nlp text-classification text-preprocessing text-representation text-search text-similarity
Last synced: 07 Aug 2024
https://github.com/nlpcloud/nlpcloud-js
NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, code generation, and much more...
ad-generator chatbot code-generation conversational-ai embeddings intent-classification keywords-extraction language-detection machine-translation ner nlp paraphrasing question-answering semantic-similarity sentiment-analysis text-classification text-generation text-summarization tokenization
Last synced: 01 Aug 2024
https://github.com/OpenSextant/Xponents
Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters.
document-conversion geocoding geonames geoparsing geotagging information-extraction nlp solr tika
Last synced: 01 Aug 2024
https://github.com/shineware/PyKOMORAN
(Beta) PyKOMORAN is a Python wrapper for KOMORAN, built with Py4J.
komoran korean korean-analysis korean-nlp korean-text-processing korean-tokenizer morphological-analyser nlp py4j pypi-packages
Last synced: 02 Aug 2024
https://github.com/edwardcooper/piidetect
A package to build an end-to-end pipeline for detecting personally identifiable information from text.
nlp pii pii-detection word2vec
Last synced: 02 Aug 2024
https://github.com/lizadaly/blackout
NaNoGenMo 2016 entry #2
blackout grammar nlp ocr tesseract-ocr tracery tracery-grammar
Last synced: 02 Aug 2024
https://github.com/saareliad/FTPipe
FTPipe and related pipeline model parallelism research.
deep-neural-networks distributed-training fine-tuning nlp pipeline-parallelism t5
Last synced: 01 Aug 2024
https://github.com/sanghviharshit/pocket-tagger
📖👓🏷Tag your getpocket.com articles automatically using natural language processing
articles getpocket google-cloud natural-language-processing nlp pocket scraper tag
Last synced: 31 Jul 2024
https://github.com/IndexFziQ/KMRC-Papers
A list of recent papers regarding knowledge-based machine reading comprehension.
knowledge knowledge-base machine-reading-comprehension nlp paper reading-comprehension
Last synced: 02 Aug 2024
https://github.com/GeekDream-x/SemEval2022-Task8-TonyX
Deep-learning system proposed by HFL for SemEval-2022 Task 8: Multilingual News Similarity
computational-linguistics cross-lingual crosslingual deep-learning machine-learning multi-lingual multilingual natural-language-processing nlp paper semantic-similarity semeval-2022 xlm-roberta
Last synced: 03 Aug 2024
https://github.com/adamlui/autoclear-chatgpt-history
🕶️ Adds chat auto-clear functionality to ChatGPT for more privacy
artificial-intelligence chat chatbot chatgpt chatgpt3 gpt gpt-3 gpt-4 greasemonkey javascript machine-learning ml nlp openai privacy userscripts
Last synced: 31 Jul 2024
https://github.com/danieldeutsch/repro
Repro is a library for easily running code from published papers via Docker.
docker machine-learning nlp reproducibility reproducible-research
Last synced: 01 Aug 2024
https://github.com/TheHamkerCat/python-arq
Asynchronous Python Wrapper For A.R.Q API.
api api-wrapper arq chatbot-api deezer deezer-api fastapi natural-language-processing nlp pornhub-api python-arq saavn spam-classification spam-detection spellcheck torrent-api wallpaper-api youtube-api
Last synced: 09 Aug 2024
https://github.com/megagonlabs/t5-japanese
Code to pre-train Japanese T5 models
natural-language-processing nlp t5 transformer
Last synced: 02 Aug 2024
https://github.com/kinosal/cowriter
Write 10x faster using OpenAI's GPT-3 based Davinci model to autocomplete your text
Last synced: 31 Jul 2024
https://github.com/ysy1216/firewallm
By calling FirewaLLM, users can preserve the accuracy of the large model while greatly reducing the risk of privacy leakage when interacting with it. We believe that FirewaLLM is a privacy-protecting ChatGPT interaction platform.
chatbot chatgpt firewall flask llm nlp privacy python web
Last synced: 02 Aug 2024
https://github.com/Flight-School/sentences
A command-line utility that splits natural language text into sentences.
cli macos nlp sentence-tokenizer swift
Last synced: 05 Aug 2024
https://github.com/leoneversberg/llm-chatbot-rag
A local LLM chatbot with RAG for PDF input files
Last synced: 08 Aug 2024
https://github.com/winkjs/wink-naive-bayes-text-classifier
Naive Bayes Text Classifier
chatbot classifier machine-learning naive-bayes natural-language-processing nlp sentiment-analysis text-classification winkjs winknlp
Last synced: 31 Jul 2024
https://github.com/ahmedbesbes/audiolizr
A bentoML-powered API to transcribe audio and make sense of it
bentoml bentoml-service docker nlp openai openai-whisper pytube speech-recognition t5 torch transformers
Last synced: 07 Aug 2024
https://github.com/dair-ai/notebooks
🔬 Sharing your data science notebooks with the community has never been this easy.
artificial-intelligence deep-learning machine-learning nlp
Last synced: 01 Aug 2024
https://github.com/rosette-api/python
Rosette API Client Library for Python
categorization entity-extraction fuzzy-matching language-detection language-identification lemmatization machine-learning morphology name-generation name-similarity name-translation natural-language-processing nlp python relation-extraction sentiment-analysis text text-analysis text-mining tokenization
Last synced: 03 Aug 2024
https://github.com/ropensci-archive/geoparser
:no_entry: ARCHIVED :no_entry:
geocoding geoparser nlp peer-reviewed r r-package rstats
Last synced: 05 Aug 2024
https://github.com/GermanT5/wikipedia2corpus
Wikipedia text corpus for self-supervised NLP model training
corpus german-nlp machine-learning nlp somajo wikipedia wikipedia-corpus
Last synced: 31 Jul 2024
https://github.com/syzer/sentiment-analyser
ML that can extract German and English sentiment
english german nlp nlp-library node-js nodejs sentiment-analyser sentiment-analysis
Last synced: 01 Aug 2024
https://github.com/ardauzunoglu/rte-speech-generator
Natural Language Processing to generate new speeches for the President of Turkey.
natural-language-processing nlp politics python speech-processing tensorflow turkce turkish turkish-nlp
Last synced: 02 Aug 2024
https://github.com/MiuLab/FlowDelta
FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension
machine-comprehension nlp pytorch question-answering
Last synced: 07 Aug 2024
https://github.com/bastienbot/nlp-js-tools-french
POS tagger, lemmatizer and stemmer for the French language in JavaScript
lemmatization lemmatizer nlp postagging postgresql stemmer stemming tokenization tokenizer
Last synced: 28 Aug 2024
https://github.com/thisisiron/nmt-attention-tf2
👫 Effective Approaches to Attention-based Neural Machine Translation implemented in TensorFlow 2.0
attention lstm natural-language-processing neural-machine-translation nlp nmt tensorflow tensorflow2 tf2 translation
Last synced: 31 Jul 2024
https://github.com/mchmarny/tsignal
Analyzing social media sentiment and its impact on stock market
analytics golang nasdaq nlp sentiment-analysis twitter
Last synced: 03 Aug 2024
https://github.com/promptable/Promptable-web-sdk
Web SDK for Promptable Website.
ai chaining compose gpt-3 llm model nlp prompt promptable promptengineering prompting prompts
Last synced: 03 Sep 2024
https://github.com/cocoa-ai/NamesCoreMLDemo
🏷 iOS11 demo application for predicting gender from first names.
classification coreml coreml-models gender-classification ios machine-learning nlp swift swift4
Last synced: 09 Aug 2024
https://github.com/aws-solutions/content-localization-on-aws
Automatically generate multi-language subtitles using AWS AI/ML services. Machine generated subtitles can be edited to improve accuracy and downstream tracks will automatically be regenerated based on the edits. Built on Media Insights Engine (https://github.com/awslabs/aws-media-insights-engine)
amazon-comprehend amazon-polly amazon-transcribe amazon-translate audio aws-media-insights-engine captions content-analysis localisation localization media mie nlp nlp-machine-learning speech-to-text subtitles video video-on-demand vod
Last synced: 01 Aug 2024
https://github.com/johncmunson/react-taggy
A simple zero-dependency React component for tagging user-defined entities within a block of text.
component entities named-entity-recognition natural-language ner nlp react react-component
Last synced: 28 Aug 2024
https://github.com/Lingkai-Kong/Calibrated-BERT-Fine-Tuning
Code for Paper: Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data
bert calibration deep-learning language-model nlp nlp-machine-learning ood-detection open-world-classification robustness text-classification uncertainty-estimation uncertainty-quantification
Last synced: 02 Aug 2024
https://github.com/rainmaker712/nlp_ryan
Study for Natural Language Processing & Deep Learning Framework
chatbot deep-learning machine-comprehension machine-learning nlp python pytorch scala spark tensorflow
Last synced: 02 Aug 2024
https://github.com/miroozyx/BERT_with_keras
A Keras version of Google's BERT model
bert deep-learning nlp tensorflow
Last synced: 01 Aug 2024
https://github.com/Ermlab/PoLitBert
Polish RoBERTa model trained on Polish literature, Wikipedia, and Oscar. The major assumption is that quality text will give a good model.
nlp polish roberta text-corpus
Last synced: 02 Aug 2024
https://github.com/wri-dssg-omdena/policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
active-learning bert data-science document-classification environmental huggingface incentives landscape-restoration lda machine-learning nlp policy sbert scraping scrapy sentence-transformers spyder text-classification topic transformers
Last synced: 31 Jul 2024
https://github.com/JackHCC/Chinese-Tokenization
Chinese word segmentation implemented with traditional methods (N-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-training methods (BERT, etc.)
bert-crf bilstm-crf hmm-viterbi-algorithm ngram nlp tokenization
Last synced: 03 Aug 2024
https://github.com/dzieciou/pystempel
Python port of Stempel, an algorithmic stemmer for the Polish language.
Last synced: 31 Jul 2024
https://github.com/hellohaptik/HINT3
This repository contains datasets and code for the paper "HINT3: Raising the bar for Intent Detection in the Wild", accepted at EMNLP-2020's Insights Workshop (https://insights-workshop.github.io/). A preprint of the paper is available at https://arxiv.org/abs/2009.13833
conversational-ai datasets dialogue-systems nlp
Last synced: 03 Aug 2024
https://github.com/mananshah99/sentR
Simple sentiment analysis framework for R
Last synced: 05 Aug 2024
https://github.com/Furyton/awesome-language-model-analysis
This paper list focuses on the theoretical and empirical analysis of language models, especially large language models (LLMs). The papers in this list investigate the learning behavior, generalization ability, and other properties of language models through theoretical analysis, empirical analysis, or a combination of both.
ai analysis analytics awesome chatgpt deep-learning generative-ai large-language-models llm nlp theory transformers
Last synced: 19 Sep 2024
https://github.com/Smat26/Roman-Urdu-Dataset
Compilation of Manually Tagged Roman Urdu Dataset (Urdu written in Latin/Roman Script), along with other helpful Roman Urdu NLP resources
data-science dataset hindi hindi-language natural-language-processing nlp urdu urdu-language urdu-nlp
Last synced: 04 Aug 2024
https://github.com/pooya-mohammadi/persian-spell-checker-kenlm
Complete instructions for training a Persian spell checker and a language model, based on SymSpell and KenLM respectively, using a Wikipedia dataset.
bash kenlm language-model nlp persian python spellcheck spellchecker symspell
Last synced: 04 Aug 2024
https://github.com/tangbinh/question-answering
bidaf drqa nlp pytorch question-answering squad
Last synced: 02 Aug 2024
https://github.com/benjaminvdb/DBRD
110k Dutch Book Reviews Dataset for Sentiment Analysis
dataset dataset-creation dutch nlp nlp-machine-learning python python3 scraped-data scraper
Last synced: 03 Aug 2024
https://github.com/stevenay/myan-word-breaker
Myanmar Word Segmentation Tool
Last synced: 30 Jul 2024
https://github.com/Qznan/QizNLP
A TensorFlow framework for quickly running NLP tasks such as classification, sequence labeling, matching, and generation (Chinese NLP, with distributed training support)
beam-search chinese classification horovod match nlp sequence-labeling sequence-to-sequence tensorflow
Last synced: 03 Aug 2024
https://github.com/sedthh/lara-hungarian-nlp
NLP class for rapid chatbot development in the Hungarian language
chatbot hungarian hungarian-language lemmatizer nlp python3 stemmer
Last synced: 03 Aug 2024
https://github.com/andreaferretti/charade
A server for multilanguage, composable NLP API in Python
Last synced: 05 Aug 2024
https://github.com/akosbalasko/obsidian-autotagger-plugin
This plugin offers smart tags for notes by performing Named Entity Recognition (NER) on the content
natural-language-processing nlp obsidian-md obsidian-plugin
Last synced: 07 Aug 2024
https://github.com/wannaphong/laonlp
Lao language NLP
hacktoberfest lao lao-language natural-language-processing nlp nlp-library python
Last synced: 04 Aug 2024
https://github.com/nschneid/amr-hackathon
Abstract Meaning Representation (AMR) Hackathon
abstract-meaning-representation computational-linguistics natural-language-processing nlp python semantics
Last synced: 31 Jul 2024
https://github.com/eimg/myanmar-text-breaker
Syllable and word breaker (boundary segmentation) for Myanmar text in JavaScript
Last synced: 30 Jul 2024
https://github.com/eimg/burmese-text-classifier
A neural network based text classification system for Burmese
Last synced: 30 Jul 2024
https://github.com/rosette-api/rosette-elasticsearch-plugin
Document Enrichment plugin for Elasticsearch
categorization elasticsearch elasticsearch-plugin entity-extraction fuzzy-name-matching fuzzy-search identity-resolution machine-learning named-entity-recognition natural-language-processing nlp rosette-plugin sentiment-analysis text-analytics text-mining
Last synced: 03 Aug 2024
https://github.com/laugustyniak/textlytics
Text processing library for sentiment analysis and related tasks
classification natural-language-processing nlp opinion-mining scikit-learn sentiment-analysis supervised-learning word-embeddings
Last synced: 03 Aug 2024
https://github.com/fredriko/bert-tensorflow-pytorch-spacy-conversion
Instructions for how to convert a BERT Tensorflow model to work with HuggingFace's pytorch-transformers, and spaCy. This walk-through uses DeepPavlov's RuBERT as example.
bert bert-model how-to keras nlp pytorch-transformers spacy spacy-models spacy-nlp spacy-package spacy-pytorch-transformers tensorflow
Last synced: 07 Aug 2024
https://github.com/swanhtet1992/ReSegment
Burmese (Myanmar) syllable level segmentation with regex.
burmese-nlp myanmar-nlp myanmar-text nlp segmentation
Last synced: 30 Jul 2024
https://github.com/agatan/yoin
A Japanese Morphological Analyzer written in pure Rust
Last synced: 01 Aug 2024
https://github.com/loristns/Wisty.js
🧚‍♀️ Chatbot library turning conversations into actions, locally, in the browser.
assistant bot bot-framework chatbot chatbots conversational-agents conversational-ai dialogue-systems hybrid-code-networks javascript machine-learning named-entity-recognition natural-language-processing nlp nlu tensorflow tensorflowjs
Last synced: 01 Aug 2024
https://github.com/yasinkuyu/Turkish.cs
Turkish Suffix Library for C# & .NET: Turkish inflectional and derivational suffixes
Last synced: 02 Aug 2024
https://github.com/luoyuanlab/text_gcn_tutorial
A tutorial & minimal example (8min on CPU) for Graph Convolutional Networks for Text Classification. AAAI 2019
deep-learning graph-convolutional-networks nlp text-classification
Last synced: 01 Aug 2024
https://github.com/kargaranamir/parstdex
A package that extracts Persian time and date markers by applying regexes -- AACL 2022
datetime event-extract event-extraction hengam hengamtagger information-extraction nlp parstdex persian persian-calendar persian-datetime persian-time regex-pattern time-date
Last synced: 04 Aug 2024
https://github.com/Praful932/llmsearch
Find better generation parameters for your LLM
llm llm-evaluation llm-inference nlp
Last synced: 01 Aug 2024
https://github.com/anoopkunchukuttan/geomm
Geometry-aware Multilingual Embeddings
bilingual-word-embedding multilingual nlp translation word-embedding
Last synced: 03 Aug 2024
https://github.com/ElizaLo/Question-Answering-based-on-SQuAD
Question Answering System using BiDAF Model on SQuAD v2.0
bidaf machine-learning natural-language-processing natural-language-understanding neural-network nlp nlp-datasets nlp-machine-learning python python-3-6 question-answering squad
Last synced: 02 Aug 2024
https://github.com/shrebox/Personified-Chatbot
A personified chatbot responding to a query based on the answering pattern of Dr. APJ Abdul Kalam using Information Retrieval, Natural Language Processing, and Deep Learning techniques.
apj-abdul-kalam chatbot deep-learning information-retrieval lstm natural-language-processing nlp ranking-algorithm seq2seq-chatbot seq2seq-model summarization word2vec
Last synced: 02 Aug 2024
https://github.com/quickgrid/AI-Resources
Research Paper Summaries, Setup & Performance Notes, Resource Links on AI, Deep Learning, NLP, Computer Vision for my learning.
ai ai-notes ai-research blender computer-vision deep-learning nlp paper-summaries papers research-paper research-paper-summaries
Last synced: 01 Aug 2024
https://github.com/davidsvy/Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
dataset deduplication fine-tuning fraud gpt2 huggingface lsh minhash nlp pytorch readability scam transformer web-scraping
Last synced: 05 Aug 2024
https://github.com/PhilipMay/stsb-multi-mt
Machine translated multilingual STS benchmark dataset.
Last synced: 03 Aug 2024
https://github.com/minerva-ml/steppy-toolkit
Curated set of transformers that make your work with steppy faster and more effective :telescope:
data-science deep-learning keras keras-models machine-learning nlp open-source pipeline pipeline-framework python python3 pytorch pytorch-models reproducibility reproducible-research steppy steppy-toolkit steps tensorflow tensorflow-models
Last synced: 31 Jul 2024
https://github.com/AmrHendy/programming-language-translator
An easy way to use TransCoder, released by Facebook AI Research, to convert code from one programming language to another. TransCoder applies unsupervised neural machine translation (NMT), a deep-learning technique originally used to translate text between natural languages, and is trained only on monolingual source data.
machine-translation nlp programming-language transcoder transformer unsupervised-deep-learning unsupervised-translation
Last synced: 06 Aug 2024
https://github.com/KGCP/MEL-TNNT/
Metadata Extractor & Loader (MEL) ■ The NLP-NER Toolkit (TNNT)
metadata-extraction named-entity-recognition natural-language-processing nlp nlp-ner pipeline
Last synced: 14 Aug 2024
https://github.com/undertheseanlp/slp3-vietnamese
Speech and Language Processing 3rd edition Vietnamese Translation
book-translation nlp vietnamese-nlp
Last synced: 01 Aug 2024
https://github.com/jawahar273/practNLPTools-lite
Practical Natural Language Processing Tools for Humans is built on top of Senna Natural Language Processing (NLP) predictions: part-of-speech (POS) tags, chunking (CHK), named entity recognition (NER), semantic role labeling (SRL) and syntactic parsing (PSG), with skip-gram, all in Python. More features will be added. The website given is for downloading the Senna tool.
nlp practnlptools3 senna senna-nlp
Last synced: 07 Aug 2024
https://github.com/hscspring/ALL4AI
AI Related Tools/Projects
ai jupyter linux machine-learning nlp python ssh toolbox
Last synced: 01 Aug 2024
https://github.com/kampersanda/tongrams-rs
Rust library providing fast language model queries in compressed space
compression elias-fano language-model ngrams nlp trie
Last synced: 02 Aug 2024
https://github.com/senthilchandrasegaran/textplorer
Visual analytics application for qualitative text analysis
nlp text-visualization visual-analytics
Last synced: 31 Jul 2024
https://github.com/amsqr/NaiveSumm
NaiveSumm is a naive summarization approach based on Luhn's 1958 work "The Automatic Creation of Literature Abstracts". It uses the frequencies of words in the document to score and extract the sentences that include the most frequent words.
natural-language-processing nlp python summarization
Last synced: 31 Jul 2024
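The frequency-based idea in the NaiveSumm entry above can be sketched roughly as follows. This is not NaiveSumm's code and it does not reproduce Luhn's full significance measure; it is a minimal illustration that scores each sentence by the summed document frequency of its words. The function names, the tiny stop-word list, and the crude sentence splitter are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "on"}


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (a crude rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Sum of document-level frequencies of the sentence's words.
        return sum(freq[w] for w in tokenize(sentence))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    sample = "Cats sleep a lot. Cats and dogs chase cats and dogs. Birds sing."
    print(summarize(sample, n_sentences=1))
```

Summing raw frequencies favours long sentences; a fuller treatment would normalise by sentence length or, as in Luhn's paper, score clusters of significant words.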
https://github.com/korpling/pepper
A highly extensible platform for conversion and manipulation of linguistic data between an unbound set of formats. Pepper can be used stand-alone as a command line interface, or be integrated as an API into other software products.
annotations converter format java linguistic-formats linguistics nlp pepper
Last synced: 03 Aug 2024
https://github.com/derintelligence/en-az-parallel-corpus
English-Azerbaijani parallel language corpus
azerbaijan azerbaijani-translation corpus language linguistics nlp parallel translation
Last synced: 02 Aug 2024
https://github.com/yhy1117/x-mixup
Implementation of ICLR 2022 paper "Enhancing Cross-lingual Transfer by Manifold Mixup".
cross-lingual-transfer manifold-mixup nlp
Last synced: 28 Aug 2024
https://github.com/janekb04/py2gpt
Convert Python code into JSON consumable by OpenAI's function API.
ai api chatgpt converter function gpt gpt-4 json nlp openai openai-api python schema transcoding
Last synced: 01 Aug 2024