Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Natural language processing
Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced results in language modeling, parsing, and other natural-language tasks.
- GitHub: https://github.com/topics/nlp
- Wikipedia: https://en.wikipedia.org/wiki/Natural_language_processing
- Created by: Alan Turing
- Aliases: natural-language-processing, nlp-machine-learning, nlp-resources
- Last updated: 2025-02-20 00:20:37 UTC
https://github.com/brianspiering/nlp-course
An introduction to Natural Language Processing (NLP) course
machine-learning natural-language-processing nlp python
Last synced: 07 Nov 2024
https://github.com/ownthink/chatbot
A chatbot based on semantic understanding and knowledge graphs
chatbot knowledgegraph nlp nlu qa
Last synced: 07 Nov 2024
https://github.com/explosion/vscode-prodigy
🧬 A VS Code extension for annotating data with Prodigy
annotation-tool data-annotation data-labeling data-labeling-tools data-science labeling-tool nlp prodigy spacy vscode vscode-extension
Last synced: 04 Feb 2025
https://github.com/dalmia/quora-question-pairs
The code for our submission in Kaggle's competition Quora Question Pairs which ranked in the top 25%.
deep-learning machine-learning nlp quora-question-pairs tensorflow
Last synced: 30 Oct 2024
https://github.com/binsarjr/chatbot-indonesia
A dataset for Indonesian-language chatbots, with simple chatbot code written in TypeScript
bot chatbot chatbot-indonesia hacktoberfest nlp text-processing
Last synced: 10 Dec 2024
https://github.com/Qznan/QizNLP
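Quickly run many NLP tasks: a TensorFlow framework for classification, sequence labeling, matching, generation, and other NLP tasks (Chinese NLP, with distributed training support)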
beam-search chinese classification horovod match nlp sequence-labeling sequence-to-sequence tensorflow
Last synced: 16 Nov 2024
https://github.com/sudharsan13296/word2vec-from-scratch
simple Word2vec from scratch using tensorflow for understanding
deep-learning natural-language-processing nlp scratch word2vec word2vec-algorithm word2vec-model
Last synced: 15 Nov 2024
https://github.com/thunlp/cokebert
CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models
bert knowledge-graph nlp pretrained-language-model pytorch
Last synced: 10 Nov 2024
https://github.com/ganjinzero/triaffine-nested-ner
Fusing Heterogeneous Factors with Triaffine Mechanism for Nested Named Entity Recognition [ACL 2022 Findings]
Last synced: 22 Nov 2024
https://github.com/sarthakjshetty/pyresearchinsights
End-to-end NLP tool to analyze research publications. Published in Ecology & Evolution 2021.
gensim natural-language-processing nlp python scientific-analysis spacy text-mining
Last synced: 26 Dec 2024
https://github.com/google-marketing-solutions/ml_toast
Cluster multilingual search terms captured from different time windows into semantically relevant topics.
data-science machine-learning marketing-science nlp tensorflow topic-clustering
Last synced: 05 Dec 2024
https://github.com/nicolasassi/gomtch
Find text even if it doesn't want to be found
nlp text-mining text-processing
Last synced: 23 Nov 2024
https://github.com/benjaminvdb/DBRD
110k Dutch Book Reviews Dataset for Sentiment Analysis
dataset dataset-creation dutch nlp nlp-machine-learning python python3 scraped-data scraper
Last synced: 17 Nov 2024
https://github.com/tangbinh/question-answering
bidaf drqa nlp pytorch question-answering squad
Last synced: 13 Nov 2024
https://github.com/qznan/qiznlp
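Quickly run many NLP tasks: a TensorFlow framework for classification, sequence labeling, matching, generation, and other NLP tasks (Chinese NLP, with distributed training support)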
beam-search chinese classification horovod match nlp sequence-labeling sequence-to-sequence tensorflow
Last synced: 17 Feb 2025
https://github.com/songyouwei/fiction_generator
Fiction generator with TensorFlow that imitates the style of Wang Xiaobo
deep-learning keras lstm nlp seq2seq tensorflow text-generation
Last synced: 11 Nov 2024
https://github.com/wannaphong/laonlp
Lao language NLP
hacktoberfest lao lao-language natural-language-processing nlp nlp-library python
Last synced: 17 Feb 2025
https://github.com/stevenay/myan-word-breaker
Myanmar Word Segmentation Tool
Last synced: 25 Oct 2024
https://github.com/akosbalasko/obsidian-autotagger-plugin
This plugin offers smart tags for notes by performing Named Entity Recognition (NER) on the content
natural-language-processing nlp obsidian-md obsidian-plugin
Last synced: 22 Oct 2024
https://github.com/ayaka14732/bart-base-jax
JAX implementation of the bart-base model
bart jax natural-language-processing nlp nlp-model
Last synced: 28 Oct 2024
https://github.com/maxent-ai/lda2vec
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019
chainer deep-learning embeddings lda nlp python3 sklearn text text-mining topic-modeling word-embeddings word2vec
Last synced: 25 Jan 2025
https://github.com/dbklim/stressrnn
Modified version of RusStress (https://github.com/MashaPo/russtress) — python package for placing stress in Russian text using RNN (BiLSTM) and the "Grammatical Dictionary" by A. A. Zaliznyak (from http://odict.ru/).
accent bilstm emphasis linguistic linguistics lstm nlp rnn russian russian-accent russian-stress russtress rustress stress
Last synced: 11 Nov 2024
https://github.com/eimg/burmese-text-classifier
A neural network based text classification system for Burmese
Last synced: 25 Oct 2024
https://github.com/arjunpatel7/perfect-prompt
An approach to creating the perfect prompt for any image generation task.
cohere nlp prompt stable-diffusion streamlit text-generation
Last synced: 24 Jan 2025
https://github.com/proycon/python-ucto
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
computational-linguistics folia nlp nlp-library python text-processing tokenizer
Last synced: 18 Feb 2025
https://github.com/97k/spam-ham-web-app
A web app that classifies text as spam or ham. It uses my own ML algorithm in the backend; the code for it can be found under machine_learning_section. For a live demo, check out this link.
bag-of-words data-visualization django heroku-deployment jupyter-notebook machine-learning machine-learning-projects multinomial-naive-bayes nlp nltk spam-classification text-classification tfidf
Last synced: 11 Nov 2024
https://github.com/PhilipMay/stsb-multi-mt
Machine translated multilingual STS benchmark dataset.
Last synced: 16 Nov 2024
https://github.com/anthonysigogne/web-search-engine-ui
UI - a simple web search engine
elasticsearch google-search indexing nlp python search-engine
Last synced: 12 Nov 2024
https://github.com/dsdanielpark/gpt2-bert-medical-qa-chat
Medical domain-focused GPT-2 fine-tuning, optimization, and lightweighting research repository (compared to GPT-4).
bert chatgpt gpt2 gpt4 medical-chatbot natural-language-processing nlp nlp-keywords-extraction
Last synced: 14 Nov 2024
https://github.com/ademakdogan/gpterm
Creating Intelligent Terminal Apps with ChatGPT and LLM Models
chatgpt chatgpt-api iterm2 langchain langchain-python natural-language-processing nlp python query-generator terminal
Last synced: 07 Nov 2024
https://github.com/zwhe99/selftraining4unmt
Implementation of our ACL 2022 paper "Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation"
machine-translation nlp unsupervised-learning
Last synced: 02 Dec 2024
https://github.com/centre-for-humanities-computing/embedding-explorer
Tools for interactive visual exploration of semantic embeddings.
clustering embedding embeddings interactive knowledge-graph machine-learning networks nlp projection semantic
Last synced: 19 Dec 2024
https://github.com/hscspring/pnlp
NLP pre-/post-processing tools.
chinese-nlp concurrency nlp nlp-enhancer nlp-preprocess normalization preprocessing text-cleaning text-extraction text-length text-processing
Last synced: 17 Jan 2025
https://github.com/sedthh/lara-hungarian-nlp
NLP class for rapid ChatBot development in Hungarian language
chatbot hungarian hungarian-language lemmatizer nlp python3 stemmer
Last synced: 17 Nov 2024
https://github.com/imkett/adavae
[Preprint] AdaVAE: Exploring Adaptive GPT-2s in VAEs for Language Modeling PyTorch Implementation
controllable-generation gpt-2 nlp parameter-efficient-tuning representation-learning text-classification text-generation vae variational-autoencoder
Last synced: 26 Nov 2024
https://github.com/flight-school/lemma
A command-line utility that lemmatizes words in natural language text.
cli lemmatization macos nlp swift
Last synced: 26 Nov 2024
https://gair-nlp.github.io/BeHonest/
BeHonest: Benchmarking Honesty in Large Language Models
alignment benchmark evaluation honesty llm nlp
Last synced: 12 Feb 2025
https://github.com/mideind/tokenizer
A tokenizer for Icelandic text
icelandic nlp python tokenizer
Last synced: 13 Feb 2025
https://github.com/shashwath94/hierarchical-seq2seq
A PyTorch implementation of the hierarchical encoder-decoder architecture (HRED) introduced in Sordoni et al. (2015) for modeling conversation triples. This version of the model is built for the MovieTriples dataset.
deep-learning hred nlp pytorch seq2seq-pytorch
Last synced: 03 Jan 2025
https://github.com/kampsy/gwizo
Simple Go implementation of the Porter Stemmer algorithm with powerful features.
consonants nlp nlp-stemming porter-stemmer-algorithm stemmer vowel
Last synced: 17 Dec 2024
https://github.com/t-systems-on-site-services-gmbh/german-elmo-model
This is a German ELMo deep contextualized word representation. It is trained on a special German Wikipedia Text Corpus.
bilm elmo embedding german machine-learning nlp python tensorflow
Last synced: 20 Nov 2024
https://github.com/vidhi1290/llm---detect-ai-generated-text
AI-Generated Text Detection: A BERT-powered solution for accurately identifying AI-generated text. Seamlessly integrated, highly accurate, and user-friendly.🚀
ai-generated bert bert-model detection-algorithm kaggle kaggle-competition llm machine-learning natural-language-processing nlp
Last synced: 08 Dec 2024
https://github.com/old-storyai/Story.ai
Notebook for building internal tools.
Last synced: 25 Nov 2024
https://github.com/old-storyai/story.ai
Notebook for building internal tools.
Last synced: 22 Jan 2025
https://github.com/gagolews/stringx
Drop-in replacements for base R string functions powered by stringi
icu icu4c natural-language-processing nlp r regex regexp string-manipulation stringi text text-processing unicode
Last synced: 19 Dec 2024
https://github.com/kodiks/turkish-news-classification
Turkish News Category Classification Tutorial
datasets huggingface machine-learning news-classification nlp svm-classifier text-classification tf-idf-vectorizer turkish-nlp
Last synced: 10 Feb 2025
https://github.com/andreaferretti/charade
A server for multilanguage, composable NLP API in Python
Last synced: 14 Oct 2024
https://github.com/anakin87/fact-checking-rocks
Fact checking baseline combining dense retrieval and textual entailment
fact-checking haystack huggingface-spaces information-retrieval natural-language-inference natural-language-processing neural-search nlp python semantic-search streamlit streamlit-webapp text-entailment transformers
Last synced: 22 Oct 2024
https://github.com/rosette-api/rosette-elasticsearch-plugin
Document Enrichment plugin for Elasticsearch
categorization elasticsearch elasticsearch-plugin entity-extraction fuzzy-name-matching fuzzy-search identity-resolution machine-learning named-entity-recognition natural-language-processing nlp rosette-plugin sentiment-analysis text-analytics text-mining
Last synced: 27 Oct 2024
https://github.com/saidziani/feedny
The Internet plays an increasingly important part in our daily lives as a source of written content for news and leisure. Yet it is tedious and difficult to sort through this staggering flow of information and stay updated with changes in our world, even using automated tools. Reading magazines and newspapers is too time-consuming, and there is a huge amount of online content that is updated or generated each minute. Our solution considers each user’s interests and leverages Artificial Intelligence, Machine Learning and Natural Language Processing to suggest relevant articles from the internet.
automatic-summarization javascript machine-learning machine-translation natural-language-processing nlp profiling react-native recommendation-system text-classification
Last synced: 28 Oct 2024
https://github.com/tianduowang/diffaug
EMNLP 2022: Differentiable Data Augmentation for Contrastive Sentence Representation Learning. https://arxiv.org/abs/2210.16536
data-augmentation nlp sentence-embeddings
Last synced: 14 Oct 2024
https://github.com/zimmerrol/attention-is-all-you-need-keras
Implementation of the Transformer architecture described by Vaswani et al. in "Attention Is All You Need"
attention-is-all-you-need keras neural-network nlp seq2seq transformer
Last synced: 22 Oct 2024
https://github.com/yuyuzha0/word2vec
A word2vec implementation for the Chinese language, based on deeplearning4j and ansj
chinese java nlp word2vec word2vec-zh
Last synced: 12 Nov 2024
https://github.com/vaibhavs10/10_days_of_deep_learning
10 days 10 different practical applications of Deep Learning (primarily NLP) using Tensorflow and Keras
classification gensim keras nlp python tensorflow tfidf-matrix
Last synced: 19 Dec 2024
https://github.com/ramtinms/tokenquery
TokenQuery (regular expressions over tokens)
machine-learning natural-language-processing nlp regex regular-expressions
Last synced: 11 Nov 2024
https://github.com/bloomberg/entsum
Open Source / ENTSUM: A Data Set for Entity-Centric Extractive Summarization
Last synced: 09 Nov 2024
https://github.com/adapter-hub/efficient-task-transfer
Research code for "What to Pre-Train on? Efficient Intermediate Task Selection", EMNLP 2021
adapters bert nlp roberta transfer-learning transformers
Last synced: 06 Nov 2024
https://github.com/ariya/tinker-chat
chatbot generative-ai gpt llama llama2 llm mistral nlp openai
Last synced: 08 Jan 2025
https://github.com/jonsafari/tok-tok
A fast, simple, multilingual tokenizer
multilingual nlp tokeniser tokenizer
Last synced: 18 Feb 2025
https://github.com/trainingbypackt/deep-learning-for-natural-language-processing
Solve your natural language processing problems with smart deep neural networks
deeplearning glove gru keras language lstm namedentityrecognizer natural nlp nlp-library nlp-machine-learning partsofspeechtagger textpreprocessing word2vec
Last synced: 14 Nov 2024
https://github.com/nschneid/amr-hackathon
Abstract Meaning Representation (AMR) Hackathon
abstract-meaning-representation computational-linguistics natural-language-processing nlp python semantics
Last synced: 08 Nov 2024
https://github.com/aqibsaeed/research-paper-categorization
Research paper classification using machine learning and NLP
machine-learning nlp text-classification
Last synced: 09 Nov 2024
https://github.com/praful932/llmsearch
Find better generation parameters for your LLM
llm llm-evaluation llm-inference nlp
Last synced: 27 Oct 2024
https://github.com/siphulangeni/tortus
A PyPI package for easy text annotation in a Jupyter Notebook.
annotation-tool ipywidgets jupyter-notebook labeling-tool nlp
Last synced: 08 Nov 2024
https://github.com/veler/notepad-based-calculator
A smart calculator using natural language processing
calculator csharp dotnet mef natural-language-processing nlp
Last synced: 29 Oct 2024
https://github.com/trashhalo/logseq-summarizer
Logseq plugin to summarize text
Last synced: 02 Nov 2024
https://github.com/sap-samples/acl2020-commonsense
Source code for a paper on commonsense reasoning at the 2020 Annual Conference of the Association for Computational Linguistics (ACL 2020).
commonsense-reasoning contrastive deep-learning machine-learning nlp sample sample-code self-supervised
Last synced: 15 Nov 2024
https://github.com/delonnewman/mini-levenshtein
Simple, fast Levenshtein distance and similarity ratio for Ruby
algorithms comparison fuzzy-matching levenshtein-distance natural-language-processing nlp ruby ruby-extension string-matching text
Last synced: 10 Dec 2024
https://github.com/anthonymrios/pyclausie
Python wrapper for ClausIE.
nlp open-information-extraction python relation-extraction
Last synced: 20 Nov 2024
https://github.com/microsoft/verseagility
Ramp up your custom natural language processing (NLP) task, allowing you to bring your own data, use your preferred frameworks and bring models into production.
classification dstoolkit machine-reading-comprehension ner nlp question-answering summarization transformer
Last synced: 04 Dec 2024
https://github.com/fredriko/bert-tensorflow-pytorch-spacy-conversion
Instructions for converting a BERT TensorFlow model to work with HuggingFace's pytorch-transformers and spaCy. This walk-through uses DeepPavlov's RuBERT as an example.
bert bert-model how-to keras nlp pytorch-transformers spacy spacy-models spacy-nlp spacy-package spacy-pytorch-transformers tensorflow
Last synced: 27 Nov 2024
https://github.com/shamspias/google-reviews-chatbot
The Google Reviews Chatbot fetches reviews via the Google My Business API, analyzes sentiments using GPT-3, and generates tailored responses. Deployed on AWS, it uses Elastic Beanstalk, EC2, and ElastiCache for Redis to run tasks on a schedule, ensuring seamless and efficient chatbot functionality.
ai artificial-intelligence automation celery chatbot flask google google-my-business google-reviews google-reviews-api gpt3 gpt4 natural-language-processing nlp nlp-machine-learning python python3 task-scheduler
Last synced: 31 Jan 2025
https://github.com/houbb/word-cloud
The word cloud tool for Java (an easy-to-use Java word cloud tool).
cloud image nlp word word-cloud wordcloud
Last synced: 07 Nov 2024
https://github.com/suicao/vn-accent-restorer
This project applies multiple deep learning models to the problem of restoring diacritical marks to sentences in Vietnamese.
deep-learning nlp tensorflow tensorflow-experiments
Last synced: 09 Feb 2025
https://github.com/griptape-ai/griptape-tools
Tools for the Griptape Framework.
ai cohere gpt huggingface llm nlp openai python
Last synced: 20 Jan 2025
https://github.com/laugustyniak/textlytics
Text processing library for sentiment analysis and related tasks
classification natural-language-processing nlp opinion-mining scikit-learn sentiment-analysis supervised-learning word-embeddings
Last synced: 18 Nov 2024
https://github.com/Praful932/llmsearch
Find better generation parameters for your LLM
llm llm-evaluation llm-inference nlp
Last synced: 08 Nov 2024
https://github.com/swanhtet1992/ReSegment
Burmese (Myanmar) syllable level segmentation with regex.
burmese-nlp myanmar-nlp myanmar-text nlp segmentation
Last synced: 25 Oct 2024
https://github.com/loomchild/maligna
Bilingual sentence aligner
nlp text-alignment translation
Last synced: 11 Jan 2025
https://github.com/jayyip/cws-tensorflow
A Chinese word segmentation model based on TensorFlow
nlp tensorflow word-segmentation
Last synced: 11 Nov 2024
https://github.com/agatan/yoin
A Japanese Morphological Analyzer written in pure Rust
Last synced: 05 Nov 2024
https://github.com/loristns/wisty.js
🧚♀️ Chatbot library turning conversations into actions, locally, in the browser.
assistant bot bot-framework chatbot chatbots conversational-agents conversational-ai dialogue-systems hybrid-code-networks javascript machine-learning named-entity-recognition natural-language-processing nlp nlu tensorflow tensorflowjs
Last synced: 09 Feb 2025
https://github.com/philipmay/stsb-multi-mt
Machine translated multilingual STS benchmark dataset.
Last synced: 28 Oct 2024
https://github.com/gatenlp/gateplugin-learningframework
A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNNs through Pytorch and Keras.
classification crf machine-learning nlp sequence-tagging
Last synced: 13 Nov 2024
https://github.com/zimmerrol/show-attend-and-tell-keras
Keras implementation of the "Show, Attend and Tell" paper
attention-mechanism image-captioning keras lstm mscoco-image-dataset nlp rnn show-attend-and-tell tensorflow
Last synced: 22 Oct 2024
https://github.com/luoyuanlab/text_gcn_tutorial
A tutorial & minimal example (8min on CPU) for Graph Convolutional Networks for Text Classification. AAAI 2019
deep-learning graph-convolutional-networks nlp text-classification
Last synced: 02 Nov 2024
https://github.com/kavgan/clinical-concepts
Discovering Related Clinical Concepts using Large Amounts of Clinical Notes. An unsupervised graphical approach to mine related concepts by leveraging the volume within large amounts of clinical notes.
clinical-concepts clinical-nlp clinical-notes concept-graph graph-nlp nlp paper terminologies
Last synced: 09 Feb 2025
https://github.com/generall/entitycategoryprediction
Model for predicting categories of entities from their mentions
allennlp classification mentions nlp
Last synced: 14 Oct 2024
https://github.com/seanlee97/llano
Let ChatGPT (Large Language Models) Serve As Data Annotator and Zero-shot/few-shot Information Extractor.
annotataion annotator chatgpt chatie classification data-augmentation few-shot gpt gpt-3 gpt-4 information-extraction large-language-models llm ner nlp openai prompt prompt-engineering relation-extraction zero-shot
Last synced: 23 Jan 2025
https://github.com/loristns/Wisty.js
🧚♀️ Chatbot library turning conversations into actions, locally, in the browser.
assistant bot bot-framework chatbot chatbots conversational-agents conversational-ai dialogue-systems hybrid-code-networks javascript machine-learning named-entity-recognition natural-language-processing nlp nlu tensorflow tensorflowjs
Last synced: 09 Nov 2024
https://github.com/yasinkuyu/turkish.cs
Turkish Suffix Library for C# & .NET (Turkish inflectional and derivational suffixes)
Last synced: 06 Nov 2024
https://github.com/shibing624/pinyin-tokenizer
pinyintokenizer, a pinyin tokenizer that splits continuous pinyin into a list of single-character pinyin syllables.
nlp pinyin pinyin-analysis pinyin4j tokenizer trie-tree
Last synced: 22 Oct 2024
https://github.com/yasinkuyu/Turkish.cs
Turkish Suffix Library for C# & .NET (Turkish inflectional and derivational suffixes)
Last synced: 12 Nov 2024
https://github.com/karma9874/seq2seq-chatbot
Chatbot based on a Seq2Seq model with a bidirectional RNN and attention mechanism, built with TensorFlow, trained on the Cornell Movie-Dialogs Corpus and deployed on a Flask server
attention-mechanism bidirectional-lstm chatbot deep-learning flask nlp question-answering seq2seq tensorflow
Last synced: 06 Nov 2024