Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Natural language processing
Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published an article proposing a measure of machine intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
- GitHub: https://github.com/topics/nlp
- Wikipedia: https://en.wikipedia.org/wiki/Natural_language_processing
- Created by: Alan Turing
- Aliases: natural-language-processing, nlp-machine-learning, nlp-resources
- Last updated: 2024-11-15 00:20:20 UTC
https://github.com/Smat26/Roman-Urdu-Dataset
Compilation of Manually Tagged Roman Urdu Dataset (Urdu written in Latin/Roman Script), along with other helpful Roman Urdu NLP resources
data-science dataset hindi hindi-language natural-language-processing nlp urdu urdu-language urdu-nlp
Last synced: 04 Aug 2024
https://github.com/Furyton/awesome-language-model-analysis
This paper list focuses on the theoretical and empirical analysis of language models, especially large language models (LLMs). The papers in this list investigate the learning behavior, generalization ability, and other properties of language models through theoretical analysis, empirical analysis, or a combination of both.
ai analysis analytics awesome chatgpt deep-learning generative-ai large-language-models llm nlp theory transformers
Last synced: 19 Sep 2024
https://github.com/mananshah99/sentR
Simple sentiment analysis framework for R
Last synced: 05 Aug 2024
https://github.com/solygambas/python-openai-projects
13 projects using ChatGPT API, Whisper, Embeddings, and DALL-E with Python.
auto-gpt chatbot chatgpt dall-e embeddings gpt-4 langchain langchain-python machine-learning nlp nlp-machine-learning open-ai-api openai python reddit reddit-api spotify spotify-api stable-diffusion whisper
Last synced: 27 Oct 2024
https://github.com/X-LANCE/Mobile-Env
A Universal Platform for Training and Evaluation of Mobile Interaction
decision-making information-ui infoui interaction-platform nlp rl-environments rl-platform
Last synced: 09 Nov 2024
https://github.com/hienduyph/oxford-deepnlp-2017
:rocket: :tada: :sparkles: Oxford Deep NLP 2017 Course Materials and Practicals, Solutions
Last synced: 09 Nov 2024
https://github.com/brianspiering/nlp-course
An introduction to Natural Language Processing (NLP) course
machine-learning natural-language-processing nlp python
Last synced: 07 Nov 2024
https://github.com/ownthink/chatbot
A chatbot based on semantic understanding and knowledge graphs
chatbot knowledgegraph nlp nlu qa
Last synced: 07 Nov 2024
https://github.com/tangbinh/question-answering
bidaf drqa nlp pytorch question-answering squad
Last synced: 13 Nov 2024
https://github.com/qznan/qiznlp
A TensorFlow framework for quickly running NLP tasks such as classification, sequence labeling, matching, and generation (Chinese NLP, with distributed training support)
beam-search chinese classification horovod match nlp sequence-labeling sequence-to-sequence tensorflow
Last synced: 13 Oct 2024
https://github.com/dalmia/quora-question-pairs
The code for our submission in Kaggle's competition Quora Question Pairs which ranked in the top 25%.
deep-learning machine-learning nlp quora-question-pairs tensorflow
Last synced: 30 Oct 2024
https://github.com/navalnica/be_nlp_speech_resources
Links to Belarusian NLP and Speech resources
asr belarus belarusian belarusian-language natural-language-processing nlp speech speech-processing speech-recognition speech-synthesis speech-to-text stt text-to-speech tts
Last synced: 13 Nov 2024
https://github.com/Qznan/QizNLP
A TensorFlow framework for quickly running NLP tasks such as classification, sequence labeling, matching, and generation (Chinese NLP, with distributed training support)
beam-search chinese classification horovod match nlp sequence-labeling sequence-to-sequence tensorflow
Last synced: 16 Nov 2024
https://github.com/pooya-mohammadi/persian-spell-checker-kenlm
Complete instructions for training a Persian spell checker and a language model, based on SymSpell and KenLM respectively, using a Wikipedia dataset.
bash kenlm language-model nlp persian python spellcheck spellchecker symspell
Last synced: 04 Aug 2024
https://github.com/sarthakjshetty/pyresearchinsights
End-to-end NLP tool to analyze research publications. Published in Ecology & Evolution 2021.
gensim natural-language-processing nlp python scientific-analysis spacy text-mining
Last synced: 12 Oct 2024
https://github.com/sudharsan13296/word2vec-from-scratch
simple Word2vec from scratch using tensorflow for understanding
deep-learning natural-language-processing nlp scratch word2vec word2vec-algorithm word2vec-model
Last synced: 15 Nov 2024
https://github.com/thunlp/cokebert
CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models
bert knowledge-graph nlp pretrained-language-model pytorch
Last synced: 10 Nov 2024
https://github.com/explosion/vscode-prodigy
🧬 A VS Code extension for annotating data with Prodigy
annotation-tool data-annotation data-labeling data-labeling-tools data-science labeling-tool nlp prodigy spacy vscode vscode-extension
Last synced: 07 Oct 2024
https://github.com/benjaminvdb/DBRD
110k Dutch Book Reviews Dataset for Sentiment Analysis
dataset dataset-creation dutch nlp nlp-machine-learning python python3 scraped-data scraper
Last synced: 17 Nov 2024
https://github.com/songyouwei/fiction_generator
Fiction generator with TensorFlow that imitates the writing style of Wang Xiaobo.
deep-learning keras lstm nlp seq2seq tensorflow text-generation
Last synced: 11 Nov 2024
https://github.com/anthonysigogne/web-search-engine-ui
UI - a simple web search engine
elasticsearch google-search indexing nlp python search-engine
Last synced: 12 Nov 2024
https://github.com/PhilipMay/stsb-multi-mt
Machine translated multilingual STS benchmark dataset.
Last synced: 16 Nov 2024
https://github.com/arjunpatel7/perfect-prompt
An approach to creating the perfect prompt for any image generation task.
cohere nlp prompt stable-diffusion streamlit text-generation
Last synced: 11 Oct 2024
https://github.com/proycon/python-ucto
This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
computational-linguistics folia nlp nlp-library python text-processing tokenizer
Last synced: 14 Nov 2024
https://github.com/ademakdogan/gpterm
Creating Intelligent Terminal Apps with ChatGPT and LLM Models
chatgpt chatgpt-api iterm2 langchain langchain-python natural-language-processing nlp python query-generator terminal
Last synced: 07 Nov 2024
https://github.com/97k/spam-ham-web-app
A web app that classifies text as spam or ham. I use my own ML algorithm in the backend; the code for it can be found under machine_learning_section. For a live demo, check out this link
bag-of-words data-visualization django heroku-deployment jupyter-notebook machine-learning machine-learning-projects multinomial-naive-bayes nlp nltk spam-classification text-classification tfidf
Last synced: 11 Nov 2024
https://github.com/fractalego/pynsett
A programmable relation extraction tool
extract-relationships nlp relation-extraction spacy wikidata-knowledge
Last synced: 12 Oct 2024
https://github.com/ucrel/pymusas
Python Multilingual Ucrel Semantic Analysis System
natural-language-processing nlp python spacy spacy-pipeline
Last synced: 12 Oct 2024
https://github.com/dbklim/stressrnn
Modified version of RusStress (https://github.com/MashaPo/russtress) — python package for placing stress in Russian text using RNN (BiLSTM) and the "Grammatical Dictionary" by A. A. Zaliznyak (from http://odict.ru/).
accent bilstm emphasis linguistic linguistics lstm nlp rnn russian russian-accent russian-stress russtress rustress stress
Last synced: 11 Nov 2024
https://github.com/ayaka14732/bart-base-jax
JAX implementation of the bart-base model
bart jax natural-language-processing nlp nlp-model
Last synced: 28 Oct 2024
https://github.com/stevenay/myan-word-breaker
Myanmar Word Segmentation Tool
Last synced: 25 Oct 2024
https://github.com/eimg/burmese-text-classifier
A neural network based text classification system for Burmese
Last synced: 25 Oct 2024
https://github.com/sedthh/lara-hungarian-nlp
NLP class for rapid ChatBot development in Hungarian language
chatbot hungarian hungarian-language lemmatizer nlp python3 stemmer
Last synced: 17 Nov 2024
https://github.com/dsdanielpark/gpt2-bert-medical-qa-chat
Medical domain-focused GPT-2 fine-tuning, optimization, and lightweighting research repository (compared to GPT-4).
bert chatgpt gpt2 gpt4 medical-chatbot natural-language-processing nlp nlp-keywords-extraction
Last synced: 14 Nov 2024
https://github.com/hscspring/pnlp
NLP pre-/post-processing tools.
chinese-nlp concurrency nlp nlp-enhancer nlp-preprocess normalization preprocessing text-cleaning text-extraction text-length text-processing
Last synced: 17 Nov 2024
https://github.com/akosbalasko/obsidian-autotagger-plugin
This plugin offers smart tags for notes by performing Named Entity Recognition (NER) on the content
natural-language-processing nlp obsidian-md obsidian-plugin
Last synced: 22 Oct 2024
https://github.com/maxent-ai/lda2vec
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019
chainer deep-learning embeddings lda nlp python3 sklearn text text-mining topic-modeling word-embeddings word2vec
Last synced: 30 Sep 2024
https://github.com/andreaferretti/charade
A server for multilanguage, composable NLP API in Python
Last synced: 14 Oct 2024
https://github.com/vaibhavs10/10_days_of_deep_learning
10 days 10 different practical applications of Deep Learning (primarily NLP) using Tensorflow and Keras
classification gensim keras nlp python tensorflow tfidf-matrix
Last synced: 02 Nov 2024
https://github.com/bloomberg/entsum
Open Source / ENTSUM: A Data Set for Entity-Centric Extractive Summarization
Last synced: 09 Nov 2024
https://github.com/yuyuzha0/word2vec
A word2vec implementation for the Chinese language, based on deeplearning4j and ansj
chinese java nlp word2vec word2vec-zh
Last synced: 12 Nov 2024
https://github.com/tianduowang/diffaug
EMNLP 2022: Differentiable Data Augmentation for Contrastive Sentence Representation Learning. https://arxiv.org/abs/2210.16536
data-augmentation nlp sentence-embeddings
Last synced: 14 Oct 2024
https://github.com/shashwath94/hierarchical-seq2seq
A PyTorch implementation of the hierarchical encoder-decoder architecture (HRED) introduced in Sordoni et al. (2015) for modeling conversation triples. This version of the model is built for the MovieTriples dataset.
deep-learning hred nlp pytorch seq2seq-pytorch
Last synced: 27 Oct 2024
https://github.com/ariya/tinker-chat
chatbot generative-ai gpt llama llama2 llm mistral nlp openai
Last synced: 01 Nov 2024
https://github.com/griptape-ai/griptape-tools
Tools for the Griptape Framework.
ai cohere gpt huggingface llm nlp openai python
Last synced: 27 Sep 2024
https://github.com/rosette-api/rosette-elasticsearch-plugin
Document Enrichment plugin for Elasticsearch
categorization elasticsearch elasticsearch-plugin entity-extraction fuzzy-name-matching fuzzy-search identity-resolution machine-learning named-entity-recognition natural-language-processing nlp rosette-plugin sentiment-analysis text-analytics text-mining
Last synced: 27 Oct 2024
https://github.com/wannaphong/laonlp
Lao language NLP
hacktoberfest lao lao-language natural-language-processing nlp nlp-library python
Last synced: 14 Nov 2024
https://github.com/trainingbypackt/deep-learning-for-natural-language-processing
Solve your natural language processing problems with smart deep neural networks
deeplearning glove gru keras language lstm namedentityrecognizer natural nlp nlp-library nlp-machine-learning partsofspeechtagger textpreprocessing word2vec
Last synced: 14 Nov 2024
https://github.com/nschneid/amr-hackathon
Abstract Meaning Representation (AMR) Hackathon
abstract-meaning-representation computational-linguistics natural-language-processing nlp python semantics
Last synced: 08 Nov 2024
https://github.com/ramtinms/tokenquery
TokenQuery (regular expressions over tokens)
machine-learning natural-language-processing nlp regex regular-expressions
Last synced: 11 Nov 2024
https://github.com/adapter-hub/efficient-task-transfer
Research code for "What to Pre-Train on? Efficient Intermediate Task Selection", EMNLP 2021
adapters bert nlp roberta transfer-learning transformers
Last synced: 06 Nov 2024
https://github.com/centre-for-humanities-computing/embedding-explorer
Tools for interactive visual exploration of semantic embeddings.
clustering embedding embeddings interactive knowledge-graph machine-learning networks nlp projection semantic
Last synced: 16 Nov 2024
https://github.com/saidziani/feedny
The Internet plays an increasingly important part in our daily lives as a source of written content for news and leisure. Yet it is tedious and difficult to sort through this staggering flow of information and stay updated with changes in our world, even using automated tools. Reading magazines and newspapers is too time-consuming, and there is a huge amount of online content that is updated or generated each minute. Our solution considers each user’s interests and leverages Artificial Intelligence, Machine Learning and Natural Language Processing to suggest relevant articles from the internet.
automatic-summarization javascript machine-learning machine-translation natural-language-processing nlp profiling react-native recommendation-system text-classification
Last synced: 28 Oct 2024
https://github.com/zimmerrol/attention-is-all-you-need-keras
Implementation of the Transformer architecture described by Vaswani et al. in "Attention Is All You Need"
attention-is-all-you-need keras neural-network nlp seq2seq transformer
Last synced: 22 Oct 2024
https://github.com/anakin87/fact-checking-rocks
Fact checking baseline combining dense retrieval and textual entailment
fact-checking haystack huggingface-spaces information-retrieval natural-language-inference natural-language-processing neural-search nlp python semantic-search streamlit streamlit-webapp text-entailment transformers
Last synced: 22 Oct 2024
https://github.com/kennethenevoldsen/scandinavian-embedding-benchmark
A Scandinavian Benchmark for sentence embeddings
benchmark low-resource-nlp natural-language-processing nlp scandinavian
Last synced: 31 Oct 2024
https://github.com/trashhalo/logseq-summarizer
Logseq plugin to summarize text
Last synced: 02 Nov 2024
https://github.com/Praful932/llmsearch
Find better generation parameters for your LLM
llm llm-evaluation llm-inference nlp
Last synced: 08 Nov 2024
https://github.com/sap-samples/acl2020-commonsense
Source code for a paper on commonsense reasoning for the 2020 Annual Conference of the Association for Computational Linguistics (ACL 2020).
commonsense-reasoning contrastive deep-learning machine-learning nlp sample sample-code self-supervised
Last synced: 15 Nov 2024
https://github.com/houbb/word-cloud
A word cloud tool for Java (an easy-to-use Java word cloud generator).
cloud image nlp word word-cloud wordcloud
Last synced: 07 Nov 2024
https://github.com/siphulangeni/tortus
A PyPI package for easy text annotation in a Jupyter Notebook.
annotation-tool ipywidgets jupyter-notebook labeling-tool nlp
Last synced: 08 Nov 2024
https://github.com/swanhtet1992/ReSegment
Burmese (Myanmar) syllable level segmentation with regex.
burmese-nlp myanmar-nlp myanmar-text nlp segmentation
Last synced: 25 Oct 2024
https://github.com/loomchild/maligna
Bilingual sentence aligner
nlp text-alignment translation
Last synced: 08 Nov 2024
https://github.com/fredriko/bert-tensorflow-pytorch-spacy-conversion
Instructions for converting a BERT TensorFlow model to work with HuggingFace's pytorch-transformers and spaCy. This walk-through uses DeepPavlov's RuBERT as an example.
bert bert-model how-to keras nlp pytorch-transformers spacy spacy-models spacy-nlp spacy-package spacy-pytorch-transformers tensorflow
Last synced: 07 Aug 2024
https://github.com/praful932/llmsearch
Find better generation parameters for your LLM
llm llm-evaluation llm-inference nlp
Last synced: 27 Oct 2024
https://github.com/laugustyniak/textlytics
Text processing library for sentiment analysis and related tasks
classification natural-language-processing nlp opinion-mining scikit-learn sentiment-analysis supervised-learning word-embeddings
Last synced: 18 Nov 2024
https://github.com/veler/notepad-based-calculator
A smart calculator using natural language processing
calculator csharp dotnet mef natural-language-processing nlp
Last synced: 29 Oct 2024
https://github.com/aqibsaeed/research-paper-categorization
Research paper classification using machine learning and NLP
machine-learning nlp text-classification
Last synced: 09 Nov 2024
https://github.com/suicao/vn-accent-restorer
This project applies multiple deep learning models to the problem of restoring diacritical marks to sentences in Vietnamese.
deep-learning nlp tensorflow tensorflow-experiments
Last synced: 10 Oct 2024
https://github.com/yasinkuyu/Turkish.cs
Turkish Suffix Library for C# & .NET: Turkish inflectional and derivational suffixes
Last synced: 12 Nov 2024
https://github.com/luoyuanlab/text_gcn_tutorial
A tutorial & minimal example (8min on CPU) for Graph Convolutional Networks for Text Classification. AAAI 2019
deep-learning graph-convolutional-networks nlp text-classification
Last synced: 02 Nov 2024
https://github.com/seanlee97/llano
Let ChatGPT (Large Language Models) Serve As Data Annotator and Zero-shot/few-shot Information Extractor.
annotataion annotator chatgpt chatie classification data-augmentation few-shot gpt gpt-3 gpt-4 information-extraction large-language-models llm ner nlp openai prompt prompt-engineering relation-extraction zero-shot
Last synced: 27 Oct 2024
https://github.com/gatenlp/gateplugin-learningframework
A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNNs through Pytorch and Keras.
classification crf machine-learning nlp sequence-tagging
Last synced: 13 Nov 2024
https://github.com/amazon-science/bold
Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper
bert bert-model bias fairness-ml gpt-2 language-model nlg nlg-dataset nlp text-generation
Last synced: 12 Nov 2024
https://github.com/shibing624/pinyin-tokenizer
pinyintokenizer, a pinyin tokenizer that splits continuous pinyin into a list of single-character pinyin syllables.
nlp pinyin pinyin-analysis pinyin4j tokenizer trie-tree
Last synced: 22 Oct 2024
https://github.com/loristns/wisty.js
🧚♀️ Chatbot library turning conversations into actions, locally, in the browser.
assistant bot bot-framework chatbot chatbots conversational-agents conversational-ai dialogue-systems hybrid-code-networks javascript machine-learning named-entity-recognition natural-language-processing nlp nlu tensorflow tensorflowjs
Last synced: 10 Oct 2024
https://github.com/yasinkuyu/turkish.cs
Turkish Suffix Library for C# & .NET: Turkish inflectional and derivational suffixes
Last synced: 06 Nov 2024
https://github.com/agatan/yoin
A Japanese Morphological Analyzer written in pure Rust
Last synced: 05 Nov 2024
https://github.com/generall/entitycategoryprediction
Model for predicting the categories of entities from their mentions
allennlp classification mentions nlp
Last synced: 14 Oct 2024
https://github.com/philipmay/stsb-multi-mt
Machine translated multilingual STS benchmark dataset.
Last synced: 28 Oct 2024
https://github.com/jayyip/cws-tensorflow
A Chinese word segmentation model based on TensorFlow
nlp tensorflow word-segmentation
Last synced: 11 Nov 2024
https://github.com/zimmerrol/show-attend-and-tell-keras
Keras implementation of the "Show, Attend and Tell" paper
attention-mechanism image-captioning keras lstm mscoco-image-dataset nlp rnn show-attend-and-tell tensorflow
Last synced: 22 Oct 2024
https://github.com/loristns/Wisty.js
🧚♀️ Chatbot library turning conversations into actions, locally, in the browser.
assistant bot bot-framework chatbot chatbots conversational-agents conversational-ai dialogue-systems hybrid-code-networks javascript machine-learning named-entity-recognition natural-language-processing nlp nlu tensorflow tensorflowjs
Last synced: 09 Nov 2024
https://github.com/karma9874/seq2seq-chatbot
Seq2Seq-based chatbot model with a bidirectional RNN and attention mechanism, built with TensorFlow, trained on the Cornell Movie-Dialogs Corpus and deployed on a Flask server
attention-mechanism bidirectional-lstm chatbot deep-learning flask nlp question-answering seq2seq tensorflow
Last synced: 06 Nov 2024
https://github.com/omarsar/nlp_pytorch_tensorflow_notebooks
Deep Learning for NLP Python Notebooks in PyTorch and TensorFlow
deeplearning emotion nlp pytorch rnn sentiment-analysis tensorflow
Last synced: 13 Oct 2024
https://github.com/anoopkunchukuttan/geomm
Geometry-aware Multilingual Embeddings
bilingual-word-embedding multilingual nlp translation word-embedding
Last synced: 18 Nov 2024
https://github.com/gaussalgo/adaptor
ACL 2022: Adaptor: a library to easily adapt a language model to your own task, domain, or custom objective(s).
domain-adaptation multi-objective-optimization ner nlp pytorch robustness text-classification text-generation transformers
Last synced: 08 Nov 2024
https://github.com/elizalo/question-answering-based-on-squad
Question Answering System using BiDAF Model on SQuAD v2.0
bidaf machine-learning natural-language-processing natural-language-understanding neural-network nlp nlp-datasets nlp-machine-learning python python-3-6 question-answering squad
Last synced: 28 Sep 2024
https://github.com/kargaranamir/parstdex
A package that extracts Persian time and date markers by applying regexes -- AACL 2022
datetime event-extract event-extraction hengam hengamtagger information-extraction nlp parstdex persian persian-calendar persian-datetime persian-time regex-pattern time-date
Last synced: 04 Aug 2024
https://github.com/rileynwong/spotify-analysis
Data analysis on my monthly playlists
audio-features data-analysis data-scraping lyrics machine-learning natural-language-processing nlp nlp-machine-learning sentiment-analysis spotify-analysis supervised-learning supervised-machine-learning text text-analysis
Last synced: 14 Oct 2024
https://github.com/loomchild/segment
Program used to split text into segments
Last synced: 14 Nov 2024
https://github.com/tlack/hairytext
A data labeling and NLP tool for Elixir (uses Spacy)
elixir entity-recognition nlp nlp-machine-learning phoenix-live-view spacy text-classification
Last synced: 28 Oct 2024
https://github.com/hankcs/sub-character-cws
Sub-Character Representation Learning
chinese-word-segmentation cws natural-language-processing nlp representation-learning simplified-chinese traditional-chinese
Last synced: 13 Oct 2024
https://github.com/princeton-nlp/rationale-robustness
NAACL 2022: Can Rationalization Improve Robustness? https://arxiv.org/abs/2204.11790
interpretability nlp robustness
Last synced: 11 Nov 2024
https://github.com/ElizaLo/Question-Answering-based-on-SQuAD
Question Answering System using BiDAF Model on SQuAD v2.0
bidaf machine-learning natural-language-processing natural-language-understanding neural-network nlp nlp-datasets nlp-machine-learning python python-3-6 question-answering squad
Last synced: 13 Nov 2024