Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published an article proposing a measure of machine intelligence, now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

https://github.com/brianspiering/nlp-course

An introduction to Natural Language Processing (NLP) course

machine-learning natural-language-processing nlp python

Last synced: 07 Nov 2024

https://github.com/ownthink/chatbot

A chatbot based on semantic understanding and knowledge graphs

chatbot knowledgegraph nlp nlu qa

Last synced: 07 Nov 2024

https://github.com/dalmia/quora-question-pairs

The code for our submission to Kaggle's Quora Question Pairs competition, which ranked in the top 25%.

deep-learning machine-learning nlp quora-question-pairs tensorflow

Last synced: 30 Oct 2024

https://github.com/binsarjr/chatbot-indonesia

A collection of data for Indonesian-language chatbots, with simple chatbot code written in Typescript

bot chatbot chatbot-indonesia hacktoberfest nlp text-processing

Last synced: 10 Dec 2024

https://github.com/Qznan/QizNLP

Quickly run many NLP tasks: a Tensorflow framework for classification, sequence labeling, matching, generation, and other NLP tasks (Chinese NLP, with distributed support)

beam-search chinese classification horovod match nlp sequence-labeling sequence-to-sequence tensorflow

Last synced: 16 Nov 2024

https://github.com/thunlp/cokebert

CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models

bert knowledge-graph nlp pretrained-language-model pytorch

Last synced: 10 Nov 2024

https://github.com/ganjinzero/triaffine-nested-ner

Fusing Heterogeneous Factors with Triaffine Mechanism for Nested Named Entity Recognition [ACL 2022 Findings]

ner nlp

Last synced: 22 Nov 2024

https://github.com/sarthakjshetty/pyresearchinsights

End-to-end NLP tool to analyze research publications. Published in Ecology & Evolution 2021.

gensim natural-language-processing nlp python scientific-analysis spacy text-mining

Last synced: 26 Dec 2024

https://github.com/google-marketing-solutions/ml_toast

Cluster multilingual search terms captured from different time windows into semantically relevant topics.

data-science machine-learning marketing-science nlp tensorflow topic-clustering

Last synced: 05 Dec 2024

https://github.com/nicolasassi/gomtch

Find text even if it doesn't want to be found

nlp text-mining text-processing

Last synced: 23 Nov 2024

https://github.com/benjaminvdb/DBRD

110k Dutch Book Reviews Dataset for Sentiment Analysis

dataset dataset-creation dutch nlp nlp-machine-learning python python3 scraped-data scraper

Last synced: 17 Nov 2024

https://github.com/qznan/qiznlp

Quickly run many NLP tasks: a Tensorflow framework for classification, sequence labeling, matching, generation, and other NLP tasks (Chinese NLP, with distributed support)

beam-search chinese classification horovod match nlp sequence-labeling sequence-to-sequence tensorflow

Last synced: 17 Feb 2025

https://github.com/songyouwei/fiction_generator

Fiction generator with Tensorflow. A novel generator that imitates the style of Wang Xiaobo

deep-learning keras lstm nlp seq2seq tensorflow text-generation

Last synced: 11 Nov 2024

https://github.com/stevenay/myan-word-breaker

Myanmar Word Segmentation Tool

burmese nlp word-segmentation

Last synced: 25 Oct 2024

https://github.com/akosbalasko/obsidian-autotagger-plugin

This plugin offers smart tags for notes by performing Named Entity Recognition (NER) on the content

natural-language-processing nlp obsidian-md obsidian-plugin

Last synced: 22 Oct 2024

https://github.com/ayaka14732/bart-base-jax

JAX implementation of the bart-base model

bart jax natural-language-processing nlp nlp-model

Last synced: 28 Oct 2024

https://github.com/maxent-ai/lda2vec

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019

chainer deep-learning embeddings lda nlp python3 sklearn text text-mining topic-modeling word-embeddings word2vec

Last synced: 25 Jan 2025

https://github.com/dbklim/stressrnn

Modified version of RusStress (https://github.com/MashaPo/russtress) — python package for placing stress in Russian text using RNN (BiLSTM) and the "Grammatical Dictionary" by A. A. Zaliznyak (from http://odict.ru/).

accent bilstm emphasis linguistic linguistics lstm nlp rnn russian russian-accent russian-stress russtress rustress stress

Last synced: 11 Nov 2024

https://github.com/eimg/burmese-text-classifier

A neural network based text classification system for Burmese

deep-learning javascript nlp

Last synced: 25 Oct 2024

https://github.com/arjunpatel7/perfect-prompt

An approach to creating the perfect prompt for any image generation task.

cohere nlp prompt stable-diffusion streamlit text-generation

Last synced: 24 Jan 2025

https://github.com/proycon/python-ucto

This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).

computational-linguistics folia nlp nlp-library python text-processing tokenizer

Last synced: 18 Feb 2025

https://github.com/97k/spam-ham-web-app

A web app that classifies text as spam or ham. It uses my own ML algorithm in the backend; the code for it can be found under machine_learning_section. For a live demo, check out this link

bag-of-words data-visualization django heroku-deployment jupyter-notebook machine-learning machine-learning-projects multinomial-naive-bayes nlp nltk spam-classification text-classification tfidf

Last synced: 11 Nov 2024

https://github.com/PhilipMay/stsb-multi-mt

Machine translated multilingual STS benchmark dataset.

dataset multilingual nlp

Last synced: 16 Nov 2024

https://github.com/dsdanielpark/gpt2-bert-medical-qa-chat

Medical domain-focused GPT-2 fine-tuning, optimization, and lightweighting research repository (compared to GPT-4).

bert chatgpt gpt2 gpt4 medical-chatbot natural-language-processing nlp nlp-keywords-extraction

Last synced: 14 Nov 2024

https://github.com/ademakdogan/gpterm

Creating Intelligent Terminal Apps with ChatGPT and LLM Models

chatgpt chatgpt-api iterm2 langchain langchain-python natural-language-processing nlp python query-generator terminal

Last synced: 07 Nov 2024

https://github.com/zwhe99/selftraining4unmt

Implementation of our ACL 2022 paper "Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation"

machine-translation nlp unsupervised-learning

Last synced: 02 Dec 2024

https://github.com/adamspannbauer/app_rasa_chat_bot

A stateless chatbot that performs natural language queries against the App Store top charts

chatbot dash nlp nlu plotly rasa

Last synced: 13 Feb 2025

https://github.com/sedthh/lara-hungarian-nlp

NLP class for rapid chatbot development in the Hungarian language

chatbot hungarian hungarian-language lemmatizer nlp python3 stemmer

Last synced: 17 Nov 2024

https://github.com/imkett/adavae

[Preprint] AdaVAE: Exploring Adaptive GPT-2s in VAEs for Language Modeling PyTorch Implementation

controllable-generation gpt-2 nlp parameter-efficient-tuning representation-learning text-classification text-generation vae variational-autoencoder

Last synced: 26 Nov 2024

https://github.com/flight-school/lemma

A command-line utility that lemmatizes words in natural language text.

cli lemmatization macos nlp swift

Last synced: 26 Nov 2024

https://gair-nlp.github.io/BeHonest/

BeHonest: Benchmarking Honesty in Large Language Models

alignment benchmark evaluation honesty llm nlp

Last synced: 12 Feb 2025

https://github.com/mideind/tokenizer

A tokenizer for Icelandic text

icelandic nlp python tokenizer

Last synced: 13 Feb 2025

https://github.com/stephanj/bm25

A BM25 Java implementation using streams, stop words and stemming.

bm25 llm nlp rerank stemming

Last synced: 30 Jan 2025

https://github.com/shashwath94/hierarchical-seq2seq

A PyTorch implementation of the hierarchical encoder-decoder architecture (HRED) introduced in Sordoni et al. (2015) for modeling conversation triples. This version of the model is built for the MovieTriples dataset.

deep-learning hred nlp pytorch seq2seq-pytorch

Last synced: 03 Jan 2025

https://github.com/voidful/nlprep

🍳 NLPrep - dataset tool for many natural language processing tasks

dataset nlp prepare pytorch tfkit

Last synced: 31 Dec 2024

https://github.com/kampsy/gwizo

Simple Go implementation of the Porter Stemmer algorithm with powerful features.

consonants nlp nlp-stemming porter-stemmer-algorithm stemmer vowel

Last synced: 17 Dec 2024

https://github.com/t-systems-on-site-services-gmbh/german-elmo-model

This is a German ELMo deep contextualized word representation. It is trained on a special German Wikipedia Text Corpus.

bilm elmo embedding german machine-learning nlp python tensorflow

Last synced: 20 Nov 2024

https://github.com/vidhi1290/llm---detect-ai-generated-text

AI-Generated Text Detection: A BERT-powered solution for accurately identifying AI-generated text. Seamlessly integrated, highly accurate, and user-friendly.🚀

ai-generated bert bert-model detection-algorithm kaggle kaggle-competition llm machine-learning natural-language-processing nlp

Last synced: 08 Dec 2024

https://github.com/old-storyai/Story.ai

Notebook for building internal tools.

ai nlp nocode

Last synced: 25 Nov 2024

https://github.com/old-storyai/story.ai

Notebook for building internal tools.

ai nlp nocode

Last synced: 22 Jan 2025

https://github.com/voytas75/GPTprompts

Prompts for LLMs

gpt llm nlp prompts

Last synced: 20 Nov 2024

https://github.com/gagolews/stringx

Drop-in replacements for base R string functions powered by stringi

icu icu4c natural-language-processing nlp r regex regexp string-manipulation stringi text text-processing unicode

Last synced: 19 Dec 2024

https://github.com/andreaferretti/charade

A server for multilanguage, composable NLP API in Python

nlp nlp-apis python

Last synced: 14 Oct 2024

https://github.com/saidziani/feedny

The Internet plays an increasingly important part in our daily lives as a source of written content for news and leisure. Yet it is tedious and difficult to sort through this staggering flow of information and stay updated with changes in our world, even using automated tools. Reading magazines and newspapers is too time-consuming, and there is a huge amount of online content that is updated or generated each minute. Our solution considers each user’s interests and leverages Artificial Intelligence, Machine Learning and Natural Language Processing to suggest relevant articles from the internet.

automatic-summarization javascript machine-learning machine-translation natural-language-processing nlp profiling react-native recommendation-system text-classification

Last synced: 28 Oct 2024

https://github.com/tianduowang/diffaug

EMNLP 2022: Differentiable Data Augmentation for Contrastive Sentence Representation Learning. https://arxiv.org/abs/2210.16536

data-augmentation nlp sentence-embeddings

Last synced: 14 Oct 2024

https://github.com/zimmerrol/attention-is-all-you-need-keras

Implementation of the Transformer architecture described by Vaswani et al. in "Attention Is All You Need"

attention-is-all-you-need keras neural-network nlp seq2seq transformer

Last synced: 22 Oct 2024

https://github.com/yuyuzha0/word2vec

A word2vec implementation for the Chinese language, based on deeplearning4j and ansj

chinese java nlp word2vec word2vec-zh

Last synced: 12 Nov 2024

https://github.com/vaibhavs10/10_days_of_deep_learning

10 days 10 different practical applications of Deep Learning (primarily NLP) using Tensorflow and Keras

classification gensim keras nlp python tensorflow tfidf-matrix

Last synced: 19 Dec 2024

https://github.com/ramtinms/tokenquery

TokenQuery (regular expressions over tokens)

machine-learning natural-language-processing nlp regex regular-expressions

Last synced: 11 Nov 2024

https://github.com/bloomberg/entsum

Open Source / ENTSUM: A Data Set for Entity-Centric Extractive Summarization

nlp

Last synced: 09 Nov 2024

https://github.com/adapter-hub/efficient-task-transfer

Research code for "What to Pre-Train on? Efficient Intermediate Task Selection", EMNLP 2021

adapters bert nlp roberta transfer-learning transformers

Last synced: 06 Nov 2024

https://github.com/jonsafari/tok-tok

A fast, simple, multilingual tokenizer

multilingual nlp tokeniser tokenizer

Last synced: 18 Feb 2025

https://github.com/aqibsaeed/research-paper-categorization

Research paper classification using machine learning and NLP

machine-learning nlp text-classification

Last synced: 09 Nov 2024

https://github.com/praful932/llmsearch

Find better generation parameters for your LLM

llm llm-evaluation llm-inference nlp

Last synced: 27 Oct 2024

https://github.com/siphulangeni/tortus

A PyPI package for easy text annotation in a Jupyter Notebook.

annotation-tool ipywidgets jupyter-notebook labeling-tool nlp

Last synced: 08 Nov 2024

https://github.com/veler/notepad-based-calculator

A smart calculator using natural language processing

calculator csharp dotnet mef natural-language-processing nlp

Last synced: 29 Oct 2024

https://github.com/trashhalo/logseq-summarizer

Logseq plugin to summarize text

logseq nlp pin

Last synced: 02 Nov 2024

https://github.com/sap-samples/acl2020-commonsense

Source code for a paper on commonsense reasoning at the 2020 Annual Conference of the Association for Computational Linguistics (ACL 2020).

commonsense-reasoning contrastive deep-learning machine-learning nlp sample sample-code self-supervised

Last synced: 15 Nov 2024

https://github.com/microsoft/verseagility

Ramp up your custom natural language processing (NLP) task, allowing you to bring your own data, use your preferred frameworks and bring models into production.

classification dstoolkit machine-reading-comprehension ner nlp question-answering summarization transformer

Last synced: 04 Dec 2024

https://github.com/fredriko/bert-tensorflow-pytorch-spacy-conversion

Instructions for how to convert a BERT Tensorflow model to work with HuggingFace's pytorch-transformers, and spaCy. This walk-through uses DeepPavlov's RuBERT as an example.

bert bert-model how-to keras nlp pytorch-transformers spacy spacy-models spacy-nlp spacy-package spacy-pytorch-transformers tensorflow

Last synced: 27 Nov 2024

https://github.com/shamspias/google-reviews-chatbot

The Google Reviews Chatbot fetches reviews via the Google My Business API, analyzes sentiments using GPT-3, and generates tailored responses. Deployed on AWS, it uses Elastic Beanstalk, EC2, and ElastiCache for Redis to run tasks on a schedule, ensuring seamless and efficient chatbot functionality.

ai artificial-intelligence automation celery chatbot flask google google-my-business google-reviews google-reviews-api gpt3 gpt4 natural-language-processing nlp nlp-machine-learning python python3 task-scheduler

Last synced: 31 Jan 2025

https://github.com/houbb/word-cloud

The word cloud tool for Java. (An easy-to-use Java word cloud tool.)

cloud image nlp word word-cloud wordcloud

Last synced: 07 Nov 2024

https://github.com/suicao/vn-accent-restorer

This project applies multiple deep learning models to the problem of restoring diacritical marks to sentences in Vietnamese.

deep-learning nlp tensorflow tensorflow-experiments

Last synced: 09 Feb 2025

https://github.com/shuakami/amyalmond_bot

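👋 QQ bot. AmyAlmond is an intelligent chatbot based on ChatGPT, designed for QQ group chats, with support for multiple languages, context awareness, long-term memory management, and advanced automation tasks.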

ai bot chatbot llm nlp openai python qq qqbot tencent

Last synced: 03 Feb 2025

https://github.com/griptape-ai/griptape-tools

Tools for the Griptape Framework.

ai cohere gpt huggingface llm nlp openai python

Last synced: 20 Jan 2025

https://github.com/Praful932/llmsearch

Find better generation parameters for your LLM

llm llm-evaluation llm-inference nlp

Last synced: 08 Nov 2024

https://github.com/swanhtet1992/ReSegment

Burmese (Myanmar) syllable level segmentation with regex.

burmese-nlp myanmar-nlp myanmar-text nlp segmentation

Last synced: 25 Oct 2024

https://github.com/loomchild/maligna

Bilingual sentence aligner

nlp text-alignment translation

Last synced: 11 Jan 2025

https://github.com/jayyip/cws-tensorflow

A Chinese word segmentation model based on Tensorflow

nlp tensorflow word-segmentation

Last synced: 11 Nov 2024

https://github.com/agatan/yoin

A Japanese Morphological Analyzer written in pure Rust

japanese nlp rust

Last synced: 05 Nov 2024

https://github.com/philipmay/stsb-multi-mt

Machine translated multilingual STS benchmark dataset.

dataset multilingual nlp

Last synced: 28 Oct 2024

https://github.com/gatenlp/gateplugin-learningframework

A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNNs through Pytorch and Keras.

classification crf machine-learning nlp sequence-tagging

Last synced: 13 Nov 2024

https://github.com/luoyuanlab/text_gcn_tutorial

A tutorial & minimal example (8min on CPU) for Graph Convolutional Networks for Text Classification. AAAI 2019

deep-learning graph-convolutional-networks nlp text-classification

Last synced: 02 Nov 2024

https://github.com/kavgan/clinical-concepts

Discovering Related Clinical Concepts using Large Amounts of Clinical Notes. An unsupervised graphical approach to mine related concepts by leveraging the volume within large amounts of clinical notes.

clinical-concepts clinical-nlp clinical-notes concept-graph graph-nlp nlp paper terminologies

Last synced: 09 Feb 2025

https://github.com/generall/entitycategoryprediction

Model for predicting categories of entities by their mentions

allennlp classification mentions nlp

Last synced: 14 Oct 2024

https://github.com/warpy-ai/tgs

Terminal Generative Shell

ai bash nlp shell t5-small terminal

Last synced: 04 Dec 2024

https://github.com/yasinkuyu/turkish.cs

Turkish Suffix Library for C# & .NET: Turkish inflectional and derivational suffixes

c-sharp nlp stem vowel

Last synced: 06 Nov 2024

https://github.com/shibing624/pinyin-tokenizer

pinyintokenizer, a pinyin tokenizer that splits continuous pinyin into a list of single-character pinyin syllables.

nlp pinyin pinyin-analysis pinyin4j tokenizer trie-tree

Last synced: 22 Oct 2024

https://github.com/yasinkuyu/Turkish.cs

Turkish Suffix Library for C# & .NET: Turkish inflectional and derivational suffixes

c-sharp nlp stem vowel

Last synced: 12 Nov 2024

https://github.com/karma9874/seq2seq-chatbot

Chatbot based on a Seq2Seq model with a bidirectional RNN and attention mechanism, built with Tensorflow, trained on the Cornell Movie-Dialogs Corpus and deployed on a Flask server

attention-mechanism bidirectional-lstm chatbot deep-learning flask nlp question-answering seq2seq tensorflow

Last synced: 06 Nov 2024