Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and other natural-language tasks.

https://github.com/s-nlp/rudetoxifier

Code and data for the paper "Methods for Detoxification of Texts for the Russian Language"

nlp russian-language style-transfer

Last synced: 07 Aug 2024

https://github.com/natasha/naeval

Comparing quality and performance of NLP systems for the Russian language

evaluation nlp performance-analysis python russian

Last synced: 07 Aug 2024

https://github.com/chatopera/chatopera.feishu

Launch intelligent conversational bot services through the Feishu Open Platform and the Chatopera bot platform. Chatbot, Feishu, Lark.

ai bot chatbot chatopera dialog feishu lark machine-learning nlp nlu python python3

Last synced: 31 Jul 2024

https://github.com/coosto/dutch-word-embeddings

Dutch word embeddings, trained on a large collection of Dutch social media messages and news/blog/forum posts.

coosto dutch nlp word2vec word2vec-model wordembeddings

Last synced: 03 Aug 2024

https://github.com/aphp/eds-pseudo

EDS-Pseudo is a hybrid model for detecting personally identifying entities in clinical reports

edsnlp nlp pseudonymisation

Last synced: 03 Sep 2024

https://github.com/Lipairui/textgo

Text preprocessing, representation, similarity calculation, text search and classification. Let's go and play with text!

bert nlp text-classification text-preprocessing text-representation text-search text-similarity

Last synced: 07 Aug 2024

https://github.com/nlpcloud/nlpcloud-js

NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, code generation, and much more...

ad-generator chatbot code-generation conversational-ai embeddings intent-classification keywords-extraction language-detection machine-translation ner nlp paraphrasing question-answering semantic-similarity sentiment-analysis text-classification text-generation text-summarization tokenization

Last synced: 01 Aug 2024

https://github.com/OpenSextant/Xponents

Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters.

document-conversion geocoding geonames geoparsing geotagging information-extraction nlp solr tika

Last synced: 01 Aug 2024

https://github.com/edwardcooper/piidetect

A package to build an end-to-end pipeline for detecting personally identifiable information from text.

nlp pii pii-detection word2vec

Last synced: 02 Aug 2024

https://github.com/saareliad/FTPipe

FTPipe and related pipeline model parallelism research.

deep-neural-networks distributed-training fine-tuning nlp pipeline-parallelism t5

Last synced: 01 Aug 2024

https://github.com/sanghviharshit/pocket-tagger

📖👓🏷Tag your getpocket.com articles automatically using natural language processing

articles getpocket google-cloud natural-language-processing nlp pocket scraper tag

Last synced: 31 Jul 2024

https://github.com/IndexFziQ/KMRC-Papers

A list of recent papers regarding knowledge-based machine reading comprehension.

knowledge knowledge-base machine-reading-comprehension nlp paper reading-comprehension

Last synced: 02 Aug 2024

https://github.com/KehaoWu/Jinyong-Corpus

Dictionary for Jin Yong's 15 novels

corpus-data nlp

Last synced: 01 Aug 2024

https://github.com/danieldeutsch/repro

Repro is a library for easily running code from published papers via Docker.

docker machine-learning nlp reproducibility reproducible-research

Last synced: 01 Aug 2024

https://github.com/megagonlabs/t5-japanese

Code to pre-train Japanese T5 models

natural-language-processing nlp t5 transformer

Last synced: 02 Aug 2024

https://github.com/kinosal/cowriter

Write 10x faster using OpenAI's GPT-3 based Davinci model to autocomplete your text

ai gpt machine-learning nlp

Last synced: 31 Jul 2024

https://github.com/ysy1216/firewallm

By calling FirewaLLM, users can preserve the accuracy of a large model while greatly reducing the risk of privacy leakage when interacting with it. The authors describe FirewaLLM as a privacy-protecting ChatGPT interaction platform.

chatbot chatgpt firewall flask llm nlp privacy python web

Last synced: 02 Aug 2024

https://github.com/Flight-School/sentences

A command-line utility that splits natural language text into sentences.

cli macos nlp sentence-tokenizer swift

Last synced: 05 Aug 2024

https://github.com/leoneversberg/llm-chatbot-rag

A local LLM chatbot with RAG for PDF input files

chatbot llm nlp rag

Last synced: 08 Aug 2024

https://github.com/ahmedbesbes/audiolizr

A bentoML-powered API to transcribe audio and make sense of it

bentoml bentoml-service docker nlp openai openai-whisper pytube speech-recognition t5 torch transformers

Last synced: 07 Aug 2024

https://github.com/dair-ai/notebooks

🔬 Sharing your data science notebooks with the community has never been this easy.

artificial-intelligence deep-learning machine-learning nlp

Last synced: 01 Aug 2024

https://github.com/ropensci-archive/geoparser

⛔ ARCHIVED ⛔

geocoding geoparser nlp peer-reviewed r r-package rstats

Last synced: 05 Aug 2024

https://github.com/GermanT5/wikipedia2corpus

Wikipedia text corpus for self-supervised NLP model training

corpus german-nlp machine-learning nlp somajo wikipedia wikipedia-corpus

Last synced: 31 Jul 2024

https://github.com/syzer/sentiment-analyser

ML that can extract German and English sentiment

english german nlp nlp-library node-js nodejs sentiment-analyser sentiment-analysis

Last synced: 01 Aug 2024

https://github.com/ardauzunoglu/rte-speech-generator

Natural Language Processing to generate new speeches for the President of Turkey.

natural-language-processing nlp politics python speech-processing tensorflow turkce turkish turkish-nlp

Last synced: 02 Aug 2024

https://github.com/MiuLab/FlowDelta

FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension

machine-comprehension nlp pytorch question-answering

Last synced: 07 Aug 2024

https://github.com/bastienbot/nlp-js-tools-french

POS tagger, lemmatizer, and stemmer for the French language in JavaScript

lemmatization lemmatizer nlp postagging postgresql stemmer stemming tokenization tokenizer

Last synced: 28 Aug 2024

https://github.com/thisisiron/nmt-attention-tf2

👫 Effective Approaches to Attention-based Neural Machine Translation implemented in TensorFlow 2.0

attention lstm natural-language-processing neural-machine-translation nlp nmt tensorflow tensorflow2 tf2 translation

Last synced: 31 Jul 2024

https://github.com/mchmarny/tsignal

Analyzing social media sentiment and its impact on the stock market

analytics golang nasdaq nlp sentiment-analysis twitter

Last synced: 03 Aug 2024

https://github.com/cocoa-ai/NamesCoreMLDemo

🏷 iOS11 demo application for predicting gender from first names.

classification coreml coreml-models gender-classification ios machine-learning nlp swift swift4

Last synced: 09 Aug 2024

https://github.com/aws-solutions/content-localization-on-aws

Automatically generate multi-language subtitles using AWS AI/ML services. Machine-generated subtitles can be edited to improve accuracy, and downstream tracks are automatically regenerated based on the edits. Built on Media Insights Engine (https://github.com/awslabs/aws-media-insights-engine)

amazon-comprehend amazon-polly amazon-transcribe amazon-translate audio aws-media-insights-engine captions content-analysis localisation localization media mie nlp nlp-machine-learning speech-to-text subtitles video video-on-demand vod

Last synced: 01 Aug 2024

https://github.com/news-r/gensimr

📝 Topic Modeling for Humans

nlp r rstats topic-modeling

Last synced: 05 Aug 2024

https://github.com/johncmunson/react-taggy

A simple zero-dependency React component for tagging user-defined entities within a block of text.

component entities named-entity-recognition natural-language ner nlp react react-component

Last synced: 28 Aug 2024

https://github.com/rainmaker712/nlp_ryan

Study for Natural Language Processing & Deep Learning Framework

chatbot deep-learning machine-comprehension machine-learning nlp python pytorch scala spark tensorflow

Last synced: 02 Aug 2024

https://github.com/miroozyx/BERT_with_keras

A Keras version of Google's BERT model

bert deep-learning nlp tensorflow

Last synced: 01 Aug 2024

https://github.com/Ermlab/PoLitBert

Polish RoBERTa model trained on Polish literature, Wikipedia, and OSCAR. The main assumption is that quality text will give a good model.

nlp polish roberta text-corpus

Last synced: 02 Aug 2024

https://github.com/wri-dssg-omdena/policy-data-analyzer

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.

active-learning bert data-science document-classification environmental huggingface incentives landscape-restoration lda machine-learning nlp policy sbert scraping scrapy sentence-transformers spyder text-classification topic transformers

Last synced: 31 Jul 2024

https://github.com/JackHCC/Chinese-Tokenization

Chinese word segmentation implemented with traditional methods (n-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-training methods (BERT, etc.)

bert-crf bilstm-crf hmm-viterbi-algorithm ngram nlp tokenization

Last synced: 03 Aug 2024

https://github.com/dzieciou/pystempel

Python port of Stempel, an algorithmic stemmer for the Polish language.

nlp

Last synced: 31 Jul 2024

https://github.com/hellohaptik/HINT3

This repository contains datasets and code for the paper "HINT3: Raising the bar for Intent Detection in the Wild", accepted at EMNLP-2020's Insights Workshop (https://insights-workshop.github.io/). A preprint of the paper is available at https://arxiv.org/abs/2009.13833

conversational-ai datasets dialogue-systems nlp

Last synced: 03 Aug 2024

https://github.com/mananshah99/sentR

Simple sentiment analysis framework for R

nlp r sentiment-analysis

Last synced: 05 Aug 2024

https://github.com/Furyton/awesome-language-model-analysis

This paper list focuses on the theoretical and empirical analysis of language models, especially large language models (LLMs). The papers in this list investigate the learning behavior, generalization ability, and other properties of language models through theoretical analysis, empirical analysis, or a combination of both.

ai analysis analytics awesome chatgpt deep-learning generative-ai large-language-models llm nlp theory transformers

Last synced: 19 Sep 2024

https://github.com/Smat26/Roman-Urdu-Dataset

Compilation of Manually Tagged Roman Urdu Dataset (Urdu written in Latin/Roman Script), along with other helpful Roman Urdu NLP resources

data-science dataset hindi hindi-language natural-language-processing nlp urdu urdu-language urdu-nlp

Last synced: 04 Aug 2024

https://github.com/pooya-mohammadi/persian-spell-checker-kenlm

Complete instructions for training a Persian spell checker based on SymSpell and a language model based on KenLM, using a Wikipedia dataset.

bash kenlm language-model nlp persian python spellcheck spellchecker symspell

Last synced: 04 Aug 2024

https://github.com/benjaminvdb/DBRD

110k Dutch Book Reviews Dataset for Sentiment Analysis

dataset dataset-creation dutch nlp nlp-machine-learning python python3 scraped-data scraper

Last synced: 03 Aug 2024

https://github.com/stevenay/myan-word-breaker

Myanmar Word Segmentation Tool

burmese nlp word-segmentation

Last synced: 30 Jul 2024

https://github.com/Qznan/QizNLP

A TensorFlow framework for quickly running NLP tasks such as classification, sequence labeling, matching, and generation (Chinese NLP, with distributed training support)

beam-search chinese classification horovod match nlp sequence-labeling sequence-to-sequence tensorflow

Last synced: 03 Aug 2024

https://github.com/sedthh/lara-hungarian-nlp

NLP class for rapid chatbot development in the Hungarian language

chatbot hungarian hungarian-language lemmatizer nlp python3 stemmer

Last synced: 03 Aug 2024

https://github.com/andreaferretti/charade

A server for multilanguage, composable NLP API in Python

nlp nlp-apis python

Last synced: 05 Aug 2024

https://github.com/akosbalasko/obsidian-autotagger-plugin

This plugin offers smart tags for notes by performing Named Entity Recognition (NER) on the content

natural-language-processing nlp obsidian-md obsidian-plugin

Last synced: 07 Aug 2024

https://github.com/eimg/myanmar-text-breaker

Syllable and word breaker (boundary segmentation) for Myanmar text in JavaScript

javascript nlp

Last synced: 30 Jul 2024

https://github.com/eimg/burmese-text-classifier

A neural network based text classification system for Burmese

deep-learning javascript nlp

Last synced: 30 Jul 2024

https://github.com/fredriko/bert-tensorflow-pytorch-spacy-conversion

Instructions for converting a BERT TensorFlow model to work with HuggingFace's pytorch-transformers and spaCy. This walk-through uses DeepPavlov's RuBERT as an example.

bert bert-model how-to keras nlp pytorch-transformers spacy spacy-models spacy-nlp spacy-package spacy-pytorch-transformers tensorflow

Last synced: 07 Aug 2024

https://github.com/swanhtet1992/ReSegment

Burmese (Myanmar) syllable level segmentation with regex.

burmese-nlp myanmar-nlp myanmar-text nlp segmentation

Last synced: 30 Jul 2024

https://github.com/agatan/yoin

A Japanese Morphological Analyzer written in pure Rust

japanese nlp rust

Last synced: 01 Aug 2024

https://github.com/yasinkuyu/Turkish.cs

Turkish Suffix Library for C# & .NET: Turkish inflectional and derivational suffixes

c-sharp nlp stem vowel

Last synced: 02 Aug 2024

https://github.com/luoyuanlab/text_gcn_tutorial

A tutorial & minimal example (8min on CPU) for Graph Convolutional Networks for Text Classification. AAAI 2019

deep-learning graph-convolutional-networks nlp text-classification

Last synced: 01 Aug 2024

https://github.com/warpy-ai/tgs

Terminal Generative Shell

ai bash nlp shell t5-small terminal

Last synced: 13 Aug 2024

https://github.com/Praful932/llmsearch

Find better generation parameters for your LLM

llm llm-evaluation llm-inference nlp

Last synced: 01 Aug 2024

https://github.com/shrebox/Personified-Chatbot

A personified chatbot responding to a query based on the answering pattern of Dr. APJ Abdul Kalam using Information Retrieval, Natural Language Processing, and Deep Learning techniques.

apj-abdul-kalam chatbot deep-learning information-retrieval lstm natural-language-processing nlp ranking-algorithm seq2seq-chatbot seq2seq-model summarization word2vec

Last synced: 02 Aug 2024

https://github.com/quickgrid/AI-Resources

Research Paper Summaries, Setup & Performance Notes, Resource Links on AI, Deep Learning, NLP, Computer Vision for my learning.

ai ai-notes ai-research blender computer-vision deep-learning nlp paper-summaries papers research-paper research-paper-summaries

Last synced: 01 Aug 2024

https://github.com/davidsvy/Neural-Scam-Artist

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

dataset deduplication fine-tuning fraud gpt2 huggingface lsh minhash nlp pytorch readability scam transformer web-scraping

Last synced: 05 Aug 2024

https://github.com/PhilipMay/stsb-multi-mt

Machine translated multilingual STS benchmark dataset.

dataset multilingual nlp

Last synced: 03 Aug 2024

https://github.com/talkdai/dialog

Humanized Conversation API (using LLM)

chatgpt langchain llm nlp nltk

Last synced: 02 Aug 2024

https://github.com/AmrHendy/programming-language-translator

An easy way to use TransCoder, released by Facebook AI Research, to convert code from one programming language to another. TransCoder uses unsupervised neural machine translation (NMT), a deep-learning approach used to translate text from one natural language to another, and is trained only on monolingual source data.

machine-translation nlp programming-language transcoder transformer unsupervised-deep-learning unsupervised-translation

Last synced: 06 Aug 2024

https://github.com/KGCP/MEL-TNNT/

Metadata Extractor & Loader (MEL) ■ The NLP-NER Toolkit (TNNT)

metadata-extraction named-entity-recognition natural-language-processing nlp nlp-ner pipeline

Last synced: 14 Aug 2024

https://github.com/undertheseanlp/slp3-vietnamese

Speech and Language Processing 3rd edition Vietnamese Translation

book-translation nlp vietnamese-nlp

Last synced: 01 Aug 2024

https://github.com/jawahar273/practNLPTools-lite

Practical Natural Language Processing Tools for Humans is built on top of Senna Natural Language Processing (NLP) predictions: part-of-speech (POS) tags, chunking (CHK), named entity recognition (NER), semantic role labeling (SRL), and syntactic parsing (PSG), with skip-gram, all in Python; more features will be added. The website given is for downloading the Senna tool.

nlp practnlptools3 senna senna-nlp

Last synced: 07 Aug 2024

https://github.com/hscspring/ALL4AI

AI Related Tools/Projects

ai jupyter linux machine-learning nlp python ssh toolbox

Last synced: 01 Aug 2024

https://github.com/kampersanda/tongrams-rs

Rust library providing fast language model queries in compressed space

compression elias-fano language-model ngrams nlp trie

Last synced: 02 Aug 2024

https://github.com/senthilchandrasegaran/textplorer

Visual analytics application for qualitative text analysis

nlp text-visualization visual-analytics

Last synced: 31 Jul 2024

https://github.com/amsqr/NaiveSumm

NaiveSumm is a naive summarization approach based on Luhn's 1958 paper "The Automatic Creation of Literature Abstracts". It uses the frequencies of words in the document to score and extract the sentences that contain the most frequent words.

natural-language-processing nlp python summarization

Last synced: 31 Jul 2024
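
The frequency-based idea in the NaiveSumm description above can be sketched in a few lines of Python. This is a simplified illustration and not NaiveSumm's actual code: the function names, the tiny stopword list, and the scoring rule (sum of word frequencies per sentence) are all assumptions, and Luhn's original method additionally looks at clusters of significant words.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
             "that", "on", "for", "with", "as", "are", "was"}


def tokenize(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole document.
    freq = Counter(tokenize(text))

    def score(sentence):
        # A sentence scores the summed document frequency of its words.
        return sum(freq[w] for w in tokenize(sentence))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)


text = "Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark."
print(summarize(text, 1))
```

Summing raw frequencies favors long sentences; dividing the score by sentence length is a common variation if that bias matters.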

https://github.com/korpling/pepper

A highly extensible platform for conversion and manipulation of linguistic data between an unbounded set of formats. Pepper can be used stand-alone as a command line interface, or be integrated as an API into other software products.

annotations converter format java linguistic-formats linguistics nlp pepper

Last synced: 03 Aug 2024

https://github.com/yhy1117/x-mixup

Implementation of ICLR 2022 paper "Enhancing Cross-lingual Transfer by Manifold Mixup".

cross-lingual-transfer manifold-mixup nlp

Last synced: 28 Aug 2024

https://github.com/janekb04/py2gpt

Convert Python code into JSON consumable by OpenAI's function API.

ai api chatgpt converter function gpt gpt-4 json nlp openai openai-api python schema transcoding

Last synced: 01 Aug 2024