Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published the article "Computing Machinery and Intelligence", which proposed a criterion of intelligence now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

https://github.com/cdpierse/script_buddy_v2

Script Buddy v2 is a film script text generation tool built using GPT-2 and scripts from the world's most popular films.

artificial-intelligence gpt-2 language-generation machine-learning nlp pytorch transformers

Last synced: 08 Nov 2024

https://github.com/lexiestleszek/namegen

Self-contained, minimalistic implementation of a language model that generates coherent, natural-sounding names. It uses an input dataset of names and a probability distribution to generate new names based on sequences of four characters.

language-model machine-learning markov-chain name-generation natural-language-processing nlp

Last synced: 14 Nov 2024
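
The character-level idea in the namegen description above can be sketched as a small Markov chain. This is only an illustrative sketch, not the repository's code: the function names `build_model` and `generate`, the padding symbols, and the reading of "sequences of four characters" as a three-character context plus one predicted character are all assumptions.

```python
import random
from collections import defaultdict

# Hypothetical padding symbols marking the start and end of a name.
START, END = "^", "$"

def build_model(names, order=3):
    """Map each `order`-character context to the characters observed after it."""
    model = defaultdict(list)
    for name in names:
        padded = START * order + name.lower() + END
        for i in range(len(padded) - order):
            # Context of 3 characters + 1 following character = a 4-character sequence.
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=3, max_len=12, rng=random):
    """Sample one name; repeated characters in a list act as frequency weights."""
    context = START * order
    out = []
    while len(out) < max_len:
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out).capitalize()
```

Calling `generate(build_model(["anna", "hannah", "joanna"]))` returns a name in which every four-character window also occurs in the training names, which is what tends to make the output sound plausible.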

https://github.com/LanguageMachines/PICCL

A set of workflows for corpus building through OCR, post-correction and normalisation

computational-linguistics corpus-linguistics corpus-tools folia nlp ocr workflow

Last synced: 03 Nov 2024

https://github.com/welding-torch/excel-anonymizer

A Python script that anonymizes an Excel file and synthesizes new data in its place.

data-science microsoft nlp pandas presidio privacy

Last synced: 07 Nov 2024

https://github.com/khakhulin/compressed-transformer

Compression of NMT transformer model with tensor methods

compression deep-learning mnist nlp nmt pytorch tensor-train transformer translation tucker

Last synced: 19 Nov 2024

https://github.com/kaleidophon/nlp-uncertainty-zoo

Model zoo for different kinds of uncertainty quantification methods used in Natural Language Processing, implemented in PyTorch.

deep-learning lstm nlp nlp-machine-learning package python pytorch rnn transformers uncertainty-estimation uncertainty-neural-networks uncertainty-quantification

Last synced: 10 Oct 2024

https://github.com/gunale0926/sorsa

SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models

deep-learning fine-tuning llama lora machine-learning nlp peft python pytorch rwkv sorsa svd transformer

Last synced: 17 Nov 2024

https://github.com/ChenghaoMou/pytorch-pQRNN

Implementation of pQRNN in PyTorch

nlp pqrnn pytorch text-classification

Last synced: 16 Nov 2024

https://github.com/kootenpv/spacy_api

Server/client wrapper around spaCy so that spaCy is loaded only once

api machine-learning nlp spacy

Last synced: 14 Oct 2024

https://github.com/kennethenevoldsen/spacy-wrap

spaCy-wrap is a wrapper library for spaCy that lets you include existing fine-tuned transformers from Hugging Face in your spaCy pipeline.

deep-learning huggingface huggingface-transformers language-model machine-learning natural-language-processing nlp pytorch spacy spacy-extension spacy-extensions spacy-models spacy-nlp spacy-pipeline spacy-transformers text-classification transformers

Last synced: 12 Oct 2024

https://github.com/skoltech-nlp/rudetoxifier

Code and data of "Methods for Detoxification of Texts for the Russian Language" paper

nlp russian-language style-transfer

Last synced: 08 Nov 2024

https://github.com/hsm207/bert_attn_viz

Visualize BERT's self-attention layers on text classification tasks

attention bert explainable-ai nlp tensorflow

Last synced: 28 Oct 2024

https://github.com/onesuper/HuggingFace-Datasets-Text-Quality-Analysis

Retrieves Parquet files from Hugging Face, then identifies and quantifies junky data, duplication, contamination, and biased content in a dataset using pandas

dataset huggingface-datasets llm machine-learning nlp streamlit text-processing

Last synced: 09 Aug 2024

https://github.com/nlpcloud/nlpcloud-js

NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, code generation, and much more...

ad-generator chatbot code-generation conversational-ai embeddings intent-classification keywords-extraction language-detection machine-translation ner nlp paraphrasing question-answering semantic-similarity sentiment-analysis text-classification text-generation text-summarization tokenization

Last synced: 07 Nov 2024

https://github.com/ailln/nlp-roadmap

🗺️ A learning roadmap for natural language processing

natural-language-processing nlp roadmap sequence-labeling word-embedding word-segmentation

Last synced: 18 Nov 2024

https://github.com/obss/trapper

State-of-the-art NLP through transformer models in a modular design and consistent APIs.

allennlp deep-learning natural-language-processing nlp python pytorch pytorch-transformers transformer transformers

Last synced: 27 Oct 2024

https://github.com/christabor/namebot

A company/project name generator for Python. Uses NLTK and diverse techniques derived from existing corporate etymologies and naming agencies for sophisticated word generation and ideation.

language name-generation naming naming-agencies nlp nltk

Last synced: 07 Nov 2024

https://github.com/s-nlp/rudetoxifier

Code and data of "Methods for Detoxification of Texts for the Russian Language" paper

nlp russian-language style-transfer

Last synced: 07 Aug 2024

https://github.com/deshwalmahesh/phudge

Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without a custom rubric or reference answer, in absolute or relative mode, and much more. It contains a list of the available tools, methods, repos, code, etc. for hallucination detection, LLM evaluation, grading, and much more.

ai custom-dataset evaluation feedback-collection finetuning hallucination hallucination-detection judge llm llm-evaluation ml nlp phi-3 pytorch sota

Last synced: 18 Nov 2024

https://github.com/teticio/llama-squad

Train Llama 2 & 3 on the SQuAD v2 task as an example of how to specialize a generalized (foundation) model.

decoder fine-tuning llama2 llama3 nlp question-answering squad

Last synced: 10 Oct 2024

https://github.com/natasha/naeval

Comparing quality and performance of NLP systems for Russian language

evaluation nlp performance-analysis python russian

Last synced: 10 Nov 2024

https://github.com/tuanacelik/should-i-follow

🦄 An NLP application just for the lols: built with Haystack to get an overview of what a user is posting about on Twitter

haystack llm nlp twitter

Last synced: 22 Oct 2024

https://github.com/chatopera/chatopera.feishu

Launch intelligent conversational bot services through the Feishu Open Platform and the Chatopera bot platform. Chatbot, Feishu, Lark.

ai bot chatbot chatopera dialog feishu lark machine-learning nlp nlu python python3

Last synced: 30 Oct 2024

https://github.com/undertheseanlp/ner

Vietnamese Named Entity Recognition

named-entity-recognition nlp

Last synced: 11 Nov 2024

https://github.com/kavgan/word_cloud

Python word cloud library for use within Jupyter notebook and Python apps.

cloud-library jupyter-notebook nlp python visualization word-cloud wordcloud

Last synced: 30 Oct 2024

https://github.com/yoshoku/suika

Suika 🍉 is a Japanese morphological analyzer written in pure Ruby

morphological-analysis nlp postagger ruby tokenizer

Last synced: 10 Nov 2024

https://github.com/stas00/porting

Helper scripts and notes that were used while porting various NLP models

nlp porting python

Last synced: 22 Oct 2024

https://github.com/xavidop/dialogflow-cx-cli

The missing Dialogflow CX CLI to interact with your projects

cli cxcli dialogflow dialogflow-cx dialogflowcx golang nlp nlu test-automation testing-tools

Last synced: 15 Nov 2024

https://github.com/explosion/assets

💥 Explosion Assets

machine-learning nlp spacy spacy-nlp

Last synced: 07 Oct 2024

https://github.com/aphp/eds-pseudo

EDS-Pseudo is a hybrid model for detecting personally identifying entities in clinical reports

edsnlp nlp pseudonymisation

Last synced: 03 Sep 2024

https://github.com/edwardcooper/piidetect

A package to build an end-to-end pipeline for detecting personally identifiable information from text.

nlp pii pii-detection word2vec

Last synced: 11 Nov 2024

https://github.com/Lipairui/textgo

Text preprocessing, representation, similarity calculation, text search and classification. Let's go and play with text!

bert nlp text-classification text-preprocessing text-representation text-search text-similarity

Last synced: 07 Aug 2024

https://github.com/coosto/dutch-word-embeddings

Dutch word embeddings, trained on a large collection of Dutch social media messages and news/blog/forum posts.

coosto dutch nlp word2vec word2vec-model wordembeddings

Last synced: 17 Nov 2024

https://github.com/jaykef/avachat

AvaChat is a realtime AI chat demo with animated talking heads. It uses the output of Large Language Models (GPT, API2D GPT4, Claude) as text input to D-ID's image-to-video talking head model (via the D-ID stream API).

avatar llms nlp

Last synced: 29 Oct 2024

https://github.com/sciss/ws4j

WordNet Similarity for Java provides an API for several Semantic Relatedness/Similarity algorithms. Mirror of https://codeberg.org/sciss/ws4j

nlp wordnet

Last synced: 09 Nov 2024

https://github.com/dongjunlee/dmn-tensorflow

TensorFlow implementation of 'Ask Me Anything: Dynamic Memory Networks for Natural Language Processing (2015)'

babi-tasks dynamic-memory-network hb-experiment natural-language-processing nlp question-answering tensorflow

Last synced: 08 Nov 2024

https://github.com/kinosal/cowriter

Write 10x faster using OpenAI's GPT-3 based Davinci model to autocomplete your text

ai gpt machine-learning nlp

Last synced: 27 Oct 2024

https://github.com/osu-nlp-group/amplegcg

AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLMs

adversarial-attacks gcg nlp safety

Last synced: 11 Nov 2024

https://github.com/prihoda/golem

Open-source chatbot framework for python developers. Batteries included 🔋🔋

bot chatbot dialog-management messenger nlp python telegram witai

Last synced: 16 Nov 2024

https://github.com/OpenSextant/Xponents

Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters.

document-conversion geocoding geonames geoparsing geotagging information-extraction nlp solr tika

Last synced: 05 Nov 2024

https://github.com/explosion/spacy-huggingface-hub

🤗 Push your spaCy pipelines to the Hugging Face Hub

huggingface machine-learning ml-models models natural-language-processing nlp spacy

Last synced: 07 Oct 2024

https://github.com/salesforce/query-focused-sum

Official code repository for "Exploring Neural Models for Query-Focused Summarization".

deep-learning machine-learning neural-network nlp question-answering summarization

Last synced: 08 Nov 2024

https://github.com/skroutz/turkish_stemmer

A simple Turkish stemming library

nlp stemmer

Last synced: 11 Nov 2024

https://github.com/kenlimmj/rouge

A JavaScript implementation of the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric for summaries.

bootstrapping-statistics evaluation-metric jackknifing nlp rouge summarization

Last synced: 10 Nov 2024

https://github.com/argosopentech/metaltranslate

Customizable machine translation in C++

machine-learning nlp nlp-machine-learning translation

Last synced: 08 Nov 2024

https://github.com/yasinkuyu/turkish.js

Turkish suffix library for JavaScript: Turkish inflectional and derivational suffixes

javascript nlp stem vowel

Last synced: 06 Nov 2024

https://github.com/saareliad/FTPipe

FTPipe and related pipeline model parallelism research.

deep-neural-networks distributed-training fine-tuning nlp pipeline-parallelism t5

Last synced: 07 Nov 2024

https://github.com/yuvalpinter/m3gm

Max-Margin Markov Graph Models for WordNet (EMNLP 2018)

markov-model nlp relation-extraction semantics wordnet

Last synced: 27 Oct 2024

https://github.com/KehaoWu/Jinyong-Corpus

Dictionary of Jin Yong's 15 novels

corpus-data nlp

Last synced: 07 Nov 2024

https://github.com/zamgi/lingvo--ner-ru

Named entity recognition (NER) in Russian texts

linguistics lingvo named-entity-recognition natural-language-processing ner nlp nlp-machine-learning

Last synced: 05 Nov 2024

https://github.com/ecohealthalliance/epitator

EpiTator annotates epidemiological information in text documents. It is the natural language processing framework that powers GRITS and EIDR Connect.

disease-surveillance epidemiology geonames nlp spacy toponym-resolution

Last synced: 14 Oct 2024

https://github.com/lunarwhite/covid-social-analysis

Apply ML to Weibo sentiment: sentiment analysis and visualization of Weibo posts during the COVID-19 epidemic

crawling data-analysis machine-learning nlp python vizualization

Last synced: 06 Nov 2024

https://github.com/tokestermw/spacy_grammar

✒️ LanguageTool-style grammar handling with spaCy 2.0

grammar nlp spacy spacy-nlp

Last synced: 07 Nov 2024

https://github.com/kehaowu/jinyong-corpus

Dictionary of Jin Yong's 15 novels

corpus-data nlp

Last synced: 13 Oct 2024

https://github.com/IndexFziQ/KMRC-Papers

A list of recent papers regarding knowledge-based machine reading comprehension.

knowledge knowledge-base machine-reading-comprehension nlp paper reading-comprehension

Last synced: 13 Nov 2024

https://github.com/greenelab/pubtator

Retrieve and process PubTator annotations

data nlp pubmed pubtator snorkel text-mining tool

Last synced: 13 Nov 2024

https://github.com/rangilyu/llama.mmengine

Training LLaMA language model with MMEngine! It supports LoRA fine-tuning!

alpaca fine-tuning language-model llama lora nlp

Last synced: 22 Oct 2024

https://github.com/dpressel/textrank-js

TextRank algorithm implementation in JavaScript

nlp textrank

Last synced: 28 Oct 2024

https://github.com/danieldeutsch/repro

Repro is a library for easily running code from published papers via Docker.

docker machine-learning nlp reproducibility reproducible-research

Last synced: 06 Nov 2024

https://github.com/jina-ai/example-multimodal-fashion-search

Input text or image, get back matching image fashion results, using Jina, DocArray, and CLIP

computer-vision deep-learning neural-search nlp python

Last synced: 16 Nov 2024

https://github.com/sanghviharshit/pocket-tagger

📖👓🏷 Tag your getpocket.com articles automatically using natural language processing

articles getpocket google-cloud natural-language-processing nlp pocket scraper tag

Last synced: 30 Oct 2024

https://github.com/applenob/simple_crf

Simple conditional random field (CRF) implementation in Python

crf nlp python3

Last synced: 07 Nov 2024

https://github.com/perone/feste

Feste is a free and open-source framework allowing scalable composition of NLP tasks using a graph execution model that is optimized and executed by specialized schedulers.

deep-learning language-model machine-learning nlp

Last synced: 28 Oct 2024

https://github.com/machinelearningzh/simply-simplify-language

Use machine learning to make your institutional communication more understandable and inclusive.

anthropic einfachesprache leichtesprache llm llms mistral mistralai natural-language-processing nlp openai plainlanguage python spacy streamlit

Last synced: 14 Oct 2024

https://github.com/nschneid/arabic-tagger

AQMAR Arabic Tagger: Sequence tagger with cost-augmented structured perceptron training

arabic arabic-language arabic-nlp arabic-wikipedia java named-entities nlp nlp-machine-learning sequence-tagger tagger

Last synced: 08 Nov 2024

https://github.com/tmalsburg/txl.el

Emacs extension providing direct access to DeepL's machine translation API.

emacs language language-technology machine-translation nlp

Last synced: 27 Oct 2024

https://github.com/bentoml/transformers-nlp-service

Online Inference API for NLP Transformer models - summarization, text classification, sentiment analysis and more

llm llmops mlops model-deployment model-inference-service model-serving nlp nlp-machine-learning online-inference transformer

Last synced: 13 Nov 2024

https://github.com/megagonlabs/t5-japanese

Code for pre-training Japanese T5 models

natural-language-processing nlp t5 transformer

Last synced: 10 Nov 2024

https://github.com/shibing624/text-feature

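Text feature extraction for texts such as novels, papers, and argumentative essays; extracts features such as words, sentences, and dependency relations. Developed in Python.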

article feature nlp textrank

Last synced: 22 Oct 2024

https://github.com/palewire/storysniffer

Inspect a URL and estimate if it contains a news story

data-journalism journalism jupyter-notebook machine-learning news nlp python scikit-learn

Last synced: 11 Oct 2024

https://github.com/sfischer13/python-arpa

🐍 Python library for n-gram models in ARPA format

arpa computational-linguistics language-model library lm nlp python python-3

Last synced: 01 Nov 2024

https://github.com/microsoft/vistalk

A JavaScript toolkit for Natural Language-based Visualization Authoring

nlp nx reactjs tensorflowjs transformer vega vega-lite visualization

Last synced: 07 Oct 2024