Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Natural language processing

Natural language processing (NLP) is a field of computer science and artificial intelligence that studies how computers process and analyze human language. In 1950, Alan Turing published the article "Computing Machinery and Intelligence", which proposed a measure of machine intelligence now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

https://github.com/messense/jieba-rs

The Jieba Chinese Word Segmentation Implemented in Rust

chinese-word-segmentation jieba jieba-chinese nlp wasm

Last synced: 02 Aug 2024

https://github.com/inspirehep/magpie

Deep neural network framework for multi-label text classification

classification deep-learning machine-learning multi-label-classification neural-network nlp prediction word2vec

Last synced: 04 Aug 2024

https://github.com/koursaros-ai/nboost

NBoost is a scalable, search-api-boosting platform for deploying transformer models to improve the relevance of search results on different platforms (e.g. Elasticsearch)

cloud deep-learning docker elasticsearch helm kubernetes machine-learning microservices nboost nlp proxy python pytorch search-api search-engine semantic-search tensorflow

Last synced: 01 Aug 2024

https://github.com/mila-iqia/babyai

BabyAI platform. A testbed for training agents to understand and execute language commands.

imitation-learning nlp nlp-machine-learning openai-gym reinforcement-learning-environments

Last synced: 07 Aug 2024

https://github.com/thunlp/OpenAttack

An Open-Source Package for Textual Adversarial Attack.

adversarial-attacks adversarial-example natural-language-processing nlp pytorch

Last synced: 01 Aug 2024

https://github.com/cbaziotis/ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets from Twitter).

nlp nlp-library semeval spell-corrector spelling-correction text-processing text-segmentation tokenization tokenizer word-normalization word-segmentation

Last synced: 01 Aug 2024

https://github.com/dgarnitz/vectorflow

VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.

ai data-engineering embeddings machine-learning nlp vectors

Last synced: 03 Sep 2024

https://github.com/polyrabbit/WeCron

:heavy_check_mark: Scheduled reminders on WeChat - Cron on WeChat

angular cron crontab ionic nlp postgresql python timer wechat weixin

Last synced: 31 Jul 2024

https://github.com/tomaarsen/attention_sinks

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining

llm llms nlp python transformers

Last synced: 31 Jul 2024

https://github.com/abadojack/whatlanggo

Natural language detection library for Go

go language nlp text-processing

Last synced: 30 Jul 2024

https://github.com/wyounas/homer

Homer, a text analyser in Python, can help make your text more clear, simple and useful for your readers.

nlp nlp-library python python-library python-script python3 text-analysis

Last synced: 01 Aug 2024

https://github.com/ICLRandD/Blackstone

:black_circle: A spaCy pipeline and model for NLP on unstructured legal text.

caselaw law legaltech nlp spacy-models

Last synced: 02 Aug 2024

https://github.com/HIT-SCIR/Chinese-Mixtral-8x7B

Chinese Mixtral-8x7B (Chinese-Mixtral-8x7B)

large-language-models llm mixtral-8x7b nlp

Last synced: 31 Jul 2024

https://github.com/akoumjian/datefinder

Find dates inside text using Python and get back datetime objects

datetime nlp parser

Last synced: 02 Aug 2024

https://github.com/smoothnlp/SmoothNLP

An NLP Toolset With A Focus on Explainable Inference

depedency-parsing nlp nlp-pipeline postagging python tokenizer

Last synced: 03 Aug 2024

https://github.com/huggingface/dataset-viewer

Lightweight web API for visualizing and exploring any dataset - computer vision, speech, text, and tabular - stored on the Hugging Face Hub

api-rest data datasets huggingface machine-learning nlp

Last synced: 09 Aug 2024

https://github.com/ymcui/MacBERT

Revisiting Pre-trained Models for Chinese Natural Language Processing (MacBERT)

bert language-model macbert nlp pytorch tensorflow transformers

Last synced: 03 Aug 2024

https://github.com/michaelthwan/searchGPT

Grounded search engine (i.e. with source reference) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search etc.

ai chatgpt grounded-api grounded-bot language-model llm machine-learning nlp nlp-machine-learning openai python retrieval retrieval-model

Last synced: 02 Aug 2024

https://github.com/gutfeeling/word_forms

Accurately generate all possible forms of an English word, e.g. "election" --> "elect", "electoral", "electorate" etc.

adjective adverb dictionary lemmatizer natural-language-processing nlp noun parts-of-speech stemmer verb-conjugations wordnet words

Last synced: 31 Jul 2024

https://github.com/BlackSamorez/tensor_parallel

Automatically split your PyTorch models on multiple GPUs for training & inference

deep-learning machine-learning natural-language-processing nlp python pytorch pytorch-transformers

Last synced: 09 Aug 2024

https://github.com/samtecspg/articulate

A platform for building conversational interfaces with intelligent agents (chatbots)

chatbot nlp nlu react

Last synced: 31 Jul 2024

https://github.com/mit-han-lab/lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention

nlp pytorch transformer

Last synced: 03 Aug 2024

https://github.com/princeton-nlp/DensePhrases

[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624

information-retrieval knowledge-base nlp open-domain-qa passage-retrieval slot-filling

Last synced: 01 Aug 2024

https://github.com/bminixhofer/nlprule

A fast, low-resource Natural Language Processing and Text Correction library written in Rust.

grammar grammatical-error-correction machine-learning natural-language-processing nlp proofreading rust spellcheck style-checker

Last synced: 31 Jul 2024

https://github.com/HKUST-KnowComp/R-Net

Tensorflow Implementation of R-Net

machine-comprehension nlp r-net squad tensorflow

Last synced: 07 Aug 2024

https://github.com/jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese

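Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial large language models, together with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)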

chinese finance large-language-models llama nlp qa rlhf sft text-generation transformers

Last synced: 01 Aug 2024

https://github.com/graykode/xlnet-Pytorch

Simple XLNet implementation with Pytorch Wrapper

bert natural-language-processing nlp pytorch xlnet xlnet-pytorch

Last synced: 01 Aug 2024

https://github.com/mozilla/firefox-translations

Firefox Translations is a webextension that enables client side translations for web browsers.

deep-neural-networks firefox javascript nlp nmt translation webextension

Last synced: 31 Jul 2024

https://github.com/Liquid-Legal-Institute/Legal-Text-Analytics

A list of selected resources, methods, and tools dedicated to Legal Text Analytics.

legal legal-text-analytics nlp

Last synced: 01 Aug 2024

https://github.com/rinnakk/japanese-pretrained-models

Code for producing Japanese pretrained models provided by rinna Co., Ltd.

gpt2 japanese nlp roberta

Last synced: 01 Aug 2024

https://ucinlp.github.io/autoprompt/

AutoPrompt: Automatic Prompt Construction for Masked Language Models.

evaluation language-model nlp

Last synced: 04 Aug 2024

https://github.com/ucinlp/autoprompt

AutoPrompt: Automatic Prompt Construction for Masked Language Models.

evaluation language-model nlp

Last synced: 04 Aug 2024

https://github.com/zhang17173/Event-Extraction

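Event extraction from legal judgment documents and its applications, including word segmentation, part-of-speech tagging, named entity recognition, event element extraction, and judgment outcome prediction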

cnn-classification deep-learning event-extraction judgment nlp word2vec

Last synced: 06 Aug 2024

https://github.com/dmitrizzle/chat-bubble

Simple chatbot UI for the Web with JSON scripting 👋🤖🤙

bot bot-framework chat-bots chatbot chatbot-ui javascript natural-language-classifiers nlp

Last synced: 04 Aug 2024

https://github.com/yassouali/ML-paper-notes

:notebook: Notes and summaries of various ML, Computer Vision & NLP papers.

computer-vision deep-learning machine-learning natural-language-processing nlp summary

Last synced: 01 Aug 2024

https://github.com/tasdikrahman/vocabulary

[Not Maintained anymore] Python Module to get Meanings, Synonyms and what not for a given word

antonym api dictionary glosbe nlp pronunciation python synonyms wordnik

Last synced: 31 Jul 2024

https://github.com/google-research/bigbird

Transformers for Longer Sequences

bert deep-learning longer-sequences nlp transformer

Last synced: 03 Aug 2024

https://github.com/jerryji1993/DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

deep-learning dnabert-model genome gpu kmer kmer-format machine-learning natural-language-processing nlp sequence

Last synced: 08 Aug 2024

https://github.com/shuaihuaiyi/QA

A Chinese question answering system implemented with deep learning algorithms

lstm nlp

Last synced: 04 Aug 2024

https://github.com/ymcui/Chinese-Mixtral

Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)

32k 64k large-language-models llm mixtral mixture-of-experts moe nlp

Last synced: 31 Jul 2024

https://github.com/web-arena-x/webarena

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"

agent nlp

Last synced: 03 Aug 2024

https://github.com/ChenghaoMou/text-dedup

All-in-one text de-duplication

data-processing de-duplication nlp text-processing

Last synced: 01 Aug 2024

https://github.com/voidful/TextRL

Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformer (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

chatgpt controlled-nlg gpt-2 gpt-3 language-model nlg nlp pytorch reinforcement-learning rlhf

Last synced: 31 Jul 2024

https://github.com/pysentimiento/pysentimiento

A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

nlp sentiment-analysis transformers

Last synced: 31 Jul 2024

https://github.com/neilgupta/Sherlock

Natural-language event parser for Javascript

datetime event-parser javascript natural-language-processing nlp regex

Last synced: 03 Aug 2024

https://github.com/udibr/headlines

Automatically generate headlines for short articles

generation keras nlp rnn summarization

Last synced: 07 Aug 2024

https://github.com/TinyLLaVA/TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models

large-multimodal-models llama llava nlp tinyllama transformers vision-language

Last synced: 02 Aug 2024

https://github.com/OpenLemur/Lemur

[ICLR 2024] Lemur: Open Foundation Models for Language Agents

code-generation language-model machine-learning natural-language-processing nlp text-reasoning

Last synced: 03 Aug 2024

https://github.com/Shark-NLP/OpenICL

OpenICL is an open-source framework to facilitate research, development, and prototyping of in-context learning.

in-context-learning language-model nlp

Last synced: 03 Aug 2024

https://github.com/Teamlinker/Teamlinker

Teamlinker is a team collaboration platform that integrates multi-functional modules. Users can process tasks in parallel, including six functional modules: project, wiki, calendar, meeting, chat and network disk, achieving seamless integration and improving team collaboration efficiency.

arco-design artificial-intelligence calendar chat confluence cooperation documentation javascript mediasoup meeting nlp nodejs project-management teamwork typescript video-conferencing vue webos wiki workflow

Last synced: 01 Aug 2024

https://github.com/dair-ai/pytorch_notebooks

🔥 A collection of PyTorch notebooks for learning and practicing deep learning

deep-learning machine-learning nlp pytorch

Last synced: 03 Sep 2024

https://github.com/junruxiong/IncarnaMind

Connect and chat with your multiple documents (pdf and txt) through GPT 3.5, GPT-4 Turbo, Claude and Local Open-Source LLMs

ai chatbot generative-ai gpt langchain llm nlp openai pdf

Last synced: 01 Aug 2024

https://github.com/airalcorn2/Deep-Semantic-Similarity-Model

My Keras implementation of the Deep Semantic Similarity Model (DSSM)/Convolutional Latent Semantic Model (CLSM) described here: http://research.microsoft.com/pubs/226585/cikm2014_cdssm_final.pdf.

deep-learning information-retrieval keras natural-language-processing nlp

Last synced: 07 Aug 2024

https://github.com/neuml/codequestion

🔎 Semantic search for developers

machine-learning nlp python search txtai

Last synced: 31 Jul 2024

https://github.com/allenai/allennlp-models

Officially supported AllenNLP models

allennlp nlp pytorch

Last synced: 01 Aug 2024

https://github.com/CornellNLP/ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.

computational-social-science conversational-ai conversational-analysis conversations dataset dialogs machine-learning nlp toolkit

Last synced: 30 Jul 2024

https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.

computational-social-science conversational-ai conversational-analysis conversations dataset dialogs machine-learning nlp toolkit

Last synced: 05 Aug 2024

https://github.com/subho406/OmniNet

Official Pytorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain

artificial-intelligence deep-learning image-captioning machine-learning multimodal-learning multitask-learning neural-network nlp transformer video-recognition

Last synced: 07 Aug 2024

https://github.com/ukairia777/tensorflow-nlp-tutorial

A deep learning NLP repository that uses TensorFlow to cover everything from text preprocessing to downstream tasks for recent models such as topic models, BERT, GPT, and LLMs.

bert bert-ner dpo huggingface keras-tutorial llama llm lora named-entity-recognition natural-language-processing nlp nlp-tutorial question-answering sft tensorflow trainer transformers

Last synced: 02 Aug 2024

https://github.com/allenai/tango

Organize your experiments into discrete steps that can be cached and reused throughout the lifetime of your research project.

ai machine-learning nlp python python3 pytorch

Last synced: 02 Aug 2024

https://github.com/synyi/poplar

A web-based annotation tool for natural language processing (NLP)

annotation nlp svg

Last synced: 31 Jul 2024

https://github.com/fhamborg/Giveme5W1H

Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?

5w 5w1h answering event-detection event-extraction fivew fivewoneh news news-articles nlp nlp-library question question-answering text-analysis

Last synced: 31 Jul 2024

https://github.com/princeton-nlp/LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

efficiency llama llama2 llm nlp pre-training pruning

Last synced: 01 Aug 2024

https://github.com/medspacy/medspacy

Library for clinical NLP with spaCy.

clinical-nlp medspacy nlp nlp-library pipeline spacy

Last synced: 04 Aug 2024

https://github.com/salesforce/matchbox

Write PyTorch code at the level of individual examples, then run it efficiently on minibatches.

deep-learning minibatch nlp pytorch

Last synced: 03 Aug 2024