Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Natural language processing

Natural language processing (NLP) is a field of computer science concerned with how computers process, analyze, and generate human language. In 1950, Alan Turing published "Computing Machinery and Intelligence", an article proposing a measure of intelligence now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

https://github.com/princeton-nlp/DensePhrases

[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624

information-retrieval knowledge-base nlp open-domain-qa passage-retrieval slot-filling

Last synced: 03 Nov 2024

https://github.com/samtecspg/articulate

A platform for building conversational interfaces with intelligent agents (chatbots)

chatbot nlp nlu react

Last synced: 29 Oct 2024

https://github.com/mit-han-lab/lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention

nlp pytorch transformer

Last synced: 03 Aug 2024

https://github.com/bminixhofer/nlprule

A fast, low-resource Natural Language Processing and Text Correction library written in Rust.

grammar grammatical-error-correction machine-learning natural-language-processing nlp proofreading rust spellcheck style-checker

Last synced: 28 Oct 2024

https://github.com/ChenghaoMou/text-dedup

All-in-one text de-duplication

data-processing de-duplication nlp text-processing

Last synced: 04 Nov 2024

https://github.com/timbmg/sentence-vae

PyTorch Re-Implementation of "Generating Sentences from a Continuous Space" by Bowman et al 2015 https://arxiv.org/abs/1511.06349

deep-learning generative-model neural-network nlp ptb pytorch vae

Last synced: 30 Oct 2024

https://github.com/titipata/pubmed_parser

📋 A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset

article doi medline-xml nlp parse parser pmid pubmed-central pubmed-parser python xml

Last synced: 30 Oct 2024

https://github.com/ymcui/Chinese-Mixtral

Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)

32k 64k large-language-models llm mixtral mixture-of-experts moe nlp

Last synced: 29 Oct 2024

https://github.com/jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese

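Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial large language models, together with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)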

chinese finance large-language-models llama nlp qa rlhf sft text-generation transformers

Last synced: 02 Nov 2024

https://github.com/HKUST-KnowComp/R-Net

Tensorflow Implementation of R-Net

machine-comprehension nlp r-net squad tensorflow

Last synced: 07 Aug 2024

https://github.com/rinnakk/japanese-pretrained-models

Code for producing Japanese pretrained models provided by rinna Co., Ltd.

gpt2 japanese nlp roberta

Last synced: 06 Nov 2024

https://github.com/graykode/xlnet-pytorch

Simple XLNet implementation with Pytorch Wrapper

bert natural-language-processing nlp pytorch xlnet xlnet-pytorch

Last synced: 30 Oct 2024

https://github.com/graykode/xlnet-Pytorch

Simple XLNet implementation with Pytorch Wrapper

bert natural-language-processing nlp pytorch xlnet xlnet-pytorch

Last synced: 04 Nov 2024

https://github.com/mozilla/firefox-translations

Firefox Translations is a webextension that enables client side translations for web browsers.

deep-neural-networks firefox javascript nlp nmt translation webextension

Last synced: 27 Oct 2024

https://github.com/ymcui/chinese-mixtral

Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)

32k 64k large-language-models llm mixtral mixture-of-experts moe nlp

Last synced: 28 Oct 2024

https://ucinlp.github.io/autoprompt/

AutoPrompt: Automatic Prompt Construction for Masked Language Models.

evaluation language-model nlp

Last synced: 04 Aug 2024

https://github.com/ucinlp/autoprompt

AutoPrompt: Automatic Prompt Construction for Masked Language Models.

evaluation language-model nlp

Last synced: 04 Aug 2024

https://github.com/zhang17173/Event-Extraction

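Event extraction from legal judgment documents and its applications, including word segmentation, part-of-speech tagging, named entity recognition, event element extraction, and verdict prediction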

cnn-classification deep-learning event-extraction judgment nlp word2vec

Last synced: 06 Aug 2024

https://github.com/dmitrizzle/chat-bubble

Simple chatbot UI for the Web with JSON scripting 👋🤖🤙

bot bot-framework chat-bots chatbot chatbot-ui javascript natural-language-classifiers nlp

Last synced: 04 Aug 2024

https://github.com/google-research/bigbird

Transformers for Longer Sequences

bert deep-learning longer-sequences nlp transformer

Last synced: 10 Nov 2024

https://github.com/yassouali/ML-paper-notes

📓 Notes and summaries of various ML, Computer Vision & NLP papers.

computer-vision deep-learning machine-learning natural-language-processing nlp summary

Last synced: 04 Nov 2024

https://github.com/tasdikrahman/vocabulary

[Not Maintained anymore] Python Module to get Meanings, Synonyms and what not for a given word

antonym api dictionary glosbe nlp pronunciation python synonyms wordnik

Last synced: 31 Oct 2024

https://github.com/jerryji1993/DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

deep-learning dnabert-model genome gpu kmer kmer-format machine-learning natural-language-processing nlp sequence

Last synced: 08 Aug 2024

https://github.com/princeton-nlp/LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

efficiency llama llama2 llm nlp pre-training pruning

Last synced: 08 Nov 2024

https://github.com/pysentimiento/pysentimiento

A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

nlp sentiment-analysis transformers

Last synced: 27 Oct 2024

https://github.com/shuaihuaiyi/QA

A Chinese question-answering system implemented with deep learning algorithms

lstm nlp

Last synced: 04 Aug 2024

https://github.com/princeton-nlp/llm-shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

efficiency llama llama2 llm nlp pre-training pruning

Last synced: 10 Oct 2024

https://github.com/Teamlinker/Teamlinker

Teamlinker is a team collaboration platform that integrates multi-functional modules. Users can process tasks in parallel, including six functional modules: project, wiki, calendar, meeting, chat and network disk, achieving seamless integration and improving team collaboration efficiency.

arco-design artificial-intelligence calendar chat confluence cooperation documentation javascript mediasoup meeting nlp nodejs project-management teamwork typescript video-conferencing vue webos wiki workflow

Last synced: 07 Nov 2024

https://github.com/web-arena-x/webarena

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"

agent nlp

Last synced: 03 Aug 2024

https://github.com/voidful/textrl

Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformer (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

chatgpt controlled-nlg gpt-2 gpt-3 language-model nlg nlp pytorch reinforcement-learning rlhf

Last synced: 09 Nov 2024

https://github.com/voidful/TextRL

Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformer (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

chatgpt controlled-nlg gpt-2 gpt-3 language-model nlg nlp pytorch reinforcement-learning rlhf

Last synced: 31 Oct 2024

https://github.com/ukairia777/tensorflow-nlp-tutorial

A deep learning NLP repository that uses TensorFlow to cover everything from text preprocessing to downstream tasks for recent models such as Topic Models, BERT, GPT, and LLMs.

bert bert-ner dpo huggingface keras-tutorial llama llm lora named-entity-recognition natural-language-processing nlp nlp-tutorial question-answering sft tensorflow trainer transformers

Last synced: 09 Nov 2024

https://github.com/neilgupta/Sherlock

Natural-language event parser for JavaScript

datetime event-parser javascript natural-language-processing nlp regex

Last synced: 03 Aug 2024

https://github.com/udibr/headlines

Automatically generate headlines for short articles

generation keras nlp rnn summarization

Last synced: 07 Aug 2024

https://github.com/TinyLLaVA/TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models

large-multimodal-models llama llava nlp tinyllama transformers vision-language

Last synced: 02 Aug 2024

https://github.com/neuml/codequestion

🔎 Semantic search for developers

machine-learning nlp python search txtai

Last synced: 28 Oct 2024

https://github.com/medspacy/medspacy

Library for clinical NLP with spaCy.

clinical-nlp medspacy nlp nlp-library pipeline spacy

Last synced: 14 Oct 2024

https://github.com/OpenLemur/Lemur

[ICLR 2024] Lemur: Open Foundation Models for Language Agents

code-generation language-model machine-learning natural-language-processing nlp text-reasoning

Last synced: 03 Aug 2024

https://github.com/Shark-NLP/OpenICL

OpenICL is an open-source framework to facilitate research, development, and prototyping of in-context learning.

in-context-learning language-model nlp

Last synced: 03 Aug 2024

https://github.com/dair-ai/pytorch_notebooks

🔥 A collection of PyTorch notebooks for learning and practicing deep learning

deep-learning machine-learning nlp pytorch

Last synced: 03 Sep 2024

https://github.com/airalcorn2/Deep-Semantic-Similarity-Model

My Keras implementation of the Deep Semantic Similarity Model (DSSM)/Convolutional Latent Semantic Model (CLSM) described here: http://research.microsoft.com/pubs/226585/cikm2014_cdssm_final.pdf.

deep-learning information-retrieval keras natural-language-processing nlp

Last synced: 07 Aug 2024

https://github.com/allenai/allennlp-models

Officially supported AllenNLP models

allennlp nlp pytorch

Last synced: 26 Sep 2024

https://github.com/stanfordnlp/python-stanford-corenlp

Python interface to CoreNLP using a bidirectional server-client interface.

corenlp corenlp-server nlp

Last synced: 08 Nov 2024

https://github.com/CornellNLP/ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.

computational-social-science conversational-ai conversational-analysis conversations dataset dialogs machine-learning nlp toolkit

Last synced: 26 Oct 2024

https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.

computational-social-science conversational-ai conversational-analysis conversations dataset dialogs machine-learning nlp toolkit

Last synced: 05 Aug 2024

https://github.com/subho406/OmniNet

Official Pytorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain

artificial-intelligence deep-learning image-captioning machine-learning multimodal-learning multitask-learning neural-network nlp transformer video-recognition

Last synced: 07 Aug 2024

https://github.com/allenai/tango

Organize your experiments into discrete steps that can be cached and reused throughout the lifetime of your research project.

ai machine-learning nlp python python3 pytorch

Last synced: 01 Oct 2024

https://github.com/fhamborg/Giveme5W1H

Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?

5w 5w1h answering event-detection event-extraction fivew fivewoneh news news-articles nlp nlp-library question question-answering text-analysis

Last synced: 28 Oct 2024

https://github.com/synyi/poplar

A web-based annotation tool for natural language processing (NLP)

annotation nlp svg

Last synced: 30 Oct 2024

https://github.com/fhamborg/giveme5w1h

Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?

5w 5w1h answering event-detection event-extraction fivew fivewoneh news news-articles nlp nlp-library question question-answering text-analysis

Last synced: 11 Oct 2024

https://github.com/chunelfeng/caiss

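A simple, easy-to-use, cross-platform, multi-language, high-performance retrieval engine for similar vectors, similar words, and similar sentences. Stars and forks welcome. Build together! Power another!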

ai ann chatbot deep-learning faiss hnsw mrpt nlp search-engine similarity-search

Last synced: 28 Oct 2024

https://github.com/Brokenwind/BertSimilarity

Computing the similarity of two sentences with Google's BERT algorithm. Semantic similarity computation. Text similarity computation.

bert nlp python semantic similarity tensorflow

Last synced: 02 Nov 2024

https://github.com/phantominsights/subreddit-analyzer

A comprehensive Data and Text Mining workflow for submissions and comments from any given public subreddit.

matplotlib nlp pandas python3 seaborn spacy wordcloud

Last synced: 30 Oct 2024

https://github.com/salesforce/matchbox

Write PyTorch code at the level of individual examples, then run it efficiently on minibatches.

deep-learning minibatch nlp pytorch

Last synced: 03 Aug 2024

https://github.com/shibing624/pytextclassifier

pytextclassifier is a toolkit for text classification. It implements classification models including LR, XGBoost, TextCNN, FastText, TextRNN, and BERT, ready to use out of the box.

bert classification focalloss-pytorch hierarchical machine-learning nlp pytextclassifier python pytorch softmax text-classification text-classifier

Last synced: 11 Oct 2024

https://github.com/phantominsights/mexican-government-report

Text Mining on the 2019 Mexican Government Report, covering from extracting text from a PDF file to plotting the results.

geopandas matplotlib nlp numpy pandas seaborn spacy

Last synced: 30 Oct 2024

https://github.com/dccuchile/beto

BETO - Spanish version of the BERT model

bert bert-model nlp spanish transformers transformers-library

Last synced: 05 Aug 2024

https://github.com/proycon/pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

computational-linguistics evaluation-metrics folia language-modelling library linguistics machine-learning natural-language-processing nlp nlp-library python search-algorithms text-processing

Last synced: 19 Oct 2024

https://github.com/PrithivirajDamodaran/Styleformer

A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

active formal-languages informal-sentences nlp passive slang style-transfer text-style text-style-transfer text-style-transfer-benchmark

Last synced: 03 Nov 2024

https://github.com/CogComp/cogcomp-nlp

CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.

big-data cogcomp data-mining dependency-parsing lemmatization lemmatizer named-entity-recognition natural-language-processing natural-language-understanding ner nlp parts-of-speech-tagging pos pos-tagging relation-extraction similarity tokenizer transliteration

Last synced: 30 Oct 2024

https://github.com/Beomi/KcBERT

🤗 Pretrained BERT model & WordPiece tokenizer trained on Korean comments, along with the dataset

bert bert-model korean-nlp nlp transformers

Last synced: 09 Nov 2024

https://github.com/koaning/whatlies

Toolkit to help understand "what lies" in word embeddings. Also benchmarking!

embeddings nlp visualisations

Last synced: 29 Oct 2024

https://github.com/huggingface/node-question-answering

Fast and production-ready question answering in Node.js

bert nlp nodejs question-answering tensorflow transformers typescript

Last synced: 30 Oct 2024

https://github.com/ematvey/hierarchical-attention-networks

Document classification with Hierarchical Attention Networks in TensorFlow. WARNING: project is currently unmaintained, issues will probably not be addressed.

deep-learning document-classification hierarchical-attention-networks machine-learning nlp tensorflow

Last synced: 06 Nov 2024

https://github.com/LingDong-/cope

A modern IDE for writing classical Chinese poetry (an editor for regulated verse)

bag-of-words chinese chinese-poetry editor electron ide nlp poetry

Last synced: 01 Nov 2024

https://github.com/ynqa/wego

Word Embeddings (e.g. Word2Vec) in Go!

glove go machine-learning nlp word-embeddings word2vec

Last synced: 29 Oct 2024