Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers process, understand, and generate human language. In 1950, Alan Turing published the article "Computing Machinery and Intelligence", which proposed a measure of intelligence now called the Turing test. More recent techniques, such as deep learning, have advanced language modeling, parsing, and many other natural-language tasks.

https://github.com/PaddlePaddle/ERNIE

Official implementations of various pre-training models in the ERNIE family, covering Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

bert ernie language-understanding natural-language-processing nlp

Last synced: 02 Nov 2024

https://github.com/codertimo/bert-pytorch

PyTorch implementation of Google AI's 2018 BERT

bert language-model nlp pytorch transformer

Last synced: 17 Dec 2024

https://github.com/maartengr/bertopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

bert ldavis machine-learning nlp sentence-embeddings topic topic-modeling topic-modelling topic-models transformers

Last synced: 16 Dec 2024

https://github.com/zihangdai/xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding

deep-learning nlp tensorflow

Last synced: 18 Dec 2024

https://github.com/clovaai/donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

computer-vision document-ai eccv-2022 multimodal-pre-trained-model nlp ocr

Last synced: 16 Dec 2024

https://github.com/axa-group/parsr

Transforms PDF, Documents and Images into Enriched Structured Data

data document extraction hacktoberfest images nlp ocr parsr pdf python typescript

Last synced: 17 Dec 2024

https://github.com/aisingapore/tagui

Free RPA tool by AI Singapore

ai nlp opencv rpa tesseract

Last synced: 17 Dec 2024

https://github.com/vi3k6i5/flashtext

Extract keywords from sentences or replace keywords in sentences.

data-extraction keyword-extraction nlp search-in-text word2vec

Last synced: 16 Dec 2024

https://github.com/skalskip/courses

This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)

computer-vision deep-learning deep-neural-networks generative-model machine-learning mlops multimodal natural-language-processing nlp stable-diffusion transformers tutorial

Last synced: 19 Dec 2024

https://github.com/dsdanielpark/bard-api

An unofficial Python package that returns responses from Google Bard using a cookie value.

ai-api api bard bard-api chatbot google google-bard google-bard-api google-bard-python google-maps-api googlebard llm nlp

Last synced: 25 Sep 2024

https://github.com/chatopera/synonyms

:herb: Chinese synonyms: a toolkit for chatbots and intelligent question answering

chatbot nlp synonyms

Last synced: 17 Dec 2024

https://github.com/nyandwi/machine_learning_complete

A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.

computer-vision data-analysis data-science data-visualization datascience deep-learning keras machine-learning matplotlib neural-networks nlp numpy open-source pandas python scikit-learn seaborn tensorflow

Last synced: 19 Dec 2024

https://github.com/houbb/sensitive-word

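👮‍♂️ The sensitive word tool for Java. (Sensitive words, banned words, illegal words, and profanity. A high-performance Java sensitive-word filtering framework based on the DFA algorithm. Please do not post content involving politics, advertising, marketing, firewall circumvention, or violations of national laws and regulations. A high-performance sensitive-word detection and filtering component with Traditional/Simplified Chinese conversion, full-width/half-width conversion, Chinese-character-to-pinyin conversion, fuzzy search, and other features.)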

dfa dirty-word nlp sensitive sensitive-word sensitive-word-filter trie-tree

Last synced: 19 Dec 2024

https://github.com/autogptq/autogptq

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

deep-learning inference large-language-models llms nlp pytorch quantization transformer transformers

Last synced: 16 Dec 2024

https://github.com/panqiwei/autogptq

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

deep-learning inference large-language-models llms nlp pytorch quantization transformer transformers

Last synced: 13 Dec 2024

https://github.com/scir-hi/huatuo-llama-med-chinese

Repo for BenTsao [original name: HuaTuo (华驼)]: instruction-tuning large language models with Chinese medical knowledge.

aidoctor bloom chinese huozi llama llm medgpt medical medqa nlp

Last synced: 22 Dec 2024

https://github.com/spro/practical-pytorch

Go to https://github.com/pytorch/tutorials - this repo is deprecated and no longer maintained

natural-language-generation natural-language-processing nlg nlp seq2seq

Last synced: 26 Oct 2024

https://github.com/shibing624/text2vec

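text2vec, text to vector. A text vector representation tool that converts text into vector matrices; it implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, ready to use out of the box.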

embeddings nlp sentence-embeddings similarity text-similarity text2vec word2vec

Last synced: 17 Dec 2024

https://github.com/trigaten/learn_prompting

Prompt Engineering, Generative AI, and LLM Guide by Learn Prompting | Join our discord for the largest Prompt Engineering learning community

chatgpt chatgpt-api deep-learning gpt-3 gpt-4 gpt-4-api gpt3 large-language-models llm machine-learning nlp openai-api prompt-engineering prompt-toolkit prompt-tuning prompting transformers

Last synced: 17 Dec 2024

https://github.com/dsgiitr/d2l-pytorch

This project reproduces the book Dive Into Deep Learning (https://d2l.ai/), adapting the code from MXNet into PyTorch.

book computer-vision d2l data-science deep-learning dive-into-deep-learning mxnet nlp pytorch pytorch-implmention

Last synced: 17 Dec 2024

https://github.com/errata-ai/vale

:pencil: A markup-aware linter for prose built with speed and extensibility in mind.

linter linting nlp vale

Last synced: 17 Dec 2024

https://github.com/thunlp/promptpapers

Must-read papers on prompt-based tuning for pre-trained language models.

ai bert machine-learning nlp pre-trained-language-models prompt prompt-based prompt-learning prompt-toolkit

Last synced: 04 Dec 2024

https://github.com/mosaicml/llm-foundry

LLM training code for Databricks foundation models

deep-learning llm neural-networks nlp pytorch

Last synced: 16 Dec 2024

https://github.com/openvenues/libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

address address-parser c deduping deduplication international machine-learning natural-language-processing nlp record-linkage

Last synced: 17 Dec 2024

https://github.com/argosopentech/argos-translate

Open-source offline translation library written in Python

language-models linux machine-translation nlp open-source python transformers translation

Last synced: 16 Dec 2024

https://github.com/trickygo/dive-into-dl-tensorflow2.0

This project ports the MXNet implementations in the original book Dive into Deep Learning (《动手学深度学习》) to TensorFlow 2.0. The project has been endorsed by Mu Li (李沐).

book chinese-simplified cv deep-learning dive-into-deep-learning jupyter-notebook nlp python3 tensorflow2 tutorials

Last synced: 19 Dec 2024

https://trickygo.github.io/Dive-into-DL-TensorFlow2.0/

This project ports the MXNet implementations in the original book Dive into Deep Learning (《动手学深度学习》) to TensorFlow 2.0. The project has been endorsed by Mu Li (李沐).

book chinese-simplified cv deep-learning dive-into-deep-learning jupyter-notebook nlp python3 tensorflow2 tutorials

Last synced: 31 Oct 2024

https://github.com/microsoft/lmops

General technology for enabling AI capabilities w/ LLMs and MLLMs

agi gpt language-model llm lm lmops nlp pretraining prompt promptist x-prompt

Last synced: 18 Dec 2024

https://github.com/pytorch/text

Models, data loaders and abstractions for language processing, powered by PyTorch

data-loader dataset deep-learning models nlp pytorch

Last synced: 21 Dec 2024

https://github.com/princeton-nlp/simcse

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821

nlp sentence-embeddings

Last synced: 17 Dec 2024

https://github.com/fastai/course-nlp

A Code-First Introduction to NLP course

data-science machine-learning nlp python

Last synced: 19 Dec 2024

https://github.com/esbatmop/MNBVC

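MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) internet slang. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki pages, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.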

chinese chinese-language chinese-nlp chinese-simplified corpus-data nlp nlp-machine-learning

Last synced: 03 Nov 2024

https://github.com/promptslab/promptify

Prompt Engineering | Prompt Versioning | Use GPT or other prompt based models to get structured output. Join our discord for Prompt-Engineering, LLMs and other latest research

chatgpt chatgpt-api chatgpt-python gpt-3 gpt-3-prompts gpt-4 gpt-4-api gpt3-library large-language-models machine-learning nlp openai prompt-engineering prompt-toolkit prompt-tuning prompt-versioning prompting prompts promptversioning transformers

Last synced: 19 Dec 2024

https://github.com/ownthink/jiagu

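Jiagu, a deep learning natural language processing tool: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, and text clustering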

chinese-word-segmentation cws ner nlp pos

Last synced: 19 Dec 2024

https://github.com/dongrixinyu/JioNLP

An accurate, efficient, and easy-to-use Chinese NLP preprocessing and parsing package. www.jionlp.com

apache2 chinese natural-language-processing ner nlp nlp-parse preprocessing python time-parse time-parsing

Last synced: 27 Oct 2024

https://github.com/jaymody/picogpt

An unnecessarily tiny implementation of GPT-2 in NumPy.

deep-learning gpt gpt-2 large-language-models machine-learning neural-network nlp python

Last synced: 20 Dec 2024

https://github.com/graykode/nlp-roadmap

Roadmap (mind map) and keywords for students who are interested in learning NLP

keyword machine-learning natural-language-processing nlp probability-statistics roadmap textmining

Last synced: 30 Nov 2024

https://github.com/yanshengjia/ml-road

Machine Learning Resources, Practice and Research

computer-vision deep-learning machine-learning nlp pytorch speech-recognition tensorflow

Last synced: 27 Nov 2024

https://github.com/modelscope/data-juicer

Making data higher-quality, juicier, and more digestible for foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

chinese data-analysis data-science data-visualization dataset gpt gpt-4 instruction-tuning large-language-models llama llava llm llms multi-modal nlp opendata pre-training pytorch sora streamlit

Last synced: 22 Dec 2024

https://github.com/ben1234560/ailearning-theory-applying

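Get started quickly with AI theory and hands-on applications: fundamentals, Transformer, NLP, ML, DL, and competitions. Includes extensive comments and datasets, so that every reader can understand and reproduce the work.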

ai bert dataming deep-learning kaggle-competition learning-by-doing machine-learning nlp

Last synced: 19 Dec 2024

https://github.com/jdkato/prose

:book: A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

natural-language-processing nlp prose

Last synced: 29 Sep 2024

https://github.com/cvi-szu/linly

Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction-tuning datasets

bert chatbot chatgpt chinese chinese-nlp gpt-3 language-model llama nlp zero-shot-learning

Last synced: 22 Dec 2024

https://github.com/Kyubyong/nlp_tasks

Natural Language Processing Tasks and References

language natural-language-processing nlp

Last synced: 08 Nov 2024

https://github.com/yangjianxin1/gpt2-chitchat

GPT2 for Chinese chitchat (implements the MMI idea from DialoGPT)

chichat dialogpt dialogue-model gpt-2 gpt2 nlp text-generation transformer

Last synced: 20 Dec 2024

https://github.com/qdata/textattack

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/

adversarial-attacks adversarial-examples adversarial-machine-learning data-augmentation machine-learning natural-language-processing nlp security

Last synced: 17 Dec 2024

https://github.com/dsksd/deepnlp-models-pytorch

PyTorch implementations of various deep NLP models from cs-224n (Stanford Univ)

cs-224n deep-learning deep-nlp-models natural-language-processing neural-network nlp pytorch rnn stanford-univ

Last synced: 20 Dec 2024

https://github.com/li-plus/chatglm.cpp

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

chatglm chatglm2 chatglm3 codegeex2-6b glm4 large-language-models nlp

Last synced: 17 Dec 2024

https://github.com/IntelLabs/nlp-architect

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

bert deep-learning deeplearning dynet nlp nlu pytorch quantization tensorflow transformers

Last synced: 30 Oct 2024
