Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published an article proposing a measure of machine intelligence, now called the Turing test. More recent techniques, such as deep learning, have produced state-of-the-art results in language modeling, parsing, and many other natural-language tasks.

https://github.com/thunlp/PromptPapers

Must-read papers on prompt-based tuning for pre-trained language models.

ai bert machine-learning nlp pre-trained-language-models prompt prompt-based prompt-learning prompt-toolkit

Last synced: 01 Aug 2024

https://github.com/openvenues/libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

address address-parser c deduping deduplication international machine-learning natural-language-processing nlp record-linkage

Last synced: 30 Jul 2024

https://github.com/PanQiWei/AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

deep-learning inference large-language-models llms nlp pytorch quantization transformer transformers

Last synced: 03 Aug 2024

https://github.com/AutoGPTQ/AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

deep-learning inference large-language-models llms nlp pytorch quantization transformer transformers

Last synced: 30 Jul 2024

https://github.com/houbb/sensitive-word

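👮‍♂️ The sensitive word tool for Java. (Sensitive words / banned words / illegal words / profanity. A high-performance Java sensitive-word filtering framework based on the DFA algorithm. Please do not post content involving politics, advertising, marketing, firewall circumvention, or anything that violates national laws and regulations. A high-performance sensitive-word detection and filtering component with Traditional/Simplified Chinese conversion, full-width/half-width conversion, Chinese-character-to-pinyin conversion, fuzzy search, and other features.)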

dfa dirty-word nlp sensitive sensitive-word sensitive-word-filter trie-tree

Last synced: 31 Jul 2024

https://github.com/TrickyGo/Dive-into-DL-TensorFlow2.0

This project ports the MXNet implementations in the original book Dive into Deep Learning (《动手学深度学习》) to TensorFlow 2.0. The project has been endorsed by Mu Li.

book chinese-simplified cv deep-learning dive-into-deep-learning jupyter-notebook nlp python3 tensorflow2 tutorials

Last synced: 01 Aug 2024

https://trickygo.github.io/Dive-into-DL-TensorFlow2.0/

This project ports the MXNet implementations in the original book Dive into Deep Learning (《动手学深度学习》) to TensorFlow 2.0. The project has been endorsed by Mu Li.

book chinese-simplified cv deep-learning dive-into-deep-learning jupyter-notebook nlp python3 tensorflow2 tutorials

Last synced: 31 Jul 2024

https://github.com/mosaicml/llm-foundry

LLM training code for MosaicML foundation models

deep-learning llm neural-networks nlp pytorch

Last synced: 31 Jul 2024

https://github.com/whitead/paper-qa

LLM Chain for answering questions from documents with citations

chatgpt nlp question-answering

Last synced: 31 Jul 2024

https://github.com/microsoft/lmops

General technology for enabling AI capabilities w/ LLMs and MLLMs

agi gpt language-model llm lm lmops nlp pretraining prompt promptist x-prompt

Last synced: 02 Aug 2024

https://github.com/pytorch/text

Models, data loaders and abstractions for language processing, powered by PyTorch

data-loader dataset deep-learning models nlp pytorch

Last synced: 31 Jul 2024

https://github.com/argosopentech/argos-translate

Open-source offline translation library written in Python

language-models linux machine-translation nlp open-source python transformers translation

Last synced: 30 Jul 2024

https://github.com/fastai/course-nlp

A Code-First Introduction to NLP course

data-science machine-learning nlp python

Last synced: 31 Jul 2024

https://github.com/microsoft/LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs

agi gpt language-model llm lm lmops nlp pretraining prompt promptist x-prompt

Last synced: 30 Jul 2024

https://github.com/princeton-nlp/SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821

nlp sentence-embeddings

Last synced: 01 Aug 2024

https://github.com/esbatmop/mnbvc

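MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus that aims to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) internet slang. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki pages, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.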

chinese chinese-language chinese-nlp chinese-simplified corpus-data nlp nlp-machine-learning

Last synced: 02 Aug 2024

https://github.com/ownthink/Jiagu

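Jiagu: a deep-learning natural language processing toolkit. Knowledge-graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, text clustering.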

chinese-word-segmentation cws ner nlp pos

Last synced: 01 Aug 2024

https://github.com/esbatmop/MNBVC

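MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus that aims to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) internet slang. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki pages, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.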

chinese chinese-language chinese-nlp chinese-simplified corpus-data nlp nlp-machine-learning

Last synced: 01 Aug 2024

https://github.com/graykode/nlp-roadmap

Roadmap (mind map) and keywords for students who are interested in learning NLP

keyword machine-learning natural-language-processing nlp probability-statistics roadmap textmining

Last synced: 01 Aug 2024

https://github.com/dongrixinyu/JioNLP

An accurate, efficient, and easy-to-use Chinese NLP preprocessing and parsing package. www.jionlp.com

apache2 chinese natural-language-processing ner nlp nlp-parse preprocessing python time-parse time-parsing

Last synced: 31 Jul 2024

https://github.com/promptslab/promptify

Prompt Engineering | Prompt Versioning | Use GPT or other prompt based models to get structured output. Join our discord for Prompt-Engineering, LLMs and other latest research

chatgpt chatgpt-api chatgpt-python gpt-3 gpt-3-prompts gpt-4 gpt-4-api gpt3-library large-language-models machine-learning nlp openai prompt-engineering prompt-toolkit prompt-tuning prompt-versioning prompting prompts promptversioning transformers

Last synced: 02 Aug 2024

https://github.com/jaymody/picogpt

An unnecessarily tiny implementation of GPT-2 in NumPy.

deep-learning gpt gpt-2 large-language-models machine-learning neural-network nlp python

Last synced: 02 Aug 2024

https://github.com/promptslab/Promptify

Prompt Engineering | Prompt Versioning | Use GPT or other prompt based models to get structured output. Join our discord for Prompt-Engineering, LLMs and other latest research

chatgpt chatgpt-api chatgpt-python gpt-3 gpt-3-prompts gpt-4 gpt-4-api gpt3-library large-language-models machine-learning nlp openai prompt-engineering prompt-toolkit prompt-tuning prompt-versioning prompting prompts promptversioning transformers

Last synced: 31 Jul 2024

https://github.com/jaymody/picoGPT

An unnecessarily tiny implementation of GPT-2 in NumPy.

deep-learning gpt gpt-2 large-language-models machine-learning neural-network nlp python

Last synced: 31 Jul 2024

https://github.com/jdkato/prose

A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

natural-language-processing nlp prose

Last synced: 30 Jul 2024

https://github.com/Kyubyong/nlp_tasks

Natural Language Processing Tasks and References

language natural-language-processing nlp

Last synced: 01 Aug 2024

https://github.com/cvi-szu/linly

Chinese-LLaMA 1 & 2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets

bert chatbot chatgpt chinese chinese-nlp gpt-3 language-model llama nlp zero-shot-learning

Last synced: 02 Aug 2024

https://github.com/CVI-SZU/Linly

Chinese-LLaMA 1 & 2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets

bert chatbot chatgpt chinese chinese-nlp gpt-3 language-model llama nlp zero-shot-learning

Last synced: 30 Jul 2024

https://github.com/yanshengjia/ml-road

Machine Learning Resources, Practice and Research

computer-vision deep-learning machine-learning nlp pytorch speech-recognition tensorflow

Last synced: 07 Aug 2024

https://github.com/yangjianxin1/GPT2-chitchat

GPT2 for Chinese chitchat (implements the MMI idea from DialoGPT)

chichat dialogpt dialogue-model gpt-2 gpt2 nlp text-generation transformer

Last synced: 02 Aug 2024

https://github.com/DSKSD/DeepNLP-models-Pytorch

PyTorch implementations of various deep NLP models from CS224n (Stanford University)

cs-224n deep-learning deep-nlp-models natural-language-processing neural-network nlp pytorch rnn stanford-univ

Last synced: 30 Jul 2024

https://github.com/IntelLabs/nlp-architect

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

bert deep-learning deeplearning dynet nlp nlu pytorch quantization tensorflow transformers

Last synced: 31 Jul 2024

https://github.com/NervanaSystems/nlp-architect

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

bert deep-learning deeplearning dynet nlp nlu pytorch quantization tensorflow transformers

Last synced: 18 Aug 2024

https://github.com/ben1234560/AiLearning-Theory-Applying

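A quick, hands-on start to AI theory and applications: fundamentals, Transformer, NLP, ML, DL, and competitions. Includes extensive comments and datasets, aiming for every reader to be able to understand and reproduce the material.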

ai bert dataming deep-learning kaggle-competition learning-by-doing machine-learning nlp

Last synced: 01 Aug 2024

https://github.com/li-plus/chatglm.cpp

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4

chatglm chatglm2 chatglm3 codegeex2-6b glm4 large-language-models nlp

Last synced: 31 Jul 2024

https://github.com/QData/TextAttack

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/

adversarial-attacks adversarial-examples adversarial-machine-learning data-augmentation machine-learning natural-language-processing nlp security

Last synced: 01 Aug 2024

https://github.com/eugeneyan/ml-surveys

📋 Survey papers summarizing advances in deep learning, NLP, CV, graphs, reinforcement learning, recommendations, etc.

computer-vision deep-learning embeddings machine-learning nlp recommender-system reinforcement-learning survey

Last synced: 30 Jul 2024

https://github.com/textlint/textlint

The pluggable natural language linter for text and markdown.

javascript lint linter markdown natural-language nlp textlint

Last synced: 31 Jul 2024

https://github.com/huggingface/knockknock

🚪✊Knock Knock: Get notified when your training ends with only two additional lines of code

computer-vision cv deep-learning machine-learning natural-language-processing neural-networks nlp nlproc python python36 train

Last synced: 30 Jul 2024

https://github.com/TeamHG-Memex/eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions

crfsuite data-science explanation inspection lightgbm machine-learning nlp python scikit-learn xgboost

Last synced: 02 Aug 2024

https://github.com/baidu/Familia

A Toolkit for Industrial Topic Modeling

lda nlp sentence-lda topic-modeling topic-models twe

Last synced: 01 Aug 2024

https://github.com/thisandagain/sentiment

AFINN-based sentiment analysis for Node.js.

afinn analysis javascript nlp sentiment sentiment-analysis

Last synced: 31 Jul 2024

https://github.com/go-ego/gse

Go efficient multilingual NLP and text segmentation; support English, Chinese, Japanese and others.

chinese english go gse hmm hmm-viterbi-algorithm japanese jieba nlp segment trie

Last synced: 30 Jul 2024

https://github.com/readbeyond/aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

alignment audio cli dtw espeak espeak-ng festival ffmpeg forced-alignment linux macos nlp python smil speech srt text text-to-speech tts windows

Last synced: 01 Aug 2024

https://github.com/datawhalechina/Daily-interview

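Interview notes compiled by Datawhale members, covering machine learning, CV, NLP, recommender systems, software development, and more. Stars are welcome.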

cv interview-questions machine-learning nlp

Last synced: 02 Aug 2024

https://github.com/guillaume-be/rust-bert

Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)

bart bert deep-learning electra gpt gpt-2 language-generation machine-learning ner nlp question-answering roberta rust rust-lang sentiment-analysis transformer translation

Last synced: 31 Jul 2024

https://github.com/adapter-hub/adapters

A Unified Library for Parameter-Efficient and Modular Transfer Learning

adapters bert lora natural-language-processing nlp parameter-efficient-learning parameter-efficient-tuning pytorch transformers

Last synced: 01 Aug 2024

https://github.com/BrikerMan/Kashgari

Kashgari is a production-level NLP transfer learning framework built on top of tf.keras for text labeling and text classification; it includes Word2Vec, BERT, and GPT2 language embeddings.

bert bert-model gpt-2 machine-learning named-entity-recognition ner nlp nlp-framework seq2seq sequence-labeling text-classification text-labeling transfer-learning

Last synced: 31 Jul 2024

https://github.com/datawhalechina/daily-interview

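Interview notes compiled by Datawhale members, covering machine learning, CV, NLP, recommender systems, software development, and more. Stars are welcome.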

cv interview-questions machine-learning nlp

Last synced: 31 Jul 2024

https://github.com/blmoistawinde/HarvestText

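Text mining and preprocessing tools (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.) using unsupervised or weakly supervised methods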

dependency-parser gitee harvesttext keyword-extraction named-entity-recognition new-word-discovery nlp pyhanlp sentiment-analysis text-cleaning text-segmentation text-summarization unsupervised

Last synced: 31 Jul 2024

https://github.com/km1994/NLP-Interview-Notes

This repository mainly collects interview questions for NLP algorithm engineers

bert deel-learning ner nlp transformer

Last synced: 01 Aug 2024

https://github.com/RasaHQ/rasa_core

Rasa Core is now part of the Rasa repo: An open source machine learning framework to automate text- and voice-based conversations

bot bot-framework botkit bots chatbot chatbot-framework conversational-agents conversational-ai conversational-bots machine-learning machine-learning-library nlp rasa

Last synced: 01 Aug 2024

https://github.com/google-research/electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

deep-learning nlp tensorflow

Last synced: 01 Aug 2024

https://github.com/duoergun0729/nlp

Produced by 兜哥: an open-source introductory book on NLP

ai fasttext nlp security word2vec

Last synced: 01 Aug 2024

https://github.com/curiousily/Getting-Things-Done-with-Pytorch

Jupyter Notebook tutorials on solving real-world problems with Machine Learning & Deep Learning using PyTorch. Topics: Face detection with Detectron 2, Time Series anomaly detection with LSTM Autoencoders, Object Detection with YOLO v5, Build your first Neural Network, Time Series forecasting for Coronavirus daily cases, Sentiment Analysis with BERT.

anomaly-detection bert computer-vision coronavirus deep-learning face-detection face-recognition lstm machine-learning nlp object-detection pytorch sentiment-analysis time-series time-series-anomaly-detection time-series-forecasting transfer-learning transformer tutorial yolo

Last synced: 01 Aug 2024

https://github.com/xusenlinzy/api-for-open-llm

OpenAI-style API for open large language models: use LLMs just like ChatGPT! Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. A unified backend interface for open-source large models.

baichuan chatglm code-llama docker internlm langchain llama llama2 llms nlp openai qwen sqlcoder xverse

Last synced: 01 Aug 2024

https://github.com/TigerResearch/TigerBot

TigerBot: A multi-language multi-task LLM

chinese data llama2 llm nlp

Last synced: 01 Aug 2024

https://github.com/crownpku/Information-Extraction-Chinese

Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT (Chinese entity recognition and relation extraction)

chinese-nlp information-extraction named-entity-recognition nlp relation-extraction

Last synced: 07 Aug 2024

https://github.com/chartbeat-labs/textacy

NLP, before and after spaCy

natural-language-processing nlp python spacy

Last synced: 31 Jul 2024

https://github.com/chiphuyen/lazynlp

Library to scrape and clean web pages to create massive datasets.

artificial-intelligence data-science language-model natural-language-processing nlp open python text-mining

Last synced: 01 Aug 2024

https://github.com/NLP-LOVE/Introduction-NLP

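Detailed notes on "Introduction to Natural Language Processing" (《自然语言处理入门》), the new book by the author of HanLP. A conscientious work: rather than a dry list of formulas, the book explains algorithms and models in plain, accessible language. Starting from basic concepts, it progressively introduces the algorithmic principles and engineering implementations behind several popular problems: Chinese word segmentation, part-of-speech tagging, named entity recognition, information extraction, text clustering, text classification, and syntactic parsing.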

ai deep-learning mechine-learing nlp

Last synced: 01 Aug 2024

https://github.com/DerwenAI/pytextrank

Python implementation of TextRank algorithms ("textgraphs") for phrase extraction

graph-algorithms machine-learning natural-language natural-language-processing nlp python spacy spacy-extension summarization textgraphs textrank

Last synced: 31 Jul 2024

https://github.com/asappresearch/sru

Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)

deep-learning nlp pytorch recurrent-neural-networks

Last synced: 06 Aug 2024

https://github.com/koth/kcws

Deep Learning Chinese Word Segmentation

chinese-text-segmentation deep-learning nlp pos-tagger tensorflow

Last synced: 01 Aug 2024

https://github.com/huggingface/course

The Hugging Face course on Transformers

deep-learning hacktoberfest nlp transformers

Last synced: 01 Aug 2024

https://github.com/TingFree/NLPer-Arsenal

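A collection of NLP competition strategy implementations, baselines for various tasks, competition experience posts (current competitions, past competitions, practice competitions), NLP conference dates, commonly followed media channels, GPU recommendations, and more. Continuously updated.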

baselines gpu nlp nlp-competition nlp-conference nlp-media pytorch

Last synced: 01 Aug 2024

https://github.com/lonePatient/BERT-NER-Pytorch

Chinese NER (Named Entity Recognition) using BERT (Softmax, CRF, Span)

adversarial-training albert bert chinese crf focal-loss labelsmoothing ner nlp pytorch softmax span

Last synced: 03 Aug 2024