{"id":15027291,"url":"https://github.com/harderthenharder/transformers_tasks","last_synced_at":"2025-05-15T11:03:12.519Z","repository":{"id":63189108,"uuid":"565372076","full_name":"HarderThenHarder/transformers_tasks","owner":"HarderThenHarder","description":"⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc.","archived":false,"fork":false,"pushed_at":"2023-09-29T10:58:39.000Z","size":74567,"stargazers_count":2309,"open_issues_count":59,"forks_count":397,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-04-07T09:09:56.737Z","etag":null,"topics":["information-extraction","nlp","reinforcement-learning","text-classification","text-generation","text-matching","transformers"],"latest_commit_sha":null,"homepage":"https://www.zhihu.com/column/c_1451236880973426688","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HarderThenHarder.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-13T07:21:21.000Z","updated_at":"2025-04-07T03:23:01.000Z","dependencies_parsed_at":"2023-02-14T15:15:37.597Z","dependency_job_id":"66bdc834-dbbe-43a9-9bc4-d696dfe85e90","html_url":"https://github.com/HarderThenHarder/transformers_tasks","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HarderThenHarder%2Ftransformers_tasks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HarderT
henHarder%2Ftransformers_tasks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HarderThenHarder%2Ftransformers_tasks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HarderThenHarder%2Ftransformers_tasks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HarderThenHarder","download_url":"https://codeload.github.com/HarderThenHarder/transformers_tasks/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248912283,"owners_count":21182245,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["information-extraction","nlp","reinforcement-learning","text-classification","text-generation","text-matching","transformers"],"created_at":"2024-09-24T20:06:08.419Z","updated_at":"2025-04-14T15:55:04.843Z","avatar_url":"https://github.com/HarderThenHarder.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=center\u003e \n\n\u003cimg src=\"assets/icon.png\" width=500\u003e\n\n[![Author](https://img.shields.io/badge/Author-Pankeyu-green.svg \"Author\")](https://www.zhihu.com/column/c_1451236880973426688) [![OS](https://img.shields.io/badge/OS-Linux/Windows/Mac-red.svg \"OS\")](./) [![Based](https://img.shields.io/badge/Based-huggingface_transformers-blue.svg \"OS\")](./)\n\n[![Status](https://img.shields.io/badge/Status-WIP-darkslateblue.svg \"Status\")](./) [![Stars](https://img.shields.io/badge/Stars-1.3k-yellow.svg \"Stars\")](./) 
[![Fork](https://img.shields.io/badge/Fork-271-sandybrown.svg \"Stars\")](./) [![Python](https://img.shields.io/badge/Python-3.6+-darkseagreen.svg \"Python\")](./)\n\n[![Typing SVG](https://readme-typing-svg.demolab.com?font=Fira+Code\u0026pause=200\u0026color=1AF783\u0026center=true\u0026vCenter=true\u0026width=435\u0026lines=Transformers+\u003e\u003e\u003e\u003e\u003e\u003e\u003e\u003e\u003e\u003e+GO!)](https://git.io/typing-svg)\n\n\u003c/div\u003e\n\n---\n\nThis project collects implementations of a range of NLP tasks built on the [transformers](https://huggingface.co/docs/transformers/index) library.\n\n[huggingface transformers](https://huggingface.co/docs/transformers/index) is an excellent open-source framework that makes it easy to load and train transformer models. See the [quick tour](https://huggingface.co/docs/transformers/quicktour) for installation and basic usage; the library also makes it straightforward to [fine-tune a model of your own](https://huggingface.co/docs/transformers/training).\n\nThis project covers a number of mainstream NLP tasks. Find the task you need and replace the `training dataset` in the code with `a dataset for your own task` to train a model suited to that task.\n\n\u003cbr\u003e\n\nNLP tasks implemented so far (still being updated):\n\n#### 1. Text Matching\n\n\u003e Computes the similarity between texts; commonly used for `search recall`, `text retrieval`, `entailment recognition`, and similar tasks.\n\n| Model | Link |\n|---|---|\n| [Supervised] Overview | [[here]](./text_matching/supervised/readme.md) |\n| [Supervised] PointWise (single-tower) | [[here]](./text_matching/supervised/train_pointwise.sh) |\n| [Supervised] DSSM (two-tower) | [[here]](./text_matching/supervised/train_dssm.sh) |\n| [Supervised] Sentence Bert (two-tower) | [[here]](./text_matching/supervised/train_sentence_transformer.sh) |\n| [Unsupervised] SimCSE | [[here]](./text_matching/unsupervised/simcse/readme.md) |\n\n\u003cbr\u003e\n\n#### 2. Information Extraction\n\n\u003e Extracts target information from a given passage of text; commonly used for `named entity recognition (NER)`, `relation extraction (RE)`, and similar tasks.\n\n| Model | Link |\n|---|---|\n| Universal Information Extraction (UIE) | [[here]](./UIE/readme.md) |\n\n\u003cbr\u003e\n\n#### 3. 
Prompt Tasks\n\n\u003e By designing prompt templates, these methods get better results from a pretrained model with less data; commonly used for `Few-Shot`, `Zero-Shot`, and similar tasks.\n\n| Model | Link |\n|---|---|\n| PET (based on manually defined prompt patterns) | [[here]](./prompt_tasks/PET/readme.md) |\n| p-tuning (prompt patterns learned automatically) | [[here]](./prompt_tasks/p-tuning/readme.md) |\n\n\u003cbr\u003e\n\n#### 4. Text Classification\n\n\u003e Classifies a given text; commonly used for `sentiment recognition`, `article categorization`, and similar tasks.\n\n| Model | Link |\n|---|---|\n| BERT-CLS (a BERT-based classifier) | [[here]](./text_classification/train.sh) |\n\n\u003cbr\u003e\n\n#### 5. Reinforcement Learning \u0026 Language Model\n\n\u003e RLHF (Reinforcement Learning from Human Feedback) uses human feedback to update a language generation model (LM) with reinforcement learning (RL), yielding better generations (representative example: ChatGPT). It usually consists of two stages: `Reward Model` training and `Reinforcement Learning` training.\n\n| Model | Link |\n|---|---|\n| RLHF (Reward Model training, PPO updates to GPT2) | [[here]](./RLHF/readme.md) |\n\n\u003cbr\u003e\n\n#### 6. Text Generation\n\n\u003e Text generation (NLG) is typically used for `story continuation`, `question answering`, `chatbots`, and similar tasks.\n\n| Model | Link |\n|---|---|\n| Chinese question-answering model (T5-Based) | [[here]](./answer_generation/readme.md) |\n| Filling model (T5-Based) | [[here]](./data_augment/filling_model/readme.md) |\n\n\u003cbr\u003e\n\n#### 7. LLM Application\n\n\u003e Builds the prompt pattern(s) needed for a large language model (LLM) to solve a variety of tasks zero-shot.\n\n| Model | Link |\n|---|---|\n| Text classification (chatglm-6b-Based) | [[here]](./LLM/zero-shot/readme.md) |\n| Text matching (chatglm-6b-Based) | [[here]](./LLM/zero-shot/readme.md) |\n| Information extraction (chatglm-6b-Based) | [[here]](./LLM/zero-shot/readme.md) |\n| LLM personality test (LLMs MBTI) | [[here]](./LLM/llms_mbti/readme.md) |\n\n\n\u003cbr\u003e\n\n#### 8. LLM Training\n\n\u003e Covers LLM training: pretraining, instruction tuning, reward modeling, and reinforcement learning.\n\n| Model | Link |\n|---|---|\n| ChatGLM-6B Finetune | [[here]](./LLM/chatglm_finetune/readme.md) |\n| Training an LLM from scratch | [[here]](./LLM/LLMsTrainer/readme.md) |\n\n\n\u003cbr\u003e\n\n#### 9. 
Tools\n\n\u003e A collection of commonly used tools.\n\n| Tool | Link |\n|---|---|\n| Tokenizer Viewer | [[here]](./tools/tokenizer_viewer/readme.md) |","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharderthenharder%2Ftransformers_tasks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fharderthenharder%2Ftransformers_tasks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharderthenharder%2Ftransformers_tasks/lists"}