{"id":23923906,"url":"https://github.com/zhanlaoban/text_classification","last_synced_at":"2025-04-12T01:54:39.692Z","repository":{"id":119523636,"uuid":"184865324","full_name":"zhanlaoban/Text_Classification","owner":"zhanlaoban","description":"Summary of Text Classification in deep learning techniques implemented by PyTorch and TensorFlow.    深度学习文本分类技术总结，以PyTorch实现。","archived":false,"fork":false,"pushed_at":"2019-12-18T08:07:33.000Z","size":6265,"stargazers_count":13,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-12T01:54:34.571Z","etag":null,"topics":["pytorch","tensorflow","text-classification"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zhanlaoban.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-04T07:30:27.000Z","updated_at":"2024-04-29T08:48:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"ca76763c-b28f-4389-8d87-c13a1f14079c","html_url":"https://github.com/zhanlaoban/Text_Classification","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhanlaoban%2FText_Classification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhanlaoban%2FText_Classification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhanlaoban%2FText_Classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/zhanlaoban%2FText_Classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zhanlaoban","download_url":"https://codeload.github.com/zhanlaoban/Text_Classification/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248505872,"owners_count":21115354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pytorch","tensorflow","text-classification"],"created_at":"2025-01-05T18:51:20.280Z","updated_at":"2025-04-12T01:54:39.687Z","avatar_url":"https://github.com/zhanlaoban.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Text_Classification\nHighlights:\n\n- Implementations of various deep learning models for **Chinese text classification**  \n- Implemented in both PyTorch and TensorFlow  \n- Every model uses THUCNews as its benchmark dataset\n- The principles and implementation details of each model are described in the README.MD inside that model's folder\n\n\n\n# Dataset\n\n[THUCNews dataset](http://thuctc.thunlp.org/#中文文本分类数据集THUCNews)\n\n\u003e THUCNews was generated by filtering historical data from the Sina News RSS subscription channels from 2005 to 2011. It contains 740,000 news documents (2.19 GB), all in UTF-8 plain-text format. Building on the original Sina News category system, we reorganized the data into 14 candidate categories: finance, lottery, real estate, stocks, home furnishing, education, technology, society, fashion, current politics, sports, horoscope, games, and entertainment.\n\nIn the original dataset, each category is a folder named after that category, and each sample is stored as a separate txt file inside its folder. To simplify preprocessing in the models and reduce the overall corpus size, the dataset was preprocessed in advance, which reduces the later workload.\n\n**The benchmark dataset used in this project:**\n\nTraining uses 5 of the categories, with 5,000 samples per category.\n\n- The dataset is split as follows:\n\n  Training set: 4000 * 5\n  Validation set: 500 * 5\n  Test set: 500 * 5\n\n- Train/Dev/Test: 8/1/1\n\n- classes: 5 categories, namely sports, finance, real estate, home furnishing, and education\n\nDownload (long-term valid): link: https://pan.baidu.com/s/1-g2M47lwL9DoZTHCEqfCAA  extraction code: ztxt\n\n\n\n# Contents\n\n### 01. FastText: TODO\n\n### 02. [TextCNN](https://github.com/zhanlaoban/Text_Classification/tree/master/02_TextCNN)\n\n### 03. 
[TextLSTM](https://github.com/zhanlaoban/Text_Classification/tree/master/03_TextLSTM)\n\n### 04. [TextLSTM_Attention](https://github.com/zhanlaoban/Text_Classification/tree/master/04_TextLSTM_Attention)\n\n### 05. [TextGRU](https://github.com/zhanlaoban/Text_Classification/tree/master/05_TextGRU)\n\n### 06. [TextRCNN](https://github.com/zhanlaoban/Text_Classification/tree/master/06_TextRCNN)\n\n### 07. [Transformers](https://github.com/zhanlaoban/Text_Classification/tree/master/07_Transformers)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzhanlaoban%2Ftext_classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzhanlaoban%2Ftext_classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzhanlaoban%2Ftext_classification/lists"}