{"id":21011565,"url":"https://github.com/catqaq/opentextclassification","last_synced_at":"2025-05-07T15:21:09.983Z","repository":{"id":162479230,"uuid":"160777539","full_name":"catqaq/OpenTextClassification","owner":"catqaq","description":"OpenTextClassification is all you need for text classification! Open text classification for everyone, enjoy your NLP journey! 这可能是目前为止最全面的开源文本分类项目，支持中英双语、多种模型、多种任务。","archived":false,"fork":false,"pushed_at":"2024-05-03T09:49:26.000Z","size":323,"stargazers_count":204,"open_issues_count":4,"forks_count":20,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-31T11:21:10.163Z","etag":null,"topics":["bert","bert-text-classification","chinese-text-classification","dpcnn","ernie","gru","lstm","multi-label-classification","naive-bayes","pytorch","svm","text-classification","textcnn","textrnn","torchtext","transformers"],"latest_commit_sha":null,"homepage":"https://zhuanlan.zhihu.com/p/617133715?","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/catqaq.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-07T05:52:45.000Z","updated_at":"2025-03-23T02:41:28.000Z","dependencies_parsed_at":"2024-06-28T15:33:43.861Z","dependency_job_id":null,"html_url":"https://github.com/catqaq/OpenTextClassification","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catqaq%2FOpenTextClassification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/catqaq%2FOpenTextClassification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catqaq%2FOpenTextClassification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catqaq%2FOpenTextClassification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/catqaq","download_url":"https://codeload.github.com/catqaq/OpenTextClassification/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252902718,"owners_count":21822288,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","bert-text-classification","chinese-text-classification","dpcnn","ernie","gru","lstm","multi-label-classification","naive-bayes","pytorch","svm","text-classification","textcnn","textrnn","torchtext","transformers"],"created_at":"2024-11-19T09:29:34.167Z","updated_at":"2025-05-07T15:21:09.956Z","avatar_url":"https://github.com/catqaq.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv style=\"font-size: 1.5rem;\"\u003e\r\n  \u003ca href=\"./README.md\"\u003e中文\u003c/a\u003e |\r\n  \u003ca href=\"./docs/readme_en.md\"\u003eEnglish\u003c/a\u003e\r\n\u003c/div\u003e\r\n\u003c/br\u003e\r\n\r\n\u003ch1 align=\"center\"\u003eOpenTextClassification\u003c/h1\u003e\r\n\u003cdiv align=\"center\"\u003e\r\n  \u003ca href=\"https://github.com/catqaq/OpenTextClassification\"\u003e\r\n    \u003cimg src=\"https://pic4.zhimg.com/80/v2-f63d74cf9859eea57b0a78c9da00c9f3_720w.webp\" alt=\"Logo\" height=\"210\"\u003e\r\n  
\u003c/a\u003e\r\n\r\n  \u003cp align=\"center\"\u003e\r\n    \u003ch3\u003eOpen text classification for you, Start your NLP journey\u003c/h3\u003e\r\n      \u003ca href=\"https://github.com/catqaq/OpenTextClassification/graphs/contributors\"\u003e\r\n        \u003cimg alt=\"GitHub Contributors\" src=\"https://img.shields.io/github/contributors/catqaq/OpenTextClassification\" /\u003e\r\n      \u003c/a\u003e\r\n      \u003ca href=\"https://github.com/catqaq/OpenTextClassification/issues\"\u003e\r\n        \u003cimg alt=\"Issues\" src=\"https://img.shields.io/github/issues/catqaq/OpenTextClassification?color=0088ff\" /\u003e\r\n      \u003c/a\u003e\r\n      \u003ca href=\"https://github.com/catqaq/OpenTextClassification/discussions\"\u003e\r\n        \u003cimg alt=\"Discussions\" src=\"https://img.shields.io/github/discussions/catqaq/OpenTextClassification?color=0088ff\" /\u003e\r\n      \u003c/a\u003e\r\n      \u003ca href=\"https://github.com/catqaq/OpenTextClassification/pulls\"\u003e\r\n        \u003cimg alt=\"GitHub pull requests\" src=\"https://img.shields.io/github/issues-pr/catqaq/OpenTextClassification?color=0088ff\" /\u003e\r\n      \u003c/a\u003e\r\n      \u003ca href=\"https://github.com/catqaq/OpenTextClassification/stargazers\"\u003e\r\n        \u003cimg alt=\"GitHub stars\" src=\"https://img.shields.io/github/stars/catqaq/OpenTextClassification?color=ccf\" /\u003e\r\n      \u003c/a\u003e\r\n      \u003cbr/\u003e\r\n      \u003cem\u003eOpen source / Simple / Comprehensive / Practical\u003c/em\u003e\r\n      \u003cbr/\u003e\r\n      \u003ca href=\"https://zhuanlan.zhihu.com/p/596112080/\"\u003e\u003cstrong\u003eArticle walkthrough\u003c/strong\u003e\u003c/a\u003e\r\n        ·\r\n      \u003ca href=\"https://zhuanlan.zhihu.com/p/617133715?\"\u003e\u003cstrong\u003eVideo walkthrough\u003c/strong\u003e\u003c/a\u003e\r\n    \u003c/p\u003e\r\n\u003c/div\u003e\r\n\r\n\u003e **Free to use and open source. Use it with confidence; contributions are welcome!**\r\n\r\n\r\n- [💥Latest News](#最新讯息)\r\n- [💫OpenNLP Plan](#OpenNLP计划)\r\n- [💫OpenTextCLS](#OpenTextClassification项目)\r\n- 
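[⛏️Usage](#使用步骤)\r\n- [📄Examples](#运行示例)\r\n- [📄Results](#结果展示)\r\n- [🛠️Common Errors](#常见报错)\r\n- [💐References \u0026 Acknowledgements](#参考资料\u0026致谢)\r\n- [🌟Sponsor Us](#赞助我们)\r\n- [🌈Starchart](#Starchart)\r\n- [🏆Contributors](#Contributors)\r\n\r\n\r\n\r\n\r\n## \u003ca name=\"最新讯息\"\u003e\u003c/a\u003eLatest News\r\n\r\n- 2023/03/23: OpenTextClassification V0.0.1 is officially open-sourced. Features of this release:\r\n  - Text classification in both Chinese and English\r\n  - Multiple kinds of text classification models: traditional shallow machine learning models, deep learning models, and transformers-based models\r\n  - Multi-label text classification\r\n  - Multiple embedding modes: inner/outer/random\r\n\r\n## \u003ca name=\"OpenNLP计划\"\u003e\u003c/a\u003eOpenNLP Plan\r\n\r\nWho are we?\r\n\r\nWe are **羡鱼智能** [xianyu.ai]. Our core members are a group of salted fish (咸鱼, self-described slackers) from the foot of Laohe Mountain on the shore of West Lake. The pond owner goes by 羡鱼, and we want to do something meaningful in the era of LLMs! Our slogan: **build OpenNLP and OpenX, and hopefully retire before CloseAI out-competes us to death!**\r\n\r\nPerhaps one day, when GPT-X is released, someone will say that NLP no longer exists, but we want to prove that someone was once here and loved it! In the era of LLMs represented by ChatGPT/GPT4, and before CloseAI out-competes us to death, we launched the OpenNLP plan. Its mission is OpenNLP for everyone!\r\n\r\n- [P0] OpenTextClassification: build a first-class text classification project; already open-sourced\r\n\t- Survey: done\r\n\t- Open-source project: done\r\n\t- Paper walkthroughs: doing\r\n\t- Model-tuning tricks (alchemy): doing\r\n- [P0] OpenSE: sentence embeddings, one of the core problems of natural language processing; doing\r\n- [P0] OpenChat: in preparation; poverty breeds despair, and having no GPUs breeds sorrow\r\n- [P1] OpenLLMs: large language models; doing\r\n- [P2] OpenTextTagger: text tagging, including word segmentation, NER, part-of-speech tagging, and so on\r\n- OpenX: a long road ahead\r\n\r\n## \u003ca name=\"OpenTextClassification项目\"\u003e\u003c/a\u003eThe OpenTextClassification Project\r\n\r\nOpenTextClassification is the first official open-source project of the OpenNLP plan, and its aim is Open NLP for everyone! In the era of LLMs represented by ChatGPT/GPT4, and before OpenAI out-competes us to death, we want to do something meaningful! One day, when GPT-X is released, someone may say that NLP no longer exists, but we want to prove that someone was once here!\r\n\r\n### Development Plan\r\n\r\nThe goal of this project is to build the most comprehensive and practical text classification project and tutorial available online. If the opportunity arises, we hope to turn it into an out-of-the-box text classification tool. Text classification is a peculiar task: it is usually regarded as simple and basic, yet a reasonably general-purpose text classification tool is hard to find, and models are typically trained and deployed for one specific task at a time. Now that NLP is converging toward unified approaches, this is inelegant and wasteful. ***Open text classification for you, Start your NLP journey!***\r\n\r\n**Brief development plan**:\r\n\r\n1. [P3] Chinese and English text classification: 100% done; support for other languages is welcome\r\n2. [P0] Multiple text classification models: mostly complete; additions are welcome\r\n\t1. Shallow text classification models: done\r\n\t2. [P1] DNN models: common models are supported\r\n\t3. [P0] Transformer models: Bert/ERNIE, etc.\r\n\t4. [P0] Prompt learning for Text Classification: TODO\r\n\t5. [P0] ChatGPT for Text Classification: TODO\r\n3. [P1] Multi-label text classification:\r\n\t1. Multiple multi-label classification losses: done; if anything is missing, additions are welcome\r\n\t2. Complex multi-label classification, such as hierarchical classification: TODO\r\n4. [P0] Different text classification datasets/tasks: text classification tasks are numerous and scattered, which is both good and bad. Reports of results on various datasets based on this project are welcome\r\n5. 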
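[P4] A simple, easy-to-use text classification API: the ultimate goal is a sufficiently general and powerful text classification model, exposed through a natural-language interface text_cls(text, candidate_labels)-\u003elabel. Given a text and candidate labels (with default values), it returns the label the text belongs to, and it should generalize to specific domains at zero or minimal cost\r\n\r\n### Join Us\r\n\r\nThe other parts of the OpenNLP plan are still in preparation, and only this project has been open-sourced so far. Everyone is welcome to take part in building and discussing OpenTextClassification, so that we can all get better together!\r\n\r\nHow to join:\r\n\r\n- **Project development**: pick any part of the development plan above that interests you and work on it. We suggest starting with high-priority tasks, such as adding more models and results on more datasets.\r\n- WeChat group: knowledge grows through discussion; TBD\r\n- Technical sharing and discussion: writing things down forces you to learn. Submissions are welcome and will be synced to the docs directory of this project and to the Zhihu column OpenNLP. You are also welcome to join the discussions of this project at https://github.com/catqaq/OpenTextClassification/discussions.\r\n\r\n\r\n\r\n## \u003ca name=\"使用步骤\"\u003e\u003c/a\u003eUsage\r\n\r\n1. Clone this repository\r\n\r\n`git clone https://github.com/catqaq/OpenTextClassification.git`\r\n\r\n2. Download and preprocess the dataset\r\n\r\nDownload the dataset yourself and put it under the data directory. Convert all data to a text+label format, separated by \\t or a comma. I will add an automation script when I have time; for now, please preprocess the data yourself or refer to preprocessing.py.\r\n\r\nIt is best to keep all data under the data directory, for example data/dbpedia, split into 3 subdirectories: input holds the raw dataset (the one you downloaded), data holds the preprocessed, formatted dataset (text-label format), and saved_dict holds the training outputs (models, logs, and so on).\r\n\r\n3. Run the examples\r\n\r\nThe tested development environment is listed below for reference only; a similar environment should also work.\r\n\r\n- python: 3.6/3.7\r\n- torch: 1.6.0\r\n- transformers: 4.18.0\r\n- torchtext: 0.7.0\r\n- scikit-learn: 0.24.2\r\n- tensorboardX: 2.6\r\n- nltk: 3.6.7\r\n- numpy: 1.18.5\r\n- pandas: 1.1.5\r\n\r\n\r\n\r\nChoose the module to run according to your needs; see the next section for details.\r\n\r\n`python run.py`\r\n\r\n## \u003ca name=\"运行示例\"\u003e\u003c/a\u003eExamples\r\n\r\n1. Run DNN/transformers models for text classification\r\n\r\n`python run.py`\r\n\r\n2. Run traditional shallow machine learning models for text classification\r\n\r\n`python run_shallow.py`\r\n\r\n3. Run DNN/transformers models for multi-label text classification\r\n\r\n`python run_multi_label.py`\r\n\r\n\r\n\r\nThe table below shows reference results from running the demos directly:\r\n\r\nEnvironment: python3.6 + T4\r\n\r\n| demo               | Dataset     | Example model | Acc    | Time      | Notes                                |\r\n| ------------------ | ----------- | ------------- | ------ | --------- | ------------------------------------ |\r\n| run.py             | THUCNews/cn | TextCNN       | 89.94% | ~2 min    |                                      |\r\n| run_multi_label.py | rcv1/en     | bert          | 61.04% | ~40 min   | See the run output for other metrics |\r\n| run_shallow.py     | THUCNews/cn | NB            | 89.44% | 105.34 ms |                                      |\r\n\r\n## 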
\u003ca name=\"结果展示\"\u003e\u003c/a\u003eResults (continuously updated)\r\n\r\nI provide detailed experimental results, from shallow models to deep models to multi-label classification, for your reference. Because of limited time and compute, many experiments may not have reached their best performance, so please keep that in mind! Contributions are therefore very welcome: additional experiments, code, new models, and so on, to build OpenTextClassification together.\r\n\r\nFor now only part of the summarized results is provided. The detailed results and parameters will be added when I have time; there are a lot of them and they take a while to organize.\r\n\r\n### 1. Traditional shallow text classification models\r\n\r\n| Data        | Model                    | tokenizer | Min token length | Min_df | ngram | binary | Use_idf | Test acc | Notes                                                        |\r\n| ----------- | ------------------------ | --------- | ---------------- | ------ | ----- | ------ | ------- | -------- | ------------------------------------------------------------ |\r\n| THUCNews/cn | LR                       | lcut      | 1        | 2      | (1,1) | False  | True    | 90.61%   | C=1.0, max_iter=1000; vocabulary size 61549; train score: 94.22%; valid score: 89.84%; test score: 90.61%; training time: 175070.97 ms |\r\n|             | MultinomialNB(alpha=0.3) | lcut      | 1        | 2      | (1,1) | False  | True    | 89.86%   | vocabulary size 61549; training time: 94.18 ms               |\r\n|             | ComplementNB(alpha=0.8)  | lcut      | 1        | 2      | (1,1) | False  | True    | 89.88%   | vocabulary size 61549; training time: 98.31 ms               |\r\n|             | SVC(C=1.0)               | lcut      | 1        | 2      | (1,1) | False  | True    | 81.49%   | vocabulary size 61549; dimension 200; training time: 7351155.59 ms; train score: 85.95%; valid score: 80.07%; test score: 81.49% |\r\n|             | DT                       | lcut      | 1        | 2      | (1,1) | False  | True    | 71.19%   | max_depth=None; training time: 149216.53 ms; train score: 99.97%; valid score: 70.57%; test score: 71.19% |\r\n|             | xgboost                  | lcut      | 1        | 2      | (1,1) | False  | True    | 90.08%   | XGBClassifier(n_estimators=2000,eta=0.3,gamma=0.1,max_depth=6,subsample=1,colsample_bytree=0.8,  nthread=10); training time: 1551260.28 ms; train score: 99.00%; valid score: 89.34%; test score: 90.08% |\r\n|             | KNN                      | lcut      | 1      
  | 2      | (1,1) | False  | True    | 85.17%   | k=10; training time: 21.24 ms; train score: 89.05%; valid score: 84.53%; test score: 85.17% |\r\n|             |                          |           |          |        |       |        |         |          |                                                              |\r\n| dbpedia/en  | LR                       | None      | 2        | 2      | (1,1) | False  | True    | 98.26%   | C=1.0, max_iter=100; vocabulary size 237777; training time: 220177.59 ms; train score: 98.85%; valid score: 98.19%; test score: 98.26% |\r\n|             | MultinomialNB(alpha=1.0) | None      | 2        | 2      | (1,1) | False  | True    | 95.35%   | training time: 786.24 ms; train score: 96.36%; valid score: 95.34%; test score: 95.35% |\r\n|             | ComplementNB(alpha=1.0)  | None      | 2        | 2      | (1,1) | False  | True    | 93.73%   | training time: 805.69 ms; train score: 95.30%; valid score: 93.79%; test score: 93.73% |\r\n|             | SVC(C=1.0)               | None      | 2        | 2      | (1,1) | False  | True    | 94.67%   | dimension 200; max_iter=100; training time: 144163.81 ms; train score: 94.75%; valid score: 94.59%; test score: 94.67%. Note: the computation and storage cost of SVM scales with the square of the number of samples. |\r\n|             | DT                       | None      | 2        | 2      | (1,1) | False  | True    | 92.41%   | max_depth=100, min_samples_leaf=5; training time: 639744.56 ms; train score: 95.79%; valid score: 92.43%; test score: 92.41% |\r\n|             | xgboost                  | None      | 2        | 2      | (1,1) | False  | True    | 97.99%   | XGBClassifier(n_estimators=200,eta=0.3,gamma=0.1,max_depth=6,subsample=1,colsample_bytree=0.8,  nthread=10,reg_alpha=0,reg_lambda=1); training time: 1838434.42 ms; train score: 99.35%; valid score: 97.96%; test score: 97.99% |\r\n|             | KNN                      | None      | 2        | 2      | (1,1) | False  | True    | 80.05%   | k=10; training time: 137.72 ms; 
train score: 84.66%; valid score: 80.20%; test score: 80.05% |\r\n|             |                          |           |          |        |       |        |         |          |                                                              |\r\n\r\n### 2. Deep learning text classification models\r\n\r\n| Data        | Model       | Embed | Bz   | Lr   | epochs | acc    | Notes             |\r\n| ----------- | ----------- | ----- | ---- | ---- | ------ | ------ | ----------------- |\r\n| THUCNews/cn | TextCNN     | outer | 128  | 1e-3 | 3/20   | 90.45% |                   |\r\n|             | TextRNN     | -     | -    | 1e-3 | 5/10   | 90.38% |                   |\r\n|             | TextRNN_Att |       |      | 1e-3 | 2/10   | 90.55% |                   |\r\n|             | TextRCNN    |       |      | 1e-3 | 3/10   | 91.01% |                   |\r\n|             | DPCNN       |       |      | 1e-3 | 3/20   | 90.12% |                   |\r\n|             | FastText    |       |      | 1e-3 | 5/20   | 90.48% |                   |\r\n|             | bert        | inner |      | 5e-5 | 2/3    | 94.10% | bert-base-chinese |\r\n|             | ERNIE       | inner |      | 5e-5 | 3/3    | 94.58% | ernie-3.0-base-zh |\r\n|             | bert_CNN    |       |      | -    | 3/3    | 94.14% |                   |\r\n|             | bert_RNN    |       |      | -    | 3/3    | 93.92% |                   |\r\n|             | bert_RNN    |       |      | -    | 3/3    | 94.45% |                   |\r\n|             | bert_RCNN   |       |      | -    | 3/3    | 94.32% |                   |\r\n|             | bert_DPCNN  |       |      | -    | 3/3    | 94.17% |                   |\r\n|             |             |       |      |      |        |        |                   |\r\n| dbpedia/en  | TextCNN     | outer | 128  | 5e-5 | 9/20   | 98.35% | glove             |\r\n|             | TextRNN     | -     | -    | -    | 6/10   | 97.97% |                   |\r\n|             | TextRNN_Att |       |      | -    | 
4/10   | 97.80% |                   |\r\n|             | TextRCNN    |       |      | -    | 3/10   | 97.71% |                   |\r\n|             | DPCNN       |       |      | -    | 3/20   | 97.86% |                   |\r\n|             | FastText    |       |      | -    | 10/20  | 97.84% |                   |\r\n|             | bert        | inner |      | 5e-5 | 2/3    | 97.78% | bert-base-uncased |\r\n|             | ERNIE       |       |      |      | 2/10   | 97.75% | ernie-2.0-base-en |\r\n|             | bert_CNN    |       |      | -    | 2/3    | 97.91% |                   |\r\n|             | bert_RNN    |       |      | -    | 2/3    | 97.87% |                   |\r\n|             | bert_RCNN   |       |      | -    | 2/3    | 98.04% |                   |\r\n|             | bert_DPCNN  |       |      | -    | 2/3    | 97.95% |                   |\r\n|             | gpt         |       |      |      | 3/3    | 97.03% |                   |\r\n|             | gpt2        |       |      |      | 3/3    | 97.00% |                   |\r\n|             | T5          |       |      |      | 3/3    | 96.57% |                   |\r\n|             |             |       |      |      |        |        |                   |\r\n\r\n### 3. Multi-label text classification\r\n\r\n| Data    | Model       | Hierarchical | Samples | Embed | loss                    | Bz   | Lr   | epochs | Test acc (exact match ratio) | Micro-F1 | Macro-F1 | Notes                                   |\r\n| ------- | ----------- | ------------ | ------- | ----- | ----------------------- | ---- | ---- | ------ | ---------------------------- | -------- | -------- | --------------------------------------- |\r\n| Rcv1/en | TextCNN     | -    | all    | outer | multi_label_circle_loss | 128  | 1e-3 | 9/20   | 51.02%                  | 0.7904   | 0.4515   | eval_activate = None  cls_threshold = 0 |\r\n|         | TextRNN     |      |        | -     |                         | -    | -    | 13/20  | 54.00%                  | 0.7950   | 0.4358   |                
                         |\r\n|         | TextRNN_Att |      |        |       |                         |      | -    | 11/20  | 53.97%                  | 0.8011   | 0.4538   |                                         |\r\n|         | TextRCNN    |      |        |       |                         |      | -    | 10/20  | 53.62%                  | 0.8111   | 0.4900   |                                         |\r\n|         | DPCNN       |      |        |       |                         |      | -    | 10/20  | 51.66%                  | 0.7890   | 0.4111   |                                         |\r\n|         | FastText    |      |        |       |                         |      | -    | 12/20  | 51.31%                  | 0.7936   | 0.4728   |                                         |\r\n|         | bert        |      | all    | inner | -                       | 128  | 2e-5 | 20/20  | 61.04%                  | 0.8454   | 0.5729   | bert-base-cased                         |\r\n|         | ERNIE       |      | all    | inner | -                       | 128  | 2e-5 | 20/20  | 61.67%                  | 0.8486   | 0.5861   | ernie-2.0-base-en                       |\r\n|         | Bert_CNN    |      | all    | inner | -                       | 128  | 2e-5 | 12/20  | 58.31%                  | 0.8364   | 0.5736   | same configuration as bert              |\r\n|         | Bert_RNN    |      | all    | inner | -                       | 128  | 2e-5 | 17/20  | 60.48%                  | 0.8371   | 0.5640   |                                         |\r\n|         | Bert_RCNN   |      | all    | inner | -                       | 128  | 2e-5 | 15/20  | 60.54%                  | 0.8457   | 0.5969   |                                         |\r\n|         | Bert_DPCNN  |      | all    | inner | -                       | 128  | 2e-5 | 13/20  | 56.52%                  | 0.8082   | 0.4273   |                                         |\r\n|         |             |      |        |       |    
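                     |      |      |        |                         |          |          |                                         |\r\n\r\n\r\n\r\n## \u003ca name=\"常见报错\"\u003e\u003c/a\u003eCommon Errors\r\n\r\n\r\n\r\n## \u003ca name=\"参考资料\u0026致谢\"\u003e\u003c/a\u003eReferences \u0026 Acknowledgements\r\n\r\nA Survey on Text Classification: From Shallow to Deep Learning: https://arxiv.org/pdf/2008.00364.pdf?utm_source=summari\r\n\r\nDeep Learning Based Text Classification: A Comprehensive Review: https://arxiv.org/pdf/2004.03705.pdf\r\n\r\nhttps://github.com/649453932/Chinese-Text-Classification-Pytorch\r\n\r\nhttps://github.com/649453932/Bert-Chinese-Text-Classification-Pytorch\r\n\r\nhttps://github.com/facebookresearch/fastText\r\n\r\nhttps://github.com/brightmart/text_classification\r\n\r\nhttps://github.com/kk7nc/Text_Classification\r\n\r\nhttps://github.com/Tencent/NeuralNLP-NeuralClassifier\r\n\r\nhttps://github.com/vandit15/Class-balanced-loss-pytorch\r\n\r\nhttps://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics\r\n\r\n\r\n\r\n## \u003ca name=\"赞助我们\"\u003e\u003c/a\u003eSponsor Us\r\n\r\nWho are we?\r\n\r\nWe are 羡鱼智能 [xianyu.ai]. Our core members are a group of salted fish (咸鱼, self-described slackers) from the foot of Laohe Mountain on the shore of West Lake. The pond owner goes by 羡鱼, and we want to do something meaningful in the era of LLMs! Our slogan: build OpenNLP and OpenX, and hopefully retire before OpenAI out-competes us to death!\r\n\r\nOpenTextClassification is the first official open-source project of the OpenNLP plan launched by 羡鱼智能 [xianyu.ai], and its aim is Open NLP for everyone! In the era of LLMs represented by ChatGPT/GPT4, and before OpenAI out-competes us to death, we want to do something meaningful! One day, when GPT-X is released, someone may say that NLP no longer exists, but we want to prove that someone was once here!\r\n\r\nThe first version of this project was completed single-handedly by 羡鱼 in spare time (staying up late). Limited energy and compute delayed it until now, but fortunately it got finished. If you find this project helpful for your NLP study, research, or work, please give it a free star! 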
Those with deeper pockets might consider sponsoring us, especially with compute: **GPU rental fees have left our already modest fish pond with almost no fish left to catch**!\r\n\r\n\u003cimg src=\"https://xianyunlp.oss-cn-hangzhou.aliyuncs.com/uPic/image-20230324010955205.png\" alt=\"image-20230324010955205\" style=\"zoom: 25%;\" /\u003e\r\n\r\n## Starchart\r\n\r\n[![Star History Chart](https://api.star-history.com/svg?repos=catqaq/OpenTextClassification\u0026type=Date)](https://star-history.com/#catqaq/OpenTextClassification\u0026Date)\r\n\r\n## Contributors\r\n\r\n\u003ca href=\"https://github.com/catqaq/OpenTextClassification/graphs/contributors\"\u003e\r\n  \u003cimg src=\"https://contrib.rocks/image?repo=catqaq/OpenTextClassification\" /\u003e\r\n\u003c/a\u003e\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatqaq%2Fopentextclassification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcatqaq%2Fopentextclassification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatqaq%2Fopentextclassification/lists"}