{"id":13487768,"url":"https://github.com/fmyblack/textClassify","last_synced_at":"2025-03-27T23:31:42.256Z","repository":{"id":206231083,"uuid":"82292354","full_name":"fmyblack/textClassify","owner":"fmyblack","description":"This text classification project is aimed mainly at machine learning beginners and at people who want to test text classification performance. It includes several classification algorithms (naive Bayes, cosine similarity, logistic regression) and the MM and RMM word segmenters, along with more than 6,000 articles in multiple categories crawled from a news site and a Chinese dictionary. Classifiers and word segmenters are easy to extend and can be combined freely to test classification performance.","archived":false,"fork":false,"pushed_at":"2017-09-29T12:29:18.000Z","size":32793,"stargazers_count":34,"open_issues_count":1,"forks_count":17,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-10-30T23:35:39.506Z","etag":null,"topics":["machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fmyblack.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-02-17T11:48:32.000Z","updated_at":"2024-07-20T11:24:29.000Z","dependencies_parsed_at":null,"dependency_job_id":"a8512375-bdee-4ec4-a608-d1e10254d83e","html_url":"https://github.com/fmyblack/textClassify","commit_stats":null,"previous_names":["fmyblack/textclassify"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fmyblack%2FtextClassify","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fmyblack%2FtextClassify/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fmyblack%2FtextClassify/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fmyblack%2FtextClassify/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
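b/owners/fmyblack","download_url":"https://codeload.github.com/fmyblack/textClassify/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245944019,"owners_count":20697945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning"],"created_at":"2024-07-31T18:01:03.567Z","updated_at":"2025-03-27T23:31:37.222Z","avatar_url":"https://github.com/fmyblack.png","language":"Java","funding_links":[],"categories":["Java"],"sub_categories":[],"readme":"Implementation and Evaluation of Multiple Text Classification Algorithms\n========\n\n\n\n## Project Overview\n\n+ The project is aimed mainly at machine learning beginners and at anyone who wants to evaluate text classification algorithms. After downloading it you can run the tests directly and compare the classification algorithms and Chinese word segmenters; their quality is reported as the prediction accuracy in the run output.\n+ To keep the implementation of each classification algorithm easy to follow, the project does not depend on any other machine learning jar; all of the algorithms are implemented in the project code itself.\n+ The project is organized around the factory pattern, so it is easy to add classifiers and word segmenters and check their results.\n\n\n\n## Data\n\nThe `./seeds` folder contains training sets for 17 categories. Each category holds several hundred news articles (crawled from a news site; articles in the same category come from the same section of that site), for a total of more than 6,000 articles. To add or replace a training set, place the files at the same directory level.\n\nThe `./dic` folder contains a Chinese dictionary file with more than 450,000 Chinese words.\n\n\n\n## Usage and Code Notes\n\n1. `com.fmyblack.ClassifyTest` is the entry class. Its `main` method randomly splits all texts into a training set and a test set at a given ratio, obtains a classifier from the factory class, trains it, and runs it on the test set to measure classification accuracy.\n2. `com.fmyblack.ClassifierFactory` is the factory class. Call `getClassifyModel` to combine a classification algorithm with a word segmentation algorithm and obtain a classifier.\n3. The project currently includes several classification models: naive Bayes (`com.fmyblack.textClassify.naiveBaye`), logistic regression (`com.fmyblack.textClassify.lr`), and cosine similarity (`com.fmyblack.textClassify.cosine`). It also includes several word segmentation algorithms, such as reverse maximum matching (`com.fmyblack.word.rmm`). These can be combined freely when testing.\n\n\n\n## Extending\n\n1. To add a new classification algorithm, implement the `com.fmyblack.textClassify.ClassifyModel` interface.\n2. To add a new word segmentation algorithm, implement the `com.fmyblack.word.WordSegmenter` interface.\n3. Register the new algorithm in the factory `com.fmyblack.ClassifierFactory`.\n4. 
The `com.fmyblack.textClassify.doc` package implements some basic operations on the training set, and `com.fmyblack.textClassify.IDF` implements the IDF algorithm; both are available for use.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffmyblack%2FtextClassify","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffmyblack%2FtextClassify","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffmyblack%2FtextClassify/lists"}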