{"id":13620303,"url":"https://github.com/2shou/TextGrocery","last_synced_at":"2025-04-14T19:31:48.062Z","repository":{"id":25254623,"uuid":"28679659","full_name":"2shou/TextGrocery","owner":"2shou","description":"A simple short-text classification tool based on LibLinear","archived":false,"fork":false,"pushed_at":"2021-06-09T06:36:31.000Z","size":471,"stargazers_count":681,"open_issues_count":14,"forks_count":206,"subscribers_count":50,"default_branch":"master","last_synced_at":"2025-04-10T15:45:18.257Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/2shou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-01-01T03:13:50.000Z","updated_at":"2025-03-24T08:24:47.000Z","dependencies_parsed_at":"2022-08-23T04:00:14.308Z","dependency_job_id":null,"html_url":"https://github.com/2shou/TextGrocery","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2shou%2FTextGrocery","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2shou%2FTextGrocery/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2shou%2FTextGrocery/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2shou%2FTextGrocery/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/2shou","download_url":"https://codeload.github.com/2shou/TextGrocery/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248
945949,"owners_count":21187414,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T21:00:54.311Z","updated_at":"2025-04-14T19:31:43.131Z","avatar_url":"https://github.com/2shou.png","language":"C++","funding_links":[],"categories":["资源列表","C++","Natural Language Processing"],"sub_categories":["自然语言处理"],"readme":"TextGrocery\n===========\n\n[![Build Status](https://travis-ci.org/2shou/TextGrocery.svg?branch=master)](https://travis-ci.org/2shou/TextGrocery)\n\nA simple, efficient short-text classification tool based on LibLinear\n\nUses [jieba](https://github.com/fxsjy/jieba) as the default tokenizer to support Chinese tokenization\n\nOther languages: [More detailed documentation in Chinese](http://textgrocery.readthedocs.org/zh/latest/index.html)\n\nPerformance\n-----------\n\n- Training set: 48k news titles with 32 labels\n- Test set: 16k news titles with 32 labels\n- Compared with the SVM and naive Bayes classifiers of [scikit-learn](https://github.com/scikit-learn/scikit-learn)\n\n|         Classifier       | Accuracy  |  Time cost (s) |\n|:------------------------:|:---------:|:--------------:|\n|     scikit-learn(nb)     |   76.8%   |     134        |\n|     scikit-learn(svm)    |   76.9%   |     121        |\n|     **TextGrocery**      | **79.6%** |    **49**      |\n\nSample Code\n-----------\n\n```python\n\u003e\u003e\u003e from tgrocery import Grocery\n# Create a grocery (don't forget to set a name)\n\u003e\u003e\u003e grocery = Grocery('sample')\n# Train from a list\n\u003e\u003e\u003e train_src = [\n    ('education', 'Student debt to cost Britain billions within decades'),\n    ('education', 
'Chinese education for TV experiment'),\n    ('sports', 'Middle East and Asia boost investment in top level sports'),\n    ('sports', 'Summit Series look launches HBO Canada sports doc series: Mudhar')\n]\n\u003e\u003e\u003e grocery.train(train_src)\n# Or train from a file\n# Format: Label\\tText\n\u003e\u003e\u003e grocery.train('train_ch.txt')\n# Save the model\n\u003e\u003e\u003e grocery.save()\n# Load the model (use the same name as when it was saved)\n\u003e\u003e\u003e new_grocery = Grocery('sample')\n\u003e\u003e\u003e new_grocery.load()\n# Predict\n\u003e\u003e\u003e new_grocery.predict('Abbott government spends $8 million on higher education media blitz')\neducation\n# Test from a list\n\u003e\u003e\u003e test_src = [\n    ('education', 'Abbott government spends $8 million on higher education media blitz'),\n    ('sports', 'Middle East and Asia boost investment in top level sports'),\n]\n\u003e\u003e\u003e new_grocery.test(test_src)\n# Returns the accuracy\n1.0\n# Or test from a file\n\u003e\u003e\u003e new_grocery.test('test_ch.txt')\n# Use a custom tokenizer\n\u003e\u003e\u003e custom_grocery = Grocery('custom', custom_tokenize=list)\n```\n\nMore examples: [sample/](sample/)\n\nInstall\n-------\n\n    $ pip install tgrocery\n\n\u003e Only tested on Unix-based systems\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F2shou%2FTextGrocery","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F2shou%2FTextGrocery","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F2shou%2FTextGrocery/lists"}