{"id":24912835,"url":"https://github.com/raynardj/category","last_synced_at":"2025-10-16T23:31:29.478Z","repository":{"id":62561008,"uuid":"453109708","full_name":"raynardj/category","owner":"raynardj","description":"Category transformation","archived":false,"fork":false,"pushed_at":"2022-03-02T06:49:19.000Z","size":33,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-03T23:35:53.696Z","etag":null,"topics":["categorical-data","categorical-features","data-science","onehot-encoding"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/raynardj.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-01-28T15:02:50.000Z","updated_at":"2022-06-05T08:22:58.000Z","dependencies_parsed_at":"2022-11-03T14:45:41.315Z","dependency_job_id":null,"html_url":"https://github.com/raynardj/category","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raynardj%2Fcategory","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raynardj%2Fcategory/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raynardj%2Fcategory/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raynardj%2Fcategory/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/raynardj","download_url":"https://codeload.github.com/raynardj/category/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","k
ind":"github","repositories_count":236756752,"owners_count":19199894,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["categorical-data","categorical-features","data-science","onehot-encoding"],"created_at":"2025-02-02T05:28:44.494Z","updated_at":"2025-10-16T23:31:24.194Z","avatar_url":"https://github.com/raynardj.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# category\n\u003e Categorical transformation for data science\n\n[![PyPI version](https://img.shields.io/pypi/v/category)](https://pypi.org/project/category)\n![Python version](https://img.shields.io/pypi/pyversions/category)\n![License](https://img.shields.io/github/license/raynardj/category)\n[![Test](https://github.com/raynardj/category/actions/workflows/python-package-conda.yml/badge.svg)](https://github.com/raynardj/category/actions/workflows/python-package-conda.yml)\n![PyPI Downloads](https://img.shields.io/pypi/dm/category)\n\n## Installation\npip install works for this library.\n\n```shell\npip install category\n```\n\n## Single Category\n```python\n# using python core\n\u003e\u003e\u003e from category import Category\n# using rust core, faster\n\u003e\u003e\u003e from category.fast import Category \n\u003e\u003e\u003e book = Category(['a', 'b', 'c', 'Category_d', 'e', 'f', 'g', 'h', 'i', 'j'], pad_mst = False)\n\u003e\u003e\u003e book.i2c[2]\n'c'\n\n\u003e\u003e\u003e book.c2i[['Category_d','f']]\narray([3, 5])\n```\n\nYou can set ```pad_mst``` to ```True``` to handle the missing token\n```python\n# using python core\n\u003e\u003e\u003e from category import Category 
\n# using rust core, faster\n\u003e\u003e\u003e from category.fast import Category \n\u003e\u003e\u003e book = Category(['a', 'b', 'c', 'Category_d', 'e', 'f', 'g', 'h', 'i', 'j'], pad_mst = True)\n\u003e\u003e\u003e book.i2c[2] # the 1st token is the missing token, not 'a' any more\n'b'\n\u003e\u003e\u003e book.c2i[['Stranger','Category_d','Unknown','f']]\narray([0, 4, 0, 6])\n```\n\n## Multi-Category\n```python\n# using python core\n\u003e\u003e\u003e from category import (Category, MultiCategory)\n# using rust core, faster\n\u003e\u003e\u003e from category.fast import (Category, MultiCategory)\n\u003e\u003e\u003e cates = list(f\"category{i}\" for i in range(1000))\n\u003e\u003e\u003e multi_cate = MultiCategory(Category(cates, pad_mst = True))\n\u003e\u003e\u003e multi_cate.string_to_index(\"category42, category108\")\narray([42, 108])\n```\n\nYou can also convert a list of strings containing multi-category info (a format frequently seen in tabular data) to an n-hot encoded array, and back\n```python\n\u003e\u003e\u003e nhot = multi_cate.batch_strings_to_nhot([\"category42, category108\",\"category999\"])\n\u003e\u003e\u003e multi_cate.nhot_to_list(nhot)[0]\n[\"category42\", \"category108\"]\n```\n\n## Performance\nThe running speed of this library mostly depends on Python dictionary and NumPy operations. 
Though Python is a 'slow' language, this kind of application is pretty fast; our own Rust alternative is faster, but not by a huge lead.\n\nHere we compare this library with the [Rust implementation](https://github.com/raynardj/rust_category)\n\n## References\n* [GitHub](https://github.com/raynardj/category)\n* [PyPI package](https://pypi.org/project/category/)\n* [Rust implementation](https://github.com/raynardj/rust_category)\n* Used in [Tai-Chi engine](https://github.com/tcengine/tai-chi), a versatile user-friendly deep learning library","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraynardj%2Fcategory","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraynardj%2Fcategory","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraynardj%2Fcategory/lists"}