{"id":21033116,"url":"https://github.com/mwydmuch/napkinxc","last_synced_at":"2025-05-15T13:31:43.999Z","repository":{"id":47467888,"uuid":"125501148","full_name":"mwydmuch/napkinXC","owner":"mwydmuch","description":"Extremely simple and fast extreme multi-class and multi-label classifiers.","archived":false,"fork":false,"pushed_at":"2025-04-04T20:02:42.000Z","size":2662,"stargazers_count":67,"open_issues_count":3,"forks_count":7,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-05-11T19:38:39.910Z","etag":null,"topics":["classification","datasets","extreme-classification","hsm","label-tree-classifiers","machine-learning","multi-class-classification","multi-label-classification","plt","probabilistic-label-trees","python","xmlc"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mwydmuch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-03-16T10:27:31.000Z","updated_at":"2025-04-21T09:01:26.000Z","dependencies_parsed_at":"2023-01-26T00:45:41.661Z","dependency_job_id":"95ea77d7-a8c4-4b88-832f-db9795217271","html_url":"https://github.com/mwydmuch/napkinXC","commit_stats":{"total_commits":378,"total_committers":4,"mean_commits":94.5,"dds":0.0714285714285714,"last_synced_commit":"7aeaaf9c481b9ecb55fac2194eb00bedbeb68897"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mwydmuch%2FnapkinXC","tags_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/mwydmuch%2FnapkinXC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mwydmuch%2FnapkinXC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mwydmuch%2FnapkinXC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mwydmuch","download_url":"https://codeload.github.com/mwydmuch/napkinXC/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254349390,"owners_count":22056343,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","datasets","extreme-classification","hsm","label-tree-classifiers","machine-learning","multi-class-classification","multi-label-classification","plt","probabilistic-label-trees","python","xmlc"],"created_at":"2024-11-19T12:51:50.571Z","updated_at":"2025-05-15T13:31:38.973Z","avatar_url":"https://github.com/mwydmuch.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# napkinXC \n[![C++ build](https://github.com/mwydmuch/napkinXC/actions/workflows/cpp-test-build.yml/badge.svg)](https://github.com/mwydmuch/napkinXC/actions/workflows/cpp-test-build.yml)\n[![Python build](https://github.com/mwydmuch/napkinXC/actions/workflows/python-test-build.yml/badge.svg)](https://github.com/mwydmuch/napkinXC/actions/workflows/python-test-build.yml)\n[![Documentation Status](https://readthedocs.org/projects/napkinxc/badge/?version=latest)](https://napkinxc.readthedocs.io/en/latest/?badge=latest)\n[![PyPI 
version](https://badge.fury.io/py/napkinxc.svg)](https://badge.fury.io/py/napkinxc) \n\nnapkinXC is an extremely simple and fast library for extreme multi-class and multi-label classification \nthat focuses on implementing various methods for Probabilistic Label Trees.\nIt allows training a classifier on very large datasets in just a few lines of code with minimal resources.\n\nRight now, napkinXC implements the following features in both Python and C++:\n- Probabilistic Label Trees (PLTs) and Hierarchical softmax (HSM),\n- different types of inference methods (top-k, above a given threshold, etc.),\n- fast prediction with label weights, e.g., propensity scores,\n- an efficient online F-measure optimization (OFO) procedure,\n- different tree building methods, including a hierarchical k-means clustering method,\n- training of tree node classifiers,\n- support for custom tree structures and node weights,\n- helpers to download and load data from the [XML Repository](http://manikvarma.org/downloads/XC/XMLRepository.html),\n- helpers to measure performance (precision@k, recall@k, nDCG@k, propensity-scored precision@k, and more).\n\nPlease note that this library is still under development and also serves as a base for experiments.\nThe API may not be compatible between releases, and some experimental features may not be documented.\nDo not hesitate to open an issue if you have a question or run into a problem!\n\nnapkinXC is distributed under the MIT license. 
\nAll contributions to the project are welcome!\n\n\n## Python Quick Start and Documentation\n\nInstall via pip:\n```\npip install napkinxc\n```\nWe provide precompiled wheels for many Linux distros, macOS, and Windows for Python 3.7+.\nIf there is no wheel for your OS, the package will be compiled from source during installation.\nCompiling from source requires a modern C++17 compiler, CMake, Git, and Python 3.7+.\n\n\nThe latest (master) version can be installed directly from the GitHub repository (not recommended):\n```\npip install git+https://github.com/mwydmuch/napkinXC.git\n```\n\n\nA minimal usage example:\n```\nfrom napkinxc.datasets import load_dataset\nfrom napkinxc.models import PLT\nfrom napkinxc.measures import precision_at_k\n\n# Load the train and test splits of EURLex-4K (the helper downloads the data if it is not already present)\nX_train, Y_train = load_dataset(\"eurlex-4k\", \"train\")\nX_test, Y_test = load_dataset(\"eurlex-4k\", \"test\")\n\n# Train a Probabilistic Label Tree; the model is stored in the \"eurlex-model\" directory\nplt = PLT(\"eurlex-model\")\nplt.fit(X_train, Y_train)\n\n# Predict the single most probable label for each test example and report precision@1\nY_pred = plt.predict(X_test, top_k=1)\nprint(precision_at_k(Y_test, Y_pred, k=1))\n```\n\nMore examples can be found in the [`python/examples`](https://github.com/mwydmuch/napkinXC/tree/master/python/examples) directory,\nand napkinXC's documentation is available at [https://napkinxc.readthedocs.io](https://napkinxc.readthedocs.io).\n\n\n## Executable\n\nnapkinXC can also be used as an executable to train and evaluate models on data in the LIBSVM format.\nSee the [documentation](https://napkinxc.readthedocs.io/en/latest/exe_usage.html) for more details.\n\n\n## References and acknowledgments\n\nThis library implements methods from the following papers (see the `experiments` directory for scripts to replicate the results):\n\n- [Probabilistic Label Trees for Extreme Multi-label Classification](https://arxiv.org/abs/2009.11218)\n- [Online probabilistic label trees](http://proceedings.mlr.press/v130/jasinska-kobus21a.html)\n- [Propensity-scored Probabilistic Label Trees](https://dl.acm.org/doi/10.1145/3404835.3463084)\n- [Efficient Algorithms for Set-Valued Prediction in 
Multi-Class Classification](https://link.springer.com/article/10.1007/s10618-021-00751-x)\n\nAnother implementation of the PLT model is available in the [extremeText](https://github.com/mwydmuch/extremeText) library, \nwhich implements the approach described in this [NeurIPS paper](http://papers.nips.cc/paper/7872-a-no-regret-generalization-of-hierarchical-softmax-to-extreme-multi-label-classification).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmwydmuch%2Fnapkinxc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmwydmuch%2Fnapkinxc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmwydmuch%2Fnapkinxc/lists"}