https://github.com/mwydmuch/napkinXC

Extremely simple and fast extreme multi-class and multi-label classifiers.



Host: GitHub
URL: https://github.com/mwydmuch/napkinXC
Owner: mwydmuch
License: mit
Created: 2018-03-16T10:27:31.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2025-04-04T20:02:42.000Z (8 months ago)
Last Synced: 2025-04-19T20:45:00.286Z (7 months ago)
Topics: classification, datasets, extreme-classification, hsm, label-tree-classifiers, machine-learning, multi-class-classification, multi-label-classification, plt, probabilistic-label-trees, python, xmlc
Language: C++
Homepage:
Size: 2.54 MB
Stars: 66
Watchers: 10
Forks: 7
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.md


[![C++ build](https://github.com/mwydmuch/napkinXC/actions/workflows/cpp-test-build.yml/badge.svg)](https://github.com/mwydmuch/napkinXC/actions/workflows/cpp-test-build.yml)

[![Python build](https://github.com/mwydmuch/napkinXC/actions/workflows/python-test-build.yml/badge.svg)](https://github.com/mwydmuch/napkinXC/actions/workflows/python-test-build.yml)

[![Documentation Status](https://readthedocs.org/projects/napkinxc/badge/?version=latest)](https://napkinxc.readthedocs.io/en/latest/?badge=latest)

[![PyPI version](https://badge.fury.io/py/napkinxc.svg)](https://badge.fury.io/py/napkinxc) 



  



napkinXC is an extremely simple and fast library for extreme multi-class and multi-label classification that focuses on implementing various methods for Probabilistic Label Trees.

It allows training a classifier for very large datasets in just a few lines of code with minimal resources.

Right now, napkinXC implements the following features both in Python and C++:

- Probabilistic Label Trees (PLTs) and Hierarchical softmax (HSM),

- different types of inference methods (top-k, above a given threshold, etc.),

- fast prediction with label weights, e.g., propensity scores,

- different tree-building methods, including the hierarchical k-means clustering method,

- training of tree node

- support for custom tree structures and node weights,

- helpers to download and load data from the [XML Repository](http://manikvarma.org/downloads/XC/XMLRepository.html),

- helpers to measure performance (precision@k, recall@k, nDCG@k, propensity-scored precision@k, and more).
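
To illustrate the idea behind PLT inference mentioned in the list above, here is a minimal pure-Python sketch. It does not use napkinXC and is not the library's implementation. It assumes the standard PLT formulation from the papers listed in the references: each tree node carries a conditional probability given its parent, and a label's probability is the product of the node probabilities on the path from the root to its leaf. The tree, the probabilities, and the function names `predict_top_k` and `predict_above` are all made up for illustration.

```python
import heapq

# Hypothetical toy tree: internal node -> children; leaves map to labels.
CHILDREN = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
LEAF_LABEL = {3: "a", 4: "b", 5: "c", 6: "d"}

# Made-up conditional probabilities for one example x:
# P(node is positive | parent is positive, x). The root is always positive.
NODE_PROB = {0: 1.0, 1: 0.9, 2: 0.2, 3: 0.8, 4: 0.1, 5: 0.5, 6: 0.5}


def predict_top_k(k):
    """Return the k most probable labels using a best-first search.

    Path probabilities can only shrink while descending the tree,
    so the first leaf popped from the queue is the most probable label.
    """
    heap = [(-NODE_PROB[0], 0)]  # max-heap via negated probabilities
    result = []
    while heap and len(result) < k:
        neg_p, node = heapq.heappop(heap)
        if node in LEAF_LABEL:
            result.append((LEAF_LABEL[node], -neg_p))
            continue
        for child in CHILDREN[node]:
            heapq.heappush(heap, (neg_p * NODE_PROB[child], child))
    return result


def predict_above(threshold):
    """Return all labels whose path probability reaches the threshold.

    A subtree is pruned as soon as the probability of its root falls
    below the threshold, because no leaf beneath it can exceed it.
    """
    stack = [(NODE_PROB[0], 0)]
    result = []
    while stack:
        p, node = stack.pop()
        if p < threshold:
            continue
        if node in LEAF_LABEL:
            result.append((LEAF_LABEL[node], p))
        else:
            for child in CHILDREN[node]:
                stack.append((p * NODE_PROB[child], child))
    return sorted(result, key=lambda item: -item[1])


print(predict_top_k(2))
print(predict_above(0.5))
```

In both functions only part of the tree is visited, which is why tree-based inference can stay fast even when the number of labels is very large.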

Please note that this library is still under development and also serves as a base for experiments.

The API may not be compatible between releases, and some experimental features may not be documented.

Do not hesitate to open an issue in case of a question or problem!

napkinXC is distributed under the MIT license.

All contributions to the project are welcome!

## Python Quick Start and Documentation

Install via pip:

```
pip install napkinxc
```

We provide precompiled wheels for many Linux distros, macOS, and Windows for Python 3.9+.

If no wheel is available for your platform, pip builds napkinXC from source during installation.

Compilation from source requires a modern C++17 compiler, CMake, Git, and Python 3.9+ installed.

The latest (master) version can be installed directly from the GitHub repository (not recommended):

```
pip install git+https://github.com/mwydmuch/napkinXC.git
```

A minimal example of usage:

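```
from napkinxc.datasets import load_dataset
from napkinxc.models import PLT
from napkinxc.metrics import precision_at_k

# Load the train and test splits of the EURLex-4K dataset
X_train, Y_train = load_dataset("eurlex-4k", "train")
X_test, Y_test = load_dataset("eurlex-4k", "test")

# Create a PLT model stored under "eurlex-model" and train it
plt = PLT("eurlex-model")
plt.fit(X_train, Y_train)

# Predict the single top label for each test example and report precision@1
Y_pred = plt.predict(X_test, top_k=1)
print(precision_at_k(Y_test, Y_pred, k=1))
```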
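
For readers unfamiliar with the metric used in the example above, the following pure-Python sketch shows what precision@k measures: the fraction of the top k predicted labels that are true labels, averaged over examples. The function `precision_at_k_sketch` and the toy data are hypothetical. It is not napkinXC's `precision_at_k`, whose accepted input types and return shape may differ.

```python
def precision_at_k_sketch(y_true, y_pred, k):
    """Average over examples of |top-k predictions that are true labels| / k.

    y_true: list of lists with the true labels of each example.
    y_pred: list of lists with predicted labels, best prediction first.
    """
    total = 0.0
    for true_labels, ranked in zip(y_true, y_pred):
        hits = len(set(ranked[:k]) & set(true_labels))
        total += hits / k
    return total / len(y_true)


# Toy data (made up): three examples, two ranked predictions each.
Y_true = [[1, 5], [2], [3, 4]]
Y_pred = [[5, 7], [9, 8], [3, 4]]

print(precision_at_k_sketch(Y_true, Y_pred, k=1))
print(precision_at_k_sketch(Y_true, Y_pred, k=2))
```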

More examples can be found in the [`python/examples`](https://github.com/mwydmuch/napkinXC/tree/master/python/examples) directory,

and the napkinXC documentation is available at [https://napkinxc.readthedocs.io](https://napkinxc.readthedocs.io).

## Executable

napkinXC can also be used as an executable to train and evaluate models using data in LIBSVM format.

See [documentation](https://napkinxc.readthedocs.io/en/latest/exe_usage.html) for more details.
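
As a rough illustration of the data format, the sketch below parses a single line of a multi-label LIBSVM file. It assumes the common convention in which a line starts with comma-separated label indices followed by space-separated `index:value` feature pairs. The helper `parse_libsvm_line` is hypothetical and is not part of napkinXC; the linked documentation remains the reference for the exact format the executable expects, including any header line.

```python
def parse_libsvm_line(line):
    """Parse one line such as '3,7 1:0.5 4:1.0' into (labels, features).

    Assumption: labels are comma-separated integers in the first token,
    and a line whose first token contains ':' has no labels at all.
    """
    parts = line.strip().split()
    if parts and ":" not in parts[0]:
        labels = [int(label) for label in parts[0].split(",")]
        feature_tokens = parts[1:]
    else:
        labels = []
        feature_tokens = parts

    # Sparse features: map feature index -> value.
    features = {}
    for token in feature_tokens:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return labels, features


print(parse_libsvm_line("3,7 1:0.5 4:1.0"))
```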

## References and acknowledgments

This library implements methods from the following papers (see `experiments` directory for scripts to replicate the results):

- [Probabilistic Label Trees for Extreme Multi-label Classification](https://arxiv.org/abs/2009.11218)

- [Online probabilistic label trees](http://proceedings.mlr.press/v130/jasinska-kobus21a.html)

- [Propensity-scored Probabilistic Label Trees](https://dl.acm.org/doi/10.1145/3404835.3463084)

- [Efficient Algorithms for Set-Valued Prediction in Multi-Class Classification](https://link.springer.com/article/10.1007/s10618-021-00751-x)

Another implementation of the PLT model is available in the [extremeText](https://github.com/mwydmuch/extremeText) library,

which implements the approach described in this [NeurIPS paper](http://papers.nips.cc/paper/7872-a-no-regret-generalization-of-hierarchical-softmax-to-extreme-multi-label-classification).
