https://github.com/boredoms/prone
PRONE - PRojected ONE-dimensional clustering algorithm for k-means
- Host: GitHub
- URL: https://github.com/boredoms/prone
- Owner: boredoms
- Created: 2023-10-12T14:33:35.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-14T18:33:35.000Z (about 1 year ago)
- Last Synced: 2025-03-15T07:18:11.173Z (7 months ago)
- Topics: cpp, data-science, k-means-clustering, neurips-2023, python
- Language: C++
- Size: 10.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# `PRONE` (PRojected ONE-dimensional clustering)
This repository contains a ready-to-use reference implementation of the `PRONE` algorithm, published alongside our paper "Simple, Scalable and Effective Clustering via One-Dimensional Projections".
It can be installed and used as a Python package via `pip`: run `pip install .` in the `prone` directory. The dependencies are `numpy` and `cython`. To run the tests for the C++ code, you need `CMake` installed. If you also want the packages for the demo, run `pip install .[demo]`. The demo shows how to run `PRONE` and how to use it to create a coreset for use with scikit-learn.
## Usage
After installing, import the `prone` function from the `prone` module (see `demo.py`); it can then be used for k-means clustering, as sketched below. The strength of our algorithm is that it produces an approximate solution with provable guarantees much faster than previous seeding algorithms for k-means. This makes it particularly well suited for combining with sensitivity sampling to create a coreset: that approach significantly "compresses" the input data in the first stage of a clustering pipeline, which speeds up downstream tasks.
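A minimal sketch of this workflow, assuming `prone(X, k)` takes a data matrix and the number of clusters and returns a cluster assignment (the exact signature may differ; `demo.py` is authoritative):

```python
import numpy as np
from prone import prone

# Synthetic data: 10,000 points in 20 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 20))

# Assumed call: prone(X, k) returning one cluster label per point.
# Check demo.py for the actual signature and return values.
labels = prone(X, 5)
```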
You can also import the `coreset` function to compute a coreset of your dataset. Its parameters are the dataset, the number of clusters, and the desired coreset size. It returns two arrays: one containing the indices of the coreset points in the dataset, and one containing the corresponding weights. A sketch of this path follows.
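A hedged example, assuming the argument order `coreset(X, k, m)` matches the description above (the names here are illustrative); scikit-learn's `KMeans` accepts the returned weights through its `sample_weight` parameter:

```python
import numpy as np
from sklearn.cluster import KMeans
from prone import coreset

rng = np.random.default_rng(0)
X = rng.standard_normal((100_000, 20))
k, m = 10, 1_000  # number of clusters, coreset size

# Assumed signature per the description above: returns indices into X
# and one weight per selected point.
indices, weights = coreset(X, k, m)

# Cluster the small weighted coreset instead of the full dataset.
km = KMeans(n_clusters=k, n_init=10).fit(X[indices], sample_weight=weights)
centers = km.cluster_centers_  # approximate centers for the full data
```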
## Future work
Parallelism and performance improvements.
## Citing
If you use this library in your own research, please cite our NeurIPS paper (as of January 5th, 2024, this is the preprint citation, since the proceedings are not out yet):
```
@inproceedings{charikar2023simple,
title={Simple, Scalable and Effective Clustering via One-Dimensional Projections},
author={Charikar, Moses and Henzinger, Monika and Hu, Lunjia and V{\"o}tsch, Maximilian and Waingarten, Erik},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023}
}
```