https://github.com/gchq/coreax
A library for coreset algorithms, written in Jax for fast execution and GPU support.
- Host: GitHub
- URL: https://github.com/gchq/coreax
- Owner: gchq
- License: apache-2.0
- Created: 2023-07-06T08:45:08.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-02-10T12:23:37.000Z (4 months ago)
- Last Synced: 2025-02-10T12:33:20.905Z (4 months ago)
- Language: Python
- Size: 13.7 MB
- Stars: 26
- Watchers: 5
- Forks: 2
- Open Issues: 67
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
Awesome Lists containing this project
- trackawesomelist - Coreax - Algorithms for finding coresets to compress large datasets while retaining their statistical properties. (Recently Updated / Feb 16, 2025)
- awesome-jax - coreax - A library for coreset algorithms, written in Jax for fast execution and GPU support. (Libraries)
README
# Coreax
_© Crown Copyright GCHQ_
Coreax is a library for **coreset algorithms**, written in JAX for fast execution and GPU support.
## About Coresets
For $n$ points in $d$ dimensions, a coreset algorithm takes an $n \times d$ data set and
reduces it to $m \ll n$ points whilst attempting to preserve the statistical properties
of the full data set. The algorithm maintains the dimension of the original data set.
Thus the $m$ points, referred to as the **coreset**, are also $d$-dimensional.

The $m$ points need not be in the original data set. We refer to the special case where
all selected points are in the original data set as a **coresubset**.

Some algorithms return the $m$ points with weights, so that importance can be
attributed to each point in the coreset. The weights, $w_i$ for $i=1,\dots,m$, are often
chosen from the simplex. In this case, they are non-negative and sum to 1:
$w_i \geq 0$ $\forall i$ and $\sum_{i} w_i = 1$.

Please see [the documentation](https://coreax.readthedocs.io/en/latest/quickstart.html) for some in-depth examples.
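To make these definitions concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not Coreax's API. It builds a coresubset by sampling $m$ distinct rows uniformly at random, attaches uniform weights, and checks that the weights lie on the simplex. Random sampling is only a baseline, and the helper names `random_coresubset` and `on_simplex` are hypothetical.

```python
import numpy as np


def random_coresubset(data: np.ndarray, m: int, seed: int = 0):
    """Pick m distinct rows of `data` uniformly at random, with uniform weights.

    Hypothetical helper for illustration only; not a Coreax algorithm or API.
    """
    n = data.shape[0]
    rng = np.random.default_rng(seed)
    indices = rng.choice(n, size=m, replace=False)  # distinct rows -> a coresubset
    coreset = data[indices]                          # shape (m, d): dimension is kept
    weights = np.full(m, 1.0 / m)                    # uniform weights lie on the simplex
    return indices, coreset, weights


def on_simplex(weights: np.ndarray, atol: float = 1e-8) -> bool:
    """Check w_i >= 0 for all i and sum_i w_i == 1, up to floating-point tolerance."""
    return bool(np.all(weights >= 0) and np.isclose(weights.sum(), 1.0, atol=atol))


rng = np.random.default_rng(42)
data = rng.normal(size=(10_000, 3))  # n = 10,000 points in d = 3 dimensions
indices, coreset, weights = random_coresubset(data, m=100)

print(coreset.shape)        # (100, 3)
print(on_simplex(weights))  # True
# Compare one statistical property: weighted coreset mean vs full-data mean.
print(weights @ coreset, data.mean(axis=0))
```

Because every selected row is taken from `data`, the result is a coresubset; an algorithm that returned arbitrary $d$-dimensional points would instead give a coreset that is not a coresubset.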
## Example applications
### Choosing pixels from an image
In the example below, we reduce the original 180x215
pixel image (38,700 pixels in total) to a coreset approximately 20% of this size.
(Left) original image.
(Centre) 8,000 coreset points chosen using Stein kernel herding, with point size a
function of weight.
(Right) 8,000 points chosen randomly.
Run `examples/david_map_reduce_weighted.py` to replicate.
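The image example selects points with Stein kernel herding. To give a feel for how herding picks points, here is a minimal NumPy sketch of plain kernel herding with one deliberate swap: it uses a Gaussian (RBF) kernel rather than a Stein kernel, and it is not Coreax's implementation or API. The names `rbf_gram`, `kernel_herding` and `mmd` are hypothetical. At each step the sketch picks the data point that maximises the kernel mean embedding of the full data set minus the average kernel similarity to the points already chosen, excluding repeats so the result is a coresubset. Because it builds the full $n \times n$ kernel matrix, it is only practical for small data sets.

```python
import numpy as np


def rbf_gram(x: np.ndarray, y: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    """Gaussian kernel matrix with entries exp(-||x_i - y_j||^2 / (2 * length_scale^2))."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale**2))


def kernel_herding(data: np.ndarray, m: int, length_scale: float = 1.0) -> np.ndarray:
    """Greedily choose m distinct row indices of `data` by kernel herding (RBF kernel)."""
    gram = rbf_gram(data, data, length_scale)  # (n, n): only practical for small n
    mean_embedding = gram.mean(axis=1)         # (1/n) * sum_j k(x_i, x_j)
    running_sum = np.zeros(data.shape[0])      # sum over chosen s of k(x_i, x_s)
    selected = []
    for t in range(m):
        # Favour points in dense regions that are dissimilar to points already chosen.
        scores = mean_embedding - running_sum / (t + 1)
        if selected:
            scores[selected] = -np.inf         # keep points distinct (a coresubset)
        idx = int(np.argmax(scores))
        selected.append(idx)
        running_sum += gram[:, idx]
    return np.array(selected)


def mmd(x: np.ndarray, y: np.ndarray, length_scale: float = 1.0) -> float:
    """Maximum mean discrepancy between the empirical distributions of x and y."""
    sq = (
        rbf_gram(x, x, length_scale).mean()
        + rbf_gram(y, y, length_scale).mean()
        - 2.0 * rbf_gram(x, y, length_scale).mean()
    )
    return float(np.sqrt(max(sq, 0.0)))


rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))
herded = kernel_herding(data, m=20)
random_idx = rng.choice(len(data), size=20, replace=False)
print("herding MMD:", mmd(data, data[herded]))
print("random MMD: ", mmd(data, data[random_idx]))
```

The MMD printed at the end measures how far each 20-point subset is from the full data set (lower is better), which gives a simple way to compare herding against the random selection shown in the right-hand panel.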
### Video event detection
Here we identify representative frames such that most of the
useful information in a video is preserved.
Run `examples/pounce.py` to replicate.

_Figure: original video (left) and its coreset (right)._

# Setup
Install Coreax from PyPI by adding `coreax` to your project dependencies or running
```shell
pip install coreax
```

Coreax uses JAX. It installs the CPU version by default, but if you have a GPU or TPU,
see the
[JAX installation instructions](https://jax.readthedocs.io/en/latest/installation.html)
for the options that let JAX make use of your hardware. For example, if you
have an NVIDIA GPU on Linux, add `jax[cuda12]` to your project dependencies or run
```shell
pip install "jax[cuda12]"
```

There are optional sets of additional dependencies:
* `coreax[test]` is required to run the tests;
* `coreax[example]` contains all dependencies for the example scripts;
* `coreax[benchmark]` is required to run benchmarking;
* `coreax[doc]` is for compiling the Sphinx documentation;
* `coreax[dev]` includes all tools and packages a developer of Coreax might need.

Note that the `test` and `dev` dependencies include `opencv-python-headless`, which is
the headless version of OpenCV and cannot be installed alongside other OpenCV packages. If you
wish to use an alternative version, remove `opencv-python-headless` and select an
alternative from the
[`opencv-python-headless` page on PyPI](https://pypi.org/project/opencv-python-headless/).

Should the installation of Coreax fail, you can see the versions used by the Coreax
development team in `uv.lock`. You can transfer these to your own project as follows.
First, [install uv](https://docs.astral.sh/uv/getting-started/installation/). Then,
clone the repo from [GitHub](https://github.com/gchq/coreax) and change into its directory. Next, run
```shell
uv export --format requirements-txt --no-emit-project --output-file requirements.txt
```
which will write the pinned dependency versions to `requirements.txt`. Install these in your own project before trying
to install Coreax itself:
```shell
pip install -r requirements.txt
pip install coreax
```

# Release cycle
We anticipate two release types: feature releases and security releases. Security
releases will be issued as needed in accordance with the
[security policy](https://github.com/gchq/coreax/security/policy). Feature releases will
be issued as appropriate, dependent on the feature pipeline and development priorities.