https://github.com/scikit-learn/pairwise-distances-reductions-asv-suite

A dedicated asv suite for scikit-learn private PairwiseDistancesReductions

# ASV Benchmark suite for `PairwiseDistancesReductions`

## Context

`PairwiseDistancesReductions` are Cython-based implementations of computationally
expensive patterns found in many scikit-learn algorithms.

To maintain them over the long term, maintainers, as well as authors and
reviewers of Pull Requests that change them, need to be able
to assess performance regressions between revisions easily and confidently.

This independent [`asv`](https://asv.readthedocs.io/en/stable/) benchmark suite is meant to help in this regard.

For more context, see:
- https://github.com/scikit-learn/scikit-learn/pull/24120
- https://github.com/scikit-learn/scikit-learn/issues/22587

## Quick-start

This suite can be installed with:

```commandline
git clone git@github.com:jjerphan/pairwise-distances-reductions-asv-suite.git
cd pairwise-distances-reductions-asv-suite
pip install git+https://github.com/airspeed-velocity/asv
```

This suite can be run with:

```commandline
# This might take a while (i.e. several hours up to a day)
# if all combinations are benchmarked.
asv run
```

For more targeted runs, see the [`asv` commands' documentation](https://asv.readthedocs.io/en/stable/commands.html#asv-continuous).

## Workflow plan

### Needs

#### Get timely feedback on performance improvements or regressions when needed for a scikit-learn Pull Request

In particular:
- have a GitHub Actions workflow that can be triggered by a comment
- specify the revisions to compare (forwarded to `asv continuous`)
- be able to indicate the configurations to run benchmarks for, in particular regarding
  the values of the following parameters:
  - `PairwiseDistancesReductions`
  - `metric`
  - format of `(X,Y)` (in `{sparse, dense}²`)
- have the full, verbose, sorted `asv` textual report
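The needs listed above could be served by a small helper that turns a request into an `asv continuous` invocation. The following is only a sketch under assumptions: the function names are hypothetical, and the way benchmark names encode the reduction, the metric and the formats of `(X,Y)` depends on how the suite names its benchmarks, which is assumed here. The `--verbose`, `--show-stderr`, `--sort` and `--bench` flags are the ones documented for `asv continuous`.

```python
import itertools
import shlex


def format_pairs():
    """Enumerate the four (X, Y) storage-format combinations: {sparse, dense}²."""
    return list(itertools.product(("dense", "sparse"), repeat=2))


def build_asv_continuous_command(base, head, bench_regex=None):
    """Return the argv list for comparing two revisions with `asv continuous`.

    `bench_regex` is forwarded to `--bench`. Which names it should match
    is an assumption: it depends on how the suite names its benchmarks.
    """
    command = ["asv", "continuous", "--verbose", "--show-stderr", "--sort", "ratio"]
    if bench_regex is not None:
        command += ["--bench", bench_regex]
    # Positional arguments: the base revision first, then the revision under test.
    command += [base, head]
    return command


if __name__ == "__main__":
    # Hypothetical request: compare `main` against a PR branch,
    # restricted to benchmarks whose name matches "ArgKmin" (assumed naming).
    argv = build_asv_continuous_command("main", "my-pr-branch", bench_regex="ArgKmin")
    print(shlex.join(argv))
    print(format_pairs())
```

Returning an argument list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.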

#### Have an overview of performance with respect to the theoretical ideal limit

In particular:
- output graphs of hardware scalability
- report an estimate of the sequential code proportion using Amdahl's law
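As an illustration of the Amdahl's law estimate, here is a minimal sketch. It assumes the speedup is measured as the ratio of single-threaded to multi-threaded wall-clock time; the function names and the timings in the usage example are hypothetical, not results from this suite.

```python
def amdahl_speedup(sequential_fraction, n_threads):
    """Speedup predicted by Amdahl's law: S(p) = 1 / (s + (1 - s) / p)."""
    return 1.0 / (sequential_fraction + (1.0 - sequential_fraction) / n_threads)


def estimate_sequential_fraction(speedup, n_threads):
    """Invert Amdahl's law to recover s from a measured speedup.

    From 1 / S = s + (1 - s) / p it follows that s = (p / S - 1) / (p - 1).
    """
    if n_threads < 2:
        raise ValueError("at least 2 threads are needed to estimate the sequential fraction")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (n_threads / speedup - 1.0) / (n_threads - 1.0)


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    time_1_thread = 8.0
    time_8_threads = 1.7
    measured_speedup = time_1_thread / time_8_threads
    fraction = estimate_sequential_fraction(measured_speedup, n_threads=8)
    print(f"estimated sequential proportion: {fraction:.3f}")
```

In practice the estimate is noisy, so it would likely be computed for several thread counts and compared rather than taken from a single measurement.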

### Trace results over time

### Important notes

Benchmarks are correctly and entirely reproducible, traceable and reportable when the
following constraining requirements are met:
- the same machine is used over time: in practice, we can't expect CI providers to
  allocate the same machines over time, nor to dispatch to machines with identical
  specifications at a given time.
- no process other than the benchmarks runs on the machine: in practice, we can't
  expect CI providers to use process isolation.
- benchmark definitions aren't changed between revisions: this requires not reformatting
  the benchmarks' Python code, because asv hashes the content of the file to trace benchmarks
  over time.
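To see why reformatting matters for the last requirement, consider the sketch below. It is not asv's actual hashing scheme: SHA-256 over the raw source text is used as an illustrative stand-in, and the benchmark body is hypothetical. It only shows that a content hash changes when the formatting changes, even if the behaviour of the code is identical.

```python
import hashlib


def source_digest(source):
    """Hash source text byte for byte (illustrative stand-in for asv's hashing)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


# Two hypothetical benchmark definitions with identical behaviour:
# only the spacing inside the list literal differs.
original = "def time_example(self):\n    return sum([1,2,3])\n"
reformatted = "def time_example(self):\n    return sum([1, 2, 3])\n"

if __name__ == "__main__":
    print(source_digest(original) == source_digest(reformatted))
```

Because the digests differ, a tool that keys results on such a hash would treat the reformatted benchmark as a new one and stop linking it to its earlier results.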