https://github.com/scikit-learn/pairwise-distances-reductions-asv-suite
A dedicated asv suite for scikit-learn private PairwiseDistancesReductions
- Host: GitHub
- URL: https://github.com/scikit-learn/pairwise-distances-reductions-asv-suite
- Owner: scikit-learn
- Created: 2022-12-05T10:14:05.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-01-26T12:27:55.000Z (almost 2 years ago)
- Last Synced: 2024-12-09T13:25:26.540Z (about 1 month ago)
- Topics: asv, benchmarks, cython, scikit-learn
- Language: Python
- Homepage:
- Size: 7.81 KB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ASV Benchmark suite for `PairwiseDistancesReductions`
## Context
`PairwiseDistancesReductions` are Cython-based implementations of computationally
expensive patterns used in many of scikit-learn's algorithms.

To maintain them over the long term, maintainers, as well as authors and
reviewers of Pull Requests that propose changes, need to be able
to easily and confidently assess performance regressions between revisions.

This independent [`asv`](https://asv.readthedocs.io/en/stable/) benchmark suite is meant to help in this regard.
For more context, see:
- https://github.com/scikit-learn/scikit-learn/pull/24120
- https://github.com/scikit-learn/scikit-learn/issues/22587

## Quick-start
This suite can be installed with:
```commandline
git clone git@github.com:jjerphan/pairwise-distances-reductions-asv-suite.git
cd pairwise-distances-reductions-asv-suite
pip install git+https://github.com/airspeed-velocity/asv
```

This suite can be run with:
```commandline
# This might take a while (i.e. several hours, up to a day)
# if all combinations are benchmarked.
asv run
```

For more precise runs, see [`asv` commands' documentation](https://asv.readthedocs.io/en/stable/commands.html#asv-continuous).
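To give a sense of why a full run takes so long, here is a minimal sketch of how `asv` expands a parameterized benchmark: each `time_*` method runs once per element of the Cartesian product of `params`. This is not taken from this suite. The class name, the parameter values, and the pure-Python placeholder workload are all assumptions made for illustration.

```python
from itertools import product


class PairwiseDistancesReductionSketch:
    """Hypothetical asv-style benchmark; names and values are illustrative only."""

    # asv runs every time_* method once per combination of these values.
    params = [
        ["ArgKmin", "RadiusNeighbors"],  # assumed reduction names
        ["euclidean", "manhattan"],  # assumed metrics
        ["dense-dense", "dense-sparse", "sparse-dense", "sparse-sparse"],
    ]
    param_names = ["reduction", "metric", "xy_format"]

    def setup(self, reduction, metric, xy_format):
        # A real benchmark would build X and Y (dense or sparse) here.
        self.X = [[float(i + j) for j in range(4)] for i in range(32)]

    def time_reduction(self, reduction, metric, xy_format):
        # Placeholder workload standing in for the private Cython routine.
        sum(abs(a - b) for row in self.X for a, b in zip(row, self.X[0]))


# Enumerate the combinations the way asv would for this class.
combinations = list(product(*PairwiseDistancesReductionSketch.params))
print(len(combinations))  # 2 * 2 * 4 = 16 runs per time_* method
```

Adding one more value to any of the lists multiplies the number of runs, which is why restricting a run to a subset of benchmarks is usually preferable while iterating on a Pull Request.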
## Workflow plan
### Needs
#### Get timely feedback on performance improvements or regressions when needed for a scikit-learn Pull Request
In particular:
- have a GitHub Actions workflow that can be triggered by a comment
- specify the revisions to compare (forwarded to `asv continuous`)
- be able to indicate the configurations to run benchmarks for, in particular regarding
  the values of the following parameters:
  - `PairwiseDistancesReductions`
  - `metric`
  - format of `(X, Y)` (in `{sparse, dense}²`)
- have the full, verbose, sorted `asv` textual report

#### Have an overview of performance with respect to the theoretical ideal limit
In particular:
- output graphs of hardware scalability
- report an estimate of the proportion of sequential code using Amdahl's law

### Trace results over time
### Important notes
Benchmarks are correctly and entirely reproducible, traceable and reportable when the
following constraining requirements are met:
- the same machine is used over time: in practice, we can't expect CI providers to
  allocate the same machines over time, nor to dispatch to machines with identical
  specifications at a given time.
- no process other than the benchmarks runs on the machine: in practice, we can't
  expect CI providers to use process isolation.
- benchmark definitions aren't changed between revisions: this requires not reformatting
  the benchmarks' Python code, because asv hashes the content of the file to trace benchmarks
  over time.
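
The last requirement can be illustrated with a small sketch. It does not reproduce `asv`'s actual hashing logic. It only assumes that some content hash of the benchmark source is used as its identity, and the `source_hash` helper and the two source strings are hypothetical. It shows that a purely cosmetic reformatting leaves the parsed code unchanged but still changes the hash.

```python
import ast
import hashlib


def source_hash(source: str) -> str:
    # Hypothetical stand-in for a content-based benchmark identity.
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


original = "def time_sum(self):\n    return sum([1, 2, 3])\n"
# Same code after a formatter has wrapped the call over several lines.
reformatted = "def time_sum(self):\n    return sum(\n        [1, 2, 3]\n    )\n"

# The syntax trees are identical, so the behaviour is unchanged...
same_behaviour = ast.dump(ast.parse(original)) == ast.dump(ast.parse(reformatted))
# ...but the content hashes differ, so the history would no longer line up.
same_hash = source_hash(original) == source_hash(reformatted)

print(same_behaviour, same_hash)  # True False
```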