https://github.com/dirktoewe/selx

Selection Algorithms for Scala

scala selection-algorithms


Host: GitHub
URL: https://github.com/dirktoewe/selx
Owner: DirkToewe
License: apache-2.0
Created: 2019-04-03T15:59:53.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2019-04-04T22:00:53.000Z (about 6 years ago)
Last Synced: 2025-02-24T05:35:08.498Z (3 months ago)
Topics: scala, selection-algorithms
Language: Scala
Size: 33.2 KB
Stars: 1
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

SelX is an exploratory project that compares the performance and efficiency of different
[Selection Algorithms](https://en.wikipedia.org/wiki/Selection_algorithm). As a baseline,
the algorithms are also compared to `java.util.Arrays.sort`. The benchmark results for uniformly
random boxed Double inputs can be found [here](https://dirktoewe.github.io/SelX/benchmark_boxed.html).
SelX implements multiple evolutions of the following Selection Algorithms:

* Bubble Select: Inspired by Bubble Sort. Moves the maxima
  from the left side to the right until the Selection property is established.
* Heap Select: Builds a Binary Heap on the larger
  of the two sides and swaps with the other side until the Selection property is established.
* Median-of-Medians (MoM): Splits the input into small groups of 3 (MoM3) or 5 (MoM5) entries and computes the median
  of each group. The median of these medians is computed by calling MoM recursively. This
  median is then used to split/pivotize the original input into two parts. The entire procedure
  is repeated recursively on whichever of the two parts contains the sought index. MoM5 is guaranteed O(n).
* Median-of-Medians-of-Medians (MoMoM3): Also known as the repeated step algorithm. Splits the input into groups of 3 and computes their
  medians. Those medians are again split into groups of 3 and their medians are computed. The median of those
  medians is computed by recursively calling MoMoM3. That one resulting median
  is then used to split/pivotize the original input into two parts. The entire procedure
  is repeated recursively on whichever of the two parts contains the sought index. MoMoM3 is guaranteed
  O(n) and slightly faster than MoM5.
* Quick Select: Chooses a random entry from the input and uses it to split/pivotize the input. This is done
  recursively until the Selection property is established.
* Mean Select: Like Quick Select but uses the mean value of the inputs to split/pivotize. (Requires numeric keys.)
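
As a rough illustration of the Quick Select description above, here is a minimal sketch. It is not taken from the SelX sources: the names `QuickSelectSketch`, `select` and `hasSelectionProperty` are hypothetical, the partition scheme (Lomuto) is an assumption, and the Selection property is assumed to mean that no entry left of index `k` is larger than `a(k)` and no entry right of it is smaller.

```scala
import scala.util.Random

// Hypothetical sketch, not the SelX implementation.
object QuickSelectSketch {

  /** Rearranges `a` in place so that a(k) holds the k-th smallest value. */
  def select(a: Array[Double], k: Int, rng: Random = new Random(1337)): Unit = {
    require(0 <= k && k < a.length, "k out of bounds")
    var lo = 0
    var hi = a.length - 1
    // Invariant: lo <= k <= hi, and the Selection property already holds
    // with respect to everything outside of [lo, hi].
    while (lo < hi) {
      // Choose a random pivot and park it at the end of the current range.
      val p = lo + rng.nextInt(hi - lo + 1)
      swap(a, p, hi)
      val pivot = a(hi)
      // Lomuto partition: afterwards a(lo until store) < pivot <= a(store + 1 to hi).
      var store = lo
      var i = lo
      while (i < hi) {
        if (a(i) < pivot) { swap(a, i, store); store += 1 }
        i += 1
      }
      swap(a, store, hi) // the pivot is now at its final sorted position
      // Continue only on the side that contains index k.
      if (store == k) { lo = k; hi = k }
      else if (store < k) lo = store + 1
      else hi = store - 1
    }
  }

  /** Checks the (assumed) Selection property for index k. */
  def hasSelectionProperty(a: Array[Double], k: Int): Boolean =
    (0 until k).forall(i => a(i) <= a(k)) &&
      (k + 1 until a.length).forall(j => a(k) <= a(j))

  private def swap(a: Array[Double], i: Int, j: Int): Unit = {
    val t = a(i); a(i) = a(j); a(j) = t
  }
}
```

The loop replaces the recursion mentioned in the description, since only one of the two parts ever needs further work. The random pivot gives expected linear time, but unlike MoM5 or MoMoM3 it offers no worst-case O(n) guarantee.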

Most of the algorithms have been incrementally tweaked and optimized. The most effective
optimizations were taken from [this paper](https://arxiv.org/abs/1606.00484):
* Use the median of three random values as pivot for Quick Select
* Avoid splitting/pivotizing the medians of the groups a second time
* Instead of just selecting the median of the medians, select another index of the medians
  if it guarantees a better reduction of the problem size while splitting/pivotizing.
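
The first optimization in the list above can be sketched as follows. This is an illustrative assumption rather than the SelX code: `MedianOfThreePivot`, `medianIndex` and `pivotIndex` are hypothetical names, and for simplicity the three random positions are drawn with replacement, so they may coincide.

```scala
import scala.util.Random

// Hypothetical sketch, not the SelX implementation.
object MedianOfThreePivot {

  /** Returns whichever of the indices i, j, k points at the median of a(i), a(j), a(k). */
  def medianIndex(a: Array[Double], i: Int, j: Int, k: Int): Int = {
    val (x, y, z) = (a(i), a(j), a(k))
    if ((x <= y && y <= z) || (z <= y && y <= x)) j      // y lies between x and z
    else if ((y <= x && x <= z) || (z <= x && x <= y)) i // x lies between y and z
    else k                                               // otherwise z is the median
  }

  /** Picks a pivot index in [lo, hi] as the median of three randomly drawn entries. */
  def pivotIndex(a: Array[Double], lo: Int, hi: Int, rng: Random): Int = {
    require(lo <= hi, "empty range")
    val n = hi - lo + 1
    def draw(): Int = lo + rng.nextInt(n) // sampling with replacement (a simplification)
    medianIndex(a, draw(), draw(), draw())
  }
}
```

In a Quick Select loop, the result of `pivotIndex` would take the place of a single random index. The median of three samples is less likely than a single sample to fall near the extremes of the range, so the partitions tend to be more balanced.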

Running the Benchmarks
----------------------
To run a quick benchmark that will automatically generate HTML plots of the results, use sbt to call:
```
test:runMain selx.Select_comparison
```

To run the proper [JMH](https://openjdk.java.net/projects/code-tools/jmh/) benchmarks, which may take roughly 24 hours, run:
```
jmh:run
```

To run only the unboxed or boxed JMH benchmark, run:
```
jmh:run selx.Select_benchmarks_double
```
or:
```
jmh:run selx.Select_benchmarks_boxed
```
