Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/marmarelis/rolling-quantiles
Blazing fast, composable, Pythonic quantile filters.
https://github.com/marmarelis/rolling-quantiles
Last synced: about 1 month ago
JSON representation
Blazing fast, composable, Pythonic quantile filters.
- Host: GitHub
- URL: https://github.com/marmarelis/rolling-quantiles
- Owner: marmarelis
- License: apache-2.0
- Created: 2021-03-01T07:36:13.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-05-10T21:11:00.000Z (over 1 year ago)
- Last Synced: 2024-05-14T13:03:25.278Z (6 months ago)
- Language: C
- Size: 249 KB
- Stars: 133
- Watchers: 4
- Forks: 4
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-blazingly-fast - rolling-quantiles - Blazing fast, composable, Pythonic quantile filters. (C)
README
# Rolling Quantiles for NumPy
[![Python tests](https://github.com/marmarelis/rolling-quantiles/actions/workflows/ci.yml/badge.svg?branch=master&event=push)](https://github.com/marmarelis/rolling-quantiles/actions/workflows/ci.yml)
## Hyper-efficient and composable filters.
* Simple, clean, intuitive interface.
* Streaming or batch processing.
* Python 3 bindings for a lean library written in pure C.### A Quick Tour
Let me give you but a superficial overview of this module's elegance.
```python
import numpy as np
import rolling_quantiles as rqpipe = rq.Pipeline( # rq.Pipeline is the only stateful object
# declare a cascade of filters by a sequence of immutable description objects
rq.LowPass(window=201, portion=100, subsample_rate=2),
# the above takes a median (101th element out of 201) of the most recent 200
# points and then spits out every other one
rq.HighPass(window=10, portion=3))
# that subsampled rolling median is then fed into this filter that takes a
# 30% quantile on a window of size 10, and subtracts it from its raw input# the pipeline exposes a set of read-only attributes that describe it
pipe.lag # = 60.0, the effective number of time units that the real-time output
# is delayed from the input
pipe.stride # = 2, how many inputs it takes to produce an output
# (>1 due to subsampling)input = np.random.randn(1000)
output = pipe.feed(input) # the core, singular exposed method# every other output will be a NaN to demarcate unready values
subsampled_output = output[1::pipe.stride]
```
![Example Signal](example.png)That may be a lot to take in, so let me break it down for you:
* `rq.Pipeline(description...)` constructs a filter pipeline from one or more filter descriptions and initializes internal state.
* `.feed(*)` takes in a Python number or `np.array` and its output is shaped likewise.
* The two filter types are `rq.LowPass` and `rq.HighPass` that compute rolling quantiles and return them as is, and subtract them from the raw signal respectively. Compose them however you like!
* `NaN`s in the output purposefully indicate missing values, usually due to subsampling. If you pass a `NaN` into a `LowPass` filter, it will slowly deplete its reserve and continue to return valid quantiles until the window empties completely.
* `rq.LowPass` and `rq.HighPass` alternatively take in a `quantile=q` argument, `0<=q<=1`. The filters would perform a linear interpolation in this case. In order to control the statistical characteristics of this quantile estimate, parameters `alpha` and `beta` are exposed as well with default values `(1, 1)`. Refer to SciPy's [documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.mquantiles.html) for details on this aspect.
```python
interpolated_pipe = rq.Pipeline(
# attempt to estimate the exact 40%-quantile by the
# default linear interpolation with parameters (1, 1)
rq.LowPass(window=30, quantile=0.4),
# here, the estimate is "approximately unbiased" in
# the case of Gaussian white noise
rq.HighPass(window=10, quantile=0.3,
alpha=3/8, beta=3/8))
```
See this [Wikipedia section](https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample) for an elucidating overview.I also expose a convenience function `rq.medfilt(signal, window_size)` at the top-level of the package to directly supplant `scipy.signal.medfilt`.
That's it! I detailed the entire library. Don't let the size of its interface fool you!
## Installation
[![Downloads](https://pepy.tech/badge/rolling-quantiles)](https://pepy.tech/project/rolling-quantiles)If you are running Linux, MacOS, or Windows with Python 3.8+ and NumPy ~1.20, execute the following:
`pip install rolling-quantiles`
These are the conditions under which binaries are built and sent to the Python Package Index, which holds `pip`'s packages. Should the NumPy version be unsuitable, for instance, I suggest building the package from source. This is rather straightforward because the handful of source files in C have absolutely minimal dependencies.
### Building from Source
The meat of this package is a handful of C files with no external dependencies, besides NumPy 1.16+ and Python 3.7+ for the bindings located in `src/python.c`. As such, you may build from source by running the following from the project's root directory:
1. `cd python`
2. Check `pyproject.toml` to make sure the listed NumPy version matches your desired target. The compiled package will be forward- but not backward-compatible.
3. `python -m build` (make sure this invokes Python 3)
4. `pip install dist/.whl`#### Note of Caution on MacOS Big Sur
Make sure to specify `MACOSX_DEPLOYMENT_TARGET=10.X` as a prefix to the build command, e.g. `python -m build`. The placeholder `X` can be any MacOS version earlier than Big Sur (I use `9`.) By default, the build system would attempt to build for MacOS 11 that is incompatible with current Python interpreters that have been compiled against a prior version.### Benchmarking a median filter on 100 million doubles.
I make use of binary heaps that impart desirable guarantees on their amortized runtime. Realistically, their performance may depend on the statistics of the incoming signal. I pummeled the filters with Gaussian Brownian motion to gauge their practical usability under a typical drifting stochastic process.
| `window` | `rolling_quantiles` [1] | `scipy` [2] | `pandas` [3] |
| :------- | ------------------: | ----------: | -----------: |
| 4 | 14 seconds | 22 seconds | 25 seconds |
| 10 | 21 seconds | 47 seconds | 31 seconds |
| 20 | 28 seconds | 95 seconds | 35 seconds |
| 30 | 30 seconds | 140 seconds | 37 seconds |
| 40 | 34 seconds | 190 seconds | 40 seconds |
| 50 | 36 seconds | 242 seconds | 40 seconds |
| 1,000 | 61 seconds | N/A | 62 seconds |Likewise, with simulated Gaussian white noise (no drift in the signal):
| `window` | `rolling_quantiles` [1] | `scipy` [2] | `pandas` [3] |
| :------- | ------------------: | ----------: | -----------: |
| 4 | 14 seconds | 22 seconds | 25 seconds |
| 10 | 20 seconds | 51 seconds | 31 seconds |
| 20 | 25 seconds | 105 seconds | 36 seconds |
| 30 | 27 seconds | 156 seconds | 39 seconds |
| 40 | 30 seconds | 218 seconds | 41 seconds |
| 50 | 30 seconds | 279 seconds | 42 seconds |
| 1,000 | 45 seconds | N/A | 70 seconds |Intel(R) Core(TM) i7-8700T CPU @ 2.40GHz, single-threaded performance on Linux. My algorithm looked even better (relative to pandas) on a 2020 MacBook Pro. Check out this [StackOverflow answer](https://stackoverflow.com/questions/60100276/fastest-way-for-2d-rolling-window-quantile/66482238#66482238) for a particular use case.
[1] `rq.Pipeline(...)`
[2] `scipy.signal.medfilt(...)`
[3] `pd.Series.rolling(*).quantile(...)`
#### Brought to you by [Myrl](https://myrl.marmarel.is)