# tsdownsample

[![PyPI Latest Release](https://img.shields.io/pypi/v/tsdownsample.svg)](https://pypi.org/project/tsdownsample/)
[![support-version](https://img.shields.io/pypi/pyversions/tsdownsample)](https://img.shields.io/pypi/pyversions/tsdownsample)
[![Downloads](https://static.pepy.tech/badge/tsdownsample)](https://pepy.tech/project/tsdownsample)
[![CodeQL](https://github.com/predict-idlab/tsdownsample/actions/workflows/codeql.yml/badge.svg)](https://github.com/predict-idlab/tsdownsample/actions/workflows/codeql.yml)
[![Testing](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-downsample_rs.yml/badge.svg)](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-downsample_rs.yml)
[![Testing](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-tsdownsample.yml/badge.svg)](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-tsdownsample.yml)
[![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?logo=discord&logoColor=white)](https://discord.gg/k2d59GrxPX)

Extremely fast **time series downsampling 📈** for visualization, written in Rust.

## Features ✨

- **Fast**: written in Rust with PyO3 bindings
  - leverages the optimized [argminmax](https://github.com/jvdd/argminmax) library - which is SIMD accelerated with runtime feature detection
  - scales linearly with the number of data points
  - multithreaded with Rayon (in Rust)

    *Why we do not use Python multiprocessing* - citing the PyO3 docs on parallelism:

    > CPython has the infamous Global Interpreter Lock, which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for CPU-bound tasks and often forces developers to accept the overhead of multiprocessing.

    In Rust - which is a compiled language - there is no GIL, so CPU-bound tasks can be parallelized (with Rayon) with little to no overhead.

- **Efficient**: memory efficient
  - works on views of the data (no copies)
  - no intermediate data structures are created
- **Flexible**: works on any type of data
  - supported datatypes are:
    - for `x`: `f32`, `f64`, `i16`, `i32`, `i64`, `u16`, `u32`, `u64`, `datetime64`, `timedelta64`
    - for `y`: `f16`, `f32`, `f64`, `i8`, `i16`, `i32`, `i64`, `u8`, `u16`, `u32`, `u64`, `datetime64`, `timedelta64`, `bool`

  🚀 **`f16` argminmax is 200-300x faster than NumPy!**
  In contrast with all other data types above, `f16` is *not* hardware supported by most modern CPUs (i.e., there are no native `f16` instructions).

  🐌 Programming languages typically support this datatype by either (i) upcasting to `f32` or (ii) using a software implementation.

  💡 Since argminmax only needs comparisons - and thus no arithmetic operations - a symmetrical ordinal mapping from `f16` to `i16` is sufficient. This mapping makes it possible to use the hardware-supported scalar and SIMD `i16` instructions, while not producing any memory overhead 🎉 A sketch of this mapping is shown below the feature list.

  More details are described in argminmax PR #1.

- **Easy to use**: simple & flexible API
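
The following NumPy snippet sketches the ordinal-mapping idea mentioned above. It is an illustrative reimplementation for clarity only - the actual mapping lives in the Rust [argminmax](https://github.com/jvdd/argminmax) code and may differ in detail:

```python
import numpy as np

def f16_to_i16_ordinal(arr: np.ndarray) -> np.ndarray:
    """Map f16 bit patterns to i16 values that preserve the float ordering."""
    bits = arr.view(np.int16)  # reinterpret the f16 bits as i16 (no copy)
    # Positive floats already order correctly when read as i16;
    # for negative floats (sign bit set) the lower 15 bits must be flipped.
    return np.where(bits < 0, bits ^ np.int16(0x7FFF), bits)

y = np.random.randn(1_000).astype(np.float16)
# Because the mapping is monotone, argmin/argmax on the mapped i16
# values yield the same indices as on the original f16 values.
assert np.argmin(f16_to_i16_ordinal(y)) == np.argmin(y)
assert np.argmax(f16_to_i16_ordinal(y)) == np.argmax(y)
```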

## Install

```bash
pip install tsdownsample
```

## Usage

```python
from tsdownsample import MinMaxLTTBDownsampler
import numpy as np

# Create a time series
y = np.random.randn(10_000_000)
x = np.arange(len(y))

# Downsample to 1000 points (assuming constant sampling rate)
s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000)

# Select downsampled data
downsampled_y = y[s_ds]

# Downsample to 1000 points using the (possible irregularly spaced) x-data
s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)

# Select downsampled data
downsampled_x = x[s_ds]
downsampled_y = y[s_ds]
```
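
As noted in the feature list, `datetime64` is among the supported `x` datatypes, so the x-based call also works with timestamps. An illustrative variation of the example above:

```python
import numpy as np
from tsdownsample import MinMaxLTTBDownsampler

y = np.random.randn(10_000_000)
# One sample per second, starting from an arbitrary timestamp
x = np.datetime64("2020-01-01") + np.arange(len(y)).astype("timedelta64[s]")

s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)
downsampled_x, downsampled_y = x[s_ds], y[s_ds]
```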

## Downsampling algorithms & API

### Downsampling API 📑

Each downsampling algorithm is implemented as a class that implements a `downsample` method.
The signature of the `downsample` method:

```
downsample([x], y, n_out, **kwargs) -> ndarray[uint64]
```

**Arguments**:

- `x` is optional
- `x` and `y` are both positional arguments
- `n_out` is a mandatory keyword argument that defines the number of output values*
- `**kwargs` are optional keyword arguments *(see [table below](#downsampling-algorithms-📈))*:
  - `parallel`: whether to use multi-threading (default: `False`)
    ❗ The max number of threads can be configured with the `TSDOWNSAMPLE_MAX_THREADS` ENV var (e.g. `os.environ["TSDOWNSAMPLE_MAX_THREADS"] = "4"`) - see the example below
  - ...

**Returns**: a `ndarray[uint64]` of indices that can be used to index the original data.

\*When there are gaps in the time series, fewer than `n_out` indices may be returned.
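
A minimal sketch of using these keyword arguments (the `parallel` flag and the `TSDOWNSAMPLE_MAX_THREADS` ENV var are documented above; the data is just illustrative):

```python
import os
import numpy as np
from tsdownsample import MinMaxDownsampler

# Cap the number of threads used when parallel=True
os.environ["TSDOWNSAMPLE_MAX_THREADS"] = "4"

y = np.random.randn(10_000_000)
s_ds = MinMaxDownsampler().downsample(y, n_out=2000, parallel=True)
print(s_ds.dtype, len(s_ds))  # uint64, at most 2000 indices
```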

### Downsampling algorithms 📈

The following downsampling algorithms (classes) are implemented:

| Downsampler | Description | `**kwargs` |
| ---:| --- |--- |
| `MinMaxDownsampler` | selects the **min and max** value in each bin | `parallel` |
| `M4Downsampler` | selects the [**min, max, first and last**](https://dl.acm.org/doi/pdf/10.14778/2732951.2732953) value in each bin | `parallel` |
| `LTTBDownsampler` | performs the [**Largest Triangle Three Buckets**](https://skemman.is/bitstream/1946/15343/3/SS_MSthesis.pdf) algorithm | `parallel` |
| `MinMaxLTTBDownsampler` | (*new two-step algorithm 🎉*) first selects `n_out` * `minmax_ratio` **min and max** values, then further reduces these to `n_out` values using the **Largest Triangle Three Buckets** algorithm | `parallel`, `minmax_ratio`* |

*Default value for `minmax_ratio` is 4, which is empirically proven to be a good default. More details here: https://arxiv.org/abs/2305.00332
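
To illustrate the two-step idea, a hedged sketch comparing plain LTTB with MinMaxLTTB (class names and kwargs as listed in the table above; the data is illustrative):

```python
import numpy as np
from tsdownsample import LTTBDownsampler, MinMaxLTTBDownsampler

y = np.random.randn(5_000_000)

# Plain LTTB on the full series
s_lttb = LTTBDownsampler().downsample(y, n_out=1000)

# MinMaxLTTB: first preselect n_out * minmax_ratio min/max values,
# then reduce those to n_out values with LTTB
s_mmlttb = MinMaxLTTBDownsampler().downsample(y, n_out=1000, minmax_ratio=4, parallel=True)
```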

### Handling NaNs

This library supports two `NaN`-policies:

1. Omit `NaN`s (`NaN`s are ignored during downsampling).
2. Return the index of the first `NaN` in each bin that contains at least one `NaN`.

| Omit `NaN`s | Return `NaN`s |
| ----------------------: | :------------------------- |
| `MinMaxDownsampler` | `NaNMinMaxDownsampler` |
| `M4Downsampler` | `NaNM4Downsampler` |
| `MinMaxLTTBDownsampler` | `NaNMinMaxLTTBDownsampler` |
| `LTTBDownsampler` | |

> Note that NaNs are not supported for `x`-data.
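
A minimal sketch contrasting the two policies, using one of the class pairs from the table above (the output comments reflect the documented behaviour):

```python
import numpy as np
from tsdownsample import MinMaxDownsampler, NaNMinMaxDownsampler

y = np.random.randn(1_000_000)
y[::1_000] = np.nan  # sprinkle NaNs into the y-data

s_omit = MinMaxDownsampler().downsample(y, n_out=1000)    # NaNs are ignored
s_nan = NaNMinMaxDownsampler().downsample(y, n_out=1000)  # bins with a NaN yield the first NaN's index

print(np.isnan(y[s_omit]).any())  # expected: False
print(np.isnan(y[s_nan]).any())   # expected: True
```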

## Limitations & assumptions 🚨

Assumes:

1. the `x`-data is (non-strictly) monotonically increasing (i.e., sorted)
2. there are no `NaN`s in the `x`-data
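
For numeric `x`-data, a small (hypothetical) helper to verify these assumptions before downsampling could look like this:

```python
import numpy as np

def validate_x(x: np.ndarray) -> None:
    """Check the assumptions above for a numeric x array (hypothetical helper)."""
    if np.isnan(x).any():
        raise ValueError("x-data must not contain NaNs")
    if not np.all(np.diff(x) >= 0):
        raise ValueError("x-data must be (non-strictly) monotonically increasing")

x = np.arange(1_000_000, dtype=np.float64)
validate_x(x)  # passes silently for sorted, NaN-free data
```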

---


👤 Jeroen Van Der Donckt