{"id":13571134,"url":"https://github.com/predict-idlab/tsdownsample","last_synced_at":"2025-04-04T07:33:02.227Z","repository":{"id":63622535,"uuid":"569283172","full_name":"predict-idlab/tsdownsample","owner":"predict-idlab","description":"High-performance time series downsampling algorithms for visualization","archived":false,"fork":false,"pushed_at":"2025-03-05T13:29:53.000Z","size":656,"stargazers_count":176,"open_issues_count":16,"forks_count":17,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-03-21T21:44:57.951Z","etag":null,"topics":["aggregation","downsampling","fast","fpcs","lttb","m4","minmax","performance","python","simd","time-series","visualization"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/predict-idlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["jvdd","jonasvdd"]}},"created_at":"2022-11-22T13:38:41.000Z","updated_at":"2025-03-12T20:32:42.000Z","dependencies_parsed_at":"2024-01-14T23:44:27.786Z","dependency_job_id":"fcb918fa-eddf-4bb8-aa7b-3f3fd63cf45d","html_url":"https://github.com/predict-idlab/tsdownsample","commit_stats":{"total_commits":22,"total_committers":5,"mean_commits":4.4,"dds":0.5454545454545454,"last_synced_commit":"a97dc60ab20b64851e4f417f0bea91f69d05af76"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/predict-idlab%2Ftsdownsample","tags_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/predict-idlab%2Ftsdownsample/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/predict-idlab%2Ftsdownsample/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/predict-idlab%2Ftsdownsample/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/predict-idlab","download_url":"https://codeload.github.com/predict-idlab/tsdownsample/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247139587,"owners_count":20890259,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aggregation","downsampling","fast","fpcs","lttb","m4","minmax","performance","python","simd","time-series","visualization"],"created_at":"2024-08-01T14:00:59.023Z","updated_at":"2025-04-04T07:32:57.218Z","avatar_url":"https://github.com/predict-idlab.png","language":"Jupyter Notebook","funding_links":["https://github.com/sponsors/jvdd","https://github.com/sponsors/jonasvdd"],"categories":["📦 Packages"],"sub_categories":["Python"],"readme":"# tsdownsample\n\n[![PyPI Latest 
Release](https://img.shields.io/pypi/v/tsdownsample.svg)](https://pypi.org/project/tsdownsample/)\n[![support-version](https://img.shields.io/pypi/pyversions/tsdownsample)](https://img.shields.io/pypi/pyversions/tsdownsample)\n[![Downloads](https://static.pepy.tech/badge/tsdownsample)](https://pepy.tech/project/tsdownsample)\n[![CodeQL](https://github.com/predict-idlab/tsdownsample/actions/workflows/codeql.yml/badge.svg)](https://github.com/predict-idlab/tsdownsample/actions/workflows/codeql.yml)\n[![Testing](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-downsample_rs.yml/badge.svg)](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-downsample_rs.yml)\n[![Testing](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-tsdownsample.yml/badge.svg)](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-tsdownsample.yml)\n[![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?logo=discord\u0026logoColor=white)](https://discord.gg/k2d59GrxPX)\n\n\u003c!-- TODO: codecov --\u003e\n\nExtremely fast **time series downsampling 📈** for visualization, written in Rust.\n\n## Features ✨\n\n- **Fast**: written in rust with PyO3 bindings\n  - leverages optimized [argminmax](https://github.com/jvdd/argminmax) - which is SIMD accelerated with runtime feature detection\n  - scales linearly with the number of data points\n  \u003c!-- TODO check if it scales sublinearly --\u003e\n  - multithreaded with Rayon (in Rust)\n    \u003cdetails\u003e\n      \u003csummary\u003e\u003ci\u003eWhy we do not use Python multiprocessing\u003c/i\u003e\u003c/summary\u003e\n      Citing the \u003ca href=\"https://pyo3.rs/v0.17.3/parallelism.html\"\u003ePyO3 docs on parallelism\u003c/a\u003e:\u003cbr\u003e\n      \u003cblockquote\u003e\n          CPython has the infamous Global Interpreter Lock, which prevents several threads from executing Python bytecode in parallel. 
This makes threading in Python a bad fit for CPU-bound tasks and often forces developers to accept the overhead of multiprocessing.\n      \u003c/blockquote\u003e\n      In Rust - which is a compiled language - there is no GIL, so CPU-bound tasks can be parallelized (with \u003ca href=\"https://github.com/rayon-rs/rayon\"\u003eRayon\u003c/a\u003e) with little to no overhead.\n    \u003c/details\u003e\n- **Efficient**: memory efficient\n  - works on views of the data (no copies)\n  - no intermediate data structures are created\n- **Flexible**: works on any type of data\n  - supported datatypes are\n    - for `x`: `f32`, `f64`, `i16`, `i32`, `i64`, `u16`, `u32`, `u64`, `datetime64`, `timedelta64`\n    - for `y`: `f16`, `f32`, `f64`, `i8`, `i16`, `i32`, `i64`, `u8`, `u16`, `u32`, `u64`, `datetime64`, `timedelta64`, `bool`\n    \u003cdetails\u003e\n      \u003csummary\u003e\u003ci\u003e!! 🚀 \u003ccode\u003ef16\u003c/code\u003e \u003ca href=\"https://github.com/jvdd/argminmax\"\u003eargminmax\u003c/a\u003e is 200-300x faster than numpy\u003c/i\u003e\u003c/summary\u003e\n      In contrast with all other data types above, \u003ccode\u003ef16\u003c/code\u003e is *not* hardware supported (i.e., no instructions for f16) by most modern CPUs!! \u003cbr\u003e\n      🐌 Programming languages facilitate support for this datatype by either (i) upcasting to \u003cu\u003ef32\u003c/u\u003e or (ii) using a software implementation. \u003cbr\u003e\n      💡 As for argminmax, only comparisons are needed - and thus no arithmetic operations - creating a \u003cu\u003esymmetrical ordinal mapping from \u003ccode\u003ef16\u003c/code\u003e to \u003ccode\u003ei16\u003c/code\u003e\u003c/u\u003e is sufficient. 
This mapping allows to use the hardware supported scalar and SIMD \u003ccode\u003ei16\u003c/code\u003e instructions - while not producing any memory overhead 🎉 \u003cbr\u003e\n      \u003ci\u003eMore details are described in \u003ca href=\"https://github.com/jvdd/argminmax/pull/1\"\u003eargminmax PR #1\u003c/a\u003e.\u003c/i\u003e\n    \u003c/details\u003e\n- **Easy to use**: simple \u0026 flexible API\n\n## Install\n\n```bash\npip install tsdownsample\n```\n\n## Usage\n\n```python\nfrom tsdownsample import MinMaxLTTBDownsampler\nimport numpy as np\n\n# Create a time series\ny = np.random.randn(10_000_000)\nx = np.arange(len(y))\n\n# Downsample to 1000 points (assuming constant sampling rate)\ns_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000)\n\n# Select downsampled data\ndownsampled_y = y[s_ds]\n\n# Downsample to 1000 points using the (possible irregularly spaced) x-data\ns_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)\n\n# Select downsampled data\ndownsampled_x = x[s_ds]\ndownsampled_y = y[s_ds]\n```\n\n## Downsampling algorithms \u0026 API\n\n### Downsampling API 📑\n\nEach downsampling algorithm is implemented as a class that implements a `downsample` method.\nThe signature of the `downsample` method:\n\n```\ndownsample([x], y, n_out, **kwargs) -\u003e ndarray[uint64]\n```\n\n**Arguments**:\n\n- `x` is optional\n- `x` and `y` are both positional arguments\n- `n_out` is a mandatory keyword argument that defines the number of output values\u003csup\u003e*\u003c/sup\u003e\n- `**kwargs` are optional keyword arguments *(see [table below](#downsampling-algorithms-📈))*:\n  - `parallel`: whether to use multi-threading (default: `False`)  \n     ❗ The max number of threads can be configured with the `TSDOWNSAMPLE_MAX_THREADS` ENV var (e.g. 
`os.environ[\"TSDOWNSAMPLE_MAX_THREADS\"] = \"4\"`)\n  - ...\n\n**Returns**: a `ndarray[uint64]` of indices that can be used to index the original data.\n\n\u003csup\u003e\\*\u003c/sup\u003e\u003ci\u003eWhen there are gaps in the time series, fewer than `n_out` indices may be returned.\u003c/i\u003e\n\n### Downsampling algorithms 📈\n\nThe following downsampling algorithms (classes) are implemented:\n\n| Downsampler | Description | `**kwargs` |\n| ---:| --- |--- |\n| `MinMaxDownsampler` | selects the **min and max** value in each bin | `parallel` |\n| `M4Downsampler` | selects the [**min, max, first and last**](https://dl.acm.org/doi/pdf/10.14778/2732951.2732953) value in each bin | `parallel` |\n| `LTTBDownsampler` | performs the [**Largest Triangle Three Buckets**](https://skemman.is/bitstream/1946/15343/3/SS_MSthesis.pdf) algorithm | `parallel` |\n| `MinMaxLTTBDownsampler` | (*new two-step algorithm 🎉*) first selects `n_out` * `minmax_ratio` **min and max** values, then further reduces these to `n_out` values using the **Largest Triangle Three Buckets** algorithm | `parallel`, `minmax_ratio`\u003csup\u003e*\u003c/sup\u003e |\n\n\u003csup\u003e*\u003c/sup\u003e\u003ci\u003eDefault value for `minmax_ratio` is 4, which is empirically proven to be a good default. More details here: https://arxiv.org/abs/2305.00332\u003c/i\u003e\n\n### Handling NaNs\n\nThis library supports two `NaN`-policies:\n\n1. Omit `NaN`s (`NaN`s are ignored during downsampling).\n2. 
Return the index of the first `NaN` as soon as at least one `NaN` is present in the bin under consideration.\n\n|             Omit `NaN`s | Return `NaN`s              |\n| ----------------------: | :------------------------- |\n|     `MinMaxDownsampler` | `NaNMinMaxDownsampler`     |\n|         `M4Downsampler` | `NaNM4Downsampler`         |\n| `MinMaxLTTBDownsampler` | `NaNMinMaxLTTBDownsampler` |\n|       `LTTBDownsampler` |                            |\n\n\u003e Note that NaNs are not supported for `x`-data.\n\n## Limitations \u0026 assumptions 🚨\n\nAssumes:\n\n1. `x`-data is (non-strictly) monotonically increasing (i.e., sorted)\n2. no `NaN`s in `x`-data\n\n---\n\n\u003cp align=\"center\"\u003e\n👤 \u003ci\u003eJeroen Van Der Donckt\u003c/i\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpredict-idlab%2Ftsdownsample","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpredict-idlab%2Ftsdownsample","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpredict-idlab%2Ftsdownsample/lists"}