{"id":18366588,"url":"https://github.com/ajcr/rolling","last_synced_at":"2025-04-05T21:06:07.469Z","repository":{"id":29529632,"uuid":"115745066","full_name":"ajcr/rolling","owner":"ajcr","description":"Computationally efficient rolling window iterators for Python (sum, variance, min/max, etc.)","archived":false,"fork":false,"pushed_at":"2024-03-09T13:41:55.000Z","size":225,"stargazers_count":202,"open_issues_count":19,"forks_count":8,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-29T20:04:57.832Z","etag":null,"topics":["algorithm","efficient-algorithm","iterator","python","rolling-algorithms","rolling-hash-functions","rolling-windows","sliding-windows"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ajcr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-12-29T18:28:40.000Z","updated_at":"2025-02-07T14:46:37.000Z","dependencies_parsed_at":"2024-12-22T15:11:22.329Z","dependency_job_id":"7312f95e-9aca-4756-8317-e89a2ba8dae2","html_url":"https://github.com/ajcr/rolling","commit_stats":{"total_commits":220,"total_committers":4,"mean_commits":55.0,"dds":"0.018181818181818188","last_synced_commit":"ca1003d973011fd40ca17659d009eb907eaf0142"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajcr%2Frolling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajcr%2Frolling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/ajcr%2Frolling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajcr%2Frolling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ajcr","download_url":"https://codeload.github.com/ajcr/rolling/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247399871,"owners_count":20932876,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm","efficient-algorithm","iterator","python","rolling-algorithms","rolling-hash-functions","rolling-windows","sliding-windows"],"created_at":"2024-11-05T23:18:21.446Z","updated_at":"2025-04-05T21:06:07.450Z","avatar_url":"https://github.com/ajcr.png","language":"Python","readme":"# rolling\n\n![PyPI version](https://img.shields.io/pypi/v/rolling.svg?color=brightgreen)\n\nA collection of computationally efficient rolling window iterators for Python.\n\nUseful arithmetical, logical and statistical operations on rolling windows (including `Sum`, `Min`, `Max`, `Mean`, `Median` and more). Both fixed-length and variable-length windows are supported for most operations. 
Many operations also support \"indexed\" windows.\n\nTo get started, see the [Overview](https://github.com/ajcr/rolling#overview) section below, or have a look at some [recipes](https://github.com/ajcr/rolling/blob/master/doc/recipes.md).\n\n## Installation\n\n```\npip install rolling\n```\n\nYou can also install from source if you want the very latest development changes:\n```\ngit clone https://github.com/ajcr/rolling.git\ncd rolling/\npip install .\n```\n\nThere are no external library dependencies for running this module. If you want to run the unit tests, you'll need to install [pytest](https://docs.pytest.org/en/latest/). Once done, just run `pytest` from the base directory.\n\n## Overview\n\nConsider a sequence of integers:\n```python\nseq = [3, 1, 4, 1, 5, 9, 2]\n```\nSuppose we want to find the **maximum** value in each window of five consecutive integers:\n\n![alt tag](https://github.com/ajcr/rolling/blob/master/assets/readme_example_1.png)\n\nOne way to do this would be to use Python's `max` function and apply it to each consecutive slice of five elements:\n\n```python\n\u003e\u003e\u003e [max(seq[i:i+5]) for i in range(len(seq) - (5-1))]\n[5, 9, 9]\n```\n\nHowever, as well as being quite verbose, applying builtin functions (like `max` and `sum`) to a window becomes increasingly slow as the window size gets bigger. This is because all values in the window are visited each time the function is invoked, and so the complexity is typically _linear_ (i.e. **O(k)** where **k** is the size of the window).\n\nIt's clear by looking at the picture above that most of the values remain in the window when it is rolled forward. By keeping track of information about the window and the values that are removed and added, an operation such as finding the maximum value can be completed much more efficiently, often in _constant_ time (i.e. 
**O(1)**, not dependent on the size of the window).\n\nThis library implements efficient ways to perform useful operations on rolling windows:\n\n```python\n\u003e\u003e\u003e import rolling              # import library\n\u003e\u003e\u003e roll = rolling.Max(seq, 5)  # iterator returning maximum of each window of size 5\n\u003e\u003e\u003e list(roll)\n[5, 9, 9] \n```\n\nNote that these time complexity values apply to \"fixed\" and \"variable\" window types (not the \"indexed\" window type which depends on the index values encountered).\n\n## Operations\n\nThe algorithms implemented so far in this module are summarised below. \n\nThe cost of updating the window (rolling it forward) and the memory footprint of the `rolling` object are given, where `k` denotes the size of the window.\n\nThe 'Builtin' column shows the comparable method that is found in the Python standard library. This method could be applied to the window (at higher computational cost) to get the same result. Note that it may not be equivalent in all cases, for example due to differences in floating point arithmetic.\n\n\n### Arithmetical\n\nRolling objects to apply common aggregation or measurement operations to the window.\n\n| Object           | Update   | Memory | Description                            | Builtin |\n| ---------------- |:--------:|:------:|----------------------------------------|----------------|\n| `Sum`            | O(1)     | O(k)   | Sum of the window values               | [`sum`](https://docs.python.org/3/library/functions.html#sum)  |\n| `Product`        | O(1)     | O(k)   | Product of the window values           | [`math.prod`](https://docs.python.org/3.9/library/math.html#math.prod) |\n| `Nunique`        | O(1)     | O(k)   | Number of unique window values         | N/A |\n| `Min`            | O(1)     | O(k)   | Minimum value of window                | [`min`](https://docs.python.org/3/library/functions.html#min) |\n| `MinHeap`        | O(log(k))| O(k)   | Minimum value 
(internally uses a heap) | [`min`](https://docs.python.org/3/library/functions.html#min) |\n| `Max`            | O(1)     | O(k)   | Maximum value of window                | [`max`](https://docs.python.org/3/library/functions.html#max) |\n\n### Statistical\n\nRolling objects to apply statistical operations to the window.\n\n| Object           | Update   | Memory | Description                                                     | Builtin |\n| ---------------- |:--------:|:------:|-----------------------------------------------------------------|----------------------|\n| `Mean`           | O(1)     | O(k)   | Arithmetic mean of window values                                | [`statistics.mean`](https://docs.python.org/3.9/library/statistics.html#statistics.mean) |\n| `Median`         | O(log k) | O(k)   | Median value of window: O(log k) update if 'skiplist' used      | [`statistics.median`](https://docs.python.org/3.9/library/statistics.html#statistics.median) |\n| `Mode`           | O(1)     | O(k)   | Set of most frequently appearing values in window               | [`statistics.multimode`](https://docs.python.org/3.9/library/statistics.html#statistics.multimode) |\n| `Var`            | O(1)     | O(k)   | Variance of window, with specified degrees of freedom           | [`statistics.pvariance`](https://docs.python.org/3.9/library/statistics.html#statistics.pvariance) |\n| `Std`            | O(1)     | O(k)   | Standard deviation of window, with specified degrees of freedom | [`statistics.pstdev`](https://docs.python.org/3.9/library/statistics.html#statistics.pstdev) |\n| `Skew`           | O(1)     | O(k)   | Skewness of the window                                          | N/A |\n| `Kurtosis`       | O(1)     | O(k)   | Kurtosis of the window                                          | N/A |\n\n### Logical\n\nRolling objects to apply a logical operation to the window.\n\n| Object           | Update   | Memory | Description                                          
                                    | Builtin |\n| ---------------- |:--------:|:------:|------------------------------------------------------------------------------------------|---------|\n| `Any`            | O(1)     | O(1)   | True if *any* value in the window is True in a Boolean context, else False               | [`any`](https://docs.python.org/3/library/functions.html#any) |\n| `All`            | O(1)     | O(1)   | True if *all* values in the window are True in a Boolean context, else False             | [`all`](https://docs.python.org/3/library/functions.html#all) |\n| `Monotonic`      | O(1)     | O(1)   | True if *all* values in the window are monotonic increasing or decreasing                | N/A     |\n| `Match`          | O(k)     | O(k)   | True if window is equal to a specified target sequence (O(k) update if match, else O(1)) | N/A     |\n\n### Miscellaneous\n\nRolling objects implementing other operations.\n\n| Object           | Update   | Memory | Description                                                                                             | Builtin |\n| ---------------- |:--------:|:------:|---------------------------------------------------------------------------------------------------------|---------|\n| `Apply`          | ?        | O(k)   | Applies a specified callable object to the window (thus update complexity is dependent on the callable) | N/A |\n| `Entropy`        | O(1)     | O(k)   | Shannon entropy of the window (fixed-size windows only)                                                 | N/A |\n| `JaccardIndex`   | O(1)     | O(k+s) | Jaccard index (similarity coefficient) of window with a target set (s is size of target set)            | N/A |\n| `PolynomialHash` | O(1)     | O(k)   | [Polynomial hash](https://en.wikipedia.org/wiki/Rolling_hash#Polynomial_rolling_hash) of window         | N/A |\n\n\nBy default, fixed length windows are used in all operations. 
Variable-length windows can be specified using the `window_type` argument.\n\nThis allows windows smaller than the specified size to be evaluated at the beginning and end of the iterable. For instance, here's the `Apply` operation being used to apply Python's `tuple` callable to variable-length windows:\n```python\n\u003e\u003e\u003e seq = [3, 1, 4, 1, 5, 9, 2]\n\u003e\u003e\u003e roll_list = rolling.Apply(seq, 3, operation=tuple, window_type='variable')\n\u003e\u003e\u003e list(roll_list)\n[(3,),\n (3, 1),\n (3, 1, 4),\n (1, 4, 1),\n (4, 1, 5),\n (1, 5, 9),\n (5, 9, 2),\n (9, 2),\n (2,)]\n```\n\nIf values are indexed by a monotonically increasing index (e.g. with an integer key, timestamp or datetime) then the indexed window type can be used. The size of the window is the maximum distance between the oldest and newest values (e.g. an integer, or timedelta):\n```python\n\u003e\u003e\u003e idx = [0, 1, 2, 6, 7, 11, 15]\n\u003e\u003e\u003e seq = [3, 1, 4, 1, 5,  9,  2]\n\u003e\u003e\u003e roll_list_idx = rolling.Apply(zip(idx, seq), window_size=3, operation=tuple, window_type='indexed')\n\u003e\u003e\u003e list(roll_list_idx)\n[(3,),\n (3, 1),\n (3, 1, 4),\n (1,),\n (1, 5),\n (9,),\n (2,)]\n```\n\n## References and resources\n\nSome rolling algorithms are widely known (e.g. `Sum`), so I am not sure which source to cite. Some algorithms I made up as I was putting the module together (e.g. `Any`, `All`), but these are relatively simple and probably exist elsewhere.\n\nOther rolling algorithms are very cleverly designed by others and I learned a lot by reading their implementations. Here are the main resources that I used:\n\n- `Max` and `Min` are implemented using the Ascending Minima and Descending Maxima algorithms described by Richard Harter [here](http://www.richardhartersworld.com/cri/2001/slidingmin.html). This algorithm is also used in [pandas](http://pandas.pydata.org/) and [bottleneck](https://github.com/kwgoodman/bottleneck). 
My attention was first drawn to this algorithm by Jaime Fernandez del Rio's excellent talk _[The Secret Life Of Rolling Pandas](https://www.youtube.com/watch?v=XM_r5La-1tA)_. The algorithm is also described by Keegan Carruthers-Smith [here](https://people.cs.uct.ac.za/~ksmith/articles/sliding_window_minimum.html), along with code examples.\n\n- `Median` uses the indexable skiplist approach presented by Raymond Hettinger [here](http://code.activestate.com/recipes/577073/).\n\n- `Var` and `Std` use [Welford's algorithm](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm). I referred to the rolling variance implementation in [pandas](https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/window.pyx#L635-L784) as well as an older edit of the Wikipedia page [Algorithms for calculating variance](https://en.wikipedia.org/w/index.php?title=Algorithms_for_calculating_variance\u0026oldid=617145179).\n\n## Discussion and future work\n\nThe algorithms implemented by this module are chosen to be efficient in the sense that the cost of computing each new window value scales efficiently with the size of window.\n\nIn practice you might find that it is quicker *not* to use the 'efficient' algorithm, and instead apply a function directly to the window. This is especially true for very small window sizes, where the bookkeeping needed to update the window is relatively costly. 
For instance, while the window size `k` is less than approximately 50, it may be quicker to use `rolling.Apply(array, k, min)` (apply Python's builtin minimum function `min`) rather than using `rolling.Min(array, k)`.\n\nWith this in mind, it might be worth implementing some of the algorithms in this module in more specialised/compiled code to improve performance.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fajcr%2Frolling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fajcr%2Frolling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fajcr%2Frolling/lists"}