{"id":33937281,"url":"https://github.com/liborty/medians","last_synced_at":"2026-04-02T18:46:11.638Z","repository":{"id":39893691,"uuid":"452156957","full_name":"liborty/medians","owner":"liborty","description":"Fast new algorithms for finding the medians, implemented in Rust","archived":false,"fork":false,"pushed_at":"2024-06-14T01:19:08.000Z","size":245,"stargazers_count":14,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-14T03:03:48.528Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/liborty.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-26T05:53:01.000Z","updated_at":"2025-09-15T10:57:11.000Z","dependencies_parsed_at":"2023-12-20T15:15:04.504Z","dependency_job_id":"0768c450-1176-4fe0-ba7d-159e6dfe4f62","html_url":"https://github.com/liborty/medians","commit_stats":{"total_commits":92,"total_committers":2,"mean_commits":46.0,"dds":"0.010869565217391353","last_synced_commit":"9800cc7be1c95649569854eb40005e9af59f758e"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/liborty/medians","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liborty%2Fmedians","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liborty%2Fmedians/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liborty%2Fmedians/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/liborty%2Fmedians/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/liborty","download_url":"https://codeload.github.com/liborty/medians/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liborty%2Fmedians/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31313324,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T12:59:32.332Z","status":"ssl_error","status_checked_at":"2026-04-02T12:54:48.875Z","response_time":89,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-12T14:56:22.444Z","updated_at":"2026-04-02T18:46:11.628Z","avatar_url":"https://github.com/liborty.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Medians\n\n[![crates.io](https://img.shields.io/crates/v/medians?logo=rust)](https://crates.io/crates/medians) [![crates.io](https://img.shields.io/crates/d/medians?logo=rust)](https://crates.io/crates/medians) [![\"GitHub last commit\"](https://img.shields.io/github/last-commit/liborty/medians/HEAD?logo=github)](https://github.com/liborty/medians) [![Actions Status](https://github.com/liborty/medians/workflows/test/badge.svg)](https://github.com/liborty/random/actions)\n\n## **by Libor Spacek**\n\nFast new algorithms 
for finding medians, implemented in 100% safe Rust.\n\n```rust\nuse medians::{*,algos::*};\n```\n\n## Introduction\n\nFinding medians is a common task in statistics and data analysis. At least it ought to be, because median is a more stable measure of central tendency than mean. Similarly, `mad` (median of absolute differences) is a more stable measure of data spread than standard deviation, which is dominated by squared outliers. Median and `mad` are not used nearly enough mostly for practical historical reasons: they are more difficult to compute. The fast algorithms presented here provide a remedy for this situation.\n\nWe argued in [`rstats`](https://github.com/liborty/rstats), that using the Geometric Median is the most stable way to characterise multidimensional data. The one dimensional case is addressed in this crate.\n\nSee [`tests.rs`](https://github.com/liborty/medians/blob/main/tests/tests.rs) for examples of usage. Their automatically generated output can also be found by clicking the 'test' icon at the top of this document and then examining the latest log.\n\n## Outline Usage\n\nBest methods/functions to be deployed, depending on the end type of data (i.e. type of the items within the input vector/slice).\n\n- `u8` -\u003e function `medianu8`\n- `u64` -\u003e function `medianu64`\n- `f64` -\u003e methods of trait Medianf64\n- `T` custom quantifiable to u64 -\u003e method `uqmedian` of trait `Median`\n- `T` custom comparable by `c` -\u003e method `qmedian_by` of trait `Median`\n- `T` custom comparable but not quantifiable -\u003e general method `median_by` of trait `Median`.\n\n## Algorithms Analysis\n\nShort primitive types are best dealt with by radix search. 
We have implemented it for `u8` and for `u64`:\n\n```rust\n/// Medians of u8 end type by fast radix search\npub fn medianu8(s: \u0026[u8]) -\u003e Result\u003cConstMedians\u003cu8\u003e, Me\u003e;\n/// Medians of u64 end type by fast recursive radix search\npub fn medu64(s: \u0026mut [u64]) -\u003e Result\u003c(u64, u64), Me\u003e;\n```\n\nMore complex data types require general comparison search, see `median_by`. Median can be found naively by sorting the list of data and then picking its midpoint. The best comparison sort algorithms have complexity `O(n*log(n))`. However, faster median algorithms with complexity `O(n)` are possible. They are based on the observation that data need not be fully sorted, only partitioned and counted off. Therefore, the naive sort method cannot compete and has been deleted as of version 2.0.0.\n\nFloyd-Rivest (1975): Median of Medians is currently considered to be 'the state of the art' comparison algorithm. It divides the data into groups of five items, finds the median of each group by sort, then finds medians of five of these medians, and so on, until only one remains. This is then used as the pivot for partitioning of the original data. Such a pivot will produce good partitioning, though not perfect halving. Counting off and iterating is therefore still necessary.\n\nFinding the best possible pivot estimate is not the main objective. The real objective is to eliminate (count off) eccentric data items as fast as possible, overall. Therefore, the time spent estimating the pivot has to be taken into account. It is possible to settle for less optimal pivots, yet to find the medians faster on average. In any case, efficient partitioning is a must.\n\nLet our average ratio of items remaining after one partitioning be `rs` and that of Floyd-Rivest be `rf`. Typically, `1/2 \u003c= rf \u003c= rs \u003c 1`, i.e. `rf` is more optimal, being nearer to the perfect halving (ratio of `1/2`). 
Suppose that we can perform two partitions in the time it takes Floyd-Rivest to do one (because of their slow pivot selection). Then it is enough for better performance that `rs^2 \u003c rf`, which is perfectly possible and seems to be borne out in practice. For example, `rf=0.65` (nearly optimal), `rs=0.8` (deeply suboptimal), yet `rs^2 \u003c rf`. Nonetheless, some computational effort devoted to the pivot selection, proportional to the data length, is worth it.\n\nWe introduce another new algorithm, implemented as function `medianu64`:\n\n```rust\n/// Fast medians of u64 end type by binary partitioning\npub fn medianu64(s: \u0026mut [u64]) -\u003e Result\u003cConstMedians\u003cu64\u003e, Me\u003e\n```\n\nOn `u64` data, this runs about twice as fast as the general purpose pivoting of `median_by`. The data is partitioned by individual bit values, totally sidestepping the expense of the pivot estimation. The algorithm generally converges well. However, when the data happens to be all bunched up within a small range of values, it will slow down.\n\n### Summary of the main features of our general median algorithm\n\n- Linear complexity.\n- Fast (in-place) iterative partitioning into three subranges (lesser, equal, greater), minimising data movements and memory management.\n- Simple pivot selection strategy: median of three samples (requires only three comparisons). Really poor pivots occur only rarely during the iterative process. 
For longer data, we deploy median of three medians.\n\n## Trait Medianf64\n\n```rust\n/// Fast 1D medians of floating point data, plus related methods\npub trait Medianf64 {\n    /// Median of f64s, NaNs removed\n    fn medf_checked(self) -\u003e Result\u003cf64, Me\u003e;\n    /// Median of f64s, including NaNs\n    fn medf_unchecked(self) -\u003e f64;\n    /// Iterative weighted median\n    fn medf_weighted(self, ws: Self, eps: f64) -\u003e Result\u003cf64, Me\u003e;\n    /// Zero mean/median data produced by subtracting the centre\n    fn medf_zeroed(self, centre: f64) -\u003e Vec\u003cf64\u003e;\n    /// Median correlation = cosine of an angle between two zero median vecs\n    fn medf_correlation(self, v: Self) -\u003e Result\u003cf64, Me\u003e;\n    /// Median of absolute differences (MAD).\n    fn madf(self, centre: f64) -\u003e f64;\n}\n```\n\n## Trait Median\n\nThese methods are provided especially for generic, arbitrarily complex and/or large data end-types. The data is never copied during partitioning, etc.\n\nMost of its methods take a comparison closure `c` which returns an ordering between its arguments of generic type `\u0026T`. This allows comparisons in any number of different ways between any custom types.\n\nMost of its methods take a quantify closure `q`, which converts its generic argument to f64. This facilitates not just standard Rust `as` and `.into()` conversions but also any number of flexible ways of quantifying more complex custom data types.\n\nWeaker partial ordinal comparison is used instead of numerical comparison. The search algorithm remains the same. The only additional cost is the extra layer of referencing to prevent the copying of data.\n\n**`median_by()`**\n\nFor all end-types quantifiable to f64, we simply average the two midpoints of even-length data to obtain a single median (of type `f64`). When the data items are unquantifiable to `f64`, this is no longer possible. Then `median_by` should be used. 
It returns both middle values within `Medians` enum type, the lesser one first:\n\n```rust\n/// Enum for results of odd/even medians\npub enum Medians\u003c'a, T\u003e {\n    /// Odd sized data results in a single median\n    Odd(\u0026'a T),\n    /// Even sized data results in a pair of (centered) medians\n    Even((\u0026'a T, \u0026'a T)),\n}\n```\n\n```rust\n/// Fast 1D generic medians, plus related methods\npub trait Median\u003c'a, T\u003e {\n    /// Median by comparison `c`, at the end quantified to a single f64 by `q`\n    fn qmedian_by(\n        self,\n        c: \u0026mut impl FnMut(\u0026T, \u0026T) -\u003e Ordering,\n        q: impl Fn(\u0026T) -\u003e f64,\n    ) -\u003e Result\u003cf64, Me\u003e;\n    /// Median of types quantifiable to u64 by `q`, at the end converted to a single f64.  \n    /// For data that is already `u64`, use function `medianu64`\n    fn uqmedian(\n            self,\n            q: impl Fn(\u0026T) -\u003e u64,\n        ) -\u003e Result\u003cf64, Me\u003e;\n    /// Median by comparison `c`, returns odd/even result\n    fn median_by(self, c: \u0026mut impl FnMut(\u0026T, \u0026T) -\u003e Ordering) -\u003e Result\u003cMedians\u003c'a, T\u003e, Me\u003e;\n    /// Zero mean/median data, produced by subtracting the centre\n    fn zeroed(self, centre: f64, quantify: impl Fn(\u0026T) -\u003e f64) -\u003e Result\u003cVec\u003cf64\u003e, Me\u003e;\n    /// Median correlation = cosine of an angle between two zero median Vecs\n    fn med_correlation(\n        self,\n        v: Self,\n        c: \u0026mut impl FnMut(\u0026T, \u0026T) -\u003e Ordering,\n        q: impl Fn(\u0026T) -\u003e f64,\n    ) -\u003e Result\u003cf64, Me\u003e;\n    /// Median of absolute differences (MAD).\n    fn mad(self, centre: f64, quantify: impl Fn(\u0026T) -\u003e f64) -\u003e f64;\n}\n```\n\n## Release Notes\n\n**Version 3.0.12** - Adding faster `medu64`, even variant is still work in progress. 
Fixed a bug.\n\n**Version 3.0.11** - Added method `uqmedian` to trait `Median` for types quantifiable to `u64` by some closure `q`. Fixed a recent bug in `oddmedian_by`, whereby the pivot reference was not timely saved.\n\n**Version 3.0.10** - Added `medianu64`. It is faster on u64 data than the general purpose `median_by`. It is using a new algorithm that partitions by bits, thus avoiding the complexities of pivot estimation.\n\n**Version 3.0.9** - Improved pivot estimation for large data sets.\n\n**Version 3.0.8** - Added `implementation.rs` module and reorganized the source.\n\n**Version 3.0.7** - Added `medf_weighted`, applying `\u0026[f64]` weights.\n\n**Version 3.0.6** - Moved `part`, `ref_vec` and `deref_vec` into crate `Indxvec`, to allow their wider use.\n\n**Version 3.0.5** - Obsolete code pruning.\n\n**Version 3.0.4** - Some minor code simplifications.\n\n**Version 3.0.3** - Updated dev dependency `ran` to 2.0.\n\n**Version 3.0.2** - Added function `medianu8` that finds median byte by superfast radix search. More primitive types to follow.\n\n**Version 3.0.1** - Renamed `correlation` to `med_correlation` to avoid name clashes elsewhere.\n\n**Version 3.0.0** - Numerous improvements to speed and generality and renaming.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliborty%2Fmedians","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fliborty%2Fmedians","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliborty%2Fmedians/lists"}