{"id":13475165,"url":"https://github.com/liborty/rstats","last_synced_at":"2026-04-08T13:31:11.462Z","repository":{"id":40675886,"uuid":"291397207","full_name":"liborty/rstats","owner":"liborty","description":"Statistics, Information Measures, Linear Algebra, Cholesky Matrix Decomposition, Mahalanobis Distance, Householder QR Decomposition, Clifford Algebra, Multidimensional Data Analysis, Geometric Median, Hulls, Machine Learning, multithreading implementation...","archived":false,"fork":false,"pushed_at":"2024-07-20T00:53:47.000Z","size":2888,"stargazers_count":56,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-12-14T03:13:09.180Z","etag":null,"topics":["geometric-median","linear-algebra","machine-learning","math","multidimensional-analysis","rust","statistics"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/liborty.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-30T04:06:51.000Z","updated_at":"2025-08-27T04:50:53.000Z","dependencies_parsed_at":"2024-01-03T14:28:32.086Z","dependency_job_id":"aab14b28-6e7b-498c-9a1e-ce2b5e2e468d","html_url":"https://github.com/liborty/rstats","commit_stats":{"total_commits":521,"total_committers":2,"mean_commits":260.5,"dds":"0.0057581573896353655","last_synced_commit":"58e7f5fbbc77cfdbc6e94cbb5e210fe6e1cfb393"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/liborty/rstats","repository_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/liborty%2Frstats","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liborty%2Frstats/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liborty%2Frstats/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liborty%2Frstats/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/liborty","download_url":"https://codeload.github.com/liborty/rstats/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liborty%2Frstats/sbom","scorecard":{"id":588041,"data":{"date":"2025-08-11","repo":{"name":"github.com/liborty/rstats","commit":"04b7914584f5be9ab0649722f4dccf4c85729696"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.4,"checks":[{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/tests.yml:10: update your workflow using https://app.stepsecurity.io/secureworkflow/liborty/rstats/tests.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/tests.yml:11: update your workflow using https://app.stepsecurity.io/secureworkflow/liborty/rstats/tests.yml/master?enable=pin","Info:   0 out of   1 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   1 third-party GitHubAction dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively 
maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/tests.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and 
uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF 
or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}}]},"last_synced_at":"2025-08-20T21:06:05.270Z","repository_id":40675886,"created_at":"2025-08-20T21:06:05.270Z","updated_at":"2025-08-20T21:06:05.270Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31558380,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T10:21:54.569Z","status":"ssl_error","status_checked_at":"2026-04-08T10:21:38.171Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["geometric-median","linear-algebra","machine-learning","math","multidimensional-analysis","rust","statistics"],"created_at":"2024-07-31T16:01:17.908Z","updated_at":"2026-04-08T13:31:11.438Z","avatar_url":"https://github.com/liborty.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# Rstats [![crates.io](https://img.shields.io/crates/v/Rstats?logo=rust)](https://crates.io/crates/rstats) [![crates.io](https://img.shields.io/crates/d/rstats?logo=rust)](https://crates.io/crates/rstats) [![GitHub last commit](https://img.shields.io/github/last-commit/liborty/Rstats/HEAD?logo=github)](https://github.com/liborty/Rstats) [![Actions Status](https://github.com/liborty/rstats/actions/workflows/tests.yml/badge.svg)](https://github.com/liborty/rstats/actions)\n\n## Author: Libor Spacek\n\n## Usage\n\nThis crate is written in 100% safe Rust.\n\nUse in your source files any of the following structs, as and when needed:\n\n```rust\nuse rstats::{RE,RError,Params,TriangMat,MinMax};\n```\n\nand any of the following traits:\n\n```rust\nuse rstats::{Stats,Vecg,Vecu8,MutVecg,VecVec,VecVecg};\n```\n\nand any of the following auxiliary functions:\n\n```rust\nuse rstats::{\n    fromop,sumn,tm_stat,unit_matrix,nodata_error,data_error,\n    arith_error,other_error };\n```\n\nor just simply use everything:\n\n```rust\nuse rstats::*;\n```\n\nThe latest (nightly) version is always available in the GitHub repository [Rstats](https://github.com/liborty/Rstats). 
Sometimes it may be (only in some details) a little ahead of the `crates.io` release versions.\n\nIt is highly recommended to read and run [tests.rs](https://github.com/liborty/Rstats/blob/master/tests/tests.rs) for examples of usage. To run all the tests, use a single thread in order not to print the results in confusing mixed-up order:\n\n```bash  \ncargo test --release -- --test-threads=1 --nocapture\n```\n\nHowever, `geometric_medians`, which compares multithreading performance, should be run separately in multiple threads, as follows:\n\n```bash\ncargo test -r geometric_medians -- --nocapture\n```\n\nAlternatively, just to get a quick idea of the methods provided and their usage, read the output produced by an [automated test run](https://github.com/liborty/rstats/actions). There are test logs generated for each new push to the github repository. Click the latest (top) one, then `Rstats` and then `Run cargo test` ... The badge at the top of this document lights up green when all the tests have passed and clicking it gets you to these logs as well.\n\nAny compilation errors arising out of `rstats` crate indicate most likely that some of the dependencies are out of date. Issuing `cargo update` command will usually fix this.\n\n## Introduction\n\n`Rstats` has a small footprint. Only the best methods are implemented, primarily with Data Analysis and Machine Learning in mind. They include multidimensional (`nd` or 'hyperspace') analysis, i.e. characterising clouds of n points in space of d dimensions.\n\nSeveral branches of mathematics: statistics, information theory, set theory and linear algebra are combined in this one consistent crate, based on the abstraction that they all operate on the same data objects (here Rust Vecs). 
The only difference is that an ordering of their components is sometimes assumed (in linear algebra, set theory) and sometimes it is not (in statistics, information theory, set theory).\n\n`Rstats` begins with basic statistical measures, information measures, vector algebra and linear algebra. These provide self-contained tools for the multidimensional algorithms but they are also useful in their own right.\n\n`Non analytical (non parametric) statistics` is preferred, whereby the 'random variables' are replaced by vectors of real data. Probability densities and other parameters are preferably obtained from the real data (pivotal quantity), not from some assumed distributions.\n\n`Linear algebra` uses the generic data structure `Vec\u003cVec\u003cT\u003e\u003e`, which is capable of representing irregular matrices.\n\n`Struct TriangMat` is defined and used for symmetric, anti-symmetric, and triangular matrices, and their transposed versions, saving memory.\n\nOur treatment of multidimensional sets of points is constructed from first principles. Some original concepts, not found elsewhere, are defined and implemented here (see the next section).\n\n*Zero median vectors are generally preferred to commonly used zero mean vectors.*\n\nIn n dimensions, many authors 'cheat' by using `quasi medians` (one dimensional (`1d`) medians along each axis). Quasi medians are a poor start to stable characterisation of multidimensional data. Also, they are actually slower to compute than our **gm** (`geometric median`), as soon as the number of dimensions exceeds trivial numbers.\n\n*Specifically, all such 1d measures are sensitive to the choice of axis and thus are affected by their rotation.*\n\nIn contrast, our methods based on **gm** are axis (rotation) independent. 
Also, they are more stable, as medians have a maximum possible breakdown point.\n\nWe compute geometric medians by our method `gmedian` and its parallel version `par_gmedian` in trait `VecVec` and their weighted versions `wgmedian` and `par_wgmedian` in trait `VecVecg`. It is mostly these efficient algorithms that make our new concepts described below practical.\n\n### Additional Documentation\n\nFor more detailed comments, plus some examples, see [rstats in docs.rs](https://docs.rs/rstats/latest/rstats). You may have to go directly to the modules source. These traits are implemented for existing 'out of this crate' rust `Vec` type and unfortunately rust docs do not display 'implementations on foreign types' very well.\n\n## New Concepts and their Definitions\n\n* `zero median points` (or vectors) are obtained by moving the origin of the coordinate system to the median (in `1d`), or to the **gm** (in `nd`). They are our alternative to the commonly used `zero mean points`, obtained by moving the origin to the arithmetic mean (in 1d) or to the arithmetic centroid (in `nd`).\n\n* `median correlation` between two 1d sets of the same length.  \nWe define this correlation similarly to Pearson, as cosine of an angle between two normalised samples, interpreted as coordinate vectors. Pearson first normalises each set by subtracting its mean from all components. Whereas we subtract the median (cf. zero median points above). This conceptual clarity is one of the benefits of interpreting a data sample of length `d` as a single vector in `d` dimensional space.\n\n* `gmedian, par_gmedian, wgmedian and par_wgmedian`  \nour fast multidimensional `geometric median` (**gm**) algorithms.\n\n* `madgm` (median of distances from `gm`)  \nis our generalisation of `mad` (**m**edian of **a**bsolute **d**ifferences from median), to n dimensions. `1d` median is replaced in `nd` by **gm**. 
Where `mad` was a robust measure of 1d data spread, `madgm` becomes a robust measure of `nd` data spread. We define it as: `median(`**|pi-gm|**,`i=1..n)`, where **p1`..`pn** are a sample of n data points, each of which is now a vector.\n\n* `tm_stat`  \n We define our generalized `tm_stat` of a single scalar observation x as: `(x-centre)/spread`, with the recommendation to replace mean by median and `std` by `mad`, whenever possible. Compare with common `t-stat`, defined as `(x-mean)/std`, where `std` is the standard deviation.  \n These are similar to the well known `standard z-score`, except that the central tendency and spread are obtained from the sample (pivotal quantity), rather than from any old assumed population distribution.\n\n* `tm_statistic`  \nwe now generalize `tm_stat` from scalar domain to vector domain of any number of dimensions, defining `tm_statistic` as |**p-gm**|`/madgm`, where **p** is a single observation point in `nd` space. For sample central tendency now serves the `geometric median` **gm** vector and the spread is the `madgm` scalar (see above). The error distance of observation **p** from the median: **|p-gm|**, is also a scalar. Thus the co-domain of `tm_statistic` is a simple positive scalar, regardless of the dimensionality of the vector space in question.\n\n* `contribution`  \none of the key questions of Machine Learning is how to quantify the contribution that each example (typically represented as a member of some large `nd` set) makes to the recognition concept, or outcome class, represented by that set. In answer to this, we define the `contribution` of a point **p** as the magnitude of displacement of `gm`, caused by adding **p** to the set. Generally, outlying points make greater contributions to the `gm` but not as much as to the `centroid`. The contribution depends not only on the radius of **p** but also on the radii of all other existing set points and on their number.\n\n* `comediance`  \nis similar to `covariance`. 
It is a triangular symmetric matrix, obtained by supplying method `covar` with the geometric median instead of the usual centroid. Thus `zero mean vectors` are replaced by `zero median vectors` in the covariance calculations. The results are similar but more stable with respect to the outliers.\n\n* `outer_hull` is a subset of all zero median points **p**, such that no other points lie outside the normal plane through **p**. The points that do not satisfy this condition are called the `internal` points.\n\n* `inner_hull` is a subset of all zero median points **p**, that do not lie outside the normal plane of any other point. Note that in a highly dimensional space possibly all points may belong to both the inner and the outer hulls, as, for example, when they all lie on the same hypersphere.\n\n* `depth` is a measure of likelihood of a zero median point **p** belonging to a data cloud. More specifically, it is the projection onto unit **p** of a sum of unit vectors that lie outside the normal through **p**. For example, all outer hull points have by their definition `depth = 0`, whereas the inner hull points have high values of depth. This is intended as an improvement on Mahalanobis distance which has a similar goal but says nothing about how well enclosed **p** is. In contrast, `tm_statistic` only informs about the probability pertaining to the whole cloud, not to its local shape near **p**.\n\n* `sigvec (signature vector)`  \nProportional projections of a cloud of zero median vectors on all hemispheric axes. When a new zero median point **p** needs to be classified, we can quickly estimate how well populated its direction from **gm** is. Something similar could be done by projecting all the points directly onto **p** but this is usually impractically slow, as there are typically very many such points. 
However, `signature_vector` only needs to be precomputed once and is then the only vector to be projected onto **p**.\n\n## Previously Known Concepts and Terminology\n\n* `centroid/centre/mean` of an `nd` set.  \nIs the point, generally non member, that minimises its sum of *squares* of distances to all member points. The squaring makes it susceptible to outliers. Specifically, it is the d-dimensional arithmetic mean. It is sometimes called 'the centre of mass'. Centroid can also sometimes mean the member of the set which is the nearest to the Centre. Here we follow the common usage: Centroid = Centre = Arithmetic Mean.\n\n* `quasi/marginal median`  \nis the point minimising sums of distances separately in each dimension (its coordinates are medians along each axis). It is a mistaken concept which we do not recommend using.\n\n* `Tukey median`  \nis the point maximising `Tukey's Depth`, which is the minimum number of (outlying) points found in a hemisphere in any direction. Potentially useful concept but its advantages over the geometric median are not clear.\n\n* `true geometric median` (**gm**)  \nis the point (generally non member), which minimises the sum of distances to all member points. This is the one we want. It is less susceptible to outliers than the centroid. In addition, unlike quasi median, **gm** is rotation independent.\n\n* `medoid`  \nis the member of the set with the least sum of distances to all other members. Equivalently, the member which is the nearest to the **gm** (has the minimum radius).\n\n* `outlier`  \nis the member of the set with the greatest sum of distances to all other members. Equivalently, it is the point furthest from the **gm** (has the maximum radius).\n\n* `Mahalanobis distance`  \nis a scaled distance, whereby the scaling is derived from the axis of covariances / `comediances` of the data points cloud. 
Distances in the directions in which there are few points are increased and distances in the directions of significant covariances / `comediances` are decreased. Requires matrix decomposition. Mahalanobis distance is defined as: `m(d) = sqrt(d'inv(C)d) = sqrt(d'inv(LL')d) = sqrt(d'inv(L')inv(L)d)`,  \nwhere `inv()` denotes matrix inverse, which is never explicitly computed, and `'` denotes transposition.  \nLet `x = inv(L)d` (and therefore also `x' = d'inv(L')`).  \nSubstituting x into the above definition: `m(d) = sqrt(x'x) = |x|`.  \nWe obtain x by setting `Lx = d` and solving by forward substitution.  \nAll these calculations are done in the compact triangular form.\n\n* `Cholesky-Banachiewicz matrix decomposition`  \ndecomposes any positive definite matrix S (often covariance or comediance matrix) into a product of lower triangular matrix L and its transpose L': `S = LL'`. The determinant of S can be obtained from the diagonal of L. We implemented the decomposition on `TriangMat` for maximum efficiency. It is used mainly by `mahalanobis`.\n\n* `Householder's decomposition`  \nin cases where the precondition (positive definite matrix S) for the Cholesky-Banachiewicz decomposition is not satisfied, Householder's (UR) decomposition is often used as the next best method. It is implemented here on our efficient `struct TriangMat`.\n\n* `wedge product, geometric product`  \nproducts of the Grassmann and Clifford algebras, respectively. Wedge product is used here to generalize the cross product of two vectors into any number of dimensions, determining the correct sign (sidedness of their common plane).\n\n## Implementation Notes\n\nThe main constituent parts of Rstats are its traits. The different traits are determined by the types of objects to be handled. The objects are mostly vectors of arbitrary length/dimensionality (`d`). 
The main traits are implementing methods applicable to:\n\n* `Stats`: a single vector (of numbers),\n* `Vecg`: methods operating on two vectors, e.g. scalar product,\n* `Vecu8`: some methods specialized for end-type `u8`,\n* `MutVecg`: some of the above methods, mutating self,\n* `VecVec`: methods operating on n vectors (rows of numbers),\n* `VecVecg`: methods for n vectors, plus another generic argument, typically a vector of n weights, expressing the relative significance of the vectors.\n\nThe traits and their methods operate on arguments of their required categories. In classical statistical parlance, the main categories correspond to the number of 'random variables'.\n\n**`Vec\u003cVec\u003cT\u003e\u003e`** type is used for rectangular matrices (could also have irregular rows).\n  \n**`struct TriangMat`** is used for symmetric / antisymmetric / transposed / triangular matrices and wedge and geometric products. All instances of `TriangMat` store only `n*(n+1)/2` items in a single flat vector, instead of `n*n`, thus almost halving the memory requirements. Their transposed versions only set up a flag `kind \u003e=3` that is interpreted by software, instead of unnecessarily rewriting the whole matrix. Thus saving processing of all transposes (a common operation). All this is put to a good use in our implementation of the matrix decomposition methods.\n\nThe vectors' end types (of the actual data) are mostly generic: usually some numeric type. `Copy` trait bounds on these generic input types have been relaxed to `Clone`, to allow cloning user's own end data types in any way desired. 
There is no difference for primitive types.\n\nThe end types of the computed results are usually `f64`.\n\n## Errors\n\nThe `Rstats` crate produces a custom error `RError`:\n\n```rust\npub enum RError\u003cT\u003e where T:Sized+Debug {\n    /// Insufficient data\n    NoDataError(T),\n    /// Wrong kind/size of data\n    DataError(T),\n    /// Invalid result, such as prevented division by zero\n    ArithError(T),\n    /// Other error converted to RError\n    OtherError(T)\n}\n```\n\nEach of its enum variants also carries a generic payload `T`. Most commonly this will be a `String` message, giving a more helpful explanation, e.g.:\n\n```rust\nif dif \u003c= 0_f64 {\n    return Err(RError::ArithError(\"cholesky needs a positive definite matrix\".to_owned()));\n};\n```\n\n`format!(...)` can be used to insert (debugging) run-time values into the payload String. These errors are returned and can then be automatically converted (with `?`) to users' own errors. Some such error conversions are implemented at the bottom of the `errors.rs` file and used in `tests.rs`.\n\nThere is a type alias shortening return declarations to, e.g.: `Result\u003cVec\u003cf64\u003e,RE\u003e`, where\n\n```rust\npub type RE = RError\u003cString\u003e;\n```\n\nConvenience functions `nodata_error, data_error, arith_error, other_error` are used to construct and return these errors. Their message argument can be either a literal `\u0026str`, or a `String` (e.g. constructed by `format!`). They return `RError\u003cString\u003e` already wrapped up as an `Err` variant of `Result`. Compare:\n\n```rust\nif dif \u003c= 0_f64 {\n    return arith_error(\"cholesky needs a positive definite matrix\");\n};\n```\n\n## Structs\n\n### `struct Params`\n\nholds the central tendency of `1d` data, e.g. any kind of mean, or median, and any spread measure, e.g. standard deviation or 'mad'.\n\n### `struct TriangMat`\n\nholds triangular matrices of all kinds, as described in the Implementation Notes section above. 
Beyond the expansion to their full matrix forms, a number of (the best) Linear Algebra methods are implemented directly on `TriangMat`, in module `triangmat.rs`, such as:\n\n* **Cholesky-Banachiewicz** matrix decomposition: `S = LL'` (where ' denotes the transpose). This decomposition is used by `mahalanobis`, `determinant`, etc.\n\n* **Mahalanobis Distance** for ML recognition tasks.\n\n* Various operations on `TriangMat`s, including `mult`: matrix multiplication of two triangular or symmetric or antisymmetric matrices in this compact form, without their expansions to full matrices.\n\nAlso, some methods, specifically the covariance/comediance calculations in `VecVec` and `VecVecg`, return `TriangMat` matrices. These are positive definite, which makes the most efficient Cholesky-Banachiewicz decomposition applicable to them.\n\nSimilarly, **Householder UR** (M = QR), which is a more general matrix decomposition, also returns `TriangMat`s.\n\n## Quantify Functions (Dependency Injection)\n\nMost methods in `medians` and some in `indxvec` crates, e.g. `find_any` and `find_all`, require an explicit closure to be passed to them, usually to tell them how to quantify input data of any type T into f64. A variety of different quantifying methods can then be dynamically employed.\n\nFor example, in text analysis (`\u0026str` end type), it can be the word length, or the numerical value of its first few letters, or the numerical value of its consonants, etc. Then we can sort them or find their means / medians / spreads under all these different measures. We do not necessarily want to explicitly store all such different values, as input data can be voluminous. 
It is often preferable to be able to compute any of them on demand, using these closure arguments.\n\nWhen data is already of the required end-type, use the 'dummy' closure:\n\n```rust\n|\u0026f| f\n```\n\nWhen T is a primitive type, such as i64, u64, usize, that can be converted to f64, possibly with some loss of accuracy, use:\n\n```rust\n|\u0026f| f as f64\n```\n\n### `fromop`\n\nWhen T is convertible by an existing custom `From` implementation (and `f64:From\u003cT\u003e, T:Clone` have been duly added everywhere as trait bounds), then simply pass in `fromop`, defined as:\n\n```rust\n/// Convenience From quantification invocation\npub fn fromop\u003cT: Clone + Into\u003cf64\u003e\u003e(f: \u0026T) -\u003e f64 {\n    f.clone().into()\n}\n```\n\nThe remaining general cases previously required new manual implementations to be written for the (global) `From` trait for each new type and for each different quantification method, plus adding its trait bounds everywhere. Even then, the different implementations of `From` would conflict with each other. Now we can simply implement all the custom quantifications within the closures. This generality is obtained at the price of a small inconvenience: having to supply one of the above closure arguments for the primitive types as well.\n\n## Auxiliary Functions\n\n* `fromop`: see above.\n\n* `sumn`: the sum of the sequence `1..n = n*(n+1)/2`. It is also the size of a lower/upper triangular matrix.\n\n* `tm_stat`: (x-centre)/dispersion. 
Generalised t-statistic in one dimension.\n\n* `unit_matrix`: - generates full square unit matrix.\n\n* `nodata_error, data_error, arith_error, other_error` - construct custom RE errors (see section Errors above).\n\n## Trait Stats\n\nOne dimensional statistical measures implemented for all numeric end types.\n\nIts methods operate on one slice of generic data and take no arguments.\nFor example, `s.amean()?` returns the arithmetic mean of the data in slice `s`.\nThese methods are checked and will report RError(s), such as an empty input. This means you have to apply `?` to their results to pass the errors up, or explicitly match them to take recovery actions, depending on the error variant.\n\nIncluded in this trait are:\n\n* 1d medians (classic, geometric and harmonic) and their spreads\n* 1d means (arithmetic, geometric and harmonic) and their spreads\n* linearly weighted means (useful for time analysis),\n* probability density function (pdf)\n* autocorrelation, entropy\n* linear transformation to [0,1],\n* other measures and basic vector algebra operators\n\nNote that fast implementations of 1d 'classic' medians are, as of version 1.1.0, provided in a separate crate `medians`.\n\n## Trait Vecg\n\nGeneric vector algebra operations between two slices `\u0026[T]`, `\u0026[U]` of any (common) length  (dimensions). 
Note that it may be necessary to invoke some using the 'turbofish' `::\u003ctype\u003e` syntax to indicate the type U of the supplied argument, e.g.:\n\n```rust\ndatavec.somemethod::\u003cf64\u003e(arg)\n```\n\nMethods implemented by this trait:\n\n* Vector additions, subtractions and products (scalar, Kronecker, outer),\n* Other relationships and measures of difference,\n* Pearson's, Spearman's and Kendall's correlations,\n* Joint pdf, joint entropy, statistical independence (based on mutual information).\n* `Contribution` measure of a point's impact on the geometric median\n\nNote that our `median correlation` is implemented in a separate crate `medians`.\n\nSome simpler methods of this trait may be unchecked (for speed), so some caution with data is advisable.\n\n## Trait MutVecg\n\nA select few of the `Stats` and `Vecg` methods (e.g. mutable vector addition, subtraction and multiplication) are reimplemented under this trait, so that they can mutate `self` in-place. This is more efficient and convenient in some circumstances, such as in vector iterative methods.\n\nHowever, these methods do not fit in with the functional programming style, as they do not explicitly return anything (their calls are statements with side effects, rather than expressions).\n\n## Trait Vecu8\n\nSome vector algebra as above that can be more efficient when the end type happens to be u8 (bytes). These methods have u8 appended to their names to avoid confusion with Vecg methods. These specific algorithms are different to their generic equivalents in Vecg.\n\n* Frequency count of bytes by their values (histogram, pdf, jointpdf)\n* Entropy, jointentropy, independence.\n\n## Trait VecVec\n\nRelationships between n vectors in d dimensions.\nThis (hyper-dimensional) data domain is denoted here as (`nd`). It is in `nd` where the main original contribution of this library lies. True geometric median (gm) is found by fast and stable iteration, using improved Weiszfeld's algorithm `gmedian`. 
This algorithm solves Weiszfeld's convergence and stability problems in the neighbourhoods of existing set points. Its variant, `par_gmedian`, employs multithreading for faster execution and otherwise gives the same result.\n\n* centroid, medoid, outliers, gm\n* sums of distances, radius of a point (as its distance from gm)\n* characterisation of a set of multidimensional points by the mean, standard deviation, median and MAD of its points' radii. These are useful recognition measures for the set.\n* transformation to zero geometric median data,\n* multivariate trend (regression) between two sets of `nd` points,\n* covariance and comediance matrices.\n* inner and outer hulls\n\n## Trait VecVecg\n\nMethods which take an additional generic vector argument, such as a vector of weights for computing weighted geometric medians (where each point has its own significance weight). Matrix multiplications.\n\n## Appendix: Recent Releases\n\n* **Version 2.2.12** - Some corrections of Readme.md.\n\n* **Version 2.1.11** - Some minor tidying up of code.\n\n* **Version 2.1.10** - Added `project` of a `TriangMat` to a subspace given by a subspace index.\n\n* **Version 2.1.9** - Added multiplications and more tests for `TriangMat`.\n\n* **Version 2.1.8** - Improved `TriangMat::diagonal()`, restored `TriangMat::determinant()`, tidied up `triangmat` test.\n\n* **Version 2.1.7** - Removed suspect eigen values/vectors computations. Improved 'householder' test.\n\n* **Version 2.1.5** - Added `projection` to trait `VecVecg` to project all self vectors to a new basis. This can be used e.g. for Principal Components Analysis data reduction, using some of the eigenvectors as the new basis.\n\n* **Version 2.1.4** - Tidied up some error processing.\n\n* **Version 2.1.3** - Added `normalize` (normalize columns of a matrix and transpose them to rows).\n\n* **Version 2.1.2** - Added function `project` to project a `TriangMat` to a lower dimensional space of selected dimensions. 
Removed `rows` which was a duplicate of `dim`.\n\n* **Version 2.1.0** - Changed the type of `mid` argument to covariance methods from U -\u003e f64, making the normal expectation for the type of precise geometric medians explicit. Accordingly, moved `covar` and `serial_covar` from trait `VecVecg` to `VecVec`. This might potentially require changing some `use` declarations in your code.\n\n* **Version 2.0.12** - added `depth_ratio`\n\n* **Version 2.0.11** - removed not so useful `variances`. Tidied up error processing in `vecvecg.rs`. Added to it `serial_covar` and `serial_wcovar` for when heavy loading of all the cores may not be wanted.\n\n* **Version 2.0.9** - Pruned some rarely used methods, simplified `gmparts` and `gmerror`, updated dependencies.\n\n* **Version 2.0.8** - Changed initial guess in iterative weighted gm methods to weighted mean. This, being more accurate than plain mean, leads to fewer iterations. Updated some dependencies.\n\n* **Version 2.0.7** - Updated to `ran 2.0`.\n\n* **Version 2.0.6** - Added convenience method `medmad` to Stats trait. It packs median and mad into `struct Params`, similarly to `ameanstd` and others. Consequently simplified the printouts in some tests.\n\n* **Version 2.0.5** - Corrected `wsigvec` to also return normalized result. Updated dependency `Medians` to faster version 3.0.1.\n\n* **Version 2.0.4** - Made a corresponding change: `winsideness` -\u003e `wdepth`.\n\n* **Version 2.0.3** - Improved `insideness` to be projection of a sum of unit vectors instead of just a simple count. Renamed it to `depth` to avoid confusion. Also some fixes to `hulls`.\n\n* **Version 2.0.2** - Significantly speeded up `insideness` and added weighted version `winsideness` to `VecVecg` trait.\n\n* **Version 2.0.1** - Added `TriangMat::dim()` and tidied up some comments.\n\n* **Version 2.0.0** - Renamed `MStats` -\u003e `Params` and its variant `dispersion` -\u003e `spread`. 
This may cause some backwards incompatibilities, hence the new major version. Added 'centre' as an argument to `dfdt`,`dvdt`,`wdvdt`, so that it does not have to be recomputed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliborty%2Frstats","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fliborty%2Frstats","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliborty%2Frstats/lists"}