https://github.com/tupol/online-stats
Online statistics implementations, including average, variance and standard deviation; exponentially weighted versions as well.
- Host: GitHub
- URL: https://github.com/tupol/online-stats
- Owner: tupol
- License: mit
- Created: 2018-01-25T13:35:46.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2022-10-26T05:59:57.000Z (over 3 years ago)
- Last Synced: 2025-08-30T12:27:28.881Z (10 months ago)
- Topics: covariance, exponential-moving-average, exponential-moving-variance, kurtosis, library, online-stats, scala, skewness, variance
- Language: Scala
- Homepage:
- Size: 2.86 MB
- Stars: 10
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
README
# online-stats #
[Maven repository](https://mvnrepository.com/artifact/org.tupol/online-stats)
[License](https://github.com/tupol/online-stats/blob/master/LICENSE)
[Travis CI build](https://travis-ci.com/tupol/online-stats)
[Codecov coverage](https://codecov.io/gh/tupol/online-stats)
[Javadoc](https://www.javadoc.io/doc/org.tupol/online-stats_2.11)
[Gitter chat](https://gitter.im/online-stats/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[Twitter](https://twitter.com/_tupol)
## Scope ##
A naive implementation of a few online statistical algorithms.
It is intended as a tool for stateful streaming computations.

Algorithms covered so far:
- 4 statistical moments and the derived features:
    - average
    - variance and standard deviation
    - skewness
    - kurtosis
- covariance
- exponentially weighted moving averages and variance

Algorithms to be researched:
- exponentially weighted moving skewness
- exponentially weighted moving kurtosis

For production applications, a more formal and mature library such as **[Apache Commons Math](http://commons.apache.org/proper/commons-math/)**
is probably a better choice; the results of this library are tested against it.
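
For orientation, the derived features listed above can be written in terms of the sample count and the sums of powers of deviations from the mean. The formulas below are the textbook population forms, given as an assumption about what "derived features" means here; whether the library applies bias corrections to skewness and kurtosis is not stated in this README.

```latex
% M_k is the k-th sum of powers of deviations from the mean
M_k = \sum_{i=1}^{n} (x_i - \bar{x})^k, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

% Variance with Bessel's correction, and standard deviation
s^2 = \frac{M_2}{n - 1}, \qquad s = \sqrt{s^2}

% Population skewness and kurtosis
g_1 = \frac{\sqrt{n}\, M_3}{M_2^{3/2}}, \qquad \text{kurtosis} = \frac{n\, M_4}{M_2^{2}}
```

Since only $n$, $\bar{x}$ and $M_2$ to $M_4$ need to be kept, all of these features can be updated without storing the samples.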
## Description ##
The main concepts introduced in this library are `Stats`, `EWeightedStats` (exponentially
weighted stats), `VectorStats` and `Covariance`. Instances of each can be composed using either the
`append` or the `|+|` function.
For example, given a sequence of numbers, the statistics can be computed like this:
```scala
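// Fold the first batch into a Stats, starting from the identity element Stats.Nil
val xs1 = Seq(1.0, 3.0)
val stats1: Stats = xs1.foldLeft(Stats.Nil)((s, x) => s |+| x)
// Summarise a second batch independently of the first
val xs2 = Seq(5.0, 7.0)
val stats2: Stats = xs2.foldLeft(Stats.Nil)((s, x) => s |+| x)
// Merge the two partial results into the statistics of all four samples
val totalStats = stats1 |+| stats2
// Append a single new sample to an existing Stats
val newStats = totalStats |+| 4.0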
```
The `Stats` type together with the `|+|` operation forms a *monoid*, since `|+|` is *associative*
and has an *identity* (unit) element, `Stats.Nil`.
The `|+|` operation is also *commutative*, which makes it appealing for distributed computing
as well, because partial results can be merged in any order.
The same goes for `VectorStats` and `Covariance`.
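
To illustrate why such a merge can be order-independent, here is a minimal sketch of a mergeable summary that tracks only the count, the mean and the sum of squared deviations. It is not the library's implementation: `SketchStats` and its fields are hypothetical names, and the update uses the standard pairwise formulas for combining two partial results.

```scala
// Hypothetical sketch, not the library's code: a mergeable summary holding
// the count, the mean and the sum of squared deviations from the mean (m2).
final case class SketchStats(n: Long, mean: Double, m2: Double) {

  // Merge two partial summaries using the pairwise update formulas.
  def |+|(that: SketchStats): SketchStats =
    if (n == 0) that
    else if (that.n == 0) this
    else {
      val total = n + that.n
      val delta = that.mean - mean
      SketchStats(
        total,
        mean + delta * that.n / total,
        m2 + that.m2 + delta * delta * n * that.n / total
      )
    }

  // Append a single sample by lifting it to a one-element summary.
  def |+|(x: Double): SketchStats = this |+| SketchStats(1L, x, 0.0)

  // Sample variance with Bessel's correction; NaN below two samples.
  def variance: Double = if (n < 2) Double.NaN else m2 / (n - 1)
}

object SketchStats {
  // Identity element of the merge operation.
  val Nil: SketchStats = SketchStats(0L, 0.0, 0.0)
}
```

Folding `Seq(1.0, 3.0)` and `Seq(5.0, 7.0)` separately and merging the two results in either order gives the same summary as folding all four values in one pass.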
`EWeightedStats` is an exception for now, as two `EWeightedStats` instances cannot be composed.
However, `|+|` does work between an `EWeightedStats` instance and a double.
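
As a rough illustration of appending a double to exponentially weighted statistics, the sketch below follows the incremental update described in the Tony Finch paper listed in the references. `SketchEWStats`, its fields and the handling of the first sample are assumptions made for this example and are not taken from the library.

```scala
// Hypothetical sketch, not the library's code: exponentially weighted
// mean and variance with smoothing factor alpha in (0, 1].
final case class SketchEWStats(
    alpha: Double,
    count: Long = 0L,
    mean: Double = 0.0,
    variance: Double = 0.0
) {
  require(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]")

  // Append one sample; the first sample initialises the mean (an assumption).
  def |+|(x: Double): SketchEWStats =
    if (count == 0) copy(count = 1L, mean = x)
    else {
      val diff = x - mean
      val incr = alpha * diff
      copy(
        count = count + 1,
        mean = mean + incr,
        variance = (1.0 - alpha) * (variance + diff * incr)
      )
    }
}
```

Each update discounts all earlier samples by a further factor of `1 - alpha`, so the state depends on the order in which samples arrive. This may be part of why merging two such instances is less straightforward than merging two `Stats`.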
## Complexity ##
| Feature | Space Complexity (*O*) | Time Complexity (*O*) |
| ------------------------------- | :--------------------: | :-------------------: |
| Count, Sum, Min, Max | ***O***(1) (1 * MU) | ***O***(1) |
| Average | ***O***(1) (2 * MU) | ***O***(1) |
| Variance, Standard deviation | ***O***(1) (3 * MU) | ***O***(1) |
| Skewness | ***O***(1) (4 * MU) | ***O***(1) |
| Kurtosis | ***O***(1) (5 * MU) | ***O***(1) |
| Exponentially weighted average | ***O***(1) (2 * MU) | ***O***(1) |
| Exponentially weighted variance | ***O***(1) (2 * MU) | ***O***(1) |
*MU*: Memory Unit, e.g. Int: 4 bytes, Double: 8 bytes
## Demos and Examples ##
The [`streaming-anomalies-demos`](https://github.com/tupol/streaming-anomalies-demos) project was created to explore and demonstrate some basic use cases for the `online-stats` library.
## References ##
- [*"Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments"* by Philippe Pebay](https://digital.library.unt.edu/ark:/67531/metadc837537/m1/7/?utm_source=email&utm_medium=client&utm_content=ark_sidebar&utm_campaign=ark_permanent) ([alternative link](http://prod.sandia.gov/techlib/access-control.cgi/2008/086212.pdf))
- [*"Numerically stable, scalable formulas for parallel and online computation of higher-order multivariate central moments with arbitrary weights"* by Philippe Pebay, Timothy B. Terriberry, Hemanth Kolla, Janine Bennett](https://zenodo.org/record/1232635/files/article.pdf)
- [*"The Exponentially Weighted Moving Variance"* by J. F. MacGregor and T. J. Harris](https://www.tandfonline.com/doi/abs/10.1080/00224065.1993.11979433)
- [*"Incremental calculation of weighted mean and variance"* by Tony Finch, February 2009](https://fanf2.user.srcf.net/hermes/doc/antiforgery/stats.pdf)
- [Bessel Correction](https://en.wikipedia.org/wiki/Bessel%27s_correction)
- [Skewness](https://en.wikipedia.org/wiki/Skewness)
- [Kurtosis](https://en.wikipedia.org/wiki/Kurtosis)
- [Pearson's Correlation Coefficient](https://en.wikipedia.org/wiki/Correlation_and_dependence#Pearson's_product-moment_coefficient)