https://github.com/dallaylaen/stats-logscale-js

Memory efficient, fast approximate statistical analysis tool
approximate math statistics univariate

Last synced: 5 months ago

Host: GitHub
URL: https://github.com/dallaylaen/stats-logscale-js
Owner: dallaylaen
Created: 2022-03-11T20:10:51.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2024-03-11T20:51:36.000Z (over 2 years ago)
Last Synced: 2026-01-28T23:51:40.479Z (5 months ago)
Topics: approximate, math, statistics, univariate
Language: JavaScript
Homepage:
Size: 841 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: Changes

# stats-logscale

A memory-efficient approximate statistical analysis tool
using logarithmic binning.

![Example: repeated setTimeout(0) execution times](example/images/settimeout-duration-distribution.png)

_Example: repeated setTimeout(0) execution times_

## Description

* data is split into bins (aka buckets),
  linear close to zero and logarithmic for large numbers (hence the name),
  thus maintaining the desired absolute and relative precision;
* can calculate mean, variance, median, moments, percentiles,
  cumulative distribution function (i.e. the probability that a value is less than x),
  and expected values of arbitrary functions over the sample;
* can generate histograms for plotting the data;
* all calculated values are cached, and the cache is reset when new data is added;
* (almost) every function has a "neat" counterpart which rounds the result
  to the shortest possible number within the precision bounds,
  e.g. `foo.mean() // 1.0100047`, but `foo.neat.mean() // 1.01`;
* is (de)serializable;
* can split out partial data or combine multiple samples into one.
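
To illustrate the first point, here is a minimal sketch of mixed linear/logarithmic binning. It is not the module's actual code: `binCenter` is a hypothetical helper, and the rule that bins switch from linear to logarithmic where a logarithmic bin becomes as wide as `precision` is an assumption made for the example.

```javascript
// Hypothetical helper, not part of stats-logscale.
// Returns the representative value of the bin that x falls into.
function binCenter(x, { precision = 1e-9, base = 1.001 } = {}) {
    const sign = x < 0 ? -1 : 1;
    const abs = Math.abs(x);

    // Assumption: below this threshold a logarithmic bin would be
    // narrower than `precision`, so linear bins are used instead.
    const threshold = precision / (base - 1);

    if (abs < threshold) {
        // Linear region: bins of fixed width `precision`
        return sign * Math.round(abs / precision) * precision;
    }

    // Logarithmic region: bin centers form a geometric progression
    const k = Math.round(Math.log(abs / threshold) / Math.log(base));
    return sign * threshold * Math.pow(base, k);
}

console.log(binCenter(0.0123, { precision: 0.001, base: 1.01 }));
console.log(binCenter(1000,   { precision: 0.001, base: 1.01 }));
```

With these settings a small value is off by at most half of `precision`, while a large value is off by a fraction of a percent of itself, which is the trade-off the description refers to.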

## Usage

Creating the sample container:

```javascript
const { Univariate } = require( 'stats-logscale' );
const stat = new Univariate();
```

**Specifying absolute and relative precision.**
The defaults are 10^-9 (`precision`, absolute) and 1.001 (`base`, relative), respectively.
Less precision means less memory usage
and faster data querying (but not insertion).

```javascript
const stat = new Univariate({base: 1.01, precision: 0.001});
```
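
As a rough guide to how `base` affects memory, the sketch below estimates how many logarithmic bins are needed per order of magnitude. `binsPerDecade` is a hypothetical helper, and the estimate assumes that consecutive bin boundaries differ by a factor of exactly `base`; the module's real bin count may differ.

```javascript
// Hypothetical helper, not part of stats-logscale.
// Number of bins with ratio `base` needed to cover a factor of 10.
function binsPerDecade(base) {
    return Math.log(10) / Math.log(base);
}

console.log(binsPerDecade(1.001)); // roughly 2300 bins per order of magnitude
console.log(binsPerDecade(1.01));  // roughly 230 bins per order of magnitude
```

Under this assumption, going from the default base to 1.01 cuts the number of bins by about a factor of ten.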

Use the _flat_ switch to avoid logarithmic binning altogether:

```javascript
// this assumes the data is just integer numbers
const stat = new Univariate({precision: 1, flat: true});
```

**Adding data points**, either one by one
or as _(value, frequency)_ pairs.
Strings are OK (e.g. after parsing user input),
but non-numeric values will cause an exception:

```javascript
stat.add (3.14);
stat.add ("Foo"); // Nope!
stat.add ("3.14 3.15 3.16".split(" "));
stat.addWeighted([[0.5, 1], [1.5, 3], [2.5, 5]]);
```

**Querying data:**

```javascript
stat.count();           // number of data points
stat.mean();            // average
stat.stdev();           // standard deviation
stat.median();          // half of data is lower than this value
stat.percentile(90);    // 90% of data below this point
stat.quantile(0.9);     // ditto
stat.cdf(0.5);          // Cumulative distribution function, which means
                        // the probability that a data point is less than 0.5
stat.moment(power);     // central moment of an integer power
stat.momentAbs(power);  // < |x - <x>| ** power >, power may be fractional
stat.E( x => x*x );     // expected value of an arbitrary function
```
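
To show where such numbers come from, here is a minimal sketch that computes a few of them directly from _(value, frequency)_ pairs. The function names are hypothetical and this is not the module's implementation; in particular, the module may interpolate within bins, which this sketch does not.

```javascript
// Hypothetical helpers operating on [value, frequency] pairs sorted by value.
const bins = [[0.5, 1], [1.5, 3], [2.5, 5]];

// Total number of data points
const count = (bins) => bins.reduce((n, [, w]) => n + w, 0);

// Expected value of fn(x) over the sample
const expected = (bins, fn) =>
    bins.reduce((sum, [x, w]) => sum + fn(x) * w, 0) / count(bins);

// Fraction of data points strictly below `limit`
const cdf = (bins, limit) =>
    bins.reduce((n, [x, w]) => (x < limit ? n + w : n), 0) / count(bins);

// Smallest bin value with at least a fraction p of the data at or below it
function quantile(bins, p) {
    const target = p * count(bins);
    let seen = 0;
    for (const [x, w] of bins) {
        seen += w;
        if (seen >= target) return x;
    }
    return bins[bins.length - 1][0];
}

console.log(count(bins));              // 9
console.log(expected(bins, x => x));   // mean, 17.5 / 9
console.log(cdf(bins, 2));             // 4 / 9
console.log(quantile(bins, 0.5));      // 2.5
```

Since only the pairs are stored, every statistic costs time proportional to the number of bins rather than the number of data points.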

Each querying primitive has a _"neat"_ counterpart
that rounds its output to the shortest possible
decimal number in the respective bin:

```javascript
stat.neat.mean();
stat.neat.stdev();
stat.neat.median();
```
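
The idea of "shortest number within an interval" can be sketched as follows. `shortestBetween` is a hypothetical helper, not the module's algorithm: it simply rounds the midpoint to more and more significant digits until the result lands inside the interval, so it is a heuristic and may miss a shorter number elsewhere in the interval.

```javascript
// Hypothetical helper, not part of stats-logscale.
// Returns a number inside [lo, hi] with few significant digits.
function shortestBetween(lo, hi) {
    const mid = (lo + hi) / 2;
    for (let digits = 1; digits <= 17; digits++) {
        const candidate = Number(mid.toPrecision(digits));
        if (candidate >= lo && candidate <= hi) return candidate;
    }
    return mid; // fallback for extremely narrow intervals
}

console.log(shortestBetween(1.0095, 1.0105)); // 1.01
console.log(shortestBetween(95, 140));        // 100
```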

**Extract partial samples:**

```javascript
stat.clone( { min: 0.5, max: 0.7 } );
stat.clone( { ltrim: 1, rtrim: 1 } );
    // cut off outer 1% of data
stat.clone( { ltrim: 1, rtrim: 1, winsorize: true } );
    // ditto but truncate outliers instead of discarding
```
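
The difference between trimming and winsorizing is easiest to see on a plain array. The sketch below uses a hypothetical `trimSample` helper and works on raw values rather than bins, so it only illustrates the concept and may not match the module's behavior at bin boundaries. It assumes `percent` is below 50.

```javascript
// Hypothetical helper, not part of stats-logscale.
// Removes `percent`% of the points from each end of the sample;
// with winsorize = true the removed points are replaced
// by the nearest remaining value instead of being dropped.
function trimSample(values, percent, winsorize = false) {
    const sorted = [...values].sort((a, b) => a - b);
    const cut = Math.floor(sorted.length * percent / 100);
    const inner = sorted.slice(cut, sorted.length - cut);
    if (!winsorize || cut === 0) return inner;

    const lo = inner[0];
    const hi = inner[inner.length - 1];
    return [...Array(cut).fill(lo), ...inner, ...Array(cut).fill(hi)];
}

const data = Array.from({ length: 100 }, (_, i) => i + 1); // 1..100
const trimmed = trimSample(data, 1);          // 2..99, 98 points
const winsorized = trimSample(data, 1, true); // 2, 2..99, 99: still 100 points
console.log(trimmed.length, winsorized.length);
```

Winsorizing keeps the sample size intact, which matters when the trimmed sample is later combined with other data.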

Serialize, deserialize, and combine data from multiple sources:

```javascript
const str = JSON.stringify(stat);
// send over the network here
const copy = new Univariate (JSON.parse(str));

// `main` and `partialStat` are other Univariate instances
main.addWeighted( partialStat.getBins() );
main.addWeighted( JSON.parse(str).bins ); // ditto
```
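
Combining works because binned samples are just _(value, frequency)_ pairs whose frequencies add up. A minimal sketch with a hypothetical `mergeBins` helper follows; it assumes all sources were binned with the same `base` and `precision`, so that equal bins have identical values.

```javascript
// Hypothetical helper, not part of stats-logscale.
// Merges several lists of [value, frequency] pairs into one sorted list.
function mergeBins(...sources) {
    const total = new Map();
    for (const bins of sources) {
        for (const [value, freq] of bins) {
            total.set(value, (total.get(value) ?? 0) + freq);
        }
    }
    return [...total.entries()].sort((a, b) => a[0] - b[0]);
}

console.log(mergeBins([[1, 2], [2, 1]], [[2, 3], [3, 1]]));
// [ [ 1, 2 ], [ 2, 4 ], [ 3, 1 ] ]
```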

Create histograms and plot data:

```javascript
stat.histogram({scale: 768, count:1024});
    // this produces 1024 bars of the form
    // [ bar_height, lower_boundary, upper_boundary ]
    // The intervals are consecutive.
    // The bar heights are limited to 768.

stat.histogram({scale: 70, count:20})
    .map( x => stat.shorten(x[1], x[2]) + '\t' + '+'.repeat(x[0]) )
    .join('\n')
    // "Draw" a vertical histogram for text console
    // You'll use PNG in production instead, right? Right?
```

See the [playground](https://dallaylaen.github.io/stats-logscale-js/).

See also [full documentation](https://dallaylaen.github.io/stats-logscale-js/man/Univariate.html).

## Performance

Data inserts are optimized for speed,
and querying is cached where possible.

The script [example/speed.js](example/speed.js)
can be used to benchmark the module on your system.

Memory usage for a dense sample spanning 6 orders of magnitude
was around 1.6MB in Chromium:
~230KB for the data itself + ~1.2MB for the cache.

## Bugs

Please report bugs and request features via the
[GitHub bug tracker](https://github.com/dallaylaen/stats-logscale-js/issues).

## Copyright and license

Copyright (c) 2022-2023 Konstantin Uvarin

This is free software available under the MIT license.
