https://github.com/tomkyle/binning

Determine optimal number of bins 𝒌 for histogram creation and optimal bin width 𝒉 using various statistical methods.

# tomkyle/binning

[![Composer Version](https://img.shields.io/packagist/v/tomkyle/binning)](https://packagist.org/packages/tomkyle/binning )
[![PHP version](https://img.shields.io/packagist/php-v/tomkyle/binning)](https://packagist.org/packages/tomkyle/binning )
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/tomkyle/binning/php.yml)](https://github.com/tomkyle/binning/actions/workflows/php.yml)
[![Packagist License](https://img.shields.io/packagist/l/tomkyle/binning)](LICENSE.txt)

**Determine the optimal number of bins 𝒌 for histogram creation and the optimal bin width 𝒉 using various statistical methods. Its unified interface includes implementations of well-known binning rules such as:**

- Square Root Rule (1892)
- Sturges’ Rule (1926)
- Doane’s Rule (1976)
- Scott’s Rule (1979)
- Freedman-Diaconis Rule (1981)
- Terrell-Scott’s Rule (1985)
- Rice University Rule
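
For reference, these are the textbook forms of the rules as commonly stated (see the Wikipedia article listed under Literature). Here $n$ is the number of observations, $\hat\sigma$ the sample standard deviation, $\operatorname{IQR}$ the interquartile range and $g_1$ the sample skewness. The ceilings are an assumption: this README does not state whether the library rounds up, rounds to nearest, or truncates.

```math
\begin{aligned}
\text{Square root:}\quad & k = \lceil \sqrt{n} \rceil \\
\text{Sturges:}\quad & k = \lceil \log_2 n \rceil + 1 \\
\text{Rice:}\quad & k = \lceil 2\, n^{1/3} \rceil \\
\text{Terrell-Scott:}\quad & k = \lceil (2n)^{1/3} \rceil \\
\text{Doane:}\quad & k = 1 + \log_2 n + \log_2\!\left(1 + \frac{|g_1|}{\sigma_{g_1}}\right), \qquad \sigma_{g_1} = \sqrt{\frac{6(n-2)}{(n+1)(n+3)}} \\
\text{Scott:}\quad & h = \frac{3.49\,\hat\sigma}{n^{1/3}} \\
\text{Freedman-Diaconis:}\quad & h = \frac{2\,\operatorname{IQR}}{n^{1/3}} \\
\text{Width to count:}\quad & k = \left\lceil \frac{\max x - \min x}{h} \right\rceil
\end{aligned}
```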

## Requirements

This library requires PHP 8.3 or newer. Support for older PHP versions, such as the PHP 7.2+ support that [markrogoyski/math-php](https://github.com/markrogoyski/math-php) provides, is not planned.

## Installation

```bash
composer require tomkyle/binning
```

## Usage

The **BinSelection** class provides several methods for determining the optimal number of bins for histogram creation and the optimal bin width. You can either call specific methods directly or use the general `suggestBins()` and `suggestBinWidth()` methods with different strategies.
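
To illustrate the idea of one entry point that dispatches to different rules, here is a minimal sketch in plain PHP. It does not use the library: the function `suggestBinCountSketch()` and the method strings (`'square-root'`, `'sturges'`, `'rice'`, `'terrell-scott'`) are hypothetical, and only the four rules that depend on 𝒏 alone are covered. Rounding up with `ceil()` is an assumption; the library may round differently.

```php
<?php

/**
 * Hypothetical sketch of a strategy-style entry point; not the library's API.
 * Covers only the rules that depend on the sample size n alone.
 */
function suggestBinCountSketch(array $data, string $method): int
{
    $n = count($data);
    if ($n === 0) {
        throw new InvalidArgumentException('Dataset cannot be empty.');
    }

    // Each arm computes the unrounded bin count k for one rule.
    $k = match ($method) {
        'square-root'   => sqrt($n),
        'sturges'       => log($n, 2) + 1,
        'rice'          => 2 * $n ** (1 / 3),
        'terrell-scott' => (2 * $n) ** (1 / 3),
        default         => throw new InvalidArgumentException("Unknown binning method: $method"),
    };

    // Assumption: round up so every rule yields a whole number of bins.
    return (int) ceil($k);
}

$data = range(1, 100); // hypothetical sample of 100 values

foreach (['square-root', 'sturges', 'rice', 'terrell-scott'] as $method) {
    printf("%-14s %2d bins\n", $method, suggestBinCountSketch($data, $method));
}
```

Using `match` keeps each rule on a single line, and its `default` arm raises an `InvalidArgumentException` for unknown names, mirroring the behaviour described under Error Handling.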

### Determine Bin Width

Use the **suggestBinWidth** method to get the *optimal bin width* based on the selected method. The method returns the bin width, often referred to as 𝒉, as a float value.
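
The two width-based rules can be sketched in plain PHP as follows. This is not the library's implementation: all helper names are hypothetical, and the quartiles use linear interpolation between order statistics, which is only one of several common definitions, so results may differ slightly from the library's. The last helper turns a width 𝒉 into a bin count with 𝒌 = ⌈(max − min) / 𝒉⌉.

```php
<?php

/** Sample standard deviation with the n - 1 denominator. */
function sampleStdDev(array $data): float
{
    $n = count($data);
    $mean = array_sum($data) / $n;
    $sumSq = 0.0;
    foreach ($data as $x) {
        $sumSq += ($x - $mean) ** 2;
    }
    return sqrt($sumSq / ($n - 1));
}

/** Quantile by linear interpolation between order statistics (an assumed definition). */
function quantile(array $data, float $p): float
{
    sort($data); // sorts this function's local copy only
    $pos = (count($data) - 1) * $p;
    $lower = (int) floor($pos);
    $upper = (int) ceil($pos);
    return $data[$lower] + ($data[$upper] - $data[$lower]) * ($pos - $lower);
}

/** Scott's rule: h = 3.49 * sigma / n^(1/3). */
function scottWidth(array $data): float
{
    return 3.49 * sampleStdDev($data) / count($data) ** (1 / 3);
}

/** Freedman-Diaconis rule: h = 2 * IQR / n^(1/3). */
function freedmanDiaconisWidth(array $data): float
{
    $iqr = quantile($data, 0.75) - quantile($data, 0.25);
    return 2 * $iqr / count($data) ** (1 / 3);
}

/** Number of bins needed to cover [min, max] with width h. */
function binCountFromWidth(array $data, float $h): int
{
    if ($h <= 0.0) {
        throw new InvalidArgumentException('Bin width must be positive (IQR or sigma may be zero).');
    }
    return (int) ceil((max($data) - min($data)) / $h);
}

$data = [2.1, 2.4, 2.9, 3.3, 3.8, 4.0, 4.6, 5.2, 7.9, 12.5]; // hypothetical sample
foreach (['Scott' => scottWidth($data), 'Freedman-Diaconis' => freedmanDiaconisWidth($data)] as $rule => $h) {
    printf("%-18s h = %.3f, k = %d\n", $rule, $h, binCountFromWidth($data, $h));
}
```

The zero-width guard matters in practice: data with many tied values can have an IQR of zero, which would otherwise cause a division by zero.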

Each rule has its own strengths and caveats:

| Method | Strengths and caveats |
|---|---|
| **Freedman–Diaconis** | …<br>⚠️ May over-smooth heavily skewed or multi-modal data when the IQR is small. |
| **Sturges’ Rule** | Very simple; works well for roughly normal, moderate-sized datasets.<br>⚠️ Ignores outliers and underestimates the bin count for large or skewed samples. |
| **Rice Rule** | Independent of data shape and easy to compute.<br>⚠️ Prone to over- or under-smoothing when the distribution is heavy-tailed or skewed. |
| **Terrell–Scott** | Similar to the *Rice Rule*, but with asymptotically optimal MISE properties; gives more bins than Sturges and adapts better at large 𝒏.<br>⚠️ Still ignores skewness and outliers. |
| **Square Root Rule** | Simply the square root of 𝒏, so it requires no distributional estimates.<br>⚠️ May produce too few bins for complex distributions, or too many for very noisy data. |
| **Doane’s Rule** | Extends *Sturges’ Rule* with a skewness correction, improving performance on asymmetric data.<br>⚠️ Requires estimating the third moment (skewness), which can be unstable for small 𝒏. |
| **Scott’s Rule** | Uses the standard deviation to minimize MISE, giving a good balance for unimodal, symmetric data.<br>⚠️ Sensitive to outliers (inflated $\sigma$) and may underperform on skewed distributions. |

## Literature

Rubia, J.M.D.L. (2024):
**Rice University Rule to Determine the Number of Bins.**
Open Journal of Statistics, 14, 119-149.
DOI: [10.4236/ojs.2024.141006](https://doi.org/10.4236/ojs.2024.141006)

Wikipedia:
**Histogram / Number of bins and width**
https://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width

## Practical Example

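```php
<?php
require __DIR__ . '/vendor/autoload.php';

// Namespace assumed here; adjust it if the package uses a different one.
use tomkyle\Binning\BinSelection;

// Sample data for illustration
$measurements = [12.1, 13.4, 11.8, 15.2, 14.7, 12.9, 16.3, 13.8, 12.5, 14.1, 18.9, 13.2];

$methods = [
    'Sturges’ Rule'          => BinSelection::STURGES,
    'Rice University Rule'   => BinSelection::RICE,
    'Terrell-Scott’s Rule'   => BinSelection::TERRELL_SCOTT,
    'Square Root Rule'       => BinSelection::SQUARE_ROOT,
    'Doane’s Rule'           => BinSelection::DOANE,
    'Scott’s Rule'           => BinSelection::SCOTT,
    'Freedman-Diaconis Rule' => BinSelection::FREEDMAN_DIACONIS,
];

// Compare the suggested bin counts of all rules for the same dataset
foreach ($methods as $name => $method) {
    $bins = BinSelection::suggestBins($measurements, $method);
    echo sprintf("%-22s: %2d bins\n", $name, $bins);
}
```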
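
The rules in such a comparison can disagree noticeably on skewed data. The plain-PHP sketch below, independent of the library, shows why for Doane’s rule: it adds log₂(1 + |g₁| / σ_g₁) to a Sturges-style count, a term that is never negative and grows with the sample skewness g₁, so with the ceilings used here Doane never suggests fewer bins than Sturges. The function names are hypothetical, and computing skewness from the biased central moments is an assumption; the library may use an adjusted estimator.

```php
<?php

/** Sample skewness g1 from biased central moments: m3 / m2^(3/2). */
function sampleSkewness(array $data): float
{
    $n = count($data);
    $mean = array_sum($data) / $n;
    $m2 = 0.0;
    $m3 = 0.0;
    foreach ($data as $x) {
        $d = $x - $mean;
        $m2 += $d ** 2;
        $m3 += $d ** 3;
    }
    $m2 /= $n;
    $m3 /= $n;
    if ($m2 === 0.0) {
        return 0.0; // constant data: treat as symmetric
    }
    return $m3 / $m2 ** 1.5;
}

/** Sturges' rule: k = ceil(log2 n) + 1. */
function sturgesBins(array $data): int
{
    return (int) ceil(log(count($data), 2)) + 1;
}

/** Doane's rule: Sturges-style count plus a non-negative skewness correction. */
function doaneBins(array $data): int
{
    $n = count($data);
    if ($n < 3) {
        throw new InvalidArgumentException("Doane's rule needs at least 3 values.");
    }
    $sigmaG1 = sqrt(6 * ($n - 2) / (($n + 1) * ($n + 3)));
    $correction = log(1 + abs(sampleSkewness($data)) / $sigmaG1, 2);
    return (int) ceil(1 + log($n, 2) + $correction);
}

$skewed = [1, 1, 1, 2, 2, 3, 4, 8, 15, 40]; // hypothetical right-skewed sample
printf("Sturges: %d bins, Doane: %d bins\n", sturgesBins($skewed), doaneBins($skewed));
```

On symmetric data the correction is close to zero and both rules agree; the gap widens as the sample becomes more skewed.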

## Error Handling

All methods throw an `InvalidArgumentException` on invalid input, such as an empty dataset or an unknown method name:

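```php
<?php
require __DIR__ . '/vendor/autoload.php';

// Namespace assumed here; adjust it if the package uses a different one.
use tomkyle\Binning\BinSelection;

try {
    // An empty dataset throws an exception
    $bins = BinSelection::sturges([]);
} catch (InvalidArgumentException $e) {
    echo "Error: " . $e->getMessage();
    // Message: "Dataset cannot be empty to apply the Sturges' Rule."
}

$data = [1.2, 2.3, 3.4, 4.5, 5.6]; // sample data

try {
    // An unknown method name throws an exception
    $bins = BinSelection::suggestBins($data, 'invalid-method');
} catch (InvalidArgumentException $e) {
    echo "Error: " . $e->getMessage();
    // Message: "Unknown binning method: invalid-method"
}
```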

## Development

### Clone repo and install requirements

```bash
$ git clone git@github.com:tomkyle/binning.git
$ cd binning
$ composer install
$ pnpm install
```

### Watch source and run various tests

This watches the **src/** and **tests/** directories for changes and runs a series of checks:

1. Find and run the corresponding unit test with *PHPUnit*.
2. Find possible bugs and documentation issues using *PHPStan*.
3. Analyse code style and suggest newer syntax using *Rector*.

```bash
$ npm run watch
```

**Run PHPUnit**

```bash
$ npm run phpunit
```