Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/uwdata/fast-kde-benchmarks
Research archive of methods and benchmarks for fast, approximate Gaussian kernel density estimation.
https://github.com/uwdata/fast-kde-benchmarks
Last synced: 7 days ago
JSON representation
Research archive of methods and benchmarks for fast, approximate Gaussian kernel density estimation.
- Host: GitHub
- URL: https://github.com/uwdata/fast-kde-benchmarks
- Owner: uwdata
- Created: 2021-10-19T23:12:55.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2021-10-19T23:29:07.000Z (about 3 years ago)
- Last Synced: 2024-04-16T06:17:39.080Z (7 months ago)
- Language: JavaScript
- Size: 47.9 KB
- Stars: 5
- Watchers: 5
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Fast & Accurate Gaussian Kernel Density Estimation
Methods and benchmarks for fast, approximate Gaussian kernel density estimation. This repo is for archiving research results. _Want to use fast KDE in your own projects?_ **See [github.com/uwdata/fast-kde](https://github.com/uwdata/fast-kde).**
This repo contains code (`src/`) and benchmark (`benchmarks/`) scripts for the IEEE VIS 2021 Short Paper [Fast & Accurate Gaussian Kernel Density Estimation](https://idl.cs.washington.edu/papers/fast-kde). All code is written as ESM modules. Running benchmarks requires Node.js version 13.2 or higher.
## Citation
If you use or build on this repo in academic work, please use this citation:
```
@inproceedings{2021-Heer-FastKDE,
title = {Fast \& Accurate Gaussian Kernel Density Estimation},
author = {Jeffrey Heer},
booktitle = {IEEE VIS Short Papers},
year = {2021},
url = {http://idl.cs.washington.edu/papers/fast-kde},
}
```## Build Instructions
To build a bundle (ESM module or minified UMD):
1. Run `yarn` to install dependencies.
2. Run `yarn build` to build the bundles.Compiled bundles will be written to the `dist` directory.
## Benchmark Instructions
1. Run `yarn` to install dependencies.
2. Run `yarn benchmarks` to run benchmarks.
3. Wait until the benchmarks complete...Benchmark result datasets will be written to the `results` directory as JSON files.
## API Documentation
### Density Estimate Helpers
Convenience methods for performing 1D or 2D density estimation.
#
kde.density1d()Creates a new 1D density estimate helper with default settings.
*Example*
```js
// perform 1D estimation with bandwidth = 1 over domain [0, 10]
// use default grid size (512) and method (kdeDeriche1d)
// returns an array of [ { x, value }, ... ] points
kde.density1d()
.bandwidth(1)
.extent([0, 10])
.points([1, 2, 5, 5, 6, 9])
```#
density1d(data)Computes a 1D Gaussian kernel density estimate using the current settings for the given *data* and returns a Float64Array of gridded density values. To instead produce an array of objects containing coordinate values and density estimates, use [density1d.points()](#density1d_points).
#
density1d.points(data)Computes a 1D Gaussian kernel density estimate using the current settings for the given *data* and returns an array of objects containing the grid coordinate value (`x`) and density value (`value`).
#
density1d.x([x])Get or set the *x* coordinate getter for the input data. Defaults to the identity function, which assumes a flat array of values.
#
density1d.grid([grid])Get or set the *grid* function for binning the input data into a one-dimensional grid, such as [grid1d_linear](#grid1d_linear) (the default) or [grid1d_simple](#grid1d_simple).
#
density1d.size([size])Get or set the *size* (default 512) of the binned data grid.
#
density1d.bandwidth([bandwidth])Get or set the *bandwidth* (standard deviation) of the Gaussian kernel. If set to `null` or zero (the default), the [normal reference density](#nrd) heuristic will be used to select a bandwidth automatically.
#
density1d.extent([extent])Get or set the *extent* ([x0, x1]) of the data domain over which to perform density estimation. Any data points outside this extent will be ignored. The value defaults to [0, 1]. Note that the estimation helper does _not_ automatically select a domain based on the input data.
#
density1d.method([method])Get or set the density estimation *method* to use. One of [kdeDeriche1d](#kdeDeriche1d) (the default), [kdeBox1d](#kdeBox1d), [kdeExtBox1d](#kdeExtBox1d), or [kdeCDF1d](#kdeCDF1d).
#
kde.density2d()Creates a new 2D density estimate helper with default settings.
*Example*
```js
// perform 2D estimation with bandwidth = 1 over domain [[0, 10], [0, 10]]
// use default grid size ([256, 256]) and method (kdeDeriche2d)
// returns an array of [ { x, y, value }, ... ] points
kde.density2d()
.bandwidth([1, 1])
.extent([[0, 10], [0, 10]])
.points([[1, 1], [1, 2], [5, 4], [5, 3], [6, 2], [8, 7]])
```#
density2d(data)Computes a 2D Gaussian kernel density estimate for the given *data* using the current settings and returns a Float64Array of gridded density values. To instead produce an array of objects containing coordinate values and density estimates, use [density2d.points()](#density2d_points).
#
density2d.points(data)Computes a 2D Gaussian kernel density estimate for the given *data* using the current settings and returns an array of objects containing the grid coordinate (`x`, `y`) and density (`value`) values.
#
density2d.x([x])Get or set the *x* coordinate getter for the input data. Defaults to index 0 (i.e., the first element if individual data points are provided as arrays).
#
density2d.y([y])Get or set the *y* coordinate getter for the input data. Defaults to index 1 (i.e., the second element if individual data points are provided as arrays).
#
density2d.grid([grid])Get or set the *grid* function for binning the input data into a two dimensional grid, such as [grid2d_linear](#grid2d_linear) (the default) or [grid2d_simple](#grid2d_simple).
#
density2d.size([size])Get or set the *size* (default [256, 256]) of the binned data grid.
#
density2d.bandwidth([bandwidth])Get or set the *bandwidth*s (standard deviations) of the Gaussian kernels as a two element array. If an entry is set to `null` or zero (the defaults), the [normal reference density](#nrd) heuristic will be used to select a bandwidth automatically for that dimension.
#
density2d.extent([extent])Get or set the *extent*s ([[x0, x1], [y0, y1]]) of the data domain over which to perform density estimation. Any data points outside this extent will be ignored. The value defaults to [[0, 1], [0, 1]]. Note that the estimation helper does _not_ automatically select domains based on the input data.
#
density2d.method([method])Get or set the density estimation *method* to use. One of [kdeDeriche2d](#kdeDeriche2d) (the default), [kdeBox2d](#kdeBox2d), [kdeExtBox2d](#kdeExtBox2d), or [kdeCDF2d](#kdeCDF2d).
### 1D Density Estimation Methods
#
kde.kdeCDF1d(data, extent, size, bandwidth)Computes a 1D Gaussian kernel density estimate for an array of numeric *data* values via direct calculation of the Gaussian cumulative distribution function. This method can be very slow in practice and is provided only for comparison and testing purposes; see [kdeDeriche1d](#kdeDeriche1d) instead. The *extent* parameter is an [x0, x1] array indicating the data domain over which to compute the estimates. Any data points lying outside this domain will be ignored. The *size* parameter indicates the number of bins (grid steps) to use. The *bandwidth* parameter indicates the width (standard deviation) of the kernel function. Returns a Float64Array of gridded density estimates.
#
kde.kdeBox1d(data, extent, size, bandwidth, grid1d)Computes a 1D Gaussian kernel density estimate for an array of numeric *data* values using the standard box filtering method. This method is not recommended for normal use, see [kdeDeriche1d](#kdeDeriche1d) instead. The *extent* parameter is an [x0, x1] array indicating the data domain over which to compute the estimates. Any data points lying outside this domain will be ignored. The *size* parameter indicates the number of bins (grid steps) to use. The *bandwidth* parameter indicates the width (standard deviation) of the kernel function. The *grid1d* parameter indicates the binning method to use, such as [grid1d_linear](#grid1d_linear) or [grid1d_simple](#grid1d_simple). Returns a Float64Array of gridded density estimates.
#
kde.kdeExtBox1d(data, extent, size, bandwidth, grid1d)Computes a 1D Gaussian kernel density estimate for an array of numeric *data* values using the extended box filtering method, which smooths away the quantization error of standard box filtering. This method is not recommended for normal use, see [kdeDeriche1d](#kdeDeriche1d) instead. The *extent* parameter is an [x0, x1] array indicating the data domain over which to compute the estimates. Any data points lying outside this domain will be ignored. The *size* parameter indicates the number of bins (grid steps) to use. The *bandwidth* parameter indicates the width (standard deviation) of the kernel function. The *grid1d* parameter indicates the binning method to use, such as [grid1d_linear](#grid1d_linear) or [grid1d_simple](#grid1d_simple). Returns a Float64Array of gridded density estimates.
#
kde.kdeDeriche1D(data, extent, size, bandwidth, grid1d)Computes a 1D Gaussian kernel density estimate for an array of numeric *data* values using Deriche's recursive filter approximation. This method is recommended for normal use. The *extent* parameter is an [x0, x1] array indicating the data domain over which to compute the estimates. Any data points lying outside this domain will be ignored. The *size* parameter indicates the number of bins (grid steps) to use. The *bandwidth* parameter indicates the width (standard deviation) of the kernel function. The *grid1d* parameter indicates the binning method to use, such as [grid1d_linear](#grid1d_linear) or [grid1d_simple](#grid1d_simple). Returns a Float64Array of gridded density estimates.
### 2D Density Estimation Methods
#
kde.kdeCDF2d(data, extent, size, bandwidth)Computes a 2D Gaussian kernel density estimate for two arrays of numeric *data* values via direct calculation of the Gaussian cumulative distribution function. This method can be very slow in practice and is provided only for comparison and testing purposes; see [kdeDeriche2d](#kdeDeriche2d) instead. The input *data* should be an array with two entries: a numeric x-value array and a numeric y-value array. The *extent* parameter should be an [[x0, x1], [y0, y1]] array of data domain extents. The *size* parameter should be a [sizeX, sizeY] array indicating the grid dimensions. The *bandwidth* parameter should be a [bandwidthX, bandwidthY] array. Returns a Float64Array of gridded density estimates.
#
kde.kdeBox2d(data, extent, size, bandwidth, grid2d)Computes a 2D Gaussian kernel density estimate for two arrays of numeric *data* values using the standard box filtering method. This method is not recommended for normal use, see [kdeDeriche2d](#kdeDeriche2d) instead. The input *data* should be an array with two entries: a numeric x-value array and a numeric y-value array. The *extent* parameter should be an [[x0, x1], [y0, y1]] array of data domain extents. The *size* parameter should be a [sizeX, sizeY] array indicating the grid dimensions. The *bandwidth* parameter should be a [bandwidthX, bandwidthY] array. The *grid2d* parameter indicates the binning method to use, such as [grid2d_linear](#grid2d_linear) or [grid2d_simple](#grid2d_simple). Returns a Float64Array of gridded density estimates.
#
kde.kdeExtBox2d(data, extent, size, bandwidth, grid2d)Computes a 2D Gaussian kernel density estimate for two arrays of numeric *data* values using the extended box filtering method, which smooths away the quantization error of standard box filtering. This method is not recommended for normal use, see [kdeDeriche2d](#kdeDeriche2d) instead. The input *data* should be an array with two entries: a numeric x-value array and a numeric y-value array. The *extent* parameter should be an [[x0, x1], [y0, y1]] array of data domain extents. The *size* parameter should be a [sizeX, sizeY] array indicating the grid dimensions. The *bandwidth* parameter should be a [bandwidthX, bandwidthY] array. The *grid2d* parameter indicates the binning method to use, such as [grid2d_linear](#grid2d_linear) or [grid2d_simple](#grid2d_simple). Returns a Float64Array of gridded density estimates.
#
kde.kdeDeriche1D(data, extent, size, bandwidth, grid2d)Computes a 2D Gaussian kernel density estimate for two arrays of numeric *data* values using Deriche's recursive filter approximation. This method is recommended for normal use. The input *data* should be an array with two entries: a numeric x-value array and a numeric y-value array. The *extent* parameter should be an [[x0, x1], [y0, y1]] array of data domain extents. The *size* parameter should be a [sizeX, sizeY] array indicating the grid dimensions. The *bandwidth* parameters should be a [bandwidthX, bandwidthY] array. The *grid2d* parameter indicates the binning method to use, such as [grid2d_linear](#grid2d_linear) or [grid2d_simple](#grid2d_simple). Returns a Float64Array of gridded density estimates.
### Utility Methods
Utility methods for grid construction and bandwidth suggestion.
#
kde.grid1d_linear(data, size, init, scale[, offset])Bins a set of numeric *data* values into a 1D output grid of size *size*, using linear interpolation of point mass across adjacent bins. The *init* value indicates the minimum grid value for the data domain. The *scale* parameter is a numeric value for scaling data values to a [0, 1] domain. The optional *offset* parameter (default 0) indicates the number of bins by which to offset the start of the data domain, in the case of padded grids with extra spacing. Returns a Float64Array of gridded values.
#
kde.grid1d_linear(data, size, init, scale[, offset])Bins a set of numeric *data* values into a 1D output grid of size *size*, where the full weight of a point is assigned to the nearest enclosing bin. The *init* value indicates the minimum grid value for the data domain. The *scale* parameter is a numeric value for scaling data values to a [0, 1] domain. The optional *offset* parameter (default 0) indicates the number of bins by which to offset the start of the data domain, in the case of padded grids with extra spacing. Returns a Float64Array of gridded values.
#
kde.grid2d_linear(data, size, init, scale, offset)Bins a set of numeric *data* values into a 2D output grid of size *size*, using linear interpolation of point mass across adjacent bins. The input *data* should be an array with two entries: a numeric x-value array and a numeric y-value array. The *size* parameter sets the grid dimensions and should be an [sizeX, sizeY] array containing two integers. The *init* parameter ([initX, initY]) indicates the minimum grid values for the data domains. The *scale* parameter ([scaleX, scaleY]) provides numeric values for scaling data values to a [0, 1] domain. The *offset* parameter ([offsetX, offsetY]) indicates the number of bins by which to offset the start of the data domains, in the case of padded grids with extra spacing. Returns a Float64Array of gridded values.
#
kde.grid2d_simple(data, size, init, scale, offset)Bins a set of numeric *data* values into a 2D output grid of size *size*, where the full weight of a point is assigned to the nearest enclosing bin. The input *data* should be an array with two entries: a numeric x-value array and a numeric y-value array. The *size* parameter sets the grid dimensions and should be an [sizeX, sizeY] array containing two integers. The *init* parameter ([initX, initY]) indicates the minimum grid values for the data domains. The *scale* parameter ([scaleX, scaleY]) provides numeric values for scaling data values to a [0, 1] domain. The *offset* parameter ([offsetX, offsetY]) indicates the number of bins by which to offset the start of the data domains, in the case of padded grids with extra spacing. Returns a Float64Array of gridded values.
#
kde.nrd(data)Calculates a suggested bandwidth for an array of numeric *data* values, using Scott's normal reference distribution (NRD) heuristic.