https://github.com/pnnl/ddks

A high-dimensional Kolmogorov-Smirnov distance for comparing high dimensional distributions

machine-learning physics statistics

Host: GitHub
URL: https://github.com/pnnl/ddks
Owner: pnnl
License: other
Created: 2021-06-24T14:50:30.000Z (over 3 years ago)
Default Branch: master
Last Pushed: 2022-09-20T14:27:38.000Z (over 2 years ago)
Last Synced: 2024-10-29T10:55:34.604Z (3 months ago)
Topics: machine-learning, physics, statistics
Language: Jupyter Notebook
Homepage:
Size: 2.15 MB
Stars: 23
Watchers: 5
Forks: 6
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.md


# ddKS - a d-dimensional Kolmogorov-Smirnov Test

*Alex Hagen¹, Shane Jackson¹, James Kahn², Jan Strube¹, Isabel Haide², Karl Pazdernik¹, and Connor Hainje¹*

¹: Pacific Northwest National Laboratory,
²: Karlsruhe Institute of Technology

This code accompanies our paper submitted to IEEE Transactions on
Pattern Analysis and Machine Intelligence titled "Accelerated Computation of a
High Dimensional Kolmogorov-Smirnov Distance" ([arXiv](https://arxiv.org/abs/2106.13706)).

As of 6/25/2021, three methods are implemented:

* ddKS - d-dimensional KS test calculated per
  * Variable splitting of space (all points, subsample, grid spacing)
* rdKS - ddKS approximation using distance from (d+1) corners
* vdKS - ddKS approximation calculating the ddKS distance between voxels instead of points
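
For intuition, the "all points" variant of the exact method can be sketched in plain Python. This is a naive illustration, not the package's implementation: it assumes the distance is the largest difference in orthant occupancy between the two samples, where every point of either sample is tried as the origin that splits space into 2^d orthants. The names `orthant_fraction` and `naive_ddks` are hypothetical.

```python
from itertools import product


def orthant_fraction(points, origin, signs):
    """Fraction of `points` inside the orthant of `origin` selected by `signs`.

    signs[k] is True for the side x[k] >= origin[k], False for x[k] < origin[k].
    """
    count = 0
    for x in points:
        if all((xk >= ok) == s for xk, ok, s in zip(x, origin, signs)):
            count += 1
    return count / len(points)


def naive_ddks(pred, true):
    """Largest orthant-occupancy gap, using every point of both samples as origin."""
    d = len(pred[0])
    best = 0.0
    for origin in list(pred) + list(true):
        # Each of the 2**d sign patterns picks out one orthant around the origin.
        for signs in product((False, True), repeat=d):
            gap = abs(orthant_fraction(pred, origin, signs)
                      - orthant_fraction(true, origin, signs))
            best = max(best, gap)
    return best


if __name__ == "__main__":
    import random

    random.seed(0)
    p = [tuple(random.random() for _ in range(3)) for _ in range(100)]
    t = [tuple(random.random() for _ in range(3)) for _ in range(50)]
    print(f"Naive ddKS distance: {naive_ddks(p, t):.3f}")
```

Written this way, the cost grows with the square of the total number of points and with 2^d, which is why subsampling, grid spacing, and the approximate methods are attractive for larger samples.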

# Quickstart

To install `ddks`, simply run

```bash
pip install git+https://github.com/pnnl/DDKS
```

or, if you want to develop on DDKS, simply clone this repository into a safe
spot on your computer and run

```bash
pip install -e .
```

from the top level of the repository.

Then you can get started by instantiating one of the `ddks` method classes
and calling it on any pair of torch tensors of shape
`sample_size` x `dimension`. The two tensors may have different sample sizes
but must share the same dimension.

```python
import torch
import ddks

p = torch.rand((100, 3))
t = torch.rand((50, 3))

calculation = ddks.methods.ddKS()
distance = calculation(p, t)
print(f"The ddKS distance is {distance}")
```

To operate on GPU, all you need to do is move the tensors to the device before
calculation:

```python
import torch
import ddks

# Move both samples to the first CUDA device before the calculation
p = torch.rand((100, 3)).to('cuda:0')
t = torch.rand((50, 3)).to('cuda:0')

calculation = ddks.methods.ddKS()
distance = calculation(p, t)

To use one of the accelerated approximations instead, replace
`ddks.methods.ddKS` with `ddks.methods.rdKS` or `ddks.methods.vdKS`. Note that
rdKS and vdKS cannot run on GPU.
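
To give a feel for why a corner-based approximation such as rdKS is cheaper, here is a rough sketch of one plausible reading of "distance from (d+1) corners". It reduces each sample to its distances from a corner of the joint bounding box, runs an ordinary one-dimensional two-sample KS statistic on those distances, and keeps the largest value over the corners. The choice of corners (the all-minimum corner plus one corner per axis) and the names `ks_1d` and `corner_distance_ks` are assumptions made for illustration; the package's actual rdKS may select corners and normalise differently.

```python
import math
from bisect import bisect_right


def ks_1d(a, b):
    """Two-sample 1D KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    best = 0.0
    for v in a + b:
        # Fraction of each sample that is <= v
        fa = bisect_right(a, v) / len(a)
        fb = bisect_right(b, v) / len(b)
        best = max(best, abs(fa - fb))
    return best


def bounding_corners(points, d):
    """Assumed corner choice: the minimum corner plus one corner per axis (d + 1 total)."""
    lo = [min(x[k] for x in points) for k in range(d)]
    hi = [max(x[k] for x in points) for k in range(d)]
    corners = [tuple(lo)]
    for k in range(d):
        corner = list(lo)
        corner[k] = hi[k]  # step to the maximum along axis k only
        corners.append(tuple(corner))
    return corners


def corner_distance_ks(pred, true):
    """Largest 1D KS statistic over distances measured from each assumed corner."""
    d = len(pred[0])
    best = 0.0
    for corner in bounding_corners(list(pred) + list(true), d):
        dist_pred = [math.dist(x, corner) for x in pred]
        dist_true = [math.dist(x, corner) for x in true]
        best = max(best, ks_1d(dist_pred, dist_true))
    return best
```

Each corner needs only a distance computation and a one-dimensional comparison, so the work no longer scales with the 2^d orthants per origin that the exact method considers.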

# Package Structure

1. methods - Callable classes for the xdKS methods (x = d, r, v)
1. data - Contains several data generators to play around with
1. run_scripts - Contains an example run script
1. Unit_tests - Contains unit tests for the repo