https://github.com/vmchale/emd
Earth mover's distance on Nvidia GPUs
- Host: GitHub
- URL: https://github.com/vmchale/emd
- Owner: vmchale
- License: other
- Created: 2016-07-25T19:45:24.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-09-10T02:48:26.000Z (over 9 years ago)
- Last Synced: 2025-04-04T08:01:52.337Z (9 months ago)
- Language: Haskell
- Size: 13.4 MB
- Stars: 15
- Watchers: 4
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
#EMD
Earth mover's distance on Nvidia GPUs
##CUDA
CUDA and the relevant NVIDIA drivers must be installed before anything else.
Both are available from the NVIDIA website: https://developer.nvidia.com/cuda-downloads
##Building
To build, simply run
```
./build
```
in the directory `EMD/`.
If you are on a Linux machine, this copies prebuilt binaries instead of building from source (which takes quite some time because of the dependencies).
EMD requires `DevIL` (an image library) and `imagemagick` to run. If `./build` fails to install these for you, you will have to install them manually.
If `./build` fails completely, jump to the Manual Build section.
After this, the `haskemd` module can be used from any directory. Test it with:
```
python3 test.py
```
##Python
The provided Python wrapper takes two one-dimensional numpy arrays and computes the EMD between them:
```
python3
>>>import haskemd
>>>arrs = haskemd.equalize(haskemd.sinksrand(1023),haskemd.sourcesrand(1023))
>>>haskemd.emd(arrs[0],arrs[1])
232386
```
`sinksrand(n)` and `sourcesrand(n)` just generate test data with `n` bins (1023 in the example above), while the `equalize` function makes the two distributions have equal total mass.
To try your own, consider the following example:
```
>>>import haskemd
>>>import numpy as np
>>>a1 = np.array([1,2,3,4])
>>>a2 = np.array([4,3,2,1])
>>>haskemd.emd(a1, a2)
6
```
You might be prompted for the sudo password while computing the EMD. This is because of a bug upstream in the code for Haskell Accelerate.
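As a cross-check on the four-bin example, here is a small pure-Python sketch that does not use `haskemd` at all. It assumes that the EMD is the unnormalized minimum total transport cost and that the ground distance is the Hamming distance between bin indices (see the Ground distance section). The names `hamming` and `emd_reference` are hypothetical, and the min-cost-flow solver is only an illustration for tiny integer inputs, not the algorithm the library runs on the GPU.

```python
def hamming(i, j):
    """Number of bit positions in which i and j differ."""
    return bin(i ^ j).count("1")


def emd_reference(sources, sinks):
    """Minimum total transport cost between two integer histograms.

    Solved as a min-cost flow using successive shortest paths
    (Bellman-Ford). Only suitable for very small inputs.
    """
    n = len(sources)
    if len(sinks) != n or sum(sources) != sum(sinks):
        raise ValueError("histograms need equal length and equal total mass")

    # Nodes: 0 = super source, 1..n = source bins,
    # n+1..2n = sink bins, 2n+1 = super sink.
    start, end = 0, 2 * n + 1
    edges = []  # [to, remaining capacity, cost]; edge k pairs with edge k ^ 1
    adj = [[] for _ in range(2 * n + 2)]

    def add_edge(u, v, cap, cost):
        adj[u].append(len(edges))
        edges.append([v, cap, cost])
        adj[v].append(len(edges))
        edges.append([u, 0, -cost])  # residual (backward) edge

    for i in range(n):
        add_edge(start, 1 + i, sources[i], 0)
        add_edge(n + 1 + i, end, sinks[i], 0)
        for j in range(n):
            add_edge(1 + i, n + 1 + j, sum(sources), hamming(i, j))

    inf = float("inf")
    total = 0
    while True:
        # Bellman-Ford: cheapest path from start to end in the residual graph.
        dist = [inf] * len(adj)
        prev_edge = [-1] * len(adj)
        dist[start] = 0
        for _ in range(len(adj) - 1):
            changed = False
            for u in range(len(adj)):
                if dist[u] == inf:
                    continue
                for k in adj[u]:
                    v, cap, cost = edges[k]
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev_edge[v] = k
                        changed = True
            if not changed:
                break
        if dist[end] == inf:
            return total  # no augmenting path left: all mass has been moved

        # Find the bottleneck along the path, then push that much flow.
        push, v = inf, end
        while v != start:
            k = prev_edge[v]
            push = min(push, edges[k][1])
            v = edges[k ^ 1][0]
        v = end
        while v != start:
            k = prev_edge[v]
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            v = edges[k ^ 1][0]
        total += push * dist[end]


print(emd_reference([1, 2, 3, 4], [4, 3, 2, 1]))  # 6
```

Under these assumptions the sketch reproduces the value 6 shown above: one unit moves from bin 2 to bin 0 (cost 1), one unit from bin 3 to bin 1 (cost 1), and two units from bin 3 to bin 0 (cost 2 each).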
##Ground distance
In our case, the "ground distance" is the Hamming distance between bin indices. Moreover, the matrix for the metric is generated automatically. This means that an array like
```
np.array([0.25,0.5])
```
must have the correct values in the correct positions, because dij depends only on the bin indices i and j. Here, d01 is the distance between the bin containing 0.25 (index 0) and the bin containing 0.5 (index 1). In general, we have `dij = hammingDistance i j`.
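To make the index-based definition concrete, the following sketch builds the ground-distance matrix for a given number of bins in plain Python. The function name `ground_matrix` is hypothetical and this is not the code the library uses; it only illustrates that the matrix depends on the bin count and not on the values stored in the bins.

```python
def ground_matrix(n):
    """Return the n-by-n matrix whose (i, j) entry is the Hamming
    distance between the binary forms of the indices i and j."""
    return [[bin(i ^ j).count("1") for j in range(n)] for i in range(n)]


# For a two-bin array such as np.array([0.25, 0.5]), d01 is:
print(ground_matrix(2)[0][1])  # 1

# The full matrix for four bins:
for row in ground_matrix(4):
    print(row)
# [0, 1, 1, 2]
# [1, 0, 2, 1]
# [1, 2, 0, 1]
# [2, 1, 1, 0]
```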
##Hamming distance
To get the Hamming distance between two integer indices, say 3 and 7, we first convert them to binary to get 011 and 111. Then we count the positions in which the two bit strings differ, which gives 1 in this case. Hence d37=1.
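The same steps can be written out in a few lines of Python. This is a sketch for illustration only; `hamming_distance` is a hypothetical name, not a function exported by `haskemd`.

```python
def hamming_distance(i, j):
    """Hamming distance between two non-negative integers,
    following the steps described above."""
    # Pad both numbers to the same number of binary digits.
    width = max(i.bit_length(), j.bit_length(), 1)
    bits_i = format(i, "0{}b".format(width))
    bits_j = format(j, "0{}b".format(width))
    # Count the positions where the digits differ.
    return sum(a != b for a, b in zip(bits_i, bits_j))


print(format(3, "03b"), format(7, "03b"))  # 011 111
print(hamming_distance(3, 7))              # 1
```

An equivalent shortcut is `bin(i ^ j).count("1")`, since XOR sets exactly the bits in which the two numbers differ.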
##Stored matrices
After the matrix for a metric is computed, it is stored in the `data/` directory. This means that if you compute an EMD between distributions with 8192 bins, every subsequent calculation between distributions with 8192 bins will be faster. As the number of bins gets larger, it becomes necessary to store these matrices in order to guarantee quick runtime.
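The caching idea can be sketched as follows. The file name, the JSON format, and the function names here are all assumptions made for illustration; the actual layout of the `data/` directory is not described in this README.

```python
import json
import os
import tempfile


def ground_matrix(n):
    """Hamming-distance matrix between bin indices 0..n-1."""
    return [[bin(i ^ j).count("1") for j in range(n)] for i in range(n)]


def load_or_build(n, cache_dir):
    """Return the n-by-n matrix, reading it from cache_dir if a copy exists."""
    path = os.path.join(cache_dir, "hamming_{}.json".format(n))  # hypothetical name
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    matrix = ground_matrix(n)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w") as fh:
        json.dump(matrix, fh)
    return matrix


with tempfile.TemporaryDirectory() as cache_dir:
    first = load_or_build(8, cache_dir)   # computed, then written to disk
    second = load_or_build(8, cache_dir)  # read back from disk
    print(first == second)                # True
```

The first call for a given bin count pays the cost of building the matrix; later calls with the same bin count only read a file.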
##Computers without an NVIDIA GPU
If you want to run the code on your CPU, you can edit the code in `src/Edmonds.hs`. Open it up and the first lines should look like this:
```
{-# LANGUAGE FlexibleContexts #-}
module Edmonds
(
exec) where
import Data.Array.Accelerate as A
import Data.Array.Accelerate.IO
import Data.Array.Accelerate.CUDA
--import Data.Array.Accelerate.Interpreter
import System.Environment
import System.IO.Unsafe (unsafePerformIO)
```
Comment out the line with `Data.Array.Accelerate.CUDA` and uncomment the line with `Data.Array.Accelerate.Interpreter`. Your file should now look like this:
```
{-# LANGUAGE FlexibleContexts #-}
module Edmonds
(
exec) where
import Data.Array.Accelerate as A
import Data.Array.Accelerate.IO
--import Data.Array.Accelerate.CUDA
import Data.Array.Accelerate.Interpreter
import System.Environment
import System.IO.Unsafe (unsafePerformIO)
```
Save the file and run `stack install` to build and install it. The CPU version is reasonably fast for up to about 1000 bins; beyond that it is noticeably slower than the GPU version.
##Manual Build
There are three parts to building manually: downloading dependencies, building the Haskell code, and installing the Python module.
###Dependencies
The only outside dependencies are DevIL (an image processing library) and imagemagick. The website for DevIL is http://openil.sourceforge.net/ and the website for imagemagick is http://www.imagemagick.org/script/binary-releases.php
###Get Stack
Stack is the tool that builds the Haskell code. To install `stack`, simply type
```
wget -qO- https://get.haskellstack.org/ | sh
```
or go to haskellstack.org for help if that fails.
###Building the Haskell
To build the Haskell code, type
```
stack setup && stack install
```
in the directory `EMD/`. This may take a long time.
###Installing the python module
To install the Python module, simply `cd` into `haskemd/` and then type
```
sudo python3 setup.py install
```
At this point, everything should be working, so run `python3 test.py` to make sure this is the case.