https://github.com/fandreuz/parallel-mapped-distance-matrix
Parallel mapped distance matrix with NumPy and Numba
https://github.com/fandreuz/parallel-mapped-distance-matrix
hacktoberfest hpc numba numpy
Last synced: 4 months ago
JSON representation
Parallel mapped distance matrix with NumPy and Numba
- Host: GitHub
- URL: https://github.com/fandreuz/parallel-mapped-distance-matrix
- Owner: fandreuz
- License: mit
- Created: 2022-10-08T22:32:22.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2025-02-17T15:50:55.000Z (over 1 year ago)
- Last Synced: 2025-05-07T20:36:55.863Z (about 1 year ago)
- Topics: hacktoberfest, hpc, numba, numpy
- Language: Python
- Homepage:
- Size: 43.9 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Parallel MDM
## Mapped distance matrix
The Mapped Distance Matrix (MDM) of two sets $\mathcal{X}, \mathcal{Y}$ of
n-dimensional points is an algebraic structure which is defined in general as
follows, given a mapping $f$:
$$\mathbf{M}(\mathcal{X}, \mathcal{Y}, f)\_{i,j} := f(\Vert \mathcal{X}\_i - \mathcal{Y}\_j\Vert)$$
where $\Vert \cdot \Vert$ is an appropriate distance notion on the space of
definition of $\mathcal{X}$ and $\mathcal{Y}$.
The problem might be augmented by weighting the contributions with a matrix
of weights $\mathbf{W}$; the updated definition is then:
$$\mathbf{M}(\mathcal{X}, \mathcal{Y}, f)\_{i,j} := \mathbf{W}_{i,j} f(\Vert \mathcal{X}\_{i} - \mathcal{Y}\_{j}\Vert)$$
A particularly popular form of the problem (which is also what we treat in this
repository) occurs when weights are defined individually for the members of
$\mathcal{Y}$ (i.e. the columns of $\mathbf{W}$ are taken constants):
$$\mathbf{M}(\mathcal{X}, \mathcal{Y}, f)\_{i,j} := \mathbf{W}\_{j} f(\Vert \mathcal{X}\_{i} - \mathcal{Y}\_{j}\Vert)$$
### A notable case: uniform grid
In general $\mathcal{X}, \mathcal{Y}$ identify two general sets of points. A
few applications allow more assumptions on the two sets. For instance,
$\mathcal{X}$ might be taken to be an uniform grid. In this case a few
interesting optimizization can be taken into account for the computation of the
matrix.
### More assumptions
Practical applications usually require huge sets of points, which causes
memory errors on commonly used devices. This is why it's preferrable to
compute the vector $\tilde{\mathbf{M}}$ defined below instead of $\mathbf{M}$:
$$\tilde{\mathbf{M}}\_{i} := \sum\_{j} \mathbf{M}\_{i,j}$$
For most use cases this is enough.
## Roadmap
- Algorithms
- [x] Uniform grid algorithm
- [x] Scattered points algorithm
- [ ] Fourier-transfor based algorithm
- [ ] Backends
- [ ] NumPy/Numba
- [ ] PyTorch
- [ ] JAX(?)
- [ ] Parallelization
- [x] Multithreading/Multiprocessing
- [ ] GPU w/ PyTorch
- [ ] GPU w/ JAX
- [ ] CUDA kernels(?)
- [ ] Tests
- [ ] Documentation
- [ ] Benchmark (+comparison with competitors)
- [ ] CPU
- [ ] GPU
- [ ] Several different bin sizes
- [ ] `pts_per_future`
- [ ] Future
- [ ] Periodicity
- [ ] More general about distance definitions