
# PyParSVD: Python Parallel Singular Value Decomposition

## Table of contents

* [Description](#description)
* [Parallel IO](#parallel-io)
* [Testing](#testing)
* [References](#references)
* [How to contribute](#how-to-contribute)
* [License](#license)

## Description
The **PyParSVD** library implements both a serial and a parallel singular value decomposition (SVD). The implementation is conveniently:

1. **Distributed** using `MPI4Py` (for parallel SVD);
2. **Streaming** - data can be shown in batches to update the left singular vectors;
3. **Randomized** - further acceleration of any serial components of the overall algorithm.

The **distributed** computation of the SVD follows [(Wang et al 2016)](#Wang-et-al-2016). The **streaming** algorithm used in this library is from [(Levy and Lindenbaum 1998)](#Levy-and-Lindenbaum-1998), where the parallel QR algorithm (the TSQR method) required for the streaming feature follows [(Benson et al 2013)](#Benson-et-al-2013). Finally, the **randomized** algorithm follows [(Halko et al 2011)](#Halko-et-al-2011).
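For intuition, here is a minimal NumPy sketch of the randomized range-finder idea from (Halko et al 2011). It only illustrates the concept and is not the library's internal implementation:

```python
import numpy as np

def randomized_svd(A, k, n_oversample=10):
    """Approximate rank-k SVD of A via a randomized range finder."""
    m, n = A.shape
    # Sample the range of A with a Gaussian test matrix.
    omega = np.random.randn(n, k + n_oversample)
    Y = A @ omega
    # Orthonormalize the samples to obtain a basis Q for the range of A.
    Q, _ = np.linalg.qr(Y)
    # Project A onto the subspace and compute a small, cheap SVD there.
    B = Q.T @ A
    U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_tilde
    return U[:, :k], s[:k], Vt[:k, :]
```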

The library is organized around a base class, [**pyparsvd/parsvd_base.py**](pyparsvd/parsvd_base.py), that implements methods shared across the two derived classes, [**pyparsvd/parsvd_serial.py**](pyparsvd/parsvd_serial.py) and [**pyparsvd/parsvd_parallel.py**](pyparsvd/parsvd_parallel.py). The former implements the *serial SVD*, while the latter implements the *parallel SVD*. We also provide a postprocessing module, [**pyparsvd/postprocessing.py**](pyparsvd/postprocessing.py), that can be used as a standalone package or called directly from the two derived classes.
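Schematically, the layout follows the familiar base/derived pattern. In the sketch below the derived-class and method names match the tutorial further down, but the base-class name and all bodies are placeholders, not the library's actual code:

```python
# Illustrative sketch only -- see the modules above for the real code.
class ParSVD_Base:                          # assumed base-class name
    def save(self):
        ...                                 # shared I/O / postprocessing hooks

class ParSVD_Serial(ParSVD_Base):
    def initialize(self, data): ...         # serial SVD
    def incorporate_data(self, new_data): ...

class ParSVD_Parallel(ParSVD_Base):
    def initialize(self, data): ...         # distributed SVD via mpi4py
    def incorporate_data(self, new_data): ...
```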

A simple case can be implemented following [**tutorials/basic.py**](tutorials/basic.py), where we generate data for the 1D Burgers' equation with the pre-written routine [**tutorials/basic/data/data_splitter.py**](tutorials/basic/data/data_splitter.py). The data generated by the `data_splitter.py` routine is already split across the distributed ranks. In order to run this simple case, you need to run the data-generation routine in [**tutorials/basic/data/**](tutorials/basic/data/), as follows:
```bash
python3 data_splitter.py
```
You can then go to [**tutorials/basic/**](tutorials/basic/), and ensure that there is no shared-memory acceleration from numpy's BLAS backend by setting
```bash
export OPENBLAS_NUM_THREADS=1
```
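If your numpy is linked against a different backend, the analogous variables apply (these are the standard MKL and OpenMP knobs, not PyParSVD-specific settings):
```bash
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1
```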
Following this you may run the serial version of the streaming SVD using
```bash
python3 tutorial_basic_serial.py
```
and a parallel version of the same using
```bash
mpirun -np 6 python3 tutorial_basic_parallel.py
```

This should produce a set of figures under a folder called *results*, similar to the ones below:

**Serial**

*Figures: mode 0, mode 2, and the singular values (serial).*

**Parallel**

*Figures: mode 0, mode 2, and the singular values (parallel).*

***Caution***: Due to differences between the parallel and serial versions of the algorithm, singular vectors may be "flipped" in sign.
An orthogonality check is also deployed as an additional sanity check.
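A minimal sketch of both checks in plain NumPy (the function and argument names are illustrative, standing for the modes recovered by each version):

```python
import numpy as np

def sanity_checks(ser_modes, par_modes, atol=1e-6):
    """Compare serial vs. parallel modes up to sign, and check orthonormality."""
    # Sign-agnostic comparison: each singular vector may be flipped.
    assert np.allclose(np.abs(ser_modes), np.abs(par_modes), atol=atol)
    # The retained modes should form an orthonormal set of columns.
    K = par_modes.shape[1]
    assert np.allclose(par_modes.T @ par_modes, np.eye(K), atol=atol)
```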

The main components of the implementation are as follows:

- import of the libraries
```python
import os
import sys
import numpy as np

# Import library specific modules
sys.path.append(os.path.join("../../"))
from pyparsvd.parsvd_serial import ParSVD_Serial
from pyparsvd.parsvd_parallel import ParSVD_Parallel
```

- instantiation of the serial and parallel SVD objects can be done respectively as
```python
# Construct SVD objects
SerSVD = ParSVD_Serial(K=10, ff=1.0)
ParSVD = ParSVD_Parallel(K=10, ff=1.0, low_rank=True)
```
We note that the parameter `K` corresponds to the number of modes retained after truncation, the parameter `ff` is the forget factor, and the parameter `low_rank=True` enables the randomized SVD.

- reading of the data for both the serial and parallel SVD computations
```python
# Path to data (CFD stands for the current file's directory;
# defined here so the snippet is self-contained)
CFD = os.path.dirname(os.path.abspath(__file__))
path = os.path.join(CFD, '../../tests/data/')

# Serial data
initial_data_ser = np.load(os.path.join(path, 'Batch_0_data.npy'))
new_data_ser = np.load(os.path.join(path, 'Batch_1_data.npy'))
newer_data_ser = np.load(os.path.join(path, 'Batch_2_data.npy'))
newest_data_ser = np.load(os.path.join(path, 'Batch_3_data.npy'))

# Parallel data
initial_data_par = np.load(os.path.join(path, 'points_rank_' + str(ParSVD.rank) + '_batch_0.npy'))
new_data_par = np.load(os.path.join(path, 'points_rank_' + str(ParSVD.rank) + '_batch_1.npy'))
newer_data_par = np.load(os.path.join(path, 'points_rank_' + str(ParSVD.rank) + '_batch_2.npy'))
newest_data_par = np.load(os.path.join(path, 'points_rank_' + str(ParSVD.rank) + '_batch_3.npy'))
```

- serial SVD computation
```python
# Do first modal decomposition -- Serial
SerSVD.initialize(initial_data_ser)

# Incorporate new data -- Serial
SerSVD.incorporate_data(new_data_ser)
SerSVD.incorporate_data(newer_data_ser)
SerSVD.incorporate_data(newest_data_ser)
```

- parallel SVD computation
```python
# Do first modal decomposition -- Parallel
ParSVD.initialize(initial_data_par)

# Incorporate new data -- Parallel
ParSVD.incorporate_data(new_data_par)
ParSVD.incorporate_data(newer_data_par)
ParSVD.incorporate_data(newest_data_par)
```

- basic postprocessing where we plot the results
```python
# Basic postprocessing
if ParSVD.rank == 0:

    # Save results
    SerSVD.save()
    ParSVD.save()

    # Visualize modes
    SerSVD.plot_1D_modes(filename='serial_1d_modes.png')
    ParSVD.plot_1D_modes(filename='parallel_1d_modes.png')
```

## Parallel IO

**PyParSVD** also comes with parallel I/O capability, by virtue of `h5py` and parallel HDF5. Ensure the requisite libraries are available; the easiest way is to use conda as follows (installation can otherwise get a bit tricky):
```bash
conda install -c conda-forge "h5py>=2.9=mpi*"
```
Once this step is complete, you can run the parallel-IO tutorial in [**tutorials/parallel_io/**](tutorials/parallel_io/) by first running
```bash
python data_splitter.py
```
from [**tutorials/parallel_io/data/**](tutorials/parallel_io/data/) and then executing

```bash
mpirun -np 6 python tutorial_parallel_io.py
```
from [**tutorials/parallel_io/**](tutorials/parallel_io/). In this example, it is assumed that your data is in HDF5 (`.h5`) format.
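For reference, here is a minimal sketch of a collective read with h5py's MPI-IO driver (the file and dataset names are hypothetical; the tutorial's actual layout may differ):

```python
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD

# Open the file collectively with the MPI-IO driver.
with h5py.File('data.h5', 'r', driver='mpio', comm=comm) as f:
    dset = f['snapshots']                  # hypothetical dataset name
    # Each rank reads its own contiguous slab of rows.
    rows_per_rank = dset.shape[0] // comm.Get_size()
    start = comm.Get_rank() * rows_per_rank
    local_data = dset[start:start + rows_per_rank, :]
```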

## Testing
Regression tests are deployed using Travis CI, a continuous integration framework.
You can check the current status of **PyParSVD** [here](https://travis-ci.org/Romit-Maulik/PyParSVD).

If you want to run the tests locally, you can do so by:

```bash
cd tests/
mpirun -np 6 python3 -m pytest --with-mpi -v
```
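The `--with-mpi` flag comes from the `pytest-mpi` plugin. A hedged sketch of what such a test can look like (the `pytest.mark.mpi` marker is the plugin's; the test body is illustrative and does not exercise the actual library):

```python
import numpy as np
import pytest

@pytest.mark.mpi(min_size=2)  # marker provided by the pytest-mpi plugin
def test_local_qr_basis_is_orthonormal():
    from mpi4py import MPI
    rank = MPI.COMM_WORLD.Get_rank()

    # Each rank builds a local orthonormal basis and verifies Q^T Q = I;
    # real regression tests would exercise ParSVD_Parallel instead.
    rng = np.random.default_rng(rank)
    Q, _ = np.linalg.qr(rng.standard_normal((50, 5)))
    assert np.allclose(Q.T @ Q, np.eye(5))
```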

## References

#### (Levy and Lindenbaum 1998)
*Sequential Karhunen–Loeve Basis Extraction and its Application to Images.* [[DOI](https://ieeexplore.ieee.org/abstract/document/723422)]

#### (Wang et al 2016)
*Approximate partitioned method of snapshots for POD.* [[DOI](https://www.sciencedirect.com/science/article/pii/S0377042715005774)]

#### (Benson et al 2013)
*Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures.* [[DOI](https://ieeexplore.ieee.org/document/6691583)]

#### (Halko et al 2011)
*Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions.* [[DOI](https://epubs.siam.org/doi/abs/10.1137/090771806)]


## How to contribute

Contributions improving the code and documentation, as well as suggestions for new features, are more than welcome!

The guidelines to contribute are as follows:
1. open a new issue describing the bug you intend to fix or the feature you want to add.
2. fork the project and open your own branch related to the issue you just opened, and call the branch `fix/name-of-the-issue` if it is a bug fix, or `feature/name-of-the-issue` if you are adding a feature.
3. ensure to use 4 spaces for formatting the code.
4. if you add a feature, it should be accompanied by relevant tests to ensure it functions correctly as the code continues to be developed.
5. commit your changes with a self-explanatory commit message.
6. push your commits and submit a pull request. Please remember to rebase properly in order to maintain a clean, linear git history.

## License

See the [LICENSE](LICENSE.rst) file for license rights and limitations (MIT).