https://github.com/ryancswallace/npdb
Parallel NumPy-like interface for large n-dimensional arrays on disk.
- Host: GitHub
- URL: https://github.com/ryancswallace/npdb
- Owner: ryancswallace
- License: mit
- Created: 2017-12-02T18:12:08.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2017-12-10T21:13:09.000Z (about 7 years ago)
- Last Synced: 2024-11-13T17:55:15.741Z (3 months ago)
- Topics: mapreduce-designpatterns, numpy, numpy-arrays, parallel-computing, python
- Language: Python
- Homepage:
- Size: 75.2 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# npdb: Large Parallelized NumPy Arrays
[**Docs**](https://npdb.readthedocs.io)
| [**Install Guide**](https://npdb.readthedocs.io/en/latest/install.html)
| [**Tutorial**](https://npdb.readthedocs.io/en/latest/tutorial.html)
| [**Examples**](https://github.com/ryancwallace/npdb/tree/master/algs)

npdb is an implementation of large, disk-stored, NumPy-compatible n-dimensional arrays that may exceed available memory. It implements the core multi-dimensional array class `npdb.dbarray`, which supports persistent binary storage and distributed, batch-processed operations. `npdb.dbarray` supports a subset of the `numpy.ndarray` interface.
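
The idea of batch-processed operations over disk-stored data can be illustrated with plain NumPy. The sketch below is not npdb's implementation: `save_chunks` and `chunked_sum` are hypothetical helpers, and the one-`.npy`-file-per-chunk layout is an assumption made for illustration. It shows a map-reduce style sum in which only one chunk is held in memory at a time.

```python
import os
import tempfile

import numpy as np


def save_chunks(arr, directory, rows_per_chunk):
    """Hypothetical helper: split `arr` along axis 0, one .npy file per piece."""
    paths = []
    for i, start in enumerate(range(0, arr.shape[0], rows_per_chunk)):
        path = os.path.join(directory, f"chunk_{i:04d}.npy")
        np.save(path, arr[start:start + rows_per_chunk])
        paths.append(path)
    return paths


def chunked_sum(paths):
    """Hypothetical helper: sum each chunk (map), then add the partial sums (reduce)."""
    total = 0.0
    for path in paths:
        chunk = np.load(path)  # only this chunk is loaded into memory
        total += float(chunk.sum())
    return total


with tempfile.TemporaryDirectory() as tmp:
    data = np.arange(24, dtype=float).reshape(6, 4)
    chunk_paths = save_chunks(data, tmp, rows_per_chunk=2)
    result = chunked_sum(chunk_paths)
```

Because each partial sum depends on a single file, the per-chunk step could in principle be handed to separate workers, which is the property a distributed, batch-processed design relies on.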
## Background
The `numpy.memmap` class also supports arrays stored on disk, but it has several limitations. Arrays are stored in a single file, and the size of the file cannot exceed 2 GB on 32-bit systems. This design both restricts the size of the data and rules out distributed storage. On the other hand, `numpy.memmap` objects support the entire `numpy.ndarray` interface.

The npdb library strikes a different balance: array sizes are constrained only by available disk space, and arrays can be distributed across multiple files. The cost of this capability is that only a limited subset of the NumPy interface is supported.
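
For comparison, here is a minimal sketch of the `numpy.memmap` approach described above. It uses only standard NumPy calls; the file name `array.dat` and the array shape are arbitrary choices for illustration. Note that the whole array is backed by one file, whose size is fixed by the shape and dtype.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "array.dat")  # arbitrary file name for this sketch

    # Create a disk-backed array; all of its data lives in this single file.
    mm = np.memmap(path, dtype=np.float64, mode="w+", shape=(100, 100))
    mm[0, :] = 1.0  # assignments are written through to the mapped file
    mm.flush()

    # The file holds 100 * 100 elements of 8 bytes each.
    file_size = os.path.getsize(path)

    # Reopen read-only; the usual ndarray operations are available.
    ro = np.memmap(path, dtype=np.float64, mode="r", shape=(100, 100))
    first_row_sum = float(ro[0].sum())

    # Drop the mappings before the temporary directory is removed.
    del mm, ro
```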
## Example
```python
import npdb

# create on disk a 3D array of floats with length 100 along each axis
db_arr = npdb.dbarray((100, 100, 100), float)

# slice from array
```
## Installation
You can install npdb using pip by running

```
$ pip install npdb
```

## Testing
Tests use the standard-library `unittest` module and can be run after installation with

```
$ python -m unittest discover
```

## License
MIT License (see `LICENSE`). Copyright (c) 2017 Ryan Wallace.

## Authors
Ryan Wallace. [email protected].

# IN PRE-ALPHA DEV
TODO:

---infrastructure---
* autodoc for reference
* host
* link examples, install guide, docs, tutorial

---core content---
* set minimum size before spill over onto disk
* better handling of open(file) re exceptions, remove file on delete
* magic methods
* indexing location, module names?
* creation: basic - npdb.empty(), from data - npdb.array()
* free dbarray? parameter to save explicitly, or parameter for persistence? overload del
* keep track of what's in memory to avoid repetitive pulls
* default size params, data dir
* rewrite arraymap and indexing in Cython/C/numpy?
* compression