https://github.com/pescadores/pescador
Stochastic multi-stream sampling for iterative learning
- Host: GitHub
- URL: https://github.com/pescadores/pescador
- Owner: pescadores
- License: isc
- Created: 2013-12-24T03:56:36.000Z (over 12 years ago)
- Default Branch: main
- Last Pushed: 2024-03-12T15:16:03.000Z (over 2 years ago)
- Last Synced: 2025-10-21T19:54:07.415Z (8 months ago)
- Topics: machine-learning, nyucds, python
- Language: Python
- Homepage: https://pescador.readthedocs.io
- Size: 588 KB
- Stars: 81
- Watchers: 7
- Forks: 11
- Open Issues: 14
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Authors: AUTHORS.md
README
pescador
========
[PyPI](https://pypi.python.org/pypi/pescador)
[conda-forge](https://anaconda.org/conda-forge/pescador)
[CI](https://github.com/pescadores/pescador/actions/workflows/ci.yml)
[codecov](https://codecov.io/gh/pescadores/pescador)
[Documentation status](https://readthedocs.org/projects/pescador/?badge=latest)
[DOI](https://doi.org/10.5281/zenodo.400700)
Pescador is a library for streaming (numerical) data, primarily for use in machine learning applications.
Pescador addresses the following use cases:
- **Hierarchical sampling**
- **Out-of-core learning**
- **Parallel streaming**
These use cases arise in the following common scenarios:
- Say you have three data sources `(A, B, C)` that you want to sample.
For example, each data source could contain all the examples of a particular category.
Pescador can dynamically interleave these sources to provide a randomized stream `D <- (A, B, C)`.
The distribution over `(A, B, C)` need not be uniform: you can specify any distribution you like!
- Now, say you have 3000 data sources, each of which may contain a large number of samples. Maybe that's too much data to fit in RAM at once.
Pescador makes it easy to interleave these sources while maintaining a small *working set*.
Not all sources are simultaneously active, but Pescador manages the working set so you don't have to.
- Loading data can incur substantial latency (e.g., from accessing on-disk storage
or from pre-processing), which can leave your main thread waiting on input.
Pescador can seamlessly move data generation into a background process, so that your main thread can continue working.
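The three scenarios above can be illustrated with a minimal sketch that uses only the Python standard library. It does not use Pescador's API: `make_source`, `mux`, and `prefetch` are hypothetical names invented for this illustration, the weights are assumed to be positive, `n_active` is assumed to be at least 1, and a background thread stands in for the background process described above.

```python
import queue
import random
import threading


def make_source(label, n):
    """Hypothetical data source: a factory returning a fresh iterator of n samples."""
    def factory():
        return ((label, i) for i in range(n))
    return factory


def mux(factories, weights=None, n_active=None, seed=None):
    """Randomly interleave sources while keeping at most n_active of them open.

    factories: zero-argument callables, each returning an iterator.
    weights:   positive relative weights, used both to pick which idle source
               to open and which open source to sample next (uniform if None).
    """
    rng = random.Random(seed)
    weights = list(weights) if weights is not None else [1.0] * len(factories)
    n_active = len(factories) if n_active is None else n_active
    idle = list(range(len(factories)))  # sources not yet opened
    active = {}                         # working set: source index -> iterator

    while idle or active:
        # Top up the working set from the idle pool.
        while idle and len(active) < n_active:
            pick = rng.choices(idle, weights=[weights[i] for i in idle])[0]
            idle.remove(pick)
            active[pick] = factories[pick]()
        # Draw the next sample from a randomly chosen open source.
        keys = list(active)
        idx = rng.choices(keys, weights=[weights[i] for i in keys])[0]
        try:
            yield next(active[idx])
        except StopIteration:
            del active[idx]  # exhausted: free the slot for another source


def prefetch(iterable, buffer_size=8):
    """Fill a bounded queue from a background thread so the consumer rarely waits."""
    buf = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def worker():
        try:
            for item in iterable:
                buf.put(item)
        finally:
            buf.put(done)  # always signal completion, even if the source fails

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()
        if item is done:
            return
        yield item


# Three sources (A, B, C), non-uniform weights, at most two open at a time.
sources = [make_source("A", 5), make_source("B", 3), make_source("C", 4)]
stream = prefetch(mux(sources, weights=[0.5, 0.25, 0.25], n_active=2, seed=0))
for sample in stream:
    print(sample)
```

In this sketch each source is opened once and dropped when exhausted, so the stream ends after every sample has been emitted. A real training loop would more likely recycle sources indefinitely.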
Want to learn more? [Read the docs!](http://pescador.readthedocs.org)
Installation
============
Pescador can be installed from PyPI through `pip`:
```
pip install pescador
```
or with `conda` using the `conda-forge` channel:
```
conda install -c conda-forge pescador
```