https://github.com/neerajd12/pybatch
Framework to simplify multi-core processing in python.
batch-processing multiprocessing python python-3
- Host: GitHub
- URL: https://github.com/neerajd12/pybatch
- Owner: neerajd12
- License: gpl-3.0
- Created: 2018-03-11T21:37:08.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-04-14T10:55:36.000Z (about 8 years ago)
- Last Synced: 2026-01-14T10:44:26.936Z (5 months ago)
- Topics: batch-processing, multiprocessing, python, python-3
- Language: Python
- Homepage:
- Size: 32.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# pybatch
Framework to simplify multi-core processing in python.
## Installation
pip install pybatch
or download it from PyPI:
https://pypi.python.org/pypi/pybatch/0.1
## Usage
### Create a holder for your data. Extend DataProxy to represent different types of data.
```python
from pybatch.data_proxy import ListDataProxy
data = ListDataProxy(data=list(range(100)))
```
### Create a Partitioner for your data. You can extend DataPartitioner class to create your custom partitioner.
```python
from pybatch.partitioner import SimpleListPartitioner
# `data` is the ListDataProxy created in the previous step
partitioner = SimpleListPartitioner(data_proxy=data, processors=4)
```
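To make the idea of partitioning concrete, here is a minimal standalone sketch of how a list could be split into one contiguous chunk per processor. `split_evenly` is a hypothetical helper written for illustration; it is not part of pybatch and may not match how `SimpleListPartitioner` actually divides the data.

```python
def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(items), parts)
    chunks = []
    start = 0
    for index in range(parts):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


chunks = split_evenly(list(range(10)), 4)
print([len(chunk) for chunk in chunks])  # prints [3, 3, 2, 2]
```

When there are fewer items than parts, the trailing chunks in this sketch are simply empty lists.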
### Create a holder for the results. Extend ResultManager to process the result as you wish.
```python
from pybatch.result import ListResultManager
result_manager = ListResultManager()
```
### Process the data on multiple cores
#### Using the JobExecutor API
* Create your data processing function. Use "DEFAULT_PARTITION_KEY" to read the partition's data from the kwargs. Example:
```python
from pybatch.executor import DEFAULT_PARTITION_KEY
def test_worker(*args, **kwargs):
    data = kwargs[DEFAULT_PARTITION_KEY]
    # Actual processing
    return [i*i for i in data]
```
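Because the worker is a plain function that reads its partition from the kwargs, it can be exercised by hand before any executor is involved. The sketch below assumes nothing about pybatch internals: `PARTITION_KEY` is a hypothetical stand-in for `DEFAULT_PARTITION_KEY`, whose actual value is not stated here, and `square_worker` is an illustrative name.

```python
# Hypothetical stand-in for pybatch's DEFAULT_PARTITION_KEY (actual value unknown).
PARTITION_KEY = "partition"


def square_worker(*args, **kwargs):
    # Read the partition the same way the worker above does.
    data = kwargs[PARTITION_KEY]
    return [i * i for i in data]


# Call the worker directly with one small partition.
print(square_worker(**{PARTITION_KEY: [1, 2, 3]}))  # prints [1, 4, 9]
```

Checking the worker this way keeps processing bugs separate from any multi-process plumbing.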
* Create a JobExecutor instance with your data processing function as the worker, then call execute on it.
```python
from pybatch.executor import PipedJobExecutor
result = PipedJobExecutor(worker=test_worker,
                          data_partitioner=partitioner,
                          result_manager=result_manager).execute().show_results()
```
#### Using the "@Parallelize" Decorator
* Create your processing function and decorate it with "@Parallelize". Use "DEFAULT_PARTITION_KEY" to read the partition's data from the kwargs.
```python
from pybatch.executor import Parallelize, ExecutorCommType, DEFAULT_PARTITION_KEY
@Parallelize(executor_type=ExecutorCommType.pipe,
             data_partitioner=partitioner,
             result_manager=result_manager)
def test_worker(*args, **kwargs):
    data = kwargs[DEFAULT_PARTITION_KEY]
    # Actual processing
    return [i*i for i in data]
```
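For readers unfamiliar with parameterized decorators, the following standalone sketch illustrates the general pattern: a decorator factory captures configuration and replaces the function with one that fans the data out in chunks. It is not pybatch's implementation. `parallelize_sketch`, `KEY`, and the chunking scheme are hypothetical, and a thread pool stands in for worker processes to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

KEY = "partition"  # hypothetical stand-in for DEFAULT_PARTITION_KEY


def parallelize_sketch(data, workers):
    """Decorator factory: run the decorated worker once per chunk of `data`."""
    # Strided slices give each worker roughly the same number of items.
    chunks = [data[i::workers] for i in range(workers)]

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            def run(chunk):
                # Pass the chunk to the worker through kwargs.
                return func(*args, **{**kwargs, KEY: chunk})

            with ThreadPoolExecutor(max_workers=workers) as pool:
                # pool.map returns results in the same order as the chunks.
                return list(pool.map(run, chunks))

        return wrapper

    return decorator


@parallelize_sketch(data=list(range(8)), workers=2)
def squares(*args, **kwargs):
    return [i * i for i in kwargs[KEY]]


print(squares())  # prints [[0, 4, 16, 36], [1, 9, 25, 49]]
```

Calling the decorated function with no arguments triggers the whole fan-out, which mirrors how the decorated workers in this README are called without passing data explicitly.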
#### Using "@Parallelize" with an executor factory to create the job executor once and parallelize multiple functions
* Create a function that returns a JobExecutor and decorate the worker functions with it.
```python
from pybatch.executor import Parallelize, PipedJobExecutor, DEFAULT_PARTITION_KEY

# `partitioner` and `result_manager` are the objects created in the earlier steps
def executor_factory():
    return PipedJobExecutor(data_partitioner=partitioner, result_manager=result_manager)

@Parallelize(executor_factory=executor_factory)
def test_worker(*args, **kwargs):
    data = kwargs[DEFAULT_PARTITION_KEY]
    # Actual processing
    return [i*i for i in data]

@Parallelize(executor_factory=executor_factory)
def test_worker1(*args, **kwargs):
    data = kwargs[DEFAULT_PARTITION_KEY]
    # Actual processing
    return [i*i*i for i in data]
```
* Or create an ExecutorFactory class that returns a JobExecutor and decorate the worker functions with it.
```python
from pybatch.executor import (DEFAULT_PARTITION_KEY, ExecutorFactory,
                              Parallelize, PipedJobExecutor)
from pybatch.partitioner import SimpleListPartitioner

class MyExecutorFactory(ExecutorFactory):
    def __init__(self, arg1, arg2):
        pass

    def executor(self):
        # `data` and `result_manager` are the objects created in the earlier steps
        partitioner = SimpleListPartitioner(processors=4, data_proxy=data)
        return PipedJobExecutor(data_partitioner=partitioner, result_manager=result_manager)

@Parallelize(executor_factory=MyExecutorFactory)
def test_worker(*args, **kwargs):
    data = kwargs[DEFAULT_PARTITION_KEY]
    # Actual processing
    return [i*i for i in data]

@Parallelize(executor_factory=MyExecutorFactory)
def test_worker1(*args, **kwargs):
    data = kwargs[DEFAULT_PARTITION_KEY]
    # Actual processing
    return [i*i*i for i in data]
```
* Call the worker function and use the result
```python
for i in test_worker().show_results():
    print(i.result)
for i in test_worker1().show_results():
    print(i.result)
```
## Samples coming soon.