https://github.com/salpreh/transpydata
A minimalist framework for managing migrations
- Host: GitHub
- URL: https://github.com/salpreh/transpydata
- Owner: salpreh
- License: apache-2.0
- Created: 2020-12-08T19:17:41.000Z (over 5 years ago)
- Default Branch: develop
- Last Pushed: 2021-05-30T23:01:24.000Z (almost 5 years ago)
- Last Synced: 2025-10-28T17:59:51.728Z (5 months ago)
- Topics: etl, framework, migrations, python, tool
- Language: Python
- Homepage:
- Size: 87.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# TransPyData
[![PyPI version](https://badge.fury.io/py/transpydata.svg)](https://badge.fury.io/py/transpydata)
![License](https://img.shields.io/github/license/salpreh/transpydata.svg)
**A minimal framework for managing migrations**
---
## Overview
TransPyData implements a generic pipeline for performing migrations. It has two main components. The first is the `TransPy` class, which executes the migration pipeline according to its configuration. The second is the _data service_ implementations (`IDataInput`, `IDataProcess` and `IDataOutput`); these services manage how data is gathered, processed and sent to the new destination.
### TransPy
The `TransPy` class manages the migration pipeline. It needs to be provided with an instance of:
- `IDataInput`: Manages the gathering of source data.
- `IDataProcess`: Manages data transformation and filtering before passing it to the data output.
- `IDataOutput`: Manages data sending to the new destination.
_**NOTE**: See the data services overview below._
Apart from the data services, there are other optional configuration values:
```python
trans_py = TransPy()
config = {
    'datainput_source': [], # When working with a single record pipeline, this should be an iterable of data to feed IDataInput
    'datainput_by_one': False, # Enable single record pipeline on input
    'dataprocess_by_one': False, # Enable single record pipeline on processing
    'dataoutput_by_one': False, # Enable single record pipeline on output
'datainput_by_one': False, # Enable single record pipeline on input
'dataprocess_by_one': False, # Enable single record pipeline on processing
'dataoutput_by_one': False, # Enable single record pipeline on output
}
trans_py.configure(config)
```
The values in the snippet are the defaults, so by default the migration moves all data through the pipeline at once.
#### All processing mode
When all data services have the "_by\_one_" flag set to `False`, the migration moves all data through the pipeline at once: the `TransPy` instance calls `get_all` on the configured `IDataInput` to gather all input data, passes the result to `process_all` of `IDataProcess`, and passes that result to `send_all` of `IDataOutput`. Finally, `TransPy` returns a list with the `IDataOutput` results.
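The flow above can be sketched with simple stub services. The method names (`get_all`, `process_all`, `send_all`) come from the text; the stub classes themselves are illustrative assumptions, not the library's implementations:

```python
# Stub data services illustrating the all-processing flow.
# These are NOT transpydata classes; they only mimic the
# get_all / process_all / send_all call chain described above.

class ListDataInput:
    def __init__(self, records):
        self._records = records

    def get_all(self):
        # Gather every source record at once
        return self._records


class UpperDataProcess:
    def process_all(self, records):
        # Transform all records in one pass
        return [r.upper() for r in records]


class CollectDataOutput:
    def send_all(self, records):
        # "Send" all records and return one result per record
        return [{'record': r, 'ok': True} for r in records]


def run_all_mode(datainput, dataprocess, dataoutput):
    # All-processing mode: each stage receives the whole dataset
    return dataoutput.send_all(dataprocess.process_all(datainput.get_all()))


results = run_all_mode(ListDataInput(['a', 'b']), UpperDataProcess(), CollectDataOutput())
```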
#### Single record mode
If the "_by\_one_" flags are `True`, records are queried one by one and moved individually through the whole pipeline. The `IDataOutput` results are accumulated and returned as a list at the end of processing, so the `TransPy` return type stays the same.
There are also mixed cases. What if ***datainput*** and ***dataprocess*** are in "_by\_one_" mode but ***dataoutput*** is not? In that case the data is gathered and processed one record at a time; once processing (`IDataProcess`) finishes, the results are accumulated and `IDataOutput` is called once with all the data. The case where ***dataprocess*** and ***dataoutput*** are in "_by\_one_" mode is similar: data is gathered all at once and then piped record by record through `IDataProcess` and `IDataOutput`.
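The first mixed case can be sketched as a loop that accumulates per-record results before a single output call. The function and callable names below are illustrative assumptions, not transpydata API:

```python
# Sketch of the mixed mode: input and process run record by record,
# results accumulate, and the output stage is called once with everything.

def run_mixed_mode(records, process_one, send_all):
    processed = []
    for record in records:                     # datainput in "by one" mode
        processed.append(process_one(record))  # dataprocess in "by one" mode
    return send_all(processed)                 # dataoutput gets all data at once


out = run_mixed_mode([1, 2, 3], lambda r: r * 10, lambda rs: {'sent': rs})
```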
### Data services
_under construction_
## Getting started
To start a migration, create an instance of `TransPy` and configure it. At least instances of `IDataInput`, `IDataProcess` and `IDataOutput` need to be provided. The data services may also need to be configured before starting the migration. Here is a code example:
```python
import json
from transpydata import TransPy
from transpydata.config.datainput.MysqlDataInput import MysqlDataInput
from transpydata.config.dataprocess.NoneDataProcess import NoneDataProcess
from transpydata.config.dataoutput.RequestDataOutput import RequestDataOutput
def main():
    # Configure input
    mysql_input = MysqlDataInput()
    config = {
        'db_config': {
            'user': 'root',
            'password': 'TryingTh1ngs',
            'host': 'localhost',
            'port': '3306',
            'database': 'migration'
        },
        'get_one_query': None,  # We'll go with the all query
        'get_all_query': """
            SELECT s.staff_Id, s.staff_name, s.staff_grade, m.module_Id, m.module_name
            FROM staff s
            LEFT JOIN teaches t ON s.staff_Id = t.staff_Id
            LEFT JOIN module m ON t.module_Id = m.module_Id
        """,
        'all_query_params': {}  # No where clause, no interpolation
    }
    mysql_input.configure(config)

    # Configure process
    none_process = NoneDataProcess()

    # Configure output
    request_output = RequestDataOutput()
    request_output.configure({
        'url': 'http://localhost:8008',
        'req_verb': 'POST',
        'headers': {
            'content-type': 'application/json',
            'accept-encoding': 'application/json',
            'x-app-id': 'MT1'
        },
        'encode_json': True,
        'json_response': True
    })

    # Configure TransPy
    trans_py = TransPy()
    trans_py.datainput = mysql_input
    trans_py.dataprocess = none_process
    trans_py.dataoutput = request_output

    res = trans_py.run()
    print(json.dumps(res))


if __name__ == '__main__':
    main()
```
A full working example can be found at `examples/mysql_to_http/`; there is a [docker-compose](https://docs.docker.com/compose/gettingstarted/#step-6-re-build-and-run-the-app-with-compose) file to launch a MySQL instance and a webserver.
## Custom data services
For now, you can check the interfaces `IDataInput`, `IDataProcess` and `IDataOutput` to see what needs to be implemented in a custom data service.
_(I'll improve this section in the future)_
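As a rough sketch, a custom input service could look like the following. Only `get_all`, `process_all` and `send_all` appear in this README; the `configure` hook and everything else here are assumptions, so check the actual interfaces before relying on these names:

```python
# Hedged sketch of a custom data service. The method names below are
# assumptions based on this README (get_all, configure); verify them
# against the real IDataInput interface before use.

class CsvLineDataInput:
    """Illustrative input service turning in-memory CSV lines into dicts."""

    def configure(self, config):
        # 'lines' is a hypothetical config key for this sketch
        self._lines = config['lines']

    def get_all(self):
        # Split the header and rows, then zip each row into a record dict
        header, *rows = [line.split(',') for line in self._lines]
        return [dict(zip(header, row)) for row in rows]


inp = CsvLineDataInput()
inp.configure({'lines': ['id,name', '1,ada', '2,grace']})
records = inp.get_all()
```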