https://github.com/praisetompane/app_etl

A toy API-driven ETL application for experimenting with Flask (with gunicorn), SQLAlchemy, Alembic and Postgres.

# app_etl
![build status](https://github.com/praisetompane/app_etl/actions/workflows/app_etl.yaml/badge.svg)

## Objectives
- An API-driven ETL application for experimenting with Flask (with gunicorn), SQLAlchemy, Alembic and Postgres.
- Extract data from the World Health Organization.
- Supported Datasets
    - Malaria Annual Confirmed Cases
    - ...

## Database
- Structure after first run

![](docs/app_etl_erd.png)

- Connect to database
    - The database is accessible on `localhost`, using the port and credentials specified in [env](.env).
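
As a minimal sketch of connecting from Python, the snippet below assembles a Postgres connection URL from a mapping of settings. The variable names (`POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_HOST`, `POSTGRES_PORT`, `POSTGRES_DB`) and the sample values are assumptions for illustration; substitute whatever names the `.env` file actually defines.

```python
from urllib.parse import quote


def build_database_url(env):
    """Assemble a Postgres connection URL from a mapping of settings.

    The key names are hypothetical; adapt them to the real .env file.
    """
    # Percent-encode credentials so characters like '@' or '/' stay valid in a URL.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    database = env["POSTGRES_DB"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Placeholder values, not the project's real credentials.
example_env = {
    "POSTGRES_USER": "etl",
    "POSTGRES_PASSWORD": "p@ss/word",
    "POSTGRES_DB": "app_etl",
}
print(build_database_url(example_env))
```

In practice you would likely pass `os.environ` instead of `example_env` and hand the resulting string to `sqlalchemy.create_engine`.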

## Project Structure
- docs: project documentation lives in here.
- src: production code lives in this folder and is divided into the modules below:
    - app_etl: project package
        - api:
            - the API to the application lives in this module.
            - the current implementation is a REST API, but a gRPC API, a CLI, etc. would also be implemented in here.
        - config:
            - configurable values live in here.
            - these are values such as Hand Ranks, Card Ranks.
            - as the system scales, you could migrate these into a database so that config can change independently, without restarting the application.
        - core:
            - the domain logic of the application lives in this module.
        - gateway:
            - all external interaction objects (e.g. files, external APIs, etc.) live in this module.
        - model:
            - the domain models live in this module.
        - repository:
            - data interaction (persistence and access) concerns live in this module.
        - app.py:
            - entry point to start up the application.
- tests: test code lives in this folder.
    - the tests are intentionally separated from production code.
    - benefits:
        - tests can run against an installed version after executing `pip install .`.
        - tests can run against the local copy with an editable install after executing `pip install --editable .`.
        - when using Docker, the entire app_etl folder can be copied without needing to exclude tests, which we don't release to PROD.
    - more in-depth discussion here: https://docs.pytest.org/en/latest/explanation/goodpractices.html#choosing-a-test-layout-import-rules

- utilities: any useful scripts, such as curl and Postman requests, JSON payloads, software installations, etc.
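
The module boundaries above can be sketched as follows. This is an illustration under assumptions, not the project's actual code: every class, function and field name is hypothetical, the gateway returns canned rows instead of calling an external API, and the repository keeps records in memory instead of using SQLAlchemy.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class CaseCount:
    """model: one cleaned record (hypothetical fields)."""
    country: str
    year: int
    confirmed_cases: int


class Gateway(Protocol):
    def fetch(self) -> Iterable[dict]: ...


class Repository(Protocol):
    def save_all(self, records: List[CaseCount]) -> None: ...


class StaticGateway:
    """gateway: stands in for a call to an external API."""

    def __init__(self, rows):
        self._rows = rows

    def fetch(self):
        return list(self._rows)


class InMemoryRepository:
    """repository: stands in for a database-backed store."""

    def __init__(self):
        self.records = []

    def save_all(self, records):
        self.records.extend(records)


def run_etl(gateway: Gateway, repository: Repository) -> int:
    """core: extract raw rows, transform them into models, load them."""
    raw_rows = gateway.fetch()
    records = [
        CaseCount(row["country"], int(row["year"]), int(row["value"]))
        for row in raw_rows
        if row.get("value") not in (None, "")  # skip rows with no reported value
    ]
    repository.save_all(records)
    return len(records)


# Made-up sample rows, not real WHO figures.
sample_rows = [
    {"country": "ZA", "year": "2020", "value": "100"},
    {"country": "ZA", "year": "2021", "value": ""},
]
repository = InMemoryRepository()
print(run_etl(StaticGateway(sample_rows), repository))
```

Because `run_etl` depends only on the two small protocols, the api layer can wire in real implementations while tests pass in stand-ins like the ones shown.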

## Dependencies
- [Docker](https://docs.docker.com/get-started/)

## Setup Instructions
- The repository is configured to use [devcontainers](https://containers.dev) for development.
- [Developing inside a Container](https://code.visualstudio.com/docs/devcontainers/containers)

## Run Program
- The system automatically starts up as part of loading the project into an editor that supports devcontainers.
- If you would like to run the prod image, change `dockerfile: Dockerfile.dev` to `dockerfile: Dockerfile` in [docker-compose](docker-compose.debug.yml).
- Run an ETL
```shell
# specifically imports malaria_annual_confirmed_cases
./utilities/curl/malaria/malaria_annual_confirmed_cases.sh
```
- Debugging
    - Run in debug mode and debug with VS Code:
        - Open the "Run and Debug" view.
        - Click the green play button.

        ![start system output](./docs/vscode_debugging.png)

        - Allow debugging without frozen modules by clicking "Debug Anyway" once the debugger is installed and ready.

        ![bypass frozen modules](./docs/vscode_debugging_frozen.png)

        - The server prints the host and port in the terminal output at the bottom.
        - From here you debug as normal (i.e. add breakpoints, step into code definitions, evaluate code snippets, etc.).
    - If you would like to debug the prod image, change `dockerfile: Dockerfile.dev` to `dockerfile: Dockerfile` in [docker-compose.debug](docker-compose.debug.yml).

## Testing
- Run unit and integration tests
```shell
pytest
```
- Run end-to-end tests
    - Not implemented

## Database State Management

- The database state (i.e. tables, stored procedures, indexes, etc.) is managed using [Alembic](https://alembic.sqlalchemy.org/en/latest/).
- Migrations location: `src/app_etl/migrations`
- Migrations naming scheme: `YYYY_MM_DD_HHMM-rev_name`
    - uses Alembic's full revision scheme defined in `alembic.ini`
    - example: `2025_02_08_0825-98af2865f6fc_create_schema_etl`
- Current database state can be queried with `SELECT * FROM public.alembic_version;`
- To upgrade the database to latest migrations:
```shell
alembic upgrade head
```
- To downgrade the database to the base state:
```shell
alembic downgrade base
```
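
To make the naming scheme concrete, here is a small sketch that rebuilds the example name from its parts. Alembic generates these names itself; the helper `migration_filename` and its slug rule are assumptions for illustration only and are not part of Alembic or this project.

```python
import re
from datetime import datetime


def migration_filename(created_at, revision_id, message):
    """Build a name of the form YYYY_MM_DD_HHMM-<revision id>_<slug>.

    The slug rule (lowercase, non-alphanumerics collapsed to '_') is a
    simplification assumed for this sketch.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", message.lower()).strip("_")
    return f"{created_at:%Y_%m_%d_%H%M}-{revision_id}_{slug}"


print(migration_filename(datetime(2025, 2, 8, 8, 25), "98af2865f6fc", "create schema etl"))
# prints 2025_02_08_0825-98af2865f6fc_create_schema_etl
```

Because the timestamp leads the name, migration files sort chronologically in a directory listing.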

## Git Conventions
- **NB:** The `main` branch is locked and all changes must come through a Pull Request.
- Commit Messages:
    - Provide concise commit messages that describe what you have done.
```shell
# example:
git commit -m "feat(core): algorithm" -m "implement my new shiny faster algorithm"
```
- screenshot of GitHub view
- references:
- https://www.conventionalcommits.org/en/v1.0.0/
- https://www.freecodecamp.org/news/how-to-write-better-git-commit-messages/
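
The commit header format used in the example (`type(scope): description`) can be checked mechanically. The sketch below is one possible approach, not tooling that ships with this repository: the function name, the regular expression and the list of allowed types are all assumptions, and the pattern covers only the header line of a Conventional Commits message.

```python
import re

# Hypothetical, non-exhaustive set of commit types accepted by this sketch.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

# type, optional (scope), optional '!' for breaking changes, then ': description'
HEADER_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<description>\S.*)$"
)


def parse_commit_header(header):
    """Return the parts of a commit header, or None if it does not match."""
    match = HEADER_PATTERN.match(header)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return None
    return {
        "type": match.group("type"),
        "scope": match.group("scope"),  # None when no scope is given
        "description": match.group("description"),
    }


print(parse_commit_header("feat(core): algorithm"))
print(parse_commit_header("implemented stuff"))
```

A check like this could run in a commit hook or in CI, although the Pull Request review remains the main gate.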

**Disclaimer**: This is still a work in progress.