https://github.com/tomgorb/project-template-for-production
project template to (help) put a Machine/Deep learning algorithm into production
- Host: GitHub
- URL: https://github.com/tomgorb/project-template-for-production
- Owner: tomgorb
- Created: 2020-07-21T11:35:00.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2024-07-31T15:30:46.000Z (almost 2 years ago)
- Last Synced: 2025-01-09T05:25:28.237Z (over 1 year ago)
- Topics: airflow, bigquery, gcp
- Language: Python
- Homepage:
- Size: 76.2 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
template
-----
This template can be used to productionize a *Machine/Deep learning* algorithm (with a few adjustments).
```main.py``` contains the **tasks** to be run. Some tasks can run either locally (*e.g.* for development purposes) or in the cloud (this template uses GCP AI Platform).
There is a sample DAG, ```dag.py```, that can be used on a production Airflow cluster (Airflow is an orchestrator).
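
As a rough illustration of how a task entry point like ```main.py``` might select a task and an execution target, here is a minimal sketch. The task names, the `--task` and `--env` flags, and the `run` helper are hypothetical and are not taken from the repository; in the real template, the cloud branch would submit work to GCP AI Platform instead of returning a string.

```python
import argparse


# Hypothetical task functions: each one only reports where it ran.
def preprocess(env):
    return f"preprocess ran ({env})"


def train(env):
    # Assumption: with env == "cloud" the real code would submit a job
    # to GCP AI Platform instead of training in-process.
    return f"train ran ({env})"


def predict(env):
    return f"predict ran ({env})"


# Registry mapping task names (illustrative) to callables.
TASKS = {"preprocess": preprocess, "train": train, "predict": predict}


def build_parser():
    parser = argparse.ArgumentParser(description="Run one pipeline task.")
    parser.add_argument("--task", choices=sorted(TASKS), required=True)
    parser.add_argument("--env", choices=["local", "cloud"], default="local")
    return parser


def run(argv):
    """Parse the given argument list and dispatch to the selected task."""
    args = build_parser().parse_args(argv)
    return TASKS[args.task](args.env)


# Example call with an explicit argument list.
print(run(["--task", "train"]))
```

Keeping the dispatch in a function that takes an argument list makes the same entry point easy to call from a shell, from a test, or from an orchestrator task.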

After cloning this repo, you should create a python3 venv (named *venv*, at the repo's root) and install all 3 requirements files:
- *requirements* contains the modules needed by our main.py and custom model, **except** tensorflow;
- *requirements-ml* contains our custom library (not available on PyPI), which GCP AI Platform needs (it is uploaded to Google Cloud Storage when needed);
- *requirement-extra* pins the same tensorflow version that AI Platform uses (it depends on the runtime environment and is installed there by default).
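
The setup steps above could be scripted roughly as follows. This is a sketch under assumptions: the README does not give the file extensions, so the ```.txt``` names are guesses, and the ```bin/python``` path assumes a POSIX layout. The helper names `pip_commands` and `setup` are made up for illustration.

```python
import subprocess
import venv
from pathlib import Path

# Assumed file names: the ".txt" suffix is a guess, the stems come from the README.
REQUIREMENT_FILES = [
    "requirements.txt",
    "requirements-ml.txt",
    "requirement-extra.txt",
]


def pip_commands(venv_dir, requirement_files):
    """Build one `pip install -r` command per requirements file."""
    python = Path(venv_dir) / "bin" / "python"  # POSIX layout (assumption)
    return [
        [str(python), "-m", "pip", "install", "-r", name]
        for name in requirement_files
    ]


def setup(venv_dir="venv"):
    """Create the venv at the repo root and install every requirements file."""
    venv.create(venv_dir, with_pip=True)
    for command in pip_commands(venv_dir, REQUIREMENT_FILES):
        subprocess.run(command, check=True)
```

Calling `setup()` from the repository root would create *venv* and run the three installs in order; nothing is executed on import.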
There is a **MyModel** class containing skeletons for the *preprocess*, *train* and *predict* methods.
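
To make the shape of such a class concrete, here is a toy sketch. Only the class name and the three method names come from the README; the signatures, the `mean` attribute, and the "predict the training mean" logic are placeholders and may differ from the repository's code.

```python
class MyModel:
    """Hypothetical skeleton; the real class's signatures may differ."""

    def __init__(self):
        self.mean = None  # placeholder "parameter" learned by train()

    def preprocess(self, rows):
        # Toy preprocessing: drop missing values and cast to float.
        return [float(x) for x in rows if x is not None]

    def train(self, rows):
        # Toy training: remember the mean of the cleaned inputs.
        data = self.preprocess(rows)
        self.mean = sum(data) / len(data)
        return self

    def predict(self, rows):
        if self.mean is None:
            raise RuntimeError("call train() before predict()")
        # Toy prediction: every cleaned input gets the training mean.
        return [self.mean for _ in self.preprocess(rows)]
```

A real implementation would replace the method bodies with the actual feature pipeline and model while keeping the same three entry points.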
```template.yaml.template``` is the configuration file template containing (mainly) credentials.
```queries.yaml``` contains templated queries to be used to access or get data from any database (BigQuery in this case).
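
As an illustration of what "templated queries" can look like in use, the sketch below fills placeholders in a named query. The `QUERIES` dictionary stands in for the parsed contents of ```queries.yaml```; the query name, placeholder names, and table layout are invented, and `string.Template` is used here only as a dependency-free stand-in for whatever templating the repository actually uses.

```python
from string import Template

# Stand-in for the parsed queries.yaml (contents are hypothetical).
QUERIES = {
    "training_data": (
        "SELECT * FROM `$project.$dataset.$table` "
        "WHERE event_date >= '$start_date'"
    ),
}


def render_query(name, **params):
    """Fill a named query template; raises KeyError if a parameter is missing."""
    return Template(QUERIES[name]).substitute(**params)


# Example: render the query for one project, dataset, and start date.
sql = render_query(
    "training_data",
    project="my-project",
    dataset="ds",
    table="events",
    start_date="2024-01-01",
)
print(sql)
```

Plain string substitution suits identifiers such as project, dataset, and table names. For user-supplied values, BigQuery's parameterized queries are the safer choice.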
**mymodel** is our custom library.
### PACKAGING THE AIRFLOW DAG AND CODE
The ```build``` folder contains the scripts that build Debian packages for the DAG and the CODE, alongside files for Jenkins pipelines.
**DAG**: make-dag-package.sh
* version number
> hard coded
* path
> /opt/airflow/dags/
**CODE**: make-code-package.sh
* version number
> hard coded
* path
> /opt/