https://github.com/portolan75/data_pipeline_automation
Data Pipeline Automations with GitHub Actions (in VS-code Dev Containers)
- Host: GitHub
- URL: https://github.com/portolan75/data_pipeline_automation
- Owner: portolan75
- Created: 2025-02-06T18:04:05.000Z
- Default Branch: main
- Last Pushed: 2025-02-13T20:54:47.000Z
- Last Synced: 2025-02-13T21:33:35.362Z
- Topics: dashboard, docker, electricity-demand, github-actions, github-pages, plotly, python, quarto, ubuntu
- Language: Python
- Homepage: https://portolan75.github.io/data_pipeline_automation/
- Size: 7.55 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Data Pipeline Automation with GitHub Actions
This is my customized version of the `Data Pipeline Automation with GitHub Actions` repository, based on the original by [Rami Krispin](https://github.com/LinkedInLearning/data-pipeline-automation-with-github-actions-4503382).

This repo shows how to set up GitHub Actions workflows that automate data processes with Python: building a data pipeline, pulling metadata from the pipeline, and deploying a live dashboard with GitHub Actions and GitHub Pages.
Automating these steps saves hours of manually running scripts, pulling data from APIs, and updating dashboards.
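
A scheduled refresh mostly comes down to deciding whether the source has new data since the last run. The sketch below is a minimal illustration, assuming the pipeline knows the last timestamp it stored and the latest timestamp the source reports, and that the series is hourly. The helper name `next_pull_window` is hypothetical and not taken from this repo's code.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def next_pull_window(
    last_stored: datetime,
    latest_available: datetime,
    step: timedelta = timedelta(hours=1),
) -> Optional[Tuple[datetime, datetime]]:
    """Return the (start, end) window of new data to pull, or None if up to date.

    Hypothetical helper: assumes a regular series spaced by `step`
    and timestamps that fall on that grid.
    """
    start = last_stored + step  # first period not yet stored
    if start > latest_available:
        return None  # nothing new; a scheduled job can stop here
    return start, latest_available


if __name__ == "__main__":
    print(next_pull_window(datetime(2025, 2, 10, 23), datetime(2025, 2, 11, 5)))
```

Returning `None` lets a workflow step exit early without touching the stored data when the source has not been updated yet.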
## Instructions
Some Python code examples are available under the [python folder](https://github.com/portolan75/data_pipeline_automation/tree/main/python).
This repo includes VS Code [settings](https://github.com/portolan75/data_pipeline_automation/tree/main/.devcontainer/devcontainer.json) for launching it inside a Docker container with the VS Code Dev Containers extension. The image was built for the amd64 CPU architecture, which matches GitHub Actions' default hosted runners.
Alternatively, install the required Python packages locally from [requirements.txt](https://github.com/portolan75/data_pipeline_automation/blob/main/.devcontainer/requirements.txt) (e.g., `pip install -r .devcontainer/requirements.txt`).
The examples use the EIA API to pull data and metadata (see the [EIA website](https://www.eia.gov/opendata/index.php)).
The U.S. Energy Information Administration (EIA) collects, analyzes, and disseminates independent and impartial energy information to promote sound policymaking, efficient markets, and public understanding of energy and its interaction with the economy and the environment.
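
For orientation, here is a standard-library sketch of building an EIA API request and extracting rows from the JSON response. It assumes the v2 API layout (base URL `https://api.eia.gov/v2`, a `/data/` endpoint under each route, bracketed `data[0]` and `facets[...][]` query parameters, and rows under `response.data`); verify the route, facets, and field names against the EIA documentation before relying on it. The function names are hypothetical, not the repo's own code.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed EIA API v2 base URL; check the EIA docs for your route.
EIA_BASE = "https://api.eia.gov/v2"


def build_eia_url(route: str, api_key: str, start: str, end: str,
                  facets: Dict[str, List[str]], frequency: str = "hourly") -> str:
    """Build a v2 /data request URL with bracketed query parameters."""
    params = [
        ("api_key", api_key),
        ("frequency", frequency),
        ("data[0]", "value"),  # request the numeric "value" column
        ("start", start),
        ("end", end),
    ]
    for name, values in facets.items():
        for value in values:
            params.append((f"facets[{name}][]", value))  # repeatable filter
    return f"{EIA_BASE}/{route.strip('/')}/data/?{urlencode(params)}"


def parse_eia_rows(payload: dict) -> List[dict]:
    """Keep only period and value from each row of an assumed v2 response body."""
    rows = payload.get("response", {}).get("data", [])
    return [{"period": r["period"], "value": r["value"]} for r in rows]


def fetch_eia(url: str, timeout: float = 30.0) -> List[dict]:
    """Download and parse one page of results (needs network access and a valid key)."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_eia_rows(json.load(resp))
```

For example, `build_eia_url("electricity/rto/region-data", "YOUR_API_KEY", "2025-01-01T00", "2025-01-02T00", {"respondent": ["US48"], "type": ["D"]})` would target hourly demand for the Lower 48, assuming those facet values. The API caps the number of rows per request, so a long backfill may need to page through results with the `offset` and `length` parameters.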
In this repo, pipeline outputs and metadata are stored locally in the [csv](https://github.com/portolan75/data_pipeline_automation/blob/main/csv) and [metadata](https://github.com/portolan75/data_pipeline_automation/blob/main/metadata) folders; as shown in the image, a production setup can use cloud storage services instead (e.g., AWS S3, Azure Storage, Google Cloud Storage).
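
A minimal sketch of the local storage step: append each new batch to a CSV file and record one metadata entry per run. It uses only the standard library and assumes rows are dicts with `period` and `value` keys. The file layout and the metadata fields (`run_time`, `start`, `end`, `n_rows`, `success`) are hypothetical, not this repo's actual schema.

```python
import csv
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def append_rows(csv_path: Path, rows: List[dict]) -> int:
    """Append rows to a CSV file, writing the header only when the file is new."""
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="") as f:
        # Extra keys in a row are ignored; only period and value are kept.
        writer = csv.DictWriter(f, fieldnames=["period", "value"], extrasaction="ignore")
        if is_new:
            writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def log_run(meta_path: Path, n_rows: int, start: str, end: str) -> dict:
    """Append one run record to a JSON list (hypothetical metadata schema)."""
    meta_path.parent.mkdir(parents=True, exist_ok=True)
    log = json.loads(meta_path.read_text()) if meta_path.exists() else []
    record = {
        "run_time": datetime.now(timezone.utc).isoformat(),
        "start": start,
        "end": end,
        "n_rows": n_rows,
        "success": n_rows > 0,  # assumption: an empty batch counts as a failed refresh
    }
    log.append(record)
    meta_path.write_text(json.dumps(log, indent=2))
    return record
```

Moving to cloud storage would mean replacing these file writes with the storage service's client while keeping the same append-and-log pattern.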
## Customize the Docker image
To modify the Docker image, edit `.devcontainer/build_docker.sh` and, if needed, update the image name in `.devcontainer/devcontainer.json`. If environment variables or requirements change, also edit `.devcontainer/Dockerfile` and `.devcontainer/requirements.txt`.
To rebuild the image, run:
- `cd ..project_folder/.devcontainer`
- `bash build_docker.sh`
To open the project inside the dev container, make sure the terminal is pointing at the project folder (in this example `..path_to/data_pipeline_automation`).
Inside `..path_to/data_pipeline_automation`, make sure there is a folder named `.devcontainer` containing the files currently provided in this repo's `.devcontainer` folder.
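
A quick pre-flight check can confirm the `.devcontainer` folder is complete before opening the project in the container. This sketch assumes the four files mentioned in this README are the ones required; the helper name `missing_devcontainer_files` is hypothetical.

```python
from pathlib import Path
from typing import List

# Files referenced in this README; adjust if your .devcontainer differs.
EXPECTED = ["devcontainer.json", "Dockerfile", "requirements.txt", "build_docker.sh"]


def missing_devcontainer_files(project_dir: str) -> List[str]:
    """Return the expected .devcontainer files that are absent under project_dir."""
    dc = Path(project_dir) / ".devcontainer"
    return [name for name in EXPECTED if not (dc / name).is_file()]


if __name__ == "__main__":
    # Run from the project folder, e.g. ..path_to/data_pipeline_automation
    missing = missing_devcontainer_files(".")
    print("Missing files:", ", ".join(missing) if missing else "none")
```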
The first `data_backfile` batch execution ran the following command, saving the HTML output directly in `docs` (a folder GitHub Pages can publish from):
- `quarto render python/data_backfile_py.qmd --to html --output-dir ../docs/data_backfile_python`

Undesired files and folders were then removed:
- `rm -rf python/iframe_figures`
- `rm python/.gitignore`
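
The render and cleanup steps can also be scripted, for example to run them from a scheduled job. This Python sketch wraps the same `quarto render` command and cleanup, assuming the `quarto` CLI is on `PATH` and the script runs from the project root; the function names are hypothetical. Unlike plain `rm`, the cleanup does not fail when a file is already gone.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def render_cmd(qmd: str, out_dir: str) -> List[str]:
    """The quarto render command shown above, as an argument list."""
    return ["quarto", "render", qmd, "--to", "html", "--output-dir", out_dir]


def cleanup(root: Path) -> None:
    """Remove the leftover files/folders listed above, skipping any that are absent."""
    shutil.rmtree(root / "python" / "iframe_figures", ignore_errors=True)
    (root / "python" / ".gitignore").unlink(missing_ok=True)


def render_and_clean(root: Path = Path(".")) -> None:
    """Render the backfill notebook, then clean up; check=True raises if quarto fails."""
    subprocess.run(
        render_cmd("python/data_backfile_py.qmd", "../docs/data_backfile_python"),
        cwd=root,
        check=True,
    )
    cleanup(root)
```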