{"id":25543472,"url":"https://github.com/portolan75/data_pipeline_automation","last_synced_at":"2026-04-12T09:47:20.862Z","repository":{"id":276195684,"uuid":"928484878","full_name":"portolan75/data_pipeline_automation","owner":"portolan75","description":"Data Pipeline Automations with GitHub Actions (in VS-code Dev Containers)","archived":false,"fork":false,"pushed_at":"2025-02-13T20:54:47.000Z","size":7916,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-13T21:33:35.362Z","etag":null,"topics":["dashboard","docker","electricity-demand","github-actions","github-pages","plotly","python","quarto","ubuntu"],"latest_commit_sha":null,"homepage":"https://portolan75.github.io/data_pipeline_automation/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/portolan75.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-06T18:04:05.000Z","updated_at":"2025-02-13T20:54:51.000Z","dependencies_parsed_at":"2025-02-06T20:44:45.825Z","dependency_job_id":null,"html_url":"https://github.com/portolan75/data_pipeline_automation","commit_stats":null,"previous_names":["portolan75/data_pipeline_automation"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/portolan75%2Fdata_pipeline_automation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/portolan75%2Fdata_pipeline_automation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/portolan75%2Fdata_pipeline_automation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/portolan75%2Fdata_pipeline_automation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/portolan75","download_url":"https://codeload.github.com/portolan75/data_pipeline_automation/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239793061,"owners_count":19697893,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dashboard","docker","electricity-demand","github-actions","github-pages","plotly","python","quarto","ubuntu"],"created_at":"2025-02-20T07:19:33.468Z","updated_at":"2026-04-12T09:47:20.852Z","avatar_url":"https://github.com/portolan75.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Pipeline Automation with GitHub Actions\n\nThis is the repository for my custom version of `Data Pipeline Automation with GitHub Actions`, adapted from the original by [Rami Krispin](https://github.com/LinkedInLearning/data-pipeline-automation-with-github-actions-4503382). \n\n![](/readme_images/pipeline_automation.drawio.png)\n\nThis repo shows how to set up workflows on GitHub Actions to automate data processes with Python. \nIt covers how to set up a data pipeline, pull metadata from a pipeline, and deploy a live dashboard with GitHub Actions and GitHub Pages. 
\nIt automates away hours of manually running scripts, pulling data from APIs, or updating dashboards.\n\n\n## Instructions\n\nSome Python code examples are available under the [python folder](https://github.com/portolan75/data_pipeline_automation/tree/main/python).\n\nThis repo includes a VS Code [settings file](https://github.com/portolan75/data_pipeline_automation/tree/main/.devcontainer/devcontainer.json) to launch the repo inside a Docker container using the Visual Studio Code Dev Containers extension. The image was built to support the amd64 CPU architecture (the GitHub Actions default). \nAlternatively, one can install the required Python packages locally using the [requirements.txt](https://github.com/portolan75/data_pipeline_automation/blob/main/.devcontainer/requirements.txt).\n\n\nThe examples use the EIA (Energy Information Administration) API to pull data and metadata (see the [EIA website](https://www.eia.gov/opendata/index.php)). \nThe U.S. Energy Information Administration (EIA) collects, analyzes, and disseminates independent and impartial energy information to promote sound policymaking, efficient markets, and public understanding of energy and its interaction with the economy and the environment.\n\n\nFor this example, data pipeline outputs and metadata are stored locally, in the [csv](https://github.com/portolan75/data_pipeline_automation/blob/main/csv) and [metadata](https://github.com/portolan75/data_pipeline_automation/blob/main/metadata) folders, but as displayed in the image one can make use of cloud services (like AWS S3, Azure Storage, Google Storage) for a production setup.\n\n## Customize the Docker image\nTo modify the Docker image, edit `.devcontainer/build_docker.sh`, update the image name in `.devcontainer/devcontainer.json` if needed, and, if other environment variables or requirements have changed, consider editing `.devcontainer/Dockerfile` and `.devcontainer/requirements.txt`.\n\nTo re-create the image:\n\n- `cd ..project_folder/.devcontainer` 
then\n- `bash build_docker.sh`\n\nTo open the project inside the dev container defined in `.devcontainer`, make sure the terminal is pointing at the project folder (in this example `..path_to/data_pipeline_automation`).\nInside `..path_to/data_pipeline_automation` make sure there's a folder named `.devcontainer` including the files currently available.\n\nThe first data_backfile batch execution ran the following command, saving the HTML output directly in `docs` (the default folder for GitHub Pages):\n`quarto render python/data_backfile_py.qmd --to html --output-dir ../docs/data_backfile_python`\nand then removing undesired files/folders:\n`rm -rf python/iframe_figures`\n`rm python/.gitignore`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fportolan75%2Fdata_pipeline_automation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fportolan75%2Fdata_pipeline_automation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fportolan75%2Fdata_pipeline_automation/lists"}