https://github.com/hvalfangst/azure-functions-pandas
Azure Functions for ETL operations using Pandas. Uploaded CSV files trigger data processing, calculating correlations and storing results in a JSON file. Automated deployment via GitHub Actions and Terraform.
az-204 azure azure-functions azure-functions-python pandas python scikit-learn terraform
- Host: GitHub
- URL: https://github.com/hvalfangst/azure-functions-pandas
- Owner: hvalfangst
- Created: 2024-01-10T11:30:13.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-15T19:29:45.000Z (5 months ago)
- Last Synced: 2025-01-25T06:25:09.346Z (4 months ago)
- Topics: az-204, azure, azure-functions, azure-functions-python, pandas, python, scikit-learn, terraform
- Language: Python
- Homepage:
- Size: 7.81 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Azure Function ETL using Pandas with CI/CD
This repository hosts a pair of Azure Functions designed for Extract, Transform, Load (ETL) operations:
1. HTTP-triggered Function:
   - Uploads the CSV file from the request body to the Blob 'in/input.csv' in the container 'hvalfangstcontainer'.
2. Blob-triggered Function:
   - Listens for changes to the Blob 'in/input.csv' and loads the CSV into a Pandas DataFrame whenever it changes.
   - Calculates correlations between specific columns.
   - Stores the resulting correlations in a dictionary and serializes that dictionary to JSON.
   - Uploads the JSON contents to the Blob 'out/statistics.json'.

A CI/CD pipeline is implemented as a GitHub Actions workflow, which automatically deploys to the Azure Function App on every push to the repository. Azure resources are provisioned with Terraform via the shell scripts 'up' and 'down'.

## Requirements
* x86-64
* Linux/Unix
* [Python](https://www.python.org/downloads/)

## Creating resources
The shell script 'up' allocates Azure resources with Terraform.
## Deleting resources
The shell script 'down' deallocates Azure resources.
## Guide
### 1. Provision Azure Resources
- Run the 'up' script to provision Azure resources with Terraform.
### 2. Access Azure Portal
- Open your browser and navigate to the Azure Portal.
### 3. Function App Publish Profile
- Navigate to the newly created Function App 'hvalfangstlinuxfunctionapp'
- Click on 'Get publish profile' to download a file.
- The associated file contents will be used in the next step.

### 4. GitHub Repository Secrets
- Open the 'Settings' tab of your GitHub repository.
- Click on 'Actions' under 'Security' -> 'Secrets and variables'.
- Create the following repository secret:
  - PUBLISH_PROFILE: the contents of the file downloaded in step 3.

### 5. Deploy Workflow
The GitHub Actions workflow deploys the Azure Functions to the Function App on every push to the repository.
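
Once deployed, each upload to 'in/input.csv' runs the Blob-triggered transform described at the top of this README. The snippet below is a minimal sketch of that transform, not the repository's actual code: the helper name `correlations_to_json`, the sample column names, and the sample data are all hypothetical, and the Azure Blob input and output bindings are left out so that the Pandas logic can be run locally.

```python
import io
import json

import pandas as pd


def correlations_to_json(csv_text: str, column_pairs: list[tuple[str, str]]) -> str:
    """Load CSV text into a DataFrame and return pairwise correlations as JSON.

    Hypothetical helper: the real function would read the CSV from the
    'in/input.csv' Blob and write the result to 'out/statistics.json'.
    """
    df = pd.read_csv(io.StringIO(csv_text))
    results = {}
    for left, right in column_pairs:
        # Series.corr uses the Pearson correlation coefficient by default.
        results[f"{left}_vs_{right}"] = float(df[left].corr(df[right]))
    return json.dumps(results, indent=2)


# Hypothetical sample data standing in for the uploaded CSV.
sample_csv = (
    "age,income,debt\n"
    "25,30000,4000\n"
    "35,45000,3000\n"
    "45,60000,2000\n"
    "55,75000,1000\n"
)

statistics_json = correlations_to_json(
    sample_csv, [("age", "income"), ("income", "debt")]
)
print(statistics_json)
```

Keeping the calculation in a plain function that takes and returns strings is one possible design; it lets the correlation logic be tested without an Azure Storage account, with the Blob trigger acting only as a thin wrapper around it.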