https://github.com/siobhan-doherty/ag_challenge
airflow bigquery csv-files data-engineering etl google-cloud-platform python sql
- Host: GitHub
- URL: https://github.com/siobhan-doherty/ag_challenge
- Owner: siobhan-doherty
- Created: 2024-12-16T17:03:43.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-12-16T17:25:07.000Z (over 1 year ago)
- Last Synced: 2025-04-04T15:40:36.654Z (about 1 year ago)
- Topics: airflow, bigquery, csv-files, data-engineering, etl, google-cloud-platform, python, sql
- Language: Python
- Homepage:
- Size: 4.88 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# AG Challenge
This project automates the generation, validation, and upload of mock data for Users, Transactions, and User Preferences into BigQuery using Airflow.
---
## Project Overview
1. Generate mock data based on the provided schema.
2. Automate the workflow with Airflow to:
   - Generate data.
   - Upload CSV files into BigQuery.
   - Validate the uploaded data.
   - Test email notifications via SMTP.
3. Run the Airflow DAG tasks sequentially:
   - `generate_data`: Generates the mock data files.
   - `upload_into_bigquery`: Loads the data into BigQuery tables.
   - `validate_bigquery_data`: Validates that the data was uploaded correctly.
   - `test_email_configuration`: Tests the email setup used for failure notifications.
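
As an illustration of the first task, the sketch below shows one way mock CSV files for the three tables could be produced using only the standard library. It is not the repository's actual code: the column names in `SCHEMAS`, the helper `make_row`, and the function signature are all assumptions, since this README does not list the real schema.

```python
import csv
import random
from pathlib import Path

# Hypothetical column layouts; the README does not list the real schema.
SCHEMAS = {
    "users": ["user_id", "name", "email"],
    "transactions": ["transaction_id", "user_id", "amount"],
    "user_preferences": ["user_id", "language", "notifications_enabled"],
}


def make_row(table, index, n_users, rng):
    """Build one mock row for the given table (hypothetical helper)."""
    if table == "users":
        return [index, f"user_{index}", f"user_{index}@example.com"]
    if table == "transactions":
        # Reference an existing user ID so the files stay joinable.
        return [index, rng.randint(1, n_users), round(rng.uniform(1, 500), 2)]
    return [index, rng.choice(["en", "de", "fr"]), rng.choice([True, False])]


def generate_data(out_dir, n_users=10, n_transactions=30, seed=0):
    """Write one CSV per table and return the file paths keyed by table name."""
    rng = random.Random(seed)  # seeded so repeated runs give the same files
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {
        "users": n_users,
        "transactions": n_transactions,
        "user_preferences": n_users,  # one preferences row per user
    }
    paths = {}
    for table, header in SCHEMAS.items():
        path = out_dir / f"{table}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            for i in range(1, counts[table] + 1):
                writer.writerow(make_row(table, i, n_users, rng))
        paths[table] = path
    return paths
```

Calling `generate_data("data")` would write `users.csv`, `transactions.csv`, and `user_preferences.csv` under `data/`, each with a header row, which is the shape a later CSV load step would typically expect.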
---
## Prerequisites
1. Python 3.8+.
2. Airflow installed and configured.
3. BigQuery access configured with the `bq` CLI tool.
4. SMTP access enabled for sending notifications (e.g., Outlook SMTP).
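
A quick way to sanity-check the first three prerequisites before running the DAG might look like the sketch below. The function name `check_prerequisites` is hypothetical, and the check only confirms that the `airflow` and `bq` executables are on `PATH`; it does not verify BigQuery credentials or SMTP access.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a mapping of prerequisite name to whether it looks satisfied."""
    return {
        "python": sys.version_info >= min_python,
        # shutil.which returns None when the executable is not on PATH.
        "airflow": shutil.which("airflow") is not None,
        "bq": shutil.which("bq") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```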
---
## Dependencies
```bash
pip install -r requirements.txt