https://github.com/niranjanrao07/docker-airflow-pipeline
Yfinance to Snowflake ETL is an automated pipeline that uses Apache Airflow on Google Cloud Composer to load daily stock data from Yahoo Finance into Snowflake. This ETL process provides up-to-date financial data, allowing for seamless integration into analytics workflows.
- Host: GitHub
- URL: https://github.com/niranjanrao07/docker-airflow-pipeline
- Owner: NiranjanRao07
- Created: 2024-10-18T07:08:14.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-27T03:13:43.000Z (over 1 year ago)
- Last Synced: 2024-10-27T04:23:31.635Z (over 1 year ago)
- Topics: airflow, analytics, etl, finance, googlecloud, snowflake
- Language: Python
- Size: 5.86 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Yfinance to Snowflake ETL
This project leverages **Apache Airflow** in **Google Cloud Composer** to automate the ETL pipeline for loading **Yahoo Finance** stock data into **Snowflake**. Each day, stock data is extracted for a specified symbol, transformed, and loaded into a Snowflake table, enabling easy access for analytics and reporting.
---
## 📖 About
This Airflow DAG extracts daily stock data for a given symbol (e.g., `AAPL`) from Yahoo Finance and loads it into Snowflake for further analysis. The data is stored in the `stock_data_db.raw_data.yfinance` table, with fields for date, open, close, high, low, volume, and symbol.
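As a rough illustration of the row shape described above, the seven fields could be modeled like this. The class name `StockRow`, the Python types, and the sample values are assumptions made for this sketch; the repository may represent rows differently.

```python
import datetime as dt
from dataclasses import astuple, dataclass

# Column order follows the field list in the README.
COLUMNS = ("date", "open", "close", "high", "low", "volume", "symbol")


@dataclass(frozen=True)
class StockRow:
    """One daily record for one symbol (hypothetical helper type)."""

    date: dt.date
    open: float
    close: float
    high: float
    low: float
    volume: int
    symbol: str

    def as_params(self) -> tuple:
        """Return the values in COLUMNS order, ready to bind to an INSERT."""
        return astuple(self)


# Illustrative values only, not real quotes.
row = StockRow(dt.date(2024, 10, 25), 100.0, 101.5, 102.0, 99.5, 1000, "AAPL")
print(dict(zip(COLUMNS, row.as_params())))
```

Keeping the field order in one place means the `INSERT` column list and the bound parameters cannot drift apart.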
---
## 🚀 Setup and Usage
1. **Dependencies**: Ensure `apache-airflow-providers-snowflake` is added to your **Cloud Composer** environment’s PyPI packages.
2. **Airflow Variables and Connections**:
   - Create a Snowflake connection in Airflow (`snowflake_conn`) with the necessary credentials.
3. **DAG Structure**:
   - `extract`: Extracts stock data for a specific date from Yahoo Finance using the `yfinance` library.
   - `load`: Loads the extracted data into Snowflake, creating the target table if it doesn't exist.
4. **Trigger the DAG**:
   - The DAG runs daily at 02:05 UTC to load the previous day’s stock data.
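The `load` step described above could look roughly like the following sketch. It is not the repository's code: the column types, the transaction handling, and the function signature are assumptions. It accepts any DB-API style cursor; inside Airflow, one would typically come from `SnowflakeHook(snowflake_conn_id="snowflake_conn").get_conn().cursor()`.

```python
TABLE = "stock_data_db.raw_data.yfinance"

# Column types are assumptions; only the column names come from the README.
CREATE_SQL = f"""
CREATE TABLE IF NOT EXISTS {TABLE} (
    date DATE, open FLOAT, close FLOAT, high FLOAT, low FLOAT,
    volume NUMBER, symbol VARCHAR
)
"""

# %s placeholders match the Snowflake connector's default (pyformat) binding.
INSERT_SQL = (
    f"INSERT INTO {TABLE} (date, open, close, high, low, volume, symbol) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s)"
)


def load(cursor, records):
    """Create the table if needed, then insert all records in one transaction."""
    cursor.execute(CREATE_SQL)
    cursor.execute("BEGIN")
    try:
        for record in records:
            cursor.execute(INSERT_SQL, record)
        cursor.execute("COMMIT")
    except Exception:
        # Undo partial inserts so a failed run leaves no half-loaded day.
        cursor.execute("ROLLBACK")
        raise
```

Wrapping the inserts in a transaction is a design choice for this sketch: if a run fails partway and Airflow retries the task, the retry starts from a clean state.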
---
## 🔧 Key Functions
- **get_next_day**: Helper function to calculate the next day’s date.
- **return_snowflake_conn**: Establishes and returns a Snowflake connection.
- **extract**: Extracts stock data for a specific date.
- **load**: Creates or updates the Snowflake table with the extracted stock data.
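A minimal sketch of what `get_next_day` might look like, assuming it takes and returns a `YYYY-MM-DD` string (the actual signature in the repository may differ). A helper like this is presumably needed because `yfinance` treats the `end` date of a download as exclusive, so fetching a single day means passing that day as `start` and the following day as `end`. The `one_day_window` function is a hypothetical addition for illustration.

```python
from datetime import datetime, timedelta


def get_next_day(date_str: str) -> str:
    """Return the calendar day after date_str, both formatted as YYYY-MM-DD."""
    day = datetime.strptime(date_str, "%Y-%m-%d")
    return (day + timedelta(days=1)).strftime("%Y-%m-%d")


def one_day_window(date_str: str) -> dict:
    """Build start/end arguments covering exactly one day (end is exclusive)."""
    return {"start": date_str, "end": get_next_day(date_str)}


print(one_day_window("2024-10-31"))
```

The resulting dictionary could then be passed to a download call, for example `yf.download("AAPL", **one_day_window("2024-10-31"))`.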
---
## 📅 Schedule
- **Cron Schedule**: `5 2 * * *` (Runs daily at 02:05 UTC).
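To make the schedule concrete, the sketch below works out when `5 2 * * *` next fires and which calendar day that run would load. Both helper names are hypothetical, and the rule "target day = run day minus one" is an assumption based on the README's statement that each run loads the previous day's data.

```python
from datetime import date, datetime, timedelta, timezone


def next_run(now: datetime) -> datetime:
    """Return the first 02:05 UTC strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=2, minute=5, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def target_day(run_time: datetime) -> date:
    """Return the day whose data a run at `run_time` would load."""
    return (run_time.astimezone(timezone.utc) - timedelta(days=1)).date()


now = datetime(2024, 10, 27, 3, 13, tzinfo=timezone.utc)
run = next_run(now)
print(run.isoformat(), "loads data for", target_day(run))
```

In Airflow itself, the same "previous day" usually falls out of the run's logical date rather than wall-clock arithmetic, so a DAG would not need helpers like these.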
---
## Example
Run the `YfinanceToSnowflake` DAG in Cloud Composer to automate daily stock data ETL from Yahoo Finance to Snowflake.