https://github.com/theodorusblote/financial-etl-pipeline
Automated ETL (Extract, Transform, Load) pipeline that extracts stock data using yfinance, transforms it through feature engineering, and loads it into a PostgreSQL database.
- Host: GitHub
- URL: https://github.com/theodorusblote/financial-etl-pipeline
- Owner: theodorusblote
- License: mit
- Created: 2024-10-24T23:16:37.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-10-27T20:06:09.000Z (12 months ago)
- Last Synced: 2024-11-05T13:35:44.490Z (11 months ago)
- Topics: etl-pipeline, postgresql, technical-analysis, yfinance
- Language: Python
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ETL Pipeline for Financial Data
This project implements an automated ETL (Extract, Transform, Load) pipeline that extracts stock data using `yfinance`, transforms it by adding technical indicators such as moving averages and RSI, and loads the result into a PostgreSQL database.
## Table of Contents
- [Features](#features)
- [Installation and setup](#installation-and-setup)
- [Project structure](#project-structure)
- [Scheduling the pipeline](#scheduling-the-pipeline)
- [License](#license)
## Features
- **Extract:** Gets historical stock data using `yfinance`
- **Transform:** Enhances the data with features such as the following (see the sketch after this list):
- 50-Day SMA
- 200-Day SMA
- RSI
Lagged open, volume, and daily return
- **Load:** Loads transformed data into a PostgreSQL database for further analysis
- **Automation:** Configurable to run automatically at a specified time daily using cron jobs
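Below is a minimal sketch of the Extract and Transform stages, assuming only `pandas` and `yfinance`; the function name, ticker, and RSI window are illustrative and not taken from the project's code.
```python
# Illustrative sketch, not the project's actual implementation.
import pandas as pd
import yfinance as yf

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the features listed above: SMAs, RSI, and lagged columns."""
    out = df.copy()
    # 50-day and 200-day simple moving averages of the close price
    out["sma_50"] = out["Close"].rolling(window=50).mean()
    out["sma_200"] = out["Close"].rolling(window=200).mean()
    # 14-day RSI (simple rolling-mean variant)
    delta = out["Close"].diff()
    gain = delta.clip(lower=0).rolling(window=14).mean()
    loss = (-delta.clip(upper=0)).rolling(window=14).mean()
    out["rsi_14"] = 100 - 100 / (1 + gain / loss)
    # Lagged open, volume, and daily return
    out["open_lag_1"] = out["Open"].shift(1)
    out["volume_lag_1"] = out["Volume"].shift(1)
    out["daily_return_lag_1"] = out["Close"].pct_change().shift(1)
    return out

if __name__ == "__main__":
    raw = yf.Ticker("AAPL").history(period="2y")  # Extract
    print(add_features(raw).tail())               # Transform
```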
## Installation and setup
1. **Clone repository:**
```bash
git clone https://github.com/theodorusblote/financial-etl-pipeline.git
cd financial-etl-pipeline
```
2. **Create and activate virtual environment:**
```bash
python3 -m venv .venv
source .venv/bin/activate
```
3. **Install required packages:**
```bash
pip install -r requirements.txt
```
4. **Set up environment variables:** Create a `.env` file in the project root with your PostgreSQL credentials (a connection sketch follows these steps):
```env
DB_USERNAME=your_db_username
DB_PASSWORD=your_db_password
DB_HOST=your_db_host
DB_PORT=your_db_port
DB_NAME=your_db_name
```
5. **Run script manually:**
```bash
python financial_etl_pipeline.py
```
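The snippet below is a minimal sketch of how the Load stage could use the variables from step 4, assuming `python-dotenv`, SQLAlchemy, and a `psycopg2` driver are among the installed requirements; the table name is hypothetical.
```python
# Illustrative sketch, not the project's actual implementation.
import os
import pandas as pd
from dotenv import load_dotenv
from sqlalchemy import create_engine

load_dotenv()  # read the DB_* values from the .env file
engine = create_engine(
    "postgresql+psycopg2://{user}:{pw}@{host}:{port}/{db}".format(
        user=os.environ["DB_USERNAME"],
        pw=os.environ["DB_PASSWORD"],
        host=os.environ["DB_HOST"],
        port=os.environ["DB_PORT"],
        db=os.environ["DB_NAME"],
    )
)

def load_to_postgres(df: pd.DataFrame, table: str = "stock_features") -> None:
    # "replace" rewrites the table each run; use "append" for incremental loads
    df.to_sql(table, engine, if_exists="replace", index=True)
```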
## Project structure
```plaintext
financial-etl-pipeline/
│
├── financial_etl_pipeline.py # Main script
├── requirements.txt # Dependencies
├── .env # Environment variables
├── .gitignore # .gitignore file
├── LICENSE # License
└── README.md # Documentation
```
## Scheduling the pipeline
To run the ETL pipeline automatically at a specified time each day, you can set up a cron job (macOS).
1. **Open crontab editor:**
```bash
crontab -e
```
2. **Add cron job:**
```cron
0 23 * * * /path/to/your/project/.venv/bin/python /path/to/your/project/financial_etl_pipeline.py
```
Note:
- `0 23 * * *`: Runs the job every day at 23:00 (11 pm)
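The five fields before the command are minute, hour, day of month, month, and day of week. If you also want a record of each run, the job's output can be redirected to a log file (the log filename below is illustrative):
```cron
0 23 * * * /path/to/your/project/.venv/bin/python /path/to/your/project/financial_etl_pipeline.py >> /path/to/your/project/etl.log 2>&1
```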
## License
This project is licensed under the [MIT License](LICENSE).