
# ETL Pipeline for Financial Data

This project implements an automated ETL (Extract, Transform, Load) pipeline that extracts stock data using `yfinance`, transforms the data by adding technical indicators such as moving averages and RSI, and loads the result into a PostgreSQL database.

## Table of Contents

- [Features](#features)
- [Installation and setup](#installation-and-setup)
- [Project structure](#project-structure)
- [Scheduling the pipeline](#scheduling-the-pipeline)
- [License](#license)

## Features

- **Extract:** Fetches historical stock data using `yfinance`
- **Transform:** Enriches the data with features such as:
  - 50-day SMA
  - 200-day SMA
  - RSI
  - Lagged open, lagged volume, and daily return
- **Load:** Writes the transformed data into a PostgreSQL database for further analysis
- **Automation:** Can be scheduled to run automatically at a specified time each day using a cron job
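
The transform step above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the column names (`Open`, `Close`, `Volume`) follow yfinance conventions, and the output column names and 14-day RSI window are assumptions.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add indicator columns like those listed above.

    Assumes yfinance-style columns: Open, Close, Volume.
    """
    out = df.copy()

    # Simple moving averages over 50 and 200 trading days
    out["SMA_50"] = out["Close"].rolling(window=50).mean()
    out["SMA_200"] = out["Close"].rolling(window=200).mean()

    # 14-day RSI using Wilder-style exponential smoothing
    delta = out["Close"].diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / 14, adjust=False).mean()
    out["RSI_14"] = 100 - 100 / (1 + gain / loss)

    # Lagged features and daily return
    out["Open_lag1"] = out["Open"].shift(1)
    out["Volume_lag1"] = out["Volume"].shift(1)
    out["Daily_return"] = out["Close"].pct_change()
    return out
```

Note that the rolling windows leave `NaN` values in the first rows (the first 199 rows for the 200-day SMA), which may need to be dropped or kept depending on how the data is analyzed downstream.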

## Installation and setup

1. **Clone repository:**
```bash
git clone https://github.com/theodorusblote/financial-etl-pipeline.git
cd financial-etl-pipeline
```
2. **Create and activate virtual environment:**
```bash
python3 -m venv .venv
source .venv/bin/activate
```
3. **Install required packages:**
```bash
pip install -r requirements.txt
```
4. **Set up environment variables:** Create a `.env` file in the project root with your database credentials:
```env
DB_USERNAME=your_db_username
DB_PASSWORD=your_db_password
DB_HOST=your_db_host
DB_PORT=your_db_port
DB_NAME=your_db_name
```
5. **Run script manually:**
```bash
python financial_etl_pipeline.py
```
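
As a sketch of how the variables from step 4 feed into the load step: the snippet below reads them from the environment, builds a SQLAlchemy connection, and appends a DataFrame to a table. The use of SQLAlchemy and the `stock_prices` table name are assumptions for illustration; the actual script may differ.

```python
import os

import pandas as pd
from sqlalchemy import create_engine


def make_engine():
    """Build a SQLAlchemy engine from the DB_* variables defined in step 4."""
    url = (
        f"postgresql+psycopg2://{os.environ['DB_USERNAME']}:{os.environ['DB_PASSWORD']}"
        f"@{os.environ['DB_HOST']}:{os.environ['DB_PORT']}/{os.environ['DB_NAME']}"
    )
    return create_engine(url)


def load_prices(df: pd.DataFrame, engine, table: str = "stock_prices") -> None:
    # Append so repeated daily runs accumulate history instead of overwriting it
    df.to_sql(table, engine, if_exists="append", index=False)
```

Using `if_exists="append"` lets the daily cron run (see below in this README) grow the table over time; `if_exists="replace"` would discard previously loaded history.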

## Project structure

```plaintext
financial-etl-pipeline/
├── financial_etl_pipeline.py   # Main ETL script
├── requirements.txt            # Python dependencies
├── .env                        # Environment variables
├── .gitignore                  # Git ignore rules
├── LICENSE                     # License
└── README.md                   # Documentation
```

## Scheduling the pipeline

To run the ETL pipeline automatically at a specified time each day, you can set up a cron job (macOS/Linux).

1. **Open crontab editor:**
```bash
crontab -e
```
2. **Add cron job:**
```cron
0 23 * * * /path/to/your/project/.venv/bin/python /path/to/your/project/financial_etl_pipeline.py
```

Note:

- `0 23 * * *`: Runs the job every day at 23:00 (11 pm). Cron does not activate virtual environments, so the entry invokes the `.venv` Python interpreter directly with absolute paths.

## License

This project is licensed under the [MIT License](LICENSE).