https://github.com/frocode/aws-etl
- Host: GitHub
- URL: https://github.com/frocode/aws-etl
- Owner: FroCode
- Created: 2024-04-25T22:12:22.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-06T22:59:35.000Z (almost 2 years ago)
- Last Synced: 2024-05-07T12:50:01.864Z (almost 2 years ago)
- Language: Python
- Homepage: https://frocode.github.io/AWS-ETL/
- Size: 2.22 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Fintech Data Processing and Analysis
## Overview
This project extracts Fintech data from a MySQL database, loads it into Amazon Redshift, and prepares it for further analysis and visualization in Power BI. A Python script uploads the data to an S3 bucket and then loads it into the Redshift database.
## Project Structure
- `unicorn_data_loading_redshift.py`: This script handles the connection to AWS services (S3 and Redshift), creates necessary database schema and tables, and performs data loading operations.
- `.env`: A dotenv file to store sensitive credentials like AWS access keys, Redshift database credentials, etc. (Note: This file should not be checked into version control).
- `README.md`: Provides project documentation.
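The load path described above (upload to S3, then load into Redshift) typically centers on a Redshift `COPY` statement pointing at the S3 object. The sketch below is illustrative only, not the actual contents of `unicorn_data_loading_redshift.py`; the schema, table, bucket, and IAM role names are placeholders.

```python
# Sketch of the S3 -> Redshift load step. All names below are
# illustrative placeholders, not the project's real values.

def build_copy_statement(schema: str, table: str, bucket: str,
                         key: str, iam_role: str) -> str:
    """Build a Redshift COPY statement that loads a CSV object from S3."""
    return (
        f"COPY {schema}.{table} "
        f"FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )

if __name__ == "__main__":
    sql = build_copy_statement(
        schema="fintech",
        table="unicorns",
        bucket="my-etl-bucket",
        key="unicorn_data.csv",
        iam_role="arn:aws:iam::123456789012:role/RedshiftCopyRole",
    )
    print(sql)
```

In the real script, a statement like this would be executed over a `psycopg2` connection to the Redshift cluster after `boto3` uploads the file to S3.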
## Setup Instructions
### AWS Services And Tools
- AWS CLI
- Boto3
- IAM
- VPC
- Amazon Redshift Cluster
- Amazon S3 Bucket
- Lambda
- Power BI for visualization (Upcoming)
### Environment Setup
1. Clone the repository to your local machine.
2. Ensure Python 3.x is installed.
3. Install required Python packages:
```bash
pip install pandas boto3 psycopg2-binary python-dotenv
```
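The `.env` file (kept out of version control) supplies the credentials the script reads via `python-dotenv`. The variable names below are illustrative placeholders; match them to whatever the script actually reads:

```ini
# Example .env — placeholder names and values; never commit this file
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
S3_BUCKET=your-bucket-name
REDSHIFT_HOST=your-cluster.abc123.us-east-1.redshift.amazonaws.com
REDSHIFT_PORT=5439
REDSHIFT_DB=dev
REDSHIFT_USER=your_user
REDSHIFT_PASSWORD=your_password
```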
### Current Work
The data upload and initial processing are functioning correctly. However, there are still tasks under development:
1. Data Analysis: Detailed analysis of the data is in the planning stages.
2. Incremental Load: A Lambda function to perform incremental loads.
3. Additional Sources: Extracting data from other sources such as PostgreSQL.
4. Automation: Scheduling regular execution of the SQL transformation scripts.
5. Regular Backups: Configure and ensure regular backups of Redshift cluster to safeguard against data loss.
6. Dashboard Development: A Power BI dashboard is currently under development to visualize and interact with the dataset.
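The incremental load in item 2 can be sketched as a high-watermark filter: each run selects only rows newer than the last successfully loaded timestamp, then advances the watermark. The function below is an illustrative placeholder, not code from this repository; in a Lambda-triggered design the watermark might be persisted in S3 or DynamoDB between invocations.

```python
# Illustrative high-watermark incremental load logic (placeholder, not repo code).
from datetime import datetime

def incremental_batch(rows, last_loaded_at):
    """Return rows with updated_at after the watermark, plus the new
    watermark to persist for the next run."""
    fresh = [r for r in rows if r["updated_at"] > last_loaded_at]
    new_watermark = max((r["updated_at"] for r in fresh), default=last_loaded_at)
    return fresh, new_watermark

if __name__ == "__main__":
    rows = [
        {"id": 1, "updated_at": datetime(2024, 5, 1)},
        {"id": 2, "updated_at": datetime(2024, 5, 3)},
    ]
    batch, watermark = incremental_batch(rows, datetime(2024, 5, 2))
    print(len(batch), watermark)  # 1 2024-05-03 00:00:00
```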
#### View in Power BI [Click Here](https://app.powerbi.com/groups/me/reports/e69eac26-39f3-432e-ba1f-dcc801b32a8a/ReportSection?experience=power-bi)
#### Live preview [Click Here](https://frocode.github.io/AWS-ETL/)