https://github.com/shubhammohanty680/spotify_snowflake
An ETL (Extract, Transform, Load) pipeline project that pulls data from the Spotify API on AWS and loads it into a Snowflake data warehouse. It uses AWS services such as Lambda, S3, and CloudWatch to orchestrate the process. The transformed data is loaded into Snowflake using Snowpipe and finally visualized in Power BI.
- Host: GitHub
- URL: https://github.com/shubhammohanty680/spotify_snowflake
- Owner: ShubhamMohanty680
- Created: 2025-01-22T09:38:58.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-01-23T15:51:59.000Z (5 months ago)
- Last Synced: 2025-04-10T04:03:01.189Z (2 months ago)
- Topics: aws-athena, aws-lambda, aws-s3, aws-trigger, dashboard, data-engineering, data-engineering-pipeline, powerbi, python, s3-notification, snowflake
- Language: Python
- Homepage:
- Size: 1.79 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Spotify_Data_Pipe_Snowflake
### Introduction
This project builds an ETL (Extract, Transform, Load) pipeline on the AWS cloud using the Spotify API. It extracts playlist data from the Spotify API, transforms it, stores the cleaned data in AWS cloud storage, and makes it available in Snowflake through Snowpipe.
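
The transform step can be pictured with a minimal sketch like the one below. It is not taken from this repository: the function names `flatten_playlist` and `rows_to_csv` and the output column names are hypothetical, and the input is assumed to follow the shape of a Spotify playlist-items response (an `items` list whose entries carry a `track` object).

```python
import csv
import io


def flatten_playlist(playlist: dict) -> list:
    """Turn a playlist-items payload into one flat row per track.

    Assumes the payload looks like {"items": [{"added_at": ..., "track": {...}}]}.
    """
    rows = []
    for item in playlist.get("items", []):
        track = item.get("track") or {}  # "track" may be null for unavailable items
        if not track.get("id"):
            continue  # skip entries without a usable track id
        rows.append({
            "song_id": track["id"],
            "song_name": track.get("name"),
            "duration_ms": track.get("duration_ms"),
            "popularity": track.get("popularity"),
            "album_id": (track.get("album") or {}).get("id"),
            # Join artist ids into one column so each track stays on one row.
            "artist_ids": "|".join(
                a["id"] for a in track.get("artists", []) if a.get("id")
            ),
            "added_at": item.get("added_at"),
        })
    return rows


def rows_to_csv(rows: list) -> str:
    """Serialize the flat rows to CSV text, ready to be written to storage."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Flat CSV output is one reasonable choice here because it is straightforward to load into a warehouse table; the actual project may use a different layout or file format.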
### Architecture
### Services Used
1. **AWS S3 (Simple Storage Service):** AWS S3 is a highly scalable object storage service for storing and retrieving any amount of data from anywhere on the web. It is commonly used to store and distribute large files.
2. **AWS Lambda:** AWS Lambda is a serverless computing service that lets you run code without managing servers.
3. **CloudWatch:** Amazon CloudWatch is a monitoring service for AWS resources and the applications you run on them. It is used to collect and track metrics, collect and monitor log files, and set alarms.
4. **AWS Glue Data Catalog:** The AWS Glue Data Catalog is a centralized metadata repository for all your data assets across various data sources.
5. **AWS Glue Crawler:** An AWS Glue Crawler crawls the data sources, identifies data formats, infers schemas, and creates the corresponding table metadata in the AWS Glue Data Catalog.
6. **AWS Athena:** Amazon Athena is an interactive query service that makes it easy to analyze data stored in S3 using standard SQL. It can also query tables defined in the Glue Data Catalog.
7. **Snowflake:** Snowflake is a cloud-based data warehouse platform that allows users to store, analyze, and exchange data securely.
8. **PowerBI:** Power BI is a business analytics service by Microsoft that allows organizations to connect to various data sources, transform and clean data, create interactive visualizations, and share insights with others.
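
To show how Lambda and S3 might fit together in the extract step, here is a rough sketch under stated assumptions. The bucket name `my-spotify-bucket`, the `raw_data` key prefix, and the helper names are all hypothetical and do not come from this repository; the Spotify API call itself is left as a placeholder. The only AWS call used is boto3's `put_object` on an S3 client.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_raw_key(prefix: str, now: datetime) -> str:
    """Build a timestamped object key so each run writes a new file."""
    return f"{prefix}/spotify_raw_{now.strftime('%Y%m%dT%H%M%S')}.json"


def store_raw_payload(s3_client, bucket: str, payload: dict,
                      now: Optional[datetime] = None) -> str:
    """Write the raw API payload to S3 as JSON and return the object key."""
    now = now or datetime.now(timezone.utc)
    key = build_raw_key("raw_data", now)  # hypothetical prefix
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
    )
    return key


def lambda_handler(event, context):
    """Entry point as Lambda would invoke it."""
    import boto3  # imported here so the helpers above can be tested without AWS

    payload = {"items": []}  # placeholder for the Spotify API response
    key = store_raw_payload(boto3.client("s3"), "my-spotify-bucket", payload)
    return {"statusCode": 200, "body": key}
```

Passing the S3 client in as an argument keeps the storage logic testable with a stub object, without AWS credentials.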
### Dashboard
