Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shubhammohanty680/spotify_end_to_end_data_engineering
A project that builds an ETL (Extract, Transform, Load) pipeline using the Spotify API on AWS.
Last synced: 10 days ago
- Host: GitHub
- URL: https://github.com/shubhammohanty680/spotify_end_to_end_data_engineering
- Owner: ShubhamMohanty680
- Created: 2024-12-03T11:23:56.000Z (21 days ago)
- Default Branch: main
- Last Pushed: 2024-12-03T11:59:29.000Z (21 days ago)
- Last Synced: 2024-12-03T12:26:39.400Z (21 days ago)
- Language: Jupyter Notebook
- Size: 1.44 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Spotify_End_To_End_Data_Engineering Project
### Introduction
This project builds an ETL (Extract, Transform, Load) pipeline using the Spotify API on AWS. The pipeline retrieves data from the Spotify API, transforms it into the desired format, and loads it into AWS S3 (data storage).

### Architecture
![Architecture Diagram](https://github.com/user-attachments/assets/504350dd-973c-4a4c-8459-d008eb0edb31)

### Services Used
1. **AWS S3 (Simple Storage Service):** AWS S3 is a highly scalable object storage service for storing and retrieving any amount of data from anywhere over the web. It is commonly used to store and distribute large files.
2. **AWS Lambda:** AWS Lambda is a serverless computing service that lets you run code without managing servers.
3. **AWS CloudWatch:** AWS CloudWatch is a monitoring service for AWS resources and the applications you run on them. It is used to collect and track metrics, collect and monitor log files, and set alarms.
4. **AWS Glue Data Catalog:** The AWS Glue Data Catalog is a centralized metadata repository for all your data assets across various data sources.
5. **AWS Glue Crawler:** An AWS Glue crawler populates the Data Catalog: it scans the data sources, identifies data formats, infers schemas, and creates the corresponding tables in the AWS Glue Data Catalog.
6. **AWS Athena:** Amazon Athena is an interactive query service that makes it easy to analyze data stored in S3 using standard SQL. It uses the tables registered in the Glue Data Catalog to query that data.
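The extract, transform, and load stages from the Introduction could run inside a single AWS Lambda function. Below is a minimal Python sketch. It assumes the response shape of the Spotify Web API's playlist-items endpoint (`items[].track`, `items[].added_at`). The `fetch_playlist` hook, the event fields `playlist_id` and `bucket`, the `transformed_data/songs/` key prefix, and the chosen columns are hypothetical and are not taken from this repository. The load step uses boto3's `put_object`. Messages written through the standard `logging` module in Lambda end up in CloudWatch Logs.

```python
import csv
import io
import logging
from datetime import datetime, timezone

# In Lambda, records from the root logger are forwarded to CloudWatch Logs.
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Hypothetical column selection for the "songs" output.
SONG_FIELDS = ["song_id", "song_name", "duration_ms", "popularity",
               "album_id", "artist_id", "added_at"]


def transform_songs(playlist_items):
    """Flatten a Spotify playlist-items response into one row per track."""
    rows = []
    for item in playlist_items.get("items", []):
        track = item.get("track")
        if not track:  # unavailable or removed tracks can come back as null
            continue
        artists = track.get("artists") or [{}]
        rows.append({
            "song_id": track["id"],
            "song_name": track["name"],
            "duration_ms": track["duration_ms"],
            "popularity": track.get("popularity"),
            "album_id": track["album"]["id"],
            "artist_id": artists[0].get("id"),  # primary artist only
            "added_at": item.get("added_at"),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the transformed rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=SONG_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def lambda_handler(event, context, fetch_playlist=None, s3_client=None):
    """Extract -> transform -> load, with the Spotify and S3 clients injectable."""
    if fetch_playlist is None:
        # Hypothetical hook: bind this to a real Spotify API client.
        raise NotImplementedError("supply a fetch_playlist(playlist_id) callable")
    if s3_client is None:
        import boto3  # imported lazily so the sketch runs without AWS access
        s3_client = boto3.client("s3")

    raw = fetch_playlist(event["playlist_id"])           # extract
    rows = transform_songs(raw)                          # transform
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    key = f"transformed_data/songs/songs_{stamp}.csv"
    s3_client.put_object(Bucket=event["bucket"], Key=key,  # load
                         Body=rows_to_csv(rows).encode("utf-8"))
    logger.info("Wrote %d songs to s3://%s/%s", len(rows), event["bucket"], key)
    return {"rows": len(rows), "key": key}
```

Injecting `fetch_playlist` and `s3_client` lets you test the handler without network access. A deployed function would need `fetch_playlist` bound to a real Spotify client, because Lambda calls the handler with only `event` and `context`.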
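After a Glue crawler has registered the S3 output as tables in the Data Catalog, you can query them from Athena. Below is a small sketch that assumes boto3's Athena client. It submits SQL with `start_query_execution`, polls `get_query_execution` until the query reaches a final state, and converts the first page of `get_query_results` into dictionaries. The function name and polling limits are illustrative choices.

```python
import time


def run_athena_query(query, database, output_location, athena_client=None,
                     poll_seconds=1.0, max_polls=60):
    """Run SQL in Athena against a Glue Data Catalog database; return rows as dicts."""
    if athena_client is None:
        import boto3  # imported lazily so the sketch runs without AWS access
        athena_client = boto3.client("athena")

    started = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query succeeds, fails, or we give up.
    for _ in range(max_polls):
        status = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            reason = status["QueryExecution"]["Status"].get("StateChangeReason", "")
            raise RuntimeError(f"Athena query {state}: {reason}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Athena query {query_id} did not finish")

    # Only the first page of results is read here.
    rows = athena_client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    if not rows:
        return []
    # For SELECT queries the first row holds the column names.
    header = [col.get("VarCharValue") for col in rows[0]["Data"]]
    return [dict(zip(header, [col.get("VarCharValue") for col in r["Data"]]))
            for r in rows[1:]]
```

Here is a possible call: `run_athena_query("SELECT song_name, popularity FROM songs LIMIT 10", "spotify_db", "s3://example-athena-results/")`. The table `songs`, the database `spotify_db`, and the bucket are hypothetical placeholders. Athena returns every value as a string. A result set larger than one page would require following `NextToken`.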