https://github.com/shubhammohanty680/spotify_end_to_end_data_engineering

This project builds an ETL (Extract, Transform, Load) pipeline on AWS using the Spotify API.




Host: GitHub
URL: https://github.com/shubhammohanty680/spotify_end_to_end_data_engineering
Owner: ShubhamMohanty680
Created: 2024-12-03T11:23:56.000Z (6 months ago)
Default Branch: main
Last Pushed: 2025-01-22T10:02:32.000Z (4 months ago)
Last Synced: 2025-03-28T06:51:14.941Z (2 months ago)
Topics: aws, aws-athena, aws-glue-crawler, aws-glue-data-catalog, aws-lambda, aws-s3, aws-trigger, awscloudwatch, data-engineering, data-engineering-pipeline, python, spotify-api, spotipy-library
Language: Jupyter Notebook
Homepage:
Size: 1.44 MB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# Spotify_End_To_End_Data_Engineering Project

### Introduction
This project builds an ETL (Extract, Transform, Load) pipeline on AWS using the Spotify API. The pipeline retrieves data from the Spotify API, transforms it into the desired format, and loads it into AWS S3 for storage.
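
The transform step described above can be illustrated with a minimal sketch. It assumes the extracted data has the shape of a Spotify Web API playlist-items response (an `items` list whose entries hold a `track` object); the function name `flatten_playlist_items` and the chosen output columns are hypothetical and may differ from what this repository's notebooks actually do.

```python
from typing import Any


def flatten_playlist_items(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a playlist-items response into one flat row per track.

    Assumption: `payload` follows the Spotify Web API playlist-items shape.
    The output column names are illustrative only.
    """
    rows = []
    for item in payload.get("items", []):
        track = item.get("track")
        if not track:
            # Removed or unavailable tracks can appear as null entries.
            continue
        album = track.get("album") or {}
        artists = track.get("artists") or []
        rows.append(
            {
                "song_id": track.get("id"),
                "song_name": track.get("name"),
                "duration_ms": track.get("duration_ms"),
                "popularity": track.get("popularity"),
                "added_at": item.get("added_at"),
                "album_id": album.get("id"),
                "album_name": album.get("name"),
                # Join artist names so each track stays a single row.
                "artist_names": ", ".join(a.get("name", "") for a in artists),
            }
        )
    return rows
```

Flat rows like these are straightforward to write out as CSV, which is a format a Glue Crawler can infer a schema from.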

### Architecture
![Architecture Diagram](https://github.com/user-attachments/assets/504350dd-973c-4a4c-8459-d008eb0edb31)

### Services Used
1. **AWS S3 (Simple Storage Service):** AWS S3 is a highly scalable object storage service for storing and retrieving any amount of data from anywhere on the web. It is commonly used to store and distribute large files.

2. **AWS Lambda:** AWS Lambda is a serverless computing service that lets you run code without managing servers.

3. **CloudWatch:** Amazon CloudWatch is a monitoring service for AWS resources and the applications you run on them. It collects and tracks metrics, collects and monitors log files, and sets alarms.

4. **AWS Glue Data Catalog:** The AWS Glue Data Catalog is a centralized metadata repository for your data assets across various data sources.

5. **AWS Glue Crawler:** An AWS Glue Crawler crawls the data sources, identifies data formats, infers schemas, and creates table definitions in the AWS Glue Data Catalog.

6. **AWS Athena:** Amazon Athena is an interactive query service that makes it easy to analyze data stored in S3 using standard SQL. It can also query the tables defined in the Glue Data Catalog.
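
To show how Lambda and S3 from the list above might fit together in the load step, here is a minimal sketch of a handler that writes rows to S3 as CSV. The bucket name `my-spotify-etl-bucket`, the `transformed_data` prefix, the `rows` event field, and all helper names are assumptions made for illustration; they are not taken from this repository. The S3 client is passed in as a parameter so the helpers can be exercised without AWS credentials.

```python
import csv
import io
from datetime import datetime, timezone


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_key(prefix, now):
    """Build a timestamped object key so each run writes a new file."""
    return f"{prefix}/songs_{now:%Y%m%dT%H%M%S}.csv"


def load_to_s3(s3_client, bucket, key, rows):
    """Upload the rows as a CSV object; `s3_client` is a boto3 S3 client or a stand-in."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=rows_to_csv(rows).encode("utf-8"))
    return key


def lambda_handler(event, context):
    """Hypothetical Lambda entry point; the event shape is an assumption."""
    import boto3  # imported here so the helpers above work without boto3 installed

    rows = event.get("rows", [])
    key = build_key("transformed_data", datetime.now(timezone.utc))
    load_to_s3(boto3.client("s3"), "my-spotify-etl-bucket", key, rows)
    return {"statusCode": 200, "key": key}
```

Once files land under a prefix like this, a Glue Crawler pointed at that prefix can register a table that Athena then queries with SQL.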
