https://github.com/shubhammohanty680/spotify_end_to_end_data_engineering
An ETL (Extract, Transform, Load) pipeline project built on AWS using the Spotify API.
aws aws-athena aws-glue-crawler aws-glue-data-catalog aws-lambda aws-s3 aws-trigger awscloudwatch data-engineering data-engineering-pipeline python spotify-api spotipy-library
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/shubhammohanty680/spotify_end_to_end_data_engineering
- Owner: ShubhamMohanty680
- Created: 2024-12-03T11:23:56.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-01-22T10:02:32.000Z (4 months ago)
- Last Synced: 2025-03-28T06:51:14.941Z (2 months ago)
- Topics: aws, aws-athena, aws-glue-crawler, aws-glue-data-catalog, aws-lambda, aws-s3, aws-trigger, awscloudwatch, data-engineering, data-engineering-pipeline, python, spotify-api, spotipy-library
- Language: Jupyter Notebook
- Homepage:
- Size: 1.44 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Spotify_End_To_End_Data_Engineering Project
### Introduction
This project implements an ETL (Extract, Transform, Load) pipeline on AWS using the Spotify API. The pipeline retrieves data from the Spotify API, transforms it into the desired format, and loads it into AWS S3 for storage.
### Architecture
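As a rough illustration of the transform step in the extract, transform, load flow described in the introduction, the sketch below flattens a nested playlist response into one row per track. The function name `flatten_tracks`, the chosen output columns, and the assumed input shape (an `items` list whose entries hold a `track` object with `album` and `artists`) are assumptions made for this example, not code taken from the repository.

```python
from __future__ import annotations

from typing import Any


def flatten_tracks(playlist: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a nested playlist response into one flat row per track.

    Assumes the input looks like {"items": [{"added_at": ..., "track": {...}}]}.
    """
    rows = []
    for item in playlist.get("items", []):
        track = item.get("track")
        if not track:  # skip entries that carry no track data
            continue
        album = track.get("album", {})
        rows.append({
            "track_id": track.get("id"),
            "track_name": track.get("name"),
            "duration_ms": track.get("duration_ms"),
            "album_name": album.get("name"),
            "release_date": album.get("release_date"),
            # Join all artist names into a single comma-separated column.
            "artists": ", ".join(a.get("name", "") for a in track.get("artists", [])),
            "added_at": item.get("added_at"),
        })
    return rows
```

Flat rows like these are straightforward to write out as CSV or JSON lines, which keeps the later cataloging and querying steps simple.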
### Services Used
1. **AWS S3 (Simple Storage Service):** AWS S3 is a highly scalable object storage service for storing and retrieving any amount of data from anywhere on the web. It is commonly used to store and distribute large files.
2. **AWS Lambda:** AWS Lambda is a serverless computing service that lets you run code without managing servers.
3. **CloudWatch:** Amazon CloudWatch is a monitoring service for AWS resources and the applications you run on them. It is used to collect and track metrics, collect and monitor log files, and set alarms.
4. **AWS Glue Data Catalog:** The AWS Glue Data Catalog is a centralized metadata repository for all your data assets across various data sources.
5. **AWS Glue Crawler:** An AWS Glue crawler scans the data sources, identifies data formats, infers schemas, and creates table definitions in the AWS Glue Data Catalog.
6. **AWS Athena:** Amazon Athena is an interactive query service that makes it easy to analyze data stored in S3 using standard SQL. It can also query tables defined in the Glue Data Catalog.
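To show how Lambda and S3 from the list above might fit together in the load step, here is a minimal sketch of a Lambda handler that writes one raw JSON payload to a bucket. The bucket name, the key prefix, and the extra `s3_client` and `payload` parameters are hypothetical and exist only to keep the example self-contained and testable; the repository's actual handler may look different.

```python
import json
from datetime import datetime, timezone

# Hypothetical names: the README does not state the bucket or key layout.
BUCKET = "spotify-etl-example-bucket"
RAW_PREFIX = "raw_data/to_process/"


def build_key(now: datetime) -> str:
    """Build a timestamped object key so each run writes a new file."""
    return f"{RAW_PREFIX}spotify_raw_{now:%Y%m%dT%H%M%S}.json"


def lambda_handler(event, context, s3_client=None, payload=None):
    """Store one raw JSON payload in S3 and report where it went."""
    if s3_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        s3_client = boto3.client("s3")
    if payload is None:
        payload = {"items": []}  # placeholder for the Spotify API response
    key = build_key(datetime.now(timezone.utc))
    # put_object uploads the serialized payload as a single S3 object.
    s3_client.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload))
    return {"statusCode": 200, "key": key}
```

Passing the client in as an optional argument lets the handler be exercised locally with a stub, while Lambda itself still calls it with only `event` and `context`.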