https://github.com/hq969/youtube-data-pipeline-aws
This ETL pipeline uses AWS cloud services to transform YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.
- Host: GitHub
- URL: https://github.com/hq969/youtube-data-pipeline-aws
- Owner: hq969
- Created: 2025-02-22T14:35:33.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-24T16:53:54.000Z (8 months ago)
- Last Synced: 2025-04-05T07:16:38.234Z (6 months ago)
- Topics: aws, aws-cloudwatch, aws-data-engineering-project, aws-glue, aws-iam, aws-lambda, aws-s3, data-engineering-pipeline, data-pipeline, dataengineering, etl, etl-pipeline, pandas, pyhton, pyspark, spark
- Language: Python
- Homepage:
- Size: 1.69 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
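
The description mentions a Lambda step that converts raw data to Parquet and writes it to a cleansed bucket. The sketch below illustrates only the planning part of such a step: reading object locations from an S3 event notification and deriving a target Parquet path. It is not taken from the repository; the bucket name `example-youtube-cleansed` and the function names are hypothetical, and the actual read and write (for example with a Parquet-capable library or a Glue job) is deliberately left out.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical bucket name; the repository's real bucket names are not stated here.
CLEANSED_BUCKET = "example-youtube-cleansed"


def parse_s3_event(event):
    """Return (bucket, key) pairs for every object in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def cleansed_target(key, cleansed_bucket=CLEANSED_BUCKET):
    """Map a raw object key to an s3:// URI for its Parquet counterpart."""
    stem, _ext = os.path.splitext(key)
    return f"s3://{cleansed_bucket}/{stem}.parquet"


def lambda_handler(event, context):
    """Plan the raw-to-Parquet conversions; the actual read/write is omitted."""
    return [
        {"source": f"s3://{bucket}/{key}", "target": cleansed_target(key)}
        for bucket, key in parse_s3_event(event)
    ]
```

Keeping the path logic separate from any AWS calls makes it possible to check locally before wiring it to a real trigger.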