https://github.com/divitmittal/datathon-bigdata

Efficient Data Processing ETL Pipeline for Event Records
aws aws-glue aws-lambda aws-s3 etl-pipeline hadoop spark

= Datathon-BigData

== Efficient Data Processing ETL Pipeline for Event Records
This pipeline processes raw product event data: it filters the relevant https://www.bobble.ai/en/home[BobbleAI Keyboard] event records from the five days before July 1, 2024, expands the JSON columns, and stores the final data in Amazon S3 in the structured Apache Parquet format.
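
The filter and expand steps described above can be illustrated with a minimal plain-Python sketch. It is not the project's Glue job: the field names `product`, `event_date`, and `event_params`, the product value `bobble_keyboard`, and the sample records are all hypothetical, and the window is assumed to mean June 26 through June 30, 2024 inclusive. Writing Parquet to S3 is left out.

```python
import json
from datetime import date, datetime, timedelta

CUTOFF = date(2024, 7, 1)  # cutoff date stated in the README
WINDOW_DAYS = 5            # "five days before" the cutoff


def in_window(event_date, cutoff=CUTOFF, days=WINDOW_DAYS):
    """Return True if event_date lies in [cutoff - days, cutoff)."""
    return cutoff - timedelta(days=days) <= event_date < cutoff


def expand_json(record, column):
    """Copy the record, replacing the JSON string in `column` with prefixed top-level keys."""
    expanded = {k: v for k, v in record.items() if k != column}
    payload = json.loads(record.get(column) or "{}")
    for key, value in payload.items():
        expanded[f"{column}_{key}"] = value
    return expanded


def process(records, product="bobble_keyboard"):
    """Keep records for one product inside the date window and expand their JSON column."""
    kept = []
    for rec in records:
        day = datetime.strptime(rec["event_date"], "%Y-%m-%d").date()
        if rec["product"] == product and in_window(day):
            kept.append(expand_json(rec, "event_params"))
    return kept


# Hypothetical sample data: only the first record matches both filters.
sample = [
    {"product": "bobble_keyboard", "event_date": "2024-06-26",
     "event_params": '{"screen": "home", "count": 2}'},
    {"product": "bobble_keyboard", "event_date": "2024-07-01",
     "event_params": '{"screen": "home"}'},
    {"product": "other_app", "event_date": "2024-06-28",
     "event_params": "{}"},
]
result = process(sample)
```

In a PySpark job the same two steps would typically be written as DataFrame operations rather than Python loops, but the window bounds and the flattening rule carry over unchanged.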

== Technology Stack
- **Cloud Services**: AWS (S3, Lambda, Glue)
- **Data Processing**: PySpark on AWS Glue
- **Storage**: S3 (Parquet format)
- **IAM & Security**: AWS IAM roles and policies for access control