https://github.com/divitmittal/datathon-bigdata

Efficient Data Processing ETL Pipeline for Event Records
aws aws-glue aws-lambda aws-s3 etl-pipeline hadoop spark

Host: GitHub
URL: https://github.com/divitmittal/datathon-bigdata
Owner: DivitMittal
License: mit
Created: 2024-09-21T14:27:56.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2026-01-15T20:06:32.000Z (6 months ago)
Last Synced: 2026-01-15T22:31:41.511Z (6 months ago)
Topics: aws, aws-glue, aws-lambda, aws-s3, etl-pipeline, hadoop, spark
Language: Jupyter Notebook
Homepage: https://deepwiki.com/DivitMittal/Datathon-BigData
Size: 4.11 MB
Stars: 6
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.adoc
- License: LICENSE

README

= Datathon-BigData

== Efficient Data Processing ETL Pipeline for Event Records
This pipeline processes raw product event data, filters the relevant https://www.bobble.ai/en/home[BobbleAI Keyboard] event records from the last five days before July 1, 2024, expands the JSON columns, and stores the final data on S3 in structured Apache Parquet format.
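
The filter window and the JSON expansion can be illustrated with a minimal plain-Python sketch. This is not the repository's PySpark job. The field names `app`, `event_date`, and `payload`, the sample records, and the helper names are hypothetical, and the window is assumed to cover June 26 through June 30, 2024 (the five days strictly before the cutoff).

```python
import json
from datetime import date, timedelta

CUTOFF = date(2024, 7, 1)  # cutoff date stated in the README
WINDOW_DAYS = 5            # "last five days" stated in the README


def in_window(event_date: date, cutoff: date = CUTOFF, days: int = WINDOW_DAYS) -> bool:
    """Return True when event_date falls in the `days` days strictly before `cutoff`."""
    return cutoff - timedelta(days=days) <= event_date < cutoff


def expand_json(record: dict, column: str) -> dict:
    """Replace a JSON-string column with one prefixed top-level key per JSON field."""
    flat = {k: v for k, v in record.items() if k != column}
    for key, value in json.loads(record[column]).items():
        flat[f"{column}_{key}"] = value
    return flat


# Hypothetical sample records; the real schema is not described in the README.
raw_events = [
    {"app": "keyboard", "event_date": "2024-06-25", "payload": '{"action": "open"}'},
    {"app": "keyboard", "event_date": "2024-06-28", "payload": '{"action": "type", "chars": 12}'},
    {"app": "other", "event_date": "2024-06-29", "payload": '{"action": "open"}'},
    {"app": "keyboard", "event_date": "2024-07-01", "payload": '{"action": "close"}'},
]

# Keep only keyboard events inside the window, then flatten the JSON column.
selected = [
    expand_json(event, "payload")
    for event in raw_events
    if event["app"] == "keyboard"
    and in_window(date.fromisoformat(event["event_date"]))
]
print(selected)
```

In a Glue job the same two steps would typically be expressed as a DataFrame filter followed by JSON parsing of the column, with the result written out as Parquet.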

== Technology Stack
- **Cloud Services**: AWS (S3, Lambda, Glue)
- **Data Processing**: PySpark on AWS Glue
- **Storage**: S3 (Parquet format)
- **IAM & Security**: AWS IAM roles and policies for access control