https://github.com/pavithra19/apache_spark_people_data_processor

This project is a data processing application built with Apache Spark and Scala. It is designed to process, analyze, and transform large datasets of people data, using Spark's distributed computing capabilities for scalable data ingestion, cleaning, and reporting. Shell scripts are included for Hadoop deployment.

apachespark dataengineering hadoop hdfs scala

Last synced: 10 months ago

Host: GitHub
URL: https://github.com/pavithra19/apache_spark_people_data_processor
Owner: pavithra19
Created: 2025-06-09T12:05:44.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-06-11T22:52:40.000Z (11 months ago)
Last Synced: 2025-06-11T23:40:08.325Z (11 months ago)
Topics: apachespark, dataengineering, hadoop, hdfs, scala
Language: Scala
Homepage: https://github.com/pavithra19/apache_spark_people_data_processor
Size: 1.81 MB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
