https://github.com/pavithra19/apache_spark_people_data_processor
This project is a data processing application built with Apache Spark and Scala. It is designed to process, analyze, and transform large datasets of people data, using Spark's distributed computing capabilities for scalable data ingestion, cleaning, and reporting. Shell scripts are included for Hadoop deployment.
apachespark dataengineering hadoop hdfs scala
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/pavithra19/apache_spark_people_data_processor
- Owner: pavithra19
- Created: 2025-06-09T12:05:44.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-06-11T22:52:40.000Z (4 months ago)
- Last Synced: 2025-06-11T23:40:08.325Z (4 months ago)
- Topics: apachespark, dataengineering, hadoop, hdfs, scala
- Language: Scala
- Homepage: https://github.com/pavithra19/apache_spark_people_data_processor
- Size: 1.81 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md