https://github.com/soheil-mp/sales-analytics-pipeline

Data analytics pipeline built with Apache Spark and Hadoop for processing and analyzing large-scale sales data.

apache-spark hadoop hdfs sql

Host: GitHub
URL: https://github.com/soheil-mp/sales-analytics-pipeline
Owner: soheil-mp
Created: 2024-07-17T18:57:54.000Z (about 1 year ago)
Default Branch: master
Last Pushed: 2024-07-17T19:31:14.000Z (about 1 year ago)
Last Synced: 2025-03-06T03:33:51.467Z (7 months ago)
Topics: apache-spark, hadoop, hdfs, sql
Language: Python
Homepage:
Size: 5.86 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Sales-Analytics-Pipeline

A comprehensive data analytics pipeline built with Apache Spark and Hadoop for processing and analyzing large-scale sales data.

This project demonstrates how to read and write data from and to HDFS, clean and preprocess data using PySpark, conduct advanced analytics with Spark SQL and window functions, integrate with Hive for data warehousing, and maintain a modular code structure for complex ETL processes.
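As a rough illustration of the read, clean, analyze, and write flow described above, the following is a minimal sketch and not the repository's actual code. The column names `order_date` and `amount`, the temporary view name `sales`, the CSV input format, and the function name `monthly_sales_trend` are all assumptions made for this example. It expects an existing `SparkSession` to be passed in.

```python
"""Sketch: monthly sales trend with Spark SQL and a window function."""

# Assumed column names (not taken from the repository): order_date, amount.
MONTHLY_TREND_SQL = """
WITH monthly AS (
    SELECT date_format(order_date, 'yyyy-MM') AS month,
           SUM(amount) AS total_sales
    FROM sales
    GROUP BY date_format(order_date, 'yyyy-MM')
)
SELECT month,
       total_sales,
       total_sales - LAG(total_sales) OVER (ORDER BY month) AS change_vs_prev_month
FROM monthly
ORDER BY month
"""


def monthly_sales_trend(spark, input_path, output_path):
    """Read raw sales from HDFS, clean them, and write the monthly trend back.

    `spark` is an existing SparkSession; both paths are hypothetical HDFS URIs.
    """
    # Assumes CSV input with a header row; the real input format may differ.
    raw = spark.read.csv(input_path, header=True, inferSchema=True)

    # Basic cleaning: drop exact duplicates and rows missing the aggregated fields.
    cleaned = raw.dropDuplicates().dropna(subset=["order_date", "amount"])

    # Expose the cleaned data to Spark SQL, then run the window-function query.
    cleaned.createOrReplaceTempView("sales")
    trend = spark.sql(MONTHLY_TREND_SQL)

    trend.write.mode("overwrite").parquet(output_path)
    return trend
```

The `LAG` window has no `PARTITION BY`, so Spark processes it in a single partition. That is usually acceptable here because the window runs over one row per month, not over the raw sales records.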

Key features include scalable sales data processing, monthly sales trend analysis, insights into customer purchasing behavior, evaluation of product category performance, configurable data input/output paths, and robust error handling and logging.
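Configurable paths and step-level error handling could look something like the sketch below. It uses only the Python standard library. The flag names `--input` and `--output`, the logger name `sales_pipeline`, and the helpers `PipelineConfig`, `parse_config`, and `run_step` are hypothetical and may not match what the repository does.

```python
import argparse
import logging
from dataclasses import dataclass

logger = logging.getLogger("sales_pipeline")  # hypothetical logger name


@dataclass(frozen=True)
class PipelineConfig:
    input_path: str
    output_path: str


def parse_config(argv=None):
    """Build the config from command-line flags (flag names are assumptions)."""
    parser = argparse.ArgumentParser(description="Sales analytics pipeline")
    parser.add_argument("--input", required=True, help="e.g. an hdfs:// URI")
    parser.add_argument("--output", required=True, help="e.g. an hdfs:// URI")
    args = parser.parse_args(argv)
    return PipelineConfig(input_path=args.input, output_path=args.output)


def run_step(name, func, *args, **kwargs):
    """Run one ETL step, logging start, success, and any failure with a traceback."""
    logger.info("Starting step: %s", name)
    try:
        result = func(*args, **kwargs)
    except Exception:
        # Log the full traceback, then re-raise so the job exits with an error.
        logger.exception("Step failed: %s", name)
        raise
    logger.info("Finished step: %s", name)
    return result
```

With `spark-submit`, anything placed after the application file is passed to the script, so flags like these would go after `main.py` on the command line.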

The tech stack comprises Apache Spark, the Hadoop Distributed File System (HDFS), Apache Hive, and Python.

Run the following command to start the pipeline:
```bash
spark-submit --master yarn \
  --deploy-mode client \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  main.py
```
