https://github.com/bytebyrajeev/flipkart-data-analysis-using-pyspark-on-databricks

data-engineering databricks-notebooks pyspark python

Last synced: about 1 month ago

Host: GitHub
URL: https://github.com/bytebyrajeev/flipkart-data-analysis-using-pyspark-on-databricks
Owner: bytebyrajeev
Created: 2025-04-06T15:03:47.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-04-06T15:09:26.000Z (about 1 year ago)
Last Synced: 2026-04-30T06:33:03.895Z (about 1 month ago)
Topics: data-engineering, databricks-notebooks, pyspark, python
Language: Jupyter Notebook
Homepage:
Size: 871 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Flipkart Data Analysis Using PySpark on Databricks
-------------------------------------------------------------

This project builds an end-to-end data engineering pipeline with PySpark on Databricks, modeled on real-world business scenarios. The main steps are exploring the dataset's structure, cleaning inconsistent data, and applying transformations to prepare it for analysis. The workflow simulates practical use cases such as organizing product information, calculating metrics, and generating insights that meet business requirements, giving a hands-on look at common data engineering tasks.

Link to the project: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/19652298897236/3492530066299206/4655662666255799/latest.html

## Steps Involved:
### Data Exploration
- Inspecting the dataset to understand its structure, including key columns like product ID, title, and ratings.
- Referring to the data dictionary to interpret column meanings and verify the dataset.

### Data Cleaning
- Handling missing or invalid entries in the dataset.
- Removing irrelevant rows to ensure data consistency.

### Data Processing and Analysis
##### Perform operations like:
- Filtering data based on defined criteria (e.g., valid product ratings).
- Aggregating data to calculate totals and averages for key metrics.
- Analyzing patterns in product performance.

### Project Insights
- Identifying products with the highest ratings and consistent performance trends.
- Analyzing key product categories contributing to overall trends.

#### Snapshots:
![Screenshot 2024-11-23 131227](https://github.com/user-attachments/assets/93061612-58a7-4ac3-b0b8-3c8415934f39)

----------------------------------------------------------------
Credits: Be a programmer
