https://github.com/annethsivakumar/retail-sales-analysis
This project culminates the Python for Data Engineering course, focusing on implementing an end-to-end ETL (Extract, Transform, Load) pipeline for retail sales analysis. The objective is to create a Python script that processes and analyzes retail sales data stored in a CSV file with four columns: date, product, quantity, and sales.
- Host: GitHub
- URL: https://github.com/annethsivakumar/retail-sales-analysis
- Owner: annethsivakumar
- Created: 2024-08-15T17:34:27.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-15T17:44:39.000Z (5 months ago)
- Last Synced: 2024-08-15T19:50:35.685Z (5 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 44.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Retail-Sales-Analysis
This project is the culmination of the _Python for Data Engineering_ LinkedIn course, focusing on implementing an end-to-end ETL (Extract, Transform, Load) pipeline for retail sales analysis. The objective is to create a Python script that processes and analyzes retail sales data stored in a CSV file with four columns: date, product, quantity, and sales.
The project follows these key data engineering steps:
1. **Data Extraction:** The CSV file is read into a pandas DataFrame, forming the basis for the ETL process.
2. **Data Transformation:** The data undergoes cleaning to remove rows with missing or incomplete values. Subsequently, the script performs data manipulation to calculate total sales per product, identify the bestselling product, and determine average daily sales.
3. **Data Loading and Visualization:** The transformed data is used to generate visualizations, including sales trends over time and a bar chart showing sales per product, facilitating clear insights into the data.

The analysis is structured within a Python class named `RetailSalesAnalyzer`, encapsulating methods for each stage of the ETL process. The final script creates an instance of this class, executing its methods to perform the complete data analysis and visualization, demonstrating practical data engineering skills.
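
The steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: only the class name `RetailSalesAnalyzer` and the four column names come from this README. The method names, the inline sample data, and the reading of "average daily sales" as the mean of per-day sales totals are all assumptions.

```python
import io

import pandas as pd


class RetailSalesAnalyzer:
    """Illustrative sketch; method names are assumptions, not the repo's API."""

    def __init__(self, source):
        # Extract: `source` may be a CSV path or a file-like object.
        self.data = pd.read_csv(source, parse_dates=["date"])

    def clean_data(self):
        # Transform: drop every row that has at least one missing value.
        self.data = self.data.dropna().reset_index(drop=True)
        return self.data

    def total_sales_by_product(self):
        # Sum the `sales` column within each product.
        return self.data.groupby("product")["sales"].sum()

    def best_selling_product(self):
        # Product with the largest total sales.
        return self.total_sales_by_product().idxmax()

    def average_daily_sales(self):
        # Assumption: total sales per calendar day, averaged across days.
        return self.data.groupby("date")["sales"].sum().mean()

    def plot_sales(self):
        # Visualize: imported lazily so the analysis works without matplotlib.
        import matplotlib.pyplot as plt

        fig, (ax_trend, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))
        daily = self.data.groupby("date")["sales"].sum()
        daily.plot(ax=ax_trend, title="Sales over time")
        self.total_sales_by_product().plot(
            kind="bar", ax=ax_bar, title="Sales per product"
        )
        fig.tight_layout()
        return fig


# Made-up sample data standing in for the course's CSV file.
sample_csv = io.StringIO(
    "date,product,quantity,sales\n"
    "2024-01-01,apple,2,10.0\n"
    "2024-01-01,pear,1,4.0\n"
    "2024-01-02,apple,3,15.0\n"
    "2024-01-02,pear,,\n"  # incomplete row, removed by clean_data()
)

analyzer = RetailSalesAnalyzer(sample_csv)
analyzer.clean_data()
totals = analyzer.total_sales_by_product()
best = analyzer.best_selling_product()
avg_daily = analyzer.average_daily_sales()
```

Calling `analyzer.plot_sales()` would then draw the trend line and the per-product bar chart side by side; that step needs matplotlib installed.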