https://github.com/akshpraj/data-cleaning-and-preprocessing
Sales Data Cleaning and Preprocessing - Jupyter Notebook
- Host: GitHub
- URL: https://github.com/akshpraj/data-cleaning-and-preprocessing
- Owner: AkshPraj
- Created: 2025-04-07T11:44:48.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-15T13:20:53.000Z (about 1 year ago)
- Last Synced: 2025-04-19T19:51:46.777Z (about 1 year ago)
- Topics: jupyter-notebook
- Language: Jupyter Notebook
- Homepage:
- Size: 20.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🧹 Sales Data Cleaning Project
### Objective
Clean and prepare a raw sales dataset by handling missing values, duplicates, inconsistent text formats, and incorrect data types, so that the cleaned dataset can be used for downstream analysis or reporting.
### 🧰 Tools Used
- Python (Pandas)
- Jupyter Notebook / Google Colab
- Excel (alternative, for non-programmatic data cleaning)
## Cleaning Steps Performed
| Task | Description |
|------|-------------|
| Missing Values | Identified using `.isnull()` and handled by imputation or row removal. |
| ♻️ Duplicates | Removed using `.drop_duplicates()` or Excel's "Remove Duplicates". |
| 🧑‍💼 Standardized Text | Gender, country names, etc. were cleaned for consistency (e.g., male, Male, MALE → Male). |
| Date Format Fixes | Converted all dates to a consistent format (DD-MM-YYYY). |
| 🏷️ Column Name Cleanup | Renamed headers to lowercase with underscores (e.g., Order Date → order_date). |
| 🔢 Data Type Corrections | Ensured numeric fields (like age, sales) have the correct type and dates are stored as datetime. |
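
The steps above could be chained in pandas roughly as follows. This is a minimal sketch, not the notebook's actual code: the sample rows and the `clean_sales` helper are hypothetical, and the column names are taken from the examples in this README.

```python
import pandas as pd

# Hypothetical raw data; only the column names come from this README.
raw = pd.DataFrame({
    "Customer Name": ["Asha", "Asha", None, "Ravi"],
    "Gender": ["male", "male", "FEMALE", " Male "],
    "Order Date": ["01-02-2024", "01-02-2024", "15-03-2024", "20-03-2024"],
    "Sales Amount": ["100.5", "100.5", "250", "75.25"],
    "Quantity": ["1", "1", "3", "2"],
})


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps in order (illustrative helper, assumed name)."""
    df = df.copy()

    # Column name cleanup: "Order Date" -> "order_date"
    df.columns = (
        df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )

    # Duplicates: drop fully identical rows
    df = df.drop_duplicates().reset_index(drop=True)

    # Missing values: fill missing customer names with a placeholder
    df["customer_name"] = df["customer_name"].fillna("Unknown")

    # Standardized text: trim whitespace and use title case ("MALE" -> "Male")
    df["gender"] = df["gender"].str.strip().str.title()

    # Date format: parse DD-MM-YYYY strings into datetime values
    df["order_date"] = pd.to_datetime(df["order_date"], format="%d-%m-%Y")

    # Data types: numeric columns stored as numbers
    df["sales_amount"] = pd.to_numeric(df["sales_amount"])
    df["quantity"] = df["quantity"].astype(int)
    return df


cleaned = clean_sales(raw)
print(cleaned.dtypes)
```

Once `order_date` is a datetime column, the DD-MM-YYYY form is only a display choice; `cleaned["order_date"].dt.strftime("%d-%m-%Y")` would render it that way when exporting.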
## 🧼 Example Summary of Changes
- Removed 5 duplicate rows
- Filled 12 missing 'customer_name' values with "Unknown"
- Standardized 'Gender' column to: ['Male', 'Female']
- Converted 'order_date' to datetime format
- Renamed columns: "Order Date" → "order_date", "Sales Amount" → "sales_amount"
- Cast 'quantity' and 'age' columns to integer
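
Counts like those in the summary can be computed from the raw data before cleaning. The sketch below assumes a small hypothetical sample and a made-up helper name, `summarize_issues`; it is not taken from the notebook.

```python
import pandas as pd

# Hypothetical sample, not the project's actual dataset.
raw = pd.DataFrame({
    "customer_name": ["Asha", "Asha", None, "Ravi", None],
    "gender": ["male", "male", "FEMALE", "Male", "female"],
    "quantity": ["1", "1", "3", "2", "4"],
})


def summarize_issues(df: pd.DataFrame) -> dict:
    """Count the issues that the cleaning steps would address."""
    deduped = df.drop_duplicates()
    return {
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Missing names remaining after duplicates are dropped
        "missing_customer_name": int(deduped["customer_name"].isna().sum()),
        # Distinct raw spellings that need standardizing
        "gender_variants": sorted(deduped["gender"].dropna().unique().tolist()),
    }


report = summarize_issues(raw)
print(report)
```

Running the same function on the cleaned frame gives a quick check that the duplicate and missing counts have dropped to zero.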