Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/keerthanapalanikumar/data-cleaning-on-sql
This repository contains SQL scripts and documentation for cleaning and standardizing data in the NashvilleHousing table within the sqlproject2 database. The project aims to prepare the dataset for analysis by addressing inconsistencies, filling missing values, standardizing formats, and removing duplicates.
data-cleaning data-deduplication data-manipulation data-standardization database-management mssql ssms
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/keerthanapalanikumar/data-cleaning-on-sql
- Owner: KeerthanaPalanikumar
- Created: 2024-06-17T13:47:39.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-06-17T13:51:19.000Z (5 months ago)
- Last Synced: 2024-10-12T07:03:13.664Z (about 1 month ago)
- Topics: data-cleaning, data-deduplication, data-manipulation, data-standardization, database-management, mssql, ssms
- Size: 5.64 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# SQL Project 2: Nashville Housing Data Cleaning
This repository contains SQL scripts and documentation for cleaning and standardizing data in the `NashvilleHousing` table within the `sqlproject2` database. The project aims to prepare the dataset for analysis by addressing inconsistencies, filling missing values, standardizing formats, and removing duplicates.
## Key Features
- **Database Creation**: Initializes the `sqlproject2` database.
- **Data Standardization**: Converts date formats and standardizes field values.
- **Address Processing**: Splits combined address fields into separate columns for easier analysis.
- **Data Deduplication**: Identifies and removes duplicate records to ensure data integrity.
- **Column Cleanup**: Removes unused columns to streamline the dataset.
## Usage
1. **Setup**: Create and populate the `NashvilleHousing` table in the `sqlproject2` database.
2. **Execution**: Run the provided SQL scripts in SQL Server Management Studio (SSMS) to clean the data.
3. **Verification**: Review the final cleaned dataset to confirm the changes.
## Documentation
- **README**: Provides an overview of the project, step-by-step instructions, and usage guidelines.
- **SQL Scripts**: Contain the SQL commands for each data cleaning step, with comments for clarity.
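## Illustrative Sketch
The cleaning steps listed under Key Features might look roughly like the T-SQL below. This is a minimal sketch, not the repository's actual scripts: every column name (`UniqueID`, `ParcelID`, `PropertyAddress`, `SaleDate`, `SalePrice`, `LegalReference`) and every new column (`SaleDateConverted`, `PropertyStreet`, `PropertyCity`) is an assumption made for illustration, as is the `street, city` address layout. It also assumes the `sqlproject2` database and the `NashvilleHousing` table already exist and are populated.

```sql
-- NOTE: all column names below are hypothetical; adjust them to the real schema.
USE sqlproject2;
GO

-- 1. Fill missing values: copy an address from another row with the same parcel.
UPDATE a
SET PropertyAddress = b.PropertyAddress
FROM NashvilleHousing AS a
JOIN NashvilleHousing AS b
  ON a.ParcelID = b.ParcelID
 AND a.UniqueID <> b.UniqueID
WHERE a.PropertyAddress IS NULL
  AND b.PropertyAddress IS NOT NULL;
GO

-- 2. Standardize dates: store only the date part in a new DATE column.
ALTER TABLE NashvilleHousing ADD SaleDateConverted DATE;
GO
UPDATE NashvilleHousing
SET SaleDateConverted = CONVERT(DATE, SaleDate);
GO

-- 3. Split the combined address on its first comma.
--    Appending ',' inside CHARINDEX keeps LEFT from receiving a negative length
--    when an address has no comma.
ALTER TABLE NashvilleHousing
ADD PropertyStreet NVARCHAR(255), PropertyCity NVARCHAR(255);
GO
UPDATE NashvilleHousing
SET PropertyStreet = LTRIM(RTRIM(
        LEFT(PropertyAddress, CHARINDEX(',', PropertyAddress + ',') - 1))),
    PropertyCity = LTRIM(RTRIM(
        SUBSTRING(PropertyAddress,
                  CHARINDEX(',', PropertyAddress + ',') + 1,
                  LEN(PropertyAddress))));
GO

-- 4. Remove duplicates: keep the row with the lowest UniqueID in each group.
WITH Ranked AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY ParcelID, PropertyAddress, SaleDate,
                            SalePrice, LegalReference
               ORDER BY UniqueID) AS RowNum
    FROM NashvilleHousing
)
DELETE FROM Ranked
WHERE RowNum > 1;
GO

-- 5. Drop the columns that the new ones replace.
ALTER TABLE NashvilleHousing
DROP COLUMN SaleDate, PropertyAddress;
GO

-- 6. Verify: inspect a sample of the cleaned table.
SELECT TOP (10) * FROM NashvilleHousing;
```

The `GO` separators matter when running this in SSMS, because a column added by `ALTER TABLE` cannot be referenced until the next batch. Since the duplicate removal and the column drops are destructive, it is safer to try a sketch like this on a copy of the table first.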