Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/as16082023/nashville-housing-data-cleaning-project
This project involved using MySQL to clean and optimize a Nashville housing dataset, addressing key data quality issues to ensure it was ready for accurate analysis.
- Host: GitHub
- URL: https://github.com/as16082023/nashville-housing-data-cleaning-project
- Owner: as16082023
- Created: 2024-07-25T18:09:15.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-08-22T18:28:37.000Z (6 months ago)
- Last Synced: 2024-12-23T21:19:04.569Z (about 2 months ago)
- Topics: data-analysis, data-cleaning, mysql, nashville-housing-data
- Homepage:
- Size: 2.41 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Nashville-Housing-Data-Cleaning
This project involved using MySQL to clean and prepare a Nashville housing dataset with over 56,000 rows for analysis. The primary focus was on resolving various data quality issues to enhance the dataset's usability.
Key tasks included:
- Standardizing Date Format: Converted date values to a single, consistent format across the dataset.
- Populating Null Property Addresses: Filled missing data in the PropertyAddress column.
- Breaking Down Address Information: Separated City, State, and House Address into individual columns for both Property and Owner addresses.
- Standardizing Categorical Values: Converted 'Y' and 'N' values to 'Yes' and 'No' in the "Sold as Vacant" field.
- Removing Duplicates: Cleared duplicate entries to ensure data accuracy.
- Deleting Unnecessary Columns: Dropped columns not needed for analysis to streamline the dataset.
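
The tasks above can be sketched end to end on a tiny example. This is a minimal illustration, not the project's actual MySQL script: it uses Python's built-in `sqlite3` module so it runs without a database server, and the table name `housing`, the column names, the sample rows, the date format, and the rule of filling a missing address from another row with the same `ParcelID` are all assumptions made for the sketch. In MySQL, built-in functions such as `STR_TO_DATE` and `SUBSTRING_INDEX` could likely play the roles of the two helper functions registered here.

```python
import sqlite3
from datetime import datetime


def to_iso_date(text):
    """Convert 'April 9, 2013' style text to '2013-04-09'; pass NULL through."""
    if text is None:
        return None
    return datetime.strptime(text, "%B %d, %Y").date().isoformat()


def split_part(text, index):
    """Return the index-th (0-based) comma-separated piece, trimmed, or NULL."""
    if text is None:
        return None
    parts = [p.strip() for p in text.split(",")]
    return parts[index] if index < len(parts) else None


conn = sqlite3.connect(":memory:")
conn.create_function("to_iso_date", 1, to_iso_date)
conn.create_function("split_part", 2, split_part)

# Hypothetical schema; the real dataset has more columns.
conn.execute("""
    CREATE TABLE housing (
        UniqueID INTEGER PRIMARY KEY, ParcelID TEXT, PropertyAddress TEXT,
        SaleDate TEXT, SoldAsVacant TEXT, OwnerAddress TEXT)
""")
# Made-up sample rows, not data from the repository.
conn.executemany("INSERT INTO housing VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "P-1", "12 MAIN ST, NASHVILLE", "April 9, 2013", "N", "12 MAIN ST, NASHVILLE, TN"),
    (2, "P-1", None, "June 10, 2014", "Y", "12 MAIN ST, NASHVILLE, TN"),
    (3, "P-2", "7 OAK AVE, MADISON", "May 1, 2015", "No", "7 OAK AVE, MADISON, TN"),
    (4, "P-2", "7 OAK AVE, MADISON", "May 1, 2015", "No", "7 OAK AVE, MADISON, TN"),
])

# 1. Standardize the date format.
conn.execute("UPDATE housing SET SaleDate = to_iso_date(SaleDate)")

# 2. Fill NULL property addresses from another row with the same ParcelID
#    (assumed rule; the README does not say how the gaps were filled).
conn.execute("""
    UPDATE housing SET PropertyAddress = (
        SELECT b.PropertyAddress FROM housing AS b
        WHERE b.ParcelID = housing.ParcelID AND b.PropertyAddress IS NOT NULL
        LIMIT 1)
    WHERE PropertyAddress IS NULL
""")

# 3. Normalize 'Y'/'N' to 'Yes'/'No', leaving other values untouched.
conn.execute("""
    UPDATE housing SET SoldAsVacant = CASE SoldAsVacant
        WHEN 'Y' THEN 'Yes' WHEN 'N' THEN 'No' ELSE SoldAsVacant END
""")

# 4. Remove duplicates, keeping the lowest UniqueID in each identical group.
conn.execute("""
    DELETE FROM housing WHERE UniqueID NOT IN (
        SELECT MIN(UniqueID) FROM housing
        GROUP BY ParcelID, PropertyAddress, SaleDate, SoldAsVacant, OwnerAddress)
""")

# 5. Split the addresses and leave the raw address columns behind by
#    building a slimmer table (stands in for dropping columns).
conn.execute("""
    CREATE TABLE housing_clean AS
    SELECT UniqueID, ParcelID, SaleDate, SoldAsVacant,
           split_part(PropertyAddress, 0) AS PropertyStreet,
           split_part(PropertyAddress, 1) AS PropertyCity,
           split_part(OwnerAddress, 0) AS OwnerStreet,
           split_part(OwnerAddress, 1) AS OwnerCity,
           split_part(OwnerAddress, 2) AS OwnerState
    FROM housing
""")

cleaned = conn.execute("SELECT * FROM housing_clean ORDER BY UniqueID").fetchall()
for row in cleaned:
    print(row)
```

Deduplicating after the values are standardized is a deliberate choice in this sketch: rows that differ only in formatting (for example `N` versus `No`) would otherwise not be recognized as duplicates.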
The cleaned and well-structured dataset is now better suited for accurate analysis, supporting informed decision-making.