Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nagar2nd/audible-data-analysis
This project focuses on cleaning and standardizing an Audible dataset using Power Query Editor in Excel. Key tasks include splitting and merging columns, standardizing formats, and handling null values to prepare the dataset for further analysis.
ms-excel
- Host: GitHub
- URL: https://github.com/nagar2nd/audible-data-analysis
- Owner: Nagar2nd
- Created: 2024-09-11T10:17:40.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-10-22T06:30:33.000Z (3 months ago)
- Last Synced: 2024-10-23T09:04:09.322Z (3 months ago)
- Topics: ms-excel
- Homepage:
- Size: 7.52 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Audible Data Cleaning Project
## Project Overview
This project involves cleaning and standardizing an Audible dataset using Power Query Editor in Excel.
The dataset was transformed to ensure consistency and prepared for further analysis by applying various data cleaning and transformation techniques.

**Dataset Link**: https://drive.google.com/file/d/1yjyozaSrwShoaROq-TDuSgC5HNLLmrTE/view?usp=sharing
## Data Cleaning Tasks and Applied Steps
### 1. **Name Standardization**
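As a rough illustration of the title casing this task describes, here is a minimal Python sketch. The project itself did this in Power Query; the function name `title_case` and the sample title are hypothetical.

```python
import string


def title_case(name: str) -> str:
    """Capitalize the first letter of each whitespace-separated word."""
    # string.capwords lowercases the rest of each word and collapses
    # runs of whitespace into single spaces.
    return string.capwords(name)


print(title_case("the  subtle ART of listening"))
```

`string.capwords` is used here rather than `str.title()` because `str.title()` also capitalizes the letter after an apostrophe.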
- **Step Applied:** Capitalized each word in the "Name" column to ensure uniform title casing.
- **Explanation:** This ensures that the product titles follow a consistent format, improving readability and data presentation.
### 2. **Author Name Separation**
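A split on character transitions can be sketched in Python as follows. This is only an illustration, not the Power Query step itself: it assumes the fused names look like `GeronimoStilton` (a lowercase letter directly followed by an uppercase one), and the function name and sample values are hypothetical.

```python
import re


def split_on_case_transition(author: str) -> list:
    """Split a name wherever a lowercase letter is directly followed by an uppercase one."""
    # The pattern matches the empty position between the two letters,
    # so re.sub inserts a space there without consuming any characters.
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", author)
    return spaced.split()


print(split_on_case_transition("GeronimoStilton"))
```

A transition-based split also breaks names with internal capitals (for example a surname beginning with "Mc"), so the result usually needs a manual check.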
- **Step Applied:** Split the "Author" column by delimiter or character transitions to separate first and last names.
- **Explanation:** When authors’ full names were combined, the column was split into separate fields for better data organization.
### 3. **Release Date Standardization**
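The date conversion in this task might look like the Python sketch below. The input format is an assumption (ISO `YYYY-MM-DD`), since the README does not state how the raw "ReleaseDate" values were written; `to_dd_mm_yyyy` is a hypothetical name.

```python
from datetime import datetime


def to_dd_mm_yyyy(raw: str, input_format: str = "%Y-%m-%d") -> str:
    """Parse a date string in the assumed input format and re-emit it as DD-MM-YYYY."""
    parsed = datetime.strptime(raw, input_format)
    return parsed.strftime("%d-%m-%Y")


print(to_dd_mm_yyyy("2008-08-04"))
```

Passing a different `input_format` adapts the sketch to other source layouts.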
- **Step Applied:** Converted the "ReleaseDate" column to a date format (DD-MM-YYYY).
- **Explanation:** Ensuring consistent date formats across the dataset allows for accurate analysis based on release dates.
### 4. **Duration Conversion**
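For illustration, the hours-to-minutes conversion could be sketched in Python like this. The text layout of the "Time" column is an assumption (values such as `2 hrs and 20 mins`), and adding the leftover minutes goes slightly beyond the steps listed here; `duration_to_minutes` is a hypothetical name.

```python
import re


def duration_to_minutes(text: str) -> int:
    """Convert a duration such as '2 hrs and 20 mins' to total minutes."""
    hours = re.search(r"(\d+)\s*hrs?\b", text)
    minutes = re.search(r"(\d+)\s*mins?\b", text)
    total = 0
    if hours:
        total += int(hours.group(1)) * 60  # hours to minutes by multiplication
    if minutes:
        total += int(minutes.group(1))
    return total


print(duration_to_minutes("2 hrs and 20 mins"))
```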
- **Step Applied:**
  - Extracted hours from the "Time" column.
  - Converted hours to minutes using a multiplication formula.
- **Explanation:** The time values were transformed into a recognized Excel duration format, allowing for proper time calculations and analysis.
### 5. **Price Column Cleanup**
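The price cleanup can be approximated in Python as shown below. This is a sketch under assumptions: raw prices are taken to be strings that may contain thousands separators, and every value that fails to parse as a number (such as "Free") becomes 0. The name `clean_price` is hypothetical.

```python
def clean_price(raw: str) -> float:
    """Return the price as a number, mapping non-numeric entries such as 'Free' to 0."""
    text = raw.strip().replace(",", "")  # drop thousands separators
    try:
        return round(float(text), 2)
    except ValueError:
        return 0.0


for value in ["Free", "468", "1,256"]:
    # Two-decimal display mirrors the formatting step in this task.
    print(format(clean_price(value), ".2f"))
```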
- **Step Applied:**
  - Replaced non-numeric values (e.g., "Free") with "0".
  - Changed the data type of the "Price" column to currency.
  - Applied formatting to ensure all values have two decimal places.
- **Explanation:** This ensures all price entries are numeric and uniformly formatted, allowing for meaningful pricing analysis.
### 6. **Star Rating Conversion**
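One way to picture the rating conversion is the Python sketch below. The rating text format is an assumption (for example `4.5 out of 5 stars`), as is returning `None` for entries with no rating; the README does not specify either. `parse_stars` is a hypothetical name.

```python
import re
from typing import Optional


def parse_stars(text: str) -> Optional[float]:
    """Extract the leading numeric rating from text like '4.5 out of 5 stars'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of 5", text)
    if match is None:
        return None  # assumed handling for unrated entries
    return float(match.group(1))


print(parse_stars("4.5 out of 5 stars"))
```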
- **Step Applied:** Replaced text-based star ratings with numeric values.
- **Explanation:** This step facilitates easier analysis and calculations of product ratings.
### 7. **Narrator Separation**
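The split-then-merge pattern for narrators could be sketched in Python as follows, assuming a comma is the only delimiter. The function names and the sample value are hypothetical.

```python
def split_narrators(raw: str) -> list:
    """Split a comma-delimited narrator string into trimmed individual names."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def merge_narrators(names: list) -> str:
    """Re-join narrator names with one consistent delimiter."""
    return ", ".join(names)


names = split_narrators("Narrator One,Narrator Two ")
print(names)
print(merge_narrators(names))
```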
- **Step Applied:**
  - Split the "NarratedBy" column by delimiters (e.g., commas) to separate multiple narrators.
  - Merged split narrator columns for clarity.
- **Explanation:** This helps identify individual narrators when multiple are listed for a single audiobook, improving the clarity of the dataset.
### 8. **Merging Columns for Release Info**
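A minimal Python sketch of this merge, assuming the release date has already been parsed into a date value; `release_info` and the sample inputs are hypothetical.

```python
from datetime import date


def release_info(release_date: date, language: str) -> str:
    """Combine a release date and a language as 'DD-MM-YYYY, Language'."""
    return f"{release_date.strftime('%d-%m-%Y')}, {language}"


print(release_info(date(2008, 8, 4), "English"))
```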
- **Step Applied:** Merged the "ReleaseDate" and "Language" columns into a new column, "ReleaseInfo", with the format "DD-MM-YYYY, Language".
- **Explanation:** Combining these columns provides a comprehensive view of release information and simplifies the dataset.
### 9. **Null Value Replacement**
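Placeholder substitution can be illustrated with the Python sketch below. Which column receives which placeholder is an assumption made for the example; the README only names "Not Applicable" and "0" as sample placeholders. `PLACEHOLDERS` and `fill_nulls` are hypothetical names.

```python
# Assumed mapping of column name to placeholder, for illustration only.
PLACEHOLDERS = {"Narrator": "Not Applicable", "Price": 0}


def fill_nulls(row: dict, placeholders: dict) -> dict:
    """Replace None values with the column's placeholder, if one is defined."""
    return {
        column: placeholders.get(column, value) if value is None else value
        for column, value in row.items()
    }


print(fill_nulls({"Name": "Sample Title", "Narrator": None, "Price": None}, PLACEHOLDERS))
```

Columns without a configured placeholder keep their null, which makes any remaining gaps easy to spot.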
- **Step Applied:**
  - Replaced null values in various columns with appropriate placeholders (e.g., "Not Applicable" or "0").
- **Explanation:** This ensures there are no gaps in the dataset, improving data integrity for further analysis.
## Key Excel Features Used
- **Power Query Editor:** For column formatting, data splitting, and transformation tasks.
- **Text to Columns:** To separate combined names and narrators.
- **Data Type Conversion:** Converted text data types to appropriate formats like numeric and date.
- **Merge Columns:** Combined "ReleaseDate" and "Language" into a single column using custom formatting.
- **Conditional Formatting:** Applied for visual consistency checks.
## Conclusion
The project effectively cleaned and standardized the Audible dataset, ensuring uniformity in column formats and preparing it for deeper analysis. Power Query Editor's transformation capabilities were pivotal in streamlining this process.