https://github.com/reyhaneh-saffar/data-analysis-on-book-s-details
Exploring a dataset containing information about books, focusing on data processing, transformation, and model evaluation
- Host: GitHub
- URL: https://github.com/reyhaneh-saffar/data-analysis-on-book-s-details
- Owner: reyhaneh-saffar
- Created: 2025-01-10T11:18:00.000Z (20 days ago)
- Default Branch: main
- Last Pushed: 2025-01-10T11:23:39.000Z (20 days ago)
- Last Synced: 2025-01-10T12:28:47.483Z (20 days ago)
- Language: Jupyter Notebook
- Size: 1.76 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Book Dataset Analysis
## Overview
This project explores a dataset containing information about books, focusing on data processing, transformation, and model evaluation. The project aims to derive insights, preprocess data for analysis, and apply machine learning models to predict book prices.

#### Transformations
- **Encoding**:
  - `Ratings` and `Reviews` columns were transformed into numerical formats for analysis.
  - One-hot encoding was applied to categorical features such as `BookCategory`.
  - Word embeddings were implemented for text-based columns like `Synopsis`, `Title`, and `Author`.
- **Feature Engineering**:
  - Extracted publication year and book format from the `Edition` column.
  - Removed redundant columns (e.g., `Genre` replaced by `BookCategory`).
- **Data Expansion**: Array columns were expanded to create a comprehensive dataset, resulting in 2,200 columns from the original 36.

---
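The numeric conversion, one-hot encoding, and `Edition` parsing steps listed under Transformations can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the sample rows, the string layouts of `Edition`, `Ratings`, and `Reviews`, and the helper names (`parse_edition`, `first_number`, `one_hot`, `transform`) are all assumptions made for the example.

```python
import re

# Hypothetical rows: column names come from the README, values are invented.
books = [
    {"Edition": "Paperback, 10 Mar 2016", "BookCategory": "Humour",
     "Ratings": "4.1 out of 5 stars", "Reviews": "1,204 customer reviews"},
    {"Edition": "Hardcover, Import, 1 Nov 1998", "BookCategory": "Romance",
     "Ratings": "3.5 out of 5 stars", "Reviews": "8 customer reviews"},
]


def parse_edition(edition):
    """Split an Edition string into (format, year).

    Assumes the format is the text before the first comma and the year
    is the first four-digit number that looks like a year.
    """
    book_format = edition.split(",", 1)[0].strip()
    match = re.search(r"\b(1[5-9]\d{2}|20\d{2})\b", edition)
    year = int(match.group(1)) if match else None
    return book_format, year


def first_number(text):
    """Return the first number in a string as a float, dropping thousands separators."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group(0).replace(",", "")) if match else None


def one_hot(values, prefix):
    """Build one 0/1 indicator per distinct category for every value."""
    categories = sorted(set(values))
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]


def transform(rows):
    """Apply the numeric conversion, Edition parsing, and one-hot encoding."""
    indicators = one_hot([r["BookCategory"] for r in rows], "BookCategory")
    transformed = []
    for row, flags in zip(rows, indicators):
        book_format, year = parse_edition(row["Edition"])
        transformed.append({
            "Format": book_format,
            "Year": year,
            "Rating": first_number(row["Ratings"]),
            "ReviewCount": first_number(row["Reviews"]),
            **flags,
        })
    return transformed


rows = transform(books)
```

One-hot encoding adds one column per distinct category, which is one way a 36-column table can grow into thousands of columns once array-valued and embedded text columns are expanded as well.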
### Model Implementation
Two machine learning models were tested to predict book prices:

#### Linear Regression
- **Performance Metrics**:
  - Mean Absolute Error (MAE): Exceptionally high.
  - Mean Squared Error (MSE): Extremely large.
  - R-squared: Negative, meaning the model's predictions were worse than always predicting the mean price.
- **Insights**: The model struggled to capture underlying patterns, highlighting potential overfitting or data issues.

#### Random Forest Regressor
- **Performance Metrics**:
  - MAE: ~293.51.
  - MSE: ~291,089.59.
  - R-squared: ~0.1884.
- **Insights**: Demonstrated better predictive ability than Linear Regression but still showed room for improvement.
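
To make the three reported metrics concrete, the sketch below computes MAE, MSE, and R-squared from their definitions and shows how R-squared becomes negative when predictions are worse than the mean of the targets. The prices and predictions are invented for illustration and are not taken from the dataset; the function names are likewise assumptions, and a notebook would more likely call a library's metric functions.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average squared error, dominated by large misses."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; zero for a mean-only predictor, negative if worse."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Invented prices and predictions (not from the book dataset).
y_true = [100.0, 200.0, 300.0, 400.0]
close_fit = [110.0, 190.0, 310.0, 390.0]    # small errors everywhere
poor_fit = [400.0, 100.0, 500.0, 50.0]      # errors larger than the spread of y_true
mean_only = [sum(y_true) / len(y_true)] * len(y_true)

for name, preds in [("close", close_fit), ("poor", poor_fit), ("mean", mean_only)]:
    print(name, mae(y_true, preds), mse(y_true, preds), r_squared(y_true, preds))

# Square root of the Random Forest MSE reported above, in price units.
reported_rmse = math.sqrt(291_089.59)
print(round(reported_rmse, 1))
```

The square root of the reported MSE is roughly 539.5, well above the reported MAE of ~293.51. A gap of that size usually suggests that a relatively small number of large errors contribute heavily to the squared-error total, though the README does not confirm this.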