https://github.com/vipulbunny/book-data-analysis

A Python-based data analysis project for cleaning, processing, and visualizing book data from a JSON dataset.

# 📚 Book Data Cleaning and Analysis

## 📝 Description
This project focuses on **cleaning, transforming, and analyzing book-related data** from a JSON dataset. It includes preprocessing techniques to handle missing values, incorrect formats, and unnecessary columns. The final cleaned dataset is used for **data visualization and exploratory data analysis (EDA)** using **Pandas, Matplotlib, and Seaborn**.

## 🔍 Detailed Description
This repository contains a **Python script** that loads book data from an online JSON file, cleans it, and visualizes the result. The main steps are:

- Handling **missing values** and incorrect formats.
- Cleaning **ISBN** numbers and **publication dates**.
- Removing **duplicates** and irrelevant columns.
- Renaming columns for better readability.
- Performing **data visualization** to analyze book publishing trends and status distributions.
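The cleaning steps listed above could be sketched roughly as follows. This is a minimal illustration, not the code in `book_analysis.py`: the column names (`isbn`, `publishedDate`, `pageCount`, `thumbnailUrl`) and the helper names `clean_isbn` and `clean_books` are assumptions made for the example.

```python
import re

import pandas as pd


def clean_isbn(value):
    """Strip everything except digits and the 'X' check character."""
    if pd.isna(value):
        return None
    cleaned = re.sub(r"[^0-9Xx]", "", str(value)).upper()
    return cleaned or None


def clean_books(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to a raw books DataFrame.

    All column names used here are hypothetical.
    """
    out = df.copy()

    # Normalize ISBNs; unparseable dates and page counts become NaT / NaN.
    out["isbn"] = out["isbn"].map(clean_isbn)
    out["publishedDate"] = pd.to_datetime(out["publishedDate"], errors="coerce")
    out["pageCount"] = pd.to_numeric(out["pageCount"], errors="coerce")

    # Drop a column assumed to be irrelevant for the analysis.
    out = out.drop(columns=["thumbnailUrl"], errors="ignore")

    # Remove repeated ISBNs, but keep every row whose ISBN is missing.
    is_duplicate = out["isbn"].notna() & out.duplicated(subset="isbn")
    out = out[~is_duplicate]

    # Rename columns for readability.
    out = out.rename(
        columns={"publishedDate": "published_date", "pageCount": "page_count"}
    )
    return out.reset_index(drop=True)
```

Using `errors="coerce"` turns malformed values into missing ones instead of raising, so the later analysis can decide whether to drop or fill them.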

## 🚀 Features
- **Data Cleaning**: Converts raw JSON data into a structured Pandas DataFrame.
- **Exploratory Data Analysis (EDA)**: Uses visualizations to extract insights.
- **Automated Preprocessing**: Handles missing values and formatting inconsistencies.
- **Simple and Efficient Code**: Well-commented and easy to understand.
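As an illustration of the first feature, turning raw JSON into a DataFrame might look like the sketch below. The function names `records_to_frame` and `load_books` are hypothetical, and the sketch assumes the JSON file is a list of objects; the actual script may load the data differently.

```python
import pandas as pd


def records_to_frame(records: list) -> pd.DataFrame:
    """Flatten a list of JSON objects; nested keys become dotted column names."""
    return pd.json_normalize(records)


def load_books(url: str, timeout: float = 10.0) -> pd.DataFrame:
    """Download a JSON list of books from `url` and return it as a DataFrame."""
    # Imported here so records_to_frame stays usable without Requests installed.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return records_to_frame(response.json())
```

Keeping the download separate from the conversion makes the conversion easy to try on a small in-memory sample before fetching the full dataset.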

## 🏗️ Technologies Used
- **Python**
- **Pandas**
- **NumPy**
- **Matplotlib**
- **Seaborn**
- **Requests**
- **Regular Expressions (re)**

## 📂 Project Structure
```
📦 Book-Data-Analysis
┣ 📜 book_analysis.py # Main Python script for data cleaning and visualization
┣ 📜 README.md # Project documentation
┗ 📜 requirements.txt # List of dependencies
```

## 🛠️ Setup & Installation
### 1️⃣ Install Required Libraries
Ensure you have **Python 3.7+** installed. Install dependencies using:
```bash
pip install -r requirements.txt
```

### 2️⃣ Run the Script
Execute the script with:
```bash
python book_analysis.py
```

## 📊 Sample Visualizations
![image](https://github.com/user-attachments/assets/34f22f54-8e80-4c82-92d2-0c17e3d42fcb)
![image](https://github.com/user-attachments/assets/690113e0-02d2-4548-8d80-bd6f38a9b533)
![image](https://github.com/user-attachments/assets/ec32001b-11b8-4b17-b3a0-8e608af0d371)

## 🏆 Results & Insights
- The dataset contains books published between **1993 and 2013**.
- Most books have a **status of Published**, while some are **Unpublished**.
- The **top 10 books with the highest page count** were identified and visualized.
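Summaries like the ones above can be computed with a few Pandas calls. The sketch below assumes a cleaned DataFrame with hypothetical columns `title`, `published_date` (datetime), `status`, and `page_count`; the function name `summarize_books` is likewise made up for the example.

```python
import pandas as pd


def summarize_books(df: pd.DataFrame) -> dict:
    """Compute the publication year range, status counts, and longest books.

    Assumes at least one row has a valid publication date.
    """
    # Rows with a missing date are ignored when computing the range.
    years = df["published_date"].dt.year.dropna().astype(int)
    return {
        "year_range": (int(years.min()), int(years.max())),
        "status_counts": df["status"].value_counts(),
        "top_pages": df.nlargest(10, "page_count")[["title", "page_count"]],
    }
```

Each entry can then be plotted directly, for example `summary["status_counts"].plot(kind="bar")` for the status distribution.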

## 📌 Tags
`Data Cleaning` `Pandas` `Python` `Data Analysis` `Visualization` `Book Dataset` `Seaborn` `Matplotlib`

## 📜 License
This project is **open-source** and available for use and modification.

---

**👨‍💻 Developed by [VIPULbunny](https://github.com/vipulbunny)**