https://github.com/officiallyxenos/alt-school-second-semester-project

A data analysis project for the AltSchool of Data Science Tinyuka 2024 Second Semester. This project explores missing data classification, COVID-19 case aggregation by region, and time series trends using Python and real-world datasets.

data-visualization missing-data pandas seaborn time-series-analysis

# 📊 AltSchool Data Science Project: Tinyuka 2024 Second Semester

This project is submitted as part of the AltSchool of Data Science **Tinyuka 2024 Second Semester Assessment**. It focuses on analyzing a real-world housing dataset and COVID-19 case data to demonstrate handling missing values, aggregating data, and performing basic time series analysis.

---

## 📁 Project Structure

```
.
├── Akintomiwa_Akinpelu.ipynb # Main Jupyter Notebook with all analysis
├── house_prices.csv # Housing dataset (used for Task 1)
└── README.md # Project documentation
```

---

## ✅ Assessment Tasks Completed

### 1. 🧹 Dealing with Missing Data

- Categorized missing values in `house_prices.csv` as **MAR** (missing at random), **MCAR** (missing completely at random), or **MNAR** (missing not at random)
- Justified each classification using observed patterns (e.g., plot-type listings with missing `size`)
- Found that:
  - `size`, `bath`, `balcony`, and `society` = **MAR**
  - `location` = **MCAR**
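
One way to look for the kind of pattern described above is to compare how often a column is missing within each group of another column. The snippet below is a minimal sketch on a tiny made-up table, not the notebook's actual code; the `area_type` column name and all values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for house_prices.csv.
# The values and the `area_type` column name are hypothetical.
df = pd.DataFrame({
    "area_type": ["Plot Area", "Plot Area", "Built-up Area",
                  "Built-up Area", "Plot Area", "Built-up Area"],
    "size": [np.nan, np.nan, "2 BHK", "3 BHK", "4 Bedroom", "2 BHK"],
    "location": ["A", "B", np.nan, "C", "D", "E"],
})

# Overall share of missing values per column.
missing_share = df.isna().mean().round(2)

# Share of missing `size` within each `area_type` group.
# A large gap between groups suggests the missingness depends on an
# observed column, which points toward MAR rather than MCAR.
size_missing_by_type = (
    df["size"].isna().groupby(df["area_type"]).mean().round(2)
)

print(missing_share)
print(size_missing_by_type)
```

A roughly equal missing rate across groups would be more consistent with MCAR, although a check like this can only suggest a mechanism, not prove it.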

---

### 2. 📊 Data Aggregation and Grouping

- Loaded the **NYT COVID-19 Dataset** for U.S. counties in 2020.
- Aggregated **average COVID-19 cases by county** (or state).
- Rounded results to 2 decimal places for clean presentation.
- Displayed the top 10 and bottom 5 counties by average case count.
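
The aggregation steps above can be sketched roughly as follows. This is an illustrative example on a small made-up sample that reuses the NYT file's `date`, `county`, `state`, and `cases` column names; it is not the notebook's exact code. Grouping by both `state` and `county` is a choice made here because county names repeat across states.

```python
import pandas as pd

# Made-up sample rows using the NYT county file's column names.
covid = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", "2020-03-01", "2020-03-02",
             "2020-03-01", "2020-03-02", "2020-03-03"],
    "county": ["Los Angeles", "Los Angeles", "Cook", "Cook",
               "Kings", "Kings", "Kings"],
    "state": ["California", "California", "Illinois", "Illinois",
              "New York", "New York", "New York"],
    "cases": [10, 21, 4, 9, 1, 2, 2],
})

# Average cases per county, rounded to 2 decimals, highest first.
avg_cases = (
    covid.groupby(["state", "county"])["cases"]
    .mean()
    .round(2)
    .sort_values(ascending=False)
)

# With the full dataset these would be head(10) and tail(5).
top = avg_cases.head(2)
bottom = avg_cases.tail(1)

print(top)
print(bottom)
```

To run this against the real data, the sample frame could be replaced with `pd.read_csv(...)` pointed at the NYT URL listed under Dataset Sources.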

---

### 3. โฑ๏ธ Time Series Analysis

- Converted the `date` column to `datetime` format.
- Extracted short-form month names (e.g., Jan, Feb) and converted to categorical for ordering.
- Filtered data for **California**.
- Generated a **line plot** showing **monthly total COVID-19 cases** for the state.
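
A minimal sketch of these steps, again on made-up rows rather than the real file, might look like the following. The `MONTHS` list and the `plot_monthly_cases` helper are hypothetical names introduced for illustration. Note that the NYT `cases` column holds cumulative counts, so a monthly sum of it is not the same as new cases per month.

```python
import pandas as pd

# Calendar order for the ordered categorical (hypothetical helper constant).
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Made-up sample rows using the NYT file's column names.
covid = pd.DataFrame({
    "date": ["2020-01-26", "2020-02-10", "2020-02-20",
             "2020-03-05", "2020-03-05"],
    "state": ["California", "California", "California",
              "California", "Texas"],
    "cases": [1, 2, 3, 10, 7],
})

# Parse dates, then derive short month names as an ordered categorical
# so that months sort by calendar order instead of alphabetically.
# "%b" is locale-dependent; English names are assumed here.
covid["date"] = pd.to_datetime(covid["date"])
covid["month"] = pd.Categorical(
    covid["date"].dt.strftime("%b"), categories=MONTHS, ordered=True
)

# Keep California only and total the cases per month.
california = covid[covid["state"] == "California"]
monthly = california.groupby("month", observed=True)["cases"].sum()


def plot_monthly_cases(monthly_totals):
    """Draw a line plot of monthly totals (hypothetical helper)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(monthly_totals.index.astype(str), monthly_totals.values, marker="o")
    ax.set_xlabel("Month")
    ax.set_ylabel("Total cases")
    ax.set_title("Monthly COVID-19 cases: California")
    return ax


print(monthly)
```

Calling `plot_monthly_cases(monthly)` in a notebook cell would render the line plot.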

---

## 📈 Libraries Used

- `pandas`
- `matplotlib`
- `seaborn` (optional)

---

## 📌 Dataset Sources

- **House Prices Dataset** (provided by AltSchool)
- **NYT COVID-19 Data**
  🔗 [us-counties-2020.csv](https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties-2020.csv)

---

## 🧠 Author

- **Akintomiwa Akinpelu**
- AltSchool of Data Science – Tinyuka Track
- 2024 Second Semester Project

---

## 🚀 How to Run

1. Clone this repo:
```bash
git clone https://github.com/officiallyxenos/alt-school-second-semester-project.git
cd alt-school-second-semester-project
```

2. Install dependencies:
```bash
pip install pandas matplotlib seaborn notebook
```

3. Open the notebook:
```bash
jupyter notebook Akintomiwa_Akinpelu.ipynb
```

---

## 📬 Feedback

Feel free to open an issue or submit a pull request if you'd like to improve the project!