https://github.com/officiallyxenos/alt-school-second-semester-project
A data analysis project for the AltSchool of Data Science Tinyuka 2024 Second Semester. This project explores missing data classification, COVID-19 case aggregation by region, and time series trends using Python and real-world datasets.
- Host: GitHub
- URL: https://github.com/officiallyxenos/alt-school-second-semester-project
- Owner: OfficiallyXenos
- Created: 2025-04-15T22:43:32.000Z
- Default Branch: main
- Last Pushed: 2025-04-15T22:45:25.000Z
- Last Synced: 2025-07-01T11:51:15.938Z
- Topics: data-visualization, missing-data, pandas, seaborn, time-series-analysis
- Language: Jupyter Notebook
- Size: 283 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# AltSchool Data Science Project: Tinyuka 2024 Second Semester
This project is submitted as part of the AltSchool of Data Science **Tinyuka 2024 Second Semester Assessment**. It focuses on analyzing a real-world housing dataset and COVID-19 case data to demonstrate handling missing values, aggregating data, and performing basic time series analysis.
---
## Project Structure
```
.
├── Akintomiwa_Akinpelu.ipynb   # Main Jupyter Notebook with all analysis
├── house_prices.csv            # Housing dataset (used for Task 1)
└── README.md                   # Project documentation
```
---
## Assessment Tasks Completed
### 1. Dealing with Missing Data
- Categorized missing values in `house_prices.csv` as **MAR**, **MCAR**, or **MNAR**
- Justified each classification using observed patterns (e.g., `size` being missing for plot-type properties)
- Found that:
  - `size`, `bath`, `balcony`, and `society` are **MAR**
  - `location` is **MCAR**
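One way to gather the kind of evidence described above is to compare how often a column is missing across the levels of another, fully observed column: a missing rate that varies strongly by group is consistent with MAR, while a roughly flat rate is consistent with MCAR. The sketch below is illustrative only. It assumes an `area_type` column exists in `house_prices.csv` (check the notebook) and uses a tiny made-up frame instead of the real file; the helper `missing_rate_by_group` is hypothetical. MNAR cannot be confirmed from observed data alone, so this check does not settle that case.

```python
import pandas as pd


def missing_rate_by_group(df: pd.DataFrame, target: str, by: str) -> pd.Series:
    """Fraction of rows where `target` is missing, computed within each level of `by`."""
    return df[target].isna().groupby(df[by]).mean()


# Made-up stand-in for house_prices.csv; `area_type` is an assumed column name.
houses = pd.DataFrame({
    "area_type": ["Plot Area", "Plot Area", "Built-up Area",
                  "Built-up Area", "Super built-up Area", "Super built-up Area"],
    "size": [None, None, "2 BHK", "3 BHK", "2 BHK", None],
    "location": ["A", None, "B", "C", "A", "B"],
})

# Step 1: how many values are missing in each column
missing_counts = houses.isna().sum()
print(missing_counts)

# Step 2: does the chance that `size` is missing depend on the area type?
# A rate that swings between groups points toward MAR rather than MCAR.
size_missing_by_type = missing_rate_by_group(houses, "size", "area_type")
print(size_missing_by_type)
```

On this toy frame `size` is always missing for "Plot Area" rows and never for "Built-up Area" rows, the kind of dependence on an observed column that supports a MAR label. Running the same helper on `location` against several observed columns is one way to check whether its missing rate stays roughly flat.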
---
### 2. Data Aggregation and Grouping
- Loaded the **NYT COVID-19 Dataset** for U.S. counties in 2020.
- Aggregated **average COVID-19 cases by county** (or state).
- Rounded results to 2 decimal places for clean presentation.
- Displayed the top 10 and bottom 5 counties by average cases.
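A minimal sketch of the aggregation step, using a few made-up rows with the `date`, `county`, `state`, and `cases` columns of the NYT file. Grouping on both `state` and `county` is this sketch's own choice (the notebook may group differently); it avoids merging same-named counties from different states.

```python
import pandas as pd

# Made-up rows using the date/county/state/cases columns of the NYT county file.
covid = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", "2020-03-03"] * 3,
    "county": ["Los Angeles"] * 3 + ["King"] * 3 + ["Cook"] * 3,
    "state": ["California"] * 3 + ["Washington"] * 3 + ["Illinois"] * 3,
    "cases": [1, 4, 5, 10, 13, 14, 2, 5, 6],
})

# Group on (state, county) because the same county name can appear in several states;
# average the cases and round to 2 decimal places.
avg_cases = covid.groupby(["state", "county"])["cases"].mean().round(2)

top10 = avg_cases.nlargest(10)     # highest average case counts
bottom5 = avg_cases.nsmallest(5)   # lowest average case counts
print(top10)
print(bottom5)
```

`nlargest` and `nsmallest` return at most the requested number of rows, so the same code works on this three-county sample and on the full dataset.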
---
### 3. Time Series Analysis
- Converted the `date` column to `datetime` format.
- Extracted short month names (e.g., Jan, Feb) and converted them to an ordered categorical so months sort in calendar order rather than alphabetically.
- Filtered data for **California**.
- Generated a **line plot** showing **monthly total COVID-19 cases** for the state.
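The four steps above can be sketched as follows, under stated assumptions: the rows are made up, and month labels are built with `dt.month_name().str[:3]`, which is one of several ways to get short names. One caution: the NYT data documents `cases` as a cumulative count, so a monthly sum of daily rows is not the number of new cases in that month; the sketch keeps the sum as described and leaves that modeling choice open.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Made-up rows; in the notebook these would come from us-counties-2020.csv.
covid = pd.DataFrame({
    "date": ["2020-03-15", "2020-01-26", "2020-02-10", "2020-03-01", "2020-02-11"],
    "state": ["California", "California", "California", "Washington", "California"],
    "cases": [100, 1, 20, 50, 5],
})

# 1. Parse the date strings into datetime64 values
covid["date"] = pd.to_datetime(covid["date"])

# 2. Short month names ("Jan", "Feb", ...) as an ordered categorical,
#    so months sort by calendar order instead of alphabetically
covid["month"] = pd.Categorical(
    covid["date"].dt.month_name().str[:3], categories=MONTHS, ordered=True
)

# 3. Keep California only, then total the cases per month (observed months only)
ca = covid[covid["state"] == "California"]
monthly = ca.groupby("month", observed=True)["cases"].sum()

# 4. Line plot of the monthly totals
fig, ax = plt.subplots()
ax.plot(list(monthly.index.astype(str)), monthly.to_numpy(), marker="o")
ax.set_title("Monthly total COVID-19 cases, California (toy data)")
ax.set_xlabel("Month")
ax.set_ylabel("Cases")
fig.tight_layout()
fig.savefig("california_monthly_cases.png")
```

Because the categorical is ordered, `groupby` returns months in calendar order even though the input rows are shuffled.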
---
## Libraries Used
- `pandas`
- `matplotlib`
- `seaborn` (optional)
---
## Dataset Sources
- **House Prices Dataset** (provided by AltSchool)
- **NYT COVID-19 Data**
  - [us-counties-2020.csv](https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties-2020.csv)
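A small loader for the file above, as a sketch: reading `fips` as a string and parsing `date` up front are choices of this example, not necessarily what the notebook does, and `load_counties` is a hypothetical helper name. The function accepts any source `pd.read_csv` accepts, so it can be exercised offline with an in-memory sample.

```python
import io

import pandas as pd

NYT_COUNTIES_2020 = (
    "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties-2020.csv"
)


def load_counties(source=NYT_COUNTIES_2020) -> pd.DataFrame:
    """Read NYT county-level data from a URL, path, or file-like object."""
    return pd.read_csv(
        source,
        parse_dates=["date"],        # datetime64 from the start
        dtype={"fips": "string"},    # keep codes such as "06059" intact; blanks become <NA>
    )


# Made-up rows in the same column layout, so the loader can be tried offline
sample = io.StringIO(
    "date,county,state,fips,cases,deaths\n"
    "2020-03-01,Orange,California,06059,3,0\n"
    "2020-03-01,New York City,New York,,5,0\n"
)
counties = load_counties(sample)
print(counties.dtypes)
```

Calling `load_counties()` with no argument downloads the full 2020 file from the URL above.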
---
## Author
- **Akintomiwa Akinpelu**
- AltSchool of Data Science, Tinyuka Track
- 2024 Second Semester Project
---
## How to Run
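1. Clone this repo:
   ```bash
   git clone https://github.com/officiallyxenos/alt-school-second-semester-project.git
   cd alt-school-second-semester-project
   ```
2. Install dependencies (`seaborn` is optional; `notebook` provides the `jupyter notebook` command):
   ```bash
   pip install pandas matplotlib seaborn notebook
   ```
3. Open the notebook:
   ```bash
   jupyter notebook Akintomiwa_Akinpelu.ipynb
   ```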
---
## Feedback
Feel free to open an issue or submit a pull request if you'd like to improve the project!