Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/gino-freud-hobayan/exploratory_data_analysis_covid19_on_python

Exploratory Data Analysis (EDA) on Python

data-cleaning data-visualization exploratory-data-analysis python

Last synced: about 1 month ago


Host: GitHub
URL: https://github.com/gino-freud-hobayan/exploratory_data_analysis_covid19_on_python
Owner: Gino-Freud-Hobayan
Created: 2023-07-30T13:52:56.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-08-22T18:31:06.000Z (over 1 year ago)
Last Synced: 2024-11-12T12:05:36.346Z (3 months ago)
Topics: data-cleaning, data-visualization, exploratory-data-analysis, python
Language: Jupyter Notebook
Homepage: https://docs.google.com/presentation/d/14flA5fAz6sI6FoIZjT0i-NIM9b48pMiJRTlnwcdTRK8/edit?usp=sharing
Size: 877 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# Exploratory Data Analysis (EDA) in Python: COVID-19 Cases in the City of Manila, 2020-2023




# Descriptive and Diagnostic Analytics on COVID-19 data

- **Descriptive Analytics** tells you **WHAT** happened in the past. 

- **Diagnostic Analytics** helps you understand **WHY** something happened in the past.







### **Capstone Project:** 

### Python for Data Analysis Bootcamp by [Data Vanguard](https://datavanguard.ph/)















### Links

- Slideshow/Presentation: https://docs.google.com/presentation/d/14flA5fAz6sI6FoIZjT0i-NIM9b48pMiJRTlnwcdTRK8/edit?usp=sharing (Google Drive copy: https://drive.google.com/file/d/1bDyJTsF0UFhrhkldfWvFMxq_NXGu3VD3/view?usp=sharing)

- Dataset (covid2020-2023.csv): https://drive.google.com/drive/folders/1LRZy4zms4wMs-iYtKuLsUbMuhy6gyuwB?usp=sharing






# Awards:

### Best Data Handling:

![Best_data_handling_-_Alchemist_page-0001](https://github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python/assets/117270964/e0685018-e859-4f53-8d5e-6c27a195adc9)




### Best Storytelling:

![Best_storytelling_-Alchemist_page-0001](https://github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python/assets/117270964/60d9a111-97d6-48e5-b990-4a4c883c847b)










We used COVID-19 data from the Philippine Department of Health (DOH) for this capstone project.

This gave us practice working with an extremely large dataset (around 4.1 million rows).

We performed the following:




## **1. ROCCC for the Reliability of the Dataset**

We assessed the dataset's reliability using the ROCCC framework (Reliable, Original, Comprehensive, Current, Cited):

- Reliable - yes, not biased

- Original - yes, can locate the original public data

- Comprehensive - yes, not missing important information

- Current - yes, updated monthly

- Cited - yes





## **2. Data Cleaning**



The raw data came in 5 batches totaling more than 4.1 million rows.

After filtering and cleaning, more than 168,000 rows remained for analysis.
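
Because the raw batches total more than 4.1 million rows, one memory-friendly way to apply the City of Manila filter is to read each CSV in chunks and keep only the matching rows. This is a sketch, not the project's code: the `CityMunRes` column name and the `read_city_rows` helper are hypothetical, and the in-memory CSV stands in for a real batch file.

```python
import io

import pandas as pd

# Hypothetical column name for the city of residence; rename to match your CSV.
CITY_COL = "CityMunRes"
TARGET_CITY = "CITY OF MANILA"


def read_city_rows(source, chunksize: int = 100_000) -> pd.DataFrame:
    """Read a large CSV in chunks, keeping only rows for TARGET_CITY."""
    kept = []
    # dtype=str keeps the city column as text even if a chunk has blanks.
    for chunk in pd.read_csv(source, chunksize=chunksize, dtype={CITY_COL: str}):
        # Case-insensitive match; missing cities compare as False.
        mask = chunk[CITY_COL].str.upper() == TARGET_CITY
        kept.append(chunk[mask])
    return pd.concat(kept, ignore_index=True)


# Tiny stand-in for one batch file; in practice pass a file path instead.
sample_csv = io.StringIO(
    "CityMunRes,Age\n"
    "CITY OF MANILA,30\n"
    "QUEZON CITY,41\n"
    "City of Manila,67\n"
)
manila = read_city_rows(sample_csv, chunksize=2)
print(manila)
```

Only one chunk is held in memory at a time, so peak memory depends on `chunksize` rather than on the full file size.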




### **Data Cleaning Steps:**

- Filtered the dataset to only include data from the “City of Manila”

- Checked for Duplicates

- Checked for Missing/Null Values

- Dropped Unnecessary Columns

- Replaced the null values with appropriate data like “Not Recorded”

- Merged the five cleaned datasets into one 

I posted the Jupyter notebook for one of the batches (**"batch_4_covid_manilaupdated3.ipynb"**), which shows how we cleaned the data.
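
The cleaning steps listed above can be sketched in pandas as follows. This is an illustration under assumptions, not a copy of the notebook: the `CityMunRes` and `ValidationStatus` column names, the `clean_batch` helper, and the toy batches are hypothetical, and "merged" is read here as stacking the cleaned batches row-wise with `pd.concat`, assuming every batch shares the same columns.

```python
import pandas as pd

# Hypothetical names: the city column and the columns judged unnecessary.
CITY_COL = "CityMunRes"
DROP_COLS = ["ValidationStatus"]


def clean_batch(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to one batch."""
    # Filter to the City of Manila (case-insensitive).
    df = df[df[CITY_COL].str.upper() == "CITY OF MANILA"].copy()

    # Check for duplicates, then drop exact duplicate rows.
    print("duplicate rows:", df.duplicated().sum())
    df = df.drop_duplicates()

    # Check for missing/null values per column.
    print("nulls per column:\n", df.isna().sum())

    # Drop unnecessary columns; ignore any that are absent in this batch.
    df = df.drop(columns=DROP_COLS, errors="ignore")

    # Replace nulls in non-numeric columns with "Not Recorded".
    for col in df.columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna("Not Recorded")
    return df


# Two toy batches standing in for the real batch files.
batches = [
    pd.DataFrame({
        "CityMunRes": ["CITY OF MANILA", "QUEZON CITY", "CITY OF MANILA", "CITY OF MANILA"],
        "Age": [30, 41, 30, 67],
        "HealthStatus": ["RECOVERED", "RECOVERED", "RECOVERED", None],
        "ValidationStatus": ["ok", "ok", "ok", "check"],
    }),
    pd.DataFrame({
        "CityMunRes": ["CITY OF MANILA", "PASIG CITY"],
        "Age": [25, 52],
        "HealthStatus": ["DIED", "RECOVERED"],
        "ValidationStatus": ["ok", "ok"],
    }),
]

# Merge the cleaned batches by stacking their rows.
covid_df = pd.concat([clean_batch(b) for b in batches], ignore_index=True)
print(covid_df)
```

Numeric columns are left untouched here because filling them with a text label would turn them into mixed-type columns; how nulls in numeric columns were handled in the project is not stated.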




One of our groupmates, Erwin, did the majority of the data cleaning.

I learned a lot from him and became more confident and competent in data cleaning with Python.







## **3. Exploratory Data Analysis**



```python
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Libraries for statistical analysis.
from scipy import stats
from scipy.stats import chi2, chi2_contingency

# Display all columns and rows of a DataFrame.
# Caution: with ~168,000 rows, displaying the whole dataset is slow.
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

# Suppress warnings to keep the notebook output readable.
warnings.filterwarnings("ignore")

covid_df = pd.read_csv("covid2020-2023.csv")
```

We created multiple data visualizations from the merged and cleaned dataset.

I posted the Jupyter notebook that covers all of them, including my revised data visualizations:

#### (**"EDA_covid_manila_FINAL_Gino.ipynb"**)
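
Here is a minimal sketch of one such visualization: a bar chart of case counts per age group. It assumes each row is one case and that a hypothetical `AgeGroup` column holds labels such as "25 to 29". The function name and sample data are illustrative only, not taken from the notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical column holding the age bracket of each case.
AGE_GROUP_COL = "AgeGroup"


def plot_cases_by_age_group(df: pd.DataFrame):
    """Count cases per age group and draw them as a bar chart."""
    counts = df[AGE_GROUP_COL].value_counts().sort_index()
    fig, ax = plt.subplots(figsize=(10, 4))
    counts.plot.bar(ax=ax, color="steelblue")
    ax.set_xlabel("Age group")
    ax.set_ylabel("Number of cases")
    ax.set_title("COVID-19 cases by age group")
    fig.tight_layout()
    return counts, ax


sample = pd.DataFrame(
    {"AgeGroup": ["25 to 29", "30 to 34", "25 to 29", "20 to 24", "25 to 29"]}
)
counts, ax = plot_cases_by_age_group(sample)
print(counts)
```

`sort_index()` orders the labels as text, so a label like "5 to 9" would sort after "45 to 49"; pass an explicit order if that matters. Outside Jupyter, call `plt.show()` to display the figure.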










## **4. Key Findings:**

1. Age group **25-29** has the highest number of cases, followed closely by **30-34** and **20-24**. These are most likely face-to-face (F2F) and healthcare workers who are in contact with a large number of people daily.

2. Most of the deaths occurred among people aged 50 and up. One possible reason is the comorbidities and weaker immune systems that come with older age.

3. **Although the 25-29 age group had the highest number of cases, it had one of the lowest numbers of deaths.**

4. Several deaths among children and adolescents were observed. Such deaths are uncommon and might also be linked to underlying conditions. For infants, a possible reason is that their immune systems are not yet fully developed.

5. The number of cases spiked sharply from 2020 to 2021.

6. The rollout of COVID-19 vaccines appears to have helped reduce the number of cases.

7. The dataset contains **ages ranging from 2 to 80 years old** (a spread of 78 years), with a **median age of 32 years.**

8. **The middle 50% of individuals fall between ages 27 and 47**, giving an interquartile range (IQR) of 47 - 27 = 20.

    The median (32) is much closer to the first quartile (27) than to the third quartile (47), which indicates that **the data might be positively skewed**: the distribution likely has a longer tail on the right (higher-age) side.

9. **An 80-year-old individual may be considered an outlier**, an unusually high age compared to the rest of the dataset. By the common 1.5 × IQR rule, the upper fence is 47 + 1.5 × 20 = 77, and 80 lies above it.
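
The age statistics in findings 7 to 9 (range, median, quartiles, IQR, and the outlier check) can be computed with a short helper like the one below. This is a sketch, not the project's code: `describe_ages` is hypothetical, it relies on pandas' default linear interpolation for quartiles, and it flags outliers with Tukey's 1.5 × IQR fences. The sample ages are made up for illustration.

```python
import pandas as pd


def describe_ages(ages: pd.Series) -> dict:
    """Summarize ages: range, median, quartiles, IQR, skew hints, and outliers."""
    q1, median, q3 = ages.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    # Tukey's rule: values beyond 1.5 * IQR from the quartiles are flagged.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "min": ages.min(),
        "max": ages.max(),
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        # A median closer to Q1 than to Q3 hints at a longer right tail.
        "right_tail_hint": bool((median - q1) < (q3 - median)),
        "skewness": ages.skew(),  # > 0 suggests positive (right) skew
        "upper_fence": upper_fence,
        "outliers": ages[(ages < lower_fence) | (ages > upper_fence)].tolist(),
    }


sample_ages = pd.Series([20, 25, 27, 30, 32, 35, 40, 47, 50, 80])
age_stats = describe_ages(sample_ages)  # avoids shadowing scipy's `stats`
print(age_stats)
```

For these sample ages the quartiles are 27.75 and 45.25, so the upper fence is 71.5 and only 80 is flagged.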




### Overall, the data exhibits a diverse age distribution, with a notable concentration of cases in the middle age range.

    





     

## **Limitations:**

    

#### The analysis is based on the available dataset from DOH. 

#### Data for 2023 is incomplete, as DOH was still updating it at the time of the analysis. 

#### Additionally, the dataset contains many null values, which affects the accuracy of the analysis.

    


 







## **5. Conclusion and Recommendations**



### 1. We recommend that people get vaccinated and take the booster shots, as the data clearly shows a large drop in cases once the vaccines started rolling out.





### 2. We recommend that senior citizens minimize contact with large groups of people, since they have the highest fatality rate of all the age groups that contracted the virus in this dataset.
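
The fatality comparison behind this recommendation comes down to a case fatality rate per age group: deaths divided by cases, times 100. Below is a sketch that assumes one row per case, a hypothetical `AgeGroup` column, and a hypothetical `RemovalType` column that reads "DIED" for fatal cases; the sample data is made up.

```python
import pandas as pd

# Hypothetical column names; adjust to match the real dataset.
AGE_GROUP_COL = "AgeGroup"
OUTCOME_COL = "RemovalType"


def fatality_rate_by_age_group(df: pd.DataFrame) -> pd.DataFrame:
    """Case fatality rate per age group: deaths / cases * 100."""
    summary = (
        df.assign(died=df[OUTCOME_COL].eq("DIED"))  # missing outcome -> False
        .groupby(AGE_GROUP_COL)
        .agg(cases=("died", "count"), deaths=("died", "sum"))
    )
    summary["fatality_rate_pct"] = (summary["deaths"] / summary["cases"] * 100).round(2)
    return summary.sort_values("fatality_rate_pct", ascending=False)


sample = pd.DataFrame({
    "AgeGroup": ["25 to 29"] * 5 + ["50 to 54"] * 4 + ["70 to 74"] * 2,
    "RemovalType": ["RECOVERED"] * 5
    + ["DIED", "RECOVERED", "RECOVERED", None]
    + ["DIED", "RECOVERED"],
})
result = fatality_rate_by_age_group(sample)
print(result)
```

Comparing rates rather than raw death counts matters here, because the largest age groups by case count are not the ones with the highest risk per case.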





### 3. As they say: “an ounce of prevention is worth a pound of cure”

Neighboring countries such as Singapore, Taiwan, and Vietnam swiftly implemented preventive measures, including:

- large-scale public health campaigns

- calibrated restrictions on public events and gatherings

- proactive contact tracing to prevent intra-community transmission

- regular and transparent communication between top officials and the citizenry





### 4. Don’t be Complacent 

- There was a huge spike in cases in January 2022, most likely due to holiday gatherings, complacency, and the presence of the Omicron variant.

    http://www.cnnphilippines.com/news/2022/1/1/PH-COVID-19-cases-New-Year-s-Day-.html

    https://www.reuters.com/business/healthcare-pharmaceuticals/philippines-confirms-community-transmission-omicron-cases-hit-record-2022-01-15/
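
A spike like the one in January 2022 can be located by counting cases per calendar month. This sketch assumes a hypothetical `DateRepConf` column holding each case's report date. Unparseable dates are dropped, and months with no cases are filled with 0 so that month-over-month changes compare consecutive months. The sample dates are made up.

```python
import pandas as pd

# Hypothetical column holding the date each case was reported/confirmed.
DATE_COL = "DateRepConf"


def monthly_case_counts(df: pd.DataFrame) -> pd.Series:
    """Count cases per calendar month, filling empty months with 0."""
    dates = pd.to_datetime(df[DATE_COL], errors="coerce")  # bad dates -> NaT
    counts = dates.dt.to_period("M").value_counts().sort_index()  # NaT dropped
    all_months = pd.period_range(counts.index.min(), counts.index.max(), freq="M")
    return counts.reindex(all_months, fill_value=0)


sample = pd.DataFrame({DATE_COL: [
    "2021-12-05", "2021-12-20", "2022-01-02", "2022-01-03",
    "2022-01-10", "2022-01-15", "2022-02-01", "not a date",
]})
counts = monthly_case_counts(sample)
print(counts)
print("peak month:", counts.idxmax())
print("largest month-over-month jump:", counts.diff().idxmax())
```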





### 5. Quarantine protocols are effective. 

If a person tests positive for COVID-19, they should immediately follow quarantine protocols.

Summary:

- Number of people who survived: 17,968

- Number of people who died: 252

- Percentage of people who survived: 98.62% (17,968 / 18,220)

- Percentage of people who died: 1.38% (252 / 18,220)

This analysis shows that **the majority of people who were quarantined (98.62%) survived, while a small percentage (1.38%) unfortunately died.** 

The data suggests that the quarantine protocols had a relatively high effectiveness in preventing fatalities during the quarantine period.
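
The survival and death percentages above are shares of outcomes among quarantined cases. Below is a sketch under these assumptions: hypothetical `Quarantined` ("YES"/"NO") and `RemovalType` columns, "survived" read as `RECOVERED`, and rows without a recorded outcome excluded from the denominator. The toy data is made up.

```python
import pandas as pd


def quarantine_outcomes(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of RECOVERED vs DIED among quarantined cases."""
    # Hypothetical columns: "Quarantined" ("YES"/"NO") and "RemovalType".
    quarantined = df[df["Quarantined"].eq("YES")]
    outcome = quarantined["RemovalType"]
    # Keep only cases with a final outcome; unknown outcomes are excluded.
    outcome = outcome[outcome.isin(["RECOVERED", "DIED"])]
    counts = outcome.value_counts()
    percent = (counts / counts.sum() * 100).round(2)
    return pd.DataFrame({"count": counts, "percent": percent})


sample = pd.DataFrame({
    "Quarantined": ["YES"] * 51 + ["NO"] * 3,
    "RemovalType": ["RECOVERED"] * 49 + ["DIED", None] + ["DIED"] * 3,
})
result = quarantine_outcomes(sample)
print(result)
```

Because the percentages are taken over quarantined cases only, they describe outcomes within that group; the choice of denominator (here, cases with a recorded outcome) changes the figures, so it is worth stating explicitly.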





### 6. We should learn from our neighboring countries, and the government should act swiftly in times like these.

If the airport travel ban had been implemented earlier, it could have lessened the spread of the virus.

Imagine the number of lives that could have been saved.

When **actions taken early** can have a much bigger impact than actions taken later, time is a crucial factor.





















![Thank you wordcloud1](https://github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python/assets/117270964/014fff25-dd93-47dc-a37e-189110787894)