{"id":19834794,"url":"https://github.com/gino-freud-hobayan/exploratory_data_analysis_covid19_on_python","last_synced_at":"2026-05-18T04:12:32.890Z","repository":{"id":184837752,"uuid":"672559200","full_name":"Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python","owner":"Gino-Freud-Hobayan","description":"Exploratory Data Analysis (EDA) on Python","archived":false,"fork":false,"pushed_at":"2023-08-22T18:31:06.000Z","size":898,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-28T21:50:38.953Z","etag":null,"topics":["data-cleaning","data-visualization","exploratory-data-analysis","python"],"latest_commit_sha":null,"homepage":"https://docs.google.com/presentation/d/14flA5fAz6sI6FoIZjT0i-NIM9b48pMiJRTlnwcdTRK8/edit?usp=sharing","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Gino-Freud-Hobayan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-30T13:52:56.000Z","updated_at":"2023-08-22T17:27:52.000Z","dependencies_parsed_at":"2025-02-28T18:40:20.794Z","dependency_job_id":"8455b31a-0c5a-4de1-92f6-826ec3145cf1","html_url":"https://github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python","commit_stats":null,"previous_names":["gino-freud-hobayan/exploratory_data_analysis_on_python","gino-freud-hobayan/exploratory_data_analysis_covid19_on_python"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python","repository_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Gino-Freud-Hobayan%2FExploratory_Data_Analysis_COVID19_on_Python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Gino-Freud-Hobayan%2FExploratory_Data_Analysis_COVID19_on_Python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Gino-Freud-Hobayan%2FExploratory_Data_Analysis_COVID19_on_Python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Gino-Freud-Hobayan%2FExploratory_Data_Analysis_COVID19_on_Python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Gino-Freud-Hobayan","download_url":"https://codeload.github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Gino-Freud-Hobayan%2FExploratory_Data_Analysis_COVID19_on_Python/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279013269,"owners_count":26085250,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-12T02:00:06.719Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-cleaning","data-visualization","exploratory-data-analysis","python"],"created_at":"2024-11-12T12:05:38.146Z","updated_at":"2025-10-12T22:33:24.044Z","avatar_url":"https://github.com/Gino-Freud-Hobayan.png","langua
ge":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Exploratory Data Analysis (EDA) on Python about COVID-19 Cases in the City of Manila from 2020-2023\n\n\u003cbr\u003e\n\n# Descriptive and Diagnostic Analytics on COVID-19 data\n- **Descriptive Analytics** tells you **WHAT** happened in the past. \n- **Diagnostic Analytics** helps you understand **WHY** something happened in the past.\n\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n### **Capstone Project:** \n\n### Python for Data Analysis Bootcamp by [Data Vanguard](https://datavanguard.ph/)\n\n\n\n\u003cimg width=\"450\" alt=\"Capstone FINAL EDA - revised by Gino\" src=\"https://github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python/assets/117270964/d5ca65be-6b73-4637-9209-496cac7e9d0c\"\u003e\n\n\u003cbr\u003e\u003cbr\u003e\n\n\u003cimg width=\"581\" alt=\"Python - cert of completion\" src=\"https://github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python/assets/117270964/4f68fdb4-9603-40b6-b362-560f5ba57bae\"\u003e\n\n\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n\n### Link to the Google Drive\n### - Slideshow/Presentation: [https://docs.google.com/presentation/d/14flA5fAz6sI6FoIZjT0i-NIM9b48pMiJRTlnwcdTRK8/edit?usp=sharing  ](https://drive.google.com/file/d/1bDyJTsF0UFhrhkldfWvFMxq_NXGu3VD3/view?usp=sharing)\n\n### - covid2020-2023.csv: https://drive.google.com/drive/folders/1LRZy4zms4wMs-iYtKuLsUbMuhy6gyuwB?usp=sharing\n\n\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\n# Awards:\n\n### Best Data Handling:\n![Best_data_handling_-_Alchemist_page-0001](https://github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python/assets/117270964/e0685018-e859-4f53-8d5e-6c27a195adc9)\n\n\n\u003cbr\u003e\n\n### Best 
Storytelling:\n![Best_storytelling_-Alchemist_page-0001](https://github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python/assets/117270964/60d9a111-97d6-48e5-b990-4a4c883c847b)\n\n\n\n\u003cbr\u003e\n\u003cbr\u003e\n\u003cbr\u003e\n\n\nWe used COVID-19 data from the Dept. of Health for this Capstone Project.\n\nWe were able to practice dealing with an extremely large dataset (around 4.1 million rows)\n\nWe performed the following:\n\n\u003cbr\u003e\n\n\n## **1. ROCCC for the Reliability of the Dataset**\n\nThe dataset follows the ROCCC Analysis as described below:\n- Reliable - yes, not biased\n- Original - yes, can locate the original public data\n- Comprehensive - yes, not missing important information\n- Current - yes, updated monthly\n- Cited - yes\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n\n## **2. Data Cleaning**\n\n\u003cimg width=\"377\" alt=\"Data cleaning pic\" src=\"https://github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python/assets/117270964/f540d704-b3ea-4e65-9b85-aa0667b15642\"\u003e\n\n\nWe dealt with 5 batches containing 4.1 million + rows of data in total.\n\nAfter filtering and data cleaning, we now only had 168,000 + rows of data to work with.\n\n\u003cbr\u003e\n\n## **Data Cleaning steps:**\n\n- Filtered the dataset to only include data from the “City of Manila”\n- Checked for Duplicates\n- Checked for Missing/Null Values\n- Dropped Unnecessary Columns\n- Replaced the null values with appropriate data like “Not Recorded”\n- Merged the five cleaned datasets into one \n\nI posted a Jupyter notebook of one of the batches (**\"batch_4_covid_manilaupdated3.ipynb\"**) that shows how we did our Data Cleaning.\n\n\u003cbr\u003e\n\nOne of our groupmates Erwin did the Majority of the Data Cleaning \n\nI learned a lot from him and became more confident and competent in my Data Cleaning skills in Python.\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n\n## **3. 
Exploratory Data Analysis**\n\n\u003cimg width=\"376\" alt=\"EDA pic1\" src=\"https://github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python/assets/117270964/7277fb63-1921-4e30-aa8e-ec7d3b7a73b5\"\u003e\n\n```python\nimport numpy as np \nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport plotly.express as px\n\n\n    # Library to perform Statistical Analysis.\nfrom scipy import stats\nfrom scipy.stats import chi2\nfrom scipy.stats import chi2_contingency\n\n    # Library to Display whole Dataset.\npd.set_option(\"display.max.columns\",None)\npd.set_option(\"display.max.rows\",None)\n\n    # to suppress warnings\nimport warnings\nwarnings.filterwarnings('ignore')\n\n\ncovid_df = pd.read_csv(\"covid2020-2023.csv\")\n```\n\n\nWe performed multiple Data Visualizations on our merged and cleaned Datasets\n\nI posted the Jupyter Notebook that covers all of it including my revised Data Visualizations \n#### (**\"EDA_covid_manila_FINAL_Gino.ipynb\"**)\n\n\u003cbr\u003e\n\u003cbr\u003e\n\u003cbr\u003e\n\n\n## **4. Key Findings:**\n\n1. Age group **25-29** has the Highest number of cases, followed shortly by **30-34** and **20-24**, most likely these are the F2F and Healthcare workers who are in contact with a large number of people daily.\n\n2. Most of the deaths occurred for Ages 50 and up. One of the reasons might be comorbidities that come with older age and a weaker immune system.\n3. We have seen earlier that **the age group of 25 to 29 had the highest number of cases yet they had one of the least number of deaths.**\n4. Several counts of death in children and adolescents were observed. These deaths are uncommon, and their deaths might also be linked to some underlying conditions. Additionally, for infants, a possible reason may be that their immune system is not yet well developed.\n\n\n5. A huge spike in the number of cases occurred from 2020-2021.\n\n6. 
Administration of COVID-19 vaccines helped reduce the number of cases\n\n\n7. The dataset contains **ages ranging from 2 to 80 years old (spread)**, with a **median age of 32 years.**\n\n\n8. **The middle 50% of individuals fall between ages 27 and 47**, as indicated by the interquartile range (IQR) of 20. \n\n    Because the median (32) sits closer to the lower quartile (27) than to the upper quartile (47), **the data might be slightly positively skewed**. This means that the distribution may have a longer tail on the right (higher) side.\n\n\n9. **The presence of an 80-year-old individual may be considered an outlier**, since 80 lies above the common 1.5 × IQR upper fence (47 + 30 = 77), indicating an unusual or extreme age compared to the rest of the dataset. \n\n\u003cbr\u003e\n\n### Overall, the data exhibits a diverse age distribution, with a notable concentration of cases in the middle age range.\n\n    \n\u003cbr\u003e\u003cbr\u003e\n\n     \n## **Limitations:**\n    \n#### The analysis is based on the available dataset from DOH. \n\n#### Data for 2023 is currently inconclusive and is still being updated by DOH. \n\n#### Additionally, the dataset contains many null values, which affects the accuracy of the analysis.\n    \n\n\u003cbr\u003e \n\n\n\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n## **5. Conclusion and Recommendations**\n\n\u003cimg width=\"451\" alt=\"actions taken early vs later\" src=\"https://github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python/assets/117270964/b726b0b2-e760-42c5-9f88-1697cb1d1d46\"\u003e\n\n\n\n### 1. We recommend that people get vaccinated and take the booster shots, as the data shows a large drop in cases once the vaccines started rolling out.\n\n\u003cbr\u003e\u003cbr\u003e\n\n### 2. It is recommended that the elderly and senior citizens have minimal contact with many people since they have the highest fatality rate out of all the Age Groups that contracted the virus in this dataset.\n\n\u003cbr\u003e\u003cbr\u003e\n\n### 3. 
As they say: “an ounce of prevention is worth a pound of cure”\nNeighbors such as Singapore, Taiwan, and Vietnam swiftly implemented preventive measures.\n- large-scale public health campaigns \n- calibrated restrictions on public events and gatherings\n- proactive contact tracing to prevent intra-community transmission \n- Regular and transparent communication between top officials and the citizenry.\n\n\u003cbr\u003e\u003cbr\u003e\n\n### 4. Don’t be Complacent \n- There was a huge spike in cases back in January 2022, most likely due to Holiday gatherings, complacency, and the presence of the Omicron variant\n\n    http://www.cnnphilippines.com/news/2022/1/1/PH-COVID-19-cases-New-Year-s-Day-.html\n\n    https://www.reuters.com/business/healthcare-pharmaceuticals/philippines-confirms-community-transmission-omicron-cases-hit-record-2022-01-15/\n\n\u003cbr\u003e\u003cbr\u003e\n\n### 5. Quarantine protocols are effective. \nIf a person tests positive for COVID, they should immediately take action.\n\nSummary:\n- Number of people who survived: 17,968\n- Number of people who died: 252\n- Percentage of people who survived: 98.61%\n- Percentage of people who died: 1.39%\n\nThis analysis shows that **the majority of people who were quarantined (98.61%) survived, while a small percentage (1.39%) unfortunately died.** \nThe data suggests that the quarantine protocols had a relatively high effectiveness in preventing fatalities during the quarantine period.\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n### 6. We should learn from our neighboring Countries and the Government should act swiftly in times like these. 
\n\nIf the travel ban at airports had been implemented earlier, it could have lessened the spread of the virus.\n\nImagine the number of lives you can save.\n\nIn a situation where **actions taken early** can have a much bigger impact than actions taken later, time is a crucial factor.\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n\u003cimg width=\"534\" alt=\"Save one life EDA\" src=\"https://github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python/assets/117270964/625129fe-c56d-4aae-9ca1-7eb3f839407c\"\u003e\n\n\u003cbr\u003e\n\u003cbr\u003e\n\u003cbr\u003e\n\u003cbr\u003e\n\n\n![Thank you wordcloud1](https://github.com/Gino-Freud-Hobayan/Exploratory_Data_Analysis_COVID19_on_Python/assets/117270964/014fff25-dd93-47dc-a37e-189110787894)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgino-freud-hobayan%2Fexploratory_data_analysis_covid19_on_python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgino-freud-hobayan%2Fexploratory_data_analysis_covid19_on_python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgino-freud-hobayan%2Fexploratory_data_analysis_covid19_on_python/lists"}