{"id":16086809,"url":"https://github.com/olekscode/covidanalysis","last_synced_at":"2025-04-05T14:43:39.523Z","repository":{"id":88081342,"uuid":"251710853","full_name":"olekscode/CovidAnalysis","owner":"olekscode","description":"A setup for COVID-19 data analysis in Pharo","archived":false,"fork":false,"pushed_at":"2020-04-01T02:01:09.000Z","size":399,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-11T11:41:21.389Z","etag":null,"topics":["coronavirus","covid-19","data-analysis","pharo"],"latest_commit_sha":null,"homepage":"","language":"Smalltalk","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/olekscode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-31T19:31:54.000Z","updated_at":"2020-07-31T01:22:34.000Z","dependencies_parsed_at":"2023-05-18T05:00:09.975Z","dependency_job_id":null,"html_url":"https://github.com/olekscode/CovidAnalysis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olekscode%2FCovidAnalysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olekscode%2FCovidAnalysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olekscode%2FCovidAnalysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olekscode%2FCovidAnalysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/olekscode","downl
oad_url":"https://codeload.github.com/olekscode/CovidAnalysis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247353676,"owners_count":20925325,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coronavirus","covid-19","data-analysis","pharo"],"created_at":"2024-10-09T13:25:19.697Z","updated_at":"2025-04-05T14:43:39.504Z","avatar_url":"https://github.com/olekscode.png","language":"Smalltalk","readme":"# COVID-19 Analysis\n\nIn this repository, I provide an initial setup for analysing the daily-updated COVID-19 dataset published by the European Centre for Disease Prevention and Control: https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide. This dataset contains the following fields: **date**, **country**, number of **cases** reported on a given day, number of **deaths** reported on a given day, and a total **population** of the country as of 2018, taken from the [World Bank Open Data](https://data.worldbank.org/).\n\n## How to install it?\n\nTo install `CovidAnalysis`, go to the Playground (Ctrl+OW) in your [Pharo](https://pharo.org/) image and execute the following Metacello script (select it and press Do-it button or Ctrl+D):\n\n```Smalltalk\nMetacello new\n  baseline: 'CovidAnalysis';\n  repository: 'github://olekscode/CovidAnalysis/src';\n  load.\n```\n\n## How to use it?\n\nFirst create an instance of `CovidDataLoader`. 
This class will help you download the latest data from the Internet, clean it and load it into your image as a [DataFrame](https://github.com/PolyMathOrg/DataFrame) object.\n\n```Smalltalk\ndataLoader := CovidDataLoader new.\n```\n\nUse the following method to download the latest data. It may take a couple of seconds. The result will be stored as a CSV inside a `data/` folder of this repository.\n\n```Smalltalk\ndataLoader downloadLatestData.\n```\n\nNow you can read the downloaded data from a CSV. This method will automatically clean and parse the values of a dataset:\n\n```Smalltalk\ncovidData := dataLoader loadData.\n```\n\nThe result will be a data frame that looks like this:\n\n![DataFrame of COVID-19 data](img/covidData.png)\n\n### Example of Data Analysis\n\nLet's find top 10 countries by the number of reported cases and number of reported deaths as of March 31, 2020:\n\n```Smalltalk\n(covidData group: 'cases' by: 'country' aggregateUsing: #sum)\n\tsortDescending\n\thead: 10.\n\n(covidData group: 'deaths' by: 'country' aggregateUsing: #sum)\n\tsortDescending\n\thead: 10.\n```\n\n![Top 10 countries by the number of reported cases and number of reported deaths as of March 31, 2020](img/topCountries.png)\n\nNow we will look at the historical data of how COVID-19 was spreading in one specific country, in this case - France:\n\n```Smalltalk\ncovidDataFrance := covidData select: [ :row |\n    (row at: 'country') = 'France' ].\n```\n\nEvery row of this new data frame will have the same values in columns **country** and **population**. So we can remove those columns. 
But first, let's save the population of France in a separate variable, in case we need it later:\n\n```Smalltalk\npopulationOfFrance := (covidDataFrance column: 'population') anyOne.\n\ncovidDataFrance removeColumns: #(country population).\n```\n\nWe get the following data frame:\n\n![DataFrame of COVID-19 data for France](img/covidDataFrance.png)\n\nWe can find the days on which there were the most reported cases and the most deaths in France:\n\n```Smalltalk\nmaxDailyCases := (covidDataFrance column: 'cases') max. \"4611\"\nmaxDailyDeaths := (covidDataFrance column: 'deaths') max. \"418\"\n\ncovidDataFrance detect: [ :row | (row at: 'cases') = maxDailyCases ].\ncovidDataFrance detect: [ :row | (row at: 'deaths') = maxDailyDeaths ].\n```\n\n![Days with most reported cases and deaths](img/maxDailyCasesAndDeaths.png)\n\nWe can see that, so far, March 29 had the most reported cases (4,611), and today, March 31, had the most reported deaths (418).\n\nLet's add two more columns: the cumulative sums of cases and deaths. The cumulative sum tells us the total number of cases reported up to and including the given date. For example, if there were 5 cases reported on Monday, no cases on Tuesday, and 12 cases on Wednesday, then the cumulative sum for those days will be 5 for Monday, 5 for Tuesday (5 + 0), and 17 for Wednesday (5 + 0 + 12).\n\n```Smalltalk\ncumulativeSum := [ :column |\n    sum := 0.\n    column collect: [ :each |\n        sum := sum + each.\n        sum ] ].\n\ncumulativeCases := cumulativeSum value: (covidDataFrance column: 'cases').\ncumulativeDeaths := cumulativeSum value: (covidDataFrance column: 'deaths').\n\ncovidDataFrance addColumn: cumulativeCases named: 'cumulativeCases'.\ncovidDataFrance addColumn: cumulativeDeaths named: 'cumulativeDeaths'.\n```\n\nNow the `covidDataFrance` data frame looks like this:\n\n![DataFrame of COVID-19 data for France with cumulative cases and deaths](img/covidDataFranceCumulative.png)\n\nLet's find out how many days it took for the disease to spread from 10 cases to 100 cases, as well as from 100 cases to 1000 cases. The following block will find the date in the given data frame on which the total number of reported cases first reached the given number:\n\n```Smalltalk\nfindMilestone := [ :dataFrame :cases |\n    (dataFrame detect: [ :row | (row at: 'cumulativeCases') \u003e= cases ]) at: 'date' ].\n```\n\nWe find three milestones:\n\n```Smalltalk\nfrance10CasesMilestone := findMilestone value: covidDataFrance value: 10. \"8 February 2020\"\nfrance100CasesMilestone := findMilestone value: covidDataFrance value: 100. \"1 March 2020\"\nfrance1000CasesMilestone := findMilestone value: covidDataFrance value: 1000. \"9 March 2020\"\n```\n\nAnd calculate the number of days between them:\n\n```Smalltalk\n(france100CasesMilestone - france10CasesMilestone) days. \"22\"\n(france1000CasesMilestone - france100CasesMilestone) days. \"8\"\n```\n\nSo it took 22 days for COVID-19 to spread from 10 to 100 reported cases in France, and then only 8 more days to reach 1000 cases.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Folekscode%2Fcovidanalysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Folekscode%2Fcovidanalysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Folekscode%2Fcovidanalysis/lists"}