https://github.com/tyleracorn/covid_alberta

looking at some of the alberta specific covid data
Host: GitHub
URL: https://github.com/tyleracorn/covid_alberta
Owner: tyleracorn
License: apache-2.0
Created: 2020-03-26T03:19:07.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2023-04-12T00:02:03.000Z (about 3 years ago)
Last Synced: 2026-02-12T02:14:23.705Z (4 months ago)
Topics: alberta, canada, covid-19, covid19-data, webscraper
Language: HTML
Homepage: http://tyleracorn.com/covid_alberta/
Size: 4.9 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

# Covid Alberta

> This is a small package that I developed to look at some of the Alberta-specific Covid-19 data.


## Install

`pip install covid_alberta`

## Web Scraper

The `albertaC19` class scrapes the latest statistics from the [Alberta Covid-19 statistics website](https://covid19stats.alberta.ca/).

Example of using the web scraper:

```
import covid_alberta

abC19scaper = covid_alberta.albertaC19(outputfolder="")
# No files are written in this example, hence fltypes=None and return_dataframes=True
ab_totals, ab_regions, ab_testing = abC19scaper.scrape_all(fltypes=None, return_dataframes=True)
```

    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
     in 
          1 abC19scaper = covid_alberta.albertaC19(outputfolder="")
          2 # I don't plan on writing out the data in this example thus the keywords
    ----> 3 ab_totals, ab_regions, ab_testing = abC19scaper.scrape_all(fltypes=None, return_dataframes=True)

    c:\Repositories_C\covid_alberta\covid_alberta\webscraper.py in scrape_all(self, totalfl, regionsfl, testfl, fltypes, combine_dataframes, return_dataframes)
        335 
        336         '''
    --> 337         totals = self.scrape_albertaTotals(output_filename=totalfl, fltypes=fltypes, return_dataframe=return_dataframes)
        338         regions = self.scrape_albertaRegions(output_filename=regionsfl, fltypes=fltypes, return_dataframe=return_dataframes)
        339         testing = self.scrape_albertaTesting(output_filename=testfl, fltypes=fltypes, return_dataframe=return_dataframes)

    c:\Repositories_C\covid_alberta\covid_alberta\webscraper.py in scrape_albertaTotals(self, output_filename, fltypes, update_figure_order, return_dataframe)
        177         # Scrape the data
        178         ab_cumulative = json.loads(totals_results[fig_order['cum_cases']].string)
    --> 179         ab_daily_cases = json.loads(totals_results[fig_order['daily_cases']].string)
        180         ab_case_status = json.loads(totals_results[fig_order['case_status']].string)
        181 

    IndexError: list index out of range
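
The traceback above shows the scraper calling `json.loads` on page elements that it looks up by position through a `fig_order` mapping, so a change in the number of figures on the page can push a position out of range. The following is only a rough sketch of that idea with a more descriptive failure, not the package's code: it assumes the figures are embedded as `<script type="application/json">` blocks, and the names `JsonScriptCollector` and `pick_figure`, the sample HTML, and the positions in `fig_order` are all made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonScriptCollector(HTMLParser):
    """Collect the parsed payload of every <script type="application/json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._chunks = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json_script = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_json_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self.payloads.append(json.loads("".join(self._chunks)))
            self._in_json_script = False


def pick_figure(payloads, fig_order, name):
    """Return the payload stored at the position recorded for `name`."""
    position = fig_order[name]
    if position >= len(payloads):
        raise LookupError(
            f"expected figure {name!r} at position {position}, "
            f"but the page only has {len(payloads)} JSON blocks"
        )
    return payloads[position]


# Made-up page content, for illustration only.
html = (
    '<script type="application/json">'
    '{"x": ["2020-04-07", "2020-04-08"], "y": [1409, 1423]}'
    "</script>"
)
collector = JsonScriptCollector()
collector.feed(html)
collector.close()

fig_order = {"cum_cases": 0, "daily_cases": 1}  # hypothetical positions
cum_cases = pick_figure(collector.payloads, fig_order, "cum_cases")
print(cum_cases["y"])
```

Asking for `"daily_cases"` with this sample page raises a `LookupError` that names the missing figure, which is easier to diagnose than a bare `IndexError`.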

Now we can look at the DataFrames:

```
ab_totals.tail()
```

|  | cum_cases | Confirmed_count | Probable_count | Daily_count | Active_cum | Died_cum | Recovered_cum |
|---|---|---|---|---|---|---|---|
| 2020-04-04 | 1250 | 38 | 19 | 57 | 618 | 23 | 322 |
| 2020-04-05 | 1308 | 35 | 23 | 58 | 676 | 24 | 382 |
| 2020-04-06 | 1344 | 20 | 16 | 36 | 712 | 27 | 449 |
| 2020-04-07 | 1409 | 39 | 26 | 65 | 776 | 27 | 518 |
| 2020-04-08 | 1423 | 9 | 5 | 14 | 876 | 29 | 518 |



```
ab_regions.tail()
```

|  | Calgary_cumulative | Central_cumulative | Edmont_cumulative | North_cumulative | South_cumulative | Unknown_cumulative |
|---|---|---|---|---|---|---|
| 2020-04-04 | 778 | 61 | 315 | 75 | 19 | 2 |
| 2020-04-05 | 801 | 65 | 340 | 79 | 21 | 2 |
| 2020-04-06 | 821 | 65 | 348 | 86 | 22 | 2 |
| 2020-04-07 | 854 | 72 | 364 | 94 | 23 | 2 |
| 2020-04-08 | 860 | 72 | 368 | 95 | 26 | 2 |



```
ab_testing.tail()
```

|  | test_count |
|---|---|
| 2020-04-04 | 1737 |
| 2020-04-05 | 1112 |
| 2020-04-06 | 1129 |
| 2020-04-07 | 1319 |
| 2020-04-08 | 459 |



These are all pandas DataFrames. For more info on using pandas check out the pandas [cookbook](https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html).
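
Because these are ordinary DataFrames, the usual pandas operations apply. As a small illustration, the sketch below rebuilds part of the `ab_totals.tail()` output by hand (the values are copied from the table above, not scraped) and checks how the columns relate to each other in this excerpt.

```python
import pandas as pd

# Values copied by hand from the ab_totals.tail() output shown above.
totals = pd.DataFrame(
    {
        "cum_cases": [1250, 1308, 1344, 1409, 1423],
        "Confirmed_count": [38, 35, 20, 39, 9],
        "Probable_count": [19, 23, 16, 26, 5],
        "Daily_count": [57, 58, 36, 65, 14],
    },
    index=pd.to_datetime(
        ["2020-04-04", "2020-04-05", "2020-04-06", "2020-04-07", "2020-04-08"]
    ),
)

# Day-over-day change in the cumulative count (the first row has no predecessor).
new_cases = totals["cum_cases"].diff()

# In this excerpt, the daily count equals confirmed plus probable cases.
daily_matches = (
    totals["Confirmed_count"] + totals["Probable_count"] == totals["Daily_count"]
).all()

# Percentage growth of the cumulative count from one day to the next.
growth_pct = totals["cum_cases"].pct_change() * 100

print(new_cases)
print(daily_matches)
print(growth_pct.round(2))
```

For these five rows, the day-over-day difference of `cum_cases` matches `Daily_count` from the second row onward.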

## Analysis

> These are functions that I have started working on for some quick analyses of the data. The main one calculates doubling times.

### Doubling times

The `calculate_doublingtimes` function returns two columns:

> `dtime` is the doubling time, in days, implied by the growth from the first reported case to today's case count.

> `dtime_rw` is a rolling-window calculation. If your window is 6 days, it looks at what the doubling time would have to be, starting from the case count 6 days ago, to get to today's case count.
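
One way to read the two definitions above is as "elapsed days divided by the number of doublings over those days". The sketch below follows that reading. It is not the package's implementation, which may differ in its details: the function name `doubling_times_sketch`, its parameters, and the assumption of one row per day are all made up for illustration.

```python
import numpy as np
import pandas as pd


def doubling_times_sketch(cum_counts: pd.Series, window: int = 6) -> pd.DataFrame:
    """Hypothetical doubling-time calculation; assumes one row per day."""
    counts = cum_counts.astype(float)
    days = np.arange(len(counts), dtype=float)  # days elapsed since the first row

    with np.errstate(divide="ignore", invalid="ignore"):
        # Number of doublings needed to get from the first count to each count.
        doublings = np.log2(counts / counts.iloc[0])
        dtime = days / doublings

        # Same idea, but starting from the count `window` days earlier.
        doublings_rw = np.log2(counts / counts.shift(window))
        dtime_rw = window / doublings_rw

    return pd.DataFrame({"dtime": dtime, "dtime_rw": dtime_rw})


# Synthetic series that doubles exactly every 2 days.
demo = pd.Series(
    [2.0 ** (day / 2) for day in range(11)],
    index=pd.date_range("2020-03-06", periods=11),
)
print(doubling_times_sketch(demo, window=6).tail(3))
```

On the synthetic series both columns come out at 2 days for the last row, since growth is constant. On real data the two diverge whenever recent growth differs from the average growth since the first case. The first `window` rows of `dtime_rw` are `NaN` because there is no earlier count to compare against.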

I started off looking at the rolling-window calculation. However, the more I look into it, the less happy I am with it. Our information about Covid-19 cases is changing so rapidly that the rolling-window calculation tends to be too noisy and too optimistic to be useful. We can calculate both below and see what they look like.

```
totals_dt = covid_alberta.calculate_doublingtimes(ab_totals, col_suffix="cum_cases", combine_df=False)
regions_dt = covid_alberta.calculate_doublingtimes(ab_regions, col_suffix="cumulative", combine_df=False)
totals_dt.tail()
```

|  | dtime | dtime_rw |
|---|---|---|
| 2020-04-04 | 2.818897 | 7.119992 |
| 2020-04-05 | 2.897670 | 7.353586 |
| 2020-04-06 | 2.982973 | 9.613334 |
| 2020-04-07 | 3.059140 | 11.617191 |
| 2020-04-08 | 3.150442 | 17.176893 |



```
regions_dt.tail()
```

|  | Calgary_dtime | Calgary_dtime_rw | Central_dtime | Central_dtime_rw | Edmont_dtime | Edmont_dtime_rw | North_dtime | North_dtime_rw | South_dtime | South_dtime_rw | Unknown_dtime | Unknown_dtime_rw |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-04-04 | 3.019693 | 7.296903 | 4.046714 | 14.735665 | 3.872364 | 5.864623 | 3.692514 | 8.141493 | 4.472769 | 7.609425 | 0 | 0 |
| 2020-04-05 | 3.110208 | 7.587349 | 4.151191 | 12.826571 | 3.956375 | 6.261873 | 3.807239 | 8.008629 | 4.553405 | 6.431655 | 0 | 0 |
| 2020-04-06 | 3.202070 | 9.970858 | 4.317239 | 18.637702 | 4.082834 | 8.636192 | 3.890285 | 7.959255 | 4.709120 | 6.000000 | 0 | 0 |
| 2020-04-07 | 3.286065 | 12.181763 | 4.376066 | 15.441420 | 4.189037 | 11.309771 | 3.966687 | 8.029614 | 4.863424 | 7.289318 | 0 | 0 |
| 2020-04-08 | 3.385243 | 19.656061 | 4.538143 | 20.885405 | 4.323639 | 15.835158 | 4.109679 | 9.387934 | 4.893159 | 8.566048 | 0 | 0 |



## Plots

Here are some of the plots I've used for looking at the data. For this example I'm using matplotlib. Plotly creates nice plots but is a little harder to include in this documentation since it's hosted on GitHub Pages. If you head over to [my website](http://www.tyleracorn.com), I'll post the Plotly code and examples of the interactive plots there.

```
import matplotlib.pyplot as plt

# Set defaults and settings
days_to_trim = 1
date_fmt = "%B %d"

# Grab the data we want for the plots and trim the last day off
plt_totals = ab_totals[:-days_to_trim]
plt_total_dt = totals_dt[:-days_to_trim]
plt_regions = ab_regions[:-days_to_trim]
plt_regions_dt = regions_dt[:-days_to_trim]

# Use a format dictionary so the settings only have to be set in one location.
# .iloc[-1] picks the last row by position; a bare [-1] on a date-indexed
# Series is deprecated in recent pandas versions.
fmt = {'alb': {'x_data': plt_totals['cum_cases'],
               'y_data': plt_total_dt['dtime'],
               'last_date': plt_totals.index.strftime(date_fmt)[-1],
               'annot_x': plt_totals['cum_cases'].iloc[-1],
               'annot_y': plt_total_dt['dtime'].iloc[-1],
               'color': 'green',
               'label': 'Alberta'},
       'cal': {'x_data': plt_regions['Calgary_cumulative'],
               'y_data': plt_regions_dt['Calgary_dtime'],
               'last_date': plt_regions.index.strftime(date_fmt)[-1],
               'annot_x': plt_regions['Calgary_cumulative'].iloc[-1],
               'annot_y': plt_regions_dt['Calgary_dtime'].iloc[-1],
               'color': 'orange',
               'label': 'Calgary'},
       'edm': {'x_data': plt_regions['Edmont_cumulative'],
               'y_data': plt_regions_dt['Edmont_dtime'],
               'last_date': plt_regions.index.strftime(date_fmt)[-1],
               'annot_x': plt_regions['Edmont_cumulative'].iloc[-1],
               'annot_y': plt_regions_dt['Edmont_dtime'].iloc[-1],
               'color': 'blue',
               'label': 'Edmonton'},
      }

# Set up the plot
fig, ax = plt.subplots(figsize=(8, 6))

# Create the line plots using a loop and the dictionary above
for rgn in ['alb', 'cal', 'edm']:
    ax.plot(fmt[rgn]['x_data'], fmt[rgn]['y_data'],
            c=fmt[rgn]['color'], label=fmt[rgn]['label'])

# Mark the last point and annotate it with its date
for rgn in ['alb', 'cal', 'edm']:
    ax.plot(fmt[rgn]['annot_x'], fmt[rgn]['annot_y'], 'o', c=fmt[rgn]['color'])
    ax.text(fmt[rgn]['annot_x'] - 60, fmt[rgn]['annot_y'] + 0.08, fmt[rgn]['last_date'],
            fontdict={'color': fmt[rgn]['color'], 'size': 8, 'weight': 'bold'})

# Fancy up the plot
ax.grid(which='both', linestyle=(0, (5, 3)), lw=0.5)
ax.legend(frameon=True, fancybox=True, shadow=True)
ax.set_ylabel('Doubling Time (Days)', fontdict={'size': 9, 'family': 'sans-serif', 'style': 'italic'})
ax.set_xlabel('Cumulative Case Count', fontdict={'size': 9, 'family': 'sans-serif', 'style': 'italic'})
title = ax.set_title("Alberta: Doubling Time by Cumulative Cases",
                     fontdict={'fontsize': 10, 'family': 'sans-serif', 'fontweight': 'bold'})
```

![png](docs/images/output_14_0.png)