https://github.com/alext234/coronavirus-stats
Automatically scrape data and statistics on Coronavirus to make them easily accessible in CSV format
https://github.com/alext234/coronavirus-stats
australia cdc china coronavirus covid-19 data-science europe health italy jupyter-notebook pipeline scraping-data singapore south-korea stats usa wuhan-virus
Last synced: 3 months ago
JSON representation
Automatically scrape data and statistics on Coronavirus to make them easily accessible in CSV format
- Host: GitHub
- URL: https://github.com/alext234/coronavirus-stats
- Owner: alext234
- License: apache-2.0
- Created: 2020-01-31T11:07:03.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-07-23T04:14:49.000Z (almost 3 years ago)
- Last Synced: 2023-12-16T15:54:26.528Z (over 2 years ago)
- Topics: australia, cdc, china, coronavirus, covid-19, data-science, europe, health, italy, jupyter-notebook, pipeline, scraping-data, singapore, south-korea, stats, usa, wuhan-virus
- Language: Jupyter Notebook
- Homepage:
- Size: 774 MB
- Stars: 47
- Watchers: 4
- Forks: 19
- Open Issues: 9
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README

### CSV Data on Coronavirus (COVID-19)
This repository contains data in (CSV format) which are scraped from reliable sources (e.g. World Health Organisation).
* Data are scraped a few times daily and pushed back to this repository together with generated charts (.PNG files).
* Look for those **CSV direct link** below to get the scraped historical data.
### Aggregate sites
#### [BNO News](https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/)
NOTE: paused on 1 Jan 2021
Below are international stats, excluding China.
[CSV direct link](data/bnonews-international.csv?raw=true)
Bar chart of the latest snapshot.

### WHO & Government sites
#### From WHO (World Health Organisation) Situation reports
NOTE: paused on 16 Jun 2020 due to format changes from WHO.
Data are scraped from [these reports](https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/) which are in PDF formats. New reports are released daily.
#### Globally confirmed cases
[CSV direct link](data/who-global-cases.csv?raw=true)

### Stats from Australia
NOTE: paused on 10 July 2020
Data is pulled from Department of Health [website](https://www.health.gov.au/news/coronavirus-update-at-a-glance).
#### Cases in Australia
[CSV direct link](data/australia-cases.csv?raw=true)

### Stats from Singapore
Data are scraped from the MOH (Ministry of Health) local situation [web page](https://www.moh.gov.sg/2019-ncov-wuhan).
#### Cases in Singapore
NOTE: paused on 27 Jun 2020.
[CSV direct link](data/singapore-cases.csv?raw=true)
### From US CDC (Centers for Disease Control and Prevention)
NOTE: paused on 27 Jun 2020 due to format changes from CDC.
#### Cases in the US (data are scraped from [here](https://www.cdc.gov/coronavirus/2019-ncov/cases-in-us.html))
* Till 18 Apr 2020:
[CSV direct link](data/cdc-us-cases.csv?raw=true)
* From 18 Apr 2020, the data format CDC website has been changed to include races and age groups.
[CSV direct link](data/cdc-us-cases-by-races-and-age-group.csv?raw=true)
* From 7 May 2020,
[CSV direct link](data/us-cdc-total-cases-deaths.csv?raw=true)
### Stats from China
[This page](https://ncov.dxy.cn/ncovh5/view/pneumonia) has the realtime stats from China. Data are pulled several times a day by the pipeline.
#### All cases in China
[CSV direct link](data/china-summary-cases.csv?raw=true)

### How it works
* Jupyter notebooks are used for scraping data and output to CSV files
* These notebooks are executed on a schedule by Github Actions pipeline to scrape new data
* This pipeline also commits back new data to this repository
### Development
* Tools: Python3, Jupyter, Pandas, BeautifulSoup and related stuff (e.g. Selenium for web-scraping).
It is recommended to start the development environment with this docker image, which is also used for the Github Actions build pipeline.
```
docker run -p 8888:8888 -it -v $PWD:/stats -w /stats alext234/datascience:latest bash
```
* [requirements.txt](requirements.txt) contains Python dependencies
```
pip install -r requirements.txt
```
* Start Jupyter notebook from inside the container and then visit the browser at `http://localhost:8888`
```
jupyter notebook --allow-root --ip=0.0.0.0
```
### Contributions
* Feel free to create new issues for any potential data source worth scraping.
* Pull requests are welcomed!
### Repo status and stats
* Stargazers

* Last update from pipeline

* Pipeline status
