https://github.com/kirbs-/covid-19-dataset


# covid-19-dataset
US county level COVID-19 case data.

Daily snapshots of US cases by county.

## County Data Status
| State | Scraper | Validator | Aggregator | Time Series |
|-------|---------|-----------|------------|-------------|
| AK | Y | N | N | N |
| AL | Y | N | N | N |
| CA | Y | N | N | N |
| CO | Y | N | N | N |
| DE | Y | N | N | N |
| FL | Y | N | N | N |
| GA | Y | N | N | N |
| IA | Y | N | N | N |
| KS | Y | N | N | N |
| KY | Y | N | N | N |
| LA | Y | N | N | N |
| MD | Y | N | N | N |
| ME | Y | N | N | N |
| MI | Y | N | N | N |
| MO | Y | N | N | N |
| MN | Y | N | N | N |
| MT | Y | N | N | N |
| NJ | Y | N | N | N |
| NY | Y | N | N | N |
| OH | Y | N | N | N |
| PA | Y | N | N | N |
| TN | Y | N | N | N |
| TX | Y | N | N | N |
| VA | Y | N | N | N |
| WA | Y | N | N | N |
| WY | Y | N | N | N |

## Project structure
```
/data # county level snapshots by scrape timestamp.
|
- {state}_by_county_{scraper_timestamp_in_EDT}.txt # snapshot of scraped results as of timestamp.
/source_page_backup # backup of source pages by scrape timestamp.
|
- {state}_county_{scrape_timestamp}.html # backup of source page. Extension depends on data source.
- main.ipynb # triggers crawler
- config.yaml # shared scraper configurations
- {state}_by_county.ipynb # state-specific scrapers
```

## Scraper Format
Scrapers are simple Python scripts or Jupyter notebooks that implement `fetch`, `save`, and `run` methods.
#### fetch()
_Returns_
- DataFrame containing positive cases by county.
- Source data - HTML page, etc.

`fetch` is responsible for getting a page and processing it into a pandas DataFrame. It must return two values: a DataFrame that contains `county` and `positive_cases` columns (additional columns are fine), and a string containing the data source that was scraped.
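
As a rough illustration of that contract, here is a minimal sketch of a `fetch` implementation. It is not taken from any of the state scrapers: the `SAMPLE_SOURCE` text, its `County`/`Cases` column names, and the optional `source` argument are assumptions made so the example runs offline. A real scraper would download and parse the state's actual page.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded source page. A real scraper would
# request this content from the state's data source instead.
SAMPLE_SOURCE = "County,Cases\nAdams,12\nBrown,3\n"


def fetch(source=SAMPLE_SOURCE):
    """Return (df, source): positive cases by county plus the raw source text."""
    raw = pd.read_csv(io.StringIO(source))
    # Normalize the source's column names to the names scrapers must return.
    df = raw.rename(columns={"County": "county", "Cases": "positive_cases"})
    missing = {"county", "positive_cases"} - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return df, source


df, source = fetch()
print(df)
```

Checking for the required columns inside `fetch` means a change in the source page's layout fails loudly at scrape time instead of producing an unusable snapshot.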

#### save(df, source)
_Params:_
- df (DataFrame): DataFrame containing `county` and `positive_cases` columns (additional columns are fine)
- source (str): string containing the data source page that was scraped.

`save` persists the DataFrame and the source data. `df` is saved as a pipe-delimited text file in the data directory, with the scraping timestamp in EDT in the file name.
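
The following sketch shows one way `save` could work under the file naming from the project structure above. Several details are assumptions rather than the project's actual code: the `STATE` constant, the `%Y%m%d_%H%M%S` timestamp format, the `.html` backup extension, the keyword-only `root` and `now` arguments (added so the function can be tested), and the returned paths. EDT is modeled as a fixed UTC-4 offset, which ignores the winter switch to EST.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

import pandas as pd

# Fixed UTC-4 offset standing in for EDT (assumption: no EST handling).
EDT = timezone(timedelta(hours=-4), "EDT")
# Hypothetical: each state-specific scraper would set its own code.
STATE = "oh"


def save(df, source, *, root=Path("."), now=None):
    """Write df as a pipe-delimited snapshot and back up the source page."""
    now = now or datetime.now(EDT)
    stamp = now.strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    data_path = Path(root) / "data" / f"{STATE}_by_county_{stamp}.txt"
    backup_path = Path(root) / "source_page_backup" / f"{STATE}_county_{stamp}.html"
    data_path.parent.mkdir(parents=True, exist_ok=True)
    backup_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(data_path, sep="|", index=False)  # pipe-delimited, no index column
    backup_path.write_text(source, encoding="utf-8")
    return data_path, backup_path
```

A snapshot written this way can be read back with `pd.read_csv(path, sep="|")`.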

#### run()
Handles `fetch` and `save` in one action. Used by the main crawling job.
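
A minimal sketch of how `run` could tie the two together. The `fetch` and `save` bodies here are placeholders invented for the example (the `saved` list stands in for files on disk so the sketch has no side effects). Only the shape of `run` is the point.

```python
import pandas as pd

# Stand-in for files on disk, so this sketch writes nothing.
saved = []


def fetch():
    """Placeholder fetch: returns a tiny DataFrame and a fake source page."""
    df = pd.DataFrame({"county": ["Adams"], "positive_cases": [12]})
    return df, "<html>hypothetical source page</html>"


def save(df, source):
    """Placeholder save: records what would have been persisted."""
    saved.append((df, source))


def run():
    """Fetch the current data and persist it in one step."""
    df, source = fetch()
    save(df, source)


run()
print(f"snapshots saved: {len(saved)}")
```

With this shape, the main crawling job only needs to call each scraper's `run()` and does not need to know how a given state's page is fetched or parsed.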