https://github.com/jakubvalenta/covid-berlin-scraper

Download Covid-19 data from the official sources of the city of Berlin.

berlin coronavirus covid-19 covid-data covid19-data germany


Host: GitHub
URL: https://github.com/jakubvalenta/covid-berlin-scraper
Owner: jakubvalenta
License: apache-2.0
Created: 2020-06-07T16:26:05.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2023-08-27T21:49:26.000Z (about 2 years ago)
Last Synced: 2025-01-30T11:11:35.693Z (9 months ago)
Topics: berlin, coronavirus, covid-19, covid-data, covid19-data, germany
Language: Python
Homepage:
Size: 6.97 MB
Stars: 2
Watchers: 4
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE


# Scraper for Covid-19 Data in Berlin

Download Covid-19 data from the official sources of the city of Berlin:

- [Pressemitteilungen der Senatsverwaltung für Gesundheit, Pflege und
Gleichstellung](https://www.berlin.de/sen/gpg/service/presse/2020/)
- [COVID-19 in Berlin, Verteilung in den
Bezirken](https://www.berlin.de/lageso/gesundheit/infektionsepidemiologie-infektionsschutz/corona/tabelle-bezirke/)
- [COVID-19 in
Berlin](https://www.berlin.de/corona/lagebericht/desktop/corona.html)
(dashboard).

See [covid-berlin-data](https://www.github.com/jakubvalenta/covid-berlin-data)
for the data itself (updated daily).

## Installation

### Mac

``` shell
$ brew install python
$ pip install poetry
$ make setup
```

### Arch Linux

``` shell
# pacman -S poetry
$ make setup
```

### Other systems

Install these dependencies manually:

- Python >= 3.8.1
- poetry

Then run:

``` shell
$ make setup
```
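Before running `make setup`, it can help to confirm that both prerequisites are present. The following is a minimal sketch and not part of the project: the function name `check_requirements` is hypothetical, and it assumes only that Poetry is available as an executable named `poetry` on `PATH`.

```python
import shutil
import sys

# Minimum Python version stated in the dependency list above.
MIN_PYTHON = (3, 8, 1)


def check_requirements(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems (empty if all is well).

    Hypothetical helper for illustration; not part of covid-berlin-scraper.
    """
    problems = []
    if tuple(version_info[:3]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:3])
        problems.append(f"Python >= 3.8.1 is required, found {found}")
    # Assumption: Poetry is installed as an executable named "poetry".
    if not which("poetry"):
        problems.append("poetry was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem, file=sys.stderr)
```

If the script prints nothing, both dependencies were found and `make setup` should be able to proceed.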

## Usage

This program works in several steps:

1. Download **press releases** from the current RSS feed and save their metadata
to a database in the passed cache directory:

``` shell
$ ./covid-berlin-scraper --cache my_cache_dir --verbose download-feed
```

2. Download the current **district table** (_Verteilung in den Bezirken_) and
save the data to a database in the passed cache directory:

``` shell
$ ./covid-berlin-scraper --cache my_cache_dir --verbose download-district-table
```

3. Download the current **dashboard** and save the data to a database in the
passed cache directory:

``` shell
$ ./covid-berlin-scraper --cache my_cache_dir --verbose download-dashboard
```

4. (Optional) Download press releases from the **press release archive** and
save their metadata to the same database:

``` shell
$ ./covid-berlin-scraper --cache my_cache_dir --verbose download-archives
```

5. **Parse** the content of all press releases, district tables and dashboards
stored in the database and generate a CSV output:

``` shell
$ ./covid-berlin-scraper --cache my_cache_dir --verbose parse-press-releases \
-o my_output.csv \
--output-hosp my_output_incl_hospitalized.csv
```

## Help

See all command line options:

``` shell
$ ./covid-berlin-scraper --help
```

## Development

### Installation

``` shell
$ make setup
```

### Testing and linting

``` shell
$ make test
$ make lint
```

### Help

``` shell
$ make help
```

## Contributing

__Feel free to remix this project__ under the terms of the [Apache License,
Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).
