https://github.com/sodascience/disease_database
Historical disease database (19th-20th century) for municipalities in the Netherlands
demography geospatial-data health history
- Host: GitHub
- URL: https://github.com/sodascience/disease_database
- Owner: sodascience
- License: mit
- Created: 2024-06-05T11:07:21.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-05-08T14:34:18.000Z (about 1 year ago)
- Last Synced: 2025-05-08T15:34:26.535Z (about 1 year ago)
- Topics: demography, geospatial-data, health, history
- Language: Python
- Homepage:
- Size: 11.4 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# Disease database
[Project status: active](https://www.repostatus.org/#active)
[Latest release](https://github.com/sodascience/disease_database/releases/latest)

Code to create a historical disease database (19th-20th century) for municipalities in the Netherlands.

_Cholera mention rates in the mid-1860s. [source code](src/analysis/create_map.R)_
This database was produced by:
- Harvesting >80 million Dutch newspaper texts in the period 1830-1940 from [Delpher](https://www.delpher.nl/).
- Finding mentions of locations and diseases in these texts via [hand-crafted regex](./raw_data/manual_input).
- Processing the results and creating a user-friendly historical disease database for the following diseases:
  - cholera, diphtheria, dysentery, influenza, malaria, measles, scarlet fever, smallpox, tuberculosis, and typhus.
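
The mention-finding step can be pictured with a small sketch. The two patterns below are hypothetical stand-ins, not the hand-crafted regexes from `raw_data/manual_input`, and the `mention_rate` helper and sample texts are invented for illustration. It also assumes that a mention rate is the share of texts naming a location that also name the disease, which may differ from the definition used in the actual pipeline.

```python
import re

# Hypothetical patterns for illustration only; the real hand-crafted
# regexes live in ./raw_data/manual_input and are likely more elaborate.
DISEASE_PATTERN = re.compile(r"\bcholera\b", re.IGNORECASE)
LOCATION_PATTERN = re.compile(r"\butrecht\b", re.IGNORECASE)


def mention_rate(texts: list[str]) -> float:
    """Share of texts mentioning the location that also mention the disease."""
    location_texts = [t for t in texts if LOCATION_PATTERN.search(t)]
    if not location_texts:
        # No text names the location, so no rate can be computed; return 0.0.
        return 0.0
    disease_hits = sum(1 for t in location_texts if DISEASE_PATTERN.search(t))
    return disease_hits / len(location_texts)


# Invented sample texts (not taken from Delpher).
sample_texts = [
    "Te Utrecht zijn gisteren drie gevallen van cholera aangegeven.",
    "De markt te Utrecht was druk bezocht.",
    "In Groningen heerscht de cholera nog steeds.",
]

rate = mention_rate(sample_texts)
print(f"{rate:.2f}")
```

Two of the sample texts mention Utrecht and one of those also mentions cholera, so the sketch reports a rate of one half for that location.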

[Download the database from the latest release page](https://github.com/sodascience/disease_database/releases/latest)

Other resources related to this database:
- [Polars](https://pola.rs) is the engine that powers this data processing pipeline, together with the [Apache Parquet format](https://parquet.apache.org/).
- [NLGIS](https://nlgis.nl) provides historical geographic data for mapping, plotting, and more.
- [Disease database viewer](https://github.com/sodascience/disease_database_viewer): an experimental R shiny app to interactively view the disease database.
- [Initial exploration into smoothing](https://erikjanvankesteren.nl/blog/smooth_disease) the mention rates within the disease database, using spatial, temporal, and spatiotemporal models.
## Installation
This project uses [pyproject.toml](pyproject.toml) to handle its dependencies. You can install them using pip like so:
```sh
pip install .
```
However, we recommend using [uv](https://github.com/astral-sh/uv) to manage the environment. First install uv, then clone or download this repo, and run:
```sh
uv sync
```
This will automatically install the right Python version, create a virtual environment, and install the required packages. If you choose not to use `uv`, you can replace `uv run` in the code examples in this repo with `python`.
> macOS note: if you encounter `error: command 'cmake' failed: No such file or directory`, you need to install [cmake](https://cmake.org/download/) first, e.g., through `brew install cmake`. Similarly, you may have to install `apache-arrow` separately as well (`brew install apache-arrow`). Once these dependency issues are solved, run `uv sync` one more time.
## Running the data processing pipeline
The full data processing pipeline looks like this:

Each of the separate processing steps (rectangles in the above image) has its own subfolder with its own readme documentation:
- Open archive processing in [`./src/process_open_archive/`](./src/process_open_archive/)
- Delpher API harvesting in [`./src/harvest_delpher_api/`](./src/harvest_delpher_api/)
- Final database creation in [`./src/create_database/`](./src/create_database/)
## Data analysis
For a basic analysis after the database has been created, take a look at the file [`src/analysis/query_db.py`](src/analysis/query_db.py).

For more in-depth analysis and usage scripts, take a look at our analysis repository: [disease_database_analysis](https://github.com/sodascience/disease_database_analysis).
## Contact
This project is developed and maintained by the [ODISSEI Social Data Science (SoDa)](https://odissei-soda.nl) team.
Do you have questions, suggestions, or remarks? File an issue in the issue tracker or feel free to contact the team at [`odissei-soda.nl`](https://odissei-soda.nl).