{"id":21685865,"url":"https://github.com/sodascience/disease_database","last_synced_at":"2025-07-14T03:37:40.783Z","repository":{"id":258170198,"uuid":"810790200","full_name":"sodascience/disease_database","owner":"sodascience","description":"Historical disease database (19th-20th century) for municipalities in the Netherlands","archived":false,"fork":false,"pushed_at":"2025-05-08T14:34:18.000Z","size":11995,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-08T15:34:26.535Z","etag":null,"topics":["demography","geospatial-data","health","history"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sodascience.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-06-05T11:07:21.000Z","updated_at":"2025-04-03T12:08:39.000Z","dependencies_parsed_at":null,"dependency_job_id":"975c5d51-6775-4e7d-a617-614d224a5665","html_url":"https://github.com/sodascience/disease_database","commit_stats":null,"previous_names":["sodascience/disease_database"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/sodascience/disease_database","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fdisease_database","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fdisease_database/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fdisease_database/releases","manifests_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fdisease_database/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sodascience","download_url":"https://codeload.github.com/sodascience/disease_database/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fdisease_database/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261960502,"owners_count":23236573,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["demography","geospatial-data","health","history"],"created_at":"2024-11-25T16:23:26.868Z","updated_at":"2025-06-25T22:06:17.182Z","avatar_url":"https://github.com/sodascience.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Disease database \n[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)\n[![GitHub Release](https://img.shields.io/github/v/release/sodascience/disease_database?include_prereleases)](https://github.com/sodascience/disease_database/releases/latest)\n![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json) \n\n\nCode to create a historical disease database (19th-20th century) for municipalities in the Netherlands.\n\n![Cholera in the Netherlands](img/cholera_1864_1868.png)\n\u003csup\u003e_Cholera mention rates in the mid-1860s. 
[source code](src/analysis/create_map.R)_\u003c/sup\u003e\n\n\nThis database was produced by:\n- 🚜 Harvesting \u003e80 million Dutch newspaper texts in the period 1830-1940 from [Delpher](https://www.delpher.nl/).\n- 🔎 Finding mentions of locations and diseases in these texts via [hand-crafted regex](./raw_data/manual_input).\n- 💽 Processing the results and creating a user-friendly historical disease database for the following diseases:\n  - cholera, diphtheria, dysentery, influenza, malaria, measles, scarlet fever, smallpox, tuberculosis, and typhus.\n\n\n⏬ [Download the database from the latest release page](https://github.com/sodascience/disease_database/releases/latest) ⏬\n\n\nOther resources related to this database:\n\n- 🐻‍❄️ [Polars](https://pola.rs) is the engine that powers this data processing pipeline, together with the [Apache Parquet format](https://parquet.apache.org/).\n- 🌍 [NLGIS](https://nlgis.nl) provides historical geographic data for mapping, plotting, and more.\n- 🗺️ [Disease database viewer](https://github.com/sodascience/disease_database_viewer): an experimental R Shiny app to interactively view the disease database.\n- 🕵️‍♀️ [Initial exploration into smoothing](https://erikjanvankesteren.nl/blog/smooth_disease) the mention rates within the disease database, using spatial, temporal, and spatiotemporal models.\n\n## Installation\n\nThis project uses [pyproject.toml](pyproject.toml) to handle its dependencies. You can install them using pip like so:\n\n```sh\npip install .\n```\n\nHowever, we recommend using [uv](https://github.com/astral-sh/uv) to manage the environment. First, install uv, then clone / download this repo, then run:\n\n```sh\nuv sync\n```\n\nThis will automatically install the right Python version, create a virtual environment, and install the required packages. 
If you choose not to use `uv`, you can replace `uv run` in the code examples in this repo with `python`.\n\n\u003e 🍏 macOS note: if you encounter `error: command 'cmake' failed: No such file or directory`, you need to install [cmake](https://cmake.org/download/) first, e.g., through `brew install cmake`. Similarly, you may have to install `apache-arrow` separately as well (`brew install apache-arrow`). Once these dependency issues are solved, run `uv sync` one more time.\n\n## Running the data processing pipeline\n\nThe full data processing pipeline looks like this:\n\n![disease database flow](img/disease_database_flow.svg)\n\nEach of the separate processing steps (rectangles in the above image) has its own subfolder with its own readme documentation:\n- Open archive processing in [`./src/process_open_archive/`](./src/process_open_archive/)\n- Delpher API harvesting in [`./src/harvest_delpher_api/`](./src/harvest_delpher_api/)\n- Final database creation in [`./src/create_database/`](./src/create_database/)\n\n\n## Data analysis\n\nFor a basic analysis after the database has been created, take a look at the file [`src/analysis/query_db.py`](src/analysis/query_db.py). \n\n![](./img/two_diseases_three_cities.png)\n\nFor more in-depth analysis and usage scripts, take a look at our analysis repository: [disease_database_analysis](https://github.com/sodascience/disease_database_analysis).\n\n\n## Contact\nThis project is developed and maintained by the [ODISSEI Social Data\nScience (SoDa)](https://odissei-soda.nl) team.\n\nDo you have questions, suggestions, or remarks? 
File an issue in the\nissue tracker or feel free to contact the team at [`odissei-soda.nl`](https://odissei-soda.nl)\n\n\u003cimg src=\"https://odissei-soda.nl/images/logos/soda_logo.svg\" alt=\"SoDa logo\" width=\"250px\"/\u003e \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsodascience%2Fdisease_database","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsodascience%2Fdisease_database","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsodascience%2Fdisease_database/lists"}