{"id":18897751,"url":"https://github.com/uk-ipop/open-data-pipeline","last_synced_at":"2026-05-25T11:01:35.274Z","repository":{"id":58968794,"uuid":"531128215","full_name":"UK-IPOP/open-data-pipeline","owner":"UK-IPOP","description":"A pipeline for processing, enhancing, and sharing open datasets.","archived":false,"fork":false,"pushed_at":"2026-05-12T18:52:07.000Z","size":56682,"stargazers_count":2,"open_issues_count":6,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-12T20:37:06.389Z","etag":null,"topics":["actions","automation","data","python"],"latest_commit_sha":null,"homepage":"https://uk-ipop.github.io/open-data-pipeline/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UK-IPOP.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-08-31T14:46:19.000Z","updated_at":"2026-05-12T18:52:11.000Z","dependencies_parsed_at":"2026-02-28T06:05:45.272Z","dependency_job_id":null,"html_url":"https://github.com/UK-IPOP/open-data-pipeline","commit_stats":null,"previous_names":[],"tags_count":177,"template":false,"template_full_name":null,"purl":"pkg:github/UK-IPOP/open-data-pipeline","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UK-IPOP%2Fopen-data-pipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UK-IPOP%2Fopen-data-pipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UK-IP
OP%2Fopen-data-pipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UK-IPOP%2Fopen-data-pipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UK-IPOP","download_url":"https://codeload.github.com/UK-IPOP/open-data-pipeline/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UK-IPOP%2Fopen-data-pipeline/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33471530,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-25T06:32:55.349Z","status":"ssl_error","status_checked_at":"2026-05-25T06:32:35.322Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actions","automation","data","python"],"created_at":"2024-11-08T08:39:28.009Z","updated_at":"2026-05-25T11:01:35.228Z","avatar_url":"https://github.com/UK-IPOP.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Medical Examiner Open Data 
Pipeline\n\n[![Pipeline](https://github.com/UK-IPOP/open-data-pipeline/actions/workflows/pipeline.yml/badge.svg?branch=main)](https://github.com/UK-IPOP/open-data-pipeline/actions/workflows/pipeline.yml)\n[![Docs](https://github.com/UK-IPOP/open-data-pipeline/actions/workflows/pages/pages-build-deployment/badge.svg?branch=gh-pages)](https://github.com/UK-IPOP/open-data-pipeline/actions/workflows/pages/pages-build-deployment)\n\n\u003cimg src=\"https://github.com/UK-IPOP/open-data-pipeline/assets/45318637/c5c50811-f242-42d2-adfd-fa5563c1a89f\" alt=\"logo\" width=500 /\u003e\n\nThis repository contains the code for the Medical Examiner Open Data Pipeline.\n\nWe currently fetch data from the following sources:\n\n- [Cook County Medical Examiner's Archives](https://datacatalog.cookcountyil.gov/Public-Safety/Medical-Examiner-Case-Archive/cjeq-bs86)\n- [San Diego Medical Examiner's Office](https://data.sandiegocounty.gov/Safety/Medical-Examiner-Cases/jkvb-n4p7)\n- [Milwaukee County Medical Examiner's Office](https://county.milwaukee.gov/EN/Medical-Examiner)\n- [Connecticut (State) Accidental Drug Deaths](https://data.ct.gov/Health-and-Human-Services/Accidental-Drug-Related-Deaths-2012-2022/rybz-nyjw/about_data)\n- [Santa Clara County Medical Examiner's Office](https://data.sccgov.org/Health/Medical-Examiner-Coroner-Full-dataset/s3fb-yrjp/about_data)\n- [Sacramento County Medical Examiner's Office](https://sacramentocounty.maps.arcgis.com/apps/dashboards/0661fb44435b4611bf52be84708c4591)\n- [Pima County Medical Examiner's Office](https://www.google.com/url?sa=t\u0026source=web\u0026rct=j\u0026opi=89978449\u0026url=https://www.pima.gov/212/Medical-Examiner\u0026ved=2ahUKEwidg83ljdqJAxWYwskDHdbRE4YQFnoECDkQAQ\u0026usg=AOvVaw2T_hdJ3x-pqh07VFa9n6B8)\n  - This source is a manual data dump in collaboration with the Pima County ME/C Office. 
Data is refreshed monthly.\n- Cuyahoga County Medical Examiner's Office\n  - This source is a manual data dump in collaboration with the Cuyahoga County ME/C Office. Data is refreshed monthly.\n\nThe resulting data is used in other analyses here on GitHub:\n\n- [Cook County](https://github.com/UK-IPOP/cook-county-analysis)\n  - Where we add geospatial data to the Cook County data\n    - This was excluded from this automated pipeline because its data requirements apply only to Cook County\n\n\n\u003e **NOTICE**: We have removed the Milwaukee data because their site was temporarily taken down. Use [older](https://github.com/UK-IPOP/open-data-pipeline/releases/tag/2025-12-01)\n\u003e data if you require that jurisdiction.\n\u003e\n\u003e **UPDATE: 2026/03/18** - They have since re-published their site but are now on PowerBI. We are reaching out to them about direct data sharing or enabling the data download feature on the dashboard.\n\n## Getting Started\n\nThis repo exists mainly to take advantage of GitHub Actions for automation.\n\nThe Actions workflow is located in `.github/workflows/pipeline.yml` and is triggered weekly or manually.\n\nThis workflow fetches data from the data sources configured in `config.json`, \ngeocodes addresses (when available) using ArcGIS, extracts drugs using the drug extraction [toolbox](https://github.com/UK-IPOP/drug-extraction)\nand then compiles, zips, and publishes the results to the GitHub Releases page.\n\nThe data is then available for download from the [releases page](https://github.com/UK-IPOP/open-data-pipeline/releases).\n\nUnder the hood, the workflow runs a series of commands using the CLI application `opendata-pipeline`, which is located in the `src` directory.\n\nThe CLI is also available via a docker image hosted on [ghcr.io](https://github.com/UK-IPOP/open-data-pipeline/pkgs/container/opendata-pipeline). 
The\nbenefit of using the CLI via a docker image is that you don't have to have Python or the drug toolbox on your local machine 🙂.\n\nWe utilize async methods to speed up the large number of web requests we make to the data sources.\n\n\u003e It is important to regularly fetch/pull from this repo to maintain an updated `config.json`\n\nUnfortunately, we do not currently guarantee Windows support. If you want to help make that a reality, please submit a new [Pull Request](https://github.com/UK-IPOP/open-data-pipeline/pulls).\n\nThere is further API documentation available on the GitHub Pages [website](https://uk-ipop.github.io/open-data-pipeline/) for this repo if you want to interact with the CLI.\nI recommend using the docker image, as it is easier to use, and always referring to the CLI `--help` for more information.\n\n**NOTE:** The Census has recently made changes that make it harder to download files when running on servers, so if you add\na location to the configuration, make sure its corresponding Census tract file is downloaded and placed into the `data/spatial` folder. You can do this by running the following command:\n\n`wget -P data/spatial \u003cURL\u003e` where `\u003cURL\u003e` is the URL of the TIGER TRACT zip file, for example: https://www2.census.gov/geo/tiger/TIGER2024/TRACT/tl_2024_09_tract.zip\n\nThe general URL pattern is: https://www2.census.gov/geo/tiger/TIGER2024/TRACT/tl_2024_\u003cSTATE_FIPS_CODE\u003e_tract.zip\n\n### Workflow\n\nThe workflow can best be described by looking at the `pipeline.yml` file.\n\n\u003cimg width=\"1104\" alt=\"CleanShot 2023-01-18 at 10 38 29@2x\" src=\"https://user-images.githubusercontent.com/45318637/213240766-b9b26d7d-0a5a-409b-b363-be487b55a57f.png\"\u003e\n\n## Data Enhancements\n\nThe following table shows the fields that we **add** to the original datafiles:\n\n| Column Name  | Description     |\n| :------ | :------ |\n| `CaseIdentifier` | A *unique* identifier *across all* the datasets. 
|\n| `death_day` | Day of the month on which death occurred  |\n| `death_month`            | Name of the month in which death occurred  |\n| `death_month_num`        | Number of the month in which death occurred  |\n| `death_year`             | Year in which death occurred  |\n| `death_day_of_week`      | Day of the week on which death occurred, starting with 0 on Monday. Weekends are 5 (Saturday) \u0026 6 (Sunday). |\n| `death_day_is_weekend`   | Whether death occurred on a weekend day  |\n| `death_day_week_of_year` | Week of the year (of 52) in which death occurred |\n| `geocoded_latitude` | Geocoded latitude. |\n| `geocoded_longitude` | Geocoded longitude. |\n| `geocoded_score` | Confidence of geocoding. 70-100. |\n| `geocoded_address`| The address that the geocoded results correspond to. Not the address provided to the geocoder. |\n\n\n### Drug Columns\n\nIn addition to providing the extracted drugs as a separate file in each release, we also convert this data to wide-form for each dataset. This adds columns that follow these patterns:\n\n| Column Name/Pattern | Description |\n| :--- | :--- |\n| `*_1` | `*` drug found in first search column provided in drug configuration |\n| `*_2` | `*` drug found in second search column provided in drug configuration |\n| `*_meta` | Drug of `*` category/class found in this record across _any_ search column. |\n\n\n## Requirements\n\n- `uv`\n\n## Installation\n\nTo install the Python CLI, I recommend using [uv](https://github.com/astral-sh/uv).\n\n```bash\nuvx opendata-pipeline\n```\n\nTo install the docker image, you can use the following command:\n\n```bash\ndocker pull ghcr.io/uk-ipop/opendata-pipeline:latest\n```\n\n## Usage\n\nUsage is very similar to any other command line application. The most important thing is to follow the workflow defined above.\n\n## Contributing\n\nPull requests are welcome. 
For major changes, please open an issue first to discuss what you would like to change.\n\nHelp me write some tests!\n\n## License\n\n[MIT](https://choosealicense.com/licenses/mit/)\n\n## BibTex Citation\n\nIf you use this software or the enhanced data, please cite this repository:\n\n```\n@software{Anthony_Medical_Examiner_OpenData_2022,\n  author = {Anthony, Nicholas},\n  month = {9},\n  title = {{Medical Examiner OpenData Pipeline}},\n  url = {https://github.com/UK-IPOP/open-data-pipeline},\n  version = {0.2.1},\n  year = {2022}\n}\n```\n\nThank you.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuk-ipop%2Fopen-data-pipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fuk-ipop%2Fopen-data-pipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuk-ipop%2Fopen-data-pipeline/lists"}