{"id":20864691,"url":"https://github.com/the-strategy-unit/nhp_data","last_synced_at":"2026-03-07T13:04:04.138Z","repository":{"id":262800200,"uuid":"801940217","full_name":"The-Strategy-Unit/nhp_data","owner":"The-Strategy-Unit","description":"Data processing for the New Hospital Programe (NHP) demand model","archived":false,"fork":false,"pushed_at":"2026-03-02T09:56:56.000Z","size":1450,"stargazers_count":1,"open_issues_count":13,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-02T13:34:18.129Z","etag":null,"topics":["new-hospital-programme","nhp-core","nhp-operational"],"latest_commit_sha":null,"homepage":"https://connect.strategyunitwm.nhs.uk/nhp/project_information/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/The-Strategy-Unit.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-05-17T08:02:08.000Z","updated_at":"2026-02-02T13:50:51.000Z","dependencies_parsed_at":"2026-01-06T07:11:30.701Z","dependency_job_id":null,"html_url":"https://github.com/The-Strategy-Unit/nhp_data","commit_stats":null,"previous_names":["the-strategy-unit/nhp_data"],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/The-Strategy-Unit/nhp_data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/The-Strategy-Unit%2Fnhp_data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/The-Strategy-Unit%2Fnhp_data/tags","relea
ses_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/The-Strategy-Unit%2Fnhp_data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/The-Strategy-Unit%2Fnhp_data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/The-Strategy-Unit","download_url":"https://codeload.github.com/The-Strategy-Unit/nhp_data/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/The-Strategy-Unit%2Fnhp_data/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30214618,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-07T12:15:00.571Z","status":"ssl_error","status_checked_at":"2026-03-07T12:15:00.217Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["new-hospital-programme","nhp-core","nhp-operational"],"created_at":"2024-11-18T05:43:45.568Z","updated_at":"2026-03-07T13:04:04.133Z","avatar_url":"https://github.com/The-Strategy-Unit.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NHP Data\r\n\r\nA comprehensive data processing pipeline for the New Hospital Programme (NHP) [model](https://github.com/the-strategy-unit/nhp_model).\r\nThis project orchestrates the extraction, transformation, and preparation of data required for the model.\r\n\r\nBuilt to work in pyspark on Databricks, with 
Hospital Episode Statistics (HES) data.\r\n\r\n* Admitted Patient Care (APC)\r\n* Outpatient Appointments (OPA)\r\n* Emergency Care Dataset (ECDS), and for historical trends, Accident and Emergency (AAE)\r\n* ONS Population Projections\r\n* NHS Reference Data\r\n\r\n## Architecture\r\n\r\nThe project uses Databricks Asset Bundles to manage deployment.\r\nAll processing is orchestrated through Databricks workflows that can run independently, or as part\r\nof the main pipeline.\r\n\r\n```mermaid\r\ngraph TD\r\n    ref[Reference Data]\r\n    inputs[Inputs Data]\r\n    ecds[ECDS Data]\r\n    ip[Inpatient Data] \r\n    op[Outpatient Data]\r\n    model[Model Data Extraction]\r\n\r\n    ref --\u003e ecds\r\n    ref --\u003e ip\r\n    ref --\u003e op\r\n    \r\n    ecds --\u003e inputs\r\n    ip --\u003e inputs\r\n    op --\u003e inputs\r\n    \r\n    inputs --\u003e model\r\n```\r\n\r\nThe workflows are built into a Python package, with all of the code in the `src/` folder.\r\nEach task in the workflows is defined as an entry point in `pyproject.toml`, and by convention is\r\na `main()` function which takes no arguments (parameters are passed in via `sys.argv`).\r\n\r\n## Getting Started\r\n\r\n### Prerequisites\r\n\r\n* Access to a Databricks workspace\r\n* Appropriate permissions for access to the data\r\n* Python 3.11+\r\n* uv\r\n\r\n### Installation\r\n\r\nThe project is packaged as a Python wheel and deployed via Databricks bundles:\r\n\r\n``` sh\r\n# Build the package\r\nuv build\r\n\r\n# Deploy to development\r\ndatabricks bundle deploy --target dev\r\n```\r\nDeployment to the `prod` target is via GitHub Actions, and should not be done manually.\r\n\r\n## Running Workflows\r\n\r\n### Run the complete data pipeline:\r\n\r\n``` sh\r\ndatabricks jobs run --job-name \"Generate NHP Data\"\r\n```\r\n\r\n### Run individual components:\r\n\r\n``` sh\r\n# Process reference data only\r\ndatabricks jobs run --job-name \"Generate NHP Data (Reference Data)\"\r\n\r\n# Process 
emergency care data\r\ndatabricks jobs run --job-name \"Generate NHP Data (AAE/ECDS)\"\r\n\r\n# Extract data for modeling containers\r\ndatabricks jobs run --job-name \"Extract NHP for containers\"\r\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthe-strategy-unit%2Fnhp_data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthe-strategy-unit%2Fnhp_data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthe-strategy-unit%2Fnhp_data/lists"}