https://github.com/the-strategy-unit/nhp_data

Data processing for the New Hospital Programme (NHP) demand model
Host: GitHub
URL: https://github.com/the-strategy-unit/nhp_data
Owner: The-Strategy-Unit
License: MIT
Created: 2024-05-17T08:02:08.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2026-03-02T09:56:56.000Z (4 months ago)
Last Synced: 2026-03-02T13:34:18.129Z (4 months ago)
Topics: new-hospital-programme, nhp-core, nhp-operational
Language: Python
Homepage: https://connect.strategyunitwm.nhs.uk/nhp/project_information/
Size: 1.38 MB
Stars: 1
Watchers: 1
Forks: 2
Open Issues: 13
Metadata Files:
- Readme: readme.md
- License: LICENSE
- Codeowners: CODEOWNERS

# NHP Data

A comprehensive data processing pipeline for the New Hospital Programme (NHP) [model](https://github.com/the-strategy-unit/nhp_model).
This project orchestrates the extraction, transformation, and preparation of data required for the model.

Built to run in PySpark on Databricks, using Hospital Episode Statistics (HES) data and supporting datasets:

* Admitted Patient Care (APC)
* Outpatient Appointments (OPA)
* Emergency Care Dataset (ECDS), and for historical trends, Accident and Emergency (AAE)
* ONS Population Projections
* NHS Reference Data

## Architecture

The project uses Databricks Asset Bundles to manage deployment.
All processing is orchestrated through Databricks workflows that can run independently, or as part
of the main pipeline.

```mermaid
graph TD
ref[Reference Data]
inputs[Inputs Data]
ecds[ECDS Data]
ip[Inpatient Data]
op[Outpatient Data]
model[Model Data Extraction]

ref --> ecds
ref --> ip
ref --> op

ecds --> inputs
ip --> inputs
op --> inputs

inputs --> model
```
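
The dependency graph above implies a run order for the full pipeline. As a minimal sketch (not the project's own code: the node names are the short ids from the diagram, and using the standard-library `graphlib` module is an assumption made for illustration), a valid order can be derived with a topological sort:

```python
from graphlib import TopologicalSorter

# Each key maps to the set of workflows it depends on (edges from the diagram).
# The short names (ref, ecds, ip, op, inputs, model) are the diagram's node ids.
DEPENDENCIES = {
    "ref": set(),
    "ecds": {"ref"},
    "ip": {"ref"},
    "op": {"ref"},
    "inputs": {"ecds", "ip", "op"},
    "model": {"inputs"},
}


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which every workflow follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())
```

Calling `run_order(DEPENDENCIES)` always places `ref` first and `model` last; the relative order of `ecds`, `ip` and `op` is not fixed, which reflects that those three workflows do not depend on each other.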

The workflows are built into a Python package, with all of the code in the `src/` folder.
Each workflow task is defined as an entry point in `pyproject.toml`. By convention, each entry point is
a `main()` function that takes no arguments; parameters are passed in via `sys.argv`.
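
As a rough sketch of that convention, a task module might look like the following. The parameter names (`data_version`, `save_path`) and the helper `parse_args` are hypothetical and not taken from the project; only the argument-free `main()` and the use of `sys.argv` come from the description above.

```python
import sys


def parse_args(argv: list[str]) -> dict[str, str]:
    """Map positional command-line parameters to names (hypothetical parameters)."""
    if len(argv) != 3:
        prog = argv[0] if argv else "task"
        raise SystemExit(f"usage: {prog} <data_version> <save_path>")
    return {"data_version": argv[1], "save_path": argv[2]}


def main() -> None:
    """Entry point: takes no arguments and reads its parameters from sys.argv."""
    params = parse_args(sys.argv)
    # A real task would start its Spark processing here; this sketch only echoes.
    print(f"processing version {params['data_version']} into {params['save_path']}")
```

Keeping the parsing in a separate function is a design choice of this sketch: it lets the argument handling be checked without touching `sys.argv`, while `main()` keeps the no-argument signature that an entry point calls.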

## Getting Started

### Prerequisites

* Access to Databricks workspace
* Appropriate permissions for access to the data
* Python 3.11+
* uv

### Installation

The project is packaged as a Python wheel and deployed via Databricks bundles:

``` sh
# Build the package
uv build

# Deploy to development
databricks bundle deploy --target dev
```

Deployment to the `prod` target is handled by GitHub Actions and should not be done manually.

## Running Workflows

### Run the complete data pipeline

``` sh
databricks jobs run --job-name "Generate NHP Data"
```

### Run individual components

``` sh
# Process reference data only
databricks jobs run --job-name "Generate NHP Data (Reference Data)"

# Process emergency care data
databricks jobs run --job-name "Generate NHP Data (AAE/ECDS)"

# Extract data for modeling containers
databricks jobs run --job-name "Extract NHP for containers"
```
