https://github.com/aehrc/pathling
Tools that make it easier to use FHIR and clinical terminology within data analytics, built on Apache Spark.
analytics fhir spark standards terminology
- Host: GitHub
- URL: https://github.com/aehrc/pathling
- Owner: aehrc
- License: apache-2.0
- Created: 2020-03-23T05:22:00.000Z (about 6 years ago)
- Default Branch: main
- Last Pushed: 2026-02-13T10:03:02.000Z (4 months ago)
- Last Synced: 2026-02-13T10:16:07.896Z (4 months ago)
- Topics: analytics, fhir, spark, standards, terminology
- Language: Java
- Homepage: https://pathling.csiro.au
- Size: 206 MB
- Stars: 118
- Watchers: 11
- Forks: 19
- Open Issues: 88
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
README

Pathling is a set of tools that make it easier to
use [FHIR®](https://hl7.org/fhir) and clinical terminology within health
data analytics. It is built on [Apache Spark](https://spark.apache.org), and
it implements the [SQL on FHIR](https://sql-on-fhir.org) view specification and
the [Bulk Data Access](https://hl7.org/fhir/uv/bulkdata/STU2/) implementation
guide.
[**Read the documentation →**](https://pathling.csiro.au/docs)
## What can it do?
### Query and transformation of FHIR data
[FHIR R4](https://hl7.org/fhir) is the dominant standard for exchanging health
data. It comes in both [JSON](https://hl7.org/fhir/json.html)
and [XML](https://hl7.org/fhir/xml.html) formats, and can contain over 140
different types of resources, such
as [Patient](https://hl7.org/fhir/patient.html),
[Observation](https://hl7.org/fhir/observation.html),
[Condition](https://hl7.org/fhir/condition.html),
[Procedure](https://hl7.org/fhir/procedure.html), and many more.
Pathling is capable of reading all the different types of FHIR resources into a
format suitable for data analysis tasks. This makes the following things
possible:
- Creating SQL-friendly views from FHIR data
- Transforming data into other formats, such as CSV
or [Parquet](https://parquet.apache.org/)
- Performing terminology queries against coded fields within the FHIR data
See [Data in and out](https://pathling.csiro.au/docs/libraries/io) and
[Running queries](https://pathling.csiro.au/docs/libraries/running-queries) for
more information.
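As a rough illustration of what a "SQL-friendly view" over FHIR data means, the sketch below flattens NDJSON resources into rows and writes them as CSV. It uses only the Python standard library and is not Pathling's API; Pathling performs this kind of projection on Apache Spark using full FHIRPath. The `resolve`, `run_view` and `to_csv` helpers, the sample resources, and the restriction to simple dotted paths are all assumptions made for this example. The `VIEW` dictionary follows the general shape of a SQL on FHIR view definition (a resource type plus named columns with paths).

```python
import csv
import io
import json

# A dictionary shaped like a SQL on FHIR view definition:
# a target resource type plus named columns, each with a path.
VIEW = {
    "resource": "Patient",
    "select": [
        {
            "column": [
                {"name": "id", "path": "id"},
                {"name": "gender", "path": "gender"},
                {"name": "family_name", "path": "name.family"},
            ]
        }
    ],
}


def resolve(node, path):
    """Follow a dotted path, taking the first element of any list.

    A deliberately tiny stand-in for FHIRPath evaluation (hypothetical helper).
    """
    for part in path.split("."):
        if isinstance(node, list):
            node = node[0] if node else None
        if not isinstance(node, dict):
            return None
        node = node.get(part)
    if isinstance(node, list):
        node = node[0] if node else None
    return node


def run_view(view, ndjson_text):
    """Yield one flat dict per resource of the view's type in the NDJSON text."""
    columns = [c for s in view["select"] for c in s["column"]]
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        resource = json.loads(line)
        if resource.get("resourceType") != view["resource"]:
            continue
        yield {c["name"]: resolve(resource, c["path"]) for c in columns}


def to_csv(rows, fieldnames):
    """Serialise flat rows to CSV text; missing values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hand-written sample data, one FHIR resource per line (NDJSON).
NDJSON = "\n".join([
    json.dumps({"resourceType": "Patient", "id": "p1", "gender": "female",
                "name": [{"family": "Nguyen", "given": ["An"]}]}),
    json.dumps({"resourceType": "Observation", "id": "o1", "status": "final"}),
    json.dumps({"resourceType": "Patient", "id": "p2", "gender": "male"}),
])

rows = list(run_view(VIEW, NDJSON))
print(to_csv(rows, ["id", "gender", "family_name"]))
```

The Observation line is skipped because the view targets Patient, and the second patient gets an empty `family_name` cell because it has no `name` element.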
### Terminology queries
Health data often contains codes from systems such
as [SNOMED CT](https://www.snomed.org/snomed-ct/five-step-briefing), [LOINC](https://loinc.org/)
or [ICD](https://www.who.int/standards/classifications/classification-of-diseases).
These codes contain a great deal of information about diagnoses, procedures,
observations and many other aspects of a patient's clinical record.
It is common to group these codes based upon their properties, relationships to
other codes, or membership within a pre-defined set. Pathling can automate the
task of calling out to
a [FHIR terminology server](https://hl7.org/fhir/terminology-service.html) to
ask questions about the codes within your data.
Examples of the types of questions that can be answered include:
- Is this SNOMED CT procedure code a type of endoscopy?
- Does this LOINC test result code have an analyte of bilirubin?
- Is this ICD-10 code in the pre-defined list of codes used by my cohort
definition?
See [Terminology functions](https://pathling.csiro.au/docs/libraries/terminology)
for more information.
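To make the first question above concrete, the sketch below builds the kind of request a client sends to a FHIR terminology server's `$subsumes` operation and interprets the `outcome` parameter of the response. It does not use Pathling's terminology functions, which issue such calls on your behalf. The base URL `https://tx.example.org/fhir` is hypothetical, the `subsumes_url` and `is_a` helpers are names invented for this example, the sample response is hand-written rather than captured from a server, and the SNOMED CT codes are given for illustration and should be checked against your own edition.

```python
from urllib.parse import urlencode

SNOMED = "http://snomed.info/sct"


def subsumes_url(base_url, system, code_a, code_b):
    """Build a GET URL for the FHIR CodeSystem/$subsumes operation."""
    query = urlencode({"system": system, "codeA": code_a, "codeB": code_b})
    return f"{base_url.rstrip('/')}/CodeSystem/$subsumes?{query}"


def is_a(parameters):
    """Return True if codeB is the same as, or a kind of, codeA.

    `parameters` is the parsed FHIR Parameters resource returned by
    $subsumes. Its `outcome` is one of: equivalent, subsumes,
    subsumed-by, not-subsumed.
    """
    for parameter in parameters.get("parameter", []):
        if parameter.get("name") == "outcome":
            return parameter.get("valueCode") in ("equivalent", "subsumes")
    return False


# Illustrative codes: endoscopy as the ancestor, colonoscopy as the candidate.
ENDOSCOPY = "423827005"
COLONOSCOPY = "73761001"

# Hypothetical terminology server base URL.
url = subsumes_url("https://tx.example.org/fhir", SNOMED, ENDOSCOPY, COLONOSCOPY)
print(url)

# Hand-written example of the response shape, not real server output.
sample_response = {
    "resourceType": "Parameters",
    "parameter": [{"name": "outcome", "valueCode": "subsumes"}],
}
print(is_a(sample_response))
```

Asking the same question for every coded value in a large dataset means many such calls, which is the repetitive work the terminology functions are meant to automate.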
### Server
Pathling Server is a FHIR R4 analytics server that exposes a range of
functionality for use by applications and implements:
- **SQL on FHIR**:
[Run](https://pathling.csiro.au/docs/server/operations/view-run) view
definitions to preview tabular projections of FHIR data,
then [export](https://pathling.csiro.au/docs/server/operations/view-export) to
NDJSON, CSV, or Parquet
- **Bulk Data Access**:
[Export data](https://pathling.csiro.au/docs/server/operations/export) at
system, patient, or group level using the FHIR Bulk Data Access specification
- **Bulk Import**:
[Import data](https://pathling.csiro.au/docs/server/operations/import)
from NDJSON, Parquet, or Delta Lake sources,
or [sync with another FHIR server](https://pathling.csiro.au/docs/server/deployment/synchronization)
that supports bulk export
- **[Bulk Submit](https://pathling.csiro.au/docs/server/operations/bulk-submit)**:
An experimental implementation of the new Bulk Submit proposal
- **[FHIRPath Search](https://pathling.csiro.au/docs/server/operations/search)**:
Query resources using FHIRPath expressions
- **[CRUD Operations](https://pathling.csiro.au/docs/server/operations/crud)**:
Create, read, update, and delete resources
The server is distributed as a Docker image. It
supports [authentication](https://pathling.csiro.au/docs/server/authorization)
and can also be scaled across a cluster
on [Kubernetes](https://pathling.csiro.au/docs/server/deployment/kubernetes) or
other Apache Spark clustering solutions.
See [Server](https://pathling.csiro.au/docs/server) for installation and
configuration details.
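For the Bulk Data Access operation listed above, the sketch below builds the kick-off request defined by the FHIR Bulk Data Access implementation guide at system, patient, or group level. It constructs the request without sending it. The base URL `https://pathling.example.org/fhir` and the group identifier are hypothetical, and `export_request` is a helper invented for this example. Consult the server documentation for the parameters a given Pathling deployment actually accepts.

```python
from urllib.parse import urlencode
from urllib.request import Request


def export_request(base_url, level="system", group_id=None, types=None, since=None):
    """Build (but do not send) a Bulk Data Access kick-off request."""
    base = base_url.rstrip("/")
    if level == "system":
        path = "/$export"
    elif level == "patient":
        path = "/Patient/$export"
    elif level == "group":
        if not group_id:
            raise ValueError("group-level export needs a group_id")
        path = f"/Group/{group_id}/$export"
    else:
        raise ValueError(f"unknown export level: {level}")

    params = {}
    if types:
        # _type restricts the export to the listed resource types.
        params["_type"] = ",".join(types)
    if since:
        # _since limits the export to resources updated after this instant.
        params["_since"] = since

    url = base + path
    if params:
        url += "?" + urlencode(params, safe=",")

    # The kick-off is asynchronous: the client asks for FHIR JSON and
    # signals that it will poll for completion.
    headers = {"Accept": "application/fhir+json", "Prefer": "respond-async"}
    return Request(url, headers=headers, method="GET")


# Hypothetical server base URL and group identifier.
BASE = "https://pathling.example.org/fhir"
system_req = export_request(BASE, types=["Patient", "Observation"])
group_req = export_request(BASE, level="group", group_id="cohort-1")

print(system_req.get_method(), system_req.full_url)
print(group_req.get_method(), group_req.full_url)
```

Under the implementation guide, a server that accepts the request replies with `202 Accepted` and a `Content-Location` header pointing at a status endpoint, which the client polls until the export files are ready.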
## Artifact signing
Published Maven artifacts and Helm charts are signed with the following GPG key:
- **Key ID**: `ED48678D`
- **Fingerprint**: `F814 751C 64B5 F5E7 08A8 C73F C3C6 291F ED48 678D`
- **User ID**: `Pathling Developers `
The public key is available
on [keys.openpgp.org](https://keys.openpgp.org/search?q=F814751C64B5F5E708A8C73FC3C6291FED48678D).
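The key ID listed above is the short form of the fingerprint: for OpenPGP version 4 keys, the short key ID is the last eight hexadecimal digits of the 40-digit fingerprint. The sketch below checks that relationship for the values given here. `short_key_id` is a helper invented for this example, and the check is no substitute for verifying signatures with an OpenPGP tool.

```python
def short_key_id(fingerprint):
    """Return the short key ID (last 8 hex digits) of a v4 fingerprint."""
    hex_digits = "".join(fingerprint.split()).upper()
    if len(hex_digits) != 40 or any(c not in "0123456789ABCDEF" for c in hex_digits):
        raise ValueError("expected a 40-digit hexadecimal fingerprint")
    return hex_digits[-8:]


# Values copied from the list above.
FINGERPRINT = "F814 751C 64B5 F5E7 08A8 C73F C3C6 291F ED48 678D"
KEY_ID = "ED48678D"

print(short_key_id(FINGERPRINT) == KEY_ID)
```

Short key IDs are not collision-resistant, so when importing the key it is safer to compare the full fingerprint than the key ID alone.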
## Licensing and attribution
Pathling is copyright © 2018-2025, Commonwealth Scientific and Industrial
Research Organisation
(CSIRO) ABN 41 687 119 230. Licensed under
the [Apache License, version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
This means that you are free to use, modify and redistribute the software as
you wish, even for commercial purposes.
If you use this software in your research, please consider citing our paper,
[Pathling: analytics on FHIR](https://doi.org/10.1186/s13326-022-00277-1).
**Pathling is experimental software; use it at your own risk!** You can get a
full description of the current set of known issues
[here](https://github.com/aehrc/pathling/issues).