{"id":22641115,"url":"https://github.com/alexanderkiel/fhir2csv","last_synced_at":"2026-01-07T16:41:51.040Z","repository":{"id":141113908,"uuid":"267879394","full_name":"alexanderkiel/fhir2csv","owner":"alexanderkiel","description":"Extract Tabular Data from FHIR Resources","archived":false,"fork":false,"pushed_at":"2020-05-29T19:26:43.000Z","size":6,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-03T15:55:29.750Z","etag":null,"topics":["csv","fhir","jq"],"latest_commit_sha":null,"homepage":null,"language":"JSONiq","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alexanderkiel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-29T14:33:32.000Z","updated_at":"2023-12-15T20:49:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"16fe9601-9695-43c7-ad9f-b6ff528ad06e","html_url":"https://github.com/alexanderkiel/fhir2csv","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexanderkiel%2Ffhir2csv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexanderkiel%2Ffhir2csv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexanderkiel%2Ffhir2csv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexanderkiel%2Ffhir2csv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alexanderkiel","download_url":"https://codeloa
d.github.com/alexanderkiel/fhir2csv/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246143868,"owners_count":20730341,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","fhir","jq"],"created_at":"2024-12-09T04:17:14.195Z","updated_at":"2026-01-07T16:41:51.001Z","avatar_url":"https://github.com/alexanderkiel.png","language":"JSONiq","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Extract Tabular Data from FHIR Resources\n\n## Background\n\nFHIR resources represent medical data in hierarchical form. However, data scientists work with tabular data. Currently a number of projects try to solve this impedance mismatch. One of these is [Fhir2Tables][1] by Thomas Peschel.\n\n## Other Approaches\n\nFhir2Tables uses the R package [fhirR][2]. This package provides functionality to issue FHIR search requests to a FHIR server and convert the resulting bundles into R data frames. The conversion is done by specifying XPath expressions for each column of the data frame.\n\nIMHO this is a very good fit for data scientists who work directly in R. However, in other scenarios, e.g. automated ETL processes, maintaining a working R installation that performs the conversion correctly might be unnecessary overhead. 
Another limitation of fhirR is that it needs the resources in XML format because it uses XPath.\n\n## My Proposal\n\nWhile exploring solutions which don't require an R installation and work with JSON, I settled on the command line tool [jq][3], which is available on all platforms and well established in the world of JSON. With jq, specific values can be extracted from FHIR resources, and it can even output CSV directly.\n\nThe jq filters I show here convert one FHIR resource into one line of CSV data. If you clone this repository, you can run the following, assuming you have jq installed:\n\n```sh\ncat blood-pressure-observation.json | jq -rf blood-pressure.jq\n```\n\nThe output should be:\n\n```\n\"402\",\"409\",12,11,\"2019-09-18T15:20:28+03:00\"\n```\n\nIn addition to a single resource, the same filter can also be used to process a stream of resources, which generates a stream of CSV data lines. One way to generate a stream of resources is to apply another jq filter to a bundle of resources. If you run the following:\n\n```sh\ncat blood-pressure-bundle.json | jq '.entry[]? 
| .resource' | jq -rf blood-pressure.jq\n```\n\nThe output should be:\n\n```\n\"402\",\"409\",12,11,\"2019-09-18T15:20:28+03:00\"\n\"402\",\"408\",12,11,\"2019-09-18T15:20:28+03:00\"\n\"402\",\"405\",12,11,\"2019-09-18T15:20:28+03:00\"\n\"402\",\"406\",12,11,\"2019-09-18T15:20:28+03:00\"\n\"402\",\"407\",12,11,\"2019-09-18T15:20:28+03:00\"\n\"402\",\"417\",,,\"2019-09-18T15:20:28+03:00\"\n\"402\",\"411\",12,11,\"2019-09-18T15:20:28+03:00\"\n\"402\",\"412\",12,11,\"2019-09-18T15:20:28+03:00\"\n\"1233\",\"1245\",80,120,\"2019-09-19T14:26:26-04:00\"\n\"402\",\"1193\",11,12,\"2019-09-19T17:17:57+03:00\"\n```\n\nStreams of resources can also be obtained from [FHIR Bulk Data Access][4], which should make jq a good fit for processing large FHIR data exports into CSV files.\n\n## JQ Filters in Detail\n\nIn our blood pressure example, the jq filter file is the following:\n\n```\n[\n\n# PID\n(.subject.reference | split(\"/\") | nth(1)) // null\n\n# OID\n, (.id) // null\n\n# DIA\n, first(.component[]?\n  | select(.code.coding[]? | [.system, .code] == [\"http://loinc.org\", \"8462-4\"])\n  | .valueQuantity.value\n) // null\n\n# SYS\n, first(.component[]?\n  | select(.code.coding[]? | [.system, .code] == [\"http://loinc.org\", \"8480-6\"])\n  | .valueQuantity.value\n) // null\n\n# DATE\n, (.effectiveDateTime) // null\n\n]\n| @csv\n```\n\nHere the basic shape is:\n\n```\n[ \u003ccolumn-filter-0\u003e, ..., \u003ccolumn-filter-n\u003e ] | @csv\n```\n\nwhere `[]` is an [array construction][5] for the CSV row and the `@csv` [syntax][6] actually outputs that array in CSV format. The [pipe][7] `|` operator combines these two filters. Within the array, a separate filter for each column is used.\n\nThe simplest filter, used for the OID (object identifier), is:\n\n```\n(.id) // null\n```\n\nwhere we select the Observation id property using the [object identifier filter][8] `.id`. 
We have to put that filter inside parentheses in order to allow the next filter to access the resource's root again. The [alternative operator][9] (`//`) followed by `null` is used to ensure that the column isn't omitted if there is no id property in the resource. You will find the pattern `(\u003creal-filter\u003e) // null` in all column filters, so I will not repeat it.\n\nThe next, more advanced filter, used for the PID (patient identifier), is:\n\n```\n.subject.reference | split(\"/\") | nth(1)\n```\n\nHere the object identifier filter `.subject.reference` is used to select the reference property inside the subject complex type. After that, [split][10] is used to separate the subject type `Patient` from its identifier, and [nth][11] is used to output the second part of the split, the identifier.\n\nThe last example of a filter, used for DIA (diastolic blood pressure), is:\n\n```\n.component[]?\n  | select(.code.coding[]? | [.system, .code] == [\"http://loinc.org\", \"8462-4\"])\n  | .valueQuantity.value\n```\n\nHere we first descend into the component complex type, which can have multiple values. That's the reason we use the [array/object value iterator][12] (`.component[]?`), which will output each value individually and doesn't error on a missing component complex type. After that we continue with the [select function][13], which will select the Observation component based on the following FHIR structure:\n\n```json\n{\n  \"code\": {\n    \"coding\": [\n      {\n        \"system\": \"http://loinc.org\",\n        \"code\": \"8462-4\",\n        \"display\": \"Diastolic blood pressure\"\n      }\n    ]\n  }\n}\n```\n\nHere `.code.coding[]?` descends into the Coding, followed by `[.system, .code]`, which extracts the system and code into an array. 
The array is then compared to the `[\"http://loinc.org\", \"8462-4\"]` array.\n\nAfter the appropriate Observation component is selected, the filter `.valueQuantity.value` extracts its quantity value.\n\n## Conclusion\n\nI have shown that it's possible to extract tabular data in CSV format from a stream of FHIR resources of one type using the widely available command line tool jq. Streams of FHIR resources can be generated from FHIR bundles using jq itself or be obtained by FHIR Bulk Data Access.\n\nA one-shot FHIR search to CSV solution is currently not possible with my solution, because FHIR search uses paging with one FHIR bundle per page. An additional tool would be necessary, which follows the page links, obtaining bundle after bundle and outputting a stream of FHIR resources. It's possible to build such functionality into [blazectl][14]. With that, a one-shot export of resources of a single type directly into a CSV file would look like this:\n\n```sh\nblazectl --server https://hapi.fhir.org/baseR4 search --type Observation --query 'code=http://loinc.org|85354-9' | jq -rf blood-pressure.jq \u003e blood-pressure.csv\n```\n\n[1]: \u003chttps://gitlab.com/TPeschel/fhir2tables\u003e\n[2]: \u003chttps://tpeschel.github.io/fhiR/\u003e\n[3]: \u003chttps://stedolan.github.io/jq/\u003e\n[4]: \u003chttps://hl7.org/fhir/uv/bulkdata/\u003e\n[5]: \u003chttps://stedolan.github.io/jq/manual/#TypesandValues\u003e\n[6]: \u003chttps://stedolan.github.io/jq/manual/#Formatstringsandescaping\u003e\n[7]: \u003chttps://stedolan.github.io/jq/manual/#Pipe:|\u003e\n[8]: \u003chttps://stedolan.github.io/jq/manual/#Basicfilters\u003e\n[9]: \u003chttps://stedolan.github.io/jq/manual/#ConditionalsandComparisons\u003e\n[10]: \u003chttps://stedolan.github.io/jq/manual/#split(str)\u003e\n[11]: \u003chttps://stedolan.github.io/jq/manual/#first,last,nth(n)\u003e\n[12]: \u003chttps://stedolan.github.io/jq/manual/#Array/ObjectValueIterator:.[]\u003e\n[13]: 
\u003chttps://stedolan.github.io/jq/manual/#select(boolean_expression)\u003e\n[14]: \u003chttps://github.com/samply/blazectl\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexanderkiel%2Ffhir2csv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falexanderkiel%2Ffhir2csv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexanderkiel%2Ffhir2csv/lists"}