{"id":22515129,"url":"https://github.com/rggh/exa-data","last_synced_at":"2026-04-12T10:38:34.921Z","repository":{"id":121736036,"uuid":"460084677","full_name":"RGGH/exa-data","owner":"RGGH","description":"ETL : FHIR sample data JSON to SQL","archived":false,"fork":false,"pushed_at":"2022-02-20T23:59:23.000Z","size":20342,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-28T01:55:39.437Z","etag":null,"topics":["docker-compose","etl","json","pandas","postgresql","psycopg2-binary","pytest","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RGGH.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-16T16:25:54.000Z","updated_at":"2022-06-01T09:04:29.000Z","dependencies_parsed_at":null,"dependency_job_id":"06fbd0f1-d105-4834-9947-9cd59591ae5d","html_url":"https://github.com/RGGH/exa-data","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/RGGH/exa-data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RGGH%2Fexa-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RGGH%2Fexa-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RGGH%2Fexa-data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RGGH%2Fexa-data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RGGH","download_url":"h
ttps://codeload.github.com/RGGH/exa-data/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RGGH%2Fexa-data/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261629395,"owners_count":23187030,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker-compose","etl","json","pandas","postgresql","psycopg2-binary","pytest","python"],"created_at":"2024-12-07T03:28:25.919Z","updated_at":"2025-12-30T22:25:37.766Z","avatar_url":"https://github.com/RGGH.png","language":"Python","readme":"# Example ETL project using FHIR standard data\nFHIR is a standard for health care data exchange, published by HL7®\n\nPurpose of code : Transfer data to structured format (SQL) from 80 sample JSON files, with circa 50,000 lines in each file.\n\n# User guide : \n\n## TLDR;\n\n- Clone this repo : `git clone https://github.com/RGGH/exa-data.git`\n- Install Docker to create the database and initialize 'patient_info' table with required fields\n- Run python code (2 files below)\n---\n    python3 exa_flatten_json_csv.py\n    \n    python3 exa_parse_for_sql.py\n---\n\n## Full Steps:\n#### 1. 
Install Docker Compose\n\n    https://docs.docker.com/compose/install/\n    \n    docker-compose --version\n\n    \n##### *If you get an error :\n    \n    (env) rag@rag-Latitude-5490:~/env/exa-data-1/exa-data$ sudo docker-compose up\n    Creating network \"exa-data_default\" with the default driver\n    ERROR: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network\n    \n    Stop your openvpn :\n    sudo service openvpn stop\n    \n#### 2. Run Docker Compose\n \n    sudo docker-compose up\n    \n#### 3. Install Python Packages\n\n    pip3 install -r requirements.txt\n    \n#### 4. Run the 2 Python files\n\n- These have been kept separate to allow for batch conversion from JSON to CSV, and then import into SQL\n---\n    python3 exa_flatten_json_csv.py\n    python3 exa_parse_for_sql.py\n  \n#### 5. Check the imported data in PostgreSQL \n    python3 db_connect.py\n    \n---\n\n### About this project\n\n- used Docker Compose with a YAML file to create a container for a PostgreSQL 14 image\n- used regex to extract nested keys from 50,000 lines of JSON\n- used Vim to remove empty rows\n- used Pandas to remove duplicates but preserve sort order\n- used Vim to add data types in bulk, plus commas\n- used Jupyter Notebook for exploratory data analysis and testing\n- unable to find data dictionary on FHIR site for this actual project http://hl7.org/fhir/overview-dev.html http://hl7.org/fhir/resourcelist.html\n- not all fields populated due to time constraint, have done first few in order\n- had issues with lists inside dicts. 
Resolved this with 'typing' module https://docs.python.org/3/library/typing.html#module-typing\n\n###### ToDo\n- Add extra tests\n- Fix issues flagged in pylint with github workflow\n- Consider more than 1 table for data, eg patients table, plus a join with notes each fullUrl\n- Create frontend in HTML/JavaScript and consume data from an API (yet to be written, in Python with FastAPI)\n- add remaining fields in exa_parse_for_sql.py\n\n![screenshot](https://github.com/RGGH/exa-data/blob/main/notes/exadata.gif)\n  \n\n   \n    \n   \n    \n    \n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frggh%2Fexa-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frggh%2Fexa-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frggh%2Fexa-data/lists"}