{"id":13693347,"url":"https://github.com/HicServices/SynthEHR","last_synced_at":"2025-05-02T21:31:57.265Z","repository":{"id":34869741,"uuid":"184585886","full_name":"HicServices/SynthEHR","owner":"HicServices","description":"Library and CLI for randomly generating medical data like you might get out of an Electronic Health Records (EHR) system","archived":false,"fork":false,"pushed_at":"2025-02-11T21:03:09.000Z","size":5822,"stargazers_count":31,"open_issues_count":1,"forks_count":3,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-06T22:46:36.871Z","etag":null,"topics":["cli","dataset","ehr","electronic-health-records","hospital-admission","nuget","patient","synthetic-data","testing-tools","tests"],"latest_commit_sha":null,"homepage":null,"language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HicServices.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-02T13:27:18.000Z","updated_at":"2025-02-11T21:03:05.000Z","dependencies_parsed_at":"2023-02-11T09:45:18.450Z","dependency_job_id":"ae4277e7-2cfb-4377-8ac0-71cf2e388be0","html_url":"https://github.com/HicServices/SynthEHR","commit_stats":{"total_commits":248,"total_committers":11,"mean_commits":"22.545454545454547","dds":0.5846774193548387,"last_synced_commit":"28a447b515c753dfdc7528343b9db127800c1a79"},"previous_names":["hicservices/synthehr","hicservices/badmedicine"],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HicServices%2FSynthEHR","tag
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HicServices%2FSynthEHR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HicServices%2FSynthEHR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HicServices%2FSynthEHR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HicServices","download_url":"https://codeload.github.com/HicServices/SynthEHR/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252108920,"owners_count":21696160,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","dataset","ehr","electronic-health-records","hospital-admission","nuget","patient","synthetic-data","testing-tools","tests"],"created_at":"2024-08-02T17:01:08.690Z","updated_at":"2025-05-02T21:31:56.070Z","avatar_url":"https://github.com/HicServices.png","language":"C#","funding_links":[],"categories":["Process-driven methods"],"sub_categories":["Tabular"],"readme":"# SynthEHR (Previously BadMedicine)\n\n[![Build Status](https://github.com/HICServices/SynthEHR/actions/workflows/testpack.yml/badge.svg?branch=develop)](https://travis-ci.org/HicServices/SynthEHR) [![NuGet Badge](https://buildstats.info/nuget/HIC.SynthEHR)](https://www.nuget.org/packages/HIC.SynthEHR/)\n\nLibrary and CLI for randomly generating medical data like you might get out of an Electronic Health Records (EHR) system.  
It is intended for generating data for demos and testing ETL / cohort generation/ data management tools.\n\nSynthEHR differs from other random data generators e.g. Mockaroo, SQL Data Generator etc in that data generated is based on (simple) models generated from live EHR datasets collected for over 30 years in Tayside and Fife (UK).  This makes the data generated recognisable (codes used, frequency of codes etc) from a clinical perspective and representative of the problems (ontology mapping etc) that data analysts would encounter working with real medical data.\n\nDatasets generated are not suitable for training AI algorithms etc (See [What is Modelled?](#what-is-modelled))\n\n## Rename\nAs of v2.0.0 BadMedicine was renamed to SynthEHR. Previous versions of the software can be found at [nuget.org](https://www.nuget.org/packages/HIC.BadMedicine).\n\n## Datasets\n\nThe following synthetic datasets can be produced.\n\n| Dataset        | Description           |\n| ------------- |:-------------:|\n| Demography      | Address and patient details as might appear in the CHI register |\n| Biochemistry      | Lab test codes as might appear in Sci Store lab system extracts |\n| Prescribing      | Prescription data of prescribed drugs |\n| Carotid Artery Scan      | Scan results for Carotid Artery |\n| Hospital Admissions | ICD9 and ICD10 codes for admission to hospital |\n| Maternity | Records of births etc |\n\n## Usage:\n\nSynthEHR is available as a [nuget package](https://www.nuget.org/packages/HIC.SynthEHR/) for linking as a library\n\nThe standalone CLI (SynthEHR.exe) is available in the [releases section of Github](https://github.com/HicServices/SynthEHR/releases)\n\nUsage is as follows:\n\n```\nSynthEHR.exe c:\\temp\\\n```\n\nYou can change how much data is produced (e.g. 
500 patients, 10000 records per dataset):\n\n```\nSynthEHR.exe c:\\temp\\ 500 10000\n```\n\nOr run only a single dataset:\n\n```\nSynthEHR.exe c:\\omg 5000 200000 -l -d CarotidArteryScan\n```\n\nYou can seed the generator (Guids generated will still differ):\n\n```\nSynthEHR.exe c:\\omg 5000 200000 -l -d CarotidArteryScan -s 5000\n```\n\n## Building\n\nBuilding requires MSBuild 15 or later (or Visual Studio 2017 or later).  You will also need to install the DotNetCore 2.2 SDK.\n\nYou can build an OS-specific binary.\n\nFirst build SynthEHR.csproj\n```\ndotnet publish SynthEHR.csproj -r win-x64 --self-contained\ncd .\\bin\\Debug\\netcoreapp2.2\\win-x64\\\n```\n## Direct to Database\n\nYou can generate data directly into a relational database (instead of onto disk).\n\nTo turn this mode on, rename the file `SynthEHR.template.yaml` to `SynthEHR.yaml` and provide the connection string to your database, e.g.:\n\n```yaml\nDatabase:\n  # Set to true to drop and recreate tables described in the Template\n  DropTables: false\n  # The connection string to your database\n  ConnectionString: server=(localdb)\\MSSQLLocalDB;Integrated Security=true;\n  # Your DBMS provider ('MySql', 'PostgreSql','Oracle' or 'MicrosoftSQLServer')\n  DatabaseType: MicrosoftSQLServer\n  # Database to create/use on the server\n  DatabaseName: SynthEHRTestData\n```\n\n## Library Usage\n\nYou can generate test data for your program yourself by referencing the [nuget package](https://www.nuget.org/packages/HIC.SynthEHR/):\n\n```csharp\n//Seed the random generator if you want to always produce the same randomisation\nvar r = new Random(100);\n\n//Create a new person\nvar person = new Person(r);\n\n//Create test data for that person\nvar a = new 
HospitalAdmissionsRecord(person,person.DateOfBirth,r);\n\nAssert.IsNotNull(a.Person.CHI);\nAssert.IsNotNull(a.Person.DateOfBirth);\nAssert.IsNotNull(a.Person.Address.Line1);\nAssert.IsNotNull(a.Person.Address.Postcode);\nAssert.IsNotNull(a.AdmissionDate);\nAssert.IsNotNull(a.DischargeDate);\nAssert.IsNotNull(a.Condition1);\n```\n\n## What is Modelled?\n\nData generated by SynthEHR is driven by aggregate distributions of real health data collected in Tayside (UK).  This means that codes appear in the data with frequencies that match real data.  For example, in the Hospital Admissions data we can see that ICD9 codes (denoted by a dash) cease being recorded in ~1997 in favour of ICD10 codes, and we can see that the most common admission conditions are sensible:\n\n![alt text](./Images/MainConditionDistribution.png)\n\n*ICD 9 and ICD 10 codes in Condition1 (the main condition) upon Hospital Admission*\n\n## What is not Modelled?\n\nNo inter-dataset or inter-record randomisation model exists.  For example, the following would **not** be modelled:\n\n- If a patient is on Drug A they are more likely to also be on Drug B\n- Hospitalisations are more likely to be at the beginning/end of a patient's life\n- Drug A is likely to be given to patients discharged having been treated for condition Y\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHicServices%2FSynthEHR","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FHicServices%2FSynthEHR","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHicServices%2FSynthEHR/lists"}