{"id":18811347,"url":"https://github.com/chop-dbhi/data-models","last_synced_at":"2026-01-27T09:18:03.286Z","repository":{"id":29676904,"uuid":"33219268","full_name":"chop-dbhi/data-models","owner":"chop-dbhi","description":"Collection of various biomedical data models in parseable formats.","archived":false,"fork":false,"pushed_at":"2025-09-08T17:22:31.000Z","size":9828,"stargazers_count":29,"open_issues_count":17,"forks_count":8,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-10-09T23:33:48.110Z","etag":null,"topics":["biomedical","csv","data-models","schema","vocabulary"],"latest_commit_sha":null,"homepage":"https://data-models-service.research.chop.edu","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chop-dbhi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-04-01T01:19:14.000Z","updated_at":"2025-09-08T17:22:35.000Z","dependencies_parsed_at":"2023-09-26T17:15:28.866Z","dependency_job_id":"36a4826b-1446-4356-86a2-e8a2403b7d57","html_url":"https://github.com/chop-dbhi/data-models","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/chop-dbhi/data-models","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chop-dbhi%2Fdata-models","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chop-dbhi%2Fdata-models/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chop-dbhi%2Fdata-models/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/chop-dbhi%2Fdata-models/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chop-dbhi","download_url":"https://codeload.github.com/chop-dbhi/data-models/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chop-dbhi%2Fdata-models/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28810475,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-27T07:41:26.337Z","status":"ssl_error","status_checked_at":"2026-01-27T07:41:08.776Z","response_time":168,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["biomedical","csv","data-models","schema","vocabulary"],"created_at":"2024-11-07T23:25:49.671Z","updated_at":"2026-01-27T09:18:03.254Z","avatar_url":"https://github.com/chop-dbhi.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Models\n\nData models and vocabularies in the biomedical space.\n\n## Persistent CSV Format\n\nData model descriptions are stored persistently in this repository in CSV format for portability and human readability. Each data model has its own directory with versions of the model in subdirectories. 
Each version directory has a `datamodel.json` file that holds metadata about the datamodel and version, so as not to rely on directory structure for interpretability. In fact, this file and a collection of CSV files with the below described header signatures is enough to signal that a data model definition exists. However, the organization and naming conventions presented below have been useful in our initial data model definitions.\n\nEach data model version should have at least `definitions` and `schema` directories and, optionally, a `constraints` directory and `indexes.csv` and `references.csv` files.\n\nThe `definitions` directory (e.g., [omop/v5/definitions](omop/v5/definitions)) holds basic information about the data model that would be of primary interest to a data user. There is a `tables.csv` file (e.g., [omop/v5/definitions/tables.csv](omop/v5/definitions/tables.csv)), which lists `name` and `description` for each table, as well as a CSV file for each table (e.g., [omop/v5/definitions/person.csv](omop/v5/definitions/person.csv)), which lists `name` and `description` for each field, whether the field is `required` (a governance, not schema, attribute), and optionally a `ref_table` and `ref_field` combination to which the field refers (typically manifested as a foreign key relationship).\n\nThe `schema` directory holds detailed information that might be used to instantiate the data model in a database or other physical storage medium. 
There is a CSV file for each table (e.g., [omop/v5/schema/person.csv](omop/v5/schema/person.csv)) that lists `type`, `length`, `precision`, `scale`, and `default` attributes (all optional except `type`) for each field, which is identified by `model`, `version`, `table` name, and `field` name attributes.\n\nThe `constraints` directory (e.g., [omop/v5/constraints](omop/v5/constraints)), if present, can hold any number of CSV files that list data-level constraints that should be applied to any physical representation of the data model. These files (e.g., [omop/v5/constraints/not_nulls.csv](omop/v5/constraints/not_nulls.csv)) contain a `type`, an optional `name`, and the target `table` and `field` for each constraint.\n\nThe `indexes.csv` file (e.g., [omop/v5/indexes.csv](omop/v5/indexes.csv)), if present, lists indexes that should be built on a physical representation of the data model, with `name`, whether the index should be `unique`, target `table` and `field`, and `order` attributes for each index.\n\nThe `references.csv` file (e.g., [omop/v5/references.csv](omop/v5/references.csv)), if present, lists references (usually foreign keys) that should be enforced on the data model. Each reference is listed with the source `table` and `field`, the target `table` and `field`, and an optional `name`.\n\nEach data model root directory may have a `renamings.csv` file (e.g., [omop/renamings.csv](omop/renamings.csv)) that maps fields that have been renamed across versions by providing a source data model `version`, `table`, and `field` and a target `version`, `table`, and `field`.\n\nThe top-level `mappings` directory holds a series of CSV files that list field-level mappings between data models. 
The files (e.g., [mappings/pedsnet_v2_omop_v5.csv](mappings/pedsnet_v2_omop_v5.csv)) contain a `target_model`, `target_version`, `target_table`, and `target_field` as well as a `source_model`, `source_version`, `source_table`, and `source_field` along with a free-text `comment` for each mapping.\n\n## CSV Tools\n\n#### Python\n\nThe [`csv`](https://docs.python.org/2/library/csv.html) module in the standard library can be used.\n\n```python\nimport csv\n\n# Writes all records to a file given a filename, a list of strings representing\n# the header, and a list of rows containing the data.\ndef write_records(filename, header, rows):\n    # Open the file named by the caller rather than a hard-coded path.\n    with open(filename, 'w+') as f:\n        w = csv.writer(f)\n\n        w.writerow(header)\n\n        for row in rows:\n            w.writerow(row)\n```\n\n#### PostgreSQL\n\nPostgreSQL provides valid CSV output using the [`COPY`](http://www.postgresql.org/docs/9.2/static/sql-copy.html) statement. The output can be written to a file using an absolute file name or to STDOUT.\n\nAbsolute path.\n\n```sql\nCOPY ( ... )\n    TO '/path/to/person.csv'\n    WITH (\n        FORMAT csv,\n        DELIMITER ',',\n        NULL '',\n        HEADER true,\n        ENCODING 'utf-8'\n    )\n```\n\nTo STDOUT.\n\n```sql\nCOPY ( ... 
)\n    TO STDOUT\n    WITH (\n        FORMAT csv,\n        DELIMITER ',',\n        NULL '',\n        HEADER true,\n        ENCODING 'utf-8'\n    )\n```\n\n#### Java\n\n[`opencsv`](http://opencsv.sourceforge.net/) is a popular package for reading and writing CSV files.\n\nUse a for loop when `rows` is a collection or array of `String[]` records.\n\n```java\nCSVWriter writer = new CSVWriter(new FileWriter(fileName),\n                                 CSVWriter.DEFAULT_SEPARATOR,\n                                 CSVWriter.NO_QUOTE_CHARACTER);\n\nwriter.writeNext(header);\n\n// Each row is a String[] holding one record's field values.\nfor (String[] row : rows) {\n    writer.writeNext(row);\n}\n\nwriter.close();\n```\n\nIf `rows` is a `java.sql.ResultSet`, use `writeAll` directly.\n\n```java\nCSVWriter writer = new CSVWriter(new FileWriter(fileName),\n                                 CSVWriter.DEFAULT_SEPARATOR,\n                                 CSVWriter.NO_QUOTE_CHARACTER);\n\n// Pass the result set and derive the header from the result set\n// (assuming it conforms to the spec).\nwriter.writeAll(rows, true);\n\nwriter.close();\n```\n\n#### Oracle\n\nOracle experts should feel free to chime in, but a very promising option is Oracle's new SQLcl command-line tool, available on an early-adopter basis as part of the [SQL Developer](http://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/index.html) family. SQLcl is being touted as a modern replacement for SQL*Plus.\n\nSample usage:\n\n```\nset sqlformat csv\nspool footable.csv\nselect * from footable;\nspool off\n```\n\nAnother option is to use the [SQL Developer](http://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/index.html) GUI itself, which is convenient but, unlike SQLcl, not amenable to automation.\n\nSQL Developer (and probably SQLcl) exports CSV using the following conventions: all text fields are wrapped in quotes (even NULL values, because NULL and empty string are treated the same in Oracle), and no numeric fields are wrapped in quotes. 
Quotes within fields are escaped via doubling. Newlines within fields are included in the output.\n\nSQL Developer usage:\n\n* On a Data tab (or a table name in the Connections panel), right-click and choose Export\n* Change format to csv\n* Change line terminator to Unix - other formatting and encoding defaults are fine\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchop-dbhi%2Fdata-models","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchop-dbhi%2Fdata-models","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchop-dbhi%2Fdata-models/lists"}