{"id":23768553,"url":"https://github.com/nci-gdc/gdc-models","last_synced_at":"2026-01-08T00:09:56.242Z","repository":{"id":18647010,"uuid":"84705010","full_name":"NCI-GDC/gdc-models","owner":"NCI-GDC","description":"Git repository centrally stores and serves GDC data models defined in static YAML files","archived":false,"fork":false,"pushed_at":"2024-05-22T20:07:42.000Z","size":1033,"stargazers_count":3,"open_issues_count":6,"forks_count":1,"subscribers_count":20,"default_branch":"develop","last_synced_at":"2024-05-22T21:34:11.021Z","etag":null,"topics":["core","library"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NCI-GDC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-03-12T06:25:29.000Z","updated_at":"2024-05-30T02:09:48.987Z","dependencies_parsed_at":"2023-10-13T07:31:16.311Z","dependency_job_id":"34f6f358-09a3-404f-8a4c-56f0ee0f97c8","html_url":"https://github.com/NCI-GDC/gdc-models","commit_stats":null,"previous_names":[],"tags_count":80,"template":false,"template_full_name":null,"purl":"pkg:github/NCI-GDC/gdc-models","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fgdc-models","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fgdc-models/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fgdc-models/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fgdc-models/manifests","owner_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/owners/NCI-GDC","download_url":"https://codeload.github.com/NCI-GDC/gdc-models/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fgdc-models/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263472275,"owners_count":23471811,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["core","library"],"created_at":"2025-01-01T01:37:34.384Z","updated_at":"2026-01-08T00:09:56.210Z","avatar_url":"https://github.com/NCI-GDC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/NCI-GDC/gdc-models.svg)](https://travis-ci.org/NCI-GDC/gdc-models)\n[![Python 3.7](https://img.shields.io/badge/python-3.7-blue.svg)](https://www.python.org/downloads/release/python-370/)\n[![Codacy Badge](https://api.codacy.com/project/badge/Grade/f71223e269e64eaaa9f6069ceab526c2)](https://www.codacy.com/manual/NCI-GDC/gdc-models?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=NCI-GDC/gdc-models\u0026amp;utm_campaign=Badge_Grade)\n[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit\u0026logoColor=white)](https://github.com/pre-commit/pre-commit)\n\n# GDC Models\n\nGit repository centrally stores and serves GDC data models defined in static YAML files.\n\n- [GDC Models](#gdc-models)\n  - [Structure of esmodels directory](#structure-of-esmodels-directory)\n  - [Update the data models](#update-the-data-models)\n    - [Sync](#sync)\n      - [Install](#install)\n      
- [Before syncing](#before-syncing)\n      - [Examples](#examples)\n      - [After Syncing](#after-syncing)\n    - [WARNING: YAML \\\u0026 Pre-Commit Hook](#warning-yaml--pre-commit-hook)\n  - [Use the data models](#use-the-data-models)\n    - [Import ES models into Python code](#import-es-models-into-python-code)\n    - [Initialize Elasticsearch index settings and mappings using command line script](#initialize-elasticsearch-index-settings-and-mappings-using-command-line-script)\n\n## Structure of esmodels directory\nFor each index, three files are created and stored under the esmodels/\u003cindex_name\u003e directory:\n- mapping.yaml\n  - The Elasticsearch properties are declared here\n- settings.yaml\n  - The Elasticsearch index-specific settings are declared here\n- vestigial.yaml\n  - For properties that are removed from the graph, we do not want to break gdcapi/portal functionality if they issue an Elasticsearch query with a property that is no longer in the graph. This file\ncontains all properties that have been removed from mapping.yaml but are still needed to maintain backwards compatibility. Elasticsearch queries are expected to return no data for\nthe vestigial properties.\n\n## Update the data models\n\n### Sync\n\nSyncing is the process of updating the models with any properties that may be derived from external sources, normalizing keywords, and ensuring all default mapping values are set. This process should be run after the gdcdictionary is updated and whenever a new property is added to the viz indices.\n\nThe process can be run for any index (-i) and any of its doc-types (-d). 
Multiple values can be specified on the command line; if none are provided for either option, all indices or doc-types, respectively, are run.\n\n#### Install\npip-compile --extra=sync \\\n            --index-url=https://nexus.osdc.io/repository/pypi-gdc-releases/simple \\\n            --output-file=requirements-sync.txt \\\n            --strip-extras \\\n            --upgrade\npip install -r requirements-sync.txt\n\n#### Before syncing\nNOTE: Certain esmodels like `case_centric` are augmented from the mappings in `gdcmodels/esmodels/gdc_from_graph/case`. The mappings from `case` are overlaid on the `case_centric` mappings. This means that previous sync operations may have added entries to the `case_centric` mapping file. When vestigial mappings are removed from `gdc_from_graph/case`, the `case_centric` mappings must be edited by hand to fully remove them. Similar scenarios exist for the other `gdc_from_graph` folders.\n\n#### Examples\nRun all indices/doc-types:\n```bash\nsync-models\n```\n\nRun all doc-types for the given indices:\n```bash\nsync-models -i gdc_from_graph -i case_centric\n```\n\nRun a single doc-type:\n```bash\nsync-models -i gdc_from_graph -d file\n```\n\n#### After Syncing\nOnce the sync has been run, review and commit the generated models. These should\ncontain all new properties from the graph (graph indices), and all keywords should have\nthe clinical normalizer applied if appropriate.\n\n### WARNING: YAML \u0026 Pre-Commit Hook\n\nEdit the YAML files as usual, then commit changes to git. A pre-commit hook will\nvalidate the YAML and ensure it is well formatted. It is important to keep YAML files formatted\nconsistently, for example using 2 spaces for indentation, across all revisions. This\nwill make change tracking much easier.\n\nIf a YAML validation issue is reported, you will need to commit again.\n\nThe pre-commit hook will automatically format all new or changed YAML files. 
A copy of the unchanged original YAML file\nis kept with a `.bak` suffix. Before retrying `git commit`, please `diff` your original YAML\nand the automatically formatted one to ensure the formatting did not introduce any errors.\n\n## Use the data models\n\n### Import ES models into Python code\n\n```python\nfrom gdcmodels import get_es_models\n\nes_models = get_es_models()\n```\n\n### Initialize Elasticsearch index settings and mappings using command line script\n\n```bash\n# get usage information by: python init_index.py -h\n# initialize Elasticsearch indexes: case_set and file_set, add prefix 'gdc_r52' to index name\npython init_index.py --index case_set file_set --host localhost --prefix gdc_r52\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnci-gdc%2Fgdc-models","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnci-gdc%2Fgdc-models","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnci-gdc%2Fgdc-models/lists"}