{"id":15989722,"url":"https://github.com/cmungall/human-cell-atlas","last_synced_at":"2025-08-01T20:13:03.859Z","repository":{"id":66914404,"uuid":"595331030","full_name":"cmungall/human-cell-atlas","owner":"cmungall","description":"EXPERIMENTAL translation of HCA json schema to LinkML","archived":false,"fork":false,"pushed_at":"2023-01-30T22:31:01.000Z","size":1682,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-04T23:13:30.449Z","etag":null,"topics":["alpha","cell-atlas","for-exploratory-purposes-only","linkml","metadata","single-cell"],"latest_commit_sha":null,"homepage":"https://cmungall.github.io/human-cell-atlas/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cmungall.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-30T21:32:42.000Z","updated_at":"2023-01-30T22:27:54.000Z","dependencies_parsed_at":"2023-05-13T23:00:15.714Z","dependency_job_id":null,"html_url":"https://github.com/cmungall/human-cell-atlas","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cmungall/human-cell-atlas","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmungall%2Fhuman-cell-atlas","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmungall%2Fhuman-cell-atlas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmungall%2Fhuman-cell-atlas/releases","manifests_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmungall%2Fhuman-cell-atlas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cmungall","download_url":"https://codeload.github.com/cmungall/human-cell-atlas/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmungall%2Fhuman-cell-atlas/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268290755,"owners_count":24226646,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-01T02:00:08.611Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alpha","cell-atlas","for-exploratory-purposes-only","linkml","metadata","single-cell"],"created_at":"2024-10-08T05:01:32.152Z","updated_at":"2025-08-01T20:13:03.811Z","avatar_url":"https://github.com/cmungall.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# human-cell-atlas\n\nEXPERIMENTAL translation of the HCA JSON schema to LinkML\n\nCaveat: this schema is constructed entirely via an automated import of the HCA JSON schema.\n\n- there may be parts missing\n- the direct mapping may not utilize key parts of LinkML\n\n## Website\n\n* [https://cmungall.github.io/human-cell-atlas](https://cmungall.github.io/human-cell-atlas)\n\nThe above is generated entirely from the schema, which comes from the JSON schema; as such\nit may be sparse on details.\n\nThis is also using 
the older linkml documentation framework, which doesn't show all the schema\n\n## Schema\n\n* [src/human_cell_atlas/schema](src/human_cell_atlas/schema) \n\n## How this was made\n\nThis was created using [schema-automator](https://github.com/linkml/schema-automator/),\nwith the following HCA-specific extensions:\n\n- mapping of `user_friendly` to `linkml:title`\n- mapping HCA ontology extensions to [dynamic enums](https://linkml.io/linkml/schemas/enums.html#dynamic-enums)\n\nThe following modifications were made:\n\n- Changed “10x” to “S10x” (because otherwise this creates awkward incompatibilities between the generated Python classes and the schema)\n- Modified hca/system/links.json to avoid name clashes with SupplementaryFile\n\n## Treatment of Links\n\nI need to figure out exactly how the system/links schema is used in HCA. Currently it doesn't \"connect up\" to the rest of the schema.\n\nIt seems that some kind of extra-schema information is required.\n\n## Ontology Enums\n\nAll plain JSON enums are mapped to LinkML enums. 
Note that we elected not to inline these, so there are a lot of \"trivial\" enums with one value where the\nintent is to restrict the value of a field.\n\nIn future, the permissible values could be mapped to ontology terms, but this information isn't in the schema.\n\nHCA also uses a JSON schema extension for ontology enums; these are converted to LinkML dynamic enums, as shown below.\n\n### Examples\n\n* [src/human_cell_atlas/schema//module/ontology/development_stage_ontology.yaml](src/human_cell_atlas/schema//module/ontology/development_stage_ontology.yaml)\n\nLinkML:\n\n```yaml\n  DevelopmentStageOntology_ontology_options:\n    include:\n    - reachable_from:\n        source_ontology: obo:efo\n        source_nodes:\n        - EFO:0000399\n        - HsapDv:0000000\n        - UBERON:0000105\n        relationship_types:\n        - rdfs:subClassOf\n        is_direct: false\n        include_self: false\n    - reachable_from:\n        source_ontology: obo:hcao\n        source_nodes:\n        - EFO:0000399\n        - HsapDv:0000000\n        - UBERON:0000105\n        relationship_types:\n        - rdfs:subClassOf\n        is_direct: false\n        include_self: false\n```\n\nfrom:\n\n```json\n\"ontology\": {\n            \"description\": \"An ontology term identifier in the form prefix:accession.\",\n            \"type\": \"string\",\n            \"graph_restriction\":  {\n                \"ontologies\" : [\"obo:efo\", \"obo:hcao\"],\n                \"classes\": [\"EFO:0000399\", \"HsapDv:0000000\", \"UBERON:0000105\"],\n                \"relations\": [\"rdfs:subClassOf\"],\n                \"direct\": false,\n                \"include_self\": false\n            },\n```\n\nNote that the mapping is not quite direct. 
A separate query is generated in LinkML for each input ontology, and the\ninput seeds are repeated each time (`include` takes the union of all subqueries).\n\nI believe the semantics are the same as for the source, although some combinations may yield empty sets.\n\nThe more natural way to author this in LinkML would be to make the classes specific to each subquery.\n\n## Materialized Ontology Enums\n\nSee [value set toolkit](https://github.com/INCATools/ontology-access-kit/releases/tag/v0.1.58)\n\nTo expand value sets:\n\n`poetry run sh utils/expand-value-sets.sh`\n\nThis materializes the value set queries, so that:\n\n- normal non-extended json-schema tooling can use them\n- query results can be versioned alongside releases\n\nThese are included alongside as `\u003cNAME\u003e.expanded.yaml`\n\nFile sizes:\n\n| Value Set | Expanded | File Size |\n| --- | --- | --- |\n| [enrichment_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [enrichment_ontology expanded](src/human_cell_atlas/schema/module/ontology/enrichment_ontology.expanded.yaml) | 4.0K|\n| [organ_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [organ_ontology expanded](src/human_cell_atlas/schema/module/ontology/organ_ontology.expanded.yaml) | 1.5M|\n| [cell_cycle_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [cell_cycle_ontology expanded](src/human_cell_atlas/schema/module/ontology/cell_cycle_ontology.expanded.yaml) | 8.0K|\n| [biological_macromolecule_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [biological_macromolecule_ontology expanded](src/human_cell_atlas/schema/module/ontology/biological_macromolecule_ontology.expanded.yaml) | 12K|\n| [sequencing_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [sequencing_ontology expanded](src/human_cell_atlas/schema/module/ontology/sequencing_ontology.expanded.yaml) | 60K|\n| [protocol_type_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [protocol_type_ontology 
expanded](src/human_cell_atlas/schema/module/ontology/protocol_type_ontology.expanded.yaml) | 16K|\n| [species_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [species_ontology expanded](src/human_cell_atlas/schema/module/ontology/species_ontology.expanded.yaml) | 215M|\n| [development_stage_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [development_stage_ontology expanded](src/human_cell_atlas/schema/module/ontology/development_stage_ontology.expanded.yaml) | 64K|\n| [target_pathway_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [target_pathway_ontology expanded](src/human_cell_atlas/schema/module/ontology/target_pathway_ontology.expanded.yaml) | 108K|\n| [disease_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [disease_ontology expanded](src/human_cell_atlas/schema/module/ontology/disease_ontology.expanded.yaml) | 4.8M|\n| [strain_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [strain_ontology expanded](src/human_cell_atlas/schema/module/ontology/strain_ontology.expanded.yaml) | 16K|\n| [file_content_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [file_content_ontology expanded](src/human_cell_atlas/schema/module/ontology/file_content_ontology.expanded.yaml) | 512K|\n| [library_construction_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [library_construction_ontology expanded](src/human_cell_atlas/schema/module/ontology/library_construction_ontology.expanded.yaml) | 12K|\n| [contributor_role_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [contributor_role_ontology expanded](src/human_cell_atlas/schema/module/ontology/contributor_role_ontology.expanded.yaml) | 24K|\n| [mass_unit_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [mass_unit_ontology expanded](src/human_cell_atlas/schema/module/ontology/mass_unit_ontology.expanded.yaml) | 8.0K|\n| [cell_type_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | 
[cell_type_ontology expanded](src/human_cell_atlas/schema/module/ontology/cell_type_ontology.expanded.yaml) | 316K|\n| [library_amplification_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [library_amplification_ontology expanded](src/human_cell_atlas/schema/module/ontology/library_amplification_ontology.expanded.yaml) | 4.0K|\n| [microscopy_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [microscopy_ontology expanded](src/human_cell_atlas/schema/module/ontology/microscopy_ontology.expanded.yaml) | 8.0K|\n| [ethnicity_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [ethnicity_ontology expanded](src/human_cell_atlas/schema/module/ontology/ethnicity_ontology.expanded.yaml) | 36K|\n| [organ_part_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [organ_part_ontology expanded](src/human_cell_atlas/schema/module/ontology/organ_part_ontology.expanded.yaml) | 1.5M|\n| [treatment_method_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [treatment_method_ontology expanded](src/human_cell_atlas/schema/module/ontology/treatment_method_ontology.expanded.yaml) | 296K|\n| [process_type_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [process_type_ontology expanded](src/human_cell_atlas/schema/module/ontology/process_type_ontology.expanded.yaml) | 84K|\n| [time_unit_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [time_unit_ontology expanded](src/human_cell_atlas/schema/module/ontology/time_unit_ontology.expanded.yaml) | 4.0K|\n| [file_format_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [file_format_ontology expanded](src/human_cell_atlas/schema/module/ontology/file_format_ontology.expanded.yaml) | 4.0K|\n| [instrument_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [instrument_ontology expanded](src/human_cell_atlas/schema/module/ontology/instrument_ontology.expanded.yaml) | 12K|\n| 
[cellular_component_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [cellular_component_ontology expanded](src/human_cell_atlas/schema/module/ontology/cellular_component_ontology.expanded.yaml) | 480K|\n| [length_unit_ontology](src/human_cell_atlas/schema/module/ontology/.yaml) | [length_unit_ontology expanded](src/human_cell_atlas/schema/module/ontology/length_unit_ontology.expanded.yaml) | 8.0K|\n\nNote in particular that the expanded species value set is nearly a quarter of a gigabyte (215M).\n\nSome of the expanded sets may be empty due to a mismatch in how HCA and OAK use CURIEs for EDAM.\n\n## Repository Structure\n\n* [project/](project/) - project files (do not edit these)\n* [src/](src/) - source files (edit these)\n    * [human_cell_atlas](src/human_cell_atlas)\n        * [schema](src/human_cell_atlas/schema) - LinkML schema (generated from HCA)\n* [datamodel](src/human_cell_atlas/datamodel) - generated Python datamodel\n* [tests](tests/) - Python tests\n\n## Developer Documentation\n\n\u003cdetails\u003e\nUse the `make` command to generate project artefacts:\n\n- `make all`: make everything\n- `make deploy`: deploys site\n\n\u003c/details\u003e\n\n## Credits\n\nThis project was made with [linkml-project-cookiecutter](https://github.com/linkml/linkml-project-cookiecutter)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmungall%2Fhuman-cell-atlas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcmungall%2Fhuman-cell-atlas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmungall%2Fhuman-cell-atlas/lists"}