{"id":49640874,"url":"https://github.com/alliance-genome/agr_curation_schema","last_synced_at":"2026-05-05T19:35:02.660Z","repository":{"id":37871669,"uuid":"370446405","full_name":"alliance-genome/agr_curation_schema","owner":"alliance-genome","description":"Schema repository for the Alliance of Genome Resources persistent data store","archived":false,"fork":false,"pushed_at":"2026-04-28T19:16:37.000Z","size":15281,"stargazers_count":12,"open_issues_count":4,"forks_count":1,"subscribers_count":28,"default_branch":"main","last_synced_at":"2026-04-28T19:31:23.235Z","etag":null,"topics":["alliance-curation","alliance-of-genome-resources","linkml"],"latest_commit_sha":null,"homepage":"https://alliance-genome.github.io/agr_curation_schema/","language":"Makefile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alliance-genome.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-05-24T18:18:44.000Z","updated_at":"2026-04-13T06:55:04.000Z","dependencies_parsed_at":"2023-11-20T00:21:24.745Z","dependency_job_id":"10a11ab6-cc2e-4e69-ae9c-3698f7c3c7b0","html_url":"https://github.com/alliance-genome/agr_curation_schema","commit_stats":null,"previous_names":[],"tags_count":52,"template":false,"template_full_name":"linkml/archived-linkml-model-template","purl":"pkg:github/alliance-genome/agr_curation_schema","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alliance-genome%2Fagr_curation_schema","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alliance-genome%2Fagr_curation_schema/tags",
"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alliance-genome%2Fagr_curation_schema/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alliance-genome%2Fagr_curation_schema/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alliance-genome","download_url":"https://codeload.github.com/alliance-genome/agr_curation_schema/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alliance-genome%2Fagr_curation_schema/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32665219,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-05T11:29:49.557Z","status":"ssl_error","status_checked_at":"2026-05-05T11:29:48.587Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alliance-curation","alliance-of-genome-resources","linkml"],"created_at":"2026-05-05T19:35:01.989Z","updated_at":"2026-05-05T19:35:02.652Z","avatar_url":"https://github.com/alliance-genome.png","language":"Makefile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LinkML Model to describe the Alliance curation/persistence data store\nModel to describe the Alliance 
curation/persistence data store\n\n## Model Components and Visualizations\nTo browse the model as HTML pages and visualisations, visit the project page [here](https://alliance-genome.github.io/agr_curation_schema/).\nThis page always reflects the model as it is on the [main branch of the repository](https://github.com/alliance-genome/agr_curation_schema/).\n\nTo build the HTML and visualisations for a specific release or code version and browse them, check out the repository locally\n(at that specific release tag, branch or commit) and run the following command:\n```bash\nmake serve-docs\n```\n\nThis should report the URL at which it is serving the docs locally, usually http://127.0.0.1:8000/alliance-genome/agr_curation_schema/.\n\n## Developing the AGR Curation Schema\n\nThe Alliance schema is stored in a series of interconnected YAML files in the `model/schema` directory, written using\n[LinkML syntax](https://linkml.io/linkml/intro/tutorial.html). LinkML is an object-oriented modeling language whose tooling\ncan convert simple, easy-to-author YAML into validatable artifacts such as JSON schemas, SQL DDL, Python and Java classes,\nMarkdown documentation and others.\n\nSome of these artifact types are generated and stored in this repository as well, including the JSON schemas that are used\nto submit data.  The main modeling components of any LinkML model are classes, slots, enumerations and types.\n\n### Classes\n\nClasses are a set or category of things having some property or attribute in common and differentiated from others by\nkind, type, or quality.\n\n### Slots\n\nSlots (synonym: attributes) are properties or attributes of classes; a slot can be specified once and reused in many\nclasses in the model.\n\n### Types\n\nTypes can be \"string\" or \"integer\" per many language specifications, but custom types can also be defined. 
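\n\nAs a rough, hypothetical illustration (the names `ExampleEntity`, `example_curie` and `example_reference` are made up and are not taken from `model/schema`; the fragment also assumes the schema imports `linkml:types` so that the built-in `string` type is available), a LinkML fragment that defines a custom type and a reusable slot, and attaches that slot to a class, might look like this:\n\n```yaml\ntypes:\n  # hypothetical custom type; assumes linkml:types is imported so that string exists\n  example_curie:\n    typeof: string\n    description: A compact URI such as ZFIN:ZDB-ALT-123456-1\n\nslots:\n  # hypothetical slot (snake_case), defined once and reusable by any class\n  example_reference:\n    range: example_curie\n\nclasses:\n  # hypothetical class (CamelCase) that reuses the slot defined above\n  ExampleEntity:\n    slots:\n      - example_reference\n```\n\nNote that `typeof: string` on its own does not check that values are well-formed CURIEs; it only gives those values a distinct, documented type.\n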
\nThe Alliance model reuses a custom LinkML type (URIorCurie) which restricts a slot value to a URI or CURIE data type.\n\n### Enums\n\nEnumerations are objects that are used to restrict the values of a particular slot to those declared in the\nenumeration.\n\n### Alliance Model Development Conventions\n\nIn addition to LinkML conventions, this repository follows a series of local conventions:\n\n- classes should be written in CamelCase\n- slots should be written in snake_case\n\nThe schema is divided into several YAML files, roughly by biological domain.  There are two special YAML files that\nbehave differently from the rest:\n\n`model/schema/allianceModel.yaml`\n\nallianceModel.yaml is the grouping schema file for all the other split-out schemas (YAML files).  In its \"imports\"\nsection, all the other YAML files are listed by name.  This lets the automated regeneration of artifacts (such as\nJSON Schema, Python and Java files) find and combine all the sub-schemas.  When adding a new YAML file (a new\nbiological domain or a new schema component file), be sure to add your schema file as an import in allianceModel.yaml.\n\n`model/schema/core.yaml`\n\nThe core.yaml file holds classes and\nattributes that should or can be reused in several of the biological domains (e.g. AuditedObject, a class that holds\nall relevant data-tracking information, is a parent class of all other biological objects that need to be audited;\nAuditedObject lives in core.yaml).\n\n##### Tests\n\nAny time a new domain is developed, a test should be written to exercise the resulting JSON schema artifact.  This\nprovides example data for people seeking to produce data to our standard, and helps model developers test the model.\nTest data belongs in `test/data` and `test/data/invalid`.  
Test files should be named following these conventions:\n\nvalid test data files: `<test_name>_test.json`\n\ninvalid test data files: `<test_name>_invalid.json`\n\nEach test file should also be added to the matching Makefile collection (for the moment; this could become a configuration file in a later iteration):\n\n- `SCHEMA_TEST_EXAMPLES` for valid test data\n- `SCHEMA_TEST_EXAMPLES_INVALID` for invalid test data\n\n`test/data/allele_test.json` and `test/data/invalid/allele_invalid.json` are good examples of how to do this.\n\nAn easy way to write a test is to pull out the 'required' fields of a domain object from `jsonschema/allianceModel.schema.json`\nand write test data according to those required fields.  Alternatively, when a class has few required fields, a few objects taken\nfrom an existing submission will exercise the schema better.\n\n##### Ingest classes\n\nSeparate classes for data ingest were introduced because the requirements of data ingest and of storage could not be adequately\nrepresented with the same model.  An example of this is the curie field for disease annotations.  It is populated\nwith Alliance-minted IDs and is a required field in the database.  However, because this ID is generated at the Alliance,\nit cannot be a required field for ingest.  Using separate “DTO” (data transfer object) classes to represent the\nrequirements for ingest, and ingest only, avoids issues such as this.\n\nIn addition to avoiding conflicting requirements, the use of separate classes for ingest makes the generated JSON schema for\nthe ingest classes much cleaner.  All fields that are populated via Alliance business logic can be excluded from the ingest\nclasses.  Slot naming and descriptions can also be tailored towards ingest, making it much clearer to DQMs exactly what is\nrequired for submission, e.g. `evidence_codes` in the `DiseaseAnnotation` class vs. `evidence_code_curies` in the\n`DiseaseAnnotationDTO` class.  
In both cases, the generated JSON schema represents the slot as a list of strings, but\nfor the former it is not clear which field is expected in the ingest file: it could just as plausibly be\nthe evidence code name or abbreviation.  Furthermore, the description in the `evidence_code_curies` slot\ndefinition can hold ingest-specific instructions, which are then propagated to the generated JSON schema.\n\nIn many cases, the inheritance pattern of the DTO classes will mirror that of the corresponding non-DTO classes, but that does\nnot have to be the case.  It may make sense to remove levels in the hierarchical structure in order to simplify the ingest schema\nand/or give more descriptive slot names.  An example of this can be found in the `DiseaseAnnotation` and `DiseaseAnnotationDTO`\nclasses.  The `DiseaseAnnotation` class inherits from `Association`, which in turn inherits from `AuditedObject`.  However, the\n`DiseaseAnnotationDTO` class inherits directly from the `AuditedObjectDTO` class, and the slots corresponding to those in the\n`Association` class are instead defined on the `DiseaseAnnotationDTO` class itself and its child classes.  This allows the ingest slots\ncorresponding to the `Association` class slots `subject`, `predicate` and `object` to be specific to the data type being ingested\nand more descriptive: `agm_curie`, `allele_curie` and `gene_curie` in the `AGMDiseaseAnnotationDTO`, `AlleleDiseaseAnnotationDTO`\nand `GeneDiseaseAnnotationDTO` classes correspond to `subject`, while `disease_relation_name` and `do_term_curie` in the\n`DiseaseAnnotationDTO` class correspond to `predicate` and `object`.  
This makes the generated JSON schema much more transparent to\nDQMs.\n\nClass names for ingest classes simply follow the naming of the corresponding non-ingest classes with the `DTO` suffix added, although,\nas described above, not every class in a hierarchy necessarily has a DTO counterpart.\n\nSlot names for slots used in ingest classes should describe exactly what is required to be submitted.  In many cases this\nwill simply be the name of the corresponding non-ingest slot with a suffix indicating which field is required to be submitted.\nFor slots referring to classes that have a curie, this is usually the snake_case form of the class name with the suffix `_curie` (e.g. `eco_term_curie`\nor `reference_curie`).  For slots where the corresponding non-ingest slot range is a `VocabularyTerm`, it is typically the name of the\nvocabulary with the suffix `_name` (e.g. `genetic_sex_name`).  As with non-ingest slots, multivalued slot names should be pluralised\n(e.g. `disease_qualifier_names`).  For inlined classes, where the complete class object is submitted as part of the submission of\nanother class, the suffixes `_dto` or `_dtos` should be used (e.g. `condition_relation_dtos` or `note_dtos`).  In cases where the\nnon-ingest slot has a range of string, curie, or boolean and the slot name is sufficiently descriptive, there is no need to define a\ncorresponding ingest slot, and the same slot definition can be used in both ingest and non-ingest classes (e.g. `is_extinct`).\n\n## Building the Artifacts of the AGR Curation Schema\n\nThe artifacts of the AGR Curation Schema are all the outputs of the schema transformations that are run automatically\nvia the Makefile in this repository.  These artifacts are generated using LinkML software, which is added as a dependency of this\nrepository.\n\nIn the `Makefile`, all the targets that generate artifacts are prefixed with 'gen-' to indicate that\neach of those targets creates a specific kind of file.  
For example, the `gen-jsonschema` target generates the allianceModel.schema.json\nfile.  These targets are executed when `make` or `make all` is run from the command line.  The GitHub Actions in this repository also\nuse these Makefile targets to generate the artifacts for a pull request or for a release of the repository.\n\nTo regenerate the Python, Java, JSON Schema and other artifacts locally, run `make` from the command line.\nTwo other important targets exist in the Makefile for this repository: `stage` and `test`.  `stage` moves all the assembled\nartifacts that are generated in a non-checked-in directory (the `target` directory) into the top of the artifacts directory (`/generated/`)\nfor easier discoverability and packaging of these artifacts, and so that they can be ignored during PR review.\n`stage` is executed as part of the build targets in the Makefile via GitHub Actions (GA),\nso developers can ignore these targets in favor of automated builds via GA.\n\nTo make a schema change and test your changes:\n\n1. check out a new branch for development\n   ```bash\n   git checkout -b new_branch_name\n   ```\n2. change the appropriate YAML file in `model/schema`\n3. rebuild all your artifacts (this in itself is a test of the validity of your schema change):\n    ```bash\n    make\n    ```\n4. add a new test, or modify an existing one, in `test/data` or `test/data/invalid`\n5. add your new test to the Makefile, or verify that the test you changed is listed there for execution.\n6. run the tests to confirm that your test data validates against the generated JSON Schema artifact that you built in\nstep 3 above.\n    ```bash\n    git add test/data/[new_]test.json # optional step if you added a new test\n    make clean\n    make test\n    ```\n7. commit your change and open a pull request.\n   ```bash\n   git commit -a -m \"message indicating what you changed.\"\n   ```\n\n## Examples of valid ingest files and tests\n\nThe Makefile in this repository runs a series of JSON schema validation tests using test data stored in this repository.\nTest data are located in the `test/data` directory, which is split into example data that are valid JSON according\nto the Alliance JSON schema, and data that are invalid.\n\nThe invalid tests are expected to fail validation: each one reports how its data is invalid, but because the failure is\nexpected it does not break the build.  This is a convenient way to exercise the schema and to state subtle errors\nmore clearly.\n\nTo run the tests in this repository:\n```bash\nmake clean\nmake test\n```\n\nThis will run both the valid and invalid tests.\n\nBy convention, this repository defines a 'set' slot for each ingest so that the objects in a JSON list can be\nvalidated.  In the example below, an \"allele ingest set\" is defined in the\nmodel as a multivalued slot that contains Allele objects.  In this way, the JSON validator and automated\ntests can validate that the submission is a list of Allele objects (as opposed to a list of Gene objects, or a\nlist of GOTerm objects).  A simple example (using only the required fields of the Allele object) would look like this:\n\n```json\n{\n  \"allele_ingest_set\": [\n    {\n      \"curie\": \"ZFIN:ZDB-ALT-123456-1\",\n      \"taxon\": \"NCBITaxon:7955\",\n      \"created_by\": \"ZFIN\",\n      \"modified_by\": \"ZFIN\"\n    },\n    {\n      \"curie\": \"ZFIN:ZDB-ALT-123456-2\",\n      \"taxon\": \"NCBITaxon:7955\",\n      \"created_by\": \"ZFIN\",\n      \"modified_by\": \"ZFIN\"\n    }\n  ]\n}\n```\n\n## Generating MOD JSON files and validating them\n\nThe Alliance JSON schema is generated from the Alliance model YAML in `model/schema` in this repository.  
Also provided in \nthis repository is a simple JSON schema validator (`util/validate_agr_schema.py`) that has a command line interface.\n\nUsers who wish to validate their files locally before submitting them to the Alliance can do so as follows:\n\n1. Clone the agr_curation_schema repository locally and change into it:\n   ```bash\n   git clone https://github.com/alliance-genome/agr_curation_schema\n   cd agr_curation_schema\n   ```\n\n2. Validate a local JSON file against the schema.\n   Make sure you have LinkML installed (see the setup steps below for how to install the agr_curation_schema dependencies into your\n   Python environment).  Then run the validator:\n   ```bash\n   poetry run linkml-validate -C Ingest -s model/schema/allianceModel.yaml [path/to/your/submission/file.json]\n   ```\n\n   Example command using test data (replace test/data/allele_test.json with the path to your submission file):\n   ```bash\n   poetry run linkml-validate -C Ingest -s model/schema/allianceModel.yaml test/data/allele_test.json\n   ```\n\nNote: it's good practice to use a Python virtual environment when running these commands, because the system-installed version of Python\nmay not be the one you need (for example, some macOS releases shipped Python 2 rather than Python 3).  
\nHere is a guide for working with Python and Poetry virtual environments: https://berkeleybop.github.io/best_practice/python_environments\nThe guide has been tested with a wide variety of biomedical Python projects and is a good introduction\nto working with Python and virtual environments.\n\nThe basic steps for setting up a system with multiple versions of Python and virtual environments are:\n\n1) Install pyenv; Homebrew is a good way to install it:\n```bash\nbrew update\nbrew install pyenv\n```\nOnce pyenv is installed, you can use it to add multiple versions of Python to your system:\n```bash\npyenv install 3.8.15\npyenv global 3.8.15\n```\n\n2) Install Poetry:\n```bash\ncurl -sSL https://install.python-poetry.org | python3 -\npoetry config virtualenvs.in-project true\npoetry config virtualenvs.prefer-active-python true\n```\n\n3) Create a virtual environment with Poetry:\n```bash\ngit clone https://github.com/alliance-genome/agr_curation_schema\ncd agr_curation_schema\ngit checkout -b my_new_branch\npoetry install\n```\n\n4) Run the tests:\n```bash\nmake test\n```\n\n## Alternate development environment - Using Docker\n\n1) Remove existing containers, if any:\n```bash\nmake remove-container\n```\n2) Build the container and run the tests in it:\n```bash\nmake build-container\nmake run-tests\n```\nThis mounts your local directory into the Docker container, so any changes you make to the local\ndirectory are reflected in the container.  
This is a convenient way to run the tests in a clean\nenvironment without having to install the dependencies locally.\n\nMake a change to the schema or test files and run the tests again:\n```bash\nmake run-tests\n```\n\n3) Remove the container when done:\n```bash\nmake remove-container\n```\n\n## Another alternate test environment - just use pipx on macOS\n1) Install pipx:\n```bash\nbrew install pipx\n```\n2) Run the tests in an isolated pipx environment:\n```bash\npipx run --spec linkml linkml-validate -C Ingest -s model/schema/allianceModel.yaml test/data/allele_test.json\n```\n\nIf you need to make changes based on test results, you may wish to configure git\naccording to https://docs.github.com/en/get-started/getting-started-with-git/about-remote-repositories#cloning-with-https-urls\nto avoid having to enter your GitHub credentials every time you push a change.\n\n## If all else fails\n1) Make your schema changes in a branch.\n2) Push the branch to GitHub; GitHub Actions will run all the tests for you in an isolated environment, even on a draft PR.  Check the Actions results to see whether your tests pass.\n\n## GitHub Actions\n\nThere are three GitHub Actions workflow files that are triggered by events in this repository.\n\n1. ```check-pull-request.yaml```\n\nOn each pull request, the artifact generators (JSON Schema, docs, Python, Java, etc.) are run to produce fresh models\nin those languages from the changed schema YAML files.\n\nOnce the artifacts are built, the action also runs the test suite against the new schema to ensure the test data still\npass validation.  Update the tests along with any schema change so that PRs pass this check.\n\n2. ```build-deploy-documentation.yaml```\n\nOn each merge into main, the docs are regenerated with a [LinkML generator](https://linkml.io/linkml/generators/markdown.html) from the model/schema/\\*.yaml files: the action generates Markdown documentation and UML diagrams and pushes them into the gh-pages branch of this repository.\nNote: the documentation is stored only in the gh-pages branch, not in the main branch.\n\nThis process is controlled in part by the mkdocs.yaml file in this repo, which declares the source and site documentation directories and the location of the navigation index.html.\n\n3. ```pypi-publish.yaml```\n\nOn release, this action assembles this repository and publishes it to PyPI automatically.  This is currently disabled for the\nAlliance schema.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falliance-genome%2Fagr_curation_schema","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falliance-genome%2Fagr_curation_schema","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falliance-genome%2Fagr_curation_schema/lists"}