https://github.com/cmungall/linkml-phenopackets

EXPERIMENTAL rendering of phenopackets in linkml

# Phenopackets EXPERIMENTAL linkml schema

Browse the autogenerated schema documentation here:

* [https://cmungall.github.io/linkml-phenopackets/](https://cmungall.github.io/linkml-phenopackets/)

Note that the LinkML markdown rendering is still incomplete; for the full schema, see:

* [src/phenopackets/schema](src/phenopackets/schema)

## What is this repo and how was it made?

This is an experiment in rendering the [Phenopackets](https://phenopacket-schema.readthedocs.io/) schema in LinkML.
It is NOT the official GA4GH phenopackets schema.

The intent is to demonstrate some of the tooling and integrative capabilities of LinkML over Protobuf, in particular:

- Additional validation not possible in Protobuf, including:
  - required fields
  - ontology constraints
- Generation of Python classes and tooling
- Export and import of phenopackets to and from RDF
- Semantic annotation of schemas
- Cross-schema integration

The LinkML schema is generated using a script [proto2linkml](util/proto2linkml.pl) that converts from the Protobuf source. This makes
use of specific *conventions* in the Protobuf source, such as the use of particular controlled
keywords in `//` comments. As such, this code is *not* generalizable to other protobuf schemas.

Note that we intentionally don't use out-of-band information. Currently some records in the Protobuf are undefined,
so they will be undefined in the LinkML. We have not done additional curation based on the `.rst` docs;
everything comes from the Protobuf source.

## Ontology Enhancements

The file [cv_terms.yaml](https://github.com/cmungall/linkml-phenopackets/blob/main/src/phenopackets/schema/cv_terms.yaml)
is hand-curated, rather than derived from the Protobuf source. It is based on the [recommended ontologies](https://phenopacket-schema.readthedocs.io/en/latest/recommended-ontologies.html)
from the official phenopackets repo. It makes use of *dynamic enums*, which allow for more advanced
ontology checking; for example:

* Uberon anatomy terms must be found under the "anatomical entity branch"
* HPO abnormality terms must be found under the "phenotypic abnormality branch"

etc
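
The branch constraints above amount to a reachability check over the ontology's is-a hierarchy. The following is a minimal, self-contained sketch of that idea; it is not how LinkML or OAK implement dynamic enums. The `PARENTS` table is a hand-written, heavily simplified stand-in for the real ontologies, and `is_under_branch` is a hypothetical helper name.

```python
# Toy is-a edges (child -> parents). Simplified for illustration only:
# the real HPO hierarchy has intermediate terms that are omitted here.
PARENTS = {
    "HP:0001250": ["HP:0000118"],  # Seizure -> Phenotypic abnormality
    "HP:0000007": ["HP:0000005"],  # Autosomal recessive inheritance -> Mode of inheritance
    "HP:0000118": ["HP:0000001"],  # Phenotypic abnormality -> All
    "HP:0000005": ["HP:0000001"],  # Mode of inheritance -> All
}


def is_under_branch(term, root, parents=PARENTS):
    """Return True if `term` is a strict descendant of `root` via is-a edges."""
    stack = list(parents.get(term, []))
    seen = set()
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False
```

In this toy graph, `is_under_branch("HP:0001250", "HP:0000118")` holds, while `HP:0000007` is a valid HPO term that sits outside the phenotypic abnormality branch and so would be rejected by such a constraint.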

We also include [constants.yaml](https://github.com/cmungall/linkml-phenopackets/blob/main/src/phenopackets/schema/constants.yaml)
which is a direct transform from the phenopackets-tools repo.

## How to use this repo

You can browse the [schema docs](https://cmungall.github.io/linkml-phenopackets/) which are generated from the LinkML
schema.

You can also explore the schema in the [schema](src/phenopackets/schema) directory.

As part of the build process, we also validate and convert all canonical Phenopacket examples into YAML, JSON,
and RDF.

* [examples/](examples/) - converted examples

You can also use the generated Python classes in combination with the linkml-runtime. Note that we have NOT
released this to PyPI to avoid confusion with official Phenopackets libraries, so to run this you will need to clone
the repo and install it locally.

```bash
poetry install
```

There will also be demonstrator Jupyter notebooks here:

- [notebooks](src/docs/notebooks) directory

## Validation

Use `p3 validate` to validate objects. This goes beyond what can be done with JSON-Schema alone, and
includes ontology validation using OAK and CURIE validation using BioRegistry.

See [this notebook](https://github.com/cmungall/linkml-phenopackets/blob/main/src/docs/notebooks/Updating-Packets-Using-Ontology.ipynb)
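
To give a feel for the kind of check that goes beyond JSON-Schema, here is a minimal sketch of CURIE validation over the ontology terms in a packet. It is an illustration only, not the `p3 validate` implementation: `KNOWN_PREFIXES` is a hypothetical hard-coded stand-in for a BioRegistry lookup, and `find_bad_curies` is a made-up helper. It assumes ontology terms appear as objects with both `id` and `label` keys.

```python
import re

# Hypothetical stand-in for a BioRegistry lookup: prefix -> local ID pattern.
KNOWN_PREFIXES = {
    "HP": re.compile(r"^\d{7}$"),
    "UBERON": re.compile(r"^\d{7}$"),
}


def find_bad_curies(node, path="$"):
    """Walk a parsed packet; return (path, id) pairs for terms with invalid CURIEs."""
    problems = []
    if isinstance(node, dict):
        # Assumption: any object with both `id` and `label` is an ontology term.
        if "id" in node and "label" in node:
            prefix, sep, local = str(node["id"]).partition(":")
            pattern = KNOWN_PREFIXES.get(prefix)
            if not sep or pattern is None or not pattern.match(local):
                problems.append((path, node["id"]))
        for key, value in node.items():
            problems.extend(find_bad_curies(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            problems.extend(find_bad_curies(value, f"{path}[{i}]"))
    return problems


# Made-up example packet with one well-formed and one malformed term ID.
packet = {
    "id": "example-1",
    "phenotypicFeatures": [
        {"type": {"id": "HP:0001250", "label": "Seizure"}},
        {"type": {"id": "HP_0001250", "label": "Seizure"}},  # underscore, not a CURIE
    ],
}

problems = find_bad_curies(packet)
```

Reporting the path alongside the offending ID makes it easier to locate the term inside a deeply nested packet.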

## Repairing ontology terms

Phenopackets include ontology terms, which are liable to become stale.

This toolkit uses OAK to assist in auto-migration of obsolete terms and stale labels.

See [this notebook](https://github.com/cmungall/linkml-phenopackets/blob/main/src/docs/notebooks/Updating-Packets-Using-Ontology.ipynb)
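
Conceptually, the repair step rewrites each ontology term against up-to-date ontology metadata. The sketch below uses hand-written lookup tables where the toolkit would query an ontology through OAK; the `REPLACED_BY` and `CURRENT_LABELS` tables, their made-up `EX:` identifiers, and the `repair_term` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical lookup tables; a real run would obtain this information from an ontology.
REPLACED_BY = {"EX:0000001": "EX:0000002"}  # obsolete ID -> replacement ID
CURRENT_LABELS = {
    "EX:0000002": "new label",
    "EX:0000003": "fresh label",
}


def repair_term(term):
    """Return a copy of an {id, label} term with obsolete IDs and stale labels updated."""
    # Step 1: migrate an obsolete ID to its replacement, if one is known.
    term_id = REPLACED_BY.get(term["id"], term["id"])
    # Step 2: refresh the label from the current ontology, keeping the old one if unknown.
    label = CURRENT_LABELS.get(term_id, term.get("label"))
    return {"id": term_id, "label": label}
```

Terms that are unknown to both tables pass through unchanged, so the function is safe to apply to every term in a packet.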

## Querying Phenopackets as RDF

TODO: Add documentation here

## Using Phenopackets in conjunction with OAK

TODO: Add documentation here