https://github.com/dockstore/ontology
- Host: GitHub
- URL: https://github.com/dockstore/ontology
- Owner: dockstore
- License: other
- Created: 2026-04-21T02:36:26.000Z (about 2 months ago)
- Default Branch: develop
- Last Pushed: 2026-05-30T05:48:15.000Z (17 days ago)
- Last Synced: 2026-05-30T06:11:56.405Z (17 days ago)
- Language: Python
- Size: 668 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# ontology
This repo manages the ontologies that Dockstore uses to automatically categorize entries, and code that reproducibly generates them.
## Overview
We currently generate six ontologies. Each is listed below by name, together with the aspect of an entry that it classifies:
* "operation": Operations that an entry performs (ex: "sequence quality control").
* "topic": Domains or fields of study that relate to the entry (ex: "oncology").
* "input-data": Types of input data that an entry supports (ex: "sequence").
* "output-data": Types of output data that an entry generates (ex: "sequence statistics").
* "input-format": Input file formats that the entry supports (ex: "FASTQ").
* "output-format": Output file formats that the entry generates (ex: "BAM").
Currently, each of the above ontologies is derived from one of the four main subontologies of the [EDAM ontology](https://edamontology.org).
To convert the EDAM ontology to our six ontologies, we apply the following steps:
1. Download a recent tagged version of the EDAM OWL file (XML).
2. Convert the EDAM file to a simplified JSON representation (see below). We use this simplified format in subsequent steps.
3. Map British spellings to American spellings.
4. Correct misspellings and improve definitions.
5. Produce each of the target ontologies by extracting the appropriate hierarchy from the simplified-and-processed EDAM representation, then adjusting further, as necessary.
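Steps 3 and 5 above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code in this repo: `BRITISH_TO_AMERICAN`, `americanize`, and `extract_hierarchy` are hypothetical names, the two-entry spelling map is a placeholder, and each node is assumed to carry the `id` and `parent_ids` properties of the simplified JSON format.

```python
import re

# Placeholder spelling map (keys must be lowercase); the repo's real mapping
# is not shown in this README.
BRITISH_TO_AMERICAN = {"modelling": "modeling", "visualisation": "visualization"}


def americanize(text, spelling_map=BRITISH_TO_AMERICAN):
    """Replace whole-word British spellings, keeping a leading capital letter."""
    if not spelling_map:
        return text

    def swap(match):
        word = match.group(0)
        replacement = spelling_map[word.lower()]
        if word[0].isupper():
            return replacement[0].upper() + replacement[1:]
        return replacement

    pattern = r"\b(" + "|".join(map(re.escape, spelling_map)) + r")\b"
    return re.sub(pattern, swap, text, flags=re.IGNORECASE)


def extract_hierarchy(nodes, root_id):
    """Return the root node and all of its descendants, in input order."""
    # Index children by parent ID so the DAG can be walked downward.
    children = {}
    for node in nodes:
        for parent_id in node["parent_ids"]:
            children.setdefault(parent_id, []).append(node["id"])

    # Depth-first walk from the root; the set guards against revisiting
    # nodes that are reachable through more than one parent.
    keep, stack = set(), [root_id]
    while stack:
        current = stack.pop()
        if current not in keep:
            keep.add(current)
            stack.extend(children.get(current, []))

    # Drop references to parents that fall outside the extracted hierarchy.
    return [
        {**node, "parent_ids": [p for p in node["parent_ids"] if p in keep]}
        for node in nodes
        if node["id"] in keep
    ]
```

For example, `extract_hierarchy(nodes, "operation")` would keep only the node whose ID is `operation` and everything beneath it, which is roughly what producing one target ontology from the processed EDAM representation requires before any further adjustment.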
We represent processed EDAM and each target ontology in a simplified JSON format, as a list of objects, each of which represents an ontology DAG node. Each node object has the following properties:
* "id": Unique human-readable node ID, consisting solely of lowercase alphanumeric characters and dashes, in a form that can be used as the name of a corresponding Dockstore category.
* "label": Short term, similar to a title, that describes what the node represents.
* "definition": More detailed description of the node, up to several sentences long.
* "source": A representation of the canonical origin of the node. If the node was derived from EDAM, this is the canonical EDAM URL.
* "categorical": A boolean that indicates whether, during classification, the node should be represented by a Dockstore category.
* "parent_ids": A list of the IDs of this node's parents.
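To make the node format concrete, here is a small Python sketch of two nodes and a checker for the properties listed above. The node values are illustrative placeholders (the second node's `source` in particular is not a real URL from the generated files), and `validate_nodes`, `ID_PATTERN`, and `FIELD_TYPES` are hypothetical names rather than part of this repo.

```python
import re

# IDs consist solely of lowercase alphanumeric characters and dashes.
ID_PATTERN = re.compile(r"[a-z0-9-]+")

# Expected Python type for each node property.
FIELD_TYPES = {
    "id": str,
    "label": str,
    "definition": str,
    "source": str,
    "categorical": bool,
    "parent_ids": list,
}

# Illustrative nodes only; real values come from the generated JSON files.
EXAMPLE_NODES = [
    {
        "id": "operation",
        "label": "Operation",
        "definition": "Root of the operation hierarchy.",
        "source": "http://edamontology.org/operation_0004",
        "categorical": False,
        "parent_ids": [],
    },
    {
        "id": "sequence-quality-control",
        "label": "Sequence quality control",
        "definition": "Assess the quality of sequencing data.",
        "source": "https://example.org/placeholder",  # placeholder, not a real source
        "categorical": True,
        "parent_ids": ["operation"],
    },
]


def validate_nodes(nodes):
    """Return a list of problems found; an empty list means no problems."""
    problems = []
    known_ids = {n["id"] for n in nodes if isinstance(n.get("id"), str)}
    seen = set()
    for index, node in enumerate(nodes):
        name = node.get("id", f"<node {index}>")
        for field, expected in FIELD_TYPES.items():
            if not isinstance(node.get(field), expected):
                problems.append(f"{name}: '{field}' missing or not {expected.__name__}")
        node_id = node.get("id")
        if isinstance(node_id, str):
            if not ID_PATTERN.fullmatch(node_id):
                problems.append(f"{name}: id has characters other than a-z, 0-9, '-'")
            if node_id in seen:
                problems.append(f"{name}: duplicate id")
            seen.add(node_id)
        parents = node.get("parent_ids")
        if isinstance(parents, list):
            for parent_id in parents:
                if not isinstance(parent_id, str) or parent_id not in known_ids:
                    problems.append(f"{name}: unknown parent {parent_id!r}")
    return problems
```

Calling `validate_nodes(EXAMPLE_NODES)` returns an empty list. A check like this could be run against each generated file to catch malformed IDs or dangling parent references, though whether the repo performs such a check is not stated here.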
## Build
Run `build.sh` to generate the JSON file corresponding to each ontology.