https://github.com/obophenotype/brain_data_standards_ontologies
A repository for co-ordinating work on ontologies for the Brain Data Standards Project
anatomy-ontology brain brain-data cell-types metadata neuroanatomy neuron obofoundry ontology
- Host: GitHub
- URL: https://github.com/obophenotype/brain_data_standards_ontologies
- Owner: obophenotype
- License: apache-2.0
- Created: 2020-09-28T18:57:54.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2025-07-04T20:54:18.000Z (11 months ago)
- Last Synced: 2025-07-04T21:31:05.044Z (11 months ago)
- Topics: anatomy-ontology, brain, brain-data, cell-types, metadata, neuroanatomy, neuron, obofoundry, ontology
- Language: Python
- Homepage:
- Size: 300 MB
- Stars: 11
- Watchers: 7
- Forks: 3
- Open Issues: 32
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# Brain Data Standards Ontologies 
A repository for building ontologies for the Brain Data Standards Project.
Status: Draft
### Cite:
[BioRxiv Preprint](https://www.biorxiv.org/content/10.1101/2021.10.10.463703)
### Overview:
The main purpose of this repo is to automate data-driven cell-type ontology development for the Brain Data Standards initiative. The main inputs are:
1. Dendrograms in JSON format, provided by the Allen Institute, encoding data-driven classification of brain cell types. These files also include a nomenclature standard (and mapping system) developed by the Allen Institute: https://arxiv.org/abs/2006.05406. See [dendrogram spec](https://github.com/obophenotype/brain_data_standards_ontologies/blob/master/doc/dendrogram_spec.md) for details.
2. CSV files identifying and summarising dendrograms, including species and anatomical region
3. CSV mapping files that combine dendrogram nodes into groupings that do not correspond to any single dendrogram node, but do correspond to known cell types.
4. Marker files (robot templates) that map marker combinations with high predictive capacity for dendrogram nodes (generated by NS-Forest) onto those nodes
5. Automatically seeded, manually curated robot templates mapping nodes to classes in CL and to various properties (e.g. soma location)
***Figure 1: Build overview***

The build system is an extended version of the [Ontology Development Kit](https://github.com/INCATools/ontology-development-kit) - an automated ontology build system using [ROBOT](http://robot.obolibrary.org/) and Makefiles. As well as managing the build from input files, it also automatically generates modules from referenced ontologies and integrates these into the build.
### Schema
[Schema doc](https://github.com/obophenotype/brain_data_standards_ontologies/blob/master/doc/ontology_schema.md)
### Building
You will need Docker installed. Running a build will pull the required containers with all required dependencies.
To build:
```sh
cd src/ontology
sh ./run.sh make prepare_release
```
This dynamically updates imports and builds reasoned release files. The slowest part of the build is mirroring (downloading and reserialising) external ontologies. If you've run a build recently, mirrored versions will already be stored in src/ontology/mirror. To run a build without mirroring:
```sh
cd src/ontology
sh ./run.sh make prepare_release MIR=false
```
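When the build is driven from a script (for example in CI), the two invocations above can be assembled programmatically. The following is a minimal sketch, not part of this repo: `build_command` and `run_build` are hypothetical helper names, and the default working directory simply mirrors the `cd src/ontology` step shown above.

```python
import subprocess
from typing import List


def build_command(mirror: bool = True) -> List[str]:
    """Return the argument list for a release build, as in the shell examples."""
    cmd = ["sh", "./run.sh", "make", "prepare_release"]
    if not mirror:
        # MIR=false reuses the ontologies already stored in src/ontology/mirror
        cmd.append("MIR=false")
    return cmd


def run_build(mirror: bool = True, cwd: str = "src/ontology") -> None:
    """Run the build from the ontology directory (hypothetical wrapper; needs Docker)."""
    subprocess.run(build_command(mirror), cwd=cwd, check=True)
```

Calling `run_build(mirror=False)` would be equivalent to the second shell example; `check=True` makes a failed build raise an exception instead of passing silently.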
To extend the set of ontologies that are imported from, edit [bdscratch-odk.yaml](https://github.com/obophenotype/brain_data_standards_ontologies/blob/master/src/ontology/bdscratch-odk.yaml) to add the required ontology to import_group.products, then run:
```sh
sh ./run.sh make update_repo
```
Then update the import statements in src/ontology/bdscratch-edit.owl.
### Extensions to the standard ODK Makefile build
Extensions to the build are specified (as per ODK standard) in [bdscratch.Makefile](https://github.com/obophenotype/brain_data_standards_ontologies/blob/master/src/ontology/bdscratch.Makefile).
#### Building robot templates from Dendrograms
Dendrograms live in [/src/dendrograms/](https://github.com/obophenotype/brain_data_standards_ontologies/blob/master/src/dendrograms/). They are named according to their Allen Dendrogram ID, e.g. CCN201908210.json
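Because each dendrogram file is named after its accession, the set of available taxonomies can be recovered from the directory listing alone. The sketch below illustrates that convention; `dendrogram_accessions` is a hypothetical helper, not a script from this repo, and it assumes every `*.json` file in the directory is a dendrogram.

```python
from pathlib import Path
from typing import List


def dendrogram_accessions(dendrogram_dir: str) -> List[str]:
    """Return the sorted accession IDs (file names without .json) in a directory."""
    return sorted(path.stem for path in Path(dendrogram_dir).glob("*.json"))
```

For example, `dendrogram_accessions("src/dendrograms")` would include `CCN201908210` if `CCN201908210.json` is present.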
We expect dendrograms to remain stable for relatively long periods of time, and at least some generated Robot templates are intended to be manually edited to map to CL classes / property-driven classification. For these reasons, we store generated templates in the repo and build them as needed using a separate Makefile - [src/dendrograms/Makefile](https://github.com/obophenotype/brain_data_standards_ontologies/blob/master/src/dendrograms/Makefile).
To build (be careful you don't wipe out curation!):
```sh
cd src/dendrograms
# Build all
sh ./run.sh make
# Build specific template
sh ./run.sh make
# Build a specific set of templates
sh ./run.sh make JOBS=
```
Templates are built from dendrograms using Python scripts in [src/scripts](https://github.com/obophenotype/brain_data_standards_ontologies/tree/master/src/scripts).
Extended information about groupings of taxonomy nodes that are candidates for curation is stored in additional TSV files (accession.tsv).
Support for incorporating this information into templates is TBA.
### Robot templates
Robot templates live in [/src/templates/](https://github.com/obophenotype/brain_data_standards_ontologies/blob/master/src/templates/).
filename | e.g. | Description
-- | -- | --
{accession}.tsv | CCN201810310.tsv | Template for generating taxonomy as OWL individuals
{accession}\_class.tsv | CCN201810310_class.tsv | Templates for generating classes corresponding to OWL individuals in taxonomy. Includes slots for curating cell type & properties
{accession}\_markers.tsv | CCN201810310_markers.tsv | Templates for adding markers. Referenced markers must be present in gene reference files.
{ensembl_gene_file}.tsv | ensmusg.tsv | Robot template listing all genes (all possible markers) for analysis/dendrograms of a specific species.

The ensembl_gene_file name follows standard Ensembl ID prefixes, but in lowercase, e.g. ensmusg.tsv (Ensembl mouse gene) contains genes with IDs of the form ENSMUSG{numeric_accession}.
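The naming conventions in the table above can be expressed as two small functions. This is a sketch under stated assumptions: `template_files` and `gene_file_for` are hypothetical names, and the regular expression assumes Ensembl gene IDs consist of an `ENS` prefix, an optional species code, a `G`, and digits. Whether a gene file exists in the repo for any given species is not something this code checks.

```python
import re
from typing import Dict

# Assumed shape of an Ensembl gene ID: ENS + optional species letters + G + digits
_GENE_ID = re.compile(r"^(ENS[A-Z]*G)\d+$")


def template_files(accession: str) -> Dict[str, str]:
    """Map a taxonomy accession to the template file names listed in the table."""
    return {
        "taxonomy": f"{accession}.tsv",
        "classes": f"{accession}_class.tsv",
        "markers": f"{accession}_markers.tsv",
    }


def gene_file_for(gene_id: str) -> str:
    """Derive the gene reference template name from an Ensembl gene ID prefix."""
    match = _GENE_ID.match(gene_id)
    if match is None:
        raise ValueError(f"not an Ensembl gene ID: {gene_id!r}")
    # The file name is the ID prefix in lowercase, e.g. ENSMUSG -> ensmusg.tsv
    return match.group(1).lower() + ".tsv"
```

For instance, `gene_file_for("ENSMUSG00000012345")` yields `ensmusg.tsv`, and `template_files("CCN201810310")["markers"]` yields `CCN201810310_markers.tsv`.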
#### Markers
Markers are referenced by Ensembl ID using an [identifiers.org URL scheme](https://registry.identifiers.org/registry/ensembl).
Ensembl gene file templates are used to generate mirror files, which act as source files for import generation, so that only referenced markers end up in the release files.
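As an illustration of the marker references described above, the sketch below builds an identifiers.org URL from an Ensembl gene ID. It assumes the compact-identifier form `https://identifiers.org/ensembl:{id}`; the exact URL form used in this repo's templates is not stated here, so check an existing marker template before relying on it. `marker_iri` and `gene_id_from_iri` are hypothetical helper names.

```python
import re

# Assumed URL prefix (identifiers.org compact-identifier form)
IDENTIFIERS_ORG_ENSEMBL = "https://identifiers.org/ensembl:"
_ENSEMBL_GENE_ID = re.compile(r"^ENS[A-Z]*G\d+$")


def marker_iri(gene_id: str) -> str:
    """Return an identifiers.org URL for an Ensembl gene ID."""
    if not _ENSEMBL_GENE_ID.match(gene_id):
        raise ValueError(f"not an Ensembl gene ID: {gene_id!r}")
    return IDENTIFIERS_ORG_ENSEMBL + gene_id


def gene_id_from_iri(iri: str) -> str:
    """Recover the gene ID; tolerates either ':' or '/' before the ID."""
    return re.split(r"[:/]", iri)[-1]
```

Round-tripping an ID through both functions returns the original ID, which gives a quick consistency check when validating marker templates against the gene reference files.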
### Reference Gene Files
GTF files used as reference for BDSO can be found in this [Google Drive folder](https://drive.google.com/drive/folders/1rOYwiIxGgEolWsO3a-7g6rxUefsXIcKB).