https://github.com/flyconnectome/hnf
Documentation for the hierarchical neuron format
https://github.com/flyconnectome/hnf
annotations data dotprops hdf5 mesh neurons skeleton storage
Last synced: 5 months ago
JSON representation
Documentation for the hierarchical neuron format
- Host: GitHub
- URL: https://github.com/flyconnectome/hnf
- Owner: flyconnectome
- License: gpl-3.0
- Created: 2021-02-15T13:35:41.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2021-02-15T13:59:07.000Z (over 5 years ago)
- Last Synced: 2024-01-27T12:09:54.423Z (over 2 years ago)
- Topics: annotations, data, dotprops, hdf5, mesh, neurons, skeleton, storage
- Homepage:
- Size: 16.6 KB
- Stars: 0
- Watchers: 5
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
## Hierarchical Neuron Format
The Hierarchical Neuron Format (HNF) is a schema for storing neuron morphologies
and meta data in Hdf5 files.
We provide read/write implementations for R and Python:
- for R see [nat.hdf5](https://github.com/schlegelp/nat.hdf5)
- for Python see [navis](https://navis.readthedocs.io/en/latest/)
## Preamble
There are a few file formats that can store neuron morphology. To name but a few:
- [SWC](http://www.neuronland.org/NLMorphologyConverter/MorphologyFormats/SWC/Spec.html)
for simple skeletons
- [NeuroML](https://en.wikipedia.org/wiki/NeuroML) is an XML-based format
primarily used for modelling but can store compartment models (i.e. skeletons)
of neurons and meta data
- [NWB](https://www.nwb.org) (neurodata without borders) is an HDF5-based format
focused on physiological data
- [NRRD](http://teem.sourceforge.net/nrrd/format.html) files can be used to
store dotprops
_Why then start a new format?_
Because none of the existing formats tick all the boxes! We need a file format
that can hold:
1. thousands of neurons
2. multiple representations (mesh, skeleton, dotprops) of a given neuron
3. annotations (e.g. synapses) associated with each neuron
4. meta data such as names, soma positions, etc.
Enter HDF5: basically a filesystem-in-a-file. The important thing is that we
don't have to worry about how data is en-/decoded because other libraries
(like `h5py` for Python or `hdf5r` for R) take care of that. All we have to
do is come up with a schema.
### Schema
HDF5 knows "groups" (=folders), "datasets" and "attributes". The basic idea for
our schema is this:
- the `root` contains info about the format as attributes
- each group in `root` represents a neuron and the group's name is the neuron's ID
- a neuron group holds and meta data, and the neuron's representations (mesh,
skeleton and/or dotprops) and annotations in separate sub-groups
To illustrate the basic principle:
```
.
├── attrs: format-related meta data
├── group: neuron1
│ ├── attrs: neuron-related meta data
│ ├── group: skeleton
│ | ├── attrs: skeleton-related meta data
| | └── datasets: node table, etc
│ ├── group: dotprops
│ | ├── attrs: dotprops-related meta data
| | └── datasets: points, tangents, alpha, etc
│ ├── group: mesh
│ | ├── attrs: mesh-related meta data
| | └── datasets: vertices, faces, etc
| └── group: annotations
| └── group: e.g. connectors
| ├── attrs: connector-related meta data
| └── datasets: connector data
├── group: neuron2
| ├── ...
...
```
#### Root attributes
The root meta data must contain two attributes:
- `format_spec` specifies format and version
- `format_url` points to a library or format specifications
```
.
├── attr['format_spec']: str = 'hnf_v1'
├── attr['format_url']: str = 'https://github.com/schlegelp/navis'
...
```
#### Neuron base groups
Each neuron group contains properties that apply to all the neuron's potential
representations - for example a `neuron_name`. Note that if an attribute is
defined at the neuron level and again at a deeper level (i.e. the skeleton,
mesh or dotprops), the more proximal attribute takes precedence for a given
representation.
```
.
└── group['123456'] # note that numeric IDs will be "stringified"
├── attr["neuron_name"]: str = "some name"
...
```
#### Skeletons
Attributes:
- `units_nm` (float | int | tuple, optional): specifies the units in
nanometer space - can be a tuple of `(x, y, z)` if units are
non-isotropic
- `soma` (int, optional): the node ID of the soma
Datasets:
- `node_id` (int): IDs for the nodes
- `parent_id` (int): for each node, the ID of it's parent; nodes with
out parents (i.e. roots) have `parent_id` of `-1`
- `x`, `y`, `z` (float | int): node coordinates
- `radius` (float | int, optional): radius for each node
```
└── group['123456']
├── attr['neuron_name'] = "example neuron with a skeleton"
├── attr['units_nm'] = (4, 4, 40)
└── grp['skeleton']
├── attr['soma']: 1
├── ds['node_id']: (N, ) array
├── ds['parent_id']: (N, ) array
├── ds['x']: (N, ) array
├── ds['y']: (N, ) array
├── ds['z']: (N, ) array
└── ds['radius']: (N, ) array, optional
```
#### Meshes
Meshes are principally represented as vertices + triangular faces (`navis`
is using trimesh under the hood).
Attributes:
- `units_nm` (float | int | tuple, optional): specifies the units in
nanometer space - can be a tuple of `(x, y, z)` if units are
non-isotropic
- `soma` (tuple, optional): tuple of `(x, y, z)` coordinates of the soma
Datasets:
- `vertices` (int | float): (N, 3) array of vertex positions
- `faces` (int): (M, 3) array of vertex indices forming the faces (indices start
at 0)
- `skeleton_map` (int, optional): (N, ) array mapping each vertex to a
node ID in the skeleton
```
└── group['4353421']
├── attr['neuron_name'] = "example neuron with a mesh"
├── attr['units_nm'] = (4, 4, 40)
└── grp['mesh']
├── attr['soma']: (1242, 6533, 400)
├── ds['vertices']: (N, 3) array
├── ds['faces']: (M, 3) array
└── ds['skeleton_map']: (N, ) array, optional
```
#### Dotprops
Attributes:
- `k` (int): number of k-nearest neighbours used to calculate the tangent
vectors from the point cloud
- `units_nm` (float | int | tuple, optional): specifies the units in
nanometer space - can be a tuple of `(x, y, z)` if units are
non-isotropic
- `soma` (tuple, optional): tuple of `(x, y, z)` coordinates of the soma
Datasets:
- `points` (int | float): (N, 3) array of x/y/z positions
- `vect` (int | float, optional): (N, 3) array of tangent vectors -
generated if not provided
- `alpha` (int | float, optional): (N, ) array of alpha values for each
point in ``points`` generated if not provided
```
└── group['65432']
├── attr['neuron_name'] = "example neuron with dotprops"
└── grp['dotprops']
├── attr['k'] = 5
├── attr['units_nm'] = (4, 4, 40)
├── attr['soma']: (1242, 6533, 400)
├── ds['points']: (N, 3) array
├── ds['vect']: (N, 3) array
└── ds['alpha']: (N, ) array
```
#### Annotations
Annotations are meant to be flexible and are principally parsed into
pandas DataFrames. Because they won't follow a common format, it is
good practice to leave some (optional) meta data pointing to columns
containing data relevant for e.g. plotting:
Attributes:
- `point_col` (str | list thereof): pointer to the column(s) containing
x/y/z positions
- `type_col` (str): pointer to a column specifying types
- `skeleton_map` (str): pointer to a column associating the row with
a node ID in the skeleton
Let's illustrate this with a mock synapse table:
```
└── group['32434566']
├── attr['neuron_name'] = "example neuron with synapse annotations"
├── attr['units_nm'] = 1
└── grp['annotations']
└── grp['synapses']
├── attr['points']: ['x', 'y', 'z']
├── attr['types']: 'prepost'
├── attr['skeleton_map']: 'node_id'
├── ds['x']: (N, ) array
├── ds['x']: (N, ) array
├── ds['z']: (N, ) array
├── ds['prepost']: (N, ) array of [0, 1, 2, 3, 4]
└── ds['node_id']: (N, )
```
#### "Hidden" attributes & datasets
It can be useful to have attributes and datasets that contain information that's
only pertinent for the reader/writer but does not directly relate to the neuron.
For this, we prefix the attribute/dataset with a `.`:
```
└── group['4353421']
├── attr['neuron_name'] = "example neuron with a mesh"
├── attr['units_nm'] = (4, 4, 40)
├── attr['.hidden_attribute'] = "typically ignored when reading"
└── grp['mesh']
├── attr['soma']: (1242, 6533, 400)
```
We use hidden attributes to e.g. store a serialized version of a neuron instead/
in addition to the raw data to speed up reading the data.
### A final remark
The above schema describes a "minimal" layout - i.e. we expect no less
data than that. However, e.g. the `navis` implementations for reading/writing
the schema are flexible: you can add more attributes or datasets
and `navis` will by default try to read and attach them to the neuron.
### Is this stable?
Ish? The format is versioned and I will maintain readers/writers for
past versions in ``navis``. In other good news: the HDF5 backend is
stable - so even if `navis` acts up when parsing your file, you can
always read it manually using `h5py`.
### Changelog
The current version of the format is 1.0.
Changes:
- 2021/02/01: Version 1.0