https://github.com/plantinformatics/pretzel-data
Data structure for https://github.com/plantinformatics/pretzel
- Host: GitHub
- URL: https://github.com/plantinformatics/pretzel-data
- Owner: plantinformatics
- License: gpl-3.0
- Created: 2018-02-23T05:32:54.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2025-02-13T11:49:55.000Z (4 months ago)
- Last Synced: 2025-02-13T12:35:08.048Z (4 months ago)
- Homepage:
- Size: 3.13 MB
- Stars: 0
- Watchers: 5
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# pretzel-data
Data structure resources for plantinformatics/pretzel.
## High-level structure
Data is organised into *datasets*, which contain *blocks*. Think of a *dataset* as a container for a
collection of data that can be naturally divided into subsets, each of which forms a *block*. Every
block must have a *scope* that distinguishes it from the other blocks.

For example, we can define a simple physical genome as follows:
```json
{
  "name": "myGenome",
  "meta": {
    "year": "2018"
  },
  "blocks": [
    {
      "scope": "1A",
      "featureType": "linear",
      "range": [1, 500000000]
    },
    {
      "scope": "1B",
      "featureType": "linear",
      "range": [1, 450000000]
    }
  ]
}
```

`myGenome` has two chromosomes, `1A` and `1B`, which correspond to *blocks* here. `featureType`
indicates the type of feature contained in the block. At present all data is `linear`: a block
defines a range, and its features are positions or sub-ranges within that range. Future data such
as genotypes is planned to use an `observational` type.

The `meta` field contains a set of arbitrary key-value pairs and may be empty. It can be used to
record any associated metadata, such as publication DOI, year, source, details of the organism,
variety, etc.

The above dataset defines a physical genome with two chromosomes of the given sizes. This is like
defining the "reference" in a genome viewer tool. Next, we want to define an annotation inside this
space.

## Defining features inside a dataset
We define features, such as genes, inside another dataset by using the field `parent`:
```json
{
  "name": "myAnnotation",
  "parent": "myGenome",
  "namespace": "myGenome:myAnnotation",
  "blocks": [
    {
      "scope": "1A",
      "featureType": "linear",
      "features": [
        {
          "name": "my1AGene1",
          "range": [3000, 5150]
        },
        ...
```

By setting `parent` to `myGenome` (defined previously), we indicate that each `scope` referenced in
this dataset refers to a block of the parent dataset. Here we have defined a gene `my1AGene1`
spanning positions 3000 to 5150 on chromosome (`scope`) `1A`. Negative orientation of a gene can be
indicated by making the second value in its range smaller than the first.
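The sketch below illustrates, in hypothetical Python that is not part of Pretzel, how a consumer of this format might interpret the two example datasets: it resolves each `scope` in `myAnnotation` against the blocks of its `parent`, and turns a reversed `range` into a start/end pair plus a strand. The helper names (`resolve_features`, `Feature`), the extra negative-orientation feature `my1AGene2`, and the check that a feature must lie inside its parent block's range are all assumptions made for illustration; the format description above does not specify them.

```python
from dataclasses import dataclass

# Example datasets mirroring the README. "my1AGene2" is a HYPOTHETICAL
# negative-orientation feature added only for illustration.
GENOME = {
    "name": "myGenome",
    "meta": {"year": "2018"},
    "blocks": [
        {"scope": "1A", "featureType": "linear", "range": [1, 500000000]},
        {"scope": "1B", "featureType": "linear", "range": [1, 450000000]},
    ],
}

ANNOTATION = {
    "name": "myAnnotation",
    "parent": "myGenome",
    "namespace": "myGenome:myAnnotation",
    "blocks": [
        {
            "scope": "1A",
            "featureType": "linear",
            "features": [
                {"name": "my1AGene1", "range": [3000, 5150]},
                {"name": "my1AGene2", "range": [9000, 7000]},  # hypothetical
            ],
        }
    ],
}


@dataclass
class Feature:
    """A feature placed on a parent block (hypothetical helper type)."""
    scope: str
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def resolve_features(child: dict, datasets: dict) -> list:
    """Resolve each block's scope in `child` against its parent dataset's blocks."""
    parent = datasets[child["parent"]]
    parent_blocks = {b["scope"]: b for b in parent["blocks"]}
    resolved = []
    for block in child["blocks"]:
        scope = block["scope"]
        if scope not in parent_blocks:
            raise ValueError(f"scope {scope!r} not found in parent {parent['name']!r}")
        lo, hi = parent_blocks[scope]["range"]
        for feat in block.get("features", []):
            a, b = feat["range"]
            # A second value smaller than the first marks negative orientation.
            strand = "-" if b < a else "+"
            start, end = min(a, b), max(a, b)
            # ASSUMPTION: a feature must lie within its parent block's range.
            if start < lo or end > hi:
                raise ValueError(f"{feat['name']} lies outside block {scope}")
            resolved.append(Feature(scope, feat["name"], start, end, strand))
    return resolved


if __name__ == "__main__":
    datasets = {d["name"]: d for d in (GENOME, ANNOTATION)}
    for feature in resolve_features(ANNOTATION, datasets):
        print(feature)
```

In practice the datasets would more likely be read from JSON files (for example with Python's `json.load`) and indexed by `name`, which is what the `datasets` dictionary stands in for here. The `namespace` field is carried along but not interpreted, since its semantics are not described above.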