https://github.com/sdsc-ordes/cat-plus-zarr-converters
This repository hosts the CAT+ Zarr converters for different data formats
- Host: GitHub
- URL: https://github.com/sdsc-ordes/cat-plus-zarr-converters
- Owner: sdsc-ordes
- License: cc-by-sa-4.0
- Created: 2024-10-10T12:24:11.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-12-20T11:39:19.000Z (about 1 month ago)
- Last Synced: 2024-12-20T12:33:06.284Z (about 1 month ago)
- Topics: rdf, rust, zarr
- Language: Rust
- Homepage: https://swisscatplus.ch
- Size: 174 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# Cat+ Zarr Converters
## About
This repository contains all the Zarr converters for the different data types in the Cat+ project (Agilent, UV, IR, etc.).
The data types all come in different formats, with data and metadata mixed together. The goal is to convert the metadata to [an established ontology](https://github.com/sdsc-ordes/cat-plus-ontology/tree/main) and, as far as the data format allows, to convert the data to [Zarr arrays](https://zarr.readthedocs.io/en/stable/index.html).
## Tools
### synth-converter
The synth-converter parses a JSON input into an RDF graph and serializes the graph to either Turtle or JSON-LD.
It expects the input to conform to the Cat+ ontology and to the struct defined in `synth-converter/src/batch.rs`. An example input file is provided in `example/1-Synth.json`.
#### Usage
The `synth-converter` has three parameters:
- inputfile: path to the input file (relative to the top level of the repo, or absolute)
- outputfile: path to the output file (relative to the top level of the repo, or absolute)
- format: output serialization; the default is `ttl`, the other option is `jsonld`

The `synth-converter` turns the inputfile into an RDF graph, serializes it to either Turtle or JSON-LD, and writes the result to the outputfile.
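The format rule above (default `ttl`, alternative `jsonld`) can be sketched in plain Rust as follows. This is only an illustration using the standard library: the names `RdfFormat`, `UnknownFormat` and `resolve_format` are hypothetical and are not taken from the synth-converter source, which may parse its arguments differently.

```rust
use std::fmt;
use std::str::FromStr;

/// Output serializations named in this README.
/// Hypothetical type, not taken from the synth-converter source.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum RdfFormat {
    /// Turtle, the documented default ("ttl").
    #[default]
    Turtle,
    /// JSON-LD, selected with "jsonld".
    JsonLd,
}

/// Error for a format string that is neither "ttl" nor "jsonld".
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownFormat(pub String);

impl fmt::Display for UnknownFormat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown format '{}', expected 'ttl' or 'jsonld'", self.0)
    }
}

impl std::error::Error for UnknownFormat {}

impl FromStr for RdfFormat {
    type Err = UnknownFormat;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ttl" => Ok(RdfFormat::Turtle),
            "jsonld" => Ok(RdfFormat::JsonLd),
            other => Err(UnknownFormat(other.to_string())),
        }
    }
}

/// Resolve an optional `--format` value, falling back to the default (Turtle).
pub fn resolve_format(arg: Option<&str>) -> Result<RdfFormat, UnknownFormat> {
    match arg {
        Some(value) => value.parse(),
        None => Ok(RdfFormat::default()),
    }
}
```

Modelling the format as an enum rather than a free string means an unsupported value is rejected once, at the boundary, instead of deep inside the serializer. In the repository itself, the converter is invoked through `just`: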
```
just run example/1-Synth.json output.ttl
just run example/1-Synth.json output.json --format jsonld
```
### Architecture
The JSON input is read with `serde_json`; the transformation of fields is described in the struct in `synth-converter/src/batch.rs`.
The graph is built in `synth-converter/src/graph/graph_builder.rs` using `sophia_rs`. Apart from `rdf` and `xsd`, which are built-in namespaces in `sophia_rs`, all namespaces and terms are provided as constants in `synth-converter/src/graph/namespaces`. This makes the code more readable and ensures that the RDF IRIs and namespaces are controlled and spelt correctly.
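The idea of keeping namespaces and terms as constants can be illustrated with a small standard-library sketch. It does not use `sophia_rs` and does not reproduce the repository's `namespaces` module: the `Term` type, the constant names and the `example.org` IRIs are placeholders invented for this example. Only the `rdf` namespace IRI is the real one.

```rust
/// The real RDF namespace IRI.
pub const RDF_NS: &str = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
/// Placeholder namespaces for this sketch; NOT the real Cat+ ontology IRIs.
pub const CAT_NS: &str = "http://example.org/cat#";
pub const UNIT_NS: &str = "http://example.org/unit#";

/// A term is a namespace plus a local name, both fixed at compile time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Term {
    pub namespace: &'static str,
    pub local: &'static str,
}

impl Term {
    pub const fn new(namespace: &'static str, local: &'static str) -> Self {
        Term { namespace, local }
    }

    /// Full IRI of the term (namespace followed by local name).
    pub fn iri(&self) -> String {
        format!("{}{}", self.namespace, self.local)
    }
}

// Each IRI is spelt exactly once, here. A misspelt constant name elsewhere
// in the code is a compile error rather than a silently wrong IRI.
pub const RDF_TYPE: Term = Term::new(RDF_NS, "type");
pub const CAT_BATCH: Term = Term::new(CAT_NS, "Batch"); // hypothetical term
pub const UNIT_CELSIUS: Term = Term::new(UNIT_NS, "DegreeCelsius"); // hypothetical term

/// Render one triple in N-Triples syntax, using IRI terms only.
pub fn n_triple(subject_iri: &str, predicate: Term, object: Term) -> String {
    format!("<{}> <{}> <{}> .", subject_iri, predicate.iri(), object.iri())
}
```

With this layout, graph-building code refers to `CAT_BATCH` or `RDF_TYPE` instead of repeating string literals, so a typo surfaces at compile time.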
Graph serializers and parsers are provided in `synth-converter/src/rdf`. The Turtle serializer there is needed for the tests.
The conversion itself is done in the public module `synth-converter/src/convert.rs`.
### SHACL Validation
The RDF graph conforms to the Cat+ ontology: https://github.com/sdsc-ordes/cat-plus-ontology. Rust currently offers no SHACL validation library; once one exists, it would make sense to add SHACL validation to the converter.
SHACL validation can be done manually here: https://www.itb.ec.europa.eu/shacl/any/upload
## Installation guidelines
The repo is set up with Nix.
```
git clone git@github.com:sdsc-ordes/cat-plus-zarr-converters.git
cd cat-plus-zarr-converters
cargo build
```
From here on, you can run the Rust commands via the justfile:
```
just --list
Available recipes:
build *args # Build the synth-converter.
default # Default recipe to list all recipes.
nix-develop *args # Enter a Nix development shell.
run input_file output_file *args # Run the synth-converter.
test *args # Test the synth-converter.
fmt *arg # Format the synth-converter.
```
### Tests
Run the tests with `just test`. So far there are only integration tests. They ensure that the Turtle serialization of the graph is isomorphic to an expected Turtle serialization for each valid substructure of the input data. These substructures are the actions that occur in the synthesis process.
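To show what such a comparison checks, here is a simplified standard-library sketch. It covers only graphs without blank nodes, where isomorphism reduces to equality of the two triple sets regardless of order. Full RDF graph isomorphism must also match blank nodes up to renaming, which this sketch does not attempt. The `Triple` alias and the function names are hypothetical and are not the helpers used in the repository's tests.

```rust
use std::collections::BTreeSet;

/// A ground triple (subject, predicate, object) as plain strings.
/// Assumption of this sketch: no blank nodes occur.
pub type Triple = (String, String, String);

/// Convenience constructor for a triple.
pub fn triple(s: &str, p: &str, o: &str) -> Triple {
    (s.to_string(), p.to_string(), o.to_string())
}

/// For blank-node-free graphs, isomorphism is set equality of the triples:
/// order and duplicates in the serialization do not matter.
pub fn same_ground_graph(actual: &[Triple], expected: &[Triple]) -> bool {
    let a: BTreeSet<&Triple> = actual.iter().collect();
    let e: BTreeSet<&Triple> = expected.iter().collect();
    a == e
}

/// Returns (missing, unexpected): triples only in `expected` and triples
/// only in `actual`. Useful for a readable test failure message.
pub fn graph_diff<'a>(
    actual: &'a [Triple],
    expected: &'a [Triple],
) -> (Vec<&'a Triple>, Vec<&'a Triple>) {
    let a: BTreeSet<&Triple> = actual.iter().collect();
    let e: BTreeSet<&Triple> = expected.iter().collect();
    let missing: Vec<&Triple> = e.difference(&a).copied().collect();
    let unexpected: Vec<&Triple> = a.difference(&e).copied().collect();
    (missing, unexpected)
}
```

A test built on this would convert one input substructure, parse both the produced and the expected serialization into triples, and assert `same_ground_graph`, printing the result of `graph_diff` on failure.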
### Contribute
The repo is a PoC (proof of concept) under heavy development and is not yet ready to take contributions.