Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/materials-data-science-and-informatics/dirschema
Spec and validator for directories, files and metadata based on JSON Schema and regexes.
- Host: GitHub
- URL: https://github.com/materials-data-science-and-informatics/dirschema
- Owner: Materials-Data-Science-and-Informatics
- License: mit
- Created: 2021-11-15T08:31:52.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-12-07T09:34:43.000Z (about 1 year ago)
- Last Synced: 2025-01-16T10:58:51.992Z (8 days ago)
- Topics: json-schema, metadata, python, validation
- Language: Python
- Homepage: https://materials-data-science-and-informatics.github.io/dirschema/
- Size: 1.88 MB
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
README
![Project status](https://img.shields.io/badge/project%20status-alpha-%23ff8000)
[![Docs](https://img.shields.io/badge/read-docs-success)](https://materials-data-science-and-informatics.github.io/dirschema)
[![CI](https://img.shields.io/github/actions/workflow/status/Materials-Data-Science-and-Informatics/dirschema/ci.yml?branch=main&label=ci)](https://github.com/Materials-Data-Science-and-Informatics/dirschema/actions/workflows/ci.yml)
[![Test Coverage](https://materials-data-science-and-informatics.github.io/dirschema/main/coverage_badge.svg)](https://materials-data-science-and-informatics.github.io/dirschema/main/coverage)
[![PyPIPkgVersion](https://img.shields.io/pypi/v/dirschema)](https://pypi.org/project/dirschema/)

# dirschema
A directory structure and metadata linter based on JSON Schema.
[JSON Schema](https://json-schema.org/) is great for validating (files containing) JSON
objects that, e.g., contain metadata, but these are only the smallest pieces in the
organization of a whole directory structure, e.g. of some dataset or project.
Datasets of a certain kind might contain various types of data, with each file
requiring different accompanying metadata based on its file type and/or location.

**DirSchema** combines JSON Schemas and regexes into a solution to enforce structural
dependencies and metadata requirements in directories and directory-like archives.
With it you can, for example, check that:

* only files of a certain type are in a location (e.g. only `jpg` files in directory `img`)
* for each data file there exists a metadata file (e.g. `test.jpg` has `test.jpg_meta.json`)
* each metadata file is valid according to some JSON Schema

If validating these kinds of constraints looks appealing to you, this tool is for you!
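To make the example constraints concrete, here is a minimal sketch in plain Python (standard library only) of what such checks boil down to. It does **not** use DirSchema or its specification format: the rule regexes, the `check_paths` helper and the required `author`/`date` metadata keys are all hypothetical, and a simple required-keys test stands in for real JSON Schema validation. Metadata files are assumed to sit next to their data files, so they are exempt from the "only `jpg` in `img`" rule. Because the checks only look at relative paths and file contents, the same function can run over a ZIP archive's member list, loosely mirroring how an archive can be treated like a directory.

```python
import io
import json
import re
import zipfile

# Hypothetical rules mirroring the example constraints above.
# These are plain regexes for illustration, NOT DirSchema syntax.
IN_IMG = re.compile(r"^img/[^/]+$")                             # a file directly inside img/
ALLOWED_IN_IMG = re.compile(r"^img/[^/]+\.jpg(_meta\.json)?$")  # a .jpg file or its metadata file
META_SUFFIX = "_meta.json"
REQUIRED_KEYS = {"author", "date"}  # stand-in for a real JSON Schema


def check_paths(paths, read_text):
    """Map each offending file path to a list of error messages.

    `paths` are POSIX-style relative file paths (e.g. from ZipFile.namelist()
    or a directory walk); `read_text` returns a file's contents as a string.
    """
    path_set = set(paths)
    errors = {}
    for p in paths:
        msgs = []
        # 1. Only .jpg files (plus their metadata files) may sit in img/.
        if IN_IMG.match(p) and not ALLOWED_IN_IMG.match(p):
            msgs.append("only .jpg files and their metadata are allowed in img/")
        # 2. Every .jpg data file needs a companion metadata file.
        if p.endswith(".jpg") and p + META_SUFFIX not in path_set:
            msgs.append(f"missing metadata file {p + META_SUFFIX}")
        # 3. Every metadata file must be a JSON object with the required keys.
        if p.endswith(META_SUFFIX):
            try:
                meta = json.loads(read_text(p))
            except json.JSONDecodeError as exc:
                msgs.append(f"invalid JSON: {exc}")
            else:
                if not isinstance(meta, dict):
                    msgs.append("metadata must be a JSON object")
                else:
                    missing = REQUIRED_KEYS - set(meta)
                    if missing:
                        msgs.append(f"missing keys: {sorted(missing)}")
        if msgs:
            errors[p] = msgs
    return errors


# Build a small ZIP archive in memory to play the role of a dataset.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("img/a.jpg", b"")  # valid: jpg with metadata
    zf.writestr("img/a.jpg_meta.json", json.dumps({"author": "A", "date": "2021"}))
    zf.writestr("img/b.png", b"")  # wrong file type in img/
    zf.writestr("img/c.jpg", b"")  # metadata file is missing
    zf.writestr("data/d.jpg", b"")
    zf.writestr("data/d.jpg_meta.json", json.dumps({"author": "D"}))  # "date" is missing

with zipfile.ZipFile(buf) as zf:
    files = [n for n in zf.namelist() if not n.endswith("/")]  # skip directory entries
    errors = check_paths(files, lambda p: zf.read(p).decode("utf-8"))

for path, msgs in errors.items():
    print(path, msgs)
```

For a real directory, the paths could instead be collected with `pathlib.Path.rglob("*")` (keeping only files and converting each with `p.relative_to(root).as_posix()`), with `read_text` set to `lambda p: (root / p).read_text()`.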
**DirSchema features:**

* Built-in support for schemas and metadata stored as JSON or YAML
* Built-in support for checking contents of ZIP and HDF5 archives
* Extensible validation interface for advanced needs beyond JSON Schema
* Both a Python library and a CLI tool to perform the validation

## Installation

```
pip install dirschema
```

## Getting Started
The `dirschema` tool needs as input:

* a DirSchema YAML file (containing a specification), and
* a path to a directory or file (e.g. a ZIP file) that should be checked.

You can run it like this:

```
dirschema my_dirschema.yaml DIRECTORY_OR_ARCHIVE_PATH
```

If the validation is successful, there will be no output.
Otherwise, the tool will output a list of errors (e.g. invalid metadata, missing files, etc.).

You can also use `dirschema` from other Python code as a library:

```python
from dirschema.validate import DSValidator

# Returns an error dict; it is empty if the dataset is valid.
errors = DSValidator("/path/to/dirschema").validate("/dataset/path")
```

Like the CLI, the method reports errors: it returns an error dict, which is empty if the validation succeeded.
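When the library is called from a CI job, it can be handy to turn that error dict into a process exit status. The sketch below assumes nothing about the dict's structure beyond the documented fact that it is empty on success; the helper name `errors_to_exit_code` and the paths in the commented entry point are hypothetical and not part of `dirschema`.

```python
import io
import sys
from typing import Any, Mapping, Optional, TextIO


def errors_to_exit_code(errors: Mapping[Any, Any], stream: Optional[TextIO] = None) -> int:
    """Report each entry of a validation error dict and return an exit code.

    Relies only on an empty dict meaning "valid"; keys and values are
    printed as-is because their exact structure is not assumed here.
    """
    out = sys.stderr if stream is None else stream
    for key, value in errors.items():
        print(f"{key}: {value}", file=out)
    return 0 if not errors else 1


# Stand-in error dicts; in practice they would come from
# DSValidator("/path/to/dirschema").validate("/dataset/path").
log = io.StringIO()
print(errors_to_exit_code({}, log))                              # prints 0
print(errors_to_exit_code({"some/file": "some message"}, log))   # prints 1

# Hypothetical CI entry point (needs dirschema installed):
# from dirschema.validate import DSValidator
# sys.exit(errors_to_exit_code(DSValidator("schema.yaml").validate(".")))
```

Writing the report to standard error keeps standard output empty for a valid dataset, in the same spirit as the CLI producing no output on success.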
**You can find more information on using and contributing to this repository in the
[documentation](https://materials-data-science-and-informatics.github.io/dirschema/main).**

## How to Cite
If you want to cite this project in your scientific work,
please use the [citation file](https://citation-file-format.github.io/)
in the [repository](https://github.com/Materials-Data-Science-and-Informatics/dirschema/blob/main/CITATION.cff).

## Acknowledgements
We kindly thank all
[authors and contributors](https://materials-data-science-and-informatics.github.io/dirschema/latest/credits).
This project was developed at the Institute for Materials Data Science and Informatics
(IAS-9) of the Jülich Research Center and funded by the Helmholtz Metadata Collaboration
(HMC), an incubator-platform of the Helmholtz Association within the framework of the
Information and Data Science strategic initiative.