{"id":24546780,"url":"https://github.com/materials-data-science-and-informatics/dirschema","last_synced_at":"2025-04-15T16:43:16.890Z","repository":{"id":129975909,"uuid":"428181958","full_name":"Materials-Data-Science-and-Informatics/dirschema","owner":"Materials-Data-Science-and-Informatics","description":"Spec and validator for directories, files and metadata based on JSON Schema and regexes.","archived":false,"fork":false,"pushed_at":"2023-12-07T09:34:43.000Z","size":1975,"stargazers_count":8,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-12T20:58:54.261Z","etag":null,"topics":["json-schema","metadata","python","validation"],"latest_commit_sha":null,"homepage":"https://materials-data-science-and-informatics.github.io/dirschema/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Materials-Data-Science-and-Informatics.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":"codemeta.json","zenodo":null}},"created_at":"2021-11-15T08:31:52.000Z","updated_at":"2025-03-27T14:55:18.000Z","dependencies_parsed_at":null,"dependency_job_id":"f7cc3c75-d506-4939-8fdc-d3fbb39028a2","html_url":"https://github.com/Materials-Data-Science-and-Informatics/dirschema","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Materials-Data-Science-and-Informatics%2Fdirschema","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/Materials-Data-Science-and-Informatics%2Fdirschema/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Materials-Data-Science-and-Informatics%2Fdirschema/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Materials-Data-Science-and-Informatics%2Fdirschema/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Materials-Data-Science-and-Informatics","download_url":"https://codeload.github.com/Materials-Data-Science-and-Informatics/dirschema/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249110659,"owners_count":21214375,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["json-schema","metadata","python","validation"],"created_at":"2025-01-22T22:17:09.768Z","updated_at":"2025-04-15T16:43:16.859Z","avatar_url":"https://github.com/Materials-Data-Science-and-Informatics.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Project status](https://img.shields.io/badge/project%20status-alpha-%23ff8000)\n[\n![Docs](https://img.shields.io/badge/read-docs-success)\n](https://materials-data-science-and-informatics.github.io/dirschema)\n[\n![CI](https://img.shields.io/github/actions/workflow/status/Materials-Data-Science-and-Informatics/dirschema/ci.yml?branch=main\u0026label=ci)\n](https://github.com/Materials-Data-Science-and-Informatics/dirschema/actions/workflows/ci.yml)\n[\n![Test 
Coverage](https://materials-data-science-and-informatics.github.io/dirschema/main/coverage_badge.svg)\n](https://materials-data-science-and-informatics.github.io/dirschema/main/coverage)\n[\n![PyPIPkgVersion](https://img.shields.io/pypi/v/dirschema)\n](https://pypi.org/project/dirschema/)\n\n\u003c!-- --8\u003c-- [start:abstract] --\u003e\n# dirschema\n\n\u003cbr /\u003e\n\u003cdiv\u003e\n\u003cimg style=\"center-align: middle;\" alt=\"DirSchema Logo\" src=\"https://raw.githubusercontent.com/Materials-Data-Science-and-Informatics/Logos/main/DirSchema/DirSchema_Logo_Text.png\" width=70% height=70% /\u003e\n\u0026nbsp;\u0026nbsp;\n\u003c/div\u003e\n\u003cbr /\u003e\n\nA directory structure and metadata linter based on JSON Schema.\n\n[JSON Schema](https://json-schema.org/) is great for validating (files containing) JSON\nobjects that e.g. contain metadata, but these are only the smallest pieces in the\norganization of a whole directory structure, e.g. of some dataset or project.\nDatasets of a certain kind might contain various types of data,\nwith each file requiring different accompanying metadata, based on its file type\nand/or location.\n\n**DirSchema** combines JSON Schemas and regexes into a solution to enforce structural\ndependencies and metadata requirements in directories and directory-like archives.\nWith it you can for example check that:\n\n* only files of a certain type are in a location (e.g. only `jpg` files in directory `img`)\n* for each data file there exists a metadata file (e.g. 
`test.jpg` has `test.jpg_meta.json`)\n* each metadata file is valid according to some JSON Schema\n\nIf validating these kinds of constraints looks appealing to you, this tool is for you!\n\n**DirSchema features:**\n\n* Built-in support for schemas and metadata stored as JSON or YAML\n* Built-in support for checking contents of ZIP and HDF5 archives\n* Extensible validation interface for advanced needs beyond JSON Schema\n* Both a Python library and a CLI tool to perform the validation\n\n\u003c!-- --8\u003c-- [end:abstract] --\u003e\n\u003c!-- --8\u003c-- [start:quickstart] --\u003e\n\n## Installation\n\n```\npip install dirschema\n```\n\n## Getting Started\n\nThe `dirschema` tool needs as input:\n\n* a DirSchema YAML file (containing a specification), and\n* a path to a directory or file (e.g. ZIP file) that should be checked.\n\nYou can run it like this:\n\n```\ndirschema my_dirschema.yaml DIRECTORY_OR_ARCHIVE_PATH\n```\n\nIf the validation was successful, there will be no output.\nOtherwise, the tool will output a list of errors (e.g. 
invalid metadata, missing files, etc.).\n\nYou can also use `dirschema` from other Python code as a library:\n\n```python\nfrom dirschema.validate import DSValidator\nDSValidator(\"/path/to/dirschema\").validate(\"/dataset/path\")\n```\n\nSimilarly, the method will return an error dict, which will be empty if the validation succeeded.\n\n\u003c!-- --8\u003c-- [end:quickstart] --\u003e\n\n**You can find more information on using and contributing to this repository in the\n[documentation](https://materials-data-science-and-informatics.github.io/dirschema/main).**\n\n\u003c!-- --8\u003c-- [start:citation] --\u003e\n\n## How to Cite\n\nIf you want to cite this project in your scientific work,\nplease use the [citation file](https://citation-file-format.github.io/)\nin the [repository](https://github.com/Materials-Data-Science-and-Informatics/dirschema/blob/main/CITATION.cff).\n\n\u003c!-- --8\u003c-- [end:citation] --\u003e\n\u003c!-- --8\u003c-- [start:acknowledgements] --\u003e\n\n## Acknowledgements\n\nWe kindly thank all\n[authors and contributors](https://materials-data-science-and-informatics.github.io/dirschema/latest/credits).\n\n\u003cdiv\u003e\n\u003cimg style=\"vertical-align: middle;\" alt=\"HMC Logo\" src=\"https://github.com/Materials-Data-Science-and-Informatics/Logos/raw/main/HMC/HMC_Logo_M.png\" width=50% height=50% /\u003e\n\u0026nbsp;\u0026nbsp;\n\u003cimg style=\"vertical-align: middle;\" alt=\"FZJ Logo\" src=\"https://github.com/Materials-Data-Science-and-Informatics/Logos/raw/main/FZJ/FZJ.png\" width=30% height=30% /\u003e\n\u003c/div\u003e\n\u003cbr /\u003e\n\nThis project was developed at the Institute for Materials Data Science and Informatics\n(IAS-9) of the Jülich Research Center and funded by the Helmholtz Metadata Collaboration\n(HMC), an incubator-platform of the Helmholtz Association within the framework of the\nInformation and Data Science strategic initiative.\n\n\u003c!-- --8\u003c-- [end:acknowledgements] 
--\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaterials-data-science-and-informatics%2Fdirschema","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaterials-data-science-and-informatics%2Fdirschema","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaterials-data-science-and-informatics%2Fdirschema/lists"}