{"id":29005457,"url":"https://github.com/eth-library/data-archive-models","last_synced_at":"2025-06-25T11:33:19.668Z","repository":{"id":298354921,"uuid":"926786071","full_name":"eth-library/data-archive-models","owner":"eth-library","description":null,"archived":false,"fork":false,"pushed_at":"2025-06-10T16:36:25.000Z","size":10,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-10T18:43:11.775Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Nix","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eth-library.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-02-03T21:16:05.000Z","updated_at":"2025-06-10T16:41:03.000Z","dependencies_parsed_at":"2025-06-10T18:53:54.202Z","dependency_job_id":null,"html_url":"https://github.com/eth-library/data-archive-models","commit_stats":null,"previous_names":["eth-library/data-archive-models"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/eth-library/data-archive-models","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-library%2Fdata-archive-models","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-library%2Fdata-archive-models/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-library%2Fdata-archive-models/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-library%2Fdata-archive-models/manifests","owner_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/owners/eth-library","download_url":"https://codeload.github.com/eth-library/data-archive-models/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-library%2Fdata-archive-models/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261863531,"owners_count":23221567,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-25T11:32:04.295Z","updated_at":"2025-06-25T11:33:19.636Z","avatar_url":"https://github.com/eth-library.png","language":"Nix","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Archive Models\n\nA collection of JSON schemas defining data models for digital archiving systems based on the OAIS (Open Archival Information System) reference model. 
These schemas provide standardized definitions for various components of a data archive.\n\n## Core Functionality\n\nThe project defines JSON schemas for key entities in a digital archiving workflow:\n\n- **Producer**: Entity that provides data to be archived\n- **Deposit**: Information about a submission to the archive\n- **SIP (Submission Information Package)**: Package of information submitted to the archive\n- **IntellectualEntity**: Conceptual object being preserved\n- **Representation**: Digital manifestation of an intellectual entity\n- **File**: Individual digital file within a representation\n- **Fixity**: Integrity information for digital files\n\nAll schemas are currently at version 0.1.0, indicating this is an early-stage project.\n\n## Setup/Installation\n\n### Prerequisites\n\n- Java 21\n- Maven\n- Python 3.12\n- Nix with flakes enabled (recommended)\n- direnv for environment management (recommended)\n\n### Recommended Nix + Direnv Setup\n\nWe recommend the fully automatic setup based on Nix flakes and direnv:\n\n#### Prerequisites\n\n- Nix package manager with flakes enabled\n- direnv for environment management\n\n#### Steps\n\n1. Clone the repository\n2. 
Allow direnv in the project directory:\n   ```bash\n   direnv allow\n   ```\n\nThis will automatically:\n- Create a Python 3.12 virtual environment in `.venv`\n- Install all dependencies using the `uv` package manager\n- Set up the development environment\n\nTo enter the development shell manually without direnv:\n```bash\nnix develop\n```\n\n## Development Workflow\n\n### Python Model Generation\n\nPython models are generated from the JSON schemas with `datamodel-codegen` (the CI workflow runs this step automatically):\n\n```bash\n# Generate Python models from JSON schemas\ndatamodel-codegen \\\n  --input-file-type jsonschema \\\n  --input schemas/data-archive/ \\\n  --output src/data_archive/ \\\n  --output-model-type pydantic_v2.BaseModel \\\n  --field-constraints \\\n  --use-schema-description\n```\n\nThe generated models use Pydantic v2 and are stored in the `src/data_archive/` directory.\n\n### Java Class Generation\n\nJava classes are automatically generated from JSON schemas using the jsonschema2pojo Maven plugin:\n\n```xml\n\u003c!-- Plugin configuration in pom.xml --\u003e\n\u003cplugin\u003e\n    \u003cgroupId\u003eorg.jsonschema2pojo\u003c/groupId\u003e\n    \u003cartifactId\u003ejsonschema2pojo-maven-plugin\u003c/artifactId\u003e\n    \u003cversion\u003e1.2.1\u003c/version\u003e\n    \u003cconfiguration\u003e\n        \u003csourceDirectory\u003e${project.basedir}/schemas/data-archive\u003c/sourceDirectory\u003e\n        \u003coutputDirectory\u003e${project.build.directory}/generated-sources/\u003c/outputDirectory\u003e\n        \u003ctargetPackage\u003ech.ethz.library.darc.model\u003c/targetPackage\u003e\n        \u003cexcludes\u003e\n            \u003cexclude\u003e_shared/**\u003c/exclude\u003e\n            \u003cexclude\u003ecatalog.json\u003c/exclude\u003e\n        \u003c/excludes\u003e\n    \u003c/configuration\u003e\n\u003c/plugin\u003e\n```\n\nThe plugin is executed during the Maven `prepare-package` phase and generates Java classes from the JSON schemas in the `schemas/data-archive/` directory. 
The generated classes are stored in the `target/generated-sources/` directory under the package `ch.ethz.library.darc.model`.\n\nTo generate the Java classes, run one of the Maven commands listed in the [Java Build Options](#java-build-options) section.\n\n### Dependency Management\n\nThe project uses `uv` for Python dependency management:\n\n```bash\n# Generate lock file\nuv lock\n\n# Install dependencies\nuv sync\n\n# Build Python package\nuv build\n```\n\n### Java Build Options\n\nThe project provides several Maven build commands:\n\n1. **Validate JSON schemas only**:\n   ```bash\n   mvn -Dtest=JsonSchemaValidationTest test\n   ```\n\n2. **Generate Java classes without validation** (skip tests):\n   ```bash\n   mvn prepare-package -DskipTests\n   ```\n\n3. **Standard build** (validate schemas, then generate classes):\n   ```bash\n   mvn package\n   ```\n\n## Continuous Integration\n\nThe project uses GitHub Actions for CI. The workflow automatically:\n\n- Sets up the Nix development shell\n- Caches Maven dependencies for all jobs\n- Runs schema validation tests\n- Generates Java classes from JSON schemas\n- Generates Python models from JSON schemas\n- Publishes Java artifacts to GitHub Packages\n- Publishes Python packages to TestPyPI","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feth-library%2Fdata-archive-models","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feth-library%2Fdata-archive-models","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feth-library%2Fdata-archive-models/lists"}