https://github.com/eth-library/data-archive-models
https://github.com/eth-library/data-archive-models
Last synced: 12 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/eth-library/data-archive-models
- Owner: eth-library
- Created: 2025-02-03T21:16:05.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-10T16:36:25.000Z (about 1 year ago)
- Last Synced: 2025-06-10T18:43:11.775Z (about 1 year ago)
- Language: Nix
- Size: 9.77 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Data Archive Models
A collection of JSON schemas defining data models for digital archiving systems based on the OAIS (Open Archival Information System) reference model. These schemas provide standardized definitions for various components of a data archive.
## Core Functionality
The project defines JSON schemas for key entities in a digital archiving workflow:
- **Producer**: Entity that provides data to be archived
- **Deposit**: Information about a submission to the archive
- **SIP (Submission Information Package)**: Package of information submitted to the archive
- **IntellectualEntity**: Conceptual object being preserved
- **Representation**: Digital manifestation of an intellectual entity
- **File**: Individual digital file within a representation
- **Fixity**: Integrity information for digital files
All schemas are currently at version 0.1.0, indicating this is an early-stage project.
## Setup/Installation
### Prerequisites
- Java 21
- Maven
- Python 3.12
- Nix with flakes enabled (recommended)
- direnv for environment management (recommended)
### Recommended Nix + Direnv Setup
We recommend using the fully automatic setup method using Nix Flakes and Direnv:
#### Prerequisites
- Nix package manager with flakes enabled
- direnv for environment management
#### Steps
1. Clone the repository
2. Allow direnv in the project directory:
```bash
direnv allow
```
This will automatically:
- Create a Python 3.12 virtual environment in .venv
- Install all dependencies using UV package manager
- Set up the development environment
If you'd like to activate the environment manually without direnv:
```bash
nix develop
```
## Development Workflow
### Python Model Generation
Python models are automatically generated from JSON schemas:
```bash
# Generate Python models from JSON schemas
datamodel-codegen \
--input-file-type jsonschema \
--input schemas/data-archive/ \
--output src/data_archive/ \
--output-model-type pydantic_v2.BaseModel \
--field-constraints \
--use-schema-description
```
The generated models use Pydantic v2 and are stored in the `src/data_archive/` directory.
### Java Class Generation
Java classes are automatically generated from JSON schemas using the jsonschema2pojo Maven plugin:
```xml
org.jsonschema2pojo
jsonschema2pojo-maven-plugin
1.2.1
${project.basedir}/schemas/data-archive
${project.build.directory}/generated-sources/
ch.ethz.library.darc.model
_shared/**
catalog.json
```
The plugin is executed during the Maven `prepare-package` phase and generates Java classes from the JSON schemas in the `schemas/data-archive/` directory. The generated classes are stored in the `target/generated-sources/` directory under the package `ch.ethz.library.darc.model`.
To generate the Java classes, you can use one of the Maven commands listed in the [Java Build Options](#java-build-options) section.
### Dependency Management
The project uses `uv` for Python dependency management:
```bash
# Generate lock file
uv lock
# Install dependencies
uv sync
# Build Python package
uv build
```
### Java Build Options
The project provides several Maven build commands:
1. **Validate JSON schemas only**:
```bash
mvn -Dtest=JsonSchemaValidationTest test
```
2. **Generate Java classes without validation** (skip tests):
```bash
mvn prepare-package -DskipTests
```
3. **Standard build** (validate schemas then generate classes):
```bash
mvn package
```
## Continuous Integration
The project uses GitHub Actions for CI. The workflow automatically:
- Sets up the Nix environment shell
- Implements Maven dependency caching for all jobs
- Runs schema validation tests
- Generates Java classes from JSON schemas
- Generates Python models from JSON schemas
- Publishes Java artifacts to GitHub Packages
- Publishes Python packages to TestPyPI