{"id":48804909,"url":"https://github.com/r4stin/provenance-aware-metadata","last_synced_at":"2026-04-14T04:32:15.740Z","repository":{"id":316626615,"uuid":"1064060801","full_name":"r4stin/provenance-aware-metadata","owner":"r4stin","description":"Prototype pipeline for provenance-aware metadata: JSON-LD + SHACL validation + C2PA signing","archived":false,"fork":false,"pushed_at":"2025-10-20T18:55:14.000Z","size":2261,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-20T19:25:46.981Z","etag":null,"topics":["c2pa","fastapi","iiif","json-ld","linked-data","metadata","open-data","provenance","python","semantic-web","shacl"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/r4stin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-25T13:39:32.000Z","updated_at":"2025-10-20T18:54:48.000Z","dependencies_parsed_at":"2025-09-25T19:35:32.782Z","dependency_job_id":null,"html_url":"https://github.com/r4stin/provenance-aware-metadata","commit_stats":null,"previous_names":["r4stin/provenance-aware-metadata"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/r4stin/provenance-aware-metadata","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r4stin%2Fprovenance-aware-metadata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r4stin%2Fprovenance-aware-metadata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r4stin%2Fprovenance-aware-metadata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r4stin%2Fprovenance-aware-metadata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/r4stin","download_url":"https://codeload.github.com/r4stin/provenance-aware-metadata/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r4stin%2Fprovenance-aware-metadata/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31782736,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-14T02:24:21.117Z","status":"ssl_error","status_checked_at":"2026-04-14T02:24:20.627Z","response_time":153,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c2pa","fastapi","iiif","json-ld","linked-data","metadata","open-data","provenance","python","semantic-web","shacl"],"created_at":"2026-04-14T04:32:09.580Z","updated_at":"2026-04-14T04:32:15.729Z","avatar_url":"https://github.com/r4stin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Provenance-Aware Metadata (Phase 3)\n\nThis project demonstrates a general workflow for **provenance-aware metadata** on digital assets:\n- **Model** metadata in **JSON-LD** (Dublin Core, PROV-O, Schema.org)\n- **Validate** with **SHACL** (custom policy rules: EDM, PREMIS)\n- **Sign** assets with **C2PA** (actions + rights)\n- **Serve** via **FastAPI** and a **dynamic IIIF Presentation 3.0** manifest\n\n---\n\n## ✨ What’s new in Phase 3 (v0.3)\n\n- **Fetch from Wikimedia Commons** and auto-populate `metadata/source.yml`  \n  → `python src/cli.py build-from-commons --title \"File:Leibniz_University_Hannover.jpg\"`\n- **Richer standards \u0026 policies**: extended **SHACL** shapes to cover **EDM** and **PREMIS** (e.g., event typing \u0026 dateTime checks)\n- **Verification endpoint**: `/verify` returns **c2patool --detailed** output for the signed image\n- **Publish container to GHCR**: GitHub Actions builds \u0026 pushes on `v*` tags → `ghcr.io/\u003cowner\u003e/\u003crepo\u003e:v0.3`\n- **Tests in CI**: basic pytest to ensure build + validation keep passing\n- **Builder normalizations**: clean creator (no HTML), ISO date, CC license URL trailing `/`, clearer `prov:wasDerivedFrom` (binary)\n\n---\n\n## 🗂️ Repository Structure\n\n```\ndata/                    # Input/output assets\n  image.jpg              # Original image (fetched or provided)\n  image.c2pa.jpg         # Signed image (generated)\n\nmetadata/\n  source.yml             # Source fields (auto from Commons or manual)\n  record.jsonld          # Built JSON-LD (do not hand-edit)\n  shacl.ttl              # SHACL shapes (EDM + PREMIS rules)\n  claim.json             # C2PA claim (actions + rights)\n\nsrc/\n  fetch_commons.py       # Fetch \u0026 map Commons metadata + download image\n  build_metadata.py      # Build JSON-LD from source.yml (normalizes fields)\n  validate_metadata.py   # SHACL validation\n  sign_c2pa.sh           # Signing (env-driven with dev fallback)\n  api.py                 # FastAPI app (record, image, dynamic IIIF, /verify)\n  cli.py                 # CLI: build/validate/sign/serve/info/build-from-commons\n\n.github/\n  workflows/validate.yml       # CI: SHACL + tests on push/PR\n  workflows/docker-publish.yml # CI: publish Docker image to GHCR on tags\n\ntests/\n  test_build_and_validate.py   # Basic build+validate test\n\nMakefile                 # make build | validate | sign | serve | info | all\nrequirements.txt         # Runtime deps (incl. pytest)\nenvironment.yml          # Conda environment (local dev)\nDockerfile               # Containerized API service\n```\n\n---\n\n## 🔧 Setup\n\n### A) Conda (local dev)\n```bash\nconda env create -f environment.yml\nconda activate Provenance-Aware-Metadata\n```\n\n### B) Install `c2patool` (CLI)\nChoose one:\n\n**Prebuilt (recommended)**\n```bash\n# download release for your OS (e.g., v0.9.12), then:\nsudo mv c2patool /usr/local/bin/\nc2patool -V\n```\n\n**Cargo (pin version)**\n```bash\ncurl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh\nsource \"$HOME/.cargo/env\"\ncargo install --locked c2patool --version 0.9.12\n~/.cargo/bin/c2patool -V\n```\n\n---\n\n## ▶️ Run\n\n**A) Build from Commons (Phase 3)**\n```bash\n# fetch \u0026 normalize source.yml and image.jpg, then build \u0026 validate\npython src/cli.py build-from-commons --title \"File:Leibniz_University_Hannover.jpg\"\n```\nIf your network blocks the image CDN, you can skip the binary fetch:\n```bash\nSKIP_DOWNLOAD=1 python src/cli.py build-from-commons --title \"File:Leibniz_University_Hannover.jpg\"\n# or prefetch with IPv4:\nwget -4 -O data/image.jpg \"https://upload.wikimedia.org/wikipedia/commons/e/ea/Leibniz_University_Hannover.jpg\"\n```\n\n**B) Manual workflow (Phase 2 style)**\n```bash\npython src/cli.py build\npython src/cli.py validate          # expect: Conforms: True\npython src/cli.py sign              # or: bash src/sign_c2pa.sh\npython src/cli.py serve             # open http://127.0.0.1:8000\n```\n\nEndpoints:\n- `/record` → JSON-LD metadata  \n- `/image`  → signed image (falls back to unsigned if missing)  \n- `/iiif/manifest` → dynamic IIIF Presentation 3.0  \n- `/verify` → C2PA verification report (c2patool `--detailed`)\n\n**Makefile shortcuts**\n```bash\nmake build\nmake validate\nmake sign\nmake serve\n# or everything (pipeline except docker):\nmake all\n```\n\n---\n\n## 🐋 Docker\n\n**Build**\n```bash\ndocker build -t provenance-metadata:dev .\n```\n\n**Run**\n```bash\n# image baked in container\ndocker run --rm -p 8000:8000 provenance-metadata:dev\n\n# OR mount host data folder (to use local signed file)\ndocker run --rm -p 8000:8000 -v \"$(pwd)/data:/app/data\" provenance-metadata:dev\n```\n\n**Run from GHCR (after tagging v0.3)**\n```bash\ndocker run --rm -p 8000:8000 ghcr.io/\u003cowner\u003e/\u003crepo\u003e:v0.3\n```\n\n---\n\n## 🚧 CI\n- **Validation workflow**: `.github/workflows/validate.yml` runs SHACL + tests on push/PR to `main`/`dev`.  \n- **Publish to GHCR**: `.github/workflows/docker-publish.yml` builds \u0026 pushes on tags `v*`.\n\nBadge:\n![Validate Metadata](https://github.com/r4stin/provenance-aware-metadata/actions/workflows/validate.yml/badge.svg)\n\n---\n\n## 🚧 Roadmap\n\n- **Phase 1 (done)**  \n  - Manual JSON-LD modeling  \n  - SHACL validation  \n  - C2PA (dev key)\n  - Static IIIF\n\n- **Phase 2 (done)**  \n  - CLI workflow (build/validate/sign/serve/info)  \n  - YAML→JSON-LD\n  - Dynamic IIIF \n  - CI validation\n  - Env-driven signing (fallback)\n  - Dockerized API  \n\n- **Phase 3 (this release)**  \n  - Fetch from Commons\n  - Extended EDM/PREMIS SHACL\n  - `/verify` endpoint \n  - GHCR publish\n  - Tests in CI\n  - Builder normalizations (creator/date/license/provenance)\n\n- **Phase 4 (planned)**  \n  - Integrations (Europeana/Zenodo)\n  - Secure key mgmt (remote signer/KMS)\n  - More datasets \u0026 examples\n  - Packaging for production\n\n---\n\n## 📜 License\n- Code: MIT  \n- Image: *Leibniz University Hannover* (CC BY-SA 3.0), from [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Leibniz_University_Hannover.jpg).\n\n---\n\n## 🔖 Versioning\n- **v0.1** — Manual prototype (metadata, SHACL, C2PA, static IIIF).  \n- **v0.2** — Automation \u0026 containerization (CLI, YAML→JSON-LD, dynamic IIIF, API fallback, CI, Docker, env-driven signing).\n- **v0.3** — Integrations \u0026 policies (Commons fetch, PREMIS/EDM, `/verify`, GHCR, tests)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fr4stin%2Fprovenance-aware-metadata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fr4stin%2Fprovenance-aware-metadata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fr4stin%2Fprovenance-aware-metadata/lists"}