{"id":45824209,"url":"https://github.com/ulbmuenster/dataasee","last_synced_at":"2026-02-26T21:36:10.500Z","repository":{"id":256225079,"uuid":"807021854","full_name":"ulbmuenster/dataasee","owner":"ulbmuenster","description":"DatAasee - A Metadata-Lake for Libraries","archived":false,"fork":false,"pushed_at":"2025-11-13T10:59:04.000Z","size":4921,"stargazers_count":22,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-11-13T12:23:08.860Z","etag":null,"topics":["data-catalog","data-engineering","data-lake","data-lakehouse","datacite","library","library-catalogue","marc21","metadata","metadata-catalog","metadata-lake","metadata-management","metadata-mapping","metalake","oai-pmh","xml2json"],"latest_commit_sha":null,"homepage":"https://ulbmuenster.github.io/dataasee/","language":"Makefile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ulbmuenster.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-05-28T10:38:52.000Z","updated_at":"2025-11-13T10:54:08.000Z","dependencies_parsed_at":"2024-09-09T18:17:11.762Z","dependency_job_id":"95eb9cf2-9b3c-41fe-b220-6b0f4aa24ea9","html_url":"https://github.com/ulbmuenster/dataasee","commit_stats":null,"previous_names":["ulbmuenster/dataasee"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/ulbmuenster/dataasee","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulbmuenster%2Fdataasee","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulbmuenster%2Fdataasee/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulbmuenster%2Fdataasee/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulbmuenster%2Fdataasee/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ulbmuenster","download_url":"https://codeload.github.com/ulbmuenster/dataasee/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulbmuenster%2Fdataasee/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29873316,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-26T21:05:00.265Z","status":"ssl_error","status_checked_at":"2026-02-26T20:57:13.669Z","response_time":89,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-catalog","data-engineering","data-lake","data-lakehouse","datacite","library","library-catalogue","marc21","metadata","metadata-catalog","metadata-lake","metadata-management","metadata-mapping","metalake","oai-pmh","xml2json"],"created_at":"2026-02-26T21:36:09.761Z","updated_at":"2026-02-26T21:36:10.483Z","avatar_url":"https://github.com/ulbmuenster.png","language":"Makefile","funding_links":[],"categories":[],"sub_categories":[],"readme":"![DatAasee Logo](assets/dataasee-logo.png) DatAasee (0.5)\n=========================================================\n\nDatAasee centralizes and interlinks distributed library/research metadata into an API‑first union catalog.\n\n![DatAasee schematic](docs/images/dataasee.gif)\n\n## A Metadata-Lake for Libraries\n\n### Repository: [github.com/ulbmuenster/dataasee](https://github.com/ulbmuenster/dataasee) (nb [sources backup](https://doi.org/10.5281/zenodo.13734194))\n### Maintainer: [Christian Himpe](https://github.com/gramian) (at [University and State Library of Münster](https://github.com/ulbmuenster))\n### Licenses: [MIT](LICENSE) (add. [CC-BY](https://creativecommons.org/licenses/by/4.0/) for [openapi.yaml](api/openapi.yaml))\n### Function: Metadata-Lake, Metadata Catalog, Metadata Aggregator, Union Catalog\n### Audience: University Libraries, Research Libraries, Academic Libraries, Scientific Libraries\n\n## Documentation\n\n* [Dependencies Overview](docs/deps.md)\n* [Software Documentation](docs/docs.md)\n* [Architecture Documentation](docs/arc42.md)\n* [Database Schema](docs/schema.md)\n* [OpenAPI Schema](https://petstore.swagger.io/?url=https://raw.githubusercontent.com/ulbmuenster/dataasee/refs/heads/main/api/openapi.yaml) (Swagger UI)\n* [`DatAasee`: A Metadata-Lake as Metadata Catalog for a Virtual Data-Lake](https://arxiv.org/abs/2409.05512) (Companion Paper, Open Access)\n\n## Getting Started (Deployment)\n\n**Quick Start** (Prepare a dedicated directory, inside run:)\n\n```shell\n$ wget https://raw.githubusercontent.com/ulbmuenster/dataasee/0.5/compose.yaml\n$ mkdir -p -m 766 backup\n$  DL_PASS=password1 DB_PASS=password2 docker compose up\n```\n\n**Web:** http://localhost:8000 (**API:** http://localhost:8343/api/v1/ )\n\n* Depends on `docker compose` (and compatible to `docker` and `podman`)\n* To deploy, no need to clone, just use the [`compose.yaml`](compose.yaml) file.\n* See the [Deploy Documentation](docs/docs.md#deploy) for details.\n\n## Tech Stack Canvas\n\n* **Setting:** Many distributed data and metadata sources\n* **Goals:**\n    * Centralize metadata\n    * Interlinked metadata catalog\n    * Super-index for bibliographic and research data\n* **Features:**\n    * Interact through HTTP-API (JSON)\n    * Search by filter, full-text, source, doi\n    * Custom query via: `SQL`, `Gremlin`, `Cypher`, `MQL`, `GraphQL`\n* **Frontend:** [Lowdefy](https://www.lowdefy.com) (Optional)\n* **Backend:** [Connect](https://docs.redpanda.com/redpanda-connect/about/) (fmr. Benthos)\n* **Data Storage:** [ArcadeDB](https://arcadedb.com) (Graph Database)\n* **Infrastructure:** [Compose](https://compose-spec.io) (via [Docker](https://www.docker.com) or [Podman](https://podman.io))\n* **Deployment:** via [Harbor](https://harbor.uni-muenster.de) (at Uni Münster)\n* **Monitoring:** Container Logs (local logging driver)\n* **Integrations:**\n    * **Protocols:** `OAI-PMH` (HTTP), `S3` (HTTP), `GET` (HTTP), `DatAasee` (HTTP)\n    * **Encodings:** `XML` (Plain-Text)\n    * **Formats:** `DataCite` (XML), `DC` (XML), `LIDO` (XML), `MARC` (XML), `MODS` (XML)\n* **Exports:** `DataCite` (JSON), `BibJSON` (JSON)\n* **Security:** Privileged endpoints (CQRS)\n* **Testing:** [check-jsonschema](https://check-jsonschema.readthedocs.io/en/stable/)\n* **Development:** [Github](https://github.com/ulbmuenster/dataasee)\n\n## Default Ports\n\n* `8343` DatAasee API\n* `8000` Web Frontend\n* `2480` Database API (Development Container Images Only)\n* `9999` Database JMX (Development Container Images Only)\n\n## API Cheat Sheet\n\n* `GET`  [`api/v1/api`](docs/docs.md#api-endpoint)           Returns API specification and schemas.\n* `GET`  [`api/v1/ready`](docs/docs.md#ready-endpoint)       Returns service readiness.\n* `GET`  [`api/v1/metadata`](docs/docs.md#metadata-endpoint) **Returns queried metadata records.**\n* `GET`  [`api/v1/sources`](docs/docs.md#sources-endpoint)   Returns ingested metadata sources.\n* `GET`  [`api/v1/schema`](docs/docs.md#schema-endpoint)     Returns database schema.\n* `GET`  [`api/v1/enums`](docs/docs.md#enums-endpoint)       Returns enumerated attributes.\n* `GET`  [`api/v1/stats`](docs/docs.md#stats-endpoint)       Returns metadata record statistics.\n* `POST` [`api/v1/backup`](docs/docs.md#backup-endpoint)     Triggers database backup.\n* `POST` [`api/v1/ingest`](docs/docs.md#ingest-endpoint)     Triggers async ingest of metadata.\n* `POST` [`api/v1/insert`](docs/docs.md#insert-endpoint)     Inserts single metadata record.\n* `POST` [`api/v1/health`](docs/docs.md#health-endpoint)     Probes and returns service liveness.\n\n## Repository Contents\n\n* `api/`       API definition and message schemas\n* `assets/`    Logos and style definition\n* `backend/`   Processor pipeline and component definitions\n* `container/` Dockerfiles\n* `database/`  Database initialization, schemas and enumerated data\n* `docs/`      Documentation of software, data and architecture\n* `frontend/`  Prototype frontend definition\n* `tests/`     Test definitions and data\n\n## Getting Started (Development)\n\n* Available `make` targets:\n    * `make setup` Build server images (builds development images)\n    * `make start` Start servers\n    * `make stop`  Stop servers\n    * `make reset` Stop and start servers\n    * `make build` Build release images (pass `REGISTRY=` to set container image registry)\n    * `make empty` Delete database backups\n    * `make logs`  Show logs (requires `grep`)\n    * `make peak`  Report peak database memory usage (requires `grep`)\n    * `make test`  Run tests (requires `check-jsonschema`, `busybox`, `wget`)\n    * `make tidy`  List violations of StrictYAML (requires `yamllint`)\n    * `make todo`  List inline TODOs in repo (requires `grep`)\n* Custom `make` variable: [`COMPOSE`](docs/docs.md#compose-setup) (set Compose implementation)\n\n## Contributors\n\n* [See here](CONTRIBUTORS.md)\n\n## tl;dr\n\n**DatAasee is centralized Metasearch for distributed Metadata.**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fulbmuenster%2Fdataasee","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fulbmuenster%2Fdataasee","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fulbmuenster%2Fdataasee/lists"}