{"id":21182633,"url":"https://github.com/fandreuz/open-data-repository-server","last_synced_at":"2026-04-17T01:02:50.072Z","repository":{"id":172223540,"uuid":"648382285","full_name":"fandreuz/open-data-repository-server","owner":"fandreuz","description":"Quarkus+MongoDB CRUD application to store heterogeneous datasets with metadata","archived":false,"fork":false,"pushed_at":"2023-06-19T07:54:54.000Z","size":1341,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-14T22:12:40.360Z","etag":null,"topics":["gradle","java","mongodb","quarkus","restful-api"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fandreuz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-06-01T20:59:40.000Z","updated_at":"2024-11-16T22:05:56.000Z","dependencies_parsed_at":"2023-07-15T08:51:28.010Z","dependency_job_id":null,"html_url":"https://github.com/fandreuz/open-data-repository-server","commit_stats":null,"previous_names":["fandreuz/root-data-server","fandreuz/open-data-server"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/fandreuz/open-data-repository-server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fandreuz%2Fopen-data-repository-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fandreuz%2Fopen-data-repository-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fandreuz%2Fopen-data-repository-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/fandreuz%2Fopen-data-repository-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fandreuz","download_url":"https://codeload.github.com/fandreuz/open-data-repository-server/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fandreuz%2Fopen-data-repository-server/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31910584,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-16T18:22:33.417Z","status":"ssl_error","status_checked_at":"2026-04-16T18:21:47.142Z","response_time":69,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gradle","java","mongodb","quarkus","restful-api"],"created_at":"2024-11-20T17:57:33.314Z","updated_at":"2026-04-17T01:02:50.046Z","avatar_url":"https://github.com/fandreuz.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Open data repository server\n\nRESTful web service for storage and metadata extraction of Open Data repositories.\n\n## Technologies\n\n### Software stack\n\n- Java 11\n- Quarkus\n    - RESTEasy Reactive\n    - Jackson extension\n- Hibernate Validator\n- Jakarta EE CDI\n- MongoDB Java driver\n- Apache Commons CSV\n- Apache Commons Lang3\n- JSoup\n- Slf4j + Logback\n- Swagger (OpenAPI)\n\n### Build tools 
\u0026 plugins\n\n- Quarkus CLI\n- Gradle (Kotlin DSL)\n- Spotless\n- Lombok\n- Docker\n\n## Abstract code structure\n\nThe following schema summarizes the interconnection of the interfaces defined in the public section of each main\npackage (`model`, `controller`, `database`, `fetch`, `conversion`):\n\n![](https://plantuml-server.kkeisuke.dev/svg/VLJBRi8m4BpdAwpUGzKU4K9KgL2f-b9BUwbwCFOI8eoDx0MfGltthaCCmJ4S4E2P7S_ERZ9oo2rkLYfJC4U6XjcgN22JbGM1bT5PjkPYoKjWmcYqHYcmR9Snvd6kImNidYDtWE_WpCOAI67FW5pIpnPdRfGagORmP0H7Ozat8OmDPiFJyy7rR5WZUPxNty8RgGrEP7qmhnIyy9LN_g5FCBtbggABYTT8J_HwWr-7qm-msqh4LII64CnyEh1t9MWSxqzhxjynbvMHeDAHXBPJM66CbPNc28vW6eDO1cZwkpvDiJXqUqanzDBoTkMvCmAl8eDJoxNZjMHnc6j7r5TwCx9G5GNGLgPjs89rFjXTv3K0nsmrHTG5NgrOW4DxElYBjCuU56wRkf2nHrTtba1wlLuymZcWM4HzXAIZBfginxwYWJfBsmOxZdsUDrtnFN2R0bg6mpXfwNHfv2pB-XjQppxBloLt2v0_akuPneyawxE7wVJjCZb-HaDH5elbzbWKV9xJI76pq_y-cOOKxrkIc5xTivhHFDB4YqktDnnvstTs6CC8jAItw3y0.svg)\n\n## Model objects\n\nThe following models are defined in the package `io.github.fandreuz.root.data.server.models`:\n\n### `CollectionMetadata`\n\nMetadata for a collection contains metadata related to the experiment which generated the collection of datasets. It\ncontains compulsory [DataCite](https://schema.datacite.org/) information and a subset of the recommended ones.\n\n**Example**:\n\n```json\n{\n  \"id\": \"cern-open-data:13128\",\n  \"name\": \"13128\",\n  \"experimentName\": \"OPERA\",\n  \"eventsCount\": 1,\n  \"type\": \"Derived\",\n  \"keyword\": \"\",\n  \"tag\": \"CERN-SPS\",\n  \"citeText\": \"Cite as: OPERA collaboration (2019). OPERA neutrino-induced charmed hadron event 237040910. CERN Open Data Portal. 
\",\n  \"doi\": \"10.7483/OPENDATA.OPERA.Q74R.SYBQ\",\n  \"license\": \"Creative Commons CC0 waiver\",\n  \"creator\": \"curl/7.76.1\",\n  \"title\": \"OPERA neutrino-induced charmed hadron event 237040910\",\n  \"publisher\": \"OPERA\",\n  \"publicationYear\": 2019,\n  \"language\": \"English\",\n  \"subject\": \"High Energy Physics, Theoretical Physics\",\n  \"description\": \"This OPERA detector event is a muon neutrino interaction with the lead target where a charmed hadron was reconstructed in the final state. The event data consist of Electronic Detector files (such as Drift Tube, RPC, and Target Tracker files) and Emulsion Detector files (such as Tracks and Vertex files). For more information, see the description of the whole dataset.\",\n  \"geoLocation\": \"46.233832398 6.053166454\",\n  \"fundingReference\": \"https://perma.cc/L34T-TCTG\"\n}\n```\n\n### `DatasetMetadata`\n\nDataset metadata contain metadata strictly related to the dataset file, and wrap the metadata of the collection the\ndataset belongs to.\n\n**Example**:\n\n```json\n{\n  \"datasetId\": \"cern-open-data:13128:237040910_EventInfo\",\n  \"fileName\": \"237040910_EventInfo.csv\",\n  \"type\": \"CSV\",\n  \"sizeInBytes\": 52,\n  \"numberOfColumns\": 3,\n  \"commaSeparatedColumnNames\": \"evID,timestamp,muMom\",\n  \"importTimestamp\": 1686615824740,\n  \"collectionMetadata\": {\n    ...\n  }\n}\n```\n\n## Unique identifiers\n\nUniform Resource Name (URN) standard: `schema:namespace:resourceName`\n\n### Dataset ID\n\nA dataset is identified by the schema, the namespace name (i.e. the collection it belongs to) and the file name.\n\n### Example\n\nWe identify the file `experimentData` in the collection `19090` with the following URN:\n\n```\ncern-open-data:19090:experimentData\n```\n\n### Collection ID\n\nA collection is identified by the schema and its name. 
The last part of the URN is omitted since it's not needed.\n\n### Example\n\nWe identify the collection `19090` with the following URN:\n\n```\ncern-open-data:19090\n```\n\n### Consequences\n\nWe can infer the collection URN based on the dataset URN by removing the trailing part.\n\n## REST endpoints\n\n### `PUT /v1`\n\nIdempotent creation of a new dataset in the database.\n\nExample request body:\n\n```json\n{\n  \"collectionId\": \"211\",\n  \"fileName\": \"qcd.root\"\n}\n```\n\nThe locator above will trigger the creation of [this](http://opendata.cern.ch/record/211/files/qcd.root) dataset in the\ndatabase. If the operation succeeds the endpoint will return a JSON representation of `DatasetMetadata` representing the\nimported dataset.\n\nSample interaction:\n\n```\ncurl --header \"Content-Type: application/json\" \\\n    --request PUT \\\n    --data '{\"collectionId\":\"13128\",\"fileName\":\"237040910_EventInfo.csv\"}' \\\n    http://localhost:8080/v1\n```\n\n```json\n{\n  \"datasetId\": \"cern-open-data:13128:237040910_EventInfo\",\n  \"fileName\": \"237040910_EventInfo.csv\",\n  \"type\": \"CSV\",\n  \"sizeInBytes\": 52,\n  \"numberOfColumns\": 3,\n  \"commaSeparatedColumnNames\": \"evID,timestamp,muMom\",\n  \"importTimestamp\": 1686615824740,\n  \"collectionMetadata\": {\n    \"id\": \"cern-open-data:13128\",\n    \"name\": \"13128\",\n    \"experimentName\": \"OPERA\",\n    \"eventsCount\": 1,\n    \"type\": \"Derived\",\n    \"keyword\": \"\",\n    \"tag\": \"CERN-SPS\",\n    \"citeText\": \"Cite as: OPERA collaboration (2019). OPERA neutrino-induced charmed hadron event 237040910. CERN Open Data Portal. 
\",\n    \"doi\": \"10.7483/OPENDATA.OPERA.Q74R.SYBQ\",\n    \"license\": \"Creative Commons CC0 waiver\",\n    \"creator\": \"curl/7.76.1\",\n    \"title\": \"OPERA neutrino-induced charmed hadron event 237040910\",\n    \"publisher\": \"OPERA\",\n    \"publicationYear\": 2019,\n    \"language\": \"English\",\n    \"subject\": \"High Energy Physics, Theoretical Physics\",\n    \"description\": \"This OPERA detector event is a muon neutrino interaction with the lead target where a charmed hadron was reconstructed in the final state. The event data consist of Electronic Detector files (such as Drift Tube, RPC, and Target Tracker files) and Emulsion Detector files (such as Tracks and Vertex files). For more information, see the description of the whole dataset.\",\n    \"geoLocation\": \"46.233832398 6.053166454\",\n    \"fundingReference\": \"https://perma.cc/L34T-TCTG\"\n  }\n}\n```\n\n### `GET /v1/{id}/{column-name}`\n\nGet the content of a column for the given dataset.\n\nSample interaction:\n\n```\ncurl -i --request GET \\\n    http://localhost:8080/v1/cern-open-data:13128:237040910_DTHitsXZ/posX\n```\n\n```json\n{\n  \"647fa76f10c98516828586ae\": \"66.55\",\n  \"647fa76f10c98516828586af\": \"64.45\",\n  \"647fa76f10c98516828586b0\": \"211.49\",\n  ...\n}\n```\n\n### `GET /v1/{id}`\n\nUse the request body to query the dataset identified by the given ID. The endpoint returns a list of the entries satisfying the\ncondition.\n\nSample interaction:\n\n```\ncurl -i --request GET \\\n    --header \"Content-Type: application/json\" \\\n    --data '{posX: \"65.15\"}' \\\n    http://localhost:8080/v1/cern-open-data:13128:237040910_DTHitsXZ\n```\n\n```json\n[\n  {\n    \"_id\": \"648dabde466012629fcf0842\",\n    \"driftDist\": \"0.62\",\n    \"posX\": \"65.15\",\n    \"posZ\": \"1061.69\"\n  }\n]\n```\n\n### `GET /v1/metadata/{id}`\n\nThe given `{id}` is used to locate metadata for an imported dataset. 
If found, the JSON representation of the\nappropriate `DatasetMetadata` is returned.\n\nSample interaction:\n\n```\ncurl -i --request GET \\\n    http://localhost:8080/v1/metadata/cern-open-data:13128:237040910_EventInfo\n```\n\n```json\n{\n  \"datasetId\": \"cern-open-data:13128:237040910_EventInfo\",\n  \"fileName\": \"237040910_EventInfo.csv\",\n  \"type\": \"CSV\",\n  \"sizeInBytes\": 52,\n  \"numberOfColumns\": 3,\n  \"commaSeparatedColumnNames\": \"evID,timestamp,muMom\",\n  \"importTimestamp\": 1686615824740,\n  \"collectionMetadata\": {\n    \"id\": \"cern-open-data:13128\",\n    \"name\": \"13128\",\n    \"experimentName\": \"OPERA\",\n    \"eventsCount\": 1,\n    \"type\": \"Derived\",\n    \"keyword\": \"\",\n    \"tag\": \"CERN-SPS\",\n    \"citeText\": \"Cite as: OPERA collaboration (2019). OPERA neutrino-induced charmed hadron event 237040910. CERN Open Data Portal. \",\n    \"doi\": \"10.7483/OPENDATA.OPERA.Q74R.SYBQ\",\n    \"license\": \"Creative Commons CC0 waiver\",\n    \"creator\": \"curl/7.76.1\",\n    \"title\": \"OPERA neutrino-induced charmed hadron event 237040910\",\n    \"publisher\": \"OPERA\",\n    \"publicationYear\": 2019,\n    \"language\": \"English\",\n    \"subject\": \"High Energy Physics, Theoretical Physics\",\n    \"description\": \"This OPERA detector event is a muon neutrino interaction with the lead target where a charmed hadron was reconstructed in the final state. The event data consist of Electronic Detector files (such as Drift Tube, RPC, and Target Tracker files) and Emulsion Detector files (such as Tracks and Vertex files). 
For more information, see the description of the whole dataset.\",\n    \"geoLocation\": \"46.233832398 6.053166454\",\n    \"fundingReference\": \"https://perma.cc/L34T-TCTG\"\n  }\n}\n```\n\n### `GET /v1/metadata`\n\nReturn a sorted collection of all metadata objects stored in the database.\n\nSample interaction:\n\n```\ncurl -i --request GET \\\n    http://localhost:8080/v1/metadata\n```\n\nIf a request body is attached, it will be used to query the collection entries, and the result will contain all the\nentries which satisfy the condition.\n\nSample interaction:\n\n```\ncurl -i --request GET \\\n     --header \"Content-Type: application/json\" \\\n     --data '{publicationYear: 2019}' \\\n     http://localhost:8080/v1/metadata\n```\n\n```json\n[\n  {\n    \"id\": \"cern-open-data:13128\",\n    \"name\": \"13128\",\n    \"experimentName\": \"OPERA\",\n    \"eventsCount\": 1,\n    \"type\": \"Derived\",\n    \"keyword\": \"\",\n    \"tag\": \"CERN-SPS\",\n    \"citeText\": \"Cite as: OPERA collaboration (2019). OPERA neutrino-induced charmed hadron event 237040910. CERN Open Data Portal. \",\n    \"doi\": \"10.7483/OPENDATA.OPERA.Q74R.SYBQ\",\n    \"license\": \"Creative Commons CC0 waiver\",\n    \"creator\": \"curl/7.76.1\",\n    \"title\": \"OPERA neutrino-induced charmed hadron event 237040910\",\n    \"publisher\": \"OPERA\",\n    \"publicationYear\": 2019,\n    \"language\": \"English\",\n    \"subject\": \"High Energy Physics, Theoretical Physics\",\n    \"description\": \"This OPERA detector event is a muon neutrino interaction with the lead target where a charmed hadron was reconstructed in the final state. The event data consist of Electronic Detector files (such as Drift Tube, RPC, and Target Tracker files) and Emulsion Detector files (such as Tracks and Vertex files). 
For more information, see the description of the whole dataset.\",\n    \"geoLocation\": \"46.233832398 6.053166454\",\n    \"fundingReference\": \"https://perma.cc/L34T-TCTG\"\n  }\n]\n```\n\n## TODO\n\n- [x] CRU~~D~~ operations for datasets\n    - [x] Idempotent create (update not needed)\n    - [x] Thread-safe create endpoint\n    - [x] Import CSV datasets\n    - [x] Import ROOT datasets\n- [ ] ~~Make sure columns have the right typing at import-time~~\n- [ ] ~~Endpoints for simple calculations~~ (won't do, data is stored as strings for now)\n- [x] Endpoints for simple querying\n    - [x] Endpoint to get column names\n    - [x] Endpoint to extract column content\n    - [x] Endpoint to extract IDs satisfying a condition\n- [ ] Data lifecycle\n- [x] Docker image\n    - [x] Quarkus native image\n- [ ] ~~Tests~~\n- [x] Document REST endpoints to be more FAIR (`/q/swagger-ui`, parsable version at `q/openapi`)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffandreuz%2Fopen-data-repository-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffandreuz%2Fopen-data-repository-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffandreuz%2Fopen-data-repository-server/lists"}