{"id":20629731,"url":"https://github.com/agile-lab-dev/data-platform-shaper","last_synced_at":"2026-04-18T08:02:47.523Z","repository":{"id":206393337,"uuid":"699861606","full_name":"agile-lab-dev/data-platform-shaper","owner":"agile-lab-dev","description":null,"archived":false,"fork":false,"pushed_at":"2023-12-19T16:00:45.000Z","size":4351,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2023-12-19T17:21:33.109Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/agile-lab-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-10-03T13:41:57.000Z","updated_at":"2023-12-21T13:07:32.759Z","dependencies_parsed_at":"2023-11-27T17:26:41.461Z","dependency_job_id":"7788f063-4966-4dd8-bdc5-a674c41e8c77","html_url":"https://github.com/agile-lab-dev/data-platform-shaper","commit_stats":null,"previous_names":["agile-lab-dev/data-platform-shaper"],"tags_count":0,"template":null,"template_full_name":null,"purl":"pkg:github/agile-lab-dev/data-platform-shaper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agile-lab-dev%2Fdata-platform-shaper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agile-lab-dev%2Fdata-platform-shaper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agile-lab-dev%2Fdata-platform-shaper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agile-lab-dev%2Fdata-platform-shaper/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/agile-lab-dev","download_url":"https://codeload.github.com/agile-lab-dev/data-platform-shaper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agile-lab-dev%2Fdata-platform-shaper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31961348,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T00:39:45.007Z","status":"online","status_checked_at":"2026-04-18T02:00:07.018Z","response_time":103,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-16T14:05:49.072Z","updated_at":"2026-04-18T08:02:47.499Z","avatar_url":"https://github.com/agile-lab-dev.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"![build-badge][]\n\n# Data Platform Shaper: An RDF-based specialized catalog system for defining and managing data platform assets.\n\n## Introduction\n\nDesigning and managing modern big data platforms is highly complex: it requires carefully integrating multiple storage and computational platforms and implementing mechanisms to protect and audit data access.\n\nAs a result, onboarding new data and implementing new data transformation processes are typically time-consuming and expensive. Enterprises often construct their data platforms without distinguishing between logical and technical concerns. 
\nConsequently, these platforms lack sufficient abstraction and are tightly coupled to specific technologies, which makes adapting them as technology evolves very costly.\n\nA data platform designer should instead approach the design of a complex data platform through a logical model, in which they define their notion of a data collection, with its metadata attributes and its relationships with other items.\nAs a more complex example, a logical model of a data mesh is built around assets such as the data product and its relationships with its constituent parts, such as input and output ports.\n\nOnce the logical model has been defined, it's possible to determine the rules for mapping each kind of asset onto the available technologies. Moving in this way from the logical model of a data platform to the corresponding physical model is key to making any data platform technology-agnostic and resilient to technological change.\n\nThis project aims to implement an RDF-based catalog that supports a novel approach to designing data platform models, based on a formal ontology that structures the various domain components across distinct levels of abstraction.\n\nThis catalog is intended as the foundation for defining data platform assets and ontologies capable of describing data collections, data meshes, and data products, i.e., the typical items that compose a modern data platform.\n\n## How to build and run the project\n\nThe project is currently based on the [rdf4j](https://rdf4j.org) library and the [GraphDB](https://graphdb.ontotext.com) knowledge graph from Ontotext.\n\nIt's a Scala 3 project built with [sbt](https://www.scala-sbt.org), which must be installed beforehand. Running the tests requires a Docker daemon that is up and running and accessible to a non-root user, because the tests use a GraphDB instance running in a container. 
A docker-compose file is provided for running GraphDB and the microservice together.\n\nYou also need to have [Cue](https://cuelang.org/docs/introduction/installation/) installed.\n\n### Build and test\n\nThis is a Scala-based project; we used [Scala 3](https://www.scala-lang.org) together with the [Typelevel libraries](https://typelevel.org), in particular [Cats Effect](https://typelevel.org/cats-effect/) for managing effects with the tagless-final pattern. To clone, build, and test the project:\n\n```\ngit clone https://github.com/agile-lab-dev/data-platform-shaper.git\ncd data-platform-shaper\nsbt compile test\n```\n\n### Build the scoverage report\n\nThe project can generate a code-coverage [scoverage-report][]. To generate it, run:\n\n```\nsbt clean coverage test coverageAggregate coverageReport\n```\n\nYou can then open the generated report in a browser, for example:\n\n```\ngoogle-chrome ./target/scala-3.3.3/scoverage-report/index.html\n```\n\n### Build the documentation\n\nTo build the documentation site and preview it locally, run:\n\n```\nsbt paradox previewSite\n```\n\nThen open this [URL](http://localhost:4000/paradox/site/main/) in a browser.\n\n### Run everything\n\nYou can run GraphDB and the microservice together with the provided docker-compose file. First, build the microservice Docker image locally:\n\n```\nsbt docker:publishLocal\n```\n\nThen start everything with:\n\n```\ndocker compose up\n```\n\nOnce GraphDB and the microservice have started, the Swagger UI is available [here](http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/swagger-ui/index.html).\n\nYou can then try creating a user-defined type:\n\n```\ncurl -X 'POST' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity-type' \\\n  -H 'accept: application/text' \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n  \"name\": \"DataCollectionType\",\n  \"traits\": [],\n  \"schema\": [\n    {\n      \"name\": \"name\",\n      \"typeName\": \"String\",\n      \"mode\": \"Required\"\n    },\n    {\n      \"name\": \"organization\",\n      \"typeName\": \"String\",\n      \"mode\": \"Required\"\n    },\n    
{\n      \"name\": \"domain\",\n      \"typeName\": \"String\",\n      \"mode\": \"Required\"\n    }\n  ]\n}'\n```\n\nAlternatively, you can create a user-defined type by posting a YAML file:\n\n```\nname: DataCollectionType\ntraits:\nschema:\n- name: name\n  typeName: String\n  mode: Required\n  attributeTypes: null\n- name: organization\n  typeName: String\n  mode: Required\n  attributeTypes: null\n- name: domain\n  typeName: String\n  mode: Required\n  attributeTypes: null\nfatherName: null\n```\n\nSave this content in a file named entity-type.yaml, then run:\n\n```\ncurl -X 'POST' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity-type/yaml' \\\n  -H 'accept: application/text' \\\n  -H 'Content-Type: application/octet-stream' \\\n  --data-binary '@entity-type.yaml'\n```\n\nAfter creating the user-defined type, you can create instances of it, either by posting a JSON payload or by posting a YAML document:\n\n```\ncurl -X 'POST' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity' \\\n  -H 'accept: application/text' \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n  \"entityId\": \"\",\n  \"entityTypeName\": \"DataCollectionType\",\n  \"values\": {\n    \"name\": \"Person\",\n    \"organization\": \"HR\",\n    \"domain\": \"Registrations\"\n  }\n}'\n```\n\nThe equivalent YAML document (save it in a file named entity.yaml):\n\n```\nentityId: \"\"\nentityTypeName: DataCollectionType\nvalues:\n  domain: Registrations\n  organization: HR\n  name: Person\n```\n\nThen submit it:\n\n```\ncurl -X 'POST' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity/yaml' \\\n  -H 'accept: application/text' \\\n  -H 'Content-Type: application/octet-stream' \\\n  --data-binary '@entity.yaml'\n```\n\nCreating an instance returns its ID, which you can then use to retrieve the instance (replace the ID below with the one returned by your call):\n\n```\ncurl -X 'GET' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity/9c2a7c0a-575d-4572-9d33-610d7b4977af' \\\n  -H 'accept: 
application/json'\n```\n\n### A more complex example\n\nIn this example, we create two traits and link them, to show how a relationship between traits is automatically inherited by the instances of the types associated with those traits.\n\nFirst, create a trait representing a data collection:\n\n```\ncurl -X 'POST' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/trait' \\\n  -H 'accept: application/json' \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n  \"name\": \"DataCollection\"\n}'\n```\n\nThen create a second trait, representing a table schema:\n\n```\ncurl -X 'POST' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/trait' \\\n  -H 'accept: application/json' \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n  \"name\": \"TableSchema\"\n}'\n```\n\nNow link the two traits with the hasPart relationship, expressing the fact that a data collection can be associated with a table schema:\n\n```\ncurl -X 'PUT' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/trait/link/DataCollection/hasPart/TableSchema' \\\n  -H 'accept: application/json'\n```\n\nNext, create a type DataCollectionType that uses the trait DataCollection:\n\n```\ncurl -X 'POST' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity-type' \\\n  -H 'accept: application/text' \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n  \"name\": \"DataCollectionType\",\n  \"traits\": [\n    \"DataCollection\"\n  ],\n  \"schema\": [\n    {\n      \"name\": \"name\",\n      \"typeName\": \"String\",\n      \"mode\": \"Required\"\n    },\n    {\n      \"name\": \"organization\",\n      \"typeName\": \"String\",\n      \"mode\": \"Required\"\n    },\n    {\n      \"name\": \"domain\",\n      \"typeName\": \"String\",\n      \"mode\": \"Required\"\n    }\n  ]\n}'\n```\n\nAlternatively, you can use the following YAML file (save it as data-collection-type.yaml):\n\n```\nname: DataCollectionType\ntraits:\n- DataCollection\nschema:\n- name: name\n  typeName: String\n  mode: 
Required\n- name: organization\n  typeName: String\n  mode: Required\n- name: domain\n  typeName: String\n  mode: Required\n```\n\nThen submit it:\n\n```\ncurl -X 'POST' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity-type/yaml' \\\n  -H 'accept: application/text' \\\n  -H 'Content-Type: application/octet-stream' \\\n  --data-binary '@data-collection-type.yaml'\n```\n\nNext, create a type TableSchemaType that uses the trait TableSchema:\n\n```\ncurl -X 'POST' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity-type' \\\n  -H 'accept: application/text' \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n  \"name\": \"TableSchemaType\",\n  \"traits\": [\n    \"TableSchema\"\n  ],\n  \"schema\": [\n    {\n      \"name\": \"tableName\",\n      \"typeName\": \"String\",\n      \"mode\": \"Required\"\n    },\n    {\n      \"name\": \"columns\",\n      \"typeName\": \"Struct\",\n      \"mode\": \"Repeated\",\n      \"attributeTypes\": [\n        {\n          \"name\": \"columnName\",\n          \"typeName\": \"String\",\n          \"mode\": \"Required\"\n        },\n        {\n          \"name\": \"columnType\",\n          \"typeName\": \"String\",\n          \"mode\": \"Required\"\n        }\n      ]\n    }\n  ]\n}'\n```\n\nAlternatively, use the following YAML file (save it as table-schema-type.yaml):\n\n```\nname: TableSchemaType\ntraits:\n- TableSchema\nschema:\n- name: tableName\n  typeName: String\n  mode: Required\n- name: columns\n  typeName: Struct\n  mode: Repeated\n  attributeTypes:\n    - name: columnName\n      typeName: String\n      mode: Required\n    - name: columnType\n      typeName: String\n      mode: Required\n```\n\nThen submit it:\n\n```\ncurl -X 'POST' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity-type/yaml' \\\n  -H 'accept: application/text' \\\n  -H 'Content-Type: application/octet-stream' \\\n  --data-binary 
'@table-schema-type.yaml'\n```\n\nAt this point, we have two types, DataCollectionType and TableSchemaType, associated with the traits DataCollection and TableSchema, respectively. Because the two traits are linked by the hasPart relationship, any instance of DataCollectionType can be linked to any instance of TableSchemaType through that same relationship.\n\nFirst, create an instance of DataCollectionType:\n\n```\ncurl -X 'POST' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity' \\\n  -H 'accept: application/text' \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n  \"entityId\": \"\",\n  \"entityTypeName\": \"DataCollectionType\",\n  \"values\": {\n    \"name\": \"Person\",\n    \"organization\": \"HR\",\n    \"domain\": \"Registrations\"\n  }\n}'\n```\n\nThis call returns the ID of the newly created instance, for example `29147f92-db7a-41be-abe1-a28f29418ce1`.\n\nNext, create an instance of TableSchemaType:\n\n```\ncurl -X 'POST' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity' \\\n  -H 'accept: application/text' \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n  \"entityId\": \"\",\n  \"entityTypeName\": \"TableSchemaType\",\n  \"values\": {\n    \"tableName\": \"Person\",\n    \"columns\": [\n      {\n        \"columnName\": \"firstName\",\n        \"columnType\": \"String\"\n      },\n      {\n        \"columnName\": \"familyName\",\n        \"columnType\": \"String\"\n      },\n      {\n        \"columnName\": \"age\",\n        \"columnType\": \"Int\"\n      }\n    ]\n  }\n}'\n```\n\nAgain, note the returned instance ID, for example `51a2af02-6504-4c12-8cc4-8dd3874af5c4`.\n\nNow you can link the two instances using the same relationship that links the traits of their respective types (replace the IDs in the URL with the ones returned by your calls):\n\n```\ncurl -X 'POST' \\\n  
'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity/link/29147f92-db7a-41be-abe1-a28f29418ce1/hasPart/51a2af02-6504-4c12-8cc4-8dd3874af5c4' \\\n  -H 'accept: application/text'\n```\n\nIt's also possible to search for instances by attribute values. Given a search string (here `name = 'Person'`, URL-encoded in the `query` parameter), this API returns a list of instance IDs:\n\n```\ncurl -X 'GET' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity?entityTypeName=DataCollectionType\u0026query=name%20%3D%20%27Person%27' \\\n  -H 'accept: application/json'\n```\n\nAnother API, given the same kind of search string, returns a list of instances:\n\n```\ncurl -X 'GET' \\\n  'http://127.0.0.1:8093/dataplatform.shaper.uservice/0.0/ontology/entity/id?entityTypeName=DataCollectionType\u0026query=name%20%3D%20%27Person%27' \\\n  -H 'accept: application/json'\n```\n\nThat's all!\n\n**After a run with `docker compose up`, always run `docker compose rm` to clean everything up.**\n\n## Attention\n\nThe project is still a work in progress; more documentation explaining the overall model and the internal APIs will be written.\nIn the meantime, you can look at the various tests to better understand its inner workings:\n\n* This test suite shows how the system manages tuples, parsing, and unparsing driven by a schema:\n  ```\n  domain/src/test/scala/it/agilelab/dataplatformshaper/domain/DataTypeSpec.scala\n  ```\n* This test suite shows the inheritance mechanism:\n  ```\n  domain/src/test/scala/it/agilelab/dataplatformshaper/domain/EntityTypeSpec.scala\n  ```\n* This test suite shows how user-defined types and their instances are created:\n  ```\n  domain/src/test/scala/it/agilelab/dataplatformshaper/domain/OntologyL0Spec.scala\n  ```\n* This test suite shows everything about traits and their usage:\n  ```\n  domain/src/test/scala/it/agilelab/dataplatformshaper/domain/OntologyL1Spec.scala\n  ```\n* This test suite shows how to use the REST API:\n  ```\n  uservice/src/test/scala/it/agilelab/dataplatformshaper/uservice/api/ApiSpec.scala\n  ```\n\n## Credits\n\nThis project is the result of a collaborative effort:\n\n| Name                      | Affiliation                                                                           |\n| ------------------------- | --------------------------------------------------------------------------------------|\n| Diego Reforgiato Recupero | Department of Math and Computer Science, University of Cagliari (Italy)               |\n| Francesco Osborne         | KMi, The Open University (UK) and University of Milano-Bicocca (Italy)                |\n| Andrea Giovanni Nuzzolese | Institute of Cognitive Sciences and Technologies National Council of Research (Italy) |\n| Simone Pusceddu           | Department of Math and Computer Science, University of Cagliari (Italy)               |\n| David Greco               | Big Data Laboratory, AgileLab S.r.L. (Italy)                                          |\n| Nicolò Bidotti            | Big Data Laboratory, AgileLab S.r.L. (Italy)                                          |\n| Paolo Platter             | Big Data Laboratory, AgileLab S.r.L. (Italy)                                          |\n\n[build-badge]: https://github.com/agile-lab-dev/data-platform-shaper/actions/workflows/test.yml/badge.svg\n[scoverage-report]: https://github.com/scoverage\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagile-lab-dev%2Fdata-platform-shaper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fagile-lab-dev%2Fdata-platform-shaper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagile-lab-dev%2Fdata-platform-shaper/lists"}