{"id":15102739,"url":"https://github.com/cloudprivacylabs/lsa-selective-disclosure","last_synced_at":"2025-04-05T11:42:38.086Z","repository":{"id":228732998,"uuid":"774638239","full_name":"cloudprivacylabs/lsa-selective-disclosure","owner":"cloudprivacylabs","description":"Demonstration of selective disclosure of a JSON document using layered schema architecture","archived":false,"fork":false,"pushed_at":"2024-03-20T15:51:46.000Z","size":111,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-10T22:45:37.459Z","etag":null,"topics":["json","json-schema","selective-disclosure"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cloudprivacylabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-19T22:41:44.000Z","updated_at":"2024-03-20T05:51:12.000Z","dependencies_parsed_at":"2024-09-20T08:00:25.987Z","dependency_job_id":null,"html_url":"https://github.com/cloudprivacylabs/lsa-selective-disclosure","commit_stats":{"total_commits":3,"total_committers":1,"mean_commits":3.0,"dds":0.0,"last_synced_commit":"60ec88fcd1becdf1ee747dbd54069afa534d987b"},"previous_names":["cloudprivacylabs/lsa-selective-disclosure"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudprivacylabs%2Flsa-selective-disclosure","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudprivacylabs%2Flsa-selective-disclosure/tags","re
leases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudprivacylabs%2Flsa-selective-disclosure/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudprivacylabs%2Flsa-selective-disclosure/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cloudprivacylabs","download_url":"https://codeload.github.com/cloudprivacylabs/lsa-selective-disclosure/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247332524,"owners_count":20921852,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["json","json-schema","selective-disclosure"],"created_at":"2024-09-25T19:05:44.006Z","updated_at":"2025-04-05T11:42:38.049Z","avatar_url":"https://github.com/cloudprivacylabs.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Selective Disclosure using the Layered Schema Architecture\n\nSelective disclosure allows a party to share a limited set of\ninformation with other parties. In personal data exchange scenarios,\nselective disclosure is a privacy mechanism that allows an individual\nto share only the necessary information with others. Layered schema\narchitecture enables selective disclosure by annotating schemas with\nprivacy tags. These privacy tags classify data elements with different\nprivacy levels that the data owner can enable or disable based on the\ndata exchange context.\n\nThe following figure illustrates selective disclosure of a JSON\ndocument to different parties. 
For each different party or use case,\nan overlay is defined (or auto-generated) that marks certain fields as\n\"sensitive\". This overlay is combined with a schema to form a \"schema\nvariant\", which is a schema adjusted and annotated for a specific use\ncase.\n\nIngesting the JSON document with this schema variant results in a\nlabeled property graph representation of the input with annotations\ncontaining the \"sensitive\" tags for the selected fields. The semantic\npipeline ingests the JSON document with the given schema variant,\nremoves all fields marked as \"sensitive\", and translates the labeled\nproperty graph back into a JSON document, which is shared with the\nrecipient. This real-time filtering decouples the use-case-specific\nselective disclosure logic from the backend (a database or\na wallet).\n\n![Selective Disclosure](selective-disclosure.png)\n\n\nTo demonstrate this operation using the LSA tooling, let's consider a\nsample user profile data structure containing some demographic\ninformation, represented as a JSON schema\n[profile.schema.json](profile.schema.json). 
This schema contains\na person's name, address, and phone information.\n\n```\n{\n    \"definitions\": {\n        \"Profile\": {\n            \"type\": \"object\",\n            \"properties\": {\n                \"address\": {\n                    \"$ref\": \"#/definitions/Address\"\n                },\n                \"phone\": {\n                    \"type\": \"array\",\n                    \"items\": {\n                        \"$ref\": \"#/definitions/Phone\"\n                    }\n                },\n                \"firstName\": {\n                    \"type\": \"string\"\n                },\n                ...\n            }\n        },\n        \"Address\": {\n            \"type\": \"object\",\n            \"properties\": {\n                \"street\": {\n                    \"type\": \"string\"\n                },\n                ...\n            }\n        },\n        \"Phone\": {\n            \"type\": \"object\",\n            \"properties\": {\n                \"number\": {\n                    \"type\": \"string\"\n                },\n                \"type\": {\n                    \"type\": \"string\"\n                }\n            }\n        }\n    }\n}\n\n```\n\nThe following is a sample JSON document (given in\n[profile.json](profile.json)):\n\n```\n{\n    \"firstName\": \"john\",\n    \"lastName\": \"doe\",\n    \"address\": {\n        \"street\": \"123 Main St.\",\n        \"city\": \"Anycity\",\n        \"state\": \"CO\",\n        \"postalCode\": \"80000\",\n        \"country\": \"US\"\n    },\n    \"phone\": [\n        {\n            \"type\": \"cell\",\n            \"number\": \"123-123 1234\"\n        }\n    ]\n}\n```\n\nFor this example, we will mark the phone number, street address, and last\nname as \"sensitive\" using the following overlay:\n\n```\n{\n    \"definitions\": {\n        \"Profile\": {\n            \"properties\": {\n                \"lastName\": {\n                    \"x-ls\": {\n                        \"privacyLevel\": 
\"sensitive\"\n                    }\n                }\n            }\n        },\n        \"Address\": {\n            \"properties\": {\n                \"street\": {\n                    \"type\": \"string\",\n                    \"x-ls\": {\n                        \"privacyLevel\": \"sensitive\"\n                    }\n                }\n            }\n        },\n        \"Phone\": {\n            \"properties\": {\n                \"number\": {\n                    \"x-ls\": {\n                        \"privacyLevel\": \"sensitive\"\n                    }\n                }\n            }\n        }\n    }\n}\n```\n\nNote that the overlay matches the JSON schema structure. It adds the\n`x-ls/privacyLevel` property to the sensitive fields (`x-ls` is\nrecognized by the LSA tooling).\n\nThe next step is to combine the schema with this overlay to create a\nschema variant. This is done with a schema bundle, as shown below:\n\n```\n# Combine the JSON schema and the overlay\n# The resulting schema has id http://example.org/ProfileSchema\njsonSchemas:\n  - name: profile.schema.json\n    id: http://example.org/ProfileSchema\n    overlays:\n      - profile-sensitive.ovl.json\n      \n# Declare the data type variants based on the schema\nvariants:\n  # The Profile data type is defined at #/definitions/Profile\n  # of the combined JSON schema\n  http://example.org/Profile:\n    jsonSchema:\n      ref: http://example.org/ProfileSchema#/definitions/Profile\n      layerId: http://example.org/Profile\n```\n\nThis schema bundle reads the JSON schema `profile.schema.json`,\ncombines it with the overlay `profile-sensitive.ovl.json`, then\ndefines a data type `http://example.org/Profile` by pointing to the\nlocation in the schema variant where the `Profile` object is defined.\n\nTo process data using this schema, we need a pipeline. The following\npipeline first ingests JSON data using the `profile-sensitive`\nbundle. 
The output of this operation is a [labeled property\ngraph](ingested-graph.svg) containing the `privacyLevel` annotations\ngiven in `profile-sensitive.ovl.json`. This is **self-describing\ndata**, data that contains the schema information together with data\nelements. The `oc` operation runs openCypher expressions on this\ngraph, removing all nodes that are marked with `privacyLevel:\nsensitive`. The final step translates the graph to JSON.\n\n```\n# Ingest a Profile object with the schema using the sensitive overlay\n# The output of this stage is a graph\n- operation: ingest/json\n  params:\n    bundle:\n      - profile-sensitive.bundle.yaml\n    type: http://example.org/Profile\n\n# Remove all graph nodes that are marked sensitive\n- operation: oc\n  params:\n    expr:\n      - match (k {`privacyLevel`:\"sensitive\"}) detach delete k\n    \n# Convert the graph back to JSON\n- operation: export/json\n```\n\nThis pipeline can be run using:\n\n```\nlayers pipeline --file sensitive.pipeline.yaml profile.json \n```\n\n(The `layers` program can be downloaded from https://github.com/cloudprivacylabs/lsa/releases)\n\nThe output is:\n\n```\n{\n  \"firstName\": \"john\",\n  \"address\": {\n    \"city\": \"Anycity\",\n    \"state\": \"CO\",\n    \"postalCode\": \"80000\",\n    \"country\": \"US\"\n  },\n  \"phone\": [\n    {\n      \"type\": \"cell\"\n    }\n  ]\n}\n```\n\nAs you can see, the output does not contain the fields that are\nmarked as sensitive.\n\nNow we can create a second overlay to add more sensitive fields. 
The \n[profile-moresensitive.ovl.json](profile-moresensitive.ovl.json) overlay declares the `firstName`, `middleName`, and `city` fields as sensitive.\n\n```\n{\n    \"definitions\": {\n        \"Profile\": {\n            \"properties\": {\n                \"firstName\": {\n                    \"x-ls\": {\n                        \"privacyLevel\": \"sensitive\"\n                    }\n                },\n                \"middleName\": {\n                    \"x-ls\": {\n                        \"privacyLevel\": \"sensitive\"\n                    }\n                }\n            }\n        },\n        \"Address\": {\n            \"properties\": {\n                \"city\": {\n                    \"type\": \"string\",\n                    \"x-ls\": {\n                        \"privacyLevel\": \"sensitive\"\n                    }\n                }\n            }\n        }\n    }\n}\n```\n\nThen a new schema bundle combines both sensitive data overlays ([profile-moresensitive.bundle.yaml](profile-moresensitive.bundle.yaml)):\n\n```\njsonSchemas:\n  - name: profile.schema.json\n    id: http://example.org/ProfileSchema\n    overlays:\n      - profile-sensitive.ovl.json\n      - profile-moresensitive.ovl.json\nvariants:\n  http://example.org/Profile:\n    jsonSchema:\n      ref: http://example.org/ProfileSchema#/definitions/Profile\n      layerId: http://example.org/Profile\n```\n\nWith a new pipeline using this bundle:\n\n```\n- operation: ingest/json\n  params:\n    bundle:\n      - profile-moresensitive.bundle.yaml\n    type: http://example.org/Profile\n\n- operation: oc\n  params:\n    expr:\n      - match (k {`privacyLevel`:\"sensitive\"}) detach delete k\n    \n- operation: export/json\n\n```\n\nTo get the output, run:\n\n```\nlayers pipeline --file moresensitive.pipeline.yaml  profile.json \n```\n\nWhich gives:\n\n```\n{\n  \"address\": {\n    \"state\": \"CO\",\n    \"postalCode\": \"80000\",\n    \"country\": \"US\"\n  },\n  \"phone\": [\n    {\n      \"type\": 
\"cell\"\n    }\n  ]\n}\n```\n\nSo we created two overlays, two schema bundles, and two pipelines that\nare applicable to two separate data exchange scenarios.\n\nThis example only illustrates the basics of selective disclosure using\nthe layered schema architecture, which is the foundation for our\nreal-time data filtering engine.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudprivacylabs%2Flsa-selective-disclosure","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcloudprivacylabs%2Flsa-selective-disclosure","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudprivacylabs%2Flsa-selective-disclosure/lists"}