{"id":22302607,"url":"https://github.com/dataoneorg/sonormal","last_synced_at":"2025-03-26T00:27:26.836Z","repository":{"id":83563521,"uuid":"313651340","full_name":"DataONEorg/sonormal","owner":"DataONEorg","description":"Schema.org extraction and normalization web service","archived":false,"fork":false,"pushed_at":"2023-09-28T20:56:02.000Z","size":760,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-01-30T21:17:10.872Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DataONEorg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-11-17T14:51:54.000Z","updated_at":"2023-03-28T02:36:24.000Z","dependencies_parsed_at":"2024-12-03T18:41:24.936Z","dependency_job_id":"e2e31ed0-d3f6-4865-aff7-bd0b4deabbb0","html_url":"https://github.com/DataONEorg/sonormal","commit_stats":null,"previous_names":["dataoneorg/sonormal"],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataONEorg%2Fsonormal","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataONEorg%2Fsonormal/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataONEorg%2Fsonormal/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataONEorg%2Fsonormal/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DataONEorg","download_url":"ht
tps://codeload.github.com/DataONEorg/sonormal/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245564771,"owners_count":20636175,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-03T18:40:00.817Z","updated_at":"2025-03-26T00:27:26.801Z","avatar_url":"https://github.com/DataONEorg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sonormal\n\n`sonormal` is a python library to assist with extraction and processing of schema.org content with emphasis on the [`Dataset`](https://schema.org/Dataset) class.\n\nIncluded is a command line tool `jld` for retrieving and extracting JSON-LD from a web page or other resource and performing various operations on JSON-LD.\n\nThis library and tool is focussed on supporting Schema.org harvesting for the DataONE infrastructure.\n\n## Operation\n\n```\nUsage: jld [OPTIONS] COMMAND [ARGS]...\n\n  Retrieve and process JSON-LD.\n\nOptions:\n  -b, --base TEXT             Base URI\n  -p, --profile TEXT          JSON-LD Profile\n  -P, --request-profile TEXT  JSON-LD Request Profile\n  -r, --response              Show response information\n  -R, --relaxed-json          Relax strict JSON deserialization\n  -W, --webpage               Render SPA page\n  --soprod                    Use schema.org production context instead of v12 https\n  --help                      Show this message and exit.\n\nCommands:\n  cache        Cache management, list or purge\n  canon        Normalize and render canonical form\n  compact      Compact 
the JSON-LD SOURCE\n  frame        Apply frame to source\n  get          Retrieve JSON-LD\n  identifiers  Extract Dataset identifiers\n  nquads       Transform JSON-LD to N-Quads\n  play         Load in JSON-LD Playground\n```\n\n`cache` lists entries in the local cache (in folder `~/.local/sonormal/cache`) and optionally purges entries.\n\n`canon` canonicalizes the source JSON-LD by expanding and applying the URDNA 2015 algorithm, then serializes with ordered terms, no new lines, and no spaces between delimiters. Checksums computed on the result are consistent between various arrangements of the same input source.\n\n`compact` applies the JSON-LD compaction algorithm to the source using the context:\n```\n{\"@context\": [\n    \"https://schema.org/\", \n    { \n      \"id\": \"id\", \n      \"type\": \"type\" \n    }\n  ]\n}\n```\n\n`frame` applies the JSON-LD framing algorithm to structure the JSON-LD for ease of identifier extraction from a `Dataset` instance using the frame:\n```\n{\n    \"@context\": {\"@vocab\":\"https://schema.org/\"},\n    \"@type\": \"Dataset\",\n    \"identifier\": {},\n    \"creator\": {}\n}\n```\n\n`get` retrieves the document from a file or URL, following redirects and Link headers as appropriate. 
Content is extracted from HTML pages, and optionally (with the `-W` flag set) from single page applications where the JSON-LD may be generated on the fly.\n\n`identifiers` extracts `Dataset` identifier values and computes checksums of the JSON-LD.\n\n`nquads` serializes the JSON-LD to N-Quads format.\n\n## Examples\n\nDownload and extract JSON-LD from [Hydroshare](https://www.hydroshare.org/):\n\n```\njld get \"https://www.hydroshare.org/resource/058d173af80a4784b471d29aa9ad7257/\"\n{\n  \"@context\": {\n    \"@vocab\": \"https://schema.org/\",\n    \"datacite\": \"http://purl.org/spar/datacite/\"\n  },\n  \"@id\": \"https://www.hydroshare.org/resource/058d173af80a4784b471d29aa9ad7257\",\n  \"url\": \"https://www.hydroshare.org/resource/058d173af80a4784b471d29aa9ad7257\",\n  \"@type\": \"Dataset\",\n  \"additionalType\": \"http://www.hydroshare.org/terms/CompositeResource\",\n...\n```\n\nDownload and extract JSON-LD from a DataONE single page application (with JSON-LD rendered by the client):\n\n```\njld -W get \"https://search.dataone.org/view/urn%3Auuid%3Add9ad874-ded8-48fe-908a-06732b9a6297\"\n[\n  {\n    \"@context\": {\n      \"@vocab\": \"https://schema.org/\"\n    },\n    \"@type\": \"Dataset\",\n    \"@id\": \"https://dataone.org/datasets/urn%3Auuid%3Add9ad874-ded8-48fe-908a-06732b9a6297\",\n    \"datePublished\": \"2013-10-23T00:00:00Z\",\n    \"publisher\": {\n      \"@type\": \"Organization\",\n      \"name\": \"California Ocean Protection Council Data Repository\"\n    },\n    \"identifier\": \"urn:uuid:dd9ad874-ded8-48fe-908a-06732b9a6297\",\n...\n```\n\nProcessing operations can read their input from stdin. For example, normalize JSON-LD using the URDNA 2015 algorithm, which assigns ids to blank nodes. The source is expanded and canonicalized, and the output is serialized with no new lines and no spaces between delimiters in preparation for calculating checksums. 
\n\n```\njld get \"https://www.hydroshare.org/resource/058d173af80a4784b471d29aa9ad7257/\" | jld canon\n\n[{\"@id\":\"_:c14n0\",\"@type\":[\"http://purl.org/spar/datacite/ResourceIdentifier\",\"https://schema.org/PropertyValue\"],\n\"http://purl.org/spar/datacite/usesIdentifierScheme\":[{\"@id\":\"http://purl.org/spar/datacite/\nlocal-resource-identifier-scheme\"}],\"https://schema.org/propertyId\":[{\"@value\":\"UUID\"}],\"https://schema.org/value\":\n[{\"@value\":\"uuid:058d173af80a4784b471d29aa9ad7257\"}]},{\"@id\":\"_:c14n1\",\"@type\":[\"https://schema.org/Place\"],\n...\n```\n\nExtract identifiers and compute checksums:\n\n```\njld get \"https://www.hydroshare.org/resource/058d173af80a4784b471d29aa9ad7257/\" | jld identifiers -c\n[\n  {\n    \"@id\": [\n      \"https://www.hydroshare.org/resource/058d173af80a4784b471d29aa9ad7257\"\n    ],\n    \"url\": [\n      \"https://www.hydroshare.org/resource/058d173af80a4784b471d29aa9ad7257\"\n    ],\n    \"identifier\": [\n      \"uuid:058d173af80a4784b471d29aa9ad7257\"\n    ],\n    \"hashes\": {\n      \"sha256\": \"a8cb4e5806045032fc2e7ad0b762336ff76f3792271ddc071c0d8c85d6b69ac5\",\n      \"sha1\": \"f6abef03156a5adb6d395f385628a2894e7b920e\",\n      \"md5\": \"03a357ba8043ac734aa3b9e9bb514ff9\"\n    }\n  }\n]\n```\n\nOpen the canonical form of the BCO-DMO dataset `https://www.bco-dmo.org/dataset/839373` in [JSON-LD Playground](https://json-ld.org/playground/):\n\n```\njld get \"https://www.bco-dmo.org/dataset/839373\" | jld canon | jld play -B\nNew public gist created at: \n  https://gist.github.com/datadavev/4f3cad1a104263bcf1c1bb96723911fc\nLink to JSON-LD playground:\n  https://json-ld.org/playground/#startTab=tab-expanded\u0026json-ld=https%3A%2F%2Fgist.githubusercontent.com%2Fdatadavev%2F4f3cad1a104263bcf1c1bb96723911fc%2Fraw\n```\n\n\n## Installation\n\nInstall using [`poetry`](https://python-poetry.org/). 
For example:\n\n```\ngit clone https://github.com/datadavev/sonormal.git\ncd sonormal\npoetry install\n```\nThen run with:\n```\npoetry run jld\n```\n\nAlternatively, install into a separately created (and activated) virtual environment:\n```\npoetry install\n```\nThen run directly:\n```\njld\n```\n\nThe `play` command, which uploads to the [JSON-LD Playground](https://json-ld.org/playground/), requires the GitHub [command line tool `gh`](https://github.com/cli/cli) to be available on the path and authenticated.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdataoneorg%2Fsonormal","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdataoneorg%2Fsonormal","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdataoneorg%2Fsonormal/lists"}