{"id":21657606,"url":"https://github.com/sourcemeta-research/json-taxonomy","last_synced_at":"2025-04-11T22:33:07.986Z","repository":{"id":44715816,"uuid":"448360717","full_name":"sourcemeta-research/json-taxonomy","owner":"sourcemeta-research","description":"A formal taxonomy to classify JSON documents based on their size, type of content, characteristics of their structure and redundancy criteria.","archived":false,"fork":false,"pushed_at":"2025-01-23T19:28:46.000Z","size":2071,"stargazers_count":6,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-25T18:40:09.424Z","etag":null,"topics":["json","json-document","taxonomic-classification","taxonomy"],"latest_commit_sha":null,"homepage":"https://sourcemeta.github.io/json-taxonomy/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sourcemeta-research.png","metadata":{"files":{"readme":"README.markdown","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"sourcemeta","patreon":"sourcemeta","open_collective":"sourcemeta"}},"created_at":"2022-01-15T18:36:05.000Z","updated_at":"2025-01-23T19:28:51.000Z","dependencies_parsed_at":"2024-05-02T17:00:08.647Z","dependency_job_id":null,"html_url":"https://github.com/sourcemeta-research/json-taxonomy","commit_stats":{"total_commits":16,"total_committers":1,"mean_commits":16.0,"dds":0.0,"last_synced_commit":"80a86eeaec9b14e2d98f0e7dc795c5883e226562"},"previous_names":["sourcemeta/json-taxonomy"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/sourcemeta-research%2Fjson-taxonomy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sourcemeta-research%2Fjson-taxonomy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sourcemeta-research%2Fjson-taxonomy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sourcemeta-research%2Fjson-taxonomy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sourcemeta-research","download_url":"https://codeload.github.com/sourcemeta-research/json-taxonomy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248490036,"owners_count":21112679,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["json","json-document","taxonomic-classification","taxonomy"],"created_at":"2024-11-25T09:27:14.752Z","updated_at":"2025-04-11T22:33:07.961Z","avatar_url":"https://github.com/sourcemeta-research.png","language":"JavaScript","readme":"Taxonomy for JSON documents\n===========================\n\nThis project presents a formal taxonomy to classify\n[JSON](https://www.json.org) documents based on their size, type of content,\ncharacteristics of their structure and redundancy criteria.\n\n![JSON Taxonomy Online Tool Screenshot](./screenshot.png)\n\nOpen the online demo [here](https://sourcemeta.github.io/json-taxonomy).\n\nWhy is this useful?\n-------------------\n\nSoftware systems make use of JSON to model diverse and domain-specific data\nstructures. 
Each of these data structures has characteristics that distinguish\nthem from other data structures. For example, a data structure that models a\nperson is fundamentally different from a data structure that models sensor\ndata. These characteristics describe the essence of the data structure.\nTherefore, two instances of the same data structure inherit the same or similar\ncharacteristics despite having different values.\n\nWhile we intuitively know these characteristics exist, we lack a common\nterminology to describe them in unambiguous ways. In an attempt to solve this\nproblem, this taxonomy presents a formal vocabulary to describe, reason and\ntalk about JSON documents in a high-level manner given the characteristics of\nthe data structures they represent.\n\nTaxonomy\n--------\n\n| Size                               | Content | Redundancy    | Structure | Acronym    |\n|------------------------------------|---------|---------------|-----------|------------|\n| Tier 1 Minified \u003c 100 bytes        | Numeric | Redundant     | Flat      | Tier 1 NRF |\n| Tier 1 Minified \u003c 100 bytes        | Numeric | Redundant     | Nested    | Tier 1 NRN |\n| Tier 1 Minified \u003c 100 bytes        | Numeric | Non-Redundant | Flat      | Tier 1 NNF |\n| Tier 1 Minified \u003c 100 bytes        | Numeric | Non-Redundant | Nested    | Tier 1 NNN |\n| Tier 1 Minified \u003c 100 bytes        | Textual | Redundant     | Flat      | Tier 1 TRF |\n| Tier 1 Minified \u003c 100 bytes        | Textual | Redundant     | Nested    | Tier 1 TRN |\n| Tier 1 Minified \u003c 100 bytes        | Textual | Non-Redundant | Flat      | Tier 1 TNF |\n| Tier 1 Minified \u003c 100 bytes        | Textual | Non-Redundant | Nested    | Tier 1 TNN |\n| Tier 1 Minified \u003c 100 bytes        | Boolean | Redundant     | Flat      | Tier 1 BRF |\n| Tier 1 Minified \u003c 100 bytes        | Boolean | Redundant     | Nested    | Tier 1 BRN |\n| Tier 1 Minified \u003c 100 bytes        | Boolean | 
Non-Redundant | Flat      | Tier 1 BNF |\n| Tier 1 Minified \u003c 100 bytes        | Boolean | Non-Redundant | Nested    | Tier 1 BNN |\n| Tier 2 Minified ≥ 100 \u003c 1000 bytes | Numeric | Redundant     | Flat      | Tier 2 NRF |\n| Tier 2 Minified ≥ 100 \u003c 1000 bytes | Numeric | Redundant     | Nested    | Tier 2 NRN |\n| Tier 2 Minified ≥ 100 \u003c 1000 bytes | Numeric | Non-Redundant | Flat      | Tier 2 NNF |\n| Tier 2 Minified ≥ 100 \u003c 1000 bytes | Numeric | Non-Redundant | Nested    | Tier 2 NNN |\n| Tier 2 Minified ≥ 100 \u003c 1000 bytes | Textual | Redundant     | Flat      | Tier 2 TRF |\n| Tier 2 Minified ≥ 100 \u003c 1000 bytes | Textual | Redundant     | Nested    | Tier 2 TRN |\n| Tier 2 Minified ≥ 100 \u003c 1000 bytes | Textual | Non-Redundant | Flat      | Tier 2 TNF |\n| Tier 2 Minified ≥ 100 \u003c 1000 bytes | Textual | Non-Redundant | Nested    | Tier 2 TNN |\n| Tier 2 Minified ≥ 100 \u003c 1000 bytes | Boolean | Redundant     | Flat      | Tier 2 BRF |\n| Tier 2 Minified ≥ 100 \u003c 1000 bytes | Boolean | Redundant     | Nested    | Tier 2 BRN |\n| Tier 2 Minified ≥ 100 \u003c 1000 bytes | Boolean | Non-Redundant | Flat      | Tier 2 BNF |\n| Tier 2 Minified ≥ 100 \u003c 1000 bytes | Boolean | Non-Redundant | Nested    | Tier 2 BNN |\n| Tier 3 Minified ≥ 1000 bytes       | Numeric | Redundant     | Flat      | Tier 3 NRF |\n| Tier 3 Minified ≥ 1000 bytes       | Numeric | Redundant     | Nested    | Tier 3 NRN |\n| Tier 3 Minified ≥ 1000 bytes       | Numeric | Non-Redundant | Flat      | Tier 3 NNF |\n| Tier 3 Minified ≥ 1000 bytes       | Numeric | Non-Redundant | Nested    | Tier 3 NNN |\n| Tier 3 Minified ≥ 1000 bytes       | Textual | Redundant     | Flat      | Tier 3 TRF |\n| Tier 3 Minified ≥ 1000 bytes       | Textual | Redundant     | Nested    | Tier 3 TRN |\n| Tier 3 Minified ≥ 1000 bytes       | Textual | Non-Redundant | Flat      | Tier 3 TNF |\n| Tier 3 Minified ≥ 1000 bytes       | Textual | Non-Redundant | Nested  
  | Tier 3 TNN |\n| Tier 3 Minified ≥ 1000 bytes       | Boolean | Redundant     | Flat      | Tier 3 BRF |\n| Tier 3 Minified ≥ 1000 bytes       | Boolean | Redundant     | Nested    | Tier 3 BRN |\n| Tier 3 Minified ≥ 1000 bytes       | Boolean | Non-Redundant | Flat      | Tier 3 BNF |\n| Tier 3 Minified ≥ 1000 bytes       | Boolean | Non-Redundant | Nested    | Tier 3 BNN |\n\nThe taxonomy aims to classify JSON documents into a limited and useful set of\ncategories that is easy to reason about rather than exhaustively considering\nevery possible aspect of a data structure. The taxonomy categorizes JSON\ndocuments according to their size, content, redundancy and nesting\ncharacteristics.\n\n### Size\n\n- **Tier 1**: A JSON document is in this category if its UTF-8 minified form\n  occupies less than 100 bytes.\n\n- **Tier 2**: A JSON document is in this category if its UTF-8 minified form\n  occupies 100 bytes or more, but less than 1000 bytes.\n\n- **Tier 3**: A JSON document is in this category if its UTF-8 minified form\n  occupies 1000 bytes or more.\n\n### Content\n\n- **Textual**: A JSON document is in this category if it has at least one\n  string value and its number of string values multiplied by the cumulative\n  byte-size occupied by its string values is greater than or equal to the\n  boolean and numeric counterparts.\n\n- **Numeric**: A JSON document is in this category if it has at least one\n  number value and its number of number values multiplied by the cumulative\n  byte-size occupied by its number values is greater than or equal to the\n  textual and boolean counterparts.\n\n- **Boolean**: A JSON document is in this category if it has at least one\n  boolean or null value and its number of boolean and null values multiplied by\n  the cumulative byte-size occupied by its boolean and null values is greater\n  than or equal to the textual and numeric counterparts.\n\n- **Structural**: A JSON document is in this category if it does not 
include\n  any string, boolean, null or number values.\n\nA JSON document can be categorized as textual, numeric and boolean at the same\ntime.\n\n### Redundancy\n\n- **Non-redundant**: A JSON document is in this category if less than 25%\n  of its scalar and composite values are redundant.\n\n- **Redundant**: A JSON document is in this category if at least 25% of\n  its scalar and composite values are redundant.\n\n### Nesting\n\n- **Flat**: A JSON document is in this category if the height of the document\n  multiplied by the non-root level with the largest byte-size when taking\n  textual, numeric and boolean values into account is less than 10. If two\n  levels have the same byte size, the highest level is taken into account.\n\n- **Nested**: A JSON document is in this category if it is considered\n  *structural* and its height is greater than or equal to 5, or if the height\n  of the document multiplied by the non-root level with the largest byte-size\n  when taking textual, numeric and boolean values into account is greater than\n  or equal to 10. 
If two levels have the same byte size, the highest level is taken\n  into account.\n\nUsage (JavaScript)\n------------------\n\nThis repository publishes an [npm](https://www.npmjs.com) package which can be\ninstalled as follows:\n\n```sh\nnpm install --save @sourcemeta/json-taxonomy\n```\n\nThe module exposes a single function that takes any JSON value and returns the\nsequence of taxonomy qualifiers as an array of strings:\n\n```js\nconst taxonomy = require('@sourcemeta/json-taxonomy')\n\nconst value = {\n  foo: 2\n}\n\nconsole.log(taxonomy(value))\n// [ 'tier 1', 'numeric', 'non-redundant', 'flat' ]\n```\n\nUsage (CLI)\n-----------\n\nThe published [npm](https://www.npmjs.com) package includes a simple\ncommand-line interface program that can be globally installed as follows:\n\n```sh\nnpm install --global @sourcemeta/json-taxonomy\n```\n\nThe CLI program takes the path to a JSON document as an argument and outputs\nthe taxonomy to standard output:\n\n```sh\njson-taxonomy path/to/document.json\n```\n\nLicense\n-------\n\nThis project is released under the terms specified in the\n[license](https://github.com/sourcemeta/json-taxonomy/blob/master/LICENSE).\nThis project extends [previous academic work](https://arxiv.org/abs/2201.03051)\nby the same author at the University of Oxford.\n","funding_links":["https://github.com/sponsors/sourcemeta","https://patreon.com/sourcemeta","https://opencollective.com/sourcemeta"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsourcemeta-research%2Fjson-taxonomy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsourcemeta-research%2Fjson-taxonomy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsourcemeta-research%2Fjson-taxonomy/lists"}