{"id":16330982,"url":"https://github.com/justsml/schema-analyzer","last_synced_at":"2025-07-26T07:10:09.919Z","repository":{"id":40783781,"uuid":"236819276","full_name":"justsml/schema-analyzer","owner":"justsml","description":"Heuristic JSON \"column\" type \u0026 size analysis w/ enumeration detection.","archived":false,"fork":false,"pushed_at":"2023-01-05T05:56:52.000Z","size":5083,"stargazers_count":5,"open_issues_count":18,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-09T01:40:43.830Z","etag":null,"topics":["analysis","data-analysis","data-processing","data-visualization","detected-types","detection","enum-generator","enumeration-detection","hacktoberfest","heuristic","identify-fields","schema","schema-analyzer","type-definitions","type-detection"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/justsml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-01-28T19:17:14.000Z","updated_at":"2024-07-03T12:26:50.000Z","dependencies_parsed_at":"2023-02-03T14:32:01.502Z","dependency_job_id":null,"html_url":"https://github.com/justsml/schema-analyzer","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/justsml/schema-analyzer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justsml%2Fschema-analyzer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justsml%2Fschema-analyzer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justsml%2Fschema-analyzer/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justsml%2Fschema-analyzer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/justsml","download_url":"https://codeload.github.com/justsml/schema-analyzer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justsml%2Fschema-analyzer/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267133958,"owners_count":24040789,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-26T02:00:08.937Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analysis","data-analysis","data-processing","data-visualization","detected-types","detection","enum-generator","enumeration-detection","hacktoberfest","heuristic","identify-fields","schema","schema-analyzer","type-definitions","type-detection"],"created_at":"2024-10-10T23:25:09.241Z","updated_at":"2025-07-26T07:10:09.820Z","avatar_url":"https://github.com/justsml.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/justsml/schema-analyzer.svg?branch=master)](https://travis-ci.org/justsml/schema-analyzer)\n[![Codacy 
Badge](https://api.codacy.com/project/badge/Grade/b2c9bcb087db456a991655c3e87126a7)](https://www.codacy.com/manual/justsml/schema-analyzer?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=justsml/schema-analyzer\u0026amp;utm_campaign=Badge_Grade)\n[![GitHub package version](https://img.shields.io/github/package-json/v/justsml/schema-analyzer.svg?style=flat)](https://github.com/justsml/schema-analyzer)\n[![GitHub stars](https://img.shields.io/github/stars/justsml/schema-analyzer.svg?label=Stars\u0026style=flat)](https://github.com/justsml/schema-analyzer)\n[![Node.js CI](https://github.com/justsml/schema-analyzer/workflows/Node.js%20CI/badge.svg)](https://github.com/justsml/schema-analyzer/actions)\n![npm bundle size](https://img.shields.io/bundlephobia/minzip/schema-analyzer?color=green)\n![npm downloads](https://img.shields.io/npm/dm/schema-analyzer?color=yellow\u0026label=npm%20downloads\u0026logo=npm)\n[![codecov](https://codecov.io/gh/justsml/schema-analyzer/branch/master/graph/badge.svg)](https://codecov.io/gh/justsml/schema-analyzer)\n\u003c!-- ![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/justsml/schema-analyzer) --\u003e\n\u003c!-- ![GitHub All Releases](https://img.shields.io/github/downloads/justsml/schema-analyzer/total?color=cyan\u0026label=github%20release%20downloads) --\u003e\n\n# Schema Analyzer\n\n\u003e An Open Source joint by [Dan Levy](https://danlevy.net/) ✨\n\n## Analyze column type \u0026 size summary from any input JSON array\n\nSchema **Analyzer** is the core library behind Dan's [Schema **Generator**](https://github.com/justsml/schema-generator).\n\n### Features\n\nThe primary goal is to support any input JSON/CSV and infer as much as possible. 
More data will generally yield better results.\n\n- [x] Heuristic type analysis for arrays of objects.\n- [x] Browser-based (local, no server necessary)\n- [x] Automatic type detection:\n    - [x] ID - Identifier column, by name and unique Integer check (detects BigInteger)\n    - [x] ObjectId (MongoDB's 96 bit/12 Byte ID. 32bit timestamp + 24bit MachineID + 16bit ProcessID + 24bit Counter)\n    - [x] UUID/GUID (Common 128 bit/16 Byte ID. Stored as a hex string, dash delimited in parts: 8, 4, 4, 4, 12)\n    - [x] Boolean (detects obvious strings `true`, `false`, `Y`, `N`)\n    - [x] Date (Smart detection via comprehensive regex pattern)\n    - [x] Timestamp (integer, number of milliseconds since unix epoch)\n    - [x] Currency (62 currency symbols supported)\n    - [x] Float (w/ scale \u0026 precision measurements)\n    - [x] Number (Integers)\n    - [x] Null (sparse column data helps w/ certain inferences)\n    - [x] Email (falls back to string)\n    - [x] String (big text and varchar awareness)\n    - [x] Array (includes min/max/avg length)\n    - [x] Object\n- [x] Detects column size minimum, maximum and average\n- [x] Includes data points at the 25th, 33rd, 50th, 66th, 75th and 99th percentiles (for detecting outliers and enum types!)\n- [x] Handles some errors/outliers\n- [x] Quantifies # of unique values per column\n- [x] Identifies `enum` Fields w/ Values\n- [x] Identifies `Not Null` fields\n- [ ] Nested data structure \u0026 multi-table relational output.\n\u003c!-- - [ ] _Un-de-normalize_ JSON into flat typed objects. 
--\u003e\n\n### Getting Started\n\n```bash\nnpm install schema-analyzer\n```\n\n```js\nimport { schemaBuilder } from 'schema-analyzer'\n\n// Signature: schemaBuilder(schemaName: String, data: Array\u003cObject\u003e): TypeSummary\n```\n\n### Preview Analysis Results\n\n\u003e What does this library's analysis look like?\n\nThe result has two key top-level properties:\n\n- `totalRows` - # of rows analyzed.\n- `fields: FieldTypeSummary` - a map of field names to all detected types ([includes metadata](#aggregatesummary) for each type detected, with possible overlaps, e.g. an `Email` is also a `String`, and `\"42\"` is both a String and a Number)\n\n#### Review the raw results below\n\nDetails about each field can be found below.\n\n```json\n{\n  \"totalRows\": 5,\n  \"fields\": {\n    \"id\": {\n      \"types\": {\n        \"Number\": {\n          \"rank\": 8,\n          \"count\": 5,\n          \"value\": { \"min\": 1, \"mean\": 3, \"max\": 5, \"p25\": 2, \"p33\": 2, \"p50\": 3, \"p66\": 4, \"p75\": 4, \"p99\": 5 }\n        },\n        \"String\": {\n          \"rank\": 12,\n          \"count\": 5,\n          \"length\": { \"min\": 1, \"mean\": 1, \"max\": 1, \"p25\": 1, \"p33\": 1, \"p50\": 1, \"p66\": 1, \"p75\": 1, \"p99\": 1 }\n        }\n      }\n    },\n    \"name\": {\n      \"types\": {\n        \"String\": {\n          \"rank\": 12,\n          \"count\": 5,\n          \"length\": { \"min\": 3, \"mean\": 7.2, \"max\": 15, \"p25\": 3, \"p33\": 3, \"p50\": 5, \"p66\": 10, \"p75\": 10, \"p99\": 15 }\n        }\n      }\n    },\n    \"role\": {\n      \"types\": {\n        \"String\": {\n          \"rank\": 12,\n          \"count\": 5,\n          \"length\": { \"min\": 4, \"mean\": 5.4, \"max\": 9, \"p25\": 4, \"p33\": 4, \"p50\": 5, \"p66\": 5, \"p75\": 5, \"p99\": 9 }\n        }\n      }\n    },\n    \"email\": {\n      \"types\": {\n        \"Email\": {\n          \"rank\": 11,\n          \"count\": 5,\n          \"length\": { \"min\": 15, \"mean\": 19.4, \"max\": 26, \"p25\": 15, 
\"p33\": 15, \"p50\": 18, \"p66\": 23, \"p75\": 23, \"p99\": 26 }\n        }\n      }\n    },\n    \"createdAt\": {\n      \"types\": {\n        \"Date\": {\n          \"rank\": 4,\n          \"count\": 4,\n          \"value\": { \"min\": \"2001-01-01T00:00:00.000Z\", \"mean\": \"2015-04-14T18:00:00.000Z\", \"max\": \"2020-02-02T00:00:00.000Z\", \"p25\": \"2020-02-02T00:00:00.000Z\", \"p33\": \"2020-02-02T00:00:00.000Z\", \"p50\": \"2019-12-31T00:00:00.000Z\", \"p66\": \"2019-12-31T00:00:00.000Z\", \"p75\": \"2001-01-01T00:00:00.000Z\", \"p99\": \"2001-01-01T00:00:00.000Z\" }\n        },\n        \"String\": {\n          \"rank\": 12,\n          \"count\": 1,\n          \"length\": { \"min\": 6, \"mean\": 6, \"max\": 6, \"p25\": 6, \"p33\": 6, \"p50\": 6, \"p66\": 6, \"p75\": 6, \"p99\": 6 }\n        }\n      }\n    },\n    \"accountConfirmed\": {\n      \"types\": {\n        \"Unknown\": {\n          \"rank\": -1,\n          \"count\": 1\n        },\n        \"String\": {\n          \"rank\": 12,\n          \"count\": 1,\n          \"length\": { \"min\": 9, \"mean\": 9, \"max\": 9, \"p25\": 9, \"p33\": 9, \"p50\": 9, \"p66\": 9, \"p75\": 9, \"p99\": 9 }\n        },\n        \"Boolean\": {\n          \"rank\": 3,\n          \"count\": 4\n        }\n      }\n    }\n  }\n}\n```\n\n#### Sample input dataset for the example results above\n\n| id | name            | role      | email                        | createdAt  | accountConfirmed |\n|----|-----------------|-----------|------------------------------|------------|------------------|\n| 1  | Eve             | poweruser | `eve@example.com`            | 01/20/2020 | undefined        |\n| 2  | Alice           | user      | `ali@example.com`            | 02/02/2020 | true             |\n| 3  | Bob             | user      | `robert@example.com`         | 12/31/2019 | true             |\n| 4  | Elliot Alderson | admin     | `falkensmaze@protonmail.com` | 01/01/2001 | false            |\n| 5  | Sam Sepiol      | admin     
| `falkensmaze@hotmail.com`    | 9/9/99     | true             |\n\n\n\n#### `AggregateSummary`\n\nNumeric and String types include a summary of the observed field sizes:\n\n\u003e Number \u0026 String Range Object Details\n\n##### Properties\n\n- `min` the minimum number or string length\n- `max` the maximum number or string length\n- `mean` the average number or string length\n- `percentiles[25th, 33rd, 50th, 66th, 75th, 99th]` values from the `Nth` percentile number or string length\n\nPercentiles are based on the input data as-is, without sorting.\n\n##### Length Range Data\n\nRange data for the `length` of a `String` field type:\n\n```js\n{\n  \"rank\": 11,\n  \"count\": 5,\n  \"length\": { \"min\": 15, \"mean\": 19.4, \"max\": 26, \"p25\": 15, \"p33\": 15, \"p50\": 18, \"p66\": 23, \"p75\": 23, \"p99\": 26 }\n}\n```\n\nThis is useful for defining strict length limits or minimums, as SQL servers often require.\n\nRange data for the `value` of a `Date` field:\n\n```js\n{\n  \"rank\": 4,\n  \"count\": 4,\n  \"value\": { \"min\": \"2001-01-01T00:00:00.000Z\", \"mean\": \"2015-04-14T18:00:00.000Z\", \"max\": \"2020-02-02T00:00:00.000Z\", \"p25\": \"2020-02-02T00:00:00.000Z\", \"p33\": \"2020-02-02T00:00:00.000Z\", \"p50\": \"2019-12-31T00:00:00.000Z\", \"p66\": \"2019-12-31T00:00:00.000Z\", \"p75\": \"2001-01-01T00:00:00.000Z\", \"p99\": \"2001-01-01T00:00:00.000Z\" }\n}\n```\n\n\n## Notes\n\nWe recommend providing at least 100 rows. Accuracy increases greatly with 1,000 rows.\n\nThe following features require a certain minimum # of records:\n\n- Enumeration detection.\n  - 100+ Rows Required.\n  - Number of unique values must not exceed 20 or 5% of the total number of records. (100 records will identify as Enum w/ 5 values. 
Up to 20 are possible given 400 or 1,000+.)\n- `Not Null` detection.\n  - where rowCount === field count\n\n### Full List of Detected Types\n\n- `Unknown`\n- `ObjectId`\n- `UUID`\n- `Boolean`\n- `Date`\n- `Timestamp`\n- `Currency`\n- `Float`\n- `Number`\n- `Email`\n- `String`\n- `Array`\n- `Object`\n- `Null`\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustsml%2Fschema-analyzer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjustsml%2Fschema-analyzer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustsml%2Fschema-analyzer/lists"}