{"id":15012921,"url":"https://github.com/junyu-w/genson-rs","last_synced_at":"2025-08-20T10:31:43.394Z","repository":{"id":240812921,"uuid":"794419191","full_name":"junyu-w/genson-rs","owner":"junyu-w","description":"Blazing-fast JSON Schema inference engine built in Rust","archived":false,"fork":false,"pushed_at":"2024-05-26T02:07:12.000Z","size":89,"stargazers_count":73,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-10-31T09:08:02.121Z","etag":null,"topics":["json","json-schema","json-schema-inference","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/junyu-w.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-01T05:39:07.000Z","updated_at":"2024-10-08T04:32:42.000Z","dependencies_parsed_at":null,"dependency_job_id":"ea60e1c5-26af-4b75-99d4-af12c5ba09ee","html_url":"https://github.com/junyu-w/genson-rs","commit_stats":{"total_commits":52,"total_committers":3,"mean_commits":"17.333333333333332","dds":"0.13461538461538458","last_synced_commit":"c330bb6ee1cc1941bea9808b11da53a538d684c4"},"previous_names":["junyu-w/genson-rs"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/junyu-w%2Fgenson-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/junyu-w%2Fgenson-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/junyu-w%2Fgenson-rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/junyu-w%2Fgenson-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/junyu-w","download_url":"https://codeload.github.com/junyu-w/genson-rs/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230415317,"owners_count":18222158,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["json","json-schema","json-schema-inference","rust"],"created_at":"2024-09-24T19:43:24.863Z","updated_at":"2024-12-19T10:09:06.946Z","avatar_url":"https://github.com/junyu-w.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# genson-rs\n\n[![CodSpeed Badge](https://img.shields.io/endpoint?url=https://codspeed.io/badge.json)](https://codspeed.io/junyu-w/genson-rs)\n[![crates.io](https://img.shields.io/crates/v/genson-rs.svg)](https://crates.io/crates/genson-rs)\n[![CI](https://github.com/junyu-w/genson-rs/actions/workflows/rust.yml/badge.svg)](https://github.com/junyu-w/genson-rs/actions/workflows/rust.yml)\n\n*-- 🔥 Generate JSON Schema from gigabytes of JSON data in seconds*\n\n`genson-rs` is a Rust rewrite of the [GenSON](https://github.com/wolverdude/genson/) Python library, which can generate [JSON Schema](https://json-schema.org/) (Draft-04 and later) from one or more JSON objects.\n\nAlthough it does not yet have full feature parity with GenSON, `genson-rs` focuses on **speed** ⚡️. 
It offers much better performance (**25x to 75x faster**) than the Python `GenSON` library, and is generally much faster than other open-source schema inference tools as well. Its high performance makes it a viable choice for online schema inference on large JSON datasets at scale. See the [Benchmark](#benchmark) section for performance comparisons.\n\n## Install\nInstalling via [Cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) is the easiest option. If you don't have Cargo yet, follow the link to set it up (one simple command), then run:\n```\ncargo install genson-rs\n```\nInstallation via `brew` will be supported soon.\n\n## Usage\n```\ngenson-rs \u003cOPTION\u003e \u003cFILE(S)\u003e\n```\n\nFor example, to generate a schema for a large JSON file full of request logs:\n```\ngenson-rs request_logs.json\n```\n\nIf each request log is a separate JSON object on its own line (newline-delimited JSON), you can also specify the delimiter, which slightly improves performance:\n```\ngenson-rs --delimiter newline request_logs.json\n```\n\n## Benchmark\n\nThe following benchmarks were run manually on my local machine: a 2023 MacBook Pro with the M2 Pro chip (10 cores: 4 efficiency + 6 performance), 16 GB RAM, running macOS 13.0. Each test JSON file was generated with the `json_gen.py` script in the `tests/data` folder, and each test was run 3 times. 
The median of the 3 runs is reported below.\n\n| Library         | File Size               | Time               |\n|-----------------|-------------------------|--------------------|\n| GenSON (Python) | 50 MB                   | 1.61s              |\n| genson-rs       | 50 MB                   | 🔥 **0.07s**       |\n| GenSON (Python) | 500 MB                  | 16.07s             |\n| genson-rs       | 500 MB                  | 🔥 **0.61s**       |\n| GenSON (Python) | 1 GB                    | 34.21s             |\n| genson-rs       | 1 GB                    | 🔥 **1.19s**       |\n| GenSON (Python) | 3 GB                    | 107.86s (1min 47s) |\n| genson-rs       | 3 GB                    | 🔥 **4.56s**       |\n| GenSON (Python) | 3 GB (Large JSON Array) | 443.83s (7min 23s) |\n| genson-rs       | 3 GB (Large JSON Array) | 🔥 **7.06s**       |\n\nAs you can see, `genson-rs` is *extremely* fast. Based on my rudimentary benchmarks against the other tools I'm aware of, it might also be the fastest schema inference engine out there.\n\n## Optimization Techniques\n\nThe `genson-rs` library uses the following techniques to greatly speed up schema generation:\n- ⚡️ **Rust itself is blazingly fast** -- with no GC or interpreter overhead, a 1-to-1 Rust port running on a single CPU core is already 2x faster than the Python version\n- ⚡️ **Parallel processing on all available CPU cores** -- while Python's GIL prevents it from using multiple CPU cores efficiently, `genson-rs` parallelizes [Map-Reduce](https://en.wikipedia.org/wiki/MapReduce)-style workloads whenever possible (e.g. 
when processing gigantic arrays), maxing out all available CPU cores\n- ⚡️ **Extremely fast JSON parsing powered by SIMD instructions** -- instead of fully deserializing the whole JSON dataset, we use the `simd-json` library (a Rust port of the C++ `simdjson` library), which leverages SIMD (Single Instruction, Multiple Data) instructions to parse out only the \"tape\" of the dataset; the tape is sufficient to build the schema on top of without deserializing everything\n- ⚡️ **Efficient memory management using the MiMalloc allocator** -- as recommended by the `simd-json` library itself, `genson-rs` uses the `MiMalloc` allocator instead of the default global allocator, which made the code run noticeably faster\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjunyu-w%2Fgenson-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjunyu-w%2Fgenson-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjunyu-w%2Fgenson-rs/lists"}