{"id":23626314,"url":"https://github.com/datalust/squirrel-json","last_synced_at":"2025-04-14T05:20:43.931Z","repository":{"id":38378314,"uuid":"290916665","full_name":"datalust/squirrel-json","owner":"datalust","description":"A vectorized JSON parser for pre-validated, minified documents","archived":false,"fork":false,"pushed_at":"2024-07-24T05:05:55.000Z","size":85,"stargazers_count":83,"open_issues_count":2,"forks_count":3,"subscribers_count":4,"default_branch":"dev","last_synced_at":"2025-03-27T19:09:12.832Z","etag":null,"topics":["json","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datalust.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-28T01:00:44.000Z","updated_at":"2024-12-02T13:36:46.000Z","dependencies_parsed_at":"2024-07-24T06:32:39.330Z","dependency_job_id":"241f2fe7-bbc5-495f-aede-8eb2256d1cdf","html_url":"https://github.com/datalust/squirrel-json","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datalust%2Fsquirrel-json","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datalust%2Fsquirrel-json/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datalust%2Fsquirrel-json/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datalust%2Fsquirrel-json/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/da
talust","download_url":"https://codeload.github.com/datalust/squirrel-json/tar.gz/refs/heads/dev","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248824989,"owners_count":21167411,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["json","rust"],"created_at":"2024-12-27T22:52:55.583Z","updated_at":"2025-04-14T05:20:43.909Z","avatar_url":"https://github.com/datalust.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# `squirrel-json`\n\n## 🐿⚡\n\nThis is heavily based on the JSON deserializer used by Seq's storage engine. You might find this useful if you're\nbuilding a document database that stores documents as minified JSON maps. The job of this code is to take a\nminified JSON object, like:\n\n```json\n{\"@t\":\"2020-03-12T17:08:37.6065924Z\",\"@mt\":\"Redirecting to continue intent {Intent}\",\"Elapsed\":3456}\n```\n\nand produce a flat tape of offsets into that document that can be fed to a traditional JSON parser to extract. It scans through\nthe document using vectorized CPU instructions that find and classify the features of the document very efficiently.\nIf only a fraction of that document is actually needed to satisfy a given query then only that fraction will pay the cost of\nfull deserialization. 
This is how Seq supports performant queries over log data without attempting to fit it into\ncolumn storage, or requiring it to reside in RAM.\n\n`squirrel-json` takes inspiration from [`simd-json`](https://github.com/simd-lite/simd-json) and is _very_ fast.\n`squirrel-json` is an interesting piece of software, but is neither as useful nor as interesting as\n`simd-json` if you're looking for a state-of-the-art JSON deserializer. This library makes heavy trade-offs\nto perform very well for sparse deserialization of pre-validated JSON maps at the expense of being\nunsuitable for just about anything else.\n\nSee [this blog post](https://blog.datalust.co/deserializing-json-really-fast/) for some more details!\n\n## Platform support\n\nThis library currently supports x86 using AVX2 intrinsics, and ARM using Neon intrinsics. Other platforms\nare supported using a slower (but still reasonably fast) fallback parser. Unfortunately, we don't have\na way to test ARM in CI here yet, so support is best-effort.\n\n## ⚠️ CAREFUL\n\nThis library is designed for parsing pre-validated, minified JSON maps. It guarantees UB freedom\nfor any input (including when that input is invalid UTF-8), but only guarantees sensible results\nfor valid JSON. See the test cases with an `invalid_` prefix to get an idea of what different\nkinds of input do.\n\nThis library contains a _lot_ of unsafe code and is very performance-sensitive. Any changes\nneed to be carefully considered and should be:\n\n- tested against the benchmarks to make sure we don't regress (at least not accidentally).\n- fuzz-tested to ensure there aren't soundness holes introduced.\n\nWe take advantage of properties of the JSON document to avoid bounds checks wherever possible\nand use tricks like converting enum variants into interior pointers. 
Hot paths try to avoid\nbranching as much as possible.\n\nAny unchecked operations performed on the document are done using macros that use the checked\nvariant in test/debug builds to make sure we don't ever cause UB when working through documents.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatalust%2Fsquirrel-json","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatalust%2Fsquirrel-json","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatalust%2Fsquirrel-json/lists"}