{"id":13450020,"url":"https://github.com/minio/simdjson-go","last_synced_at":"2025-05-14T22:04:57.887Z","repository":{"id":37358840,"uuid":"239568408","full_name":"minio/simdjson-go","owner":"minio","description":"Golang port of simdjson: parsing gigabytes of JSON per second","archived":false,"fork":false,"pushed_at":"2025-03-06T13:17:13.000Z","size":13540,"stargazers_count":1908,"open_issues_count":1,"forks_count":98,"subscribers_count":28,"default_branch":"master","last_synced_at":"2025-05-14T22:03:06.502Z","etag":null,"topics":["golang-standard","json-document","json-files","ndjson","simdjson","tape","tape-format"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/minio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-10T17:15:53.000Z","updated_at":"2025-05-14T17:06:38.000Z","dependencies_parsed_at":"2025-03-26T01:00:23.596Z","dependency_job_id":null,"html_url":"https://github.com/minio/simdjson-go","commit_stats":{"total_commits":431,"total_committers":10,"mean_commits":43.1,"dds":"0.20417633410672853","last_synced_commit":"d82c779820b28b701fc258ee32f5df4ffc368f2d"},"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/minio%2Fsimdjson-go","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/minio%2Fsimdjson-go/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/minio%2Fsimdjson-go/releases","manifests_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/minio%2Fsimdjson-go/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/minio","download_url":"https://codeload.github.com/minio/simdjson-go/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254235686,"owners_count":22036962,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["golang-standard","json-document","json-files","ndjson","simdjson","tape","tape-format"],"created_at":"2024-07-31T07:00:27.316Z","updated_at":"2025-05-14T22:04:57.783Z","avatar_url":"https://github.com/minio.png","language":"Go","readme":"# simdjson-go\n\n## Introduction\n\nThis is a Golang port of [simdjson](https://github.com/lemire/simdjson),\na high performance JSON parser developed by Daniel Lemire and Geoff Langdale.\nIt makes extensive use of SIMD instructions to achieve parsing performance of gigabytes of JSON per second.\n\nPerformance wise, `simdjson-go` runs on average at about 40% to 60% of the speed of simdjson.\nCompared to Golang's standard package `encoding/json`, `simdjson-go` is about 10x faster.\n\n[![Documentation](https://godoc.org/github.com/minio/simdjson-go?status.svg)](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc)\n\n## Features\n\n`simdjson-go` is a validating parser, meaning that it amongst others validates and checks numerical values, booleans etc.\n Therefore, these values are available as the appropriate `int` and `float64` representations after parsing.\n\nAdditionally `simdjson-go` has the following features:\n\n- No 4 GB 
object limit\n- Support for [ndjson](http://ndjson.org/) (newline delimited JSON)\n- Pure Go (no need for cgo)\n- Object search/traversal.\n- In-place value replacement.\n- Remove object/array members.\n- Serialize parsed JSON as binary data.\n- Re-serialize parts as JSON.\n\n## Requirements\n\n`simdjson-go` has the following requirements for parsing:\n\nA CPU with both AVX2 and CLMUL is required (Haswell from 2013 onwards should do for Intel, for AMD a Ryzen/EPYC CPU (Q1 2017) should be sufficient).\nThis can be checked using the provided [`SupportedCPU()`](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc#SupportedCPU) function.\n\nThe package does not provide a fallback for unsupported CPUs, but serialized data can be deserialized on an unsupported CPU.\n\nWhen built with `gccgo`, the CPU is always reported as unsupported, since `gccgo` cannot compile the assembly.\n\n## Usage\n\nRun the following command to install `simdjson-go`:\n\n```bash\ngo get -u github.com/minio/simdjson-go\n```\n\nIn order to parse a JSON byte stream, you either call [`simdjson.Parse()`](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc#Parse)\nor [`simdjson.ParseND()`](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc#ParseND) for newline delimited JSON files.\nBoth of these functions return a [`ParsedJson`](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc#ParsedJson)\nstruct that can be used to navigate the JSON object by calling [`Iter()`](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc#ParsedJson.Iter).\n\nThe easiest approach is to call the [`ForEach()`](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc#ParsedJson.ForEach) function of the returned `ParsedJson`.\n\n```Go\nfunc main() {\n\t// Parse JSON:\n\tpj, err := Parse([]byte(`{\"Image\":{\"URL\":\"http://example.com/example.gif\"}}`), nil)\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\n\t// Iterate each top level element.\n\t_ = pj.ForEach(func(i Iter) error {\n\t\tfmt.Println(\"Got iterator for type:\", 
i.Type())\n\t\telement, err := i.FindElement(nil, \"Image\", \"URL\")\n\t\tif err == nil {\n\t\t\tvalue, _ := element.Iter.StringCvt()\n\t\t\tfmt.Println(\"Found element:\", element.Name, \"Type:\", element.Type, \"Value:\", value)\n\t\t}\n\t\treturn nil\n\t})\n\n\t// Output:\n\t// Got iterator for type: object\n\t// Found element: URL Type: string Value: http://example.com/example.gif\n}\n```\n\n### Parsing with iterators\n\nUsing the type [`Iter`](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc#Iter) you can call\n[`Advance()`](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc#Iter.Advance) to iterate over the tape, like so:\n\n```Go\nfor {\n    typ := iter.Advance()\n\n    switch typ {\n    case simdjson.TypeRoot:\n        if typ, tmp, err = iter.Root(tmp); err != nil {\n            return\n        }\n\n        if typ == simdjson.TypeObject {\n            if obj, err = tmp.Object(obj); err != nil {\n                return\n            }\n\n            e := obj.FindKey(key, \u0026elem)\n            if e != nil \u0026\u0026 elem.Type == simdjson.TypeString {\n                v, _ := elem.Iter.StringBytes()\n                fmt.Println(string(v))\n            }\n        }\n\n    default:\n        return\n    }\n}\n```\n\nWhen you advance the Iter you get the next type currently queued.\n\nEach type then has helpers to access the data. When you get a type you can use these to access the data:\n\n| Type       | Action on Iter             |\n|------------|----------------------------|\n| TypeNone   | Nothing follows. 
Iter done |\n| TypeNull   | Null value                 |\n| TypeString | `String()`/`StringBytes()` |\n| TypeInt    | `Int()`/`Float()`          |\n| TypeUint   | `Uint()`/`Float()`         |\n| TypeFloat  | `Float()`                  |\n| TypeBool   | `Bool()`                   |\n| TypeObject | `Object()`                 |\n| TypeArray  | `Array()`                  |\n| TypeRoot   | `Root()`                   |\n\nYou can also get the next value as an `interface{}` using the [Interface()](https://pkg.go.dev/github.com/minio/simdjson-go#Iter.Interface) method.\n\nNote that arrays and objects that are null are always returned as `TypeNull`.\n\nThe complex types return helpers that help parse each of the underlying structures.\n\nIt is up to you to keep track of the nesting level you are operating at.\n\nFor any `Iter` it is possible to marshal the recursive content of the Iter using\n[`MarshalJSON()`](https://pkg.go.dev/github.com/minio/simdjson-go#Iter.MarshalJSON) or\n[`MarshalJSONBuffer(...)`](https://pkg.go.dev/github.com/minio/simdjson-go#Iter.MarshalJSONBuffer).\n\nCurrently, it is not possible to unmarshal into structs.\n\n### Search by path\n\nIt is possible to search by path to find elements by traversing objects.\n\nFor example:\n\n```\n\t// Find element in path.\n\telem, err := i.FindElement(nil, \"Image\", \"URL\")\n```\n\nThis will locate the field inside a JSON object with the following structure:\n\n```\n{\n    \"Image\": {\n        \"URL\": \"value\"\n    }\n}\n```\n\nThe values can be any type. The [Element](https://pkg.go.dev/github.com/minio/simdjson-go#Element)\nwill contain the element information and an Iter to access the content.\n\n## Parsing Objects\n\nIf you are only interested in one key in an object you can use `FindKey` to quickly select it.\n\nYou can use `ForEach(fn func(key []byte, i Iter), onlyKeys map[string]struct{})`\nto get a callback for each element in the object. 
\n\nAn object can be traversed manually by using `NextElement(dst *Iter) (name string, t Type, err error)`.\nIt returns the key of the element as a string together with the type of the value,\nand the provided `Iter` will contain an iterator which allows access to the content.\n\nThere is a `NextElementBytes` which provides the same, but without the need to allocate a string.\n\nAll elements of the object can be retrieved using a pretty lightweight [`Parse`](https://pkg.go.dev/github.com/minio/simdjson-go#Object.Parse)\nwhich provides a map of all keys and all elements as a slice.\n\nAll elements of the object can be returned as `map[string]interface{}` using the `Map` method on the object.\nThis will naturally perform allocations for all elements.\n\n## Parsing Arrays\n\n[Arrays](https://pkg.go.dev/github.com/minio/simdjson-go#Array) in JSON can have mixed types.\n\nIt is possible to call `ForEach(fn func(i Iter))` to get each element.\n\nTo iterate over the array with mixed types use the [`Iter`](https://pkg.go.dev/github.com/minio/simdjson-go#Array.Iter)\nmethod to get an iterator.\n\nThere are methods that allow you to retrieve all elements as a single type,\n`[]int64`, `[]uint64`, `[]float64` and `[]string` with `AsInteger()`, `AsUint64()`, `AsFloat()` and `AsString()`.\n\n## Number parsing\n\nNumbers in JSON are untyped and are returned by the following rules in order:\n\n* If there is any floating point notation, such as an exponent or a decimal point, it is always returned as float.\n* If the number is a pure integer and it fits within an int64 it is returned as such.\n* If the number is a pure positive integer and fits within a uint64 it is returned as such.\n* Otherwise, if the number is a valid number, it is returned as float64.\n\nIf the number was converted from integer notation to a float due to not fitting inside int64/uint64\nthe `FloatOverflowedInteger` flag is set, which can be retrieved using the `(Iter).FloatFlags()` method.\n\nJSON numbers follow JavaScript’s double-precision 
floating-point format.\n\n* Represented in base 10 with no superfluous leading zeros (e.g. 67, 1, 100).\n* Include digits between 0 and 9.\n* Can be a negative number (e.g. -10).\n* Can be a fraction (e.g. 0.5).\n* Can also have an exponent of 10, prefixed by e or E with a plus or minus sign to indicate positive or negative exponentiation.\n* Octal and hexadecimal formats are not supported.\n* Cannot have a value of NaN (Not A Number) or Infinity.\n\n## Parsing NDJSON stream\n\nNewline delimited JSON is sent as packets with each line being a root element.\n\nHere is an example that counts the number of `\"Make\": \"HOND\"` in NDJSON similar to this:\n\n```\n{\"Age\":20, \"Make\": \"HOND\"}\n{\"Age\":22, \"Make\": \"TLSA\"}\n```\n\n```Go\nfunc findHondas(r io.Reader) {\n\tvar nFound int\n\n\t// Communication\n\treuse := make(chan *simdjson.ParsedJson, 10)\n\tres := make(chan simdjson.Stream, 10)\n\n\tsimdjson.ParseNDStream(r, res, reuse)\n\t// Read results in blocks...\n\tfor got := range res {\n\t\tif got.Error != nil {\n\t\t\tif got.Error == io.EOF {\n\t\t\t\tbreak\n\t\t\t}\n\t\t\tlog.Fatal(got.Error)\n\t\t}\n\n\t\tvar elem *simdjson.Element\n\t\terr := got.Value.ForEach(func(i simdjson.Iter) error {\n\t\t\tvar err error\n\t\t\telem, err = i.FindElement(elem, \"Make\")\n\t\t\tif err != nil {\n\t\t\t\t// No \"Make\" field in this root element; skip it.\n\t\t\t\treturn nil\n\t\t\t}\n\t\t\tbts, _ := elem.Iter.StringBytes()\n\t\t\tif string(bts) == \"HOND\" {\n\t\t\t\tnFound++\n\t\t\t}\n\t\t\treturn nil\n\t\t})\n\t\tif err != nil {\n\t\t\tlog.Fatal(err)\n\t\t}\n\t\t// Hand the parsed block back so it can be reused.\n\t\treuse \u003c- got.Value\n\t}\n\tfmt.Println(\"Found\", nFound, \"Hondas\")\n}\n```\n\nMore examples can be found in the examples subdirectory and further documentation can be found at [godoc](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc).\n\n\n### In-place Value Replacement\n\nIt is possible to replace a few basic internal values.\nThis means that when re-parsing or re-serializing the parsed JSON these values will be output.\n\nBoolean (true/false) and null values can be freely exchanged.\n\nNumeric 
values (float, int, uint) can be exchanged freely.\n\nStrings can also be exchanged with different values.\n\nStrings and numbers can be exchanged. However, note that there are no checks for numbers inserted as object keys,\nso replacing a key with a number can produce invalid JSON.\n\nObjects and arrays themselves cannot be modified, other than the value types above inside each.\nElements cannot be added or removed through value replacement.\n\nTo replace the value referenced by an `Iter`, simply call `SetNull`, `SetBool`, `SetFloat`, `SetInt`, `SetUInt`,\n`SetString` or `SetStringBytes`.\n\n### Object \u0026 Array Element Deletion\n\nIt is possible to delete one or more elements in an object.\n\n`(*Object).DeleteElems(fn, onlyKeys)` will call back fn for each key+value.\n\nIf true is returned, the key+value is deleted. A key filter can be provided for optional filtering.\nIf the callback function is nil all elements matching the filter will be deleted.\nIf both are nil all elements are deleted.\n\nExample:\n\n```Go\n\t// The object we are modifying\n\tvar obj *simdjson.Object\n\n\t// Delete all entries where the key is \"unwanted\":\n\terr = obj.DeleteElems(func(key []byte, i simdjson.Iter) bool {\n\t\treturn string(key) == \"unwanted\"\n\t}, nil)\n\n\t// Alternative version with prefiltered keys:\n\terr = obj.DeleteElems(nil, map[string]struct{}{\"unwanted\": {}})\n```\n\n`(*Array).DeleteElems(fn func(i Iter) bool)` will call back fn for each array value.\nIf the function returns true the element is deleted from the array.\n\n```Go\n\t// The array we are modifying\n\tvar array *simdjson.Array\n\n\t// Delete all entries that are strings.\n\tarray.DeleteElems(func(i simdjson.Iter) bool {\n\t\treturn i.Type() == simdjson.TypeString\n\t})\n```\n\n## Serializing parsed json\n\nIt is possible to serialize parsed JSON for more compact storage and faster load time.\n\nTo create a new serializer use [NewSerializer](https://pkg.go.dev/github.com/minio/simdjson-go#NewSerializer).\nThis serializer can be reused for several JSON 
blocks.\n\nThe serializer will provide string deduplication and compression of elements.\nThis can be finetuned using the [`CompressMode`](https://pkg.go.dev/github.com/minio/simdjson-go#Serializer.CompressMode) setting.\n\nTo serialize a block of parsed data use the [`Serialize`](https://pkg.go.dev/github.com/minio/simdjson-go#Serializer.Serialize) method.\n\nTo read back use the [`Deserialize`](https://pkg.go.dev/github.com/minio/simdjson-go#Serializer.Deserialize) method.\nFor deserializing the compression mode does not need to match since it is read from the stream.\n\nExample of speed for serializer/deserializer on [`parking-citations-1M`](https://dl.minio.io/assets/parking-citations-1M.json.zst).\n\n| Compress Mode | % of JSON size | Serialize Speed | Deserialize Speed |\n|---------------|----------------|-----------------|-------------------|\n| None          | 177.26%        | 425.70 MB/s     | 2334.33 MB/s      |\n| Fast          | 17.20%         | 412.75 MB/s     | 1234.76 MB/s      |\n| Default       | 16.85%         | 411.59 MB/s     | 1242.09 MB/s      |\n| Best          | 10.91%         | 337.17 MB/s     | 806.23 MB/s       |\n\nIn some cases the speed difference and compression difference will be bigger.\n\n## Performance vs `encoding/json` and `json-iterator/go`\n\nThough simdjson provides different output than traditional unmarshal functions this can give\nan overview of the expected performance for reading specific data in JSON.\n\nBelow is a performance comparison to Golang's standard package `encoding/json` based on the same set of JSON test files, unmarshal to `interface{}`.\n\nComparisons with default settings:\n\n```\nλ benchcmp enc-json.txt simdjson.txt\nbenchmark                      old ns/op     new ns/op     delta\nBenchmarkApache_builds-32      1219080       142972        -88.27%\nBenchmarkCanada-32             38362219      13417193      -65.02%\nBenchmarkCitm_catalog-32       17051899      1359983       
-92.02%\nBenchmarkGithub_events-32      603037        74042         -87.72%\nBenchmarkGsoc_2018-32          20777333      1259171       -93.94%\nBenchmarkInstruments-32        2626808       301370        -88.53%\nBenchmarkMarine_ik-32          56630295      14419901      -74.54%\nBenchmarkMesh-32               13411486      4206251       -68.64%\nBenchmarkMesh_pretty-32        18226803      4786081       -73.74%\nBenchmarkNumbers-32            2131951       909641        -57.33%\nBenchmarkRandom-32             7360966       1004387       -86.36%\nBenchmarkTwitter-32            6635848       588773        -91.13%\nBenchmarkTwitterescaped-32     6292856       972250        -84.55%\nBenchmarkUpdate_center-32      6396501       708717        -88.92%\n\nbenchmark                      old MB/s     new MB/s     speedup\nBenchmarkApache_builds-32      104.40       890.21       8.53x\nBenchmarkCanada-32             58.68        167.77       2.86x\nBenchmarkCitm_catalog-32       101.29       1270.02      12.54x\nBenchmarkGithub_events-32      108.01       879.67       8.14x\nBenchmarkGsoc_2018-32          160.17       2642.88      16.50x\nBenchmarkInstruments-32        83.88        731.15       8.72x\nBenchmarkMarine_ik-32          52.68        206.90       3.93x\nBenchmarkMesh-32               53.95        172.03       3.19x\nBenchmarkMesh_pretty-32        86.54        329.57       3.81x\nBenchmarkNumbers-32            70.42        165.04       2.34x\nBenchmarkRandom-32             69.35        508.25       7.33x\nBenchmarkTwitter-32            95.17        1072.59      11.27x\nBenchmarkTwitterescaped-32     89.37        578.46       6.47x\nBenchmarkUpdate_center-32      83.35        752.31       9.03x\n\nbenchmark                      old allocs     new allocs     delta\nBenchmarkApache_builds-32      9716           22             -99.77%\nBenchmarkCanada-32             392535         250            -99.94%\nBenchmarkCitm_catalog-32       95372          110            
-99.88%\nBenchmarkGithub_events-32      3328           17             -99.49%\nBenchmarkGsoc_2018-32          58615          67             -99.89%\nBenchmarkInstruments-32        13336          33             -99.75%\nBenchmarkMarine_ik-32          614776         467            -99.92%\nBenchmarkMesh-32               149504         122            -99.92%\nBenchmarkMesh_pretty-32        149504         122            -99.92%\nBenchmarkNumbers-32            20025          28             -99.86%\nBenchmarkRandom-32             66083          76             -99.88%\nBenchmarkTwitter-32            31261          53             -99.83%\nBenchmarkTwitterescaped-32     31757          53             -99.83%\nBenchmarkUpdate_center-32      49074          58             -99.88%\n\nbenchmark                      old bytes     new bytes     delta\nBenchmarkApache_builds-32      461556        965           -99.79%\nBenchmarkCanada-32             10943847      39793         -99.64%\nBenchmarkCitm_catalog-32       5122732       6089          -99.88%\nBenchmarkGithub_events-32      186148        802           -99.57%\nBenchmarkGsoc_2018-32          7032092       17215         -99.76%\nBenchmarkInstruments-32        882265        1310          -99.85%\nBenchmarkMarine_ik-32          22564413      189870        -99.16%\nBenchmarkMesh-32               7130934       15483         -99.78%\nBenchmarkMesh_pretty-32        7288661       12066         -99.83%\nBenchmarkNumbers-32            1066304       1280          -99.88%\nBenchmarkRandom-32             2787054       4096          -99.85%\nBenchmarkTwitter-32            2152260       2550          -99.88%\nBenchmarkTwitterescaped-32     2330548       3062          -99.87%\nBenchmarkUpdate_center-32      2729631       3235          -99.88%\n```\n\nHere is another benchmark comparison to `json-iterator/go`, unmarshal to `interface{}`.\n\n```\nλ benchcmp jsiter.txt simdjson.txt\nbenchmark                      old ns/op     new ns/op     
delta\nBenchmarkApache_builds-32      891370        142972        -83.96%\nBenchmarkCanada-32             52365386      13417193      -74.38%\nBenchmarkCitm_catalog-32       10154544      1359983       -86.61%\nBenchmarkGithub_events-32      398741        74042         -81.43%\nBenchmarkGsoc_2018-32          15584278      1259171       -91.92%\nBenchmarkInstruments-32        1858339       301370        -83.78%\nBenchmarkMarine_ik-32          49881479      14419901      -71.09%\nBenchmarkMesh-32               15038300      4206251       -72.03%\nBenchmarkMesh_pretty-32        17655583      4786081       -72.89%\nBenchmarkNumbers-32            2903165       909641        -68.67%\nBenchmarkRandom-32             6156849       1004387       -83.69%\nBenchmarkTwitter-32            4655981       588773        -87.35%\nBenchmarkTwitterescaped-32     5521004       972250        -82.39%\nBenchmarkUpdate_center-32      5540200       708717        -87.21%\n\nbenchmark                      old MB/s     new MB/s     speedup\nBenchmarkApache_builds-32      142.79       890.21       6.23x\nBenchmarkCanada-32             42.99        167.77       3.90x\nBenchmarkCitm_catalog-32       170.09       1270.02      7.47x\nBenchmarkGithub_events-32      163.34       879.67       5.39x\nBenchmarkGsoc_2018-32          213.54       2642.88      12.38x\nBenchmarkInstruments-32        118.57       731.15       6.17x\nBenchmarkMarine_ik-32          59.81        206.90       3.46x\nBenchmarkMesh-32               48.12        172.03       3.58x\nBenchmarkMesh_pretty-32        89.34        329.57       3.69x\nBenchmarkNumbers-32            51.71        165.04       3.19x\nBenchmarkRandom-32             82.91        508.25       6.13x\nBenchmarkTwitter-32            135.64       1072.59      7.91x\nBenchmarkTwitterescaped-32     101.87       578.46       5.68x\nBenchmarkUpdate_center-32      96.24        752.31       7.82x\n\nbenchmark                      old allocs     new allocs     
delta\nBenchmarkApache_builds-32      13248          22             -99.83%\nBenchmarkCanada-32             665988         250            -99.96%\nBenchmarkCitm_catalog-32       118755         110            -99.91%\nBenchmarkGithub_events-32      4442           17             -99.62%\nBenchmarkGsoc_2018-32          90915          67             -99.93%\nBenchmarkInstruments-32        18776          33             -99.82%\nBenchmarkMarine_ik-32          692512         467            -99.93%\nBenchmarkMesh-32               184137         122            -99.93%\nBenchmarkMesh_pretty-32        204037         122            -99.94%\nBenchmarkNumbers-32            30037          28             -99.91%\nBenchmarkRandom-32             88091          76             -99.91%\nBenchmarkTwitter-32            45040          53             -99.88%\nBenchmarkTwitterescaped-32     47198          53             -99.89%\nBenchmarkUpdate_center-32      66757          58             -99.91%\n\nbenchmark                      old bytes     new bytes     delta\nBenchmarkApache_builds-32      518350        965           -99.81%\nBenchmarkCanada-32             16189358      39793         -99.75%\nBenchmarkCitm_catalog-32       5571982       6089          -99.89%\nBenchmarkGithub_events-32      221631        802           -99.64%\nBenchmarkGsoc_2018-32          11771591      17215         -99.85%\nBenchmarkInstruments-32        991674        1310          -99.87%\nBenchmarkMarine_ik-32          25257277      189870        -99.25%\nBenchmarkMesh-32               7991707       15483         -99.81%\nBenchmarkMesh_pretty-32        8628570       12066         -99.86%\nBenchmarkNumbers-32            1226518       1280          -99.90%\nBenchmarkRandom-32             3167528       4096          -99.87%\nBenchmarkTwitter-32            2426730       2550          -99.89%\nBenchmarkTwitterescaped-32     2607198       3062          -99.88%\nBenchmarkUpdate_center-32      3052382       3235          
-99.89%\n```\n\n\n### Inplace strings\n\nThe best performance is obtained by keeping the JSON message fully mapped in memory and using the\n`WithCopyStrings(false)` option. This prevents duplicate copies of string values being made\nbut mandates that the original JSON buffer is kept alive until the `ParsedJson` object is no longer needed\n(i.e. iteration over the tape format has been completed).\n\nIn case the JSON message buffer is freed earlier (or for streaming use cases where memory is reused)\n`WithCopyStrings(true)` should be used (which is the default behaviour).\n\nThe performance impact differs based on the input type, but these are the general differences:\n\n```\nBenchmarkApache_builds/copy-32                \t    8242\t    142972 ns/op\t 890.21 MB/s\t     965 B/op\t      22 allocs/op\nBenchmarkApache_builds/nocopy-32              \t   10000\t    111189 ns/op\t1144.68 MB/s\t     932 B/op\t      22 allocs/op\n\nBenchmarkCanada/copy-32                       \t      91\t  13417193 ns/op\t 167.77 MB/s\t   39793 B/op\t     250 allocs/op\nBenchmarkCanada/nocopy-32                     \t      87\t  13392401 ns/op\t 168.08 MB/s\t   41334 B/op\t     250 allocs/op\n\nBenchmarkCitm_catalog/copy-32                 \t     889\t   1359983 ns/op\t1270.02 MB/s\t    6089 B/op\t     110 allocs/op\nBenchmarkCitm_catalog/nocopy-32               \t     924\t   1268470 ns/op\t1361.64 MB/s\t    5582 B/op\t     110 allocs/op\n\nBenchmarkGithub_events/copy-32                \t   16092\t     74042 ns/op\t 879.67 MB/s\t     802 B/op\t      17 allocs/op\nBenchmarkGithub_events/nocopy-32              \t   19446\t     62143 ns/op\t1048.10 MB/s\t     794 B/op\t      17 allocs/op\n\nBenchmarkGsoc_2018/copy-32                    \t     948\t   1259171 ns/op\t2642.88 MB/s\t   17215 B/op\t      67 allocs/op\nBenchmarkGsoc_2018/nocopy-32                  \t    1144\t   1040864 ns/op\t3197.18 MB/s\t    9947 B/op\t      67 allocs/op\n\nBenchmarkInstruments/copy-32                  \t    3932\t   
 301370 ns/op\t 731.15 MB/s\t    1310 B/op\t      33 allocs/op\nBenchmarkInstruments/nocopy-32                \t    4443\t    271500 ns/op\t 811.59 MB/s\t    1258 B/op\t      33 allocs/op\n\nBenchmarkMarine_ik/copy-32                    \t      79\t  14419901 ns/op\t 206.90 MB/s\t  189870 B/op\t     467 allocs/op\nBenchmarkMarine_ik/nocopy-32                  \t      79\t  14176758 ns/op\t 210.45 MB/s\t  189867 B/op\t     467 allocs/op\n\nBenchmarkMesh/copy-32                         \t     288\t   4206251 ns/op\t 172.03 MB/s\t   15483 B/op\t     122 allocs/op\nBenchmarkMesh/nocopy-32                       \t     285\t   4207299 ns/op\t 171.99 MB/s\t   15615 B/op\t     122 allocs/op\n\nBenchmarkMesh_pretty/copy-32                  \t     248\t   4786081 ns/op\t 329.57 MB/s\t   12066 B/op\t     122 allocs/op\nBenchmarkMesh_pretty/nocopy-32                \t     250\t   4803647 ns/op\t 328.37 MB/s\t   12009 B/op\t     122 allocs/op\n\nBenchmarkNumbers/copy-32                      \t    1336\t    909641 ns/op\t 165.04 MB/s\t    1280 B/op\t      28 allocs/op\nBenchmarkNumbers/nocopy-32                    \t    1321\t    910493 ns/op\t 164.88 MB/s\t    1281 B/op\t      28 allocs/op\n\nBenchmarkRandom/copy-32                       \t    1201\t   1004387 ns/op\t 508.25 MB/s\t    4096 B/op\t      76 allocs/op\nBenchmarkRandom/nocopy-32                     \t    1554\t    773142 ns/op\t 660.26 MB/s\t    3198 B/op\t      76 allocs/op\n\nBenchmarkTwitter/copy-32                      \t    2035\t    588773 ns/op\t1072.59 MB/s\t    2550 B/op\t      53 allocs/op\nBenchmarkTwitter/nocopy-32                    \t    2485\t    475949 ns/op\t1326.85 MB/s\t    2029 B/op\t      53 allocs/op\n\nBenchmarkTwitterescaped/copy-32               \t    1189\t    972250 ns/op\t 578.46 MB/s\t    3062 B/op\t      53 allocs/op\nBenchmarkTwitterescaped/nocopy-32             \t    1372\t    874972 ns/op\t 642.77 MB/s\t    2518 B/op\t      53 allocs/op\n\nBenchmarkUpdate_center/copy-32               
 \t    1665\t    708717 ns/op\t 752.31 MB/s\t    3235 B/op\t      58 allocs/op\nBenchmarkUpdate_center/nocopy-32              \t    2241\t    536027 ns/op\t 994.68 MB/s\t    2130 B/op\t      58 allocs/op\n```\n\n## Design\n\n`simdjson-go` follows the same two stage design as `simdjson`.\nDuring the first stage the structural elements (`{`, `}`, `[`, `]`, `:`, and `,`)\nare detected and forwarded as offsets in the message buffer to the second stage.\nThe second stage builds a tape format of the structure of the JSON document.\n\nNote that in contrast to `simdjson`, `simdjson-go` outputs `uint32`\nincrements (as opposed to absolute values) to the second stage.\nThis allows arbitrarily large JSON files to be parsed (as long as a single (string) element does not surpass 4 GB...).\n\nAlso, for better performance,\nboth stages run concurrently as separate goroutines and a Go channel is used to communicate between the two stages.\n\n### Stage 1\n\nStage 1 has been converted from the original C code (containing the SIMD intrinsics) to Golang assembly using [c2goasm](https://github.com/minio/c2goasm).\nIt essentially consists of five separate steps:\n\n- `find_odd_backslash_sequences`: detect backslash characters used to escape quotes\n- `find_quote_mask_and_bits`: generate a mask with bits turned on for characters between quotes\n- `find_whitespace_and_structurals`: generate a mask for whitespace plus a mask for the structural characters\n- `finalize_structurals`: combine the masks computed above into a final mask where each active bit represents the position of a structural character in the input message.\n- `flatten_bits_incremental`: output the active bits in the final mask as incremental offsets.\n\nFor more details you can take a look at the various test cases in `find_subroutines_amd64_test.go` to see how\nthe individual routines can be invoked (typically with a 64 byte input buffer that generates one or more 64-bit masks).\n\nThere is one final routine, 
`find_structural_bits_in_slice`, that ties it all together and is\ninvoked with a slice of the message buffer in order to find the incremental offsets.\n\n### Stage 2\n\nDuring Stage 2 the tape structure is constructed.\nIt is essentially a single function that jumps around as it finds the various structural characters\nand builds the hierarchy of the JSON document that it processes.\nThe values of the JSON elements such as strings, integers, booleans etc. are parsed and written to the tape.\n\nAny errors (such as an array not being closed or a missing closing brace) are detected and reported back as errors to the client.\n\n## Tape format\n\nSimilarly to `simdjson`, `simdjson-go` parses the structure onto a 'tape' format.\nWith this format it is possible to skip over arrays and (sub)objects as the sizes are recorded in the tape.\n\n`simdjson-go` format is exactly the same as the `simdjson` [tape](https://github.com/lemire/simdjson/blob/master/doc/tape.md)\nformat with the following 2 exceptions:\n\n- In order to support ndjson, it is possible to have more than one root element on the tape.\nAlso, to allow for fast navigation over root elements, a root points to the next root element\n(and as such the last root element points 1 index past the length of the tape).\n\nA \"NOP\" tag is added. 
The value contains the number of tape entries to skip forward for the next tag.\n\n- Strings are handled differently: unlike `simdjson`, the string size is not prepended in the String buffer\nbut is added as an additional element to the tape itself (much like integers and floats).\n  - In case `WithCopyStrings(false)`: only strings that contain special characters are copied to the String buffer,\nin which case the payload from the tape is the offset into the String buffer.\nFor string values without special characters the tape's payload points directly into the message buffer.\n  - In case `WithCopyStrings(true)` (default): strings are always copied to the String buffer.\n\nFor more information, see `TestStage2BuildTape` in `stage2_build_tape_test.go`.\n\n## Fuzz Tests\n\n`simdjson-go` has been extensively fuzz tested to ensure that input cannot generate crashes and that output matches\nthe standard library.\n\nThe fuzz tests are included as Go 1.18+ compatible tests.\n\n## License\n\n`simdjson-go` is released under the Apache License v2.0. You can find the complete text in the file LICENSE.\n\n## Contributing\n\nContributions are welcome; please send PRs for any enhancements.\n\nIf your PR includes parsing changes, please run the fuzz testers for a couple of hours.\n","funding_links":[],"categories":["Go","Uncategorized","Parsing"],"sub_categories":["Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fminio%2Fsimdjson-go","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fminio%2Fsimdjson-go","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fminio%2Fsimdjson-go/lists"}