{"id":19364276,"url":"https://github.com/mity/centijson","last_synced_at":"2025-04-23T14:30:42.881Z","repository":{"id":49230342,"uuid":"159403456","full_name":"mity/centijson","owner":"mity","description":"C JSON parser (both, SAX-like \u0026 full DOM)","archived":false,"fork":false,"pushed_at":"2024-01-20T19:23:18.000Z","size":188,"stargazers_count":22,"open_issues_count":4,"forks_count":6,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-02T15:42:53.822Z","etag":null,"topics":["c","json","json-parser","json-pointer","json-serializer","mit-license"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mity.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-27T21:45:18.000Z","updated_at":"2024-10-28T06:20:09.000Z","dependencies_parsed_at":"2024-11-10T07:37:04.430Z","dependency_job_id":"7e2433e0-f5a8-4b7c-8fb5-1d95bcf63abd","html_url":"https://github.com/mity/centijson","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mity%2Fcentijson","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mity%2Fcentijson/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mity%2Fcentijson/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mity%2Fcentijson/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mity","download_url":"https://code
load.github.com/mity/centijson/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250451578,"owners_count":21432851,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","json","json-parser","json-pointer","json-serializer","mit-license"],"created_at":"2024-11-10T07:36:54.159Z","updated_at":"2025-04-23T14:30:42.543Z","avatar_url":"https://github.com/mity.png","language":"C","readme":"\n# CentiJSON Readme\n\n* Home: http://github.com/mity/centijson\n\n\n## What is JSON\n\nFrom http://json.org:\n\n\u003e JSON (JavaScript Object Notation) is a lightweight data-interchange format.\n\u003e It is easy for humans to read and write. It is easy for machines to parse\n\u003e and generate. It is based on a subset of the JavaScript Programming Language,\n\u003e Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is\n\u003e completely language independent but uses conventions that are familiar to\n\u003e programmers of the C-family of languages, including C, C++, C#, Java,\n\u003e JavaScript, Perl, Python, and many others. These properties make JSON an ideal\n\u003e data-interchange language.\n\n\n## Main features:\n\n* **Size:** The code size and memory footprint is relatively small.\n\n* **Standard compliance** High emphasis is put on correctness and compliance\n  with the JSON standards [ECMA-404], [RFC-8259] and [RFC-6901]. 
That includes:\n\n  * **Full input validation:** During the parsing, CentiJSON verifies that the\n    input forms valid JSON.\n\n  * **String validation:** CentiJSON verifies that all strings are valid UTF-8\n    (including corner cases like two Unicode escapes forming surrogate pairs).\n    All JSON escape sequences are automatically translated to their respective\n    Unicode counterparts.\n\n* **Diagnostics:** In the case of invalid input, you get more than just a\n  failure flag: you also get information about the nature of the issue and about\n  the location in the document where it was detected (the offset as well as the\n  line and column numbers are provided).\n\n* **Security:** CentiJSON is intended to be usable even in situations where\n  your application reads JSON from an untrusted source. That includes:\n\n  * **Thorough testing and high code quality:** Extensive tests with high code\n    coverage, static code analysis and fuzz testing are all tools commonly used\n    during the development and maintenance of CentiJSON.\n\n  * **DoS mitigation:** The API allows good flexibility in imposing limits on\n    the parsed input, including but not limited to the total length of the input,\n    the count of all data records, the maximal length of object keys or string\n    values, and the maximal level of array/object nesting. This provides a high\n    degree of flexibility in defining policies for mitigating Denial-of-Service\n    attacks.\n\n* **Modularity:** Do you need just a SAX-like parser? Take just that. Do you\n  need a full DOM parser and AST representation? Take it all; it's still just\n  a few reasonably sized C files (and corresponding headers).\n\n  * **SAX-like parser:** Take just `json.h` + `json.c` and you have a complete\n    SAX-like parser. It's smart enough to verify JSON correctness, validate\n    UTF-8, resolve escape sequences and handle all the boring stuff. 
You can easily\n    build a DOM structure which fits your special needs or process the data on\n    the fly.\n\n  * **Full DOM parser:** `json-dom.h` + `json-dom.c` implements such a DOM\n    builder on top of the SAX-like parser which populates the data storage\n    implemented in `value.h` + `value.c`.\n\n  * **Data storage:** The data storage module, `value.h` + `value.c` from\n    [C Reusables](http://github.com/mity/c-reusables), is very versatile and\n    it is not bound to the JSON parser implementation in any way, so you can\n    reuse it for other purposes.\n\n  * **JSON pointer:** The JSON pointer module, `json-ptr.h` + `json-ptr.c`,\n    allows querying the data storage (`value.h` + `value.c`) as specified by\n    [RFC-6901].\n\n* **Streaming:** Ability to feed the parser with JSON input block by block\n  (the blocks can be of an arbitrary size).\n\n* **Serialization:**\n\n  * **Low-level serialization:** `json.h` provides functions for outputting the\n    non-trivial stuff like strings or numbers from C numeric types.\n\n  * **High-level:** `json-dom.h` provides the function `json_dom_dump()`, which\n    can serialize a whole DOM hierarchy.\n\n\n## Performance\n\nTo be honest, we focus more on correctness and on guaranteeing reasonable parsing\ntimes for crazy input (the worst case) than on simple, uncomplicated\ninput. 
We should therefore be usable even for applications reading JSON from\nuntrusted sources.\n\nFor example, this means that objects in the DOM hierarchy are implemented as\nred-black trees, so we can provide reasonable member lookup times (`O(log n)`) no\nmatter how heavily populated the objects are.\n\nOf course, building the RB-trees takes some CPU time and this may show in some\nbenchmarks, especially if they measure just the parsing and never perform any\nlookup in heavily populated objects.\n\nAlso, supporting block-by-block parsing in a streaming fashion means we cannot\nhave loops as tight as those of some parsers which do not support it, and this\nleaves us less room for some optimizations.\n\nBut even so, some preliminary tests we have done so far seem to indicate that\nwe are quite competitive.\n\n(We will likely publish some real data on this in the foreseeable future.)\n\n\n## Why Yet Another JSON Parser?\n\nIndeed, there are already hundreds (if not thousands) of JSON parsers written in\nC out there. But as far as I know, they mostly fall into one of two categories.\n\nThe parsers in the 1st category are very small and simple (and quite often they\nalso take pride in it). They then usually have one or more shortcomings from\nthe following list:\n\n* They usually expect a full in-memory document to parse and do not allow parsing\n  block by block;\n\n* They usually allow no or very minimal configuration;\n\n* They in almost all cases use an array or linked list for storing the children\n  of JSON arrays as well as of JSON objects (and sometimes even for **all** the\n  data in the JSON document), so searching an object by key is an operation\n  of linear complexity.\n\n  (That may be good enough if you really **know** that all the input will\n  always be small. 
But let any black hat feed it some bigger beast and you\n  have Denial of Service faster than you can spell it.)\n\n* They often lack any possibility of modifying the tree of the data, e.g.\n  adding a new item into an array or an object, or removing an item from there.\n\n* They often perform minimal or no UTF-8 encoding validation, do not perform\n  full escape sequence resolution, or run into trouble if any string contains\n  U+0000 (`\"foo\\u0000bar\"`).\n\nThe parsers in the 2nd category are far less numerous. They are usually\nhuge beasts which provide many scores of functions, complicated abstraction\nlayers and/or baroque interfaces, and they are simply too big and complicated\nfor my taste or needs or will to incorporate them in my projects.\n\nCentiJSON aims to reside somewhere in the no man's land between the two\ncategories.\n\n\n## Using CentiJSON\n\n### SAX-like Parser\n\n(Disclaimer: If you don't know what \"SAX-like parser\" means, you likely want\nto see the section below about the DOM parser and ignore this section.)\n\nIf you want to use just the SAX-like parser, follow these steps:\n\n1. Incorporate `src/json.h` and `src/json.c` into your project.\n\n2. Use `#include \"json.h\"` in all relevant sources of your project where\n   you deal with JSON parsing.\n\n3. Implement a callback function, which is called whenever a scalar value (`null`,\n   `false`, `true`, number or string) is encountered, or whenever the beginning\n   or end of a container (array or object) is encountered.\n\n   To help with the implementation of the callback, you may call some utility\n   functions to e.g. analyze a number found in the JSON input or to convert\n   it to particular C types (see e.g. `json_number_to_int32()`).\n\n4. To parse a JSON input part by part (e.g. 
if you read the input in\n   blocks from a file), use `json_init()` + `json_feed()` + `json_fini()`.\n   Or alternatively, if you have the whole input in a single buffer, you may use\n   `json_parse()`, which wraps the three functions.\n\nNote that CentiJSON fully verifies the correctness of the input, but it does so\non the fly. Hence, if you feed the parser a broken JSON file, your callback\nfunction may see e.g. the beginning of an array but not its end, if the parser\naborts due to an error in the meantime.\n\nHence, if the parsing as a whole fails (`json_fini()` or `json_parse()` returns\nnon-zero), you will likely still need to release any resources you allocated\nwhile the callback was being called throughout the process; and the code\ndealing with that has to be ready for the parsing to be aborted at any point\nbetween the calls of the callback.\n\nSee comments in `src/json.h` for more details about the API.\n\n### DOM Parser\n\nTo use just the DOM parser, follow these steps:\n\n1. Incorporate the sources `json.h`, `json.c`, `json-dom.h`, `json-dom.c`,\n   `value.h` and `value.c` from the `src` directory into your project.\n\n2. Use `#include \"json-dom.h\"` in all relevant sources of your project where\n   you deal with JSON parsing, and `#include \"value.h\"` in all sources where\n   you query the parsed data.\n\n3. To parse a JSON input part by part (e.g. if you read the input in\n   blocks from a file), use `json_dom_init()` + `json_dom_feed()` +\n   `json_dom_fini()`. Or alternatively, if you have the whole input in a single\n   buffer, you may use `json_dom_parse()`, which wraps the three functions.\n\n4. If the parsing succeeds, the result (document object model or DOM) forms\n   a tree hierarchy of `VALUE` structures. 
Use all the power of the API in\n   `value.h` to query (or modify) the data stored in it.\n\nSee comments in `src/json-dom.h` and `src/value.h` for more details about the\nAPI.\n\n### JSON Pointer\n\nThe JSON pointer module is an optional module on top of the DOM parser. To use\nit, follow the instructions for the DOM parser, and also add the sources\n`json-ptr.h` and `json-ptr.c` to your project.\n\n### Outputting JSON\n\nIf you also need to output JSON, you may use the low-level helper utilities\nin `src/json.h`, which can output JSON numbers from C numeric types,\nor JSON strings from C strings, handling all the hard stuff of the JSON syntax\nlike escaping of problematic characters. Writing the simple stuff like array or\nobject brackets, value delimiters etc. is left to the application.\n\nOr, if you have a DOM model represented by the `VALUE` structure hierarchy (as\nprovided by the DOM parser or crafted manually), just call `json_dom_dump()`.\nThis function, provided in `src/json-dom.h`, dumps the complete data hierarchy\ninto a JSON stream. It supports a few options for formatting the output in the\ndesired way: e.g. it can indent the output to reflect the nesting of objects\nand arrays, and it can also minimize the output by skipping any non-meaningful\nwhitespace altogether.\n\nIn either case, you have to implement a writer callback which simply writes\nout a given sequence of bytes. This way, the application may save\nthe output into a file, send it over a network, or do whatever else it wishes\nwith the stream.\n\n\n## FAQ\n\n**Q: Why does `value.h` not provide any API for objects?**\n\n**A:** That module is not designed to be JSON-specific. The term \"object\", as\nused in the JSON context, is somewhat misleading outside of that context. 
Therefore\n`value.h` instead uses the more descriptive term \"dictionary\".\n\nThe following table shows how JSON types are translated to their counterparts\nin `value.h`:\n\n| JSON type | `json.h` type                       | `value.h` type        |\n|-----------|-------------------------------------|-----------------------|\n| null      | `JSON_NULL`                         | `VALUE_NULL`          |\n| false     | `JSON_FALSE`                        | `VALUE_BOOL`          |\n| true      | `JSON_TRUE`                         | `VALUE_BOOL`          |\n| number    | `JSON_NUMBER`                       | see the next question |\n| string    | `JSON_STRING`                       | `VALUE_STRING`        |\n| array     | `JSON_ARRAY_BEG`+`JSON_ARRAY_END`   | `VALUE_ARRAY`         |\n| object    | `JSON_OBJECT_BEG`+`JSON_OBJECT_END` | `VALUE_DICT`          |\n\n**Q: How does CentiJSON deal with numbers?**\n\n**A:** It's true that the untyped notion of the number type, as specified by\nJSON standards, is a little bit complicated to deal with for languages like C.\n\nOn the SAX-like parser level, the syntax of numbers is verified according\nto the JSON standards and provided to the callback as a verbatim string.\n\nThe provided DOM builder (`json-dom.h`) tries to guess the most appropriate C\ntype for storing the number, to mitigate any data loss, by applying the\nfollowing rules (the first applicable rule is used):\n1. If there is no fraction and no exponent part and the integer fits into\n   `VALUE_INT32`, then it shall be `VALUE_INT32`.\n2. If there is no minus sign, no fraction or exponent part and the integer fits\n   into `VALUE_UINT32`, then it shall be `VALUE_UINT32`.\n3. If there is no fraction and no exponent part and the integer fits into\n   `VALUE_INT64`, then it shall be `VALUE_INT64`.\n4. If there is no minus sign, no fraction or exponent part and the integer fits\n   into `VALUE_UINT64`, then it shall be `VALUE_UINT64`.\n5. 
In all other cases, it shall be `VALUE_DOUBLE`.\n\nThat said, note that whatever numeric type is actually used for storing the\nvalue, the getter functions of all those numeric values are capable of converting\nthe value into other C numeric types.\n\nFor example, you may use the getter function `value_get_int32()` not only for values\nof the type `VALUE_INT32`, but also for the other numeric values, e.g.\n`VALUE_INT64` or `VALUE_DOUBLE`.\n\nNaturally, the conversion may exhibit similar limitations as C casting,\nincluding data loss (e.g. in the overflow situation) or rounding errors (e.g.\nin double to integer conversion).\n\nSee the comments in the header `value.h` for more details.\n\n**Q: Are there any hard-coded limits?**\n\n**A:** No. There are only soft limits, configurable at run time by the\napplication and intended to be used as mitigation against Denial-of-Service\nattacks.\n\nThe application can instruct the parser to use no limits by appropriate setup of\nthe structure `JSON_CONFIG` passed to `json_init()`. The only limitations are\nthen imposed by properties of your machine and OS.\n\n**Q: Is CentiJSON thread-safe?**\n\n**A:** Yes. You may parse as many documents in parallel as you like or your\nmachine is capable of. There is no global state and no need to synchronize\nas long as each thread uses a different parser instance.\n\n(Of course, do not try parallelizing parsing of a single document. That makes\nno sense, given the nature of the JSON format.)\n\n**Q: CentiJSON? Why such a horrible name?**\n\n**A:** First, because I am poor at naming things. Second, because CentiJSON is\nbigger than all those picojsons, nanojsons or microjsons; yet it's still quite\nsmall, as the prefix suggests. Third, because it begins with the letter 'C',\nand that refers to the C language. Fourth, because the name is reminiscent of\ncentipedes, and centipedes belong to Arthropods. 
The characteristic feature of this group\nis their segmented body; similarly I see the modularity of CentiJSON as an\nimportant competitive advantage in its niche. And last but not least, because\nit seems it's not yet used for any other JSON implementation.\n\n\n## License\n\nCentiJSON is covered with MIT license, see the file `LICENSE.md`.\n\n\n## Reporting Bugs\n\nIf you encounter any bug, please be so kind and report it. Unheard bugs cannot\nget fixed. You can submit bug reports here:\n\n* http://github.com/mity/centijson/issues\n\n\n\n[ECMA-404]: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf\n[RFC-8259]: https://tools.ietf.org/html/rfc8259\n[RFC-6901]: https://tools.ietf.org/html/rfc6901\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmity%2Fcentijson","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmity%2Fcentijson","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmity%2Fcentijson/lists"}