{"id":13631710,"url":"https://github.com/dgraham/json-stream","last_synced_at":"2025-04-12T18:39:37.481Z","repository":{"id":998660,"uuid":"810009","full_name":"dgraham/json-stream","owner":"dgraham","description":"A streaming JSON parser that generates SAX-like events.","archived":false,"fork":false,"pushed_at":"2024-04-22T01:57:32.000Z","size":114,"stargazers_count":195,"open_issues_count":1,"forks_count":29,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-03T20:11:30.922Z","etag":null,"topics":["json"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dgraham.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2010-08-01T01:07:47.000Z","updated_at":"2025-03-11T23:16:26.000Z","dependencies_parsed_at":"2024-06-18T15:31:34.770Z","dependency_job_id":null,"html_url":"https://github.com/dgraham/json-stream","commit_stats":{"total_commits":128,"total_committers":3,"mean_commits":"42.666666666666664","dds":0.0859375,"last_synced_commit":"6f3557ccd7344718bc6b5fd8150a80d52f8a481e"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgraham%2Fjson-stream","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgraham%2Fjson-stream/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgraham%2Fjson-stream/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgraham%2Fjson-stream/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dgraham","download_url":"https://codeload.github.com/dgraham/json-stream/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248616251,"owners_count":21134038,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["json"],"created_at":"2024-08-01T22:02:35.333Z","updated_at":"2025-04-12T18:39:37.459Z","avatar_url":"https://github.com/dgraham.png","language":"Ruby","funding_links":[],"categories":["Ruby","Gems"],"sub_categories":["Serialization"],"readme":"# JSON::Stream\n\nJSON::Stream is a JSON parser, based on a finite state machine, that generates\nevents for each state change. This allows streaming both the JSON document into\nmemory and the parsed object graph out of memory to some other process.\n\nThis is much like an XML SAX parser that generates events during parsing. There\nis no requirement for the document, or the object graph, to be fully buffered in\nmemory. This is best suited for huge JSON documents that won't fit in memory.\nFor example, streaming and processing large map/reduce views from Apache\nCouchDB.\n\n## Usage\n\nThe simplest way to parse is to read the full JSON document into memory\nand then parse it into a full object graph. 
This is fine for small documents\nbecause we have room for both the document and parsed object in memory.\n\n```ruby\nrequire 'json/stream'\njson = File.read('/tmp/test.json')\nobj = JSON::Stream::Parser.parse(json)\n```\n\nWhile it's possible to do this with JSON::Stream, we really want to use the json\ngem for documents like this. JSON.parse() is much faster than this parser,\nbecause it can rely on having the entire document in memory to analyze.\n\nFor a larger document we can use an IO object to stream it into the parser.\nWe still need room for the parsed object, but the document itself is never\nfully read into memory.\n\n```ruby\nrequire 'json/stream'\nstream = File.open('/tmp/test.json')\nobj = JSON::Stream::Parser.parse(stream)\n```\n\nAgain, while JSON::Stream can be used this way, if we just need to stream the\ndocument from disk or the network, we're better off using the yajl-ruby gem.\n\nJSON::Stream is really useful when huge documents arrive over the network in\nsmall chunks at an EventMachine `receive_data` loop. Inside an\nEventMachine::Connection subclass we might have:\n\n```ruby\ndef post_init\n  @parser = JSON::Stream::Parser.new do\n    start_document { puts \"start document\" }\n    end_document   { puts \"end document\" }\n    start_object   { puts \"start object\" }\n    end_object     { puts \"end object\" }\n    start_array    { puts \"start array\" }\n    end_array      { puts \"end array\" }\n    key            { |k| puts \"key: #{k}\" }\n    value          { |v| puts \"value: #{v}\" }\n  end\nend\n\ndef receive_data(data)\n  begin\n    @parser \u003c\u003c data\n  rescue JSON::Stream::ParserError =\u003e e\n    close_connection\n  end\nend\n```\n\nThe parser accepts chunks of the JSON document and parses up to the end of the\navailable buffer. 
Passing in more data resumes the parse from the prior state.\nWhen an interesting state change happens, the parser notifies all registered\ncallback procs of the event.\n\nThe event callback is where we can do interesting data filtering and passing\nto other processes. The above example simply prints state changes, but\nimagine the callbacks looking for an array named `rows` and processing sets\nof these row objects in small batches. Millions of rows, streaming over the\nnetwork, can be processed in constant memory space this way.\n\n## Alternatives\n\n* [json](https://github.com/flori/json)\n* [yajl-ruby](https://github.com/brianmario/yajl-ruby)\n* [yajl-ffi](https://github.com/dgraham/yajl-ffi)\n* [application/json-seq](http://www.rfc-editor.org/rfc/rfc7464.txt)\n\n## Development\n\n```\n$ bin/setup\n$ bin/rake test\n```\n\n## License\n\nJSON::Stream is released under the MIT license. Check the LICENSE file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdgraham%2Fjson-stream","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdgraham%2Fjson-stream","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdgraham%2Fjson-stream/lists"}