{"id":13451128,"url":"https://github.com/grisha/json2avro","last_synced_at":"2025-03-23T18:31:52.182Z","repository":{"id":8903581,"uuid":"10626628","full_name":"grisha/json2avro","owner":"grisha","description":"Fast JSON to Avro converter","archived":false,"fork":false,"pushed_at":"2018-10-17T16:59:12.000Z","size":629,"stargazers_count":60,"open_issues_count":5,"forks_count":29,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-10-28T18:14:41.199Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/grisha.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-06-11T17:19:21.000Z","updated_at":"2024-10-24T05:35:44.000Z","dependencies_parsed_at":"2022-09-14T10:11:56.097Z","dependency_job_id":null,"html_url":"https://github.com/grisha/json2avro","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grisha%2Fjson2avro","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grisha%2Fjson2avro/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grisha%2Fjson2avro/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grisha%2Fjson2avro/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/grisha","download_url":"https://codeload.github.com/grisha/json2avro/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245149397,"owners_count":20568899,"icon_url":"h
ttps://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T07:00:48.595Z","updated_at":"2025-03-23T18:31:47.170Z","avatar_url":"https://github.com/grisha.png","language":"C","funding_links":[],"categories":["C"],"sub_categories":[],"readme":"\nA utility for converting JSON files to Avro. It is written entirely in\nC and is quite fast. Supports Snappy, Deflate (zlib) and LZMA\ncompression codecs, as well as custom Avro block size.\n\nThe purpose is to be useful in converting messy legacy JSON in which\nsome elements might be missing or of the wrong type, which is not\ncurrently possible with the standard avro-tools fromjson option.\n\nSince in a conversion from JSON *schema resolution* is technically not\napplicable (because JSON is not Avro), json2avro mimics schema\nresolution behavior by attempting to use the defaults specified in the\nschema if the corresponding JSON element is missing, as well as by\nattempting to resolve unions by trying each type until one succeeds.\n\nIt uses the Jansson JSON parser and Avro-C for Avro encoding. Both\ntools are written in C and are extremely fast.\n\nThe Jansson parser is used with the JSON_DISABLE_EOF_CHECK flag, which\nmeans that the input does not have to be one object per line, but is\nfree-format. So long as the input represents a sequence of JSON\nobjects (an object is enclosed in [] or {}), json2avro should be able\nto parse it. Note that Jansson does not allow null characters (\\u0000)\nas part of JSON strings, not even in embedded form. 
To work around\nthis, json2avro will replace all nulls in strings (escaped or\nnot) with a character '0'. (Yes, this is a total hack).\n\nIf json2avro encounters an error, it skips to the nearest end-of-line\nand starts parsing afresh. (This behavior can be turned off with the\n-x option).\n\n## Usage\n\n```sh\n$ ./json2avro -h\nUsage: ./json2avro [options] [input_file.json] \u003coutput_file.avro\u003e\n\nWhere options are:\n -s schema (required) Avro schema to use for conversion.\n -S file   (required) JSON file to read the avro schema from.\n -c algo   (optional) Set output compression algorithm: null, snappy, deflate, lzma\n                      Default: no compression\n -b bytes  (optional) Set output block size in bytes. Default: 16384\n -d        (optional) Turn on debug mode.\n -j        (optional) Dump unexpected JSON objects as strings.\n -x        (optional) Abort on JSON parsing errors. Default: skip invalid json.\n -z bytes  (optional) Maximum JSON string size. Default: no limit.\n -m        (optional) Linux only, enable periodic memory stats information output.\n -h                   Show this help and exit.\n\nIf infile.json is not specified, STDIN is assumed. 
outfile.avro of '-' means STDOUT.\n```\n\n## Example\n\nIf we have the following JSON called `input.json`:\n\n```javascript\n\n{\"a_null\":null, \"a_bool\":true, \"an_int\":12345, \"a_long\":9876543210,\n\"a_float\":1.234567, \"a_double\":12345678.1234567, \"a_string\":\"foo bar\",\n\"random_bytes\":\"\\nV@H#3\\u001ad\\u001a\\u0006G\\u0006K\\u0007\",\n\"a_fixed\":\"abcd\", \"an_int_array\":[123, 456, -32, 0, 12],\n\"a_float_map\":{\"foo\":2.345, \"bar\":-3.456}}    {\"a_null\":null,\n\"a_bool\":false, \"an_int\":54321, \"a_long\":9876543212,\n\"a_float\":7.654321, \"a_double\":8.76543217654321E7,\n\"a_string\":\"foo bar\",\n\"random_bytes\":\"\\u0006K\\u0007\\nV@H#3\\u001ad\\u001a\\u0006\",\n\"a_fixed\":\"dcba\", \"an_int_array\":[321, 654, -23, 0, 21],\n\"a_float_map\":{\"foo\":5.324, \"bar\":-6.543}, \"null_default\":\"blah\"}\n\n```\n\nIt can be converted to Avro with the following command:\n\n```\n./json2avro input.json output.avro -s \\\n'{\"type\":\"record\",\"name\":\"testrec\",\"fields\":[\n {\"name\":\"a_null\",\"type\":\"null\"},\n {\"name\":\"a_bool\",\"type\":\"boolean\"},\n {\"name\":\"an_int\",\"type\":\"int\"},\n {\"name\":\"a_long\",\"type\":\"long\"},\n {\"name\":\"a_float\",\"type\":\"float\"},\n {\"name\":\"a_double\",\"type\":\"double\"},\n {\"name\":\"a_string\",\"type\":\"string\"},\n {\"name\":\"random_bytes\",\"type\":\"bytes\"},\n {\"name\":\"a_fixed\",\"type\":{\"type\":\"fixed\",\"size\":3,\"name\":\"four\"}},\n {\"name\":\"an_int_array\",\"type\":{\"type\":\"array\",\"items\":\"int\"}},\n {\"name\":\"a_float_map\",\"type\":{\"type\":\"map\",\"values\":\"float\"}},\n {\"name\":\"null_default\",\"type\":[\"null\",\"string\"],\"default\":\"null\"}]}'\n```\n\nYou can verify that it worked correctly by using avro-tools (distributed with Avro Java), for example:\n\n```\njava -jar ~/src/avro/java/avro-tools-1.7.4.jar tojson 
output.avro\n{\"a_null\":null,\"a_bool\":true,\"an_int\":12345,\"a_long\":9876543210,\"a_float\":1.234567,\"a_double\":1.23456781234567E7,\"a_string\":\"foo bar\",\"random_bytes\":\"\\nV@H#3\\u001Ad\\u001A\\u0006G\\u0006K\\u0007\",\"a_fixed\":\"\\u0000\\u0000\\u0000\",\"an_int_array\":[123,456,-32,0,12],\"a_float_map\":{\"bar\":-3.456,\"foo\":2.345},\"null_default\":{\"string\":\"null\"}}\n{\"a_null\":null,\"a_bool\":false,\"an_int\":54321,\"a_long\":9876543212,\"a_float\":7.654321,\"a_double\":8.76543217654321E7,\"a_string\":\"foo bar\",\"random_bytes\":\"\\u0006K\\u0007\\nV@H#3\\u001Ad\\u001A\\u0006\",\"a_fixed\":\"\\u0000\\u0000\\u0000\",\"an_int_array\":[321,654,-23,0,21],\"a_float_map\":{\"bar\":-6.543,\"foo\":5.324},\"null_default\":{\"string\":\"blah\"}}\n```\n\nThe -j option tells json2avro to dump remaining JSON as a string\nwhere the Avro schema expects a string but the JSON contains other\ntypes. This is useful when you have objects of arbitrary schema and\nyou would like to store them as strings. 
For example, given the\nfollowing JSON:\n\n```\n{\"foo\":\"some value\", \"bar\":{\"some\":[\"more\", 3, {\"complex\":\"json\"}], \"which\":\"we don't care to parse\"}}\n```\n\nYou can convert it to Avro as such:\n\n```sh\n./json2avro input2.json output2.avro -j -s \\\n'{\"type\":\"record\", \"name\":\"strjson\", \"fields\":[\n {\"name\":\"foo\", \"type\":\"string\"},\n {\"name\":\"bar\", \"type\":\"string\"}]}'\n```\n\nThis will save the value of \"bar\" as a JSON-encoded string:\n\n```\njava -jar ~/src/avro/java/avro-tools-1.7.4.jar tojson output2.avro\n{\"foo\":\"some value\",\"bar\":\"{\\\"some\\\":[\\\"more\\\",3,{\\\"complex\\\":\\\"json\\\"}],\\\"which\\\":\\\"we don't care to parse\\\"}\"}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrisha%2Fjson2avro","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgrisha%2Fjson2avro","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrisha%2Fjson2avro/lists"}