{"id":13792518,"url":"https://github.com/tarantool/avro-schema","last_synced_at":"2025-04-14T15:11:09.228Z","repository":{"id":45574244,"uuid":"52920160","full_name":"tarantool/avro-schema","owner":"tarantool","description":"Apache Avro schema tools for Tarantool","archived":false,"fork":false,"pushed_at":"2024-03-14T13:02:44.000Z","size":878,"stargazers_count":57,"open_issues_count":32,"forks_count":4,"subscribers_count":39,"default_branch":"master","last_synced_at":"2025-03-28T04:03:37.306Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Lua","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tarantool.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2016-03-02T00:14:45.000Z","updated_at":"2023-09-19T09:26:27.000Z","dependencies_parsed_at":"2024-03-14T14:28:30.088Z","dependency_job_id":"ee92e0d2-54ad-461a-a1b3-5d694796a24b","html_url":"https://github.com/tarantool/avro-schema","commit_stats":null,"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tarantool%2Favro-schema","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tarantool%2Favro-schema/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tarantool%2Favro-schema/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tarantool%2Favro-schema/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tarantool","download_url":"https://codeload.github.com/tarantool/avro-schema/tar.gz
/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248904640,"owners_count":21180835,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T22:01:13.168Z","updated_at":"2025-04-14T15:11:09.205Z","avatar_url":"https://github.com/tarantool.png","language":"Lua","funding_links":[],"categories":["Packages"],"sub_categories":["Database"],"readme":"\u003ca href=\"http://tarantool.org\"\u003e\n  \u003cimg src=\"https://avatars2.githubusercontent.com/u/2344919?v=2\u0026s=250\" align=\"right\"\u003e\n\u003c/a\u003e\n\n[![Build Status](https://travis-ci.org/tarantool/avro-schema.svg?branch=master)](https://travis-ci.org/tarantool/avro-schema)\n\n# Apache Avro schema tools\n\n[Apache Avro](http://avro.apache.org/docs/1.8.0/spec.html) schema tools for Tarantool,\nimplemented from scratch in Lua.\n\nNotable features:\n\n * Avro defaults;\n * Avro aliases;\n * data transformations are fast due to runtime code generation;\n * extensions such as built-in nullable types.\n\n```lua\navro_schema = require('avro_schema')\n```\n\n## Table of contents\n\n  - [Installation](#installation)\n  - [Creating a schema](#creating-a-schema)\n  - [Validating and normalizing data with a schema](#validating-and-normalizing-data-with-a-schema)\n  - [Checking if schemas are compatible](#checking-if-schemas-are-compatible)\n  - [Checking if an object is a schema object](#checking-if-an-object-is-a-schema-object)\n  - [Querying a schema's field names or field types](#querying-a-schemas-field-names-or-field-types)\n  - [Compiling 
schemas](#compiling-schemas)\n    - [Compile options](#compile-options)\n  - [Generated routines](#generated-routines)\n  - [References](#references)\n    - [Related discussions](#related-discussions)\n  - [Nullability (extension)](#nullability-extension)\n  - [Default values](#default-values)\n\n## Installation\n\nTo install the module, run:\n```sh\ntarantoolctl rocks install avro-schema\n```\n\n## Creating a schema\n\n```lua\nok, schema = avro_schema.create {\n    type = \"record\",\n    name = \"Frob\",\n    fields = {\n      { name = \"foo\", type = \"int\", default = 42 },\n      { name = \"bar\", type = \"string\" }\n    }\n  }\n```\n\nOn success, returns `true` and a schema object. If there was a syntax error, returns `false` and the\nerror message.\n\n## Validating and normalizing data with a schema\n\n```lua\nok, normalized_data_copy = avro_schema.validate(schema, { bar = \"Hello, world!\" })\n```\n\nReturns `true` and the normalized copy if the data was valid. Otherwise, returns `false` and the error message.\n\nThe `avro_schema.validate()` function creates a normalized copy of the data.\nNormalization implies filling in default values for missing fields.\nFor example, because the \"foo\" field has a default value of 42,\nthe result from the above example will be `{ foo = 42, bar = \"Hello, world!\" }`.\n\n## Checking if schemas are compatible\n\nTo facilitate data evolution, Avro defines certain schema mapping rules.\nIf schemas `A` and `B` are compatible, then one can convert data from `A` to `B`.\n\n```lua\nok = avro_schema.are_compatible(schema1, schema2)\nok = avro_schema.are_compatible(schema2, schema1, \"downgrade\")\n```\n\nAllowed modifications include:\n\n  1. renaming types and record fields (provided that `aliases` are correctly set);\n  2. extending records with new fields (these fields are initialized with default values, which are mandatory);\n  3. removing fields (contents are simply removed during conversion);\n  4. 
modifying unions and enums (provided that type definitions retain some similarity);\n  5. type promotions are allowed (e.g. `int` is compatible with `long` but not vice versa).\n\nLet's assume:\n\n* `B` is newer than `A`.\n* `A` defines `Apple` (a record type).\n* `B` renames it to `Banana`.\n\nUpgrading data from `A` to `B` works, since `Banana` is marked as an alias of `Apple`.\nHowever, downgrading data from `B` to `A` does not work, since in `A` the record type\n`Apple` has no aliases.\n\nTo make it work we implement `downgrade` mode.\nIn downgrade mode, name mapping rules take into account the aliases in the source schema,\nand ignore the aliases in the target schema.\n\n## Checking if an object is a schema object\n\n```lua\navro_schema.is(object)\n```\n\n## Querying a schema's field names or field types\n\n```lua\navro_schema.get_names(schema [, service-fields])\n```\n\n```lua\navro_schema.get_types(schema [, service-fields])\n```\n\nThe first argument must be a schema object, such as the one created in the\n[Creating a schema](#creating-a-schema) example above.\n\nThe optional second argument is a table with names of types, such as `{'string', 'int'}`.\n\nThe result will be a Lua table of field names (for the `get_names` method)\nor a Lua table of field types (for the `get_types` method).\n\nThe order will match the field order in the flat representation.\n\n## Compiling schemas\n\nCompiling a schema creates optimized data conversion routines (runtime code generation).\n\n```lua\nok, methods = avro_schema.compile(schema)\nok, methods = avro_schema.compile({schema1, schema2})\n```\n\nIf two schemas are provided, then the generated routines consume data in `schema1` and\nproduce results in `schema2`.\n\nWhat if the `schema1` source and the `schema2` destination are not adjacent revisions,\ni.e. 
there were some revisions in between?\nGoing directly from source to destination is fast, but it sometimes alters the results.\nConverting step by step through all the in-between revisions always yields\ncorrect results but is slow.\n\nThere is a third option: let `compile` generate routines that are fast yet produce the\ncorrect results.\n\n### Compile options\n\nA few options affecting compilation are recognized.\n\nEnabling `downgrade` mode (see [avro_schema.are_compatible](#checking-if-schemas-are-compatible)\nfor details):\n```lua\nok, methods = avro_schema.compile({schema1, schema2, downgrade = true})\n```\n\nDumping generated code for inspection:\n```lua\nok, methods = avro_schema.compile({schema1, schema2, dump_src = \"output.lua\"})\n```\n\nTroubleshooting code generation issues:\n```lua\nok, methods = avro_schema.compile({schema1, schema2, debug = true, dump_il = \"output.il\"})\n```\n\nAdding service fields (which are part of a tuple, but are not part of an object):\n```lua\nok, methods = avro_schema.compile({schema, service_fields = {'string', 'int'}})\n```\n\n## Generated routines\n\n`compile` produces the following routines (returned in a Lua table):\n  * `flatten`\n  * `unflatten`\n  * `xflatten`\n  * `flatten_msgpack`\n  * `unflatten_msgpack`\n  * `xflatten_msgpack`\n  * `get_types`\n  * `get_names`\n\nHere is an example which uses the Avro schema that we described in\nthe section [Creating a schema](#creating-a-schema), a Tarantool database space,\nand the methods that `compile` produces. 
This is a script that you\ncan paste into a client of a Tarantool server; the comments explain\nwhat the results look like and what they mean.\n\n```lua\n-- Create a Tarantool space, an index, and a tuple\nbox.schema.space.create('T')\nbox.space.T:create_index('I')\nbox.space.T:insert{1, 'string-value'}\n-- Let tuple_1 = a tuple from the database space\ntuple_1 = box.space.T:get(1)\n-- Load the module\navro_schema = require('avro_schema')\n-- Create a schema as described earlier\nok, schema = avro_schema.create {\n    type = \"record\",\n    name = \"Frob\",\n    fields = {\n      { name = \"foo\", type = \"int\", default = 42 },\n      { name = \"bar\", type = \"string\" }\n    }\n  }\n-- Compile, so that \"methods\" will have the generated routines\nok, methods = avro_schema.compile(schema)\n-- Invoke unflatten(). The result will look like this:\n-- - {'foo': 1, 'bar': 'string-value'}\n-- That is: unflattening can turn tuples into avro-schema objects.\nok, result = methods.unflatten(tuple_1)\nresult\n-- Invoke flatten() on that object. The result can be inserted into the database.\n-- The value of the newly inserted tuple will look like this:\n-- - [1, 'string-value']\n-- That is, flattening can turn avro-schema objects into tuples.\nok, tuple_2 = methods.flatten(result)\nbox.space.T:truncate()\nbox.space.T:insert(tuple_2)\n-- Make an avro_schema object with {foo=42, bar='Hello, world!'}\n-- (foo is filled in from its default value)\nok, normalized_data_copy = avro_schema.validate(schema, { bar = \"Hello, world!\" })\n-- Invoke xflatten(). 
The result will look like this:\n-- - [['=', 1, 42], ['=', 2, 'Hello, world!']]\nok, result = methods.xflatten(normalized_data_copy)\nresult\n-- That is, the format of an xflatten() result is exactly\n-- what a Tarantool \"update\" request looks like.\n-- Therefore let's put it in an update request ...\nbox.space.T:update({42},result)\n-- And the result looks like:\n-- - [1, 'Hello, world!']\n```\n\nSo: with `flatten()` for inserting, `xflatten()` for updating,\nand `unflatten()` for getting, we have ways to use `avro_schema` objects as tuples in\nTarantool databases.\n\nThe other three methods that transform\n`avro_schema` objects -- `flatten_msgpack()`, `xflatten_msgpack()`, and\n`unflatten_msgpack()` -- provide similar functionality,\nexcept that the transformations are to and from MsgPack objects.\n(The `..._msgpack()` methods are usually faster because\nthey do not need to encode or decode internally.)\n\nThe final two methods -- `get_types()` and `get_names()` -- have almost the\nsame effect as `avro_schema.get_types()` and `avro_schema.get_names()` described in the earlier section\n[Querying a schema's field names or field types](#querying-a-schemas-field-names-or-field-types).\n(The main difference is that the optional \"service_fields\" argument\nis unnecessary if `methods` is the result of a compile done with\nthe `service_fields =` option.) 
For example:\n\n```lua\ntarantool\u003e methods.get_names()\n---\n- - foo\n  - bar\n...\ntarantool\u003e methods.get_types()\n---\n- - int\n  - string\n...\n```\n\n## References\n\nNamed types are ones that have mandatory `name` fields in their definitions:\nrecord, fixed, enum.\n\nNamed types can be referenced after the first definition (in depth-first,\nleft-to-right traversal).\n\nExample:\n\n```lua\n{\n    name = 'user',\n    type = 'record',\n    fields = {\n        {name = 'uid', type = 'long'},\n        {\n            name = 'nested',\n            type = {\n                type = 'record',\n                name = 'nested_record',\n                fields = {\n                    {name = 'x', type = 'long'},\n                    {name = 'y', type = 'long'}\n                }\n            }\n        },\n        {\n            name = 'another_nested',\n            type = 'nested_record'\n        }\n    }\n}\n```\n\nNotes:\n\n* A reference is a usage of a type (not a value), so the effect is as if you\n  defined the same type with a different name.\n* A field of a record also has a name, but it is not a type, so you cannot\n  reference a field by its name.\n* A record can be referenced from within itself only as part of a union or an\n  array.\n* An array and a map are unnamed and cannot be referenced by name; see the\n  related discussions below.\n\n### Related discussions\n\n* [[Avro-user] Why Array and Map are not named type ?][1]\n* [AEP 102 - Named Unions][2]\n\n## Nullability (extension)\n\nThe problem: in database management systems NULL is a value, not a type.\nSo it should be possible, for example, to have a \"long integer\" type that\ncan contain both NULL and integers.\n\nOne can try to handle this with a union such as `{'null', 'long'}` which\ncan have both `null` and `{long = 42}`. 
What really is necessary, though,\nis that a single field, whose name determines the type, can contain both\n`null` and `42` as valid values (see the [JSON Encoding][3]\nsection of the Avro specification). This problem -- expressing a single\ntype that accepts both `null` and `42` -- is the problem that the\nnullability extension solves.\n\nA type can be marked as nullable by adding an asterisk (\"\\*\") at the end of the type name:\n\n```lua\n{\n    name = 'user',\n    type = 'record',\n    fields = {\n        {name = 'uid', type = 'long'},\n        {name = 'first_name', type = 'string'},\n        {name = 'middle_name', type = 'string*'},\n        {name = 'last_name', type = 'string'}\n    }\n}\n```\n\nThe following types can be marked as nullable:\n\n* All primitive types: null, boolean, int, long, float, double, bytes, string.\n* All named complex types: record, fixed, enum.\n* Almost all unnamed complex types: array, map (but not union).\n\nNotes:\n\n* A type reference can be non-nullable or nullable (asterisk-marked)\n  independently of the original type definition.\n* To make a union nullable, use the standard `{'null', ...}` form without an asterisk.\n* The `xflatten` method is not designed to work with complex nullable types.\n\n...\n\n## Default values\n\nDefault values are substituted in two cases:\n1. during flattening, if the fields are not present in the data\n2. during unflattening and schema evolution, when the target schema\n   has extra fields with default values\n\nNotes:\n\n* Only zero-size arrays and maps are supported as default values for now.\n* A default value may be inherited from an inner field with a default\nvalue, or overridden. 
Example:\n```lua\nlocal schema = {\n    type = \"record\", name = \"Frob\", fields = {\n        { name = \"foo\", default = {f1=1, f2={f2_1=2}}, type =\n            { type = \"record\", name = \"default_1\", fields = {\n                {name = \"f1\", type = \"int\"},\n                {name = \"f2\", default = {f2_1=21}, type =\n                    {type = \"record\", name = \"default_2\", fields = {\n                        {name = \"f2_1\", type = \"int\"}}\n                    }}\n            }}},\n        { name = \"bar\", type = \"int\"}\n    }\n}\nok, handle = avro_schema.create(schema)\nok, methods = avro_schema.compile(handle)\nok, flattened = methods.flatten({bar=11})\n-- returns {1,2,11}\nok, flattened = methods.flatten({foo={f1=3},bar=11})\n-- returns {3,21,11}\n```\n\n[1]: http://grokbase.com/t/avro/user/108svyaz63/why-array-and-map-are-not-named-type\n[2]: https://cwiki.apache.org/confluence/display/AVRO/AEP+102+-+Named+Unions\n[3]: http://avro.apache.org/docs/1.8.2/spec.html#json_encoding\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftarantool%2Favro-schema","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftarantool%2Favro-schema","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftarantool%2Favro-schema/lists"}