{"id":20358825,"url":"https://github.com/creativcoder/avrow","last_synced_at":"2025-04-12T03:22:56.942Z","repository":{"id":49881437,"uuid":"264729043","full_name":"creativcoder/avrow","owner":"creativcoder","description":"Avrow is a pure Rust implementation of the avro specification https://avro.apache.org/docs/current/spec.html with Serde support.","archived":false,"fork":false,"pushed_at":"2022-11-05T06:51:33.000Z","size":407,"stargazers_count":29,"open_issues_count":1,"forks_count":5,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-09T08:17:17.881Z","etag":null,"topics":["avro","avro-schema","deserialization","encoding-library","rust","rust-lang","schema","serialization"],"latest_commit_sha":null,"homepage":"https://creativcoder.dev/avrow","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/creativcoder.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE-APACHE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"liberapay":"creativcoder","custom":["https://www.buymeacoffee.com/creativcoder"]}},"created_at":"2020-05-17T18:17:03.000Z","updated_at":"2024-11-25T16:27:28.000Z","dependencies_parsed_at":"2023-01-21T13:02:58.136Z","dependency_job_id":null,"html_url":"https://github.com/creativcoder/avrow","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/creativcoder%2Favrow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/creativcoder%2Favrow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/creativcoder%2Favr
ow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/creativcoder%2Favrow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/creativcoder","download_url":"https://codeload.github.com/creativcoder/avrow/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248510728,"owners_count":21116258,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["avro","avro-schema","deserialization","encoding-library","rust","rust-lang","schema","serialization"],"created_at":"2024-11-14T23:29:03.203Z","updated_at":"2025-04-12T03:22:56.916Z","avatar_url":"https://github.com/creativcoder.png","language":"Rust","readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg alt=\"avrow\" width=\"250\" src=\"assets/avrow_logo.png\" /\u003e\n\n[![Actions Status](https://github.com/creativcoder/avrow/workflows/ci/badge.svg)](https://github.com/creativcoder/avrow/actions)\n[![crates](https://img.shields.io/crates/v/avrow.svg)](https://crates.io/crates/avrow)\n[![docs.rs](https://docs.rs/avrow/badge.svg)](https://docs.rs/avrow/)\n[![license](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/creativcoder/avrow/blob/master/LICENSE-MIT)\n[![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/creativcoder/avrow/blob/master/LICENSE-APACHE)\n[![Contributor Covenant](https://img.shields.io/badge/contributor%20covenant-v1.4%20adopted-ff69b4.svg)](CODE_OF_CONDUCT.md)\n\n  \u003cbr /\u003e\n  \u003cbr /\u003e\n\n  \n### Avrow is a pure Rust 
implementation of the [Avro specification](https://avro.apache.org/docs/current/spec.html) with [Serde](https://github.com/serde-rs/serde) support.\n  \n\n  \u003cbr /\u003e\n  \u003cbr /\u003e\n\n\u003c/div\u003e\n\n### Table of Contents\n- [Overview](#overview)\n- [Features](#features)\n- [Getting started](#getting-started)\n- [Examples](#examples)\n  - [Writing avro data](#writing-avro-data)\n  - [Reading avro data](#reading-avro-data)\n  - [Writer customization](#writer-customization)\n- [Supported Codecs](#supported-codecs)\n- [Using the avrow-cli tool](#using-avrow-cli-tool)\n- [Todo](#todo)\n- [Changelog](#changelog)\n- [Contributions](#contributions)\n- [Support](#support)\n- [MSRV](#msrv)\n- [License](#license)\n\n## Overview\n\nAvrow is a pure Rust implementation of the [Avro specification](https://avro.apache.org/docs/current/spec.html): a row-based data serialization system. The Avro format is widely used in big data and streaming systems such as [Kafka](https://kafka.apache.org/) and [Spark](https://spark.apache.org/).\nIn Avro terminology, an Avro-encoded file or byte stream is called a \"data file\".\nTo write Avro-encoded data, one needs a schema, which is provided in JSON format. Here's an example of an Avro schema represented in JSON:\n\n```json\n{\n  \"type\": \"record\",\n  \"name\": \"LongList\",\n  \"aliases\": [\"LinkedLongs\"],\n  \"fields\" : [\n    {\"name\": \"value\", \"type\": \"long\"},\n    {\"name\": \"next\", \"type\": [\"null\", \"LongList\"]}\n  ]\n}\n```\nThe schema above is a record that represents a linked list of 64-bit integers: `value` holds the current element, and `next` is either `null` or another `LongList`. In most implementations, this schema is then passed to a `Writer` instance along with a buffer to write the encoded data to. One can then call one of the `write` methods on the writer to write data. One distinguishing aspect of Avro is that the schema of the encoded data is written in the header of the data file. 
This means that you don't need to provide a schema to a `Reader` instance in order to read data. The spec also allows providing a separate reader schema, which is resolved against the writer schema so that only the data you need is read.\n\nThe Avro specification provides two kinds of encoding:\n* Binary encoding - Efficient and compact on disk.\n* JSON encoding - Human-readable; mostly used for debugging.\n\nThis crate implements only the binary encoding, since that is the format used in practice for performance and storage reasons.\n\n## Features\n\n* Full support for recursive, self-referential schemas with Serde serialization/deserialization.\n* All compression codecs (`deflate`, `bzip2`, `snappy`, `xz`, `zstd`) are supported, as per the spec.\n* Simple and intuitive API - Since the underlying I/O types are `Read` and `Write`, avrow mirrors Rust's standard library APIs to keep the learning curve small. Writing Avro values is a matter of calling `write` (or `serialize` with serde), and reading them is a matter of iterating over a `Reader`.\n* Less bloat / Lightweight - Compile times in Rust are costly, so avrow keeps third-party dependencies to a minimum. Compression codecs and schema fingerprinting are feature-gated; to use them, compile with the respective feature flags (e.g. 
`--features zstd`).\n* Schema evolution - One can configure the avrow `Reader` with a reader schema and read only the data relevant to one's use case.\n* Schemas in avrow support querying their canonical form, as well as fingerprinting (`rabin64`, `sha256`, `md5`).\n\n**Note**: This is not yet a complete implementation of the spec; the remaining features are listed in the [Todo](#todo) section.\n\n## Getting started:\n\nAdd avrow as a dependency in `Cargo.toml`:\n\n```toml\n[dependencies]\navrow = \"0.2.0\"\n```\n\n## Examples:\n\n### Writing avro data\n\n```rust\n\nuse anyhow::Error;\nuse avrow::{Schema, Writer};\nuse std::str::FromStr;\n\nfn main() -\u003e Result\u003c(), Error\u003e {\n    // Create a schema from a JSON string\n    let schema = Schema::from_str(r##\"{\"type\":\"string\"}\"##)?;\n    // or from a file path\n    let _schema2 = Schema::from_path(\"./string_schema.avsc\")?;\n    // Create an output stream (any type implementing `std::io::Write`)\n    let stream: Vec\u003cu8\u003e = Vec::new();\n    // Create a writer; it must be mutable in order to write values\n    let mut writer = Writer::new(\u0026schema, stream)?;\n    // Write your data!\n    writer.write(\"Hey\")?;\n    // or use the serialize method for serde-derived types.\n    writer.serialize(\"there!\")?;\n    // into_inner flushes the writer and returns the underlying stream\n    let _stream = writer.into_inner()?;\n\n    Ok(())\n}\n\n```\nFor simple, native Rust types, avrow provides `From` impls to convert them to Avro value types. For compound or user-defined types (structs or enums), one can use the `serialize` method, which relies on serde. 
Alternatively, one can construct `avrow::Value` instances directly; this is more verbose and should be a last resort.\n\n### Reading avro data\n\n```rust\nuse anyhow::Error;\nuse avrow::{Reader, Schema};\nuse std::str::FromStr;\n\nfn main() -\u003e Result\u003c(), Error\u003e {\n    // The reader schema must be compatible with the writer schema stored\n    // in the data file header, which is `\"bytes\"` for the data below.\n    let schema = Schema::from_str(r##\"\"bytes\"\"##)?;\n    // An Avro data file holding a single `bytes` value, compressed with the\n    // `deflate` codec (reading it requires deflate support to be enabled via its feature flag).\n    let data: Vec\u003cu8\u003e = vec![\n        79, 98, 106, 1, 4, 22, 97, 118, 114, 111, 46, 115, 99, 104, 101,\n        109, 97, 32, 123, 34, 116, 121, 112, 101, 34, 58, 34, 98, 121, 116,\n        101, 115, 34, 125, 20, 97, 118, 114, 111, 46, 99, 111, 100, 101,\n        99, 14, 100, 101, 102, 108, 97, 116, 101, 0, 145, 85, 112, 15, 87,\n        201, 208, 26, 183, 148, 48, 236, 212, 250, 38, 208, 2, 18, 227, 97,\n        96, 100, 98, 102, 97, 5, 0, 145, 85, 112, 15, 87, 201, 208, 26,\n        183, 148, 48, 236, 212, 250, 38, 208,\n    ];\n    // Create a Reader with the reader schema\n    let reader = Reader::with_schema(data.as_slice(), \u0026schema)?;\n    // Each item yielded by the reader is a `Result` wrapping a decoded value\n    for i in reader {\n        dbg!(i?);\n    }\n\n    Ok(())\n}\n\n```\n\n### Self-referential recursive schema example\n\n```rust\nuse anyhow::Error;\nuse avrow::{from_value, Codec, Reader, Schema, Writer};\nuse serde::{Deserialize, Serialize};\nuse std::str::FromStr;\n\n#[derive(Debug, Serialize, Deserialize)]\nstruct LongList {\n    value: i64,\n    next: Option\u003cBox\u003cLongList\u003e\u003e,\n}\n\nfn main() -\u003e Result\u003c(), Error\u003e {\n    let schema = r##\"\n        {\n            \"type\": \"record\",\n            \"name\": \"LongList\",\n            \"aliases\": [\"LinkedLongs\"],\n            \"fields\" : [\n              {\"name\": \"value\", \"type\": \"long\"},\n              {\"name\": \"next\", \"type\": [\"null\", \"LongList\"]}\n            ]\n          }\n        \"##;\n\n    let schema = Schema::from_str(schema)?;\n    let mut writer = Writer::with_codec(\u0026schema, vec![], Codec::Null)?;\n\n    let value = LongList {\n        value: 1i64,\n        next: Some(Box::new(LongList {\n            value: 2i64,\n            next: Some(Box::new(LongList {\n                
value: 3i64,\n                next: Some(Box::new(LongList {\n                    value: 4i64,\n                    next: Some(Box::new(LongList {\n                        value: 5i64,\n                        next: None,\n                    })),\n                })),\n            })),\n        })),\n    };\n\n    writer.serialize(value)?;\n\n    // Calling into_inner performs flush internally. Alternatively, one can call flush explicitly.\n    let buf = writer.into_inner()?;\n\n    // read\n    let reader = Reader::with_schema(buf.as_slice(), \u0026schema)?;\n    for i in reader {\n        let a: LongList = from_value(\u0026i)?;\n        dbg!(a);\n    }\n\n    Ok(())\n}\n\n```\n\n### An example of writing a JSON object with a conforming schema. The JSON object maps to the `avrow::Record` type.\n\n```rust\nuse anyhow::Error;\nuse avrow::{from_value, Reader, Record, Schema, Writer};\nuse serde::{Deserialize, Serialize};\nuse std::str::FromStr;\n\n#[derive(Debug, Serialize, Deserialize)]\nstruct Mentees {\n    id: i32,\n    username: String,\n}\n\n#[derive(Debug, Serialize, Deserialize)]\nstruct RustMentors {\n    name: String,\n    github_handle: String,\n    active: bool,\n    mentees: Mentees,\n}\n\nfn main() -\u003e Result\u003c(), Error\u003e {\n    let schema = Schema::from_str(\n        r##\"\n            {\n            \"name\": \"rust_mentors\",\n            \"type\": \"record\",\n            \"fields\": [\n                {\n                \"name\": \"name\",\n                \"type\": \"string\"\n                },\n                {\n                \"name\": \"github_handle\",\n                \"type\": \"string\"\n                },\n                {\n                \"name\": \"active\",\n                \"type\": \"boolean\"\n                },\n                {\n                    \"name\":\"mentees\",\n                    \"type\": {\n                        \"name\":\"mentees\",\n                        \"type\": \"record\",\n                  
      \"fields\": [\n                            {\"name\":\"id\", \"type\": \"int\"},\n                            {\"name\":\"username\", \"type\": \"string\"}\n                        ]\n                    }\n                }\n            ]\n            }\n\"##,\n    )?;\n\n    let json_data = serde_json::from_str(\n        r##\"\n    { \"name\": \"bob\",\n        \"github_handle\":\"ghbob\",\n        \"active\": true,\n        \"mentees\":{\"id\":1, \"username\":\"alice\"} }\"##,\n    )?;\n    // Build an avrow Record from the JSON object and the schema\n    let rec = Record::from_json(json_data, \u0026schema)?;\n    let mut writer = Writer::new(\u0026schema, vec![])?;\n    writer.write(rec)?;\n\n    let avro_data = writer.into_inner()?;\n    // No reader schema is needed: the writer schema is read from the header\n    let reader = Reader::new(avro_data.as_slice())?;\n    for value in reader {\n        let mentors: RustMentors = from_value(\u0026value)?;\n        dbg!(mentors);\n    }\n    Ok(())\n}\n\n```\n\n### Writer customization\n\nFor more control over the parameters of `Writer`, use `WriterBuilder`, as shown below:\n\n```rust\n\nuse anyhow::Error;\nuse avrow::{Codec, Reader, Schema, WriterBuilder};\nuse std::str::FromStr;\n\nfn main() -\u003e Result\u003c(), Error\u003e {\n    let schema = Schema::from_str(r##\"\"null\"\"##)?;\n    let v = vec![];\n    let mut writer = WriterBuilder::new()\n        .set_codec(Codec::Null)\n        .set_schema(\u0026schema)\n        .set_datafile(v)\n        // set any custom metadata in the header\n        .set_metadata(\"hello\", \"world\")\n        // the number of bytes after which the writer flushes\n        .set_flush_interval(128_000)\n        .build()\n        .unwrap();\n    writer.serialize(())?;\n    let v = writer.into_inner()?;\n\n    let reader = Reader::with_schema(v.as_slice(), \u0026schema)?;\n    for i in reader {\n        dbg!(i?);\n    }\n\n    Ok(())\n}\n```\n\nRefer to [examples](./examples) for more code examples.\n\n## Supported Codecs\n\nTo make encoding more efficient, the Avro spec also defines compression codecs to use 
when serializing data.\n\nAvrow supports all compression codecs defined in the spec:\n\n- Null - No compression (the default).\n- [Deflate](https://en.wikipedia.org/wiki/DEFLATE)\n- [Snappy](https://github.com/google/snappy)\n- [Zstd](https://facebook.github.io/zstd/)\n- [Bzip2](https://www.sourceware.org/bzip2/)\n- [Xz](https://linux.die.net/man/1/xz)\n\nAll codecs other than Null are feature-gated behind their respective flags. See the `features` section of `Cargo.toml` for more details.\n\n## Using avrow-cli tool:\n\nQuite often you will need a quick way to examine an Avro file for debugging purposes. For that, this repository also comes with the [`avrow-cli`](./avrow-cli) tool (`av`),\nwhich lets one examine Avro data files from the command line.\n\nSee the [avrow-cli](avrow-cli/) directory for more details.\n\nInstalling avrow-cli from a checkout of this repository:\n\n```bash\ncd avrow-cli\ncargo install --path .\n```\n\nUsing avrow-cli (binary name is `av`):\n\n```bash\nav read -d data.avro\n```\n\nThe `read` subcommand prints all rows in `data.avro` to standard output in debug format.\n\n### Rust native types to Avro value mapping (via Serde)\n\n#### Primitives\n\n| Rust native types (primitive types) | Avro (`Value`) |\n| ----------------------------------- | -------------- |\n| `(), Option::None`                  | `null`         |\n| `bool`                              | `boolean`      |\n| `i8, u8, i16, u16, i32, u32`        | `int`          |\n| `i64, u64`                          | `long`         |\n| `f32`                               | `float`        |\n| `f64`                               | `double`       |\n| `\u0026[u8], Vec\u003cu8\u003e`                    | `bytes`        |\n| `\u0026str, String`                      | `string`       |\n\n#### Complex\n\n| Rust native types (complex types)                    | Avro     |\n| ---------------------------------------------------- | -------- |\n| `struct Foo {..}`                                    | `record` |\n| `enum Foo {A,B}` (variants cannot have data in them) | `enum`   
|\n| `Vec\u003cT\u003e where T: Into\u003cValue\u003e`                        | `array`  |\n| `HashMap\u003cString, T\u003e where T: Into\u003cValue\u003e`            | `map`    |\n| `T where T: Into\u003cValue\u003e`                             | `union`  |\n| `Vec\u003cu8\u003e` : Length equal to size defined in schema   | `fixed`  |\n\n\u003cbr\u003e\n\n## Todo\n\n* [Logical types](https://avro.apache.org/docs/current/spec.html#Logical+Types) support.\n* Sorted reads.\n* Single object encoding.\n* Schema Registry as a trait - would allow avrow to read from and write to remote schema registries.\n* `AsyncRead` + `AsyncWrite` support for `Reader` and `Writer`.\n* Avro protocol message and RPC support.\n* Benchmarks and optimizations.\n\n## Changelog\n\nPlease see the [CHANGELOG](CHANGELOG.md) for a release history.\n\n## Contributions\n\nAll kinds of contributions are welcome.\n\nHead over to [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.\n\n## Support\n\n\u003ca href=\"https://www.buymeacoffee.com/creativcoder\" target=\"_blank\"\u003e\u003cimg src=\"https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png\" alt=\"Buy Me A Coffee\" style=\"height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;\" \u003e\u003c/a\u003e\n\n[![ko-fi](https://www.ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/P5P71YZ0L)\n\n## MSRV\n\nAvrow works on stable Rust 1.37 and later.\nIt does not use any nightly features.\n\n## License\n\nDual licensed under either of \u003ca href=\"LICENSE-APACHE\"\u003eApache License, Version\n2.0\u003c/a\u003e or \u003ca href=\"LICENSE-MIT\"\u003eMIT license\u003c/a\u003e at your option.\n\nUnless you explicitly state otherwise, any contribution intentionally submitted\nfor inclusion in this crate by you, as defined in the Apache-2.0 license, shall\nbe dual licensed as above, without any additional terms or 
conditions.\n","funding_links":["https://liberapay.com/creativcoder","https://www.buymeacoffee.com/creativcoder","https://ko-fi.com/P5P71YZ0L"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcreativcoder%2Favrow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcreativcoder%2Favrow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcreativcoder%2Favrow/lists"}