{"id":13879206,"url":"https://github.com/dasch/avro_turf","last_synced_at":"2025-05-14T15:08:18.195Z","repository":{"id":28156182,"uuid":"31656719","full_name":"dasch/avro_turf","owner":"dasch","description":"A library that makes it easier to use the Avro serialization format from Ruby.","archived":false,"fork":false,"pushed_at":"2025-03-21T12:16:53.000Z","size":316,"stargazers_count":170,"open_issues_count":1,"forks_count":85,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-04-04T23:03:01.070Z","etag":null,"topics":["avro","avro-data","ruby","schema","schema-registry"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dasch.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-03-04T12:48:38.000Z","updated_at":"2025-03-31T21:12:37.000Z","dependencies_parsed_at":"2024-06-18T14:45:54.013Z","dependency_job_id":"77f4516c-c1c6-4fd7-b91e-1feffd95dfd9","html_url":"https://github.com/dasch/avro_turf","commit_stats":{"total_commits":250,"total_committers":55,"mean_commits":4.545454545454546,"dds":0.596,"last_synced_commit":"e01a2a47ca3a976e1298e1cfdd3f5a0945598b39"},"previous_names":[],"tags_count":40,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasch%2Favro_turf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasch%2Favro_turf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasch%2Favro_turf/releases","manifests_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasch%2Favro_turf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dasch","download_url":"https://codeload.github.com/dasch/avro_turf/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248505875,"owners_count":21115354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["avro","avro-data","ruby","schema","schema-registry"],"created_at":"2024-08-06T08:02:13.416Z","updated_at":"2025-04-12T01:51:47.198Z","avatar_url":"https://github.com/dasch.png","language":"Ruby","funding_links":[],"categories":["Ruby"],"sub_categories":[],"readme":"# AvroTurf\n\nAvroTurf is a library that makes it easier to encode and decode data using the [Apache Avro](http://avro.apache.org/) serialization format. 
It adds a layer on top of the official Avro gem, which makes it easier to integrate Avro into your application:\n\n* Provides an idiomatic Ruby interface.\n* Allows referencing schemas defined in another file.\n\n## Deprecation Notice\n\nThe `AvroTurf::SchemaRegistry`, `AvroTurf::CachedSchemaRegistry`,\nand `FakeSchemaRegistryServer` names have been deprecated because the Avro spec recently\nintroduced an incompatible [single-message encoding format](https://github.com/apache/avro/commit/30408a9c192c5f4eaaf42f01f0ffbfffd705aa57).\n\nThese classes have been renamed to `AvroTurf::ConfluentSchemaRegistry`,\n`AvroTurf::CachedConfluentSchemaRegistry`, and `FakeConfluentSchemaRegistryServer`.\n\nThe aliases for the original names will be removed in a future release.\n\n## Note about finding nested schemas\n\nAs of AvroTurf version 1.0.0, only top-level schemas that have their own .avsc file will be loaded and resolvable by the `AvroTurf::SchemaStore#find` method. This change will likely not affect most users. 
However, if you use `AvroTurf::SchemaStore#load_schemas!` to pre-cache all your schemas and then rely on `AvroTurf::SchemaStore#find` to access nested schemas that are not defined by their own .avsc files, your code may stop working when you upgrade to v1.0.0.\n\nAs an example, if you have a `person` schema (defined in `my/schemas/contacts/person.avsc`) that defines a nested `address` schema like this:\n\n```json\n{\n  \"name\": \"person\",\n  \"namespace\": \"contacts\",\n  \"type\": \"record\",\n  \"fields\": [\n    {\n      \"name\": \"address\",\n      \"type\": {\n        \"name\": \"address\",\n        \"type\": \"record\",\n        \"fields\": [\n          { \"name\": \"addr1\", \"type\": \"string\" },\n          { \"name\": \"addr2\", \"type\": \"string\" },\n          { \"name\": \"city\", \"type\": \"string\" },\n          { \"name\": \"zip\", \"type\": \"string\" }\n        ]\n      }\n    }\n  ]\n}\n```\n...this will no longer work in v1.0.0:\n```ruby\nstore = AvroTurf::SchemaStore.new(path: 'my/schemas')\nstore.load_schemas!\n\n# Accessing 'person' is correct and works fine.\nperson = store.find('person', 'contacts') # my/schemas/contacts/person.avsc exists\n\n# Trying to access 'address' raises AvroTurf::SchemaNotFoundError\naddress = store.find('address', 'contacts') # my/schemas/contacts/address.avsc is not found\n```\n\nFor details and context, see [this pull request](https://github.com/dasch/avro_turf/pull/111).\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'avro_turf'\n```\n\nAnd then execute:\n\n    $ bundle\n\nOr install it yourself as:\n\n    $ gem install avro_turf\n\n## Usage\n\nUsing AvroTurf is quite simple:\n\n```ruby\n# Schemas will be looked up from the specified directory.\navro = AvroTurf.new(schemas_path: \"app/schemas/\")\n\n# Decode some data using a named schema. 
The schema file should exist in the\n# schemas directory with the file name `\u003cname\u003e.avsc`.\navro.decode(encoded_data, schema_name: \"person\")\n\n# Encode some data using the named schema.\navro.encode({ \"name\" =\u003e \"Jane\", \"age\" =\u003e 28 }, schema_name: \"person\")\n\n# Pass validate: true to validate data before encoding. Invalid data raises an\n# Avro::SchemaValidator::ValidationError that describes the problem.\navro.encode({ \"titl\" =\u003e \"hello, world\" }, schema_name: \"person\", validate: true)\n\n# If you do not want the schema to be registered when it does not already exist, pass register_schemas: false.\navro.encode({ \"name\" =\u003e \"Jane\", \"age\" =\u003e 28 }, schema_name: \"person\", register_schemas: false)\n```\n\n### Inter-schema references\n\nUnlike the official Avro library, AvroTurf allows schemas to reference each other. As an example:\n\n```json\n// person.avsc\n{\n  \"name\": \"person\",\n  \"type\": \"record\",\n  \"fields\": [\n    {\n      \"name\": \"full_name\",\n      \"type\": \"string\"\n    },\n    {\n      \"name\": \"address\",\n      \"type\": \"address\"\n    }\n  ]\n}\n\n// address.avsc\n{\n  \"name\": \"address\",\n  \"type\": \"record\",\n  \"fields\": [\n    {\n      \"name\": \"street\",\n      \"type\": \"string\"\n    },\n    {\n      \"name\": \"city\",\n      \"type\": \"string\"\n    }\n  ]\n}\n```\n\nIn the example above, the `person` schema references the `address` schema, even though the latter is defined in another file. 
This makes it possible to share types across schemas, e.g.\n\n```json\n// person_list.avsc\n{\n  \"type\": \"array\",\n  \"items\": \"person\"\n}\n```\n\nThere's no reason to copy-paste the `person` schema into the `person_list` schema, as you can reference it directly.\n\nThis feature helps avoid subtle errors when the same type is represented using slightly different schemas.\n\n\n### Using a Schema Registry\n\nBy default, AvroTurf will encode data in the Avro data file format. This means that the schema used to encode the data is prepended to the output. If you want to decrease the size of the output, e.g. when storing data in a log such as Apache Kafka or in a database, you can use the `AvroTurf::Messaging` API. This top-level API requires the use of [Schema Registry](https://github.com/confluentinc/schema-registry), a service which allows registering and fetching Avro schemas.\n\nThe Messaging API will automatically register schemas used for encoding data, and will fetch the corresponding schema when decoding. Instead of including the full schema in the output, only a schema id generated by the registry is included. 
Registering the same schema twice is idempotent, so no coordination is needed.\n\nAn optional `schema_context` parameter allows the registry to be scoped to a\n[schema context](https://docs.confluent.io/platform/7.5/schema-registry/schema-linking-cp.html#schema-contexts).\nIf there is a need to access multiple contexts, you will need to use multiple instances of `ConfluentSchemaRegistry`.\n\n**NOTE:** [The Messaging format](https://github.com/confluentinc/schema-registry/blob/master/docs/serializer-formatter.rst#wire-format) is _not_ compatible with the Avro data file API.\n\nThe Messaging API is not included by default, so you must require 'avro_turf/messaging' explicitly if you want to use it.\n\nUsing the Messaging API is simple once you have set up a Schema Registry service:\n\n```ruby\nrequire 'avro_turf/messaging'\n\n# You need to pass the URL of your Schema Registry.\navro = AvroTurf::Messaging.new(registry_url: \"http://my-registry:8081/\")\n\n# The API for encoding and decoding data is similar to the default one. Encoding\n# data has the side effect of registering the schema. This only happens the first\n# time a schema is used.\ndata = avro.encode({ \"title\" =\u003e \"hello, world\" }, schema_name: \"greeting\")\n\n# If you don't want to automatically register new schemas, you can pass explicitly\n# both subject and version to specify which schema should be used for encoding.\n# It will fetch that schema from the registry and cache it. 
Subsequent instances\n# of the same schema version will be served by the cache.\ndata = avro.encode({ \"title\" =\u003e \"hello, world\" }, subject: 'greeting', version: 1)\n\n# If you want to use a specific local schema but register it under a different name in the\n# registry, provide a subject and a schema_name, but not a version.\ndata = avro.encode({ \"title\" =\u003e \"hello, world\" }, subject: \"greeting-value\", schema_name: \"greeting\")\n\n# You can also explicitly pass a schema_id to specify which schema\n# should be used for encoding.\n# It will fetch that schema from the registry and cache it. Subsequent instances\n# of the same schema id will be served by the cache.\ndata = avro.encode({ \"title\" =\u003e \"hello, world\" }, schema_id: 2)\n\n# Pass validate: true to validate a message before encoding. An invalid message raises an\n# Avro::SchemaValidator::ValidationError that describes the problem.\ndata = avro.encode({ \"titl\" =\u003e \"hello, world\" }, schema_name: \"greeting\", validate: true)\n\n# When decoding, the schema will be fetched from the registry and cached. 
Subsequent\n# instances of the same schema id will be served by the cache.\navro.decode(data) #=\u003e { \"title\" =\u003e \"hello, world\" }\n\n# If you want to get decoded message as well as the schema used to encode the message,\n# you can use `#decode_message` method.\nresult = avro.decode_message(data)\nresult.message       #=\u003e { \"title\" =\u003e \"hello, world\" }\nresult.schema_id     #=\u003e 3\nresult.writer_schema #=\u003e #\u003cAvro::Schema: ...\u003e\nresult.reader_schema #=\u003e nil\n\n# You can also work with schema through this interface:\n# Fetch latest schema for subject from registry\nschema, schema_id = avro.fetch_schema(subject: 'greeting')\n# Fetch specific version\nschema, schema_id = avro.fetch_schema(subject: 'greeting', version: 1)\n# Fetch schema by id\nschema, schema_id = avro.fetch_schema_by_id(3)\n# Register schema fetched from store by name\nschema, schema_id = avro.register_schema(schema_name: 'greeting')\n# Specify namespace (same as schema_name: 'somewhere.greeting')\nschema, schema_id = avro.register_schema(schema_name: 'greeting', namespace: 'somewhere')\n# Customize subject under which to register schema\nschema, schema_id = avro.register_schema(schema_name: 'greeting', namespace: 'somewhere', subject: 'test')\n```\n\n### Confluent Schema Registry Client\n\nThe ConfluentSchemaRegistry client used by the Messaging API can also be used directly.\nIt can check whether a schema is compatible with a subject in the registry using the [Compatibility API](http://docs.confluent.io/3.1.2/schema-registry/docs/api.html#compatibility):\n\n```ruby\nrequire 'avro_turf'\nrequire 'avro_turf/confluent_schema_registry'\n\nschema = \u003c\u003c-JSON\n{\n  \"name\": \"person\",\n  \"type\": \"record\",\n  \"fields\": [\n    {\n      \"name\": \"full_name\",\n      \"type\": \"string\"\n    },\n    {\n      \"name\": \"address\",\n      \"type\": \"address\"\n    }\n  ]\n}\nJSON\n\nregistry = 
AvroTurf::ConfluentSchemaRegistry.new(\"http://my-registry:8081/\")\n\n# Returns true if the schema is compatible, nil if the subject or version is not registered, and false if incompatible.\nregistry.compatible?(\"person\", schema)\n\n# Returns an array of any breaking changes, or nil if the subject or version is not registered.\nregistry.compatibility_issues(\"person\", schema)\n```\n\nThe ConfluentSchemaRegistry client can also change the global compatibility level or the compatibility level for an individual subject using the [Config API](http://docs.confluent.io/3.1.2/schema-registry/docs/api.html#config):\n\n```ruby\nregistry.update_global_config(compatibility: 'FULL')\nregistry.update_subject_config(\"person\", compatibility: 'NONE')\n```\n\n### Testing Support\n\nAvroTurf includes a `FakeConfluentSchemaRegistryServer` that can be used in tests. The\nfake schema registry server depends on Sinatra but it is _not_ listed as a runtime\ndependency for AvroTurf. Sinatra must be added to your Gemfile or gemspec in order\nto use the fake server.\n\nThe `sinatra` fix for [CVE-2024-21510](https://github.com/advisories/GHSA-hxx2-7vcw-mqr3) introduced a new `HostAuthorization` middleware. For this reason, `FakeConfluentSchemaRegistryServer` is provided as a base implementation: you must subclass it and configure the permitted hosts so that requests to the test registry host are properly authorised.\n\nExample using RSpec:\n\n```ruby\nrequire 'avro_turf/test/fake_confluent_schema_registry_server'\nrequire 'webmock/rspec'\n\nclass AuthorizedFakeConfluentSchemaRegistryServer \u003c FakeConfluentSchemaRegistryServer\n  # Sinatra's setting for the HostAuthorization middleware is :host_authorization.\n  set :host_authorization, permitted_hosts: ['registry.example.com']\nend\n\n# within an example\nlet(:registry_url) { \"http://registry.example.com\" }\nbefore do\n  stub_request(:any, /^#{registry_url}/).to_rack(AuthorizedFakeConfluentSchemaRegistryServer)\n  AuthorizedFakeConfluentSchemaRegistryServer.clear\nend\n\n# Messaging objects 
created with the same registry_url will now use the fake server.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdasch%2Favro_turf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdasch%2Favro_turf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdasch%2Favro_turf/lists"}