{"id":16949490,"url":"https://github.com/tailhook/probor","last_synced_at":"2025-10-19T06:24:58.357Z","repository":{"id":36034176,"uuid":"40330826","full_name":"tailhook/probor","owner":"tailhook","description":"A protocol on top of CBOR that provides protobuf-like functionality","archived":false,"fork":false,"pushed_at":"2018-05-18T10:28:36.000Z","size":8089,"stargazers_count":53,"open_issues_count":2,"forks_count":2,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-13T22:09:15.821Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tailhook.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-08-06T23:08:13.000Z","updated_at":"2025-02-07T10:33:49.000Z","dependencies_parsed_at":"2022-08-26T04:11:59.879Z","dependency_job_id":null,"html_url":"https://github.com/tailhook/probor","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tailhook%2Fprobor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tailhook%2Fprobor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tailhook%2Fprobor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tailhook%2Fprobor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tailhook","download_url":"https://codeload.github.com/tailhook/probor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243852484,
"owners_count":20358271,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T21:54:56.794Z","updated_at":"2025-10-19T06:24:53.338Z","avatar_url":"https://github.com/tailhook.png","language":"Rust","readme":"======\nProbor\n======\n\n\n:Status: Proof of Concept\n:Rust docs: http://tailhook.github.io/probor/\n\nProbor is an extensible mechanism for serializing structured data on top of\nCBOR_.\n\nIn addition to CBOR_, probor provides the following:\n\n1. A library to efficiently read data into language-native structures\n\n2. A schema definition language that serves as documentation for\n   interoperability between systems\n\n3. A more compact protocol which omits field names for objects\n\n4. Conventions for keeping schemas backwards compatible\n\n\nWhy?\n====\n\nWe like CBOR_ for the following reasons:\n\n1. It's an IETF standard\n\n2. It's self-descriptive\n\n3. It's compact enough\n\n4. It's extensible_, including mind-boggling_ things\n\n5. It has implementations in most languages\n\n.. _extensible: http://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml\n.. _mind-boggling: https://github.com/paroga/cbor-js/issues/3\n\nWhat we lack in CBOR_:\n\n1. No schema definition, i.e. no way to check interoperability between\n   systems\n\n2. Transmitting/storing lots of objects is expensive because keys are encoded\n   for every object\n\n3. No standard way to cast an \"object\" (i.e. a map or a dict) into a native\n   typed object\n\n\nComparison\n==========\n\nThis section roughly compares similar projects to support our arguments in the\n\"Why?\" section. 
Individual arguments may not be very convincing, but taken together they are\nreasonable enough.\n\n\nProbor vs Protobuf\n------------------\n\nProtobuf_ can't parse data if the schema is not known. *Probor* data is not\nalways totally readable, but at least you can unpack it using a generic *CBOR*\ndecoder and look at the raw values (possibly without key names).\n\nAnd it's hard not only when the schema is unknown, but also when you have a\nschema but no code generated from it. For example, if you have a Java\napplication but want to inspect its data in Python, you need a Python code\ngenerator and must generate code before you can read anything with *protobuf*.\n\n*Probor* also has a debugging (non-compact) mode in which it may encode object\nfields and enums by name, so you can easily understand the values. You can also\nkeep key names for most objects except those that are transmitted in large\nquantities, because the compact and non-compact formats are compatible. You are\nin control.\n\nThe types that *Protobuf* generates are not native, so they are larger and\nharder to work with. Because the code is generated, you usually can't add\nmethods to the object itself without subtle hacks. *Probor* tries to provide a\nthin layer around native objects.\n\nAlso, working with code generation is inconvenient. *Protobuf* has a code\ngenerator written in C++, which you need to have installed. Moreover, you often\nneed a separate protobuf code generator for every language. *Probor* works\nwithout code generation for all currently supported languages by providing\nsimple-to-use macros and/or annotations for native types. We may provide code\ngeneration facilities for bootstrapping code too, but they should be written\npurely in the language they generate.\n\nOn the upside, *Protobuf* can deserialize an object and serialize it again\nwithout losing any information (even fields that are not in its version of the\nprotocol). 
The current *probor* libraries don't implement this for efficiency reasons, but\nit can be done with appropriate libraries.\n\n.. _Protobuf: https://github.com/google/protobuf\n\nProbor vs Avro\n--------------\n\nAvro_ needs the schema to be transported \"in-band\", i.e. as a prefix to the\ndata sent. We find this redundant.\n\nAlso, *Avro* types are somewhat dated, from the C era. We wanted modern\nalgebraic types like those in Rust or Haskell.\n\nAlso, the *Avro* file format is not an IETF spec and does not have interesting\nextensions like the ones CBOR_ `has`__.\n\n__ mind-boggling_\n\n.. _avro: https://avro.apache.org/\n\n\nProbor vs Thrift\n----------------\n\n*Thrift* doesn't have a good description of its binary format (in fact it has\ntwo, and neither is documented in any sensible way), unlike CBOR_, which is an\nIETF standard. So the data is hard to read without having code generated in\nadvance.\n\n*Thrift* also has an ugly union type from the 1990s, compared to the nice\nalgebraic types which we want to use in 2015.\n\n*Thrift* relies on code generation for parsing data, which we don't like\nbecause it makes programs hard to build and hard to integrate with native\ntypes (e.g. adding a method to a generated type).\n\nAlso, *Thrift* bindings usually include some implementation of *services*,\nwhich is usually cruft, because each language has too many ways of dealing\nwith the network for the *Thrift* authors to implement all of them.\nFurthermore, *Thrift* has a long history of generating code that is not\nnetwork-IO agnostic.\n\n.. _thrift: http://thrift.apache.org/\n\n\nProbor vs Capnproto\n-------------------\n\n*Capnproto* has an ugly and complex serialization format, which is useful for\nmapping values directly into memory without decoding. But it's more complex to\nimplement correctly than what we aim for. 
We also wanted a compact encoding,\nwhich *Capnproto* has, but it's built on top of an already hard-to-understand\nencoding and complicates things even more.\n\nLike the others, *Capnproto* relies on code generation, with ugly protocol\nobjects as the result of decoding, but we wanted native types.\n\n.. _capnproto: https://capnproto.org/\n\n\nLook-a-Like\n===========\n\nFor example, here is a schema::\n\n    struct SearchResults {\n        total_results @0 :int,\n        results @1 :array Page,\n    }\n    struct Page {\n        url @0 :text,\n        title @1 :text,\n        snippet @2 :optional text,\n    }\n\nNote the following things:\n\n* We use generic type names like int (integer), not fixed-width ones (see FAQ)\n* We give each field a number, similar to the ones used in other IDLs (like\n  protobuf, thrift or capnproto)\n\nSerialized with probor, the structure will look like this (displayed as JSON\nfor clarity; in fact you will see exactly this data if you decode the CBOR and\nre-encode it as JSON):\n\n.. code-block:: json\n\n   [1100, [\n        [\"http://example.com\", \"Example Com\"],\n        [\"http://example.org\", \"Example Org\", \"Example organization\"]]]\n\nWhen unpacked, it looks more like this (in JavaScript):\n\n.. code-block:: javascript\n\n   new SearchResults({\"total_results\": 1100,\n                      \"results\": [new Page({\"url\": \"http://example.com\",\n                                            \"title\": \"Example Com\"}),\n                                  new Page({\"url\": \"http://example.org\",\n                                            \"title\": \"Example Org\",\n                                            \"snippet\": \"Example organization\"})]})\n\nThe object can also be serialized like this:\n\n.. 
code-block:: json\n\n   {\"total_results\": 1100,\n    \"results\": [{\"url\": \"http://example.com\",\n                 \"title\": \"Example Com\"},\n                {\"url\": \"http://example.org\",\n                 \"title\": \"Example Org\",\n                 \"snippet\": \"Example organization\"}]}\n\nThis is also a **totally valid** serialized representation, i.e. you can store\nfields either by name or by number. This is occasionally useful for ad-hoc\nrequests, or you may want to receive non-compact data from a frontend, then\nvalidate it and push it in the more compact format for storage.\n\nIn Python, the protocol declarations look like this:\n\n.. code-block:: python\n\n    from probor import struct\n\n    class Page(object):\n\n        def __init__(self, url, title, snippet=None):\n            self.url = url\n            self.title = title\n            self.snippet = snippet\n\n        probor_protocol = struct(\n            required={(0, \"url\"): str, (1, \"title\"): str},\n            optional={(2, \"snippet\"): str})\n\n    class SearchResults(object):\n\n        def __init__(self, total_results, results):\n            self.total_results = total_results\n            self.results = results\n\n        probor_protocol = struct(\n            required={(0, \"total_results\"): int, (1, \"results\"): Page})\n\n\nTODO: isn't the syntax ugly? Should it be more imperative? Is setstate/getstate\nused?\n\n.. note:: It's easy to build a more declarative layer on top of this protocol.\n   For example, for an ORM model you might reuse field names and types. But the\n   important property to keep in mind is that you should not rely on field\n   order for numbering fields: **numbers must be explicit**, otherwise\n   removing a field would silently shift the numbers of the fields after it.\n\n   Apart from that, integrating probor data types with model and/or validation\n   code is encouraged. And that's actually the reason we don't provide a\n   nicer syntax for these low-level declarations.\n\n\nSimilarly, in Rust it looks like this:\n\n.. 
code-block:: rust\n\n    #[macro_use] extern crate probor;\n\n    use probor::{Encoder, Encodable};\n    use probor::{Decoder, Config, decode};\n    use std::io::Cursor;\n\n    probor_struct!(\n    #[derive(PartialEq, Eq, Debug)]\n    struct Page {\n        url: String =\u003e (#0),\n        title: String =\u003e (#1),\n        snippet: Option\u003cString\u003e =\u003e (#2 optional),\n    });\n\n    probor_struct!(\n    #[derive(PartialEq, Eq, Debug)]\n    struct SearchResults {\n        total_results: u64 =\u003e (#0),\n        results: Vec\u003cPage\u003e =\u003e (#1),\n    });\n\n\n    fn main() {\n        // Encode into an in-memory buffer\n        let buf = Vec::new();\n        let mut enc = Encoder::new(buf);\n        SearchResults {\n            total_results: 112,\n            results: vec![Page {\n                url: \"http://url1.example.com\".to_string(),\n                title: \"One example\".to_string(),\n                snippet: None,\n            }, Page {\n                url: \"http://url2.example.com\".to_string(),\n                title: \"Two example\".to_string(),\n                snippet: Some(\"Example Two\".to_string()),\n            }],\n        }.encode(\u0026mut enc).unwrap();\n        // Decode the encoded bytes back into a native struct\n        let sr: SearchResults = decode(\n            \u0026mut Decoder::new(Config::default(), Cursor::new(enc.into_writer())))\n            .unwrap();\n        println!(\"Results {:?}\", sr);\n    }\n\nThe Rust example is a bit longer, which is bearable for Rust. It's heavily\nbased on macros, which may seem similar to code generation. Still, we find it\nbetter, because you are in control of at least the following things:\n\n1. The specific types used (e.g. ``u64`` for ``int``)\n2. The structure definition (you may use attributes including ``derive`` and\n   ``repr``, and may use ``struct T(X, Y)``)\n3. How objects are created (e.g. use ``VecDeque`` or ``BTreeMap`` instead of\n   the default ``Vec`` and ``HashMap``)\n4. How missing fields are handled (e.g. 
you can provide defaults for missing\n   fields instead of using ``Option\u003cT\u003e``)\n5. Application-specific validation code\n\n.. note:: Leaving the parentheses empty (i.e. omitting the field number) results\n   in the field names being stored as strings in the payload. This undermines\n   the goal of reducing the byte count of stored data, and in such cases one\n   may as well use CBOR directly.\n\nAt the end of the day, writing a parser explicitly with a few helper macros\nlooks like a much better idea than adding all this data as meta information to\nthe schema file.\n\n\nType System\n===========\n\nStructures\n----------\n\nTBD\n\nAlgebraic Types\n---------------\n\nTBD\n\nIn Unsupported Languages\n````````````````````````\n\nIn languages that don't support algebraic types, they are implemented by tying\ntogether a few normal types. E.g. the following type in Rust:\n\n.. code-block:: rust\n\n    enum HtmlElement {\n        Tag(String, Vec\u003cHtmlElement\u003e),\n        Text(String),\n    }\n\nIs encoded like this in Python:\n\n.. code-block:: python\n\n    from probor import enum\n\n    class HtmlElement:\n        \"\"\"Base class\"\"\"\n\n    class Tag(HtmlElement):\n\n        def __init__(self, tag_name, children):\n            self.tag_name = tag_name\n            self.children = children\n\n        probor_protocol = ...\n\n    class Text(HtmlElement):\n\n        def __init__(self, text):\n            self.text = text\n\n        probor_protocol = ...\n\n    HtmlElement.probor_protocol = enum({\n        (0, 'Tag'): Tag,\n        (1, 'Text'): Text,\n    })\n\nThen you can do pattern-matching-like things using\n``functools.singledispatch`` (Python 3.4+) or just ``isinstance``.\n\n.. note:: The PureScript compiler represents such types similarly. It's\n   unchecked, but I believe probor's serialization into JavaScript should be\n   compatible with PureScript types.\n\n\nForward/Backward Compatibility\n==============================\n\nCompared with protobuf, the probor serializer always treats all fields as\noptional. 
Required fields exist only in the IDL, so if your future type is smart\nenough to\n\nBackwards compatibility is very similar to protobuf's.\n\nTBD: exact rules for backward compatibility\n\nTBD: exact rules for forward compatibility\n\nTBD: turning a structure into an algebraic type while keeping compatibility\n\n\nFAQ\n===\n\n\nWhy Use Generic Types?\n----------------------\n\nThere are a couple of reasons:\n\n1. Different languages have different types, e.g. Python has only a generic\n   integer type, and Java does not have unsigned integer types\n\n2. Fixed-width types are not a good constraint anyway: valid values often have\n   a much smaller range than that of the type, so the type is not a replacement\n   for data validation\n\n\nWhy No Default Values?\n----------------------\n\nThere are a couple of reasons:\n\n1. A default value is a user-interface feature, and every service might want\n   to use its own default value.\n\n2. Whether a value that equals the default may be omitted when serializing is\n   very application-specific. And we want to use the language's native\n   structures without any additional bookkeeping of whether a value is the\n   default or just happens to equal it.\n\n\n.. _CBOR: http://cbor.io/\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftailhook%2Fprobor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftailhook%2Fprobor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftailhook%2Fprobor/lists"}