{"id":21340026,"url":"https://github.com/imandra-ai/cbor-pack","last_synced_at":"2025-03-16T02:41:29.178Z","repository":{"id":165702942,"uuid":"611900515","full_name":"imandra-ai/cbor-pack","owner":"imandra-ai","description":"OCaml library + ppx for CBOR-pack: a serialization layer with sharing on top of CBOR","archived":false,"fork":false,"pushed_at":"2024-01-05T00:07:59.000Z","size":716,"stargazers_count":3,"open_issues_count":2,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-01-22T15:26:29.528Z","etag":null,"topics":["cbor","ocaml","serialization","serialization-format"],"latest_commit_sha":null,"homepage":"http://docs.imandra.ai/cbor-pack/","language":"OCaml","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/imandra-ai.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-03-09T19:19:36.000Z","updated_at":"2024-08-26T08:32:51.000Z","dependencies_parsed_at":"2024-01-04T20:53:43.574Z","dependency_job_id":null,"html_url":"https://github.com/imandra-ai/cbor-pack","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imandra-ai%2Fcbor-pack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imandra-ai%2Fcbor-pack/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imandra-ai%2Fcbor-pack/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imandra-ai%2Fcbor-pack/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/im
andra-ai","download_url":"https://codeload.github.com/imandra-ai/cbor-pack/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243817335,"owners_count":20352541,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cbor","ocaml","serialization","serialization-format"],"created_at":"2024-11-22T00:48:49.532Z","updated_at":"2025-03-16T02:41:29.156Z","avatar_url":"https://github.com/imandra-ai.png","language":"OCaml","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cbor-pack\n\n[![build](https://github.com/imandra-ai/cbor-pack/actions/workflows/main.yml/badge.svg)](https://github.com/imandra-ai/cbor-pack/actions/workflows/main.yml)\n\nThis is a serialization format that adds some structure on top of [CBOR](https://cbor.io).\n\nThe additional layer is used to encode sharing in a reasonably efficient way, by\nemulating a small heap in the CBOR structure.\n\nA cbor-pack value (or \"pack\"), on the wire, is the CBOR representation of:\n\n```rust\nstruct Pack {\n  h: Vec\u003cCbor\u003e,\n  k: Cbor\n}\n```\n\nwhich can be understood as:\n- `k` is the key, the entrypoint. It's the actual value.\n- `h` is the heap, used to map _pointers_ to CBOR values.\n\nA pointer, in this context, is a tagged integer `6(n)`, where `n` is the offset\nof the actual value in the array `h`. 
Any value, including `k`, can contain such\npointers (since they are CBOR values), and deserialization will follow the\npointers automatically.\n\n## ppx\n\nHow to use the ppx:\n\nIn dune:\n```\n(library\n  (name foo)\n  …\n  (libraries … cbor-pack)\n  (preprocess (pps cbor-pack-ppx)))\n```\n\nand in the code:\n\n```ocaml\n# #require \"cbor-pack\";;\n# #require \"cbor-pack-ppx\";;\n```\n\n```ocaml\n# type foo = {\n    x: int;\n    y: (string [@as_bytes]);\n  } [@@deriving cbpack];;\ntype foo = { x : int; y : string; }\nval foo_to_cbpack : Cbor_pack.Ser.state -\u003e foo -\u003e Cbor_pack.cbor = \u003cfun\u003e\nval foo_of_cbpack : Cbor_pack.Deser.state -\u003e Cbor_pack.cbor -\u003e foo = \u003cfun\u003e\n```\n\nwhich creates two functions for serializing and deserializing.\n\n### Example: a record\n\nFor example, in `tests/t1.ml`, we can serialize and deserialize:\n\n```ocaml\n# type foo = {\n    a: int;\n    b: float;\n  } [@@deriving cbpack] ;;\ntype foo = { a : int; b : float; }\nval foo_to_cbpack : Cbor_pack.Ser.state -\u003e foo -\u003e Cbor_pack.cbor = \u003cfun\u003e\nval foo_of_cbpack : Cbor_pack.Deser.state -\u003e Cbor_pack.cbor -\u003e foo = \u003cfun\u003e\n```\n\nThen we can encode a value:\n\n```ocaml\n# let my_foo = { a = 1; b = 2.0 };;\nval my_foo : foo = {a = 1; b = 2.}\n# let c = Cbor_pack.to_cbor foo_to_cbpack my_foo;;\nval c : Cbor_pack.cbor =\n  `Map\n    [(`Text \"k\", `Tag (6, `Int 0));\n     (`Text \"h\", `Array [`Map [(`Int 0, `Int 1); (`Int 1, `Float 2.)]])]\n\n# CBOR.Simple.to_diagnostic c |\u003e print_endline;;\n{\"k\": 6(0), \"h\": [{0: 1, 1: 2.}]}\n- : unit = ()\n\n# let s = Cbor_pack.to_string foo_to_cbpack my_foo;;\n...\n# String.length s;;\n- : int = 21\n```\n\nand deserialize it again:\n\n```ocaml\n# let foo2 = Cbor_pack.of_string_exn foo_of_cbpack s;;\nval foo2 : foo = {a = 1; b = 2.}\n\n# my_foo = foo2;;\n- : bool = true\n```\n\n### Hashconsing\n\nHashconsing is sharing done on the heap itself. 
If the same CBOR value `c` is\nadded twice to the heap with the hashconsing option enabled, the second\noccurrence will not be added but will reuse a pointer to the first entry.\n\nThis has a cost at runtime (hashtable lookups), but can result in a\nsignificantly smaller pack at the end. Hashconsing is generic and can work on\nany type because it operates entirely on serialized values.\n\n### Caching during serialization\n\nThe serializer can cache previously encoded values for types that are comparable and hashable:\n\n```ocaml\ntype foo = {\n  x: int;\n  y: bool\n} [@@deriving cbpack]\n\n(* used inside the cache *)\nmodule Foo = struct\n  type t = foo\n  let equal a b = a.x=b.x \u0026\u0026 a.y=b.y\n  let hash = Hashtbl.hash\nend\n\nlet key_cache_ser_foo = Cbor_pack.Ser.create_cache_key (module Foo) ;;\n\nlet foo_to_cbpack_cached: foo Cbor_pack.Ser.t =\n    Cbor_pack.Ser.with_cache key_cache_ser_foo foo_to_cbpack;;\n```\n\n(Note how `foo_to_cbpack_cached` needs both a key and the uncached serializer, which is used\nfor values that aren't already in the cache).\n\nNow we can encode values and introduce sharing in a way that is more efficient at runtime\nthan using hashconsing (values already encoded are not re-encoded at all).\n\n```ocaml\n# let l =\n    let f1: foo = {x=1; y=true} in\n    let f2: foo = {x=2; y=false} in\n    [f1; f2; f1; f2; f1; f2; f2; f1];;\nval l : foo list =\n  [{x = 1; y = true}; {x = 2; y = false}; {x = 1; y = true};\n   {x = 2; y = false}; {x = 1; y = true}; {x = 2; y = false};\n   {x = 2; y = false}; {x = 1; y = true}]\n\n\n# Cbor_pack.to_cbor Cbor_pack.Ser.(list_of foo_to_cbpack_cached) l;;\n- : Cbor_pack.cbor =\n`Map\n  [(`Text \"k\",\n    `Array\n      [`Tag (6, `Int 0); `Tag (6, `Int 1); `Tag (6, `Int 0);\n       `Tag (6, `Int 1); `Tag (6, `Int 0); `Tag (6, `Int 1);\n       `Tag (6, `Int 1); `Tag (6, `Int 0)]);\n   (`Text \"h\",\n    `Array\n      [`Map [(`Int 0, `Int 1); (`Int 1, `Bool true)];\n       `Map [(`Int 0, `Int 2); (`Int 1, 
`Bool false)]])]\n\n# Cbor_pack.to_string Cbor_pack.Ser.(list_of foo_to_cbpack_cached) l |\u003e String.length;;\n- : int = 33\n```\n\nNote that without caching we get a bigger value:\n\n```ocaml\n\n# Cbor_pack.to_string Cbor_pack.Ser.(list_of foo_to_cbpack) l |\u003e String.length;;\n- : int = 63\n```\n\n### Caching during deserialization\n\nSimilarly, during deserialization, some values might be referenced many times\nusing _pointers_ into the cbor-pack heap. Ideally we want to decode\neach value only once, and cache the decoded value.\n\nTo do that there is `Cbor_pack.Deser.create_cache_key`. Let's reuse the example from serialization caching:\n\n```ocaml\ntype nonrec foo = foo = {\n  x: int;\n  y: bool\n}\n\nlet key_foo_deser: foo Cbor_pack.Deser.cache_key = Cbor_pack.Deser.create_cache_key()\n\n(* cached deserializer *)\nlet foo_of_cbpack_cached = Cbor_pack.Deser.with_cache key_foo_deser foo_of_cbpack;;\n```\n\n```ocaml\n# let encoded_foo_list = Cbor_pack.to_cbor Cbor_pack.Ser.(list_of foo_to_cbpack_cached) l;;\nval encoded_foo_list : Cbor_pack.cbor =\n  `Map\n    [(`Text \"k\",\n      `Array\n        [`Tag (6, `Int 0); `Tag (6, `Int 1); `Tag (6, `Int 0);\n         `Tag (6, `Int 1); `Tag (6, `Int 0); `Tag (6, `Int 1);\n         `Tag (6, `Int 1); `Tag (6, `Int 0)]);\n     (`Text \"h\",\n      `Array\n        [`Map [(`Int 0, `Int 1); (`Int 1, `Bool true)];\n         `Map [(`Int 0, `Int 2); (`Int 1, `Bool false)]])]\n\n# let l = Cbor_pack.of_cbor_exn Cbor_pack.Deser.(to_list_of foo_of_cbpack_cached) encoded_foo_list;;\nval l : foo list =\n  [{x = 1; y = true}; {x = 2; y = false}; {x = 1; y = true};\n   {x = 2; y = false}; {x = 1; y = true}; {x = 2; y = false};\n   {x = 2; y = false}; {x = 1; y = true}]\n```\n\n### Example: a tree\n\n```ocaml\ntype tree =\n  | Nil\n  | Node of int * tree * tree\n  [@@deriving cbpack] [@@hashcons];;\n```\n\n\n```ocaml\n# let t =\n    let t2 = Node (2, Nil, Nil) in\n    let t3 = Node (3, t2, t2) in\n    let t4 = Node (4, t3, t2) 
in\n    Node (1, t4, t4);;\nval t : tree =\n  Node (1,\n   Node (4, Node (3, Node (2, Nil, Nil), Node (2, Nil, Nil)),\n    Node (2, Nil, Nil)),\n   Node (4, Node (3, Node (2, Nil, Nil), Node (2, Nil, Nil)),\n    Node (2, Nil, Nil)))\n```\n\nNote that serializing this (quite redundant) tree into CBOR would produce\na similarly-shaped tree. Here, instead, we obtain this:\n\n```ocaml\n# Cbor_pack.to_cbor tree_to_cbpack t;;\n- : Cbor_pack.cbor =\n`Map\n  [(`Text \"k\", `Tag (6, `Int 3));\n   (`Text \"h\",\n    `Array\n      [`Array [`Int 1; `Int 2; `Int 0; `Int 0];\n       `Array [`Int 1; `Int 3; `Tag (6, `Int 0); `Tag (6, `Int 0)];\n       `Array [`Int 1; `Int 4; `Tag (6, `Int 1); `Tag (6, `Int 0)];\n       `Array [`Int 1; `Int 1; `Tag (6, `Int 2); `Tag (6, `Int 2)]])]\n\n# String.length (Cbor_pack.to_string tree_to_cbpack t);;\n- : int = 34\n```\n\nWithout hashconsing we'd have:\n\n```ocaml\ntype tree2 =\n  | Nil\n  | Node of int * tree2 * tree2\n  [@@deriving cbpack];;\n\nlet t: tree2 =\n    let t2 = Node (2, Nil, Nil) in\n    let t3 = Node (3, t2, t2) in\n    let t4 = Node (4, t3, t2) in\n    Node (1, t4, t4);;\n```\n\n```ocaml\n# Cbor_pack.to_cbor tree2_to_cbpack t;;\n- : Cbor_pack.cbor =\n`Map\n  [(`Text \"k\", `Tag (6, `Int 10));\n   (`Text \"h\",\n    `Array\n      [`Array [`Int 1; `Int 2; `Int 0; `Int 0];\n       `Array [`Int 1; `Int 2; `Int 0; `Int 0];\n       `Array [`Int 1; `Int 2; `Int 0; `Int 0];\n       `Array [`Int 1; `Int 3; `Tag (6, `Int 2); `Tag (6, `Int 1)];\n       `Array [`Int 1; `Int 4; `Tag (6, `Int 3); `Tag (6, `Int 0)];\n       `Array [`Int 1; `Int 2; `Int 0; `Int 0];\n       `Array [`Int 1; `Int 2; `Int 0; `Int 0];\n       `Array [`Int 1; `Int 2; `Int 0; `Int 0];\n       `Array [`Int 1; `Int 3; `Tag (6, `Int 7); `Tag (6, `Int 6)];\n       `Array [`Int 1; `Int 4; `Tag (6, `Int 8); `Tag (6, `Int 5)];\n       `Array [`Int 1; `Int 1; `Tag (6, `Int 9); `Tag (6, `Int 4)]])]\n\n# String.length (Cbor_pack.to_string tree2_to_cbpack t);;\n- : int = 
73\n```\n\nwhich is more than twice as long.\n\n### Attributes supported\n\n- `[@ser f]` on type: custom serialize function\n- `[@deser f]` on type: custom deserialize function\n- `[@as_bytes]` on a string type: encode to CBOR bytes, not string.\n    Should be used for non-textual data, i.e. strings not containing valid UTF-8.\n- `[@cstor \"x\"]` on constructor: custom key for this constructor (string)\n- `[@key \"x\"]` on record field: custom key for this field (string)\n- `[@@hashcons]` on type decl: enable hashconsing for this type.\n- `[@@use_field_names]` on type decl: use strings for record fields, not integer offsets\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimandra-ai%2Fcbor-pack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimandra-ai%2Fcbor-pack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimandra-ai%2Fcbor-pack/lists"}