{"id":20091012,"url":"https://github.com/raehik/binrep","last_synced_at":"2025-05-06T03:30:55.482Z","repository":{"id":39916318,"uuid":"484461341","full_name":"raehik/binrep","owner":"raehik","description":"Useful, efficient, precise binary schema modelling in Haskell","archived":false,"fork":false,"pushed_at":"2025-03-11T12:10:37.000Z","size":528,"stargazers_count":14,"open_issues_count":1,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-18T01:37:33.555Z","etag":null,"topics":["binary-data","haskell","reverse-engineering","serialization"],"latest_commit_sha":null,"homepage":"","language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/raehik.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-22T14:21:17.000Z","updated_at":"2025-03-11T12:10:08.000Z","dependencies_parsed_at":"2024-03-17T01:56:46.549Z","dependency_job_id":"e9df8774-4b7c-44be-baa5-9d9d3bbb8466","html_url":"https://github.com/raehik/binrep","commit_stats":{"total_commits":97,"total_committers":1,"mean_commits":97.0,"dds":0.0,"last_synced_commit":"514d711d22d22469236bc7e11dc6152a248dd539"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raehik%2Fbinrep","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raehik%2Fbinrep/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raehik%2Fbinrep/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/raehik%2Fbinrep/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/raehik","download_url":"https://codeload.github.com/raehik/binrep/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252616109,"owners_count":21776920,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binary-data","haskell","reverse-engineering","serialization"],"created_at":"2024-11-13T16:27:48.823Z","updated_at":"2025-05-06T03:30:55.120Z","avatar_url":"https://github.com/raehik.png","language":"Haskell","funding_links":[],"categories":[],"sub_categories":[],"readme":"[gh-strongweak]:   https://github.com/raehik/strongweak\n[gh-flatparse]:    https://github.com/AndrasKovacs/flatparse\n[gh-mason]:        https://github.com/fumieval/mason\n[gh-refined]:      https://github.com/nikita-volkov/refined\n[hackage-gdf]:     https://hackage.haskell.org/package/generic-data-functions\n[hackage-bytezap]: https://hackage.haskell.org/package/bytezap\n\n# binrep\nbinrep is a Haskell library for *precisely modelling binary schemas*, especially\nlow-context byte-oriented file formats e.g. C enums, and working with them\neffectively and efficiently. Here's why it's useful:\n\n  * **Explicit:** Define Haskell data types with the binary schema \"baked in\".\n    Use highly parameterized binary representation primitives including\n    null-terminated data (e.g. 
C-style strings), Pascal-style data (length\n    prefixed), sized explicit-endian machine integers, null-padded data.\n  * **Low boilerplate:** Free performant parsers and serializers via generics.\n    _(See [Generic binary representation](#generic-binary-representation).)_\n  * **Easy validation:** Use the [strongweak][gh-strongweak] library design\n    pattern to define an unvalidated data type for easy internal transformation,\n    and get validation code for free.\n  * **Performant:** Parsing and serialization is *extremely fast*, using\n    [bytezap][hackage-bytezap] and [flatparse][gh-flatparse].\n\n## Usage\n### Dependencies\nYou need the **ICU library**. For running, you just need the runtime. For\nbuilding, you need development files as well (headers etc). Alternatively, you\nmay turn off the ICU features with a Cabal flag.\n\n## Philosophy\n### Modelling, not serializing\nbinrep is good at modelling binary data formats. It is not a plain\n\"serialization\" library, where the actual binary representation is hidden from\nthe user (intentionally, with good reason). The binary and cereal libraries are\ngreat choices for that. They are interested in defining efficient binary codecs\nfor Haskell data. However, their codec typeclasses *hide representation\ndecisions* from the user. In cereal,\n\n  * machine integers are encoded with\n    [big endian](https://hackage.haskell.org/package/cereal-0.5.8.2/docs/src/Data.Serialize.html#line-182)\n  * bytestrings are written with an\n    [8-byte length prefix](https://hackage.haskell.org/package/cereal-0.5.8.2/docs/src/Data.Serialize.html#line-498)\n\nThese are fine decisions. But they aren't accurate to the types. Endianness is\nan implementation decision.\n\nbinrep refuses to work with a machine integer unless it knows the endianness.\nBytestrings are split into C-style (null-terminated) and Pascal-style\n(length-prefixed). 
This enforces careful consideration for the binary data being\nmodelled.\n\n### Validation without boilerplate\nA C-style bytestring must not contain any `0x00` null bytes. A Pascal-style\nbytestring must be short enough to be able to encode its length in the length\nprefix machine integer. But checking such invariants is tedious work. Am I\nreally going to wrap everything in a bunch of newtypes and force users to call a\nbunch of checker functions every time?\n\nYes and no. Yes, binrep uses newtypes extensively, though most are type synonyms\nover the `Refined` newtype from Nikita Volkov's wonderful [refined][gh-refined]\nlibrary. No, binrep doesn't want you to wrangle with these day-to-day. One\nsolution is to define a simplified \"weak\" type, and convert between it and the\nbinary-safe \"strong\" type. My [strongweak][gh-strongweak] library provides\nsupporting definitions for this pattern, and generic derivers which will work\nwith binrep's binary representation primitives.\n\n## Generic binary representation\n_(Generics are now handled by [generic-data-functions][hackage-gdf]. This info\nis largely the same, but the code is elsewhere.)_\n\nbinrep includes powerful generics for automatically writing instances.\nThey all work the same way:\n\n  * Constructors are encoded by sequentially encoding every enclosed field.\n    * Empty constructors thus serialize to 0 bytes.\n  * For sum types, the constructor is disambiguated via a tag obtained from the\n    constructor name.\n    * Tags may be parsed on the type or term level.\n\nNote that when parsing sum types, we compare tags sequentially. You may design\nyour tag schema to have a more efficient approach. In such cases, consider using\n`Generic.Data.FOnCstr` from [generic-data-functions][hackage-gdf].\n\nAs an example, you could encode constructor names as a null-terminated ASCII\nbytestring for a tag. 
(This is provided at `Binrep.Generic.nullTermCstrPfxTag`.)\nAlternatively, you may encode each constructor at a unique byte value, stated at\nthe end of the constructor name.\n\nSum types (data types with multiple constructors) are handled by first encoding\na \"tag field\", the value of which then indicates which constructor to use. You\nmust provide a function to convert from a constructor name to a (unique) tag.\nYou could encode them as a null-terminated ASCII bytestring (this is the\ndefault), or as a single byte. To ease this, you may consider putting the tag\nvalue in constructor names:\n\n```haskell\ndata BinarySumType = B1 | B2\n\ngetConstructorTag :: String -\u003e Word8\ngetConstructorTag = read . drop 1\n\n-- \u003e\u003e\u003e getConstructorTag \"B1\"\n-- 1\n\n-- Or use our generic helper, which takes hex values:\n--\n-- \u003e\u003e\u003e cSumTagHex @Word8 (drop 1) \"BFF\"\n-- 255\n```\n\n## Similar projects\n### Kaitai Struct\n[Kaitai Struct](https://kaitai.io/) is a wonderful declarative parser generator\nproject. They bolt an expression language and a whole lot of binary cleverness\non top of a nice YAML schema. It comes with an IDE, a visualizer, and you can\ncompile schemas down to parsers for various different languages (no Haskell...).\n\nDesign principles like their fancy absolute offset handling and language\nneutrality have stunted serialization support. Though it's more like they have\nsuch powerful parsing that they can parse formats that can't be edited and\nre-serialized naively, like archives with file indexes. For proper handling, one\nshould store a file table, and serialization generates the index. So in reverse,\nyou would want to combine them. But it's a bit program-y. In binrep, you are in\na programming language, so it's less of a problem... but I'm not sure if we can\nbe very efficient at absolute offset stuff.\n\nRealistically, Kaitai Struct is the best decision for fast iteration on\nreversing unknown data. 
binrep is useful for loading data straight into Haskell\nfor further processing, especially converting between simpler formats.\n\n### Wuffs\n[Wuffs](https://github.com/google/wuffs) is a crazy exploration into safe\nlow-level code via strong typing. You have to annotate every possibly dangerous\nstatement with a proof of safety. It's a tedious, explicit, very safe and very\nfast imperative language for defining parsers and serializers.\n\nWuffs is more a codec engineer's tool than a reverse engineer's one. binrep\nisn't really interested in speed, and being a Haskell library we get to focus on\ndefining types and their composition in a declarative \u0026 functional manner. As\nsuch, we get to define more useful things quicker using binrep. Though we share\nmany core ideas, such as refinement types.\n\nCheck out Wuffs if you need to write a bunch of codecs and they really, really\nneed to be both fast and safe. The trade-off is, of course, your time.\n\n### flat\nhttps://hackage.haskell.org/package/flat\n\nCool, bit-oriented rather than byte-oriented.\n\n## License\nProvided under the MIT license. See `LICENSE` for license text.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraehik%2Fbinrep","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraehik%2Fbinrep","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraehik%2Fbinrep/lists"}