{"id":17537639,"url":"https://github.com/zenhack/layout-dsl","last_synced_at":"2025-04-23T21:06:42.704Z","repository":{"id":151019425,"uuid":"74299790","full_name":"zenhack/layout-dsl","owner":"zenhack","description":"DSL for specifying data layout","archived":false,"fork":false,"pushed_at":"2017-04-26T03:15:30.000Z","size":97,"stargazers_count":5,"open_issues_count":2,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-23T21:06:34.437Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Haskell","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zenhack.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"zenhack"}},"created_at":"2016-11-20T20:13:43.000Z","updated_at":"2021-09-29T19:17:08.000Z","dependencies_parsed_at":"2023-05-02T16:31:35.226Z","dependency_job_id":null,"html_url":"https://github.com/zenhack/layout-dsl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenhack%2Flayout-dsl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenhack%2Flayout-dsl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenhack%2Flayout-dsl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenhack%2Flayout-dsl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zenhack","download_url":"https://codeload.github.com/zenhack/layout-dsl/tar.gz/refs/heads/ma
ster","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250514782,"owners_count":21443209,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-20T20:42:22.142Z","updated_at":"2025-04-23T21:06:42.692Z","avatar_url":"https://github.com/zenhack.png","language":"Haskell","readme":"DSL for describing data layouts. Currently this is Apache-2.0 licensed,\nbut future versions/implementations may be under different licenses\n(definitely FOSS); we'll play it by ear.\n\nNOTE: if you're viewing this on Github, you should be aware that the\nGithub repository is a mirror; the canonical repository is on\nGitlab.com, at:\n\n\u003chttps://gitlab.com/isd/layout-dsl\u003e\n\nPlease report issues and submit pull requests there.\n\n# The Problem\n\nWhen doing low-level programming -- writing hardware drivers,\nimplementing networking protocols, binary file formats, and so\nforth, you often need to deal with some structures that don't\nnicely map to a construct in your programming language. 
Examples:\n\n* The x86 GDT entries.\n* ACPI tables.\n* IP/TCP packets.\n\nFor the GDT entry, [Zero][1] defines the following struct:\n\n    struct GDTEnt {\n        uint16_t limit_low;\n        uint16_t base_low;\n        uint8_t base_mid;\n        uint8_t access;\n        uint8_t flags_limit_high;\n        uint8_t base_high;\n    }__attribute__((packed));\n\nNote the following:\n\n* We need `__attribute__((packed))`, because otherwise C may introduce\n  unwanted space between some of the fields.\n* We have conceptually-single fields spread across multiple locations\n  in the struct; `base` spans three different struct fields.\n* Some fields don't divide evenly into bytes, so they're awkwardly\n  sharing variables. `flags_limit_high` is conceptually a four-bit\n  flags field, and a four-bit chunk of the limit.\n* Some fields are smaller than a single byte; both flags and access are\n  made up of smaller bit fields.\n* Sometimes we *do* want padding, but not the way the compiler will do\n  it. For example, two of the bits in the flags field are supposed to\n  just be zero, and one of the bits in access must always be 1.\n\nWorking with these structures is a bit awkward.\n\nThe trouble is that data definitions in C, and almost every other\nprogramming language, really aren't designed to specify such awkward\nstructures. They simply can't express many of the patterns you see in\nthis space.\n\nC's data types are also somewhat oriented towards making the\ncompiler's job easier; generating machine code that splits a logical\nvalue across three different locations isn't quite as trivial as storing\na value in a word-aligned memory location.\n\nRust doesn't even define the memory representation for its types, and\nwith good reasons. 
[See this section of the Rustonomicon][2] for\ndetails.\n\nNote, though, that you can impose layout restrictions like those in C,\nif needed.\n\nFinally, as described in the Rustonomicon, using packed structs like the\nabove can be a little risky, since it opens up the possibility of doing\nunaligned loads and stores.\n\n# Proposed solution\n\nA DSL for defining bit-level data layouts in an expressive way. Tools\ncan be written to derive getters and setters for these types, or to\ngenerate constant values to embed in executables, or potentially other\napplications.\n\nThis is still very WIP. Below we describe an overview of current\nthinking. The `prototype` directory has the beginnings of an\nimplementation of a tool that generates getters and setters as suggested\nbelow.  The file `grammar.md` contains a (partial) formal grammar,\nbut it may not be entirely up to date with what the prototype\nimplementation does. The spec will become more accurate once the\nlanguage itself has stabilized. `examples/` contains some examples.\n\n## Overview\n\nFor each data type, we define two things:\n\n1. The logical structure of the data type.\n2. Its physical layout.\n\nExample:\n\n    /* Declares a logical view of the data; base is one conceptual\n     * field, so we express it that way. */\n    type GDTEnt struct {\n        // The `uint` type can be parametrized over any bit length:\n        base: uint\u003c32\u003e\n        limit: uint\u003c20\u003e\n\n        flags: struct {\n            gr, sz: bool\n        }\n\n        access: struct {\n            ac, rw, dc, ex, pr: bool\n            privl: uint\u003c2\u003e\n        }\n    }\n\n    /* Declares the physical layout of the data. 
The whole thing is\n     * declared to be little endian; this is inherited by component\n     * fields unless they specifically override it (see the section on\n     * endianness, below).\n     *\n     * Endianness is unspecified by default, but if left unspecified\n     * may only be used as part of a larger structure.\n     */\n    layout GDTEnt (little) {\n        // Denotes the bottom 16 bits of the limit field. Can be specified as\n        // either [hi:lo] or [lo:hi]. We allow both to make transcribing\n        // from hardware manuals easier. The range is *inclusive*, with\n        // the lowest bit having index 0.\n        limit[15:0]\n\n        base[23:0]\n\n        access {\n            // Without the slice notation, we embed the whole field.\n            // Booleans are assumed 1 bit, true = 1, false = 0.\n            ac rw dc ex\n\n            // A bit that is always 1. Syntax is Verilog inspired, of\n            // the form \u003clength\u003e'\u003cradix\u003e\u003cvalue\u003e. 
The radix `b` is\n            // base 2.\n            1'0b1\n\n            privl // 2 bits wide; derived from the type declaration.\n            pr\n        }\n\n        limit[19:16]\n        flags {\n            2'0b0 // 2 bit field with the value 0.\n            sz\n            gr\n        }\n        base[24:31]\n    }\n\nA tool could then be used to generate C code that could be called like\nso:\n\n    GDTEnt_set_base(\u0026ent, 0xffffffff); // set the value of the `base` field.\n    uint32_t lim = GDTEnt_get_limit(\u0026ent); // get the value of the `limit` field.\n\nOr, in a language with somewhat more powerful mechanisms for\nabstraction, such as C++ or Rust:\n\n    ent.base = 0xffffffff;\n    uint32_t lim = ent.limit;\n\n## Endianness\n\nEndianness can be declared explicitly on an entire layout, or on any field,\nand is inherited by sub-components unless they specifically override it.\nAs an example, suppose we have some sort of packet, with big-endian\nheaders and trailers, but whose payload is little endian. 
We might have a\nlayout like this:\n\n    layout Packet (big) {\n        type[0:4]\n        0'b4\n        total_len\n        src_addr\n        dest_addr\n        // other header fields\n        payload (little) {\n            foo[0:16]\n            bar[0:32]\n            3'b0\n            // ...\n        }\n        // trailer fields\n        trailer1[0:4]\n        trailer2[0:8]\n    }\n\nA layout is permitted to omit endianness information at the top level,\ne.g.:\n\n    layout Foo {\n        bar\n        baz[0:3]\n        6'0x3f\n    }\n\nIn this case, the data type can only be used as part of a larger\nstructure that *does* define the overall endianness.\n\n## Alignment\n\nAlignment could be specified with an annotation similar to that used for\nendianness, e.g.:\n\n    layout Foo (little,align=8B) {\n        // ...\n    }\n\nThis would denote a little-endian structure that must be 8-byte aligned.\n\nTools should detect inconsistencies in alignment specifications and\nreport them as errors, e.g.:\n\n\n    type foo {\n        x: bar\n    }\n\n    type bar {\n        y: bool\n    }\n\n    layout foo (align=4) {\n        x\n    }\n\n    layout bar (align=8) {\n        y 63'b0\n    }\n\nIn this case, foo will be aligned on a 4-byte boundary, but one of its\nmembers (x) demands an 8-byte alignment.\n\n## Recommendations for code-generation tools\n\nThe most obvious class of tool that uses this language is one which\ngenerates data types and getters/setters for a programming language,\nsuch as C, C++, or Rust. This section lists some general recommendations\nfor these tools.\n\n### Setters should check inputs\n\nIn many cases, the host language will not be able to capture the\nrequired constraints in its type system. 
For example, C does not have a\n2-bit unsigned integer type, so a setter for the `privl` field in the\n`GDTEnt` structure defined above could be passed values that don't\nactually fit in the field.\n\nSetters should check inputs and panic/abort if they are invalid. In some\ncontexts this may be a problem for performance, in which case it is\nacceptable to generate `*_unsafe` variants, but APIs should discourage\ntheir use.\n\n# Open Questions\n\nThis section is basically a brain dump of other things we could add if\ndesired. My inclination is to take a wait-and-see approach with these, and\ntry to avoid mission/feature creep.\n\n## Sum types?\n\nIf we have a data structure like (OCaml syntax):\n\n    type t =\n        | Foo of uint32 * uint64\n        | Bar of bool\n\nA typical approach to laying this out in memory is to have something\nsimilar to this C declaration:\n\n    struct T {\n        int tag;\n        union {\n            struct {\n                uint32_t x;\n                uint64_t y;\n            } foo;\n            struct {\n                bool x;\n            } bar;\n        };\n    };\n\nI've seen similar structures (I think there's one that shows up in one\nof the APIC-related tables?), but that have the \"tag\" in some weird spot\nin the middle of the data structure. I want to be able to express this\nsort of thing as well.\n\nOne stab, which is more verbose than I want:\n\n    type TTag enum(uint) {\n        foo = 1\n        bar = 7\n    }\n\n    type T union(TTag) {\n        foo: struct {\n            x: u32\n            y: u64\n        }\n        bar: struct {\n            x: bool\n        }\n    }\n\n    layout T {\n        value[0:56]\n        tag[0:2]\n        value[56:96]\n        tag[2:3]\n    }\n\nAlso, this fails to capture trickier cases like the MIPS instructions\ndescribed below.\n\n## Pointers/References?\n\n* Relative vs absolute?\n* Physical/virtual address distinction?\n* \"Far\" pointers? 
Places this shows up:\n  * real mode x86\n  * Cap'n Proto\n\nI see two ways of approaching this:\n\n* Mostly punt; keep this somewhat impoverished.\n* Try to come up with a way to compose things. We're not going to\n  capture every possible addressing model by just enumerating them.\n\n## MMIO/other constrained access methods\n\nMMIO structures tend to have requirements about the load/store sizes.\nThe language should be able to capture this, and tools should generate\ncode that respects this.\n\nThere are also other cases where data must be accessed in particular\nways, e.g. port IO on x86, or filesystems stored on disk.\n\nIt may make sense to have field access constraints defined as a third\nfacet, alongside logical and physical layout.\n\nFor now, the \"default field access definition\" assumes memory\nwith specified alignments; port IO and other weirder things are left\nas future work.\n\n## Variable length fields?\n\nSometimes you'll see data structures that look like this:\n\n\n    +-------------+\n    | length of A |\n    +-------------+\n    | A           |\n    +-------------+\n    | length of B |\n    +-------------+\n    | B           |\n    +-------------+\n    | ...         |\n\nFrom an API standpoint, it would be nice to provide something\niterator-like.\n\n## What code to gen?\n\n* What should the exposed APIs look like? What tools would be useful? We\n  have some notions described above, but should keep thinking on this.\n\nThoughts:\n\n* Would be nice if we could statically embed complex values. In Zero we\n  do some hackery with C macros to have the final GDT embedded in the\n  executable, so there's no run-time construction. 
In general many data\n  structures are too difficult to express for this.\n\n## Things to look into\n\n* Read about various things like:\n  * Filesystems\n  * Network protocols\n  * Hardware (pick through Intel/ARM manuals, maybe PPC and some more\n    obscure architectures)\n  * Instruction set encodings?\n    * MIPS includes some interesting challenges; see below. Probably\n      similar examples in places I'm less familiar with.\n    * This may very well be out of scope.\n* See what prior art exists.\n\n## MIPS encoding\n\nMIPS has three basic encoding types: R, I, and J. In all cases, the\nfirst six bits are an opcode. For R-Type instructions, the opcode field\nis zero, and there is a secondary 'funct' field elsewhere in the\ninstruction. The I and J types have only the one opcode field.\n\nThe naive sum type solution above doesn't work here, since you don't\nhave two separate tags for the I vs. J distinction and for the\nindividual opcodes between them.\n\n[1]: https://github.com/zenhack/zero\n[2]: https://doc.rust-lang.org/nomicon/data.html\n","funding_links":["https://github.com/sponsors/zenhack"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzenhack%2Flayout-dsl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzenhack%2Flayout-dsl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzenhack%2Flayout-dsl/lists"}