{"id":20428716,"url":"https://github.com/cpressey/polyrical","last_synced_at":"2026-02-04T02:08:40.598Z","repository":{"id":149745383,"uuid":"376073095","full_name":"cpressey/PolyRical","owner":"cpressey","description":"[WIP] Design for an architecture-agnostic macro assembler with advanced static analysis","archived":false,"fork":false,"pushed_at":"2021-06-12T11:03:17.000Z","size":12,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-05T05:29:47.182Z","etag":null,"topics":["architecture-agnostic","flow-typing","macro-assembler","static-analysis","symbolic-execution"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cpressey.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-11T15:48:18.000Z","updated_at":"2021-09-06T17:55:29.000Z","dependencies_parsed_at":"2023-04-25T03:38:34.311Z","dependency_job_id":null,"html_url":"https://github.com/cpressey/PolyRical","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cpressey/PolyRical","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cpressey%2FPolyRical","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cpressey%2FPolyRical/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cpressey%2FPolyRical/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cpressey%2FPolyRical/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cpressey","download_url":"https://codeload.github.com/cpressey/PolyRical/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cpressey%2FPolyRical/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264614401,"owners_count":23637578,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["architecture-agnostic","flow-typing","macro-assembler","static-analysis","symbolic-execution"],"created_at":"2024-11-15T07:28:26.182Z","updated_at":"2026-02-04T02:08:40.564Z","avatar_url":"https://github.com/cpressey.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"PolyRical\n=========\n\n_Work-in-Progress_\n| _See also:_ [SixtyPical](https://github.com/catseye/SixtyPical#readme)\n∘ [Shelta](https://github.com/catseye/Shelta#readme)\n\n- - - -\n\nPolyRical is a successor language to [SixtyPical][].  It explores a few (but by no\nmeans all) of the avenues mentioned in the [Future directions for SixtyPical][] document.\n\nLike SixtyPical, it is a very low-level language that nonetheless supports advanced\nforms of static analysis.  It aims to be based on a cleaner, more general theory of\noperation than SixtyPical, and thus (hopefully) able to target architectures\nother than the 6502.  PolyRical is also more like a macro assembler than SixtyPical,\nand there is less emphasis on permitting optimizations.\n\nThis document is still at the **design stage**.  It is also poorly organized.  
No code\nhas been written, and all of the decisions described here are subject to change.\n\n[SixtyPical]: https://catseye.tc/node/SixtyPical\n[Future directions for SixtyPical]: https://gist.github.com/cpressey/f35e104b3e3cf555824aa2b4d15ea858\n\nMotivating example\n------------------\n\nA PolyRical program consists of directives, template definitions, and global declarations.\nTypically, a program would import a set of templates from a library using a directive,\nbut to convey the flavour of the language, here is a self-contained example program:\n\n    word[8] register A\n    word[8] location score @ 0xc000\n    \n    template load(out A, word[8] value val) {\n        /* LDA immediate */ 0xA9 val\n    }\n    \n    template store(in A, out word[8] location dest) {\n        /* STA absolute */ 0x8D lo(dest) hi(dest)\n    }\n    \n    routine(out score, trash A) main {\n        load(A, 0)\n        store(A, score)\n    }\n\n(Text within `/*` and `*/` is a comment and is ignored by the compiler.  It is\nsupplied in the example to clarify what opcodes these templates will emit.)\n\nObserve that this is enough information to consider `main`, shown above, to be a valid\nroutine, and to (almost) produce a machine-language program for it, while rejecting\n\n    routine(out score, trash A) main {\n        store(A, score)\n    }\n\nwith a message such as `'store' expects 'A' to be meaningful but in 'main' it is not necessarily so`,\nand also rejecting\n\n    routine(out score, trash A) main {\n        load(A, 0)\n    }\n\nwith a message such as `'main' does not give a meaningful value to 'score'`.\n\nYou can also read [a more involved example](#a-more-involved-example) below.\n\nIn more detail\n--------------\n\nA global declaration consists of a type, a role, a name, an optional initializer, and\nan optional location.  
Some combinations are not valid — for instance, any declaration\nwith the role `value` must include the initializer.\n\nGlobal declarations declare global variables, constants, and routines.  A routine is\njust a constant of `routine` type.  (More specifically, it is a read-only initialized\nlocation of `routine` type.)\n\nEach routine type is parameterized with a set of global references, and constraints on\nthose globals.\n\nThere are no parameters to a routine, and no local variables.  All variables are\nglobal.  But each routine must conform to the constraints that its type imposes on\nthe globals in the program.  So, if the type declares that the variable `A`\nwill be given a meaningful value by the routine, then the routine must give it a\nmeaningful value, or the implementation should signal an error condition.\n\nTemplates, too, have such constraints.  Unlike routines, templates do have a list\nof parameters.  Also unlike routines, the constraints declared by a template are\nnot checked by the implementation (often they would not be checkable, as templates\nprovide machine-level details for implementing an operation).  But they are\npropagated to the rest of the program.  Most programmers would use a library of\ntemplates instead of writing their own.\n\nA routine body consists of a list of template applications.  The template\ndefinition used in a template application is selected based on the types and\nroles of the parameters, each of which may be a global or a literal value.  This is\nlike method overloading in languages such as Java.  Actually, it goes further,\nin that a template definition can list a particular global, rather than just a\ntype and role, as one of its parameters.  This definition will be selected when\nthis exact global is given as the parameter.  
The `load` and `store` templates in\nthe above example demonstrate this for the global `A`.\n\n### Meaningful values\n\nThe most prominent property of global declarations that PolyRical tracks is _meaningfulness_.\nThis is very similar to how many C compilers track _definedness_ of variables, and\nare able to warn the user if the code uses a variable that is not defined, or may not\nbe defined in all cases.\n\nMeaningfulness is controlled by three constraints on each routine type and on each\ntemplate:\n\n*   `in` asserts that the global must be meaningful before the routine or template is applied\n*   `out` asserts that the global will be meaningful after the routine or template is applied\n*   `trash` asserts that the global will not be meaningful after the routine or template is applied\n\nThe meaningfulness of all other globals is preserved when a routine or template\nis applied.\n\nIn particular, `in` asserts the meaningfulness of a global on input, so\nin the absence of `trash` on the same global, that global is assumed to\nalso be meaningful on output.\n\nOther properties of globals beyond meaningfulness, such as range, and whether\na routine is ever called, are trackable by symbolic execution.  SixtyPical already\ntracks several of these, and one day PolyRical might as well.\n\n### Data types\n\nThe most basic data type is the `word`, which, because we want the possibility\nof targeting systems other than the 6502, is parameterized by its size.  An\n8-bit byte is `word[8]` and a 16-bit word is `word[16]`.  Following this pattern,\na single-bit flag register is `word[1]`.\n\nRoutine types have already been discussed.\n\nThere is also an array (or table) type, which represents a contiguous section\nof memory, when applied to a location.  
There could, in theory, also be table\nvalues, to represent things like string constants, but this area is not well worked out.\n\nAlthough there may one day be a pointer type, there are two types which are\nrelated but more relevant.\n\nThe first is an index type, which is effectively a pointer within a particular table.\nIt is possibly a role, rather than a type, because it is also effectively just an\ninteger value with a limited range (and the concept of limited range can be applied\nto any value.)\n\nWhichever way it is, templates for operations which access or update a location\nwithin a table will take an index and use it.  The template decides exactly how\nto compute the offset to within the table.  So, for example, a table of 16-bit\nvalues can be implemented as a single memory table, multiplying the index by\ntwo, and retrieving the byte at the index, and the next byte in memory; or it\ncan be implemented by two memory tables, accessed with the same index, with the\nhigh byte stored in one and the low byte in the other.  A template for either\nkind of access ought to be constructible.\n\nThe second is a pointer to code.  The routine type covers most of these cases,\ne.g. a jump table is a table of values of routine type.  But for conditional\ntemplates (see below) we may need to expose a \"label\" type which is a pointer\nto an instruction somewhere inside a routine.  Because of how much these things\ncomplicate analysis, it's likely their use will be highly restricted.\n\nMeaningfulness has \"at least as much as\" properties when it comes to using values of\nroutine type.  Such a table might be defined with a type of routines that\ntrash a given global (for example, `A`).  This should be thought of as\nsaying \"In the worst case, the routines stored in this table trash A\".  It\nshould be entirely possible to assign a routine that does _not_ trash A, to\na cell in that table.\n\n### Roles\n\nAlong with a type, each global has a role.  
There are two main roles, which\nare `location` and `value`.  These are similar to the concepts of `lvalue` and\n`rvalue` respectively, in languages such as C.  A global which is a\n`location` supports the operation of having its address taken; it does not\ndirectly support having its value set or read, but there may be machine\ninstructions which do this, and one or more templates defined which use them.\nA `value`, on the other hand, does not support having its address taken, but\ndoes directly support having its value taken.\n\nRoles are considered when selecting a template for given actual parameters.\nThis prevents such things as trying to assign a value to another value; there\nwill typically be no such template with that signature.\n\nEvery literal constant is considered a `value`.\n\nThere are other, more specialized roles.  `register` is like `location` but\ndoes not support having its address taken.\n\n### Control flow\n\nStatic analysis is easy if all programs are entirely straight-line code; the\ncomplications come up when branching occurs.\n\nBut branching is also an opportunity, because every time a branch occurs, the\nanalyzer has more information about what conditions must pertain in each branch.\n(cf. [flow typing][])\n\nFor example, if we branch when the carry flag is set, we know that, in the code that we\nbranch to, the carry flag must necessarily be set; and in the code reached when the branch\nwas not taken, the carry flag must necessarily be clear.\n\nIdeally, we'd like to capture control flow in templates; templates should take\nblocks as parameters, allowing the programmer to, for example, define a template\ncalled `ifzero` that works like so:\n\n    template ifzero(block) {\n        /* BNE */ 0xd0 rel(label)\n        block\n        label:\n    }\n\nThat is, it takes a block and generates the machine code for making a conditional\nbranch against the Z flag, and machine code for the entire passed-in block.  
This\nallows us to implement control structures in an architecture-agnostic way.\n\nHowever, this complicates analysis significantly.  If the user is allowed to\nwrite arbitrary combinations of branches and labels inside a template, the analyzer\nneeds to be able to handle arbitrary combinations of branches and labels.\nWe'd prefer to avoid that.\n\nWe can avoid that by providing only canned control structures, such as `if`\nand `repeat`, at either the template level or the routine level.  But machine\nlanguages usually have many specialized branch instructions.  It is unclear\ncurrently how best to allow these control structures to use these instructions.\n\nFor other purposes, there are facilities we can use which are somewhat easier to\nhandle.  One such facility is templates that are employed implicitly.\n\nFor instance, the reason we said the motivating example given above is only \"almost\"\nenough information to produce a machine-language program, is that we haven't defined\nwhat the `main` routine should do when it's finished.  
Presumably it should return to\nits caller, but the program doesn't provide a way to say that in machine language.\nBut we can provide this information by defining an implicit template like\n\n    template _return() {\n        /* RTS */ 0x60\n    }\n\nand the compiler would insert this at the end of each routine as necessary.\n\nCorrespondingly, we need to define what it means in machine language to call\na routine:\n\n    template _call(routine r) {\n        /* JSR */ 0x20 lo(r) hi(r)\n    }\n\nIt might be the case that implicit templates can be used for control structures as\nwell, but it is less clear in what exact manner that would happen.\n\nOne part of it would probably be an implicit template for an unconditional jump:\n\n    template _jump(label r) {\n        /* JMP absolute */ 0x4c lo(r) hi(r)\n    }\n\n(And, just as an aside, the library might define symbolic constants for opcodes\ninstead of writing them in comments like we've been doing here.  Like so:\n\n    word[8] value JMP_abs = 0x4c\n    template _jump(label r) {\n        JMP_abs lo(r) hi(r)\n    }\n\nIn subsequent examples we'll start assuming such symbolic constants have been\ndefined.)\n\nIt might be possible to have condition templates which represent the possible\nconditions in an `if` or `repeat` test.  The condition name would be passed\nto the control structure, and the control structure would select the condition\ntemplate for that name, when generating the test part of the control structure.\n\nOne complication is that the condition template needs to know the location\nin the program to branch to.  Unlike most locations, this is the address of\na machine instruction.  
So it might have its own special role.\n\nAnother complication is that sometimes it is advantageous for the compiler to\ngenerate a branch for when the condition is true, and sometimes the branch for\nwhen it is *not* true.\n\nSo, condition templates take two extra parameters, supplied by the system:\nthe label to jump to and the sense of the test that is being generated.  The\ntemplate library should provide templates to cover both cases.  For example,\n\n    template zero?(label, true, in A) {\n        BEQ rel(label)\n    }\n\n    template zero?(label, false, in A) {\n        BNE rel(label)\n    }\n\nIn a routine, this template would be invoked when compiling a control\nstructure like `if` or `repeat` that takes a condition, like so:\n\n    routine(out score, in A) main {\n        if zero?(A) {\n            store(A, score)\n        }\n    }\n\nWhich sense of the test to generate for this is up to\nthe compiler.  Since the library has supplied both templates, the compiler will pick the\nneeded one.\n\n### Template format\n\nThe formal arguments of the template are given in a list; `in`, `out`, and\n`trash` modifiers are attached to them directly.\n\nOnly `location` role arguments can be given `in`, `out`, and `trash`\nmodifiers; they don't make sense on those of `value` role.\n\nThe template may involve the state of the machine beyond just the arguments\nit is given.  When it does this, it should give a list of globals that\nare involved, and `in`, `out`, and `trash` modifiers on them as necessary.\nThis is a set rather than an ordered list, and it appears after the list of\narguments.\n\nExample (not necessarily a good template, but demonstrates the features):\n\n    template lda(word[8] value val) : (out A) {\n        0xA9 val\n    }\n\nThe body of the template consists of a list of 8-bit bytes.  
(Certainly one\ncould argue this is not the apex of architecture-agnosticism, but, we will\naccept some limitations in the name of getting something done.)\n\nThese emitted bytes are specified by literal values, or functions of\nparameter or global names.\n\nLiteral values are emitted directly in the output binary.  They are usually\ngiven in hexadecimal, and correspond to opcodes or constant operands.\n\nParameters or globals of the `value` role resolve to their value.  If the\nvalue consists of more than 8 bits, a function must be used to extract 8 bits\nat a time.\n\nParameters or globals of the `location` role resolve to their address.\nIf the address consists of more than 8 bits, again, a function must be used\nto extract 8 bits at a time.\n\nOther functions should be available to, say, convert an absolute address into\none relative to the current emitting address (for relative branches).\n\nIn the above examples, the functions `lo()`, `hi()`, and `rel()` have served\nthese purposes.\n\n### Limitations on templates\n\nCan templates call other templates?  On the one hand this seems like it could\nbe useful.  On the other hand it complicates analysis.  In a sense, templates\nshould be considered \"atomic units\" with respect to analysis.  They simply\ntell us what it is they affect; we take their word for it, and shouldn't have\nto check them.  Also, any aggregation of template bodies could be done by\nhand, so templates-calling-other-templates isn't strictly necessary.\n\nWhat happens when one of the parameters is the same as one of the globals\nin the \"this template also involves\" set?  
This, too, complicates analysis.\nIt's tempting to say that the situation should be just disallowed, because it's\nhard to see how it leads to more utility in a clean way, and easy to see how\ntemplate-hygiene-violation-like errors could happen with it.\n\n### More things to think about\n\nTemplates should still be able to take blocks, to support things like\nSixtyPical's `with interrupts off` and `save`.  Blocks are OK, it's unrestricted\nuse of labels we want to avoid.\n\nSome architectures have a data stack that is shared between routines.  This\nshould be a global of stack type.  Routines should ideally be able to notate\nhow they affect the stack.  Think: type declarations for Forths.\n\nZero-page versus 16-bit addresses (in 6502).  Generalizing this means\nsupporting \"different kinds of pointers\".  Another example is segment:offset\nreferences on 80286.  Possibly type qualifiers a la Dieter would work here.\n\nA more involved example\n-----------------------\n\nThis is [the \"echo\" program from SITU-SOL](https://raw.githubusercontent.com/catseye/SITU-SOL/master/doc/bootstrap-zero/images/tumblr_inline_nqr9ipKWZB1tvda25_540.jpg)\n([hand-assembled version](https://raw.githubusercontent.com/catseye/SITU-SOL/master/doc/bootstrap-zero/images/tumblr_inline_nqr9jfJpBU1tvda25_540.jpg))\nconverted to PolyRical.\n\nGiven that none of this is implemented yet, there are almost certainly shortcomings\nin this code, and you are urged to treat it as a sketch.\n\n    include \"lib/c64.polyrical\"\n\n    word[8][256] location line @ 0xC000\n\n    routine (out line, trash a, trash y) read_tty {\n        load(Y, 0)\n        repeat {\n            chrin()\n            store(A, line, Y)\n            inc(Y)\n        } until equal?(A, 0x0d)\n    }\n\n    routine (in line, trash a, trash y) write_tty {\n        load(Y, 0)\n        repeat {\n            load(A, line, Y)\n            chrout()\n            inc(Y)\n        } until equal?(A, 0x0d)\n    }\n\n    routine (trash a, trash 
y, trash line) main {\n        repeat {\n            read_tty()\n            write_tty()\n        } forever\n    }\n\nwhere \"lib/c64.polyrical\" would contain something like\n\n    include \"lib/6502.polyrical\"\n\n    extern routine(out a) chrin @ 0xFFCF\n    extern routine(in a, trash a) chrout @ 0xFFD2\n\nand \"lib/6502.polyrical\" would contain something like\n\n    word[8] register A\n    word[8] register Y\n\n    template load(out Y, word[8] value val) {\n        /* LDY immediate */ 0xA0 val\n    }\n\n    template load(out A, in word[8][*] location addr, in Y) {\n        /* LDA absolute, Y */ 0xB9 lo(addr) hi(addr)\n    }\n\n    template store(in A, out word[8][*] location dest, in Y) {\n        /* STA absolute, Y */ 0x99 lo(dest) hi(dest)\n    }\n\n    template inc(in out Y) {\n        /* INY */ 0xC8\n    }\n\n    template equal?(label, true, in A, word[8] value v) {\n        /* CMP immediate */ 0xC9 v\n        /* BEQ */ 0xF0 rel(label)\n    }\n\n    template equal?(label, false, in A, word[8] value v) {\n        /* CMP immediate */ 0xC9 v\n        /* BNE */ 0xD0 rel(label)\n    }\n\nNote that in reality, `INY` and other opcodes here set some flags, so\na more realistic version of this file would have\n\n    word[1] register Z\n    word[1] register N\n\n    template inc(in out Y) : (out Z, out N) {\n        /* INY */ 0xC8\n    }\n\nand so forth.\n\nPutative Grammar\n----------------\n\nIn EBNF.  
Likely quite incomplete at this stage.\n\n    Program         ::= {Directive | GlobalDecl | TemplateDefn}.\n    Directive       ::= \"include\" StringLit.\n\n    GlobalDecl      ::= Type Role Ident\u003cnew:global\u003e [\"=\" Initializer] [\"@\" Address].\n    Type            ::= PrimType [TableSize].\n    PrimType        ::= \"word\" \"[\" WordSize \"]\"\n                      | \"routine\" \"(\" DeclUsageList \")\".\n    Role            ::= \"location\" | \"register\" | \"value\".\n    DeclUsageList   ::= DeclUsage {\",\" DeclUsage}.\n    DeclUsage       ::= {AccessQualifier} Type Ident\u003cglobal\u003e.\n    AccessQualifier ::= \"in\" | \"out\" | \"trash\".\n\n    Address         ::= IntLit.\n    Initializer     ::= IntLit | RoutineLit.\n\n    TemplateDefn    ::= \"template\" Ident\u003cnew:global\u003e \"(\" [TemplateFormals] \")\" [\":\" \"(\" DeclUsageList \")\"] TemplateBlock.\n    TemplateFormals ::= TemplateFormal {\",\" TemplateFormal}.\n    TemplateFormal  ::= {AccessQualifier} Type Ident\u003cnew:param\u003e.\n    TemplateBlock   ::= \"{\" {Emittable} \"}\".\n    Emittable       ::= IntLit | Ident\u003cglobal/param\u003e | EmitFunc \"(\" Emittable \")\".\n    EmitFunc        ::= \"lo\" | \"hi\" | \"rel\".\n\n    Ident           ::= \u003c\u003calphabetic (alphanumeric|'?')*\u003e\u003e.\n    StringLit       ::= \u003c\u003c'\"' any* '\"'\u003e\u003e.\n    IntLit          ::= \u003c\u003c('0x' hexdigit+|digit+)\u003e\u003e.\n\n    RoutineLit      ::= RoutineBlock.\n    RoutineBlock    ::= \"{\" Operation {\",\" Operation} \"}\".\n    Operation       ::= Ident\u003cglobal\u003e \"(\" [Actuals] \")\".\n    Actuals         ::= Actual {\",\" Actual}.\n    Actual          ::= Ident\u003cglobal\u003e | IntLit.\n\nImplementation notes\n--------------------\n\nThe control-flow graph is derived from the AST.  
This can be done either\nexplicitly (traversing the AST and constructing a separate control-flow graph)\nor implicitly (to traverse the control-flow graph, traverse the AST in the\nmanner which results in the control-flow graph; no explicit data structure is\nconstructed.)\n\nSome of the most convoluted parts of the SixtyPical compiler can be traced\nto traversing the control-flow graph implicitly, via the AST.  For the\nPolyRical compiler, it would probably be a better idea to construct an explicit\ncontrol-flow graph (with explicit join nodes and so forth) on an early pass,\nthen to traverse that graph, instead of the AST, during static analyses and\ncode generation.\n\n[flow typing]: https://en.wikipedia.org/wiki/Flow-sensitive_typing\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcpressey%2Fpolyrical","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcpressey%2Fpolyrical","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcpressey%2Fpolyrical/lists"}