{"id":25290754,"url":"https://github.com/moderocky/whilezie","last_synced_at":"2025-04-06T19:17:05.287Z","repository":{"id":275994972,"uuid":"927824713","full_name":"Moderocky/Whilezie","owner":"Moderocky","description":null,"archived":false,"fork":false,"pushed_at":"2025-02-26T11:17:21.000Z","size":92,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-26T11:38:43.380Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Moderocky.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-05T15:53:41.000Z","updated_at":"2025-02-26T11:17:24.000Z","dependencies_parsed_at":"2025-02-05T18:42:13.663Z","dependency_job_id":"a3df0fe4-02e6-408a-b004-2f23e2f6572d","html_url":"https://github.com/Moderocky/Whilezie","commit_stats":null,"previous_names":["moderocky/whilezie"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Moderocky%2FWhilezie","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Moderocky%2FWhilezie/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Moderocky%2FWhilezie/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Moderocky%2FWhilezie/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Moderocky","download_url":"https://codeload.github.com/Moderocky/Whilezie/tar.gz/refs/h
eads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247535521,"owners_count":20954576,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-13T00:49:52.101Z","updated_at":"2025-04-06T19:17:05.269Z","avatar_url":"https://github.com/Moderocky.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"Whilezie\n=====\n\nA compiler for the basic \u003cem\u003eWHILE\u003c/em\u003e language to VM bytecode.\n\n## Preface: Notes on Implementation\n\nI built this in my 'model' language style:\na source code document is turned into a string of tokens, parsed into a complete 'model' of the document, then compiled.\nThis is not the most efficient way to write a compiler -- especially for something like \u003cem\u003eWHILE\u003c/em\u003e,\nwhich needs to know very little about its surrounding code.\n\nThe 'model' stage is a rigorous in-memory structure of the code.\nThis is a fantastic way to verify and test the program, resolve forward references, collect insights,\nor build new adaptations (e.g. 
'transpilation' to another language).\nIf I were writing an optimal compiler, the parsing would be done at the same time as the tokenising\nand the model step could be skipped entirely.\n(In fact, \u003cem\u003eWHILE\u003c/em\u003e is so basic it could be done in a single stream without a backtracking buffer.)\n\nThe bytecode assembler is my [_Foundation 3_](https://github.com/Moderocky/Foundation).\nAside from being a lot more fun to use, it allowed me to do a lot of the resolution of values at compile-time.\n\n## Grammar\n\nThere seemed to be some disagreement over what is considered the 'essential' WHILE grammar.\nI chose to eliminate everything other than the bare essentials.\n\n```antlr\nidentifier: [A-Za-z_][A-Za-z0-9_]*\n\nprogram:\n    | \u003cidentifier\u003e read \u003cidentifier\u003e \u003cstatement\u003e write \u003cidentifier\u003e\n\nstatements:\n    | \u003cstatement\u003e \u003cstatements\u003e\n    | ∅\n \nstatement:\n    | \u003cidentifier\u003e := \u003cexpression\u003e\n    | while \u003cexpression\u003e \u003cstatement\u003e\n    | { \u003cstatements\u003e }\n\nexpression:\n    | nil\n    | cons \u003cexpression\u003e \u003cexpression\u003e\n    | hd \u003cexpression\u003e\n    | tl \u003cexpression\u003e\n    | \u003cvariable\u003e\n```\n\n### Grammar Extensions\n\nI also included the following as optional extensions in the parser.\nMost extensions resolve to their real code.\n\n#### Macros\n\n```antlr\nexpression:\n    | \u003c \u003cidentifier\u003e \u003e \u003cexpression\u003e\n```\n\n#### Conditionals\n\n```antlr\nstatement:\n    | if \u003cexpression\u003e \u003cstatement\u003e else \u003cstatement\u003e\n    | if \u003cexpression\u003e \u003cstatement\u003e\n```\n\n#### Literals\n\n```antlr\nexpressions:\n    | \u003cexpression\u003e\n    | \u003cexpressions\u003e , \u003cexpression\u003e\nexpression:\n    | \u003c \u003cexpression\u003e . 
\u003cexpression\u003e \u003e\n    | [ \u003cexpressions\u003e ]\n    | (true|false)\n    | [0-9]+\n```\n\nThere seem to be some discrepancies about what is considered 'core' to the \u003cem\u003eWHILE\u003c/em\u003e language.\n\n### Text \u0026 Print-out\n\nSome \u003cem\u003eWHILE\u003c/em\u003e implementations include text literals and a `print` keyword.\u003csup\u003e4\u003c/sup\u003e\nI chose not to include these.\nHowever, my tokeniser has quoted text support, so an extender would only need to handle the text-to-binary-tree\nconversion.\n\n### Numbers \u0026 Maths\n\nSome grammars and implementations include number literals.\u003csup\u003e1\u003c/sup\u003e \u003csup\u003e4\u003c/sup\u003e\nI also chose not to include these.\nMy reasoning was that part of the excitement of \u003cem\u003eWHILE\u003c/em\u003e is assembling everything from binary trees.\nMy tokeniser has support for number literals, and the built-in _Java_ operation methods have tree-to-number conversion.\n\n### If, Else \u0026 Switch\n\nSome grammars include language-level `if` and `switch` statements.\nI found this to be a little antithetical to the original purpose of \u003cem\u003eWHILE\u003c/em\u003e:\nI think the idea of being able to construct every other flow control statement from while-loops is betrayed slightly by\nalso including every other flow control statement.\n\nI included `if` and `if-else` as optional content in the model parser.\n\n### Skip\n\nA no-operation code `skip` is included in some BNF grammars for \u003cem\u003eWHILE\u003c/em\u003e, due to its presence in Hoare logic.\nSince I already had the block `{}` as an empty statement, I did not include this.\n\n## Example Macros\n\nSeveral programs (macros) are included as examples:\n\n1. Logic (not, and, or, xor, implication)\n2. (Positive) addition, subtraction, multiplication, division\n3. 
Deep-tree equality, number-kind test\n\n## While-in-While\n\nI wanted to create \u003cem\u003eWHILE\u003c/em\u003e-evaluation in \u003cem\u003eWHILE\u003c/em\u003e.\n\n### Instruction Set\n\nI stuck to the simplest possible program representation.\n\n1. A program is a list of instructions.\n2. Each instruction is a three-address list.\n3. The first element in an instruction is a numerical operation code\n   corresponding to the instruction.\n4. The interpretation of the following elements depends on the instruction.\n\nTheoretically, it is not very difficult to represent any \u003cem\u003eWHILE\u003c/em\u003e program as three-address code.\nThis is essentially what the model stage of my compiler does, and doing it in \u003cem\u003eWHILE\u003c/em\u003e itself is no different.\nThe only minor difference between _Java_ bytecode and three-address code is that bytecodes take up a variable number\nof slots, whereas the three-address code is a fixed number.\nI have cheated slightly in that, rather than jumping to subroutines within the instruction set,\nI simply call the evaluator with a sub-instruction.\n\nThe operation codes are displayed below.\n1. while \u003cexpr\u003e \u003cstmt\u003e\n2. read \u003cindex\u003e nil\n3. write \u003cindex\u003e \u003cexpr\u003e\n4. cons \u003cexpr\u003e \u003cexpr\u003e\n5. hd \u003cexpr\u003e nil\n6. tl \u003cexpr\u003e nil\n7. (tuple) \u003cstmt\u003e \u003cstmt\u003e\n\n### Evaluation\n\n1. Variables are indexed numerically in a **register** list\n2. A two-element stack is used to hold values.\n\n\n## References\n\n1. Jonathan Aldrich, \"The \u003cem\u003eWHILE\u003c/em\u003e Language and \u003cem\u003eWHILE3ADDR\u003c/em\u003e\n   Representation\", [cs.cmu.edu](https://www.cs.cmu.edu/~aldrich/courses/15-819O-13sp/resources/while-language.pdf).\n2. Giulio Guerrieri, \"Limits of Computation (4): WHILE-Semantics\", Lecture, University of Sussex, Feb. 2025.\n3. 
Aho, Sethi, Ullman, \"Compilers: Principles, Techniques, and Tools\", Addison-Wesley, 1986.\n4. Leonardo Lucena, \"While language\", [whilelang](https://lrlucena.github.io/whilelang/#grammar), Federal Institute of\n   Education, Science \u0026 Technology, Brazil.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoderocky%2Fwhilezie","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmoderocky%2Fwhilezie","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoderocky%2Fwhilezie/lists"}