{"id":20554025,"url":"https://github.com/vasilescur/ambush","last_synced_at":"2026-03-19T16:22:36.338Z","repository":{"id":115938484,"uuid":"236882515","full_name":"vasilescur/ambush","owner":"vasilescur","description":"Compiler for Tiger language, written in Standard ML for Duke ECE/CS 553: Compiler Construction. The compiler follows the standard flow of lexing, parsing, semantic analysis and type checking, intermediate representation, liveness analysis, and code generation.","archived":false,"fork":false,"pushed_at":"2020-05-02T01:58:28.000Z","size":2789,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-06T06:23:13.826Z","etag":null,"topics":["compiler-construction","compilers","lexer","parser","sml","tiger-language"],"latest_commit_sha":null,"homepage":"","language":"Standard ML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vasilescur.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-01-29T01:42:00.000Z","updated_at":"2023-04-28T13:06:52.000Z","dependencies_parsed_at":"2023-06-19T02:15:00.169Z","dependency_job_id":null,"html_url":"https://github.com/vasilescur/ambush","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/vasilescur/ambush","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vasilescur%2Fambush","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vasilescur%2Fambush/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/vasilescur%2Fambush/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vasilescur%2Fambush/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vasilescur","download_url":"https://codeload.github.com/vasilescur/ambush/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vasilescur%2Fambush/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30209796,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-07T05:23:27.321Z","status":"ssl_error","status_checked_at":"2026-03-07T05:00:17.256Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compiler-construction","compilers","lexer","parser","sml","tiger-language"],"created_at":"2024-11-16T02:46:06.564Z","updated_at":"2026-03-07T08:01:31.195Z","avatar_url":"https://github.com/vasilescur.png","language":"Standard ML","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg width=\"300\" align=\"center\" alt=\"logo-large\" src=\"https://user-images.githubusercontent.com/10100323/73473240-6a6f0600-435a-11ea-95f7-57841d91c49e.png\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  Compiler for Tiger programming language\u003cbr/\u003ewritten in Standard ML for Duke \u003ci\u003eECE/CS 553: Compiler 
Construction\u003c/i\u003e.\n\u003c/p\u003e\n\n### Group Members\n\n- Jake Derry\n- \u003cstrike\u003eRyan Piersma\u003c/strike\u003e\n- Radu Vasilescu\n\n#### Why the name Ambush?\n\n\u003e A group of tigers is either called a \"streak\" or an \"ambush.\"\n\n\u003csub\u003eSource: Archived copy of *Animal group names*. Zoological Society of San \nDiego. Archived from the original on July 4, 2013.\u003c/sub\u003e\n\n### Usage Instructions\n\n#### Using `make` and packaged executable\n\nTo build:\n\n```bash\nmake\n```\n\nTo use the compiler:\n\n```\n./ambush [tiger file]\n```\n\nThis will place the output file in the same directory as the source file, with\nthe extension `.s` appended (for example `test1.tig` becomes `test1.tig.s`).\n\nTo run the resulting assembly file using SPIM:\n\n```\nspim -file [assembly file]\n```\n\nTo perform this whole process at once, use the `test.sh` script. For \nexample, to build the compiler, compile test 51, and run the result using SPIM, \njust execute:\n\n```bash\ntc=test51.tig ./test.sh\n```\n\nFor more information, check the `Makefile`. \n\n**Note: This has only been tested on MacOS. If using Linux, Windows, or other OS,\nplease build the project manually using `sml` then `CM.make \"sources.cm\"`. 
**\n\n#### Manually building and testing using REPL\n\nTo test the compiler, open a terminal in the main folder and run:\n\n```bash\nsml run.sml\n```\n\nIn the resulting SML REPL, execute:\n\n```sml\nMain.compile \"testcases/myProgram.tig\";\n```\n\n\n### Project Structure and Files\n\n#### General\n\n- `errormsg.sml` provides a signature for creating helpful error messages\n- `sml-style-guide.pdf` outlines a style convention for writing SML code\n- `sources.cm` is the \"Makefile\" for the Compilation Manager\n- The `testcases/` folder contains several Tiger example programs\n- The `.cm/` folder contains Compilation Manager auto-generated files\n- `.gitignore` is used by Git to exclude files\n\n#### Lexer\n\n- `tiger.lex` is our ML-Lex definition file for Tiger\n- `tiger.lex.sml.ours` is the auto-generated lexer from our own `tiger.lex`\n- `tiger.lex.sml` is the auto-generated lexer from the TEXTBOOK author\n- `tokens.sig` and `tokens.sml` are the starter code files for tokens and \n  signatures. **They are unused** currently, in favor of the auto-generated\n  tokens and signatures created by ML-Yacc.\n\n#### Parser\n\n- `tiger.grm` is our ML-Yacc grammar definition for Tiger\n- `tiger.grm.desc` is an auto-generated file from ML-Yacc\n- `tiger.grm.sig` contains auto-generated code from ML-Yacc\n- `tiger.grm.sml` contains the auto-generated parser by ML-Yacc\n\n```\nTODO: Finish this description/list\n```\n\n\n## Lexer\n\nThe lexer is responsible for turning source code into tokens. \n\n### Comments\n\nAmbush handles comments by creating a `COMMENT` state which represents being \ninside a comment. There is also a variable called `inComment` which keeps track \nof whether the lexer is currently inside a `COMMENT` state. 
\n\nThese are the relevant few rules:\n\n```sml\nval inComment : bool ref \n```\n\n```sml\n\u003cREM\u003e\"Enter comment.\" =\u003e (continue ());\n  \u003cINITIAL\u003e\"/*\" =\u003e (YYBEGIN COMMENT; inComment := true; continue ());\n\n\u003cREM\u003e\"Exit comment.\" =\u003e (continue ());\n  \u003cCOMMENT\u003e\"*/\" =\u003e (YYBEGIN INITIAL; inComment := false; continue ());\n\n\u003cREM\u003e\"Ignore symbols and reserved words in comments.\" =\u003e (continue ());\n  \u003cCOMMENT\u003e.    =\u003e (continue ());\n```\n\nThe `inComment` variable is used to detect comments that are left unclosed at\nthe end of the file (see the *EOF handling* section for more details).\n\n### Strings\n\nWe handle strings by using two additional states: the `STRING` state which\nrepresents being inside a string and the `ESCAPE` state which represents being\ninside an escape sequence.\n\nAfter entry into the `STRING` state, all characters reached are stored in a\n`currentString` variable. When exiting the `STRING` state, the `currentString`\nis used to create the new token which represents a string.\n\nAdditionally, we deal with escape sequences after entering the `ESCAPE` \nstate: the character(s) following the backslash identify the escape, and the \ncharacter it denotes is appended to the `currentString` variable.\n\n### Error Handling\n\nIf the lexer encounters an invalid character anywhere in any state (that is to \nsay, any portion of code not already matched by a different rule), it presents\nan `ErrorMsg`, and does not emit a token for that portion of code:\n\n```sml\n. =\u003e (ErrorMsg.error yypos \n                     (\"illegal character \" ^ yytext ^ \"(ASCII \"\n                     ^ (Int.toString (Char.ord (hd (String.explode yytext)))) \n                     ^ \")\"); \n      continue ());\n```\n\n### EOF handling\n\nAt `EOF` (End-Of-File), we detect any unfinished strings or comments in the \n`eof` function. 
If ending in one of these non-accepting states, we raise an \nerror message that indicates whether the program was ended with an unfinished \nstring or comment. We always emit the `EOF` token whether an error was reported \nor not:\n\n```sml\n(* Deals with reaching the end of file. *)\nfun eof () = \n  let val pos = hd (!linePos) \n  in  case (!currentString, !inComment)\n        of (\"\", false) =\u003e Tokens.EOF (pos, pos)\n         | (\"\", true)  =\u003e (ErrorMsg.error pos \n                                          (\"Expected end of comment, \\\n                                          \\ found EOF\");\n                           Tokens.EOF (pos, pos))\n         | (_,  _)     =\u003e (ErrorMsg.error pos \n                                          (\"Expected end of string, \\\n                                          \\ found EOF\");\n                           Tokens.EOF (pos, pos))\n  end\n```\n\n## Parser\n\nImplemented a parser that takes in a set of tokens and outputs an AST.\n\nThe parser currently has neither shift/reduce nor reduce/reduce conflicts, and seems\nto parse the Tiger test cases correctly. For example, the following Tiger program:\n\n```sml\n/* an array type and an array variable */\nlet\n    type  arrtype = array of int\n    var arr1:arrtype := arrtype [10] of 0\nin\n  arr1\nend\n```\n\nParses to the tree:\n\n```sml\nLetExp([\n VarDec(arr1,true,SOME(arrtype),\n  ArrayExp(arrtype,\n   IntExp(10),\n   IntExp(0))),\n TypeDec[\n  (arrtype,\n   ArrayTy(int))]],\n VarExp(\n  SimpleVar(arr1)))\n```\n\n## Type Checking\n\nOur type checker helps the user find typing issues within their program to\nhelp debug their programs when the type checker fails. In addition, there are\nsome conditions when the type checker fails because of an internal issue. 
These\nconditions are noted by raising an exception that causes the compiler to crash\nand currently, these conditions are unreachable.\n\nAfter type checking the entire program (which helps users find the issues within \ntheir programs), the rest of the compilation process does not continue.\n\nThe type checker produces helpful error messages to make debugging as easy as\npossible. A lot of the type checking error outputs are based on the outputs\nthat SML gives. For example, here is the error message produced for `test22.tig`.\n\n```sml\nlet \n\ttype rectype = {name:string , id:int}\n\tvar rec1 := rectype {name=\"Name\", id=0}\nin\n\trec1.nam := \"asd\"\nend\n```\n\nError message:\n\n```\ntestcases/test22.tig:7.2:Type Checking Error: Could not find field nam\n    Expected: { nam : 'a, ...}\n    Actual:   { id : int, name : string,  }\n```\n\n## Intermediate Representation (IR)\n\nThe next stage is conversion to Intermediate Representation, a format in which\nthe code is represented as a set of trees consisting of basic operations. For \nexample, the following Tiger code:\n\n```sml\nlet var a := 7\n    var b := 9\n    var c := 3\nin  c := a + b\nend\n```\n\nTranslates to the following IR (comments added for explanation):\n\n```sml\n(* Assign initial values to variables on stack *)\nMOVE(MEM (BINOP (PLUS, TEMP tt25, CONST ~4)),\n     CONST 7)\nMOVE(MEM (BINOP (PLUS, TEMP tt25, CONST ~8)),\n     CONST 9)\nMOVE(MEM (BINOP (PLUS, TEMP tt25, CONST ~12)),\n     CONST 3)\n\n(* do c \u003c- a + b, where (a, b, c) are on stack *)\nMOVE(MEM (BINOP (PLUS, TEMP tt25, CONST ~12)),\n     BINOP(PLUS, MEM (BINOP (PLUS, TEMP tt25, CONST ~4)),\n                 MEM (BINOP (PLUS, TEMP tt25, CONST ~8))))\n```\n\n## Instruction Selection \nIn order to produce MIPS assembly language instructions, the IR must be \nconverted to instructions through the Instruction Selection process. 
\n\nFor example, Instruction Selection produces the following output for the IR\nabove:\n\n```asm\nPROCEDURE L0\nL0: \nL2:\n    addi t0, r0, 7\n    sw t0, ~4(t25)\n    addi t1, r0, 9\n    sw t1, ~8(t25)\n    addi t2, r0, 3\n    sw t2, ~12(t25)\n    lw t4, ~4(t25)\n    lw t5, ~8(t25)\n    add t3, t4, t5\n    sw t3, ~12(t25)\n    j L1\nL1:\nEND L0\n```\n \n\n## Liveness Analysis\n\nThe liveness analysis stage first builds a control-flow graph of the program,\nand then computes the \"liveness\" of each temp at every node in the graph. Then,\nit generates an interference graph, where every node is a temp and every edge \nsignifies that those temps are live at the same time.\n\nHere is an example of a Liveness Analysis on the following Tiger program:\n\n```sml\nlet var a := 0\nin  while(a \u003c 10) do a := a + 1; nil\nend\n```\n\n\nControl-flow Graph:\n\n![cfg](https://user-images.githubusercontent.com/10100323/79820219-bd54ca00-8359-11ea-9952-177c54baf053.png)\n\n\n\nLiveness Results:\n\n```\nNode: nid = 0   liveIn:  , t25  liveOut: , t25  move:    N/A\nNode: nid = 1   liveIn:  , t25  liveOut: , t1, t25      move:    N/A\nNode: nid = 2   liveIn:  , t1, t25      liveOut: , t25  move:    N/A\nNode: nid = 3   liveIn:  , t25  liveOut: , t25  move:    N/A\nNode: nid = 4   liveIn:  , t25  liveOut: , t2, t25      move:    N/A\nNode: nid = 5   liveIn:  , t2, t25      liveOut: , t2, t4, t25  move:    N/A\nNode: nid = 6   liveIn:  , t2, t4, t25  liveOut: , t2, t4, t5, t25      move:    N/A\nNode: nid = 7   liveIn:  , t2, t4, t5, t25      liveOut: , t2, t3, t25  move:    N/A\nNode: nid = 8   liveIn:  , t2, t3, t25  liveOut: , t25  move:    N/A\nNode: nid = 9   liveIn:         liveOut:        move:    N/A\nNode: nid = 10  liveIn:         liveOut:        move:    N/A\nNode: nid = 11  liveIn:  , t25  liveOut: , t25  move:    N/A\nNode: nid = 12  liveIn:  , t25  liveOut: , t6, t25      move:    N/A\nNode: nid = 13  liveIn:  , t6, t25      liveOut: , t0, t25      move:    t0 \u003c- 
t6\nNode: nid = 14  liveIn:  , t0, t25      liveOut: , t0, t8, t25  move:    N/A\nNode: nid = 15  liveIn:  , t0, t8, t25  liveOut: , t0, t25      move:    N/A\nNode: nid = 16  liveIn:  , t0, t25      liveOut: , t0, t9, t25  move:    N/A\nNode: nid = 17  liveIn:  , t0, t9, t25  liveOut: , t25  move:    N/A\nNode: nid = 18  liveIn:  , t25  liveOut: , t25  move:    N/A\nNode: nid = 19  liveIn:         liveOut:        move:    N/A\n```\n\nWhich then generates the interference graph:\n\n![interference-graph](https://user-images.githubusercontent.com/10100323/79820223-c180e780-8359-11ea-80fb-b8e1de9e86f8.png)\n\n\n## Register Allocation\n\nThe Register Allocation phase of the compiler applies a graph-coloring algorithm to the\ninterference graph in order to \"color\" (assign) temps to certain physical registers. The idea\nis that multiple temps can be assigned to the same physical register, so long as they do not\ninterfere with one another (that is, share an edge in the interference graph because they are\nlive at the same time). \n\nIn addition, we had to keep track of pre-allocated registers, that is, special machine registers\nsuch as the frame pointer, return address, and so on.\n\nWe have not implemented spilling or coalescing, meaning that for now, we can only handle\ncompiling Tiger programs that use a limited number of temps. If a program tries to use too many\ntemps, we won't have room in the physical registers and are not yet able to spill to memory. \n\nOne other improvement that we made during this stage is that instead of creating a series of\n`MOVE`s to save and restore the caller-saved registers before and after a function call,\nthe compiler now saves those registers' values to local variables allocated within the\ncurrent frame, which saves a lot of register space and raises the number of temps a program\ncan use before spilling would be needed. \n\nHere is an example of the register allocator at work. 
The following Tiger program:\n\n```sml\nlet var a := 0\nin  while(a \u003c 10) do a := a + 1; nil\nend\n```\n\nCompiles to the following MIPS assembly with correct physical registers allocated:\n\n```asm\n.text\n    j    L0\n.text\n# PROCEDURE L0\nL0: \nL5:\n    addi $t1, $0, 0\n    sw   $t1, -4($fp)\nL2:\n    addi $v0, $0, 1\n    lw   $a0, -4($fp)\n    addi $a1, $0, 10\n    slt  $a0, $a0, $a1\n    beq  $v0, $a0, L3\n    b    L1\nL1:\n    j    L4\nL3:\n    lw   $t0, -4($fp)\n    addi $a3, $t0, 1\n    addi $a2, $a3, 0\n    sw   $a2, -4($fp)\n    j    L2\nL4:\n    \n# END L0\n```\n\n## Putting it All Together\n\nTo finish, we added some miscellaneous fixes throughout the project.\n\nFunctions now get their arguments/formals from the right `access`es, and follow\nproper calling conventions for saving and restoring registers.\n\nWe also (fixed and then) added the Tiger runtime library as provided by Professor\nHilton, and augmented it with a few of our own functions, such as `print_int`, \nwhich use MIPS syscalls to easily accomplish what would otherwise have been complicated\nfeatures to implement in plain Tiger.\n\nWe also revamped the project's build and testing system by switching to `make`\nand `ml-build` to package a heap snapshot of the SMLofNJ environment for ease\nof use.\n\n\n## Extra Credit Features\n\n### Musical Tiger\n\nWe had an idea to implement a musical compiler that produces sound output\nrelevant to each stage of the compilation process as it runs. Upon realizing\nthe sheer stupidity of this idea, it has been quarantined to its own branch.\nTo play with this feature, checkout the branch `musical` and see its version\nof the `README.md`.\n\nNo guarantees are made that the `musical` branch will be maintained or updated\npast the state that it was as of 2020-02-27 at 12:15 PM. 
As of now, it should \nnot be considered part of our official submission.\n\n## Future Possible Compiler Features (Extra Credit)\n\n - [ ] SML Formatter\n    - [ ] Tiger formatter\n    - [ ] Tiger VS Code extension\n    - [ ] Coloring formatting for (let, in, end)\n - [ ] Garbage collector\n - [ ] Rich error reporting\n - [ ] Tiger REPL\n - [ ] More/better Tiger libraries\n    - [ ] Math library\n    - [ ] Data structures library\n - [ ] Optimized compilation for different processors\n - [ ] Tiger web framework (integration)? See [`SML on Stilts`](https://github.com/j4cbo/stilts)...\n\n\u003cbr/\u003e \u003cbr/\u003e\n\n## Dank Memes For Your Consideration and Enjoyment\n\n![Binary tree pants meme](https://i.kym-cdn.com/photos/images/original/001/272/773/6dd.jpg)\n\n-----\n\n![Functional programming meme](https://pics.me.me/do-you-smoke-functional-very-time-programming-1s-more-effective-36314444.png)\n\n-----\n\n![Commit messages meme](https://pics.me.me/me-i-should-give-this-commit-a-proper-descriptive-message-58056481.png)\n\n-----\n\n\u003eDo you want us to send the cocaine directly to your email? \n\u003e -- Jake Derry, 2020, somehow contextually related to this project\n\n-----\n\n![image](https://user-images.githubusercontent.com/10100323/79821281-584ea380-835c-11ea-88fa-584e40e11d55.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvasilescur%2Fambush","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvasilescur%2Fambush","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvasilescur%2Fambush/lists"}