{"id":34148496,"url":"https://github.com/alysonnumberfive/compilergenerator","last_synced_at":"2026-03-09T21:03:58.138Z","repository":{"id":57622369,"uuid":"391710830","full_name":"AlysonNumberFIVE/CompilerGenerator","owner":"AlysonNumberFIVE","description":"A repo for my current compiler project","archived":false,"fork":false,"pushed_at":"2021-10-09T12:50:30.000Z","size":12194,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-06-20T12:42:59.350Z","etag":null,"topics":["compiler","go","lexer","metacompiler","programming-language"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlysonNumberFIVE.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-08-01T18:47:32.000Z","updated_at":"2022-07-12T06:17:34.000Z","dependencies_parsed_at":"2022-09-26T20:10:43.295Z","dependency_job_id":null,"html_url":"https://github.com/AlysonNumberFIVE/CompilerGenerator","commit_stats":null,"previous_names":["alysonnumberfive/compilergenerator","alysonbee/compilergenerator"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AlysonNumberFIVE/CompilerGenerator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlysonNumberFIVE%2FCompilerGenerator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlysonNumberFIVE%2FCompilerGenerator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlysonNumberFIVE%2FCompilerGenerator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlysonNumberFIV
E%2FCompilerGenerator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlysonNumberFIVE","download_url":"https://codeload.github.com/AlysonNumberFIVE/CompilerGenerator/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlysonNumberFIVE%2FCompilerGenerator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30312140,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-09T20:05:46.299Z","status":"ssl_error","status_checked_at":"2026-03-09T19:57:04.425Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compiler","go","lexer","metacompiler","programming-language"],"created_at":"2025-12-15T04:46:52.457Z","updated_at":"2026-03-09T21:03:58.133Z","avatar_url":"https://github.com/AlysonNumberFIVE.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CompilerGenerator\nThe purpose of this repo is to store all the code for my current \u003ca href=\"https://alysonn.medium.com/lets-play-with-meta-compilers-designing-an-automated-scanner-2ffe2e0e609c\" target=\"_blank\"\u003eMedium\u003c/a\u003e project.\u003cbr\u003e\n\nThis project, currently entitled Compiler Generator, is a series of project/articles covering my attempt at making a metacompiler from scratch in Golang, 
following along with the book \u003ca href=\"https://www.amazon.com/Engineering-Compiler-Keith-Cooper/dp/012088478X\" blank_=\"\" \u003eEngineering a Compiler\u003c/a\u003e.\u003cbr\u003e\u003cbr\u003e\n\n## Table of Contents\n  - \u003cb\u003eCompiler Structure\u003c/b\u003e\n  - \u003cb\u003eProgress so Far\u003c/b\u003e\n      - \u003cb\u003eThe Scanner\u003c/b\u003e\n      - \u003cb\u003eSpec File Format\u003c/b\u003e\n      - \u003cb\u003eIMPORTANT: Controlling spec syntax tokens clashing with target language\u003c/b\u003e\n  - \u003cb\u003eTODO\u003c/b\u003e\n  - \u003cb\u003eA short demo\u003c/b\u003e\n  - \u003cb\u003eHow to run it\u003c/b\u003e\n       - \u003cb\u003eChanges to Execution on the way\u003c/b\u003e\n\n## Compiler Structure\n\nThis project is divided into three parts:\n\n-\u003cb\u003eThe Frontend\u003c/b\u003e    - The part of the compiler responsible for ingesting raw source code, extracting valid tokens, and verifying the semantics of the language.\u003cbr\u003e\n-\u003cb\u003eThe middleware\u003c/b\u003e  - Intermediate code generation that converts source code into an intermediate pseudocode-like form. Optimization happens here.\u003cbr\u003e\n-\u003cb\u003eThe Backend\u003c/b\u003e      - The part of the compiler responsible for converting the intermediate representation into the target system's assembly architecture.\u003cbr\u003e\n\nThe aim is to create a metacompiler that, once it has ingested a language's specification alongside valid source file(s) written in that language, compiles those sources for the target architecture of a specified system.\u003cbr\u003e\n\n## Progress so Far \n\n### The Scanner\n\nCurrently, an automated scanner generator is implemented. 
Similar to the `lex/flex` utilities, this one runs on a spec file based mainly on regex pattern matching.\u003cbr\u003e\n\nA C language spec file (see the full file in `specfiles/c.spec`):\n```\n\n# Classifier\n\nalphabet    [_a-zA-Z]\ndigit       [0-9]\nnumber      {digit}+\nnewline     %NEWLINE\nword        {alphabet}({alphabet}|{digit})*\nsymbols     [-+/\\*\u0026!\\|\\{\\}=\u003e\u003c:^;,]\nequ         ([+-/=*!\u0026\\|]|((\u003e)?\u003e)|((\u003c)?\u003c))?=\nleft        (\u003c)?\u003c\nright       (\u003e)?\u003e\nbrackets    [\\[\\]\\(\\)]\ncomment     //.*{newline}\nmcomment    /\\*.*\\*/\nfloat       [0-9]+((\\.[0-9]*)|e((\\+|-)?[0-9]+))\nhex         0[xX][a-fA-Z0-9]+\nstring      \".*\"\nchar        '[(\\')(\\t)(\\n)]|(.*)'\narrow\t\t\t  -\u003e\n\n%%\n\n# Delim\n'     {char}\n\"     {string}\n//    {comment}\n/\\*   {mcomment}\n\n%%\n\n# TokenType\n\n{string}    STRING\n{number}    INTEGER\n{word}      ID\n{char}      CH\nchar        CHAR\nint         INT\nlong        LONG\nvoid        VOID\nunsigned    UNSIGNED\n*           STAR\n...\n```\n### Spec file format\nThe spec file is akin to a modified version of the `.l` format used to compile `lex` parsers. Unlike `lex`, which generates `.c` source files that are then compiled into the parser, `spec` is the first part of the metacompiler and directly outputs a `Go` list of structs, one per token. With a `spec` file set, it reads input files and tokenizes them according to the rules in the spec file. \n\n#### Structure\nSimilar to the `lex` file format, the `spec` format has 3 sections, each divided by a pair of `%%`. The spec design is based on \ntable-driven scanner design.\u003cbr\u003e\n- Note: a table-driven scanner makes use of 3 tables: one with a collection of acceptable DFAs used by the language, a second for classifying input types, and a third that holds the valid tokens generated by accepting DFA states. 
The spec file format is designed to mimic this behaviour with a few modifications to simplify the programming process.\u003cbr\u003e\n\nEach section of the spec file's data is laid out as follows:\u003cbr\u003e\n- \u003cb\u003eClassifier list\u003c/b\u003e - This has a list of all the language's regexes. These, alongside helper functions inside the scanner, simulate the concept of DFA state traversal, the saving of accepting states and rollbacks in case of failed states.\n- \u003cb\u003eDelim list\u003c/b\u003e      - This table is for all language constructs that rely on delimiting tokens to end their parsing. Tokens that go here are those for strings and comments.\n- \u003cb\u003eTokenType list\u003c/b\u003e  - These are all the valid tokens that are accepted by the specified language.\n\nA slight drawback with this format's design is the heavy reliance on regex knowledge to simulate the DFAs needed for the target language. An optional add-on will be an init file that auto-generates common regexes for common language constructs to speed up spec file development.\n\n### IMPORTANT: Controlling spec syntax tokens clashing with target language\n\nA glaring issue I ran into was differentiating tokens like `{` and `}` that are to be parsed from the ones that are part of the spec file's syntax. This is handled by variables written in all-caps and prefixed with `%`. Ones that are currently set by default are:\n```\n%NEWLINE in place of \\n\n%TAB in place of \\t\n%PC in place of %\n%LBC in place of {\n%RBC in place of }\n%HSH in place of #\n```\nShould you need to define `\\n` or `\\t` as part of your language's spec, use these variables in place of their literals to avoid your tokens being treated as spec syntax.\n\n\n## TODO\n- Add functionality for error logging and reporting. 
This will be quite monolithic as this must span the entire compiler.\n- Complete unit testing\n- More reading for the Automated Parsing phase\n- Resolve clashing tokens -- IN TESTING\n\n## A short demo\n\nNote: this demo's slow as I didn't realize until I'd done the work of extracting it and saving it that my mouse scrolling wasn't tracked (I forgot how session recording works :'| ) so all the still/empty space is me scrolling... I'd recommend you give it a go if it looks interesting but this kinda gives you an idea of how it works anyway :).\u003cbr\u003e\u003cbr\u003e\n\u003cimg src=\"https://github.com/AlysonBee/CompilerGenerator/blob/master/assets/demoScreen.gif\" height=\"400\"/\u003e\n\n## How to run it\n\nStart by compiling the project\n```\ngo build .\n```\nThen, set your spec file in the `config` file. The current spec (for demo purposes) can be found in `./specfiles`.\u003cbr\u003e\u003cbr\u003e\nInside `config`\n```\nconfig:[specfile path]\n```\nAnd then run the scanner against a target source file\n```\n./compiler [source file]\n```\n#### Changes to Execution on the way\n\n1. The option of passing in a config spec file will be added as a command line parameter\u003cbr\u003e\nExample:\n```\n./compiler --config [spec file]\n```\n2. The option to output a token file with all your tokenized values with a command line parameter\u003cbr\u003e\nExample:\n```\n./compiler --outfile [filename] \n```\n3. 
A default `init` option for spec file creation that will initialize an empty spec file with all common regex patterns already set.\u003cbr\u003e\nExample:\n```\n$\u003e./compiler --init-spec\n# Classifier \n\nalphabet                [_a-zA-Z]\ndigit                   [0-9]\nnumber                  {digit}+\nnewline                 %NEWLINE\nword                    {alphabet}({alphabet}|{digit})*\nsymbols                 [-+/\\*\u0026!\\|=\u003e\u003c:^;,]\nlbrace                  %LBRC\nrbrace                  %RBRC\nequ                     ([+-/=*!\u0026\\|]|((\u003e)?\u003e)|((\u003c)?\u003c))?=\nleft                    (\u003c)?\u003c\nright                   (\u003e)?\u003e\nbrackets                [\\[\\]\\(\\)]\nfloat                   [0-9]+((\\.[0-9]*)|e((\\+|-)?[0-9]+))\nhex                     0[xX][a-fA-Z0-9]+\nstring                  \".*\"\nchar                    '[(\\')(\\t)(\\n)]|(.*)'\n\n%%\n\n# Delims\n'       {char}\n\"       {string}\n0[xX]   {hex}\n\n%%\n# TokenType\n\n# tokens go here\n$\u003e\n```\n\n#### Coded extensively by AlysonBee (Alyson Ngonyama) \n\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falysonnumberfive%2Fcompilergenerator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falysonnumberfive%2Fcompilergenerator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falysonnumberfive%2Fcompilergenerator/lists"}