{"id":13467486,"url":"https://github.com/franko/luajit-lang-toolkit","last_synced_at":"2025-04-05T06:09:51.037Z","repository":{"id":13969950,"uuid":"16670536","full_name":"franko/luajit-lang-toolkit","owner":"franko","description":"A Lua bytecode compiler written in Lua itself for didactic purposes or for new language implementations","archived":false,"fork":false,"pushed_at":"2020-08-29T14:16:58.000Z","size":464,"stargazers_count":665,"open_issues_count":8,"forks_count":91,"subscribers_count":44,"default_branch":"master","last_synced_at":"2025-03-29T05:10:07.588Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Lua","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/franko.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":"franko","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2014-02-09T16:53:44.000Z","updated_at":"2025-03-21T12:18:16.000Z","dependencies_parsed_at":"2022-08-09T08:36:20.099Z","dependency_job_id":null,"html_url":"https://github.com/franko/luajit-lang-toolkit","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/franko%2Fluajit-lang-toolkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/franko%2Fluajit-lang-toolkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/franko%2Fluajit-lang-toolkit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/franko%2Fluajit-lang-toolkit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/franko","download_url":"https://codeload.github.com/franko/luajit-lang-toolkit/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247294541,"owners_count":20915340,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T15:00:57.167Z","updated_at":"2025-04-05T06:09:51.016Z","avatar_url":"https://github.com/franko.png","language":"Lua","funding_links":["https://github.com/sponsors/franko"],"categories":["Lua","Resources"],"sub_categories":["Lua VM in Lua"],"readme":"LuaJIT Language Toolkit\n===\n\nThe LuaJIT Language Toolkit is an implementation of the Lua programming language written in Lua itself.\nIt works by generating LuaJIT bytecode, including debug information, and uses LuaJIT's virtual machine to run the generated bytecode.\n\nOn its own, the language toolkit does not do anything useful, since LuaJIT itself does the same things natively.\nThe purpose of the language toolkit is to provide a starting point to implement a programming language that targets the LuaJIT virtual machine.\n\nWith the LuaJIT Language Toolkit, it is easy to create a new language or modify the Lua language because the parser is cleanly separated from the bytecode generator and the virtual machine.\n\nThe toolkit implements a complete pipeline to parse a Lua program, generate an AST, and generate the corresponding bytecode.\n\nLexer\n---\n\nIts role is to recognize lexical elements from the program 
text.\nIt takes the text of the program as input and produces a stream of \"tokens\" as its output.\n\nUsing the language toolkit you can run only the lexer to examine the stream of tokens:\n\n```\nluajit run-lexer.lua tests/test-1.lua\n```\n\nThe command above will lex the following code fragment:\n\n```lua\nlocal x = {}\nfor k = 1, 10 do\n    x[k] = k*k + 1\nend\n```\n\n...to generate the list of tokens:\n\n    TK_local\n    TK_name\tx\n    =\n    {\n    }\n    TK_for\n    TK_name\tk\n    =\n    TK_number\t1\n    ,\n    TK_number\t10\n    TK_do\n    TK_name\tx\n    [\n    TK_name\tk\n    ]\n    =\n    TK_name\tk\n    *\n    TK_name\tk\n    +\n    TK_number\t1\n    TK_end\n\nEach line represents a token where the first element is the kind of token and the second element is its value, if any.\n\nThe lexer's code is an almost literal translation of LuaJIT's lexer.\n\nParser\n---\n\nThe parser takes the token stream from the lexer and builds statements and expressions according to the language's grammar.\nThe parser is based on a list of parsing rules that are invoked each time the input matches a given rule.\nWhen the input matches a rule, a corresponding function in the AST (abstract syntax tree) module is called to build an AST node.\nThe generated nodes are in turn passed as arguments to the other parsing rules until the whole program is parsed and a complete AST is built for the program text.\n\nThe AST is a useful abstraction of the program's structure and is easier to manipulate than the source text.\n\nWhat distinguishes the language toolkit from LuaJIT is that the parser phase generates an AST, and the bytecode generation is done in a separate phase only when the AST is complete.\n\nLuaJIT itself operates differently.\nDuring the parsing phase it does not build an AST; instead, the bytecode is generated directly and loaded into memory to be executed by the VM.\nThis means that LuaJIT's C implementation performs three operations:\n\n- parse 
the program text\n- generate the bytecode\n- load the bytecode into memory\n\nin a single pass.\nThis approach is remarkable and very efficient, but it makes it difficult to modify or extend the programming language.\n\n### Parsing Rule example ###\n\nTo illustrate how parsing works in the language toolkit, consider an example.\nThe grammar rule for the \"return\" statement is:\n\n```\nexplist ::= {exp ','} exp\n\nreturn_stmt ::= return [explist]\n```\n\nIn this case the toolkit parser's rule will parse the optional expression list by calling the function `expr_list`.\nThen, once the expressions are parsed, the AST function `ast:return_stmt(exps, line)` is invoked with the expression list obtained before.\n\n```lua\nlocal function parse_return(ast, ls, line)\n    ls:next() -- Skip 'return'.\n    ls.fs.has_return = true\n    local exps\n    if EndOfBlock[ls.token] or ls.token == ';' then -- Base return.\n        exps = { }\n    else -- Return with one or more values.\n        exps = expr_list(ast, ls)\n    end\n    return ast:return_stmt(exps, line)\nend\n```\n\nAs you can see, the AST functions are invoked using the `ast` object.\n\nIn addition, the parser provides information about:\n\n* the function prototype\n* the syntactic scope\n\nThe first is used to keep track of some information about the current function being parsed.\n\nThe syntactic scope rules tell the user's rule when a new syntactic block begins or ends.\nCurrently this is not really used by the AST builder, but it can be useful for other implementations.\n\nThe Abstract Syntax Tree (AST)\n---\n\nThe abstract syntax tree represents the whole Lua program, with all the information the parser has gathered about it.\n\nOne possible approach to implementing a new programming language is to generate an AST that more closely corresponds to the target programming language, and then transform the tree into a Lua AST in a separate phase.\n\nAnother possible approach is to directly generate the appropriate 
Lua AST nodes from the parser itself.\n\nCurrently the language toolkit does not perform any additional transformations, and just passes the AST to the bytecode generator module.\n\nBytecode Generator\n---\n\nOnce the AST is generated, it can be fed to the bytecode generator module, which will generate the corresponding LuaJIT bytecode.\n\nThe bytecode generator is based on the original work of Richard Hundt for the Nyanga programming language.\nI modified it extensively to produce optimized code similar to what LuaJIT itself would generate.\nA lot of work was also done to ensure the correctness of the bytecode and of the debug information.\n\nAlternative Lua Code generator\n------------------------------\n\nInstead of passing the AST to the bytecode generator, an alternative module can be used to generate Lua code.\nThe module is called \"luacode-generator\" and can be used exactly like the bytecode generator.\n\nThe Lua code generator has the advantage of being simpler and safer, because the generated code is parsed directly by LuaJIT, which ensures complete compatibility of the bytecode from the beginning.\n\nCurrently the Lua Code Generator backend does not preserve the line numbers of the original source code. This is meant to be fixed in the future.\n\nUse this backend instead of the bytecode generator if you prefer a safer backend to convert the Lua AST to code.\nThe module can also be used for pretty-printing a Lua AST, since the code itself is probably the most human-readable representation of the AST.\n\nC API\n---\n\nThe language toolkit provides a very simple set of C APIs to implement a custom language.\nThe functions provided by the C API are:\n\n```c\n/* The functions below are the equivalents of the corresponding luaL_*\n   functions. 
*/\nextern int language_init(lua_State *L);\nextern int language_report(lua_State *L, int status);\nextern int language_loadbuffer(lua_State *L, const char *buff, size_t sz, const char *name);\nextern int language_loadfile(lua_State *L, const char *filename);\n\n\n/* This function pushes onto the stack a Lua table with the functions:\n   loadstring, loadfile, dofile and loader.\n   The first three functions can replace the Lua functions while the\n   last one, loader, can be used as a customized \"loader\" function for\n   the \"require\" function. */\nextern int luaopen_langloaders(lua_State *L);\n\n/* OPTIONAL:\n   Load into package.preload lang.* modules using embedded bytecode. */\nextern void language_bc_preload(lua_State *L);\n```\n\nThe functions above can be used to create a custom LuaJIT executable that uses the language toolkit implementation.\n\nWhen one of the `language_*` functions is used, an independent `lua_State` is created behind the scenes and used to compile the bytecode.\nOnce the bytecode is generated, it is loaded into the user's `lua_State`, ready to be executed.\nUsing a separate Lua state ensures that the compilation process does not interfere with the user's application.\n\nThe function `language_bc_preload` is useful to create a standalone executable that does not depend on the presence of the Lua files at runtime.\nThe `lang.*` modules are compiled into bytecode and stored as static C data in the executable.\nBy calling the function `language_bc_preload` all the modules are *preloaded* using the embedded bytecode.\nThis feature can be disabled by changing the `BC_PRELOAD` variable in `src/Makefile`.\n\nHow to build\n---\n\nThe LuaJIT Language Toolkit can be compiled and optionally installed using Meson. Ensure that Meson is installed; the easiest way is to use pip, the Python package installer. 
Also ensure that LuaJIT is correctly installed, since the language toolkit requires it.\n\nOnce Meson and LuaJIT are installed, configure the build with the command:\n\n```sh\nmeson setup build\n```\n\nso that the 'build' directory will be used to build. You may also pass the preload option:\n\n```sh\nmeson setup -Dpreload=true build\n```\n\nThen build using 'ninja', Meson's default backend:\n\n```sh\n# build\nninja -C build\n\n# install\nninja -C build install\n```\n\nThe Meson-based build will take care of installing all the required Lua files, the library itself, the luajit-x executable and a pkg-config file.\n\nPlease note that when using the 'preload' option the Lua files will not be installed since they are embedded in the library itself.\n\nRunning the Application\n---\n\nThe application can be run with the following command:\n\n```\nluajit run.lua [lua-options] \u003cfilename\u003e\n```\n\nThe \"run.lua\" script just invokes the complete pipeline of the lexer, parser and bytecode generator, and passes the resulting bytecode to luajit with \"loadstring\".\n\nThe language toolkit also provides a customized executable named `luajit-x` that uses the language toolkit's pipeline instead of the native one.\nOtherwise, the program `luajit-x` works exactly the same as `luajit` itself, and accepts the same options.\n\nIn the standard build `luajit-x` will contain the `lang.*` modules as embedded bytecode data so that it does not rely on the Lua files at runtime.\n\nThis means that you can experiment with the language by modifying the Lua implementation of the language and test the changes immediately.\nIf the option `BC_PRELOAD` in `src/Makefile` is activated, you just need to recompile `luajit-x`.\n\nIf you work with the Lua files of the language toolkit, you may choose to disable the `BC_PRELOAD` variable to avoid recompiling the executable for each change in the Lua code.\n\n### Generated Bytecode ###\n\nYou can inspect the bytecode generated by the 
language toolkit by using the \"-b\" options.\nThey can be invoked either with standard luajit by using \"run.lua\" or directly using the customized program `luajit-x`.\n\nFor example, you can inspect the bytecode using the following command:\n\n```\nluajit run.lua -bl tests/test-1.lua\n```\n\nor alternatively:\n\n```\n./src/luajit-x -bl tests/test-1.lua\n```\n\nassuming that you are running `luajit-x` from the language toolkit's root directory.\n\nEither way, when you use one of the two commands above to generate the bytecode, you will see the following on the screen:\n\n```\n-- BYTECODE -- \"test-1.lua\":0-7\n0001    TNEW     0   0\n0002    KSHORT   1   1\n0003    KSHORT   2  10\n0004    KSHORT   3   1\n0005    FORI     1 =\u003e 0010\n0006 =\u003e MULVV    5   4   4\n0007    ADDVN    5   5   0  ; 1\n0008    TSETV    5   0   4\n0009    FORL     1 =\u003e 0006\n0010 =\u003e KSHORT   1   1\n0011    KSHORT   2  10\n0012    KSHORT   3   1\n0013    FORI     1 =\u003e 0018\n0014 =\u003e GGET     5   0      ; \"print\"\n0015    TGETV    6   0   4\n0016    CALL     5   1   2\n0017    FORL     1 =\u003e 0014\n0018 =\u003e RET0     0   1\n```\n\nYou can compare it with the bytecode generated natively by LuaJIT using the command:\n\n```\nluajit -bl tests/test-1.lua\n```\n\nIn the example above the generated bytecode will be *identical* to that generated by LuaJIT.\nThis is not an accident, since the Language Toolkit's bytecode generator is designed to produce the same bytecode that LuaJIT itself would generate.\nIn some cases, the generated code will differ. 
But this is not considered a big problem as long as the generated code is still semantically correct.\n\n### Bytecode Annotated Dump ###\n\nIn addition to the standard LuaJIT bytecode functions, the language toolkit also supports a special debug mode where the bytecode is printed byte-by-byte in hex format with some annotations on the right side of the screen.\nThe annotations will explain the meaning of each chunk of bytes and decode them as appropriate.\n\nFor example:\n\n```\nluajit run.lua -bx tests/test-1.lua\n```\n\nwill display something like:\n\n```\n1b 4c 4a 01             | Header LuaJIT 2.0 BC\n00                      | Flags: None\n11 40 74 65 73 74 73 2f | Chunkname: @tests/test-1.lua\n74 65 73 74 2d 31 2e 6c |\n75 61                   |\n                        | .. prototype ..\n8a 01                   | prototype length 138\n02                      | prototype flags PROTO_VARARG\n00                      | parameters number 0\n07                      | framesize 7\n00 01 01 12             | size uv: 0 kgc: 1 kn: 1 bc: 19\n31                      | debug size 49\n00 07                   | firstline: 0 numline: 7\n                        | .. 
bytecode ..\n32 00 00 00             | 0001    TNEW     0   0\n27 01 01 00             | 0002    KSHORT   1   1\n27 02 0a 00             | 0003    KSHORT   2  10\n27 03 01 00             | 0004    KSHORT   3   1\n49 01 04 80             | 0005    FORI     1 =\u003e 0010\n20 05 04 04             | 0006 =\u003e MULVV    5   4   4\n14 05 00 05             | 0007    ADDVN    5   5   0  ; 1\n39 05 04 00             | 0008    TSETV    5   0   4\n4b 01 fc 7f             | 0009    FORL     1 =\u003e 0006\n27 01 01 00             | 0010 =\u003e KSHORT   1   1\n27 02 0a 00             | 0011    KSHORT   2  10\n27 03 01 00             | 0012    KSHORT   3   1\n49 01 04 80             | 0013    FORI     1 =\u003e 0018\n34 05 00 00             | 0014 =\u003e GGET     5   0      ; \"print\"\n36 06 04 00             | 0015    TGETV    6   0   4\n3e 05 02 01             | 0016    CALL     5   1   2\n4b 01 fc 7f             | 0017    FORL     1 =\u003e 0014\n47 00 01 00             | 0018 =\u003e RET0     0   1\n                        | .. uv ..\n                        | .. kgc ..\n0a 70 72 69 6e 74       | kgc: \"print\"\n                        | .. knum ..\n02                      | knum int: 1\n                        | .. debug ..\n01                      | pc001: line 1\n02                      | pc002: line 2\n02                      | pc003: line 2\n02                      | pc004: line 2\n02                      | pc005: line 2\n...\n```\n\nThis kind of output is especially useful for debugging the language toolkit itself because it accounts for every byte of the bytecode and includes all of its sections.\nFor example, you will be able to inspect the `kgc` or `knum` sections where the prototype's constants are stored.\nThe output will also include the debug section in decoded form so that it can be easily inspected.\n\nComparing with the bytecode generated by LuaJIT requires a small trick, because LuaJIT itself does not support the `-bx` option. 
First generate the bytecode using luajit:\n\n```\nluajit -bg tests/test-1.lua test-1.bc\n```\n\nand then you can use the language toolkit with the `-bx` option to dump the content of the luajit-generated bytecode:\n\n```\nluajit run.lua -bx test-1.bc\n```\n\nso that you can compare the two outputs.\n\nCurrent Status\n---\n\nThe LuaJIT Language Toolkit should currently be considered beta software.\n\nThe implementation is now complete in terms of features and well tested, even for the most complex cases, and a complete test suite is used to verify the correctness of the generated bytecode.\n\nThe language toolkit is currently capable of executing itself.\nThis means that the language toolkit is able to correctly compile and load all of its modules and execute them correctly.\n\nStill, some bugs are probably present, and you should be cautious when using the LuaJIT Language Toolkit.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffranko%2Fluajit-lang-toolkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffranko%2Fluajit-lang-toolkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffranko%2Fluajit-lang-toolkit/lists"}