{"id":13645513,"url":"https://github.com/pdziepak/codegen","last_synced_at":"2025-04-09T14:15:18.988Z","repository":{"id":145833097,"uuid":"185667341","full_name":"pdziepak/codegen","owner":"pdziepak","description":"Experimental wrapper over LLVM for generating and compiling code at run-time.","archived":false,"fork":false,"pushed_at":"2019-12-04T15:27:44.000Z","size":112,"stargazers_count":379,"open_issues_count":2,"forks_count":19,"subscribers_count":18,"default_branch":"master","last_synced_at":"2025-04-02T08:36:03.877Z","etag":null,"topics":["c-plus-plus","codegen","jit","llvm"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pdziepak.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-05-08T19:27:46.000Z","updated_at":"2025-02-28T18:05:41.000Z","dependencies_parsed_at":"2024-01-14T09:57:42.887Z","dependency_job_id":"6043525c-94b6-4544-b4a3-b2654e3747da","html_url":"https://github.com/pdziepak/codegen","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pdziepak%2Fcodegen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pdziepak%2Fcodegen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pdziepak%2Fcodegen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pdziepak%2Fcodegen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pdziepak","download_url":"https://codeload.github.com/p
dziepak/codegen/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248054193,"owners_count":21039952,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","codegen","jit","llvm"],"created_at":"2024-08-02T01:02:36.256Z","updated_at":"2025-04-09T14:15:18.970Z","avatar_url":"https://github.com/pdziepak.png","language":"C++","readme":"# CodeGen\n\n[![Build Status](https://travis-ci.com/pdziepak/codegen.svg?branch=master)](https://travis-ci.com/pdziepak/codegen)\n[![codecov](https://codecov.io/gh/pdziepak/codegen/branch/master/graph/badge.svg)](https://codecov.io/gh/pdziepak/codegen)\n\nExperimental wrapper over LLVM for generating and compiling code at run-time.\n\n## About\n\nCodeGen is a library that builds on top of LLVM. It facilitates just-in-time code generation and compilation, including debugging information and human-readable source code. The C++ type system is employed to guard against at least some errors in the generated intermediate representation. The intention is to allow the application to improve performance by taking advantage of information that becomes available only once it is running. 
A sample use case would be prepared statements in database engines.\n\nThe general idea is not unlike that described in [P1609R0: C++ Should Support Just-in-Time Compilation](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1609r0.html).\n\n## Building\n\nThe build requirements are as follows:\n\n* CMake 3.12\n* GCC 8+ or Clang 8+\n* LLVM 8\n* fmt\n* Google Test (optional)\n\nThe `fedora:30` docker container may be a good place to start.\n\nThe build instructions are quite usual for a CMake-based project:\n\n```\ncd \u003cbuild-directory\u003e\ncmake -DCMAKE_BUILD_TYPE=\u003cDebug|Release\u003e -G Ninja \u003csource-directory\u003e\nninja\nninja test\n```\n\n## Design\n\nThe main object representing the JIT compiler is `codegen::compiler`. All function pointers to the compiled code remain valid during its lifetime. `codegen::module_builder` allows building a new module, while `codegen::module` represents an already compiled module. The general template for CodeGen use looks as follows:\n\n```c++\n  namespace cg = codegen;\n  auto compiler = cg::compiler{};\n  auto builder = cg::module_builder(compiler, \"module_name\");\n\n  auto function_reference = builder.create_function\u003cint(int)\u003e(\"function_name\",\n    [](cg::value\u003cint\u003e v) {\n      /* more logic here */\n      cg::return_(v + cg::constant\u003cint\u003e(1));\n    });\n\n  auto module = std::move(builder).build();\n  using function_pointer_type = int(*)(int);\n  function_pointer_type function_pointer = module.get_address(function_reference);\n```\n\nThe code above compiles a function that takes an integer argument and returns it incremented by one. Each module may contain multiple functions. 
`codegen::module_builder::create_function` returns a function reference that can be used to obtain a pointer to the function after the module is compiled (as in this example) or to call it from another function generated with CodeGen.\n\n`codegen::value\u003cT\u003e` is a typed equivalent of `llvm::Value` and represents an SSA value. As of now, only fundamental types are supported. CodeGen provides operators for those arithmetic and relational operations that make sense for a given type. Expression templates are used in a limited fashion to allow producing more concise human-readable source code. Unlike C++, there are no automatic promotions or implicit casts of any kind. Instead, `bit_cast\u003cT\u003e` or `cast\u003cT\u003e` must be used explicitly where needed.\n\nSSA form gets a bit more cumbersome to use once the control flow diverges and a Φ function is required. This can be avoided by using local variables, `codegen::variable\u003cT\u003e`. The resulting IR is not going to be perfect, but the LLVM optimisation passes tend to do an excellent job converting those memory accesses back into SSA values.\n\n### Statements\n\n* `return_()`, `return_(Value)` – returns from the function. Note that the type of the returned value is not verified and CodeGen will not prevent generating a function of type `int()` that returns `void`.\n* `load(Pointer)` – takes a pointer of type `T*` and loads the value from memory, producing a `codegen::value\u003cT\u003e`.\n* `store(Value, Pointer)` – stores `Value` of type `T` at the location pointed to by `Pointer`. The type of the pointer needs to be `T*`.\n* `if_(Value, TrueBlock, FalseBlock)`, `if_(Value, TrueBlock)` – an `if` conditional statement. The type of the provided value needs to be `bool`. `TrueBlock` and `FalseBlock` are expected to be lambdas. 
For example:\n\n```c++\nauto silly_function = builder.create_function\u003cbool(bool)\u003e(\"silly_function\",\n    [](cg::value\u003cbool\u003e is_true) {\n      cg::if_(is_true, [] { cg::return_(cg::true_()); }, [] { cg::return_(cg::false_()); });\n    });\n```\n\n* `while_(Condition, LoopBody)` – a `while` loop. `Condition` is a lambda returning a value of type `bool`. `LoopBody` is a lambda that generates the body of the loop. For example:\n\n```c++\nauto silly_function2 = builder.create_function\u003cunsigned(unsigned)\u003e(\"silly_function2\",\n    [](cg::value\u003cunsigned\u003e target) {\n      auto var = cg::variable\u003cunsigned\u003e(\"var\", cg::constant\u003cunsigned\u003e(0));\n      cg::while_([\u0026] { return var.get() \u003c target; },\n        [\u0026] {\n          var.set(var.get() + cg::constant\u003cunsigned\u003e(1));\n        });\n      cg::return_(var.get());\n    });\n```\n\n* `call(Function, Arguments...)` – a function call. `Function` is a function reference. `Arguments...` is a list of arguments matching the function type.\n\n## Examples\n\n### Tuple comparator\n\nIn this example, let's consider tuples whose element types are known only at run-time. If the goal is to write a less-comparator for such tuples, the naive approach would be to have a virtual function call for each element. That is far from ideal if the actual comparison is very cheap, e.g. the elements are integers. With CodeGen, we can do better. 
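For contrast, the naive virtual-dispatch baseline might look something like the following plain C++ sketch. This is hypothetical code, not part of CodeGen; `element_cmp`, `tuple_less` and the `demo_less_i32` helper are illustrative names, and the point is only the one indirect call paid per element.

```c++
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// One virtual call per element, dispatching on the run-time element type.
struct element_cmp {
  virtual ~element_cmp() = default;
  // Returns <0, 0 or >0 and advances off past this element.
  virtual int compare(std::byte const* a, std::byte const* b, std::size_t& off) const = 0;
};

template <typename T>
struct fundamental_cmp final : element_cmp {
  int compare(std::byte const* a, std::byte const* b, std::size_t& off) const override {
    T av, bv;
    std::memcpy(&av, a + off, sizeof(T));
    std::memcpy(&bv, b + off, sizeof(T));
    off += sizeof(T);
    return av < bv ? -1 : (bv < av ? 1 : 0);
  }
};

// Less-comparator paying one indirect call per element.
inline bool tuple_less(std::vector<std::unique_ptr<element_cmp>> const& cmps,
                       std::byte const* a, std::byte const* b) {
  std::size_t off = 0;
  for (auto const& c : cmps) {
    if (int r = c->compare(a, b, off); r != 0) return r < 0;
  }
  return false;
}

// Hypothetical helper: compares two single-element tuple<i32> buffers.
inline bool demo_less_i32(std::int32_t x, std::int32_t y) {
  std::vector<std::unique_ptr<element_cmp>> cmps;
  cmps.push_back(std::make_unique<fundamental_cmp<std::int32_t>>());
  std::byte a[sizeof x], b[sizeof y];
  std::memcpy(a, &x, sizeof x);
  std::memcpy(b, &y, sizeof y);
  return tuple_less(cmps, a, b);
}
```

For cheap element types the virtual dispatch and the per-element branch on the comparison result dominate the cost, which is exactly what run-time code generation avoids.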
First, let's write a comparator for an element of a fundamental type:\n\n```c++\ntemplate\u003ctypename T\u003e\nsize_t less_cmp(cg::value\u003cstd::byte const*\u003e a_ptr, cg::value\u003cstd::byte const*\u003e b_ptr, size_t off) {\n  auto a_val = cg::load(cg::bit_cast\u003cT*\u003e(a_ptr + cg::constant\u003cuint64_t\u003e(off)));\n  auto b_val = cg::load(cg::bit_cast\u003cT*\u003e(b_ptr + cg::constant\u003cuint64_t\u003e(off)));\n  cg::if_(a_val \u003c b_val, [\u0026] { cg::return_(cg::true_()); });\n  cg::if_(a_val \u003e b_val, [\u0026] { cg::return_(cg::false_()); });\n  return sizeof(T) + off;\n}\n```\n\nThis function template generates comparison code for any fundamental type. The arguments are pointers to buffers containing both tuples and an offset at which the element is located. The return value is the offset of the next element.\n\nNow, let's say we want to generate a less-comparator for `tuple\u003ci32, float, u16\u003e`.\n\n```c++\n  auto less = builder.create_function\u003cbool(std::byte const*, std::byte const*)\u003e(\n      \"less\", [\u0026](cg::value\u003cstd::byte const*\u003e a_ptr, cg::value\u003cstd::byte const*\u003e b_ptr) {\n        size_t offset = 0;\n        offset = less_cmp\u003cint32_t\u003e(a_ptr, b_ptr, offset);\n        offset = less_cmp\u003cfloat\u003e(a_ptr, b_ptr, offset);\n        offset = less_cmp\u003cuint16_t\u003e(a_ptr, b_ptr, offset);\n        (void)offset;\n        cg::return_(cg::false_());\n      });\n```\n\nAs we can see, building the actual comparator is quite straightforward. 
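For reference, a hand-written plain C++ function that is semantically equivalent to the generated comparator might look like the sketch below. This is illustrative code, not produced by CodeGen: `memcpy` stands in for the bit-cast-and-load, the offsets 0, 4 and 8 match the serialised `tuple\u003ci32, float, u16\u003e` layout, and `pack`/`demo` are hypothetical test helpers.

```c++
#include <cstddef>
#include <cstdint>
#include <cstring>

// Semantic equivalent of the generated "less" for tuple<i32, f32, u16>.
inline bool tuple_less(std::byte const* a, std::byte const* b) {
  std::int32_t a0, b0;
  std::memcpy(&a0, a + 0, 4);
  std::memcpy(&b0, b + 0, 4);
  if (a0 < b0) return true;
  if (a0 > b0) return false;
  float a1, b1;
  std::memcpy(&a1, a + 4, 4);
  std::memcpy(&b1, b + 4, 4);
  if (a1 < b1) return true;
  if (a1 > b1) return false;
  std::uint16_t a2, b2;
  std::memcpy(&a2, a + 8, 2);
  std::memcpy(&b2, b + 8, 2);
  if (a2 < b2) return true;
  if (a2 > b2) return false;
  return false;
}

// Hypothetical helpers for exercising the comparator.
inline void pack(std::byte* out, std::int32_t i, float f, std::uint16_t u) {
  std::memcpy(out + 0, &i, 4);
  std::memcpy(out + 4, &f, 4);
  std::memcpy(out + 8, &u, 2);
}

inline bool demo(std::int32_t i1, float f1, std::uint16_t u1,
                 std::int32_t i2, float f2, std::uint16_t u2) {
  std::byte a[10], b[10];
  pack(a, i1, f1, u1);
  pack(b, i2, f2, u2);
  return tuple_less(a, b);
}
```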
The human-readable source code that CodeGen generates looks like this:\n\n```c\n1   bool less(byte* arg0, byte* arg1) {\n2       val0 = *bit_cast\u003ci32*\u003e((arg0 + 0))\n3       val1 = *bit_cast\u003ci32*\u003e((arg1 + 0))\n4       if ((val0 \u003c val1)) {\n5           return true;\n6       }\n7       if ((val0 \u003e val1)) {\n8           return false;\n9       }\n10      val2 = *bit_cast\u003cf32*\u003e((arg0 + 4))\n11      val3 = *bit_cast\u003cf32*\u003e((arg1 + 4))\n12      if ((val2 \u003c val3)) {\n13          return true;\n14      }\n15      if ((val2 \u003e val3)) {\n16          return false;\n17      }\n18      val4 = *bit_cast\u003cu16*\u003e((arg0 + 8))\n19      val5 = *bit_cast\u003cu16*\u003e((arg1 + 8))\n20      if ((val4 \u003c val5)) {\n21          return true;\n22      }\n23      if ((val4 \u003e val5)) {\n24          return false;\n25      }\n26      return false;\n27  }\n\n```\n\nThe assembly that LLVM emits:\n\n```x86asm\n   0x00007fffefd57000 \u003c+0\u003e:   mov    (%rdi),%ecx\n   0x00007fffefd57002 \u003c+2\u003e:   mov    (%rsi),%edx\n   0x00007fffefd57004 \u003c+4\u003e:   mov    $0x1,%al\n   0x00007fffefd57006 \u003c+6\u003e:   cmp    %edx,%ecx\n   0x00007fffefd57008 \u003c+8\u003e:   jl     0x7fffefd57026 \u003cless+38\u003e\n   0x00007fffefd5700a \u003c+10\u003e:  cmp    %edx,%ecx\n   0x00007fffefd5700c \u003c+12\u003e:  jg     0x7fffefd57024 \u003cless+36\u003e\n   0x00007fffefd5700e \u003c+14\u003e:  vmovss 0x4(%rdi),%xmm0\n   0x00007fffefd57013 \u003c+19\u003e:  vmovss 0x4(%rsi),%xmm1\n   0x00007fffefd57018 \u003c+24\u003e:  vucomiss %xmm0,%xmm1\n   0x00007fffefd5701c \u003c+28\u003e:  ja     0x7fffefd57026 \u003cless+38\u003e\n   0x00007fffefd5701e \u003c+30\u003e:  vucomiss %xmm1,%xmm0\n   0x00007fffefd57022 \u003c+34\u003e:  jbe    0x7fffefd57027 \u003cless+39\u003e\n   0x00007fffefd57024 \u003c+36\u003e:  xor    %eax,%eax\n   0x00007fffefd57026 \u003c+38\u003e:  retq\n   0x00007fffefd57027 \u003c+39\u003e:  movzwl 
0x8(%rdi),%eax\n   0x00007fffefd5702b \u003c+43\u003e:  cmp    0x8(%rsi),%ax\n   0x00007fffefd5702f \u003c+47\u003e:  setb   %al\n   0x00007fffefd57032 \u003c+50\u003e:  retq\n```\n\nSince CodeGen takes care of emitting all necessary debugging information, and informing GDB about the JIT-ed functions, the debugging experience shouldn't be too bad:\n\n```\n(gdb) b 3\nBreakpoint 2 at 0x7fffefd57002: file /tmp/examples-11076310111440055155/tuple_i32f32u16_less.txt, line 3.\n(gdb) c\nContinuing.\n\nBreakpoint 2, less (arg0=0x60200001c7b0 \"\", arg1=0x60200001c790 \"\\001\") at /tmp/examples-11076310111440055155/tuple_i32f32u16_less.txt:3\n3\t    val1 = *bit_cast\u003ci32*\u003e((arg1 + 0))\n(gdb) p val0\n$1 = 0\n(gdb) n\n4\t    if ((val0 \u003c val1)) {\n(gdb) p val1\n$3 = 1\n(gdb) n\nless (arg0=0x60200001c7b0 \"\", arg1=0x60200001c790 \"\\001\") at /tmp/examples-11076310111440055155/tuple_i32f32u16_less.txt:5\n5\t        return true;\n```\n\nA more complicated example would be if one of the tuple elements was an ASCII string. 
The following code generates a comparator for `tuple\u003ci32, string\u003e` assuming that a string is serialised in the form of `\u003clength:u32\u003e\u003cbytes...\u003e`:\n\n```c++\n  auto less = builder.create_function\u003cbool(std::byte const*, std::byte const*)\u003e(\n      \"less\", [\u0026](cg::value\u003cstd::byte const*\u003e a_ptr, cg::value\u003cstd::byte const*\u003e b_ptr) {\n        size_t offset = 0;\n        offset = less_cmp\u003cint32_t\u003e(a_ptr, b_ptr, offset);\n\n        auto a_len = cg::load(cg::bit_cast\u003cuint32_t*\u003e(a_ptr + cg::constant\u003cuint64_t\u003e(offset)));\n        auto b_len = cg::load(cg::bit_cast\u003cuint32_t*\u003e(b_ptr + cg::constant\u003cuint64_t\u003e(offset)));\n        // TODO: extract to a separate function\n        auto len = cg::call(min, a_len, b_len);\n        auto ret = cg::builtin::memcmp(a_ptr + cg::constant\u003cuint64_t\u003e(offset) + 4_u64,\n                                       b_ptr + cg::constant\u003cuint64_t\u003e(offset) + 4_u64, len);\n        cg::if_(ret \u003c 0_i32, [\u0026] { cg::return_(cg::true_()); });\n        cg::if_(ret \u003e 0_i32, [\u0026] { cg::return_(cg::false_()); });\n        cg::return_(a_len \u003c b_len);\n      });\n```\n\nLet's look at the emitted assembly mixed with human-readable source code:\n\n```\n(gdb) disas /s less\nDump of assembler code for function less:\n/tmp/examples-12144749341750180701/tuple_i32str_less.txt:\n7  bool less(byte* arg0, byte* arg1) {\n   0x00007fffefd47010 \u003c+0\u003e:   push   %rbp\n   0x00007fffefd47011 \u003c+1\u003e:   push   %r14\n   0x00007fffefd47013 \u003c+3\u003e:   push   %rbx\n\n8      val6 = *bit_cast\u003ci32*\u003e((arg0 + 0))\n   0x00007fffefd47014 \u003c+4\u003e:   mov    (%rdi),%eax\n\n9      val7 = *bit_cast\u003ci32*\u003e((arg1 + 0))\n   0x00007fffefd47016 \u003c+6\u003e:   mov    (%rsi),%ecx\n   0x00007fffefd47018 \u003c+8\u003e:   mov    $0x1,%bl\n\n10      if ((val6 \u003c val7)) {\n   0x00007fffefd4701a 
\u003c+10\u003e:  cmp    %ecx,%eax\n   0x00007fffefd4701c \u003c+12\u003e:  jl     0x7fffefd4704e \u003cless+62\u003e\n\n12      }\n13      if ((val6 \u003e val7)) {\n   0x00007fffefd4701e \u003c+14\u003e:  cmp    %ecx,%eax\n   0x00007fffefd47020 \u003c+16\u003e:  jle    0x7fffefd47026 \u003cless+22\u003e\n   0x00007fffefd47022 \u003c+18\u003e:  xor    %ebx,%ebx\n   0x00007fffefd47024 \u003c+20\u003e:  jmp    0x7fffefd4704e \u003cless+62\u003e\n\n14          return false;\n15      }\n16      val8 = *bit_cast\u003cu32*\u003e((arg0 + 4))\n   0x00007fffefd47026 \u003c+22\u003e:  mov    0x4(%rdi),%r14d\n\n17      val9 = *bit_cast\u003cu32*\u003e((arg1 + 4))\n   0x00007fffefd4702a \u003c+26\u003e:  mov    0x4(%rsi),%ebp\n\n2      if ((arg0 \u003c arg1)) {\n   0x00007fffefd4702d \u003c+29\u003e:  cmp    %ebp,%r14d\n   0x00007fffefd47030 \u003c+32\u003e:  mov    %ebp,%edx\n   0x00007fffefd47032 \u003c+34\u003e:  cmovb  %r14d,%edx\n\n18      min_ret = min(val8, val9, );\n19      memcmp_ret = memcmp(((arg0 + 4) + 4), ((arg1 + 4) + 4), min_ret);\n   0x00007fffefd47036 \u003c+38\u003e:  add    $0x8,%rdi\n   0x00007fffefd4703a \u003c+42\u003e:  add    $0x8,%rsi\n   0x00007fffefd4703e \u003c+46\u003e:  movabs $0x7ffff764bd90,%rax\n   0x00007fffefd47048 \u003c+56\u003e:  callq  *%rax\n\n20      if ((memcmp_ret \u003c 0)) {\n   0x00007fffefd4704a \u003c+58\u003e:  test   %eax,%eax\n   0x00007fffefd4704c \u003c+60\u003e:  jns    0x7fffefd47055 \u003cless+69\u003e\n\n11          return true;\n   0x00007fffefd4704e \u003c+62\u003e:  mov    %ebx,%eax\n   0x00007fffefd47050 \u003c+64\u003e:  pop    %rbx\n   0x00007fffefd47051 \u003c+65\u003e:  pop    %r14\n   0x00007fffefd47053 \u003c+67\u003e:  pop    %rbp\n   0x00007fffefd47054 \u003c+68\u003e:  retq\n\n2      if ((arg0 \u003c arg1)) {\n   0x00007fffefd47055 \u003c+69\u003e:  cmp    %ebp,%r14d\n   0x00007fffefd47058 \u003c+72\u003e:  setb   %cl\n\n21          return true;\n22      }\n23      if ((memcmp_ret \u003e 0)) {\n   
0x00007fffefd4705b \u003c+75\u003e:  test   %eax,%eax\n   0x00007fffefd4705d \u003c+77\u003e:  sete   %al\n   0x00007fffefd47060 \u003c+80\u003e:  and    %cl,%al\n   0x00007fffefd47062 \u003c+82\u003e:  pop    %rbx\n   0x00007fffefd47063 \u003c+83\u003e:  pop    %r14\n   0x00007fffefd47065 \u003c+85\u003e:  pop    %rbp\n   0x00007fffefd47066 \u003c+86\u003e:  retq\n```\n\nAs we can see, LLVM has inlined calls to `min`. `memcmp` is an external function, so it could never be inlined. The source code lines match the assembly most of the time, but some mismatches are expected since the code is compiled with aggressive optimisations.\n\n### Vectorisation\n\nIn the previous example, we knew the computations that we wanted to perform but didn't know the data. Let's now look at the opposite situation. The data is organised as a structure of arrays, but we don't know ahead of time what arithmetic operations the application will need to execute. How the information about the desired computations is represented is outside the scope of CodeGen, though we may suspect an abstract syntax tree is involved. The application would have to translate that to appropriate calls to CodeGen. 
For example, if, for a value `a` and arrays `b` and `c`, we wanted to compute `d[i] = a * b[i] + c[i]`, it could be achieved with code like this:\n\n```c++\n  auto compute = builder.create_function\u003cvoid(int32_t, int32_t const*, int32_t const*, int32_t*, uint64_t)\u003e(\n      \"compute\", [\u0026](cg::value\u003cint32_t\u003e a, cg::value\u003cint32_t const*\u003e b_ptr, cg::value\u003cint32_t const*\u003e c_ptr,\n                     cg::value\u003cint32_t*\u003e d_ptr, cg::value\u003cuint64_t\u003e n) {\n        auto idx = cg::variable\u003cuint64_t\u003e(\"idx\", 0_u64);\n        cg::while_([\u0026] { return idx.get() \u003c n; },\n                   [\u0026] {\n                     auto i = idx.get();\n                     cg::store(a * cg::load(b_ptr + i) + cg::load(c_ptr + i), d_ptr + i);\n                     idx.set(i + 1_u64);\n                   });\n        cg::return_();\n      });\n```\n\nCodeGen configures LLVM so that it takes advantage of the features available on the CPU it executes on. 
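The generated `compute` above is semantically equivalent to the hand-written scalar loop sketched below (illustrative code, not part of CodeGen; the `demo` helper is hypothetical). LLVM's optimisation passes vectorise the generated IR much as a static compiler would vectorise this loop.

```c++
#include <cstdint>

// Scalar equivalent of the generated "compute": d[i] = a * b[i] + c[i].
inline void compute(std::int32_t a, std::int32_t const* b, std::int32_t const* c,
                    std::int32_t* d, std::uint64_t n) {
  for (std::uint64_t i = 0; i < n; ++i) d[i] = a * b[i] + c[i];
}

// Hypothetical helper: runs compute on a single element.
inline std::int32_t demo(std::int32_t a, std::int32_t bi, std::int32_t ci) {
  std::int32_t di = 0;
  compute(a, &bi, &ci, &di, 1);
  return di;
}
```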
For instance, Skylake supports AVX2, so it is going to be used to vectorise the loop.\n\n```x86asm\n6            val11 = *(arg1 + idx)\n7            *(arg3 + idx) = ((arg0 * val11) + val10)\n   0x00007fffefd27140 \u003c+320\u003e:    vpmulld (%rsi,%r9,4),%ymm0,%ymm1\n   0x00007fffefd27146 \u003c+326\u003e:    vpmulld 0x20(%rsi,%r9,4),%ymm0,%ymm2\n   0x00007fffefd2714d \u003c+333\u003e:    vpmulld 0x40(%rsi,%r9,4),%ymm0,%ymm3\n   0x00007fffefd27154 \u003c+340\u003e:    vpmulld 0x60(%rsi,%r9,4),%ymm0,%ymm4\n   0x00007fffefd2715b \u003c+347\u003e:    vpaddd (%rdx,%r9,4),%ymm1,%ymm1\n   0x00007fffefd27161 \u003c+353\u003e:    vpaddd 0x20(%rdx,%r9,4),%ymm2,%ymm2\n   0x00007fffefd27168 \u003c+360\u003e:    vpaddd 0x40(%rdx,%r9,4),%ymm3,%ymm3\n   0x00007fffefd2716f \u003c+367\u003e:    vpaddd 0x60(%rdx,%r9,4),%ymm4,%ymm4\n   0x00007fffefd27176 \u003c+374\u003e:    vmovdqu %ymm1,(%rcx,%r9,4)\n   0x00007fffefd2717c \u003c+380\u003e:    vmovdqu %ymm2,0x20(%rcx,%r9,4)\n   0x00007fffefd27183 \u003c+387\u003e:    vmovdqu %ymm3,0x40(%rcx,%r9,4)\n   0x00007fffefd2718a \u003c+394\u003e:    vmovdqu %ymm4,0x60(%rcx,%r9,4)\n   0x00007fffefd27191 \u003c+401\u003e:    vpmulld 0x80(%rsi,%r9,4),%ymm0,%ymm1\n   0x00007fffefd2719b \u003c+411\u003e:    vpmulld 0xa0(%rsi,%r9,4),%ymm0,%ymm2\n   0x00007fffefd271a5 \u003c+421\u003e:    vpmulld 0xc0(%rsi,%r9,4),%ymm0,%ymm3\n   0x00007fffefd271af \u003c+431\u003e:    vpmulld 0xe0(%rsi,%r9,4),%ymm0,%ymm4\n   0x00007fffefd271b9 \u003c+441\u003e:    vpaddd 0x80(%rdx,%r9,4),%ymm1,%ymm1\n   0x00007fffefd271c3 \u003c+451\u003e:    vpaddd 0xa0(%rdx,%r9,4),%ymm2,%ymm2\n   0x00007fffefd271cd \u003c+461\u003e:    vpaddd 0xc0(%rdx,%r9,4),%ymm3,%ymm3\n   0x00007fffefd271d7 \u003c+471\u003e:    vpaddd 0xe0(%rdx,%r9,4),%ymm4,%ymm4\n   0x00007fffefd271e1 \u003c+481\u003e:    vmovdqu %ymm1,0x80(%rcx,%r9,4)\n   0x00007fffefd271eb \u003c+491\u003e:    vmovdqu %ymm2,0xa0(%rcx,%r9,4)\n   0x00007fffefd271f5 \u003c+501\u003e:    vmovdqu %ymm3,0xc0(%rcx,%r9,4)\n   
0x00007fffefd271ff \u003c+511\u003e:    vmovdqu %ymm4,0xe0(%rcx,%r9,4)\n8            idx = (idx + 1);\n   0x00007fffefd27209 \u003c+521\u003e:    add    $0x40,%r9\n   0x00007fffefd2720d \u003c+525\u003e:    add    $0x2,%r11\n   0x00007fffefd27211 \u003c+529\u003e:    jne    0x7fffefd27140 \u003ccompute+320\u003e\n   0x00007fffefd27217 \u003c+535\u003e:    test   %r10,%r10\n   0x00007fffefd2721a \u003c+538\u003e:    je     0x7fffefd2726d \u003ccompute+621\u003e\n```\n\nAt the moment, CodeGen doesn't need to know anything about the ABI or the hardware architecture, which means that it can easily support all compilation targets that LLVM does. Below is the core part of the same loop compiled for aarch64 Cortex-A53.\n\n```\n5\t        val10 = *(arg1 + idx)\n   0x0000007fb050e070 \u003c+112\u003e:\tldp\tq1, q2, [x9, #-16]\n\n6\t        val11 = *(arg2 + idx)\n   0x0000007fb050e074 \u003c+116\u003e:\tldp\tq3, q4, [x10, #-16]\n\n8\t        idx = (idx + 1);\n   0x0000007fb050e078 \u003c+120\u003e:\tadd\tx9, x9, #0x20\n   0x0000007fb050e07c \u003c+124\u003e:\tadd\tx10, x10, #0x20\n\n7\t        *(arg3 + idx) = ((arg0 * val10) + val11)\n   0x0000007fb050e080 \u003c+128\u003e:\tmla\tv3.4s, v1.4s, v0.4s\n\n8\t        idx = (idx + 1);\n   0x0000007fb050e084 \u003c+132\u003e:\tsubs\tx12, x12, #0x8\n\n7\t        *(arg3 + idx) = ((arg0 * val10) + val11)\n   0x0000007fb050e088 \u003c+136\u003e:\tmla\tv4.4s, v2.4s, v0.4s\n   0x0000007fb050e08c \u003c+140\u003e:\tstp\tq3, q4, [x11, #-16]\n\n8\t        idx = (idx + 1);\n   0x0000007fb050e090 \u003c+144\u003e:\tadd\tx11, x11, #0x20\n   0x0000007fb050e094 \u003c+148\u003e:\tb.ne\t0x7fb050e070 \u003ccompute+112\u003e  // b.any\n```\n\n## TODO\n\n* Support for aggregate types. This requires CodeGen to be aware of the ABI and would benefit if C++ had any form of static reflection.\n* Add missing operations (e.g. 
shifts).\n* Type-Based Alias Analysis.\n* Allow the user to tune optimisation options and disable generation of debugging information.\n* Bind compiled functions' lifetimes to their module instead of the compiler object.\n* Support for other versions of LLVM.\n* Allow adding more metadata and attributes, e.g. `noalias` for function parameters.\n* Try harder to use the C++ type system to prevent generation of invalid LLVM IR.\n* The TODO list is incomplete. Add more items to it.\n","funding_links":[],"categories":["C++"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpdziepak%2Fcodegen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpdziepak%2Fcodegen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpdziepak%2Fcodegen/lists"}