{"id":13574497,"url":"https://github.com/banach-space/llvm-tutor","last_synced_at":"2025-05-13T22:11:24.733Z","repository":{"id":38314881,"uuid":"177661574","full_name":"banach-space/llvm-tutor","owner":"banach-space","description":"A collection of out-of-tree LLVM passes for teaching and learning","archived":false,"fork":false,"pushed_at":"2025-04-27T10:53:35.000Z","size":601,"stargazers_count":3164,"open_issues_count":10,"forks_count":404,"subscribers_count":54,"default_branch":"main","last_synced_at":"2025-04-30T14:16:02.651Z","etag":null,"topics":["compilers","llvm","llvm-ir","llvm-pass","llvm-plugins","llvm-tutor","llvm-tutorial","tutorial"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/banach-space.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-03-25T20:44:46.000Z","updated_at":"2025-04-29T09:23:24.000Z","dependencies_parsed_at":"2023-11-19T17:25:21.427Z","dependency_job_id":"e3abdf67-5861-45be-9460-2194e2a4dc34","html_url":"https://github.com/banach-space/llvm-tutor","commit_stats":{"total_commits":280,"total_committers":37,"mean_commits":"7.5675675675675675","dds":"0.17142857142857137","last_synced_commit":"1ebf16d64fab1f8e75ee2d2db15af1dd49adb179"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/banach-space%2Fllvm-tutor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/banach-space%2Fllvm-tutor/tags","releases_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/banach-space%2Fllvm-tutor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/banach-space%2Fllvm-tutor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/banach-space","download_url":"https://codeload.github.com/banach-space/llvm-tutor/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254036842,"owners_count":22003654,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compilers","llvm","llvm-ir","llvm-pass","llvm-plugins","llvm-tutor","llvm-tutorial","tutorial"],"created_at":"2024-08-01T15:00:52.117Z","updated_at":"2025-05-13T22:11:24.713Z","avatar_url":"https://github.com/banach-space.png","language":"C++","funding_links":[],"categories":["C++","其他_安全与渗透","Starchart"],"sub_categories":["网络服务_其他"],"readme":"llvm-tutor\n=========\n[![Apple Silicon](https://github.com/banach-space/llvm-tutor/actions/workflows/apple-silicon.yml/badge.svg?branch=main)](https://github.com/banach-space/llvm-tutor/actions/workflows/apple-silicon.yml)\n[![x86-Ubuntu](https://github.com/banach-space/llvm-tutor/actions/workflows/x86-ubuntu.yml/badge.svg?branch=main)](https://github.com/banach-space/llvm-tutor/actions/workflows/x86-ubuntu.yml)\n\n\nExample LLVM passes - based on **LLVM 19**\n\n**llvm-tutor** is a collection of self-contained reference LLVM passes. It's a\ntutorial that targets novice and aspiring LLVM developers. 
Key features:\n\n* **Out-of-tree** - builds against a binary LLVM installation (no need to build LLVM from sources)\n* **Complete** - includes `CMake` build scripts, LIT tests, CI set-up and documentation\n* **Modern** - based on the latest version of LLVM (and updated with every release)\n\n### Overview\nLLVM implements a very rich, powerful and popular API. However, like many\ncomplex technologies, it can be quite daunting and overwhelming to learn and\nmaster. The goal of this LLVM tutorial is to showcase that LLVM can in fact be\neasy and fun to work with. This is demonstrated through a range of self-contained,\ntestable LLVM passes, which are implemented using idiomatic LLVM.\n\nThis document explains how to set up your environment, build and run the\nexamples, and go about debugging. It contains a high-level overview of the\nimplemented examples and some background information on writing LLVM\npasses. The source files, apart from the code itself, contain comments that\nwill guide you through the implementation. 
All examples are complemented with\n[LIT](https://llvm.org/docs/TestingGuide.html) tests and reference [input\nfiles](https://github.com/banach-space/llvm-tutor/blob/main/inputs).\n\nVisit [**clang-tutor**](https://github.com/banach-space/clang-tutor/) if you\nare interested in a similar tutorial for Clang.\n\n### Table of Contents\n* [HelloWorld: Your First Pass](#helloworld-your-first-pass)\n* Part 1: **llvm-tutor** in more detail\n  * [Development Environment](#development-environment)\n  * [Building \u0026 Testing](#building--testing)\n  * [Overview of the Passes](#overview-of-the-passes)\n  * [Debugging](#debugging)\n* Part 2: Passes In LLVM\n  * [Analysis vs Transformation Pass](#analysis-vs-transformation-pass)\n  * [Dynamic vs Static Plugins](#dynamic-vs-static-plugins)\n  * [Optimisation Passes Inside LLVM](#optimisation-passes-inside-llvm)\n* [References](#references)\n\n\nHelloWorld: Your First Pass\n===========================\nThe **HelloWorld** pass from\n[HelloWorld.cpp](https://github.com/banach-space/llvm-tutor/blob/main/HelloWorld/HelloWorld.cpp)\nis a self-contained *reference example*. The corresponding\n[CMakeLists.txt](https://github.com/banach-space/llvm-tutor/blob/main/HelloWorld/CMakeLists.txt)\nimplements the minimum set-up for an out-of-source pass.\n\nFor every function defined in the input module, **HelloWorld** prints its name\nand the number of arguments that it takes. 
You can build it like this:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\nmkdir build\ncd build\ncmake -DLT_LLVM_INSTALL_DIR=$LLVM_DIR \u003csource/dir/llvm/tutor\u003e/HelloWorld/\nmake\n```\n\nBefore you can test it, you need to prepare an input file:\n\n```bash\n# Generate an LLVM test file\n$LLVM_DIR/bin/clang -O1 -S -emit-llvm \u003csource/dir/llvm/tutor\u003e/inputs/input_for_hello.c -o input_for_hello.ll\n```\n\nFinally, run **HelloWorld** with\n[**opt**](http://llvm.org/docs/CommandGuide/opt.html) (use `libHelloWorld.so`\non Linux and `libHelloWorld.dylib` on Mac OS):\n\n```bash\n# Run the pass\n$LLVM_DIR/bin/opt -load-pass-plugin ./libHelloWorld.{so|dylib} -passes=hello-world -disable-output input_for_hello.ll\n# Expected output\n(llvm-tutor) Hello from: foo\n(llvm-tutor)   number of arguments: 1\n(llvm-tutor) Hello from: bar\n(llvm-tutor)   number of arguments: 2\n(llvm-tutor) Hello from: fez\n(llvm-tutor)   number of arguments: 3\n(llvm-tutor) Hello from: main\n(llvm-tutor)   number of arguments: 2\n```\n\nThe **HelloWorld** pass doesn't modify the input module. The `-disable-output`\nflag is used to prevent **opt** from printing the output bitcode file.\n\nDevelopment Environment\n=======================\n## Platform Support And Requirements\nThis project has been tested on **Ubuntu 22.04** and **Mac OS X 11.7**. 
In\norder to build **llvm-tutor** you will need:\n  * LLVM 19\n  * C++ compiler that supports C++17\n  * CMake 3.20 or higher\n\nIn order to run the passes, you will need:\n  * **clang-19** (to generate input LLVM files)\n  * [**opt**](http://llvm.org/docs/CommandGuide/opt.html) (to run the passes)\n\nThere are additional requirements for tests (these will be satisfied by\ninstalling LLVM 19):\n  * [**lit**](https://llvm.org/docs/CommandGuide/lit.html) (aka **llvm-lit**,\n    LLVM tool for executing the tests)\n  * [**FileCheck**](https://llvm.org/docs/CommandGuide/FileCheck.html) (LIT\n    requirement, it's used to check whether tests generate the expected output)\n\n## Installing LLVM 19 on Mac OS X\nOn Darwin you can install LLVM 19 with [Homebrew](https://brew.sh/):\n\n```bash\nbrew install llvm@19\n```\n\nIf you already have an older version of LLVM installed, you can upgrade it to\nLLVM 19 like this:\n\n```bash\nbrew upgrade llvm\n```\n\nOnce the installation (or upgrade) is complete, all the required header files,\nlibraries and tools will be located in `/opt/homebrew/opt/llvm/`.\n\n## Installing LLVM 19 on Ubuntu\nOn Ubuntu Jammy Jellyfish, you can install modern LLVM from the official\n[repository](http://apt.llvm.org/):\n\n```bash\nwget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -\nsudo apt-add-repository \"deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-19 main\"\nsudo apt-get update\nsudo apt-get install -y llvm-19 llvm-19-dev llvm-19-tools clang-19\n```\nThis will install all the required header files, libraries and tools in\n`/usr/lib/llvm-19/`.\n\n## Building LLVM 19 From Sources\nBuilding from sources can be slow and tricky to debug. It is not necessary, but\nmight be your preferred way of obtaining LLVM 19. 
The following steps will work\non Linux and Mac OS X:\n\n```bash\ngit clone https://github.com/llvm/llvm-project.git\ncd llvm-project\ngit checkout release/19.x\nmkdir build\ncd build\ncmake -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS=clang \u003cllvm-project/root/dir\u003e/llvm/\ncmake --build .\n```\nFor more details read the [official\ndocumentation](https://llvm.org/docs/CMake.html).\n\nBuilding \u0026 Testing\n===================\n## Building\nYou can build **llvm-tutor** (and all the provided pass plugins) as follows:\n\n```bash\ncd \u003cbuild/dir\u003e\ncmake -DLT_LLVM_INSTALL_DIR=\u003cinstallation/dir/of/llvm/19\u003e \u003csource/dir/llvm/tutor\u003e\nmake\n```\n\nThe `LT_LLVM_INSTALL_DIR` variable should be set to the root of either the\ninstallation or build directory of LLVM 19. It is used to locate the\ncorresponding `LLVMConfig.cmake` script that is used to set the include and\nlibrary paths.\n\n## Testing\nIn order to run **llvm-tutor** tests, you need to install **llvm-lit** (aka\n**lit**). It's not bundled with LLVM 19 packages, but you can install it with\n**pip**:\n\n```bash\n# Install lit - note that this installs lit globally\npip install lit\n```\nRunning the tests is as simple as:\n\n```bash\n$ lit \u003cbuild_dir\u003e/test\n```\nVoilà! You should see all tests passing.\n\n## LLVM Plugins as shared objects\nIn **llvm-tutor** every LLVM pass is implemented in a separate shared object\n(you can learn more about shared objects\n[here](http://www.yolinux.com/TUTORIALS/LibraryArchives-StaticAndDynamic.html)).\nThese shared objects are essentially dynamically loadable plugins for **opt**.\nAll plugins are built in the `\u003cbuild/dir\u003e/lib` directory.\n\nNote that the extension of dynamically loaded shared objects differs between\nLinux and Mac OS. 
For example, for the **HelloWorld** pass you will get:\n\n* `libHelloWorld.so` on Linux\n* `libHelloWorld.dylib` on Mac OS.\n\nFor the sake of consistency, in this README.md file all examples use the `*.so`\nextension. When working on Mac OS, use `*.dylib` instead.\n\nOverview of The Passes\n======================\nThe available passes are categorised as either Analysis, Transformation or CFG.\nThe difference between Analysis and Transformation passes is rather\nself-explanatory ([here](#analysis-vs-transformation-pass) is a more technical\nbreakdown). A CFG pass is simply a Transformation pass that modifies the Control\nFlow Graph. This is frequently a bit more complex and requires some extra bookkeeping,\nhence a dedicated category.\n\nIn the following table the passes are grouped thematically and ordered by the\nlevel of complexity.\n\n| Name      | Description     | Category |\n|-----------|-----------------|------|\n|[**HelloWorld**](#helloworld-your-first-pass) | visits all functions and prints their names | Analysis |\n|[**OpcodeCounter**](#opcodecounter) | prints a summary of LLVM IR opcodes in the input module | Analysis |\n|[**InjectFuncCall**](#injectfunccall) | instruments the input module by inserting calls to `printf` | Transformation |\n|[**StaticCallCounter**](#staticcallcounter) | counts direct function calls at compile-time (static analysis) | Analysis |\n|[**DynamicCallCounter**](#dynamiccallcounter) | counts direct function calls at run-time (dynamic analysis) | Transformation |\n|[**MBASub**](#mbasub) | obfuscates integer `sub` instructions | Transformation |\n|[**MBAAdd**](#mbaadd) | obfuscates 8-bit integer `add` instructions | Transformation |\n|[**FindFCmpEq**](#findfcmpeq) | finds floating-point equality comparisons | Analysis |\n|[**ConvertFCmpEq**](#convertfcmpeq) | converts direct floating-point equality comparisons to difference comparisons | Transformation |\n|[**RIV**](#riv) | finds reachable integer values for each basic block | 
Analysis |\n|[**DuplicateBB**](#duplicatebb) | duplicates basic blocks, requires **RIV** analysis results | CFG |\n|[**MergeBB**](#mergebb) | merges duplicated basic blocks | CFG |\n\nOnce you've [built](#building--testing) this project, you can experiment with\nevery pass separately. All passes, except for\n[**HelloWorld**](#helloworld-your-first-pass), are described in more detail\nbelow.\n\nLLVM passes work with LLVM IR files. You can generate one like this:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n# Textual form\n$LLVM_DIR/bin/clang -O1 -emit-llvm input.c -S -o out.ll\n# Binary/bit-code form\n$LLVM_DIR/bin/clang -O1 -emit-llvm input.c -c -o out.bc\n```\nIt doesn't matter whether you choose the binary, `*.bc` (default), or\ntextual/LLVM assembly form (`.ll`, requires the `-S` flag). Obviously, the\nlatter is more human-readable. Similar logic applies to **opt** - by default it\ngenerates `*.bc` files. You can use `-S` to have the output written as `*.ll`\nfiles instead.\n\nNote that `clang` adds the `optnone` [function\nattribute](https://llvm.org/docs/LangRef.html#function-attributes) if either\n\n* no optimization level is specified, or\n* `-O0` is specified.\n\nIf you want to compile at `-O0`, you need to specify `-O0 -Xclang\n-disable-O0-optnone` or define a static\n[isRequired](https://llvm.org/docs/WritingAnLLVMNewPMPass.html#required-passes)\nmethod in your pass.  Alternatively, you can specify `-O1` or higher.\nOtherwise the new pass manager will register the pass but your pass will not be\nexecuted.\n\nAs noted [earlier](#llvm-plugins-as-shared-objects), all examples in this file\nuse the `*.so` extension for pass plugins. When working on Mac OS, use\n`*.dylib` instead.\n\n## OpcodeCounter\n**OpcodeCounter** is an Analysis pass that prints a summary of the [LLVM IR\nopcodes](https://github.com/llvm/llvm-project/blob/release/19.x/llvm/lib/IR/Instruction.cpp#L397-L480)\nencountered in every function in the input module. 
This pass can be [run\nautomatically](#auto-registration-with-optimisation-pipelines) with one of the\npre-defined optimisation pipelines. However, let's use our tried and tested method\nfirst.\n\n### Run the pass\nWe will use\n[input_for_cc.c](https://github.com/banach-space/llvm-tutor/blob/main/inputs/input_for_cc.c)\nto test **OpcodeCounter**. Since **OpcodeCounter** is an Analysis pass, we want\n**opt** to _print_ its results. To this end, we will use a [printing\npass](#printing-passes-for-the-new-pass-manager) that corresponds to\n**OpcodeCounter**. This pass is called `print\u003copcode-counter\u003e`. No extra\narguments are needed, but it's a good idea to add `-disable-output` to prevent\n**opt** from printing the output LLVM IR module - we are only interested in the\nresults of the analysis rather than the module itself. In fact, as this pass\ndoes not modify the input IR, the output module would be identical to the\ninput anyway.\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n# Generate an LLVM file to analyze\n$LLVM_DIR/bin/clang -emit-llvm -c \u003csource_dir\u003e/inputs/input_for_cc.c -o input_for_cc.bc\n# Run the pass through opt\n$LLVM_DIR/bin/opt -load-pass-plugin \u003cbuild_dir\u003e/lib/libOpcodeCounter.so --passes=\"print\u003copcode-counter\u003e\" -disable-output input_for_cc.bc\n```\n\nFor `main`, **OpcodeCounter** prints the following summary (note that when running the pass,\na summary for other functions defined in `input_for_cc.bc` is also printed):\n\n```\n=================================================\nLLVM-TUTOR: OpcodeCounter results for `main`\n=================================================\nOPCODE               #N TIMES USED\n-------------------------------------------------\nload                 2\nbr                   4\nicmp                 1\nadd                  1\nret                  1\nalloca               2\nstore                4\ncall                 
4\n-------------------------------------------------\n```\n\n### Auto-registration with optimisation pipelines\nYou can run **OpcodeCounter** by simply specifying an optimisation level (e.g.\n`-O{1|2|3|s}`). This is achieved through auto-registration with the existing\noptimisation pass pipelines. Note that you still have to specify the plugin\nfile to be loaded:\n\n```bash\n$LLVM_DIR/bin/opt -load-pass-plugin \u003cbuild_dir\u003e/lib/libOpcodeCounter.so --passes='default\u003cO1\u003e' input_for_cc.bc\n```\n\nThis is implemented in\n[OpcodeCounter.cpp](https://github.com/banach-space/llvm-tutor/blob/main/lib/OpcodeCounter.cpp),\non\n[line 106](https://github.com/banach-space/llvm-tutor/blob/main/lib/OpcodeCounter.cpp#L106-L110).\n\n## InjectFuncCall\nThis pass is a _HelloWorld_ example for _code instrumentation_. For every function\ndefined in the input module, **InjectFuncCall** will add (_inject_) the following\ncall to [`printf`](https://en.cppreference.com/w/cpp/io/c/fprintf):\n\n```C\nprintf(\"(llvm-tutor) Hello from: %s\\n(llvm-tutor)   number of arguments: %d\\n\", FuncName, FuncNumArgs)\n```\nThis call is added at the beginning of each function (i.e. before any other\ninstruction). `FuncName` is the name of the function and `FuncNumArgs` is the\nnumber of arguments that the function takes.\n\n### Run the pass\nWe will use\n[input_for_hello.c](https://github.com/banach-space/llvm-tutor/blob/main/inputs/input_for_hello.c)\nto test **InjectFuncCall**:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n# Generate an LLVM file to analyze\n$LLVM_DIR/bin/clang -O0 -emit-llvm -c \u003csource_dir\u003e/inputs/input_for_hello.c -o input_for_hello.bc\n# Run the pass through opt\n$LLVM_DIR/bin/opt -load-pass-plugin \u003cbuild_dir\u003e/lib/libInjectFuncCall.so --passes=\"inject-func-call\" input_for_hello.bc -o instrumented.bin\n```\nThis generates `instrumented.bin`, which is the instrumented version of\n`input_for_hello.bc`. 
In order to verify that **InjectFuncCall** worked as\nexpected, you can either check the output file (and verify that it contains\nextra calls to `printf`) or run it:\n\n```\n$LLVM_DIR/bin/lli instrumented.bin\n(llvm-tutor) Hello from: main\n(llvm-tutor)   number of arguments: 2\n(llvm-tutor) Hello from: foo\n(llvm-tutor)   number of arguments: 1\n(llvm-tutor) Hello from: bar\n(llvm-tutor)   number of arguments: 2\n(llvm-tutor) Hello from: foo\n(llvm-tutor)   number of arguments: 1\n(llvm-tutor) Hello from: fez\n(llvm-tutor)   number of arguments: 3\n(llvm-tutor) Hello from: bar\n(llvm-tutor)   number of arguments: 2\n(llvm-tutor) Hello from: foo\n(llvm-tutor)   number of arguments: 1\n```\n\n### InjectFuncCall vs HelloWorld\nYou might have noticed that **InjectFuncCall** is somewhat similar to\n[**HelloWorld**](#helloworld-your-first-pass). In both cases the pass visits\nall functions, prints their names and the number of arguments. The difference\nbetween the two passes becomes quite apparent when you compare the output\ngenerated for the same input file, e.g. `input_for_hello.c`. The number of\ntimes `Hello from` is printed is either:\n* once per function call in the case of **InjectFuncCall**, or\n* once per function definition in the case of **HelloWorld**.\n\nThis makes perfect sense and hints at how different the two passes are. Whether to\nprint `Hello from` is determined at either:\n* run-time for **InjectFuncCall**, or\n* compile-time for **HelloWorld**.\n\nAlso, note that in the case of **InjectFuncCall** we had to first run the pass\nwith **opt** and then execute the instrumented IR module in order to see the\noutput.  For **HelloWorld** it was sufficient to run the pass with **opt**.\n\n## StaticCallCounter\nThe **StaticCallCounter** pass counts the number of _static_ function calls in\nthe input LLVM module. _Static_ refers to the fact that these function calls\nare compile-time calls (i.e. visible during the compilation). 
This is in\ncontrast to _dynamic_ function calls, i.e. function calls encountered at\nrun-time (when the compiled module is run). The distinction becomes apparent\nwhen analysing function calls within loops, e.g.:\n```c\n  for (i = 0; i \u003c 10; i++)\n    foo();\n```\nAlthough at run-time `foo` will be executed 10 times, **StaticCallCounter**\nwill report only 1 function call.\n\nThis pass will only consider direct function calls. Function calls via\nfunction pointers are not taken into account.\n\n### Run the pass through **opt**\nWe will use\n[input_for_cc.c](https://github.com/banach-space/llvm-tutor/blob/main/inputs/input_for_cc.c)\nto test **StaticCallCounter**:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n# Generate an LLVM file to analyze\n$LLVM_DIR/bin/clang -emit-llvm -c \u003csource_dir\u003e/inputs/input_for_cc.c -o input_for_cc.bc\n# Run the pass through opt\n$LLVM_DIR/bin/opt -load-pass-plugin \u003cbuild_dir\u003e/lib/libStaticCallCounter.so -passes=\"print\u003cstatic-cc\u003e\" -disable-output input_for_cc.bc\n```\nYou should see the following output:\n\n```\n=================================================\nLLVM-TUTOR: static analysis results\n=================================================\nNAME                 #N DIRECT CALLS\n-------------------------------------------------\nfoo                  3\nbar                  2\nfez                  1\n-------------------------------------------------\n```\n\nNote that in order to print the output, you will have to use the printing pass\nthat corresponds to **StaticCallCounter** (by passing\n`-passes=\"print\u003cstatic-cc\u003e\"` to **opt**). 
We discussed printing passes in more\ndetail [here](#run-the-pass).\n\n### Run the pass through `static`\nYou can run **StaticCallCounter** through a standalone tool called `static`.\n`static` is an LLVM based tool implemented in\n[StaticMain.cpp](https://github.com/banach-space/llvm-tutor/blob/main/tools/StaticMain.cpp).\nIt is a command line wrapper that allows you to run **StaticCallCounter**\nwithout the need for **opt**:\n\n```bash\n\u003cbuild_dir\u003e/bin/static input_for_cc.bc\n```\nIt is an example of a relatively basic static analysis tool. Its implementation\ndemonstrates how basic pass management in LLVM works (i.e. it handles that for\nitself instead of relying on **opt**).\n\n## DynamicCallCounter\nThe **DynamicCallCounter** pass counts the number of _run-time_ (i.e.\nencountered during the execution) function calls. It does so by inserting\ncall-counting instructions that are executed every time a function is called.\nOnly calls to functions that are _defined_ in the input module are counted.\nThis pass builds on top of ideas presented in\n[**InjectFuncCall**](#injectfunccall). You may want to experiment with that\nexample first.\n\n### Run the pass\nWe will use\n[input_for_cc.c](https://github.com/banach-space/llvm-tutor/blob/main/inputs/input_for_cc.c)\nto test **DynamicCallCounter**:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n# Generate an LLVM file to analyze\n$LLVM_DIR/bin/clang -emit-llvm -c \u003csource_dir\u003e/inputs/input_for_cc.c -o input_for_cc.bc\n# Instrument the input file\n$LLVM_DIR/bin/opt -load-pass-plugin=\u003cbuild_dir\u003e/lib/libDynamicCallCounter.so -passes=\"dynamic-cc\" input_for_cc.bc -o instrumented_bin\n```\nThis generates `instrumented_bin`, which is the instrumented version of\n`input_for_cc.bc`. 
In order to verify that **DynamicCallCounter** worked as\nexpected, you can either check the output file (and verify that it contains\nnew call-counting instructions) or run it:\n\n```bash\n# Run the instrumented binary\n$LLVM_DIR/bin/lli  ./instrumented_bin\n```\nYou will see the following output:\n\n```\n=================================================\nLLVM-TUTOR: dynamic analysis results\n=================================================\nNAME                 #N DIRECT CALLS\n-------------------------------------------------\nfoo                  13\nbar                  2\nfez                  1\nmain                 1\n```\n\n### DynamicCallCounter vs StaticCallCounter\nThe numbers of function calls reported by **DynamicCallCounter** and\n**StaticCallCounter** are different, but both results are correct. They\ncorrespond to _run-time_ and _compile-time_ function calls respectively. Note\nalso that for **StaticCallCounter** it was sufficient to run the pass through\n**opt** to have the summary printed. For **DynamicCallCounter** we had to _run\nthe instrumented binary_ to see the output. This is similar to what we observed\nwhen comparing [HelloWorld and InjectFuncCall](#injectfunccall-vs-helloworld).\n\n## Mixed Boolean Arithmetic Transformations\nThese passes implement [mixed\nboolean arithmetic](https://tel.archives-ouvertes.fr/tel-01623849/document)\ntransformations. Similar transformations are often used in code obfuscation (you\nmay also know them from [Hacker's\nDelight](https://www.amazon.co.uk/Hackers-Delight-Henry-S-Warren/dp/0201914654))\nand are a great illustration of what LLVM passes can be used for and how.\n\nSimilar transformations are possible at the source-code level. 
The relevant\nClang plugins are available in\n[**clang-tutor**](https://github.com/banach-space/clang-tutor#obfuscator).\n\n### MBASub\nThe **MBASub** pass implements this rather basic expression:\n\n```\na - b == (a + ~b) + 1\n```\nBasically, it replaces all instances of integer `sub` according to the above\nformula. The corresponding LIT tests verify that both the formula and the\nimplementation are correct.\n\n#### Run the pass\nWe will use\n[input_for_mba_sub.c](https://github.com/banach-space/llvm-tutor/blob/main/inputs/input_for_mba_sub.c)\nto test **MBASub**:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n$LLVM_DIR/bin/clang -emit-llvm -S \u003csource_dir\u003e/inputs/input_for_mba_sub.c -o input_for_sub.ll\n$LLVM_DIR/bin/opt -load-pass-plugin=\u003cbuild_dir\u003e/lib/libMBASub.so -passes=\"mba-sub\" -S input_for_sub.ll -o out.ll\n```\n\n### MBAAdd\nThe **MBAAdd** pass implements a slightly more involved formula that is only\nvalid for 8-bit integers:\n\n```\na + b == (((a ^ b) + 2 * (a \u0026 b)) * 39 + 23) * 151 + 111\n```\nSimilarly to `MBASub`, it replaces all instances of integer `add` according to\nthe above identity, but only for 8-bit integers. The LIT tests verify that both\nthe formula and the implementation are correct.\n\n#### Run the pass\nWe will use\n[input_for_mba.c](https://github.com/banach-space/llvm-tutor/blob/main/inputs/input_for_mba.c)\nto test **MBAAdd**:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n$LLVM_DIR/bin/clang -O1 -emit-llvm -S \u003csource_dir\u003e/inputs/input_for_mba.c -o input_for_mba.ll\n$LLVM_DIR/bin/opt -load-pass-plugin=\u003cbuild_dir\u003e/lib/libMBAAdd.so -passes=\"mba-add\" -S input_for_mba.ll -o out.ll\n```\n\n## RIV\n**RIV** is an analysis pass that for each [basic\nblock](http://llvm.org/docs/ProgrammersManual.html#the-basicblock-class) BB in\nthe input function computes the set of reachable integer values, i.e. the integer\nvalues that are visible (i.e. 
can be used) in BB. Since the pass operates on\nthe LLVM IR representation of the input file, it takes into account all values\nthat have [integer type](https://llvm.org/docs/LangRef.html#integer-type) in\nthe [LLVM IR](https://llvm.org/docs/LangRef.html) sense. In particular, since\nat the LLVM IR level booleans are represented as 1-bit wide integers (i.e.\n`i1`), you will notice that booleans are also included in the result.\n\nThis pass demonstrates how to request results from other analysis passes in\nLLVM. In particular, it relies on the [Dominator\nTree](https://en.wikipedia.org/wiki/Dominator_(graph_theory)) analysis pass\nfrom LLVM, which is used to obtain the dominance tree for the basic blocks\nin the input function.\n\n### Run the pass\nWe will use\n[input_for_riv.c](https://github.com/banach-space/llvm-tutor/blob/main/inputs/input_for_riv.c)\nto test **RIV**:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n# Generate an LLVM file to analyze\n$LLVM_DIR/bin/clang -emit-llvm -S -O1 \u003csource_dir\u003e/inputs/input_for_riv.c -o input_for_riv.ll\n# Run the pass through opt\n$LLVM_DIR/bin/opt -load-pass-plugin \u003cbuild_dir\u003e/lib/libRIV.so -passes=\"print\u003criv\u003e\" -disable-output input_for_riv.ll\n```\nYou will see the following output:\n\n```\n=================================================\nLLVM-TUTOR: RIV analysis results\n=================================================\nBB id      Reachable Integer Values\n-------------------------------------------------\nBB %entry\n             i32 %a\n             i32 %b\n             i32 %c\nBB %if.then\n               %add = add nsw i32 %a, 123\n               %cmp = icmp sgt i32 %a, 0\n             i32 %a\n             i32 %b\n             i32 %c\nBB %if.end8\n               %add = add nsw i32 %a, 123\n               %cmp = icmp sgt i32 %a, 0\n             i32 %a\n             i32 %b\n             i32 %c\nBB %if.then2\n               %mul = mul nsw i32 %b, %a\n               
%div = sdiv i32 %b, %c\n               %cmp1 = icmp eq i32 %mul, %div\n               %add = add nsw i32 %a, 123\n               %cmp = icmp sgt i32 %a, 0\n             i32 %a\n             i32 %b\n             i32 %c\nBB %if.else\n               %mul = mul nsw i32 %b, %a\n               %div = sdiv i32 %b, %c\n               %cmp1 = icmp eq i32 %mul, %div\n               %add = add nsw i32 %a, 123\n               %cmp = icmp sgt i32 %a, 0\n             i32 %a\n             i32 %b\n             i32 %c\n```\n\nNote that in order to print the output, you will have to use the printing pass\nthat corresponds to **RIV** (by passing `-passes=\"print\u003criv\u003e\"` to **opt**). We\ndiscussed printing passes in more detail [here](#run-the-pass).\n\n## DuplicateBB\nThis pass will duplicate all basic blocks in a module, with the exception of\nbasic blocks for which there are no reachable integer values (identified through\nthe **RIV** pass). An example of such a basic block is the entry block in a\nfunction that:\n* takes no arguments and\n* is embedded in a module that defines no global values.\n\nBasic blocks are duplicated by first inserting an `if-then-else` construct and\nthen cloning all the instructions from the original basic block (with the\nexception of [PHI\nnodes](https://en.wikipedia.org/wiki/Static_single_assignment_form)) into two\nnew basic blocks (clones of the original basic block). The `if-then-else`\nconstruct is introduced as a non-trivial mechanism that decides which of the\ncloned basic blocks to branch to. 
This condition is equivalent to:\n\n```cpp\nif (var == 0)\n  goto clone 1\nelse\n  goto clone 2\n```\nin which:\n* `var` is a randomly picked variable from the `RIV` set for the current basic\n  block\n* `clone 1` and `clone 2` are labels for the cloned basic blocks.\n\nThe complete transformation looks like this:\n\n```c\nBEFORE:                     AFTER:\n-------                     ------\n                              [ if-then-else ]\n             DuplicateBB           /  \\\n[ BB ]      ------------\u003e   [clone 1] [clone 2]\n                                   \\  /\n                                 [ tail ]\n\nLEGEND:\n-------\n[BB]           - the original basic block\n[if-then-else] - a new basic block that contains the if-then-else statement (inserted by DuplicateBB)\n[clone 1|2]    - two new basic blocks that are clones of BB (inserted by DuplicateBB)\n[tail]         - the new basic block that merges [clone 1] and [clone 2] (inserted by DuplicateBB)\n```\nAs depicted above, **DuplicateBB** replaces qualifying basic blocks with 4 new\nbasic blocks. This is implemented through LLVM's\n[SplitBlockAndInsertIfThenElse](https://github.com/llvm/llvm-project/blob/release/19.x/llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h#L471).\n**DuplicateBB** does all the necessary preparation and clean-up. In other\nwords, it's an elaborate wrapper for LLVM's `SplitBlockAndInsertIfThenElse`.\n\n### Run the pass\nThis pass depends on the **RIV** pass, which also needs to be loaded in order for\n**DuplicateBB** to work. Let's use\n[input_for_duplicate_bb.c](https://github.com/banach-space/llvm-tutor/blob/main/inputs/input_for_duplicate_bb.c)\nas our sample input. 
First, generate the LLVM file:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n$LLVM_DIR/bin/clang -emit-llvm -S -O1 \u003csource_dir\u003e/inputs/input_for_duplicate_bb.c -o input_for_duplicate_bb.ll\n```\n\nFunction `foo` in `input_for_duplicate_bb.ll` should look like this (all metadata has been stripped):\n\n```llvm\ndefine i32 @foo(i32) {\n  ret i32 1\n}\n```\nNote that there's only one basic block (the _entry_ block) and that `foo` takes\none argument (this means that the result from **RIV** will be a non-empty set).\nWe will now apply **DuplicateBB** to `foo`:\n\n```bash\n$LLVM_DIR/bin/opt -load-pass-plugin \u003cbuild_dir\u003e/lib/libRIV.so -load-pass-plugin \u003cbuild_dir\u003e/lib/libDuplicateBB.so -passes=duplicate-bb -S input_for_duplicate_bb.ll -o duplicate.ll\n```\nAfter the instrumentation `foo` will look like this (all metadata has been stripped):\n\n```llvm\ndefine i32 @foo(i32) {\nlt-if-then-else-0:\n  %2 = icmp eq i32 %0, 0\n  br i1 %2, label %lt-if-then-0, label %lt-else-0\n\nclone-1-0:\n  br label %lt-tail-0\n\nclone-2-0:\n  br label %lt-tail-0\n\nlt-tail-0:\n  ret i32 1\n}\n```\nThere are four basic blocks instead of one. All new basic blocks end with a\nnumeric id of the original basic block (`0` in this case). `lt-if-then-else-0`\ncontains the new `if-then-else` condition. `clone-1-0` and `clone-2-0` are\nclones of the original basic block in `foo`. `lt-tail-0` is the extra basic\nblock that's required to merge `clone-1-0` and `clone-2-0`.\n\n## MergeBB\n**MergeBB** will merge qualifying basic blocks that are identical. 
To some\nextent, this pass reverts the transformations introduced by **DuplicateBB**.\nThis is illustrated below:\n\n```c\nBEFORE:                     AFTER DuplicateBB:                 AFTER MergeBB:\n-------                     ------------------                 --------------\n                              [ if-then-else ]                 [ if-then-else* ]\n             DuplicateBB           /  \\               MergeBB         |\n[ BB ]      ------------\u003e   [clone 1] [clone 2]      --------\u003e    [ clone ]\n                                   \\  /                               |\n                                 [ tail ]                         [ tail* ]\n\nLEGEND:\n-------\n[BB]           - the original basic block\n[if-then-else] - a new basic block that contains the if-then-else statement (**DuplicateBB**)\n[clone 1|2]    - two new basic blocks that are clones of BB (**DuplicateBB**)\n[tail]         - the new basic block that merges [clone 1] and [clone 2] (**DuplicateBB**)\n[clone]        - [clone 1] and [clone 2] after merging, this block should be very similar to [BB] (**MergeBB**)\n[label*]       - [label] after being updated by **MergeBB**\n```\nRecall that **DuplicateBB** replaces all qualifying basic block with four new\nbasic blocks, two of which are clones of the original block.  **MergeBB** will\nmerge those two clones back together, but it will not remove the remaining two\nblocks added by **DuplicateBB** (it will update them though).\n\n### Run the pass\nLet's use the following IR implementation of `foo` as input. 
Note that basic\nblocks 3 and 5 are identical and can safely be merged:\n\n```llvm\ndefine i32 @foo(i32) {\n  %2 = icmp eq i32 %0, 19\n  br i1 %2, label %3, label %5\n\n; \u003clabel\u003e:3:\n  %4 = add i32 %0,  13\n  br label %7\n\n; \u003clabel\u003e:5:\n  %6 = add i32 %0,  13\n  br label %7\n\n; \u003clabel\u003e:7:\n  %8 = phi i32 [ %4, %3 ], [ %6, %5 ]\n  ret i32 %8\n}\n```\nWe will now apply **MergeBB** to `foo`:\n\n```bash\n$LLVM_DIR/bin/opt -load \u003cbuild_dir\u003e/lib/libMergeBB.so -legacy-merge-bb -S foo.ll -o merge.ll\n```\nAfter the instrumentation `foo` will look like this (all metadata has been stripped):\n```llvm\ndefine i32 @foo(i32) {\n  %2 = icmp eq i32 %0, 19\n  br i1 %2, label %3, label %3\n\n3:\n  %4 = add i32 %0, 13\n  br label %5\n\n5:\n  ret i32 %4\n}\n```\nAs you can see, basic blocks 3 and 5 from the input module have been merged\ninto one basic block.\n\n\n### Run MergeBB on the output from DuplicateBB\nIt is really interesting to see the effect of **MergeBB** on the output from\n**DuplicateBB**. 
Let's start with the same input as we used for **DuplicateBB**:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n$LLVM_DIR/bin/clang -emit-llvm -S -O1 \u003csource_dir\u003e/inputs/input_for_duplicate_bb.c -o input_for_duplicate_bb.ll\n```\n\nNow we will apply **DuplicateBB** _and_ **MergeBB** (in this order) to `foo`.\nRecall that **DuplicateBB** requires **RIV**, which means that in total we have\nto load three plugins:\n\n```bash\n$LLVM_DIR/bin/opt -load-pass-plugin \u003cbuild_dir\u003e/lib/libRIV.so -load-pass-plugin \u003cbuild_dir\u003e/lib/libMergeBB.so -load-pass-plugin \u003cbuild_dir\u003e/lib/libDuplicateBB.so -passes=duplicate-bb,merge-bb -S input_for_duplicate_bb.ll -o merge_after_duplicate.ll\n```\nAnd here's the output:\n\n```llvm\ndefine i32 @foo(i32) {\nlt-if-then-else-0:\n  %1 = icmp eq i32 %0, 0\n  br i1 %1, label %lt-clone-2-0, label %lt-clone-2-0\n\nlt-clone-2-0:\n  br label %lt-tail-0\n\nlt-tail-0:\n  ret i32 1\n}\n```\nCompare this with the [output generated by **DuplicateBB**](#run-the-pass-7).\nOnly one of the clones, `lt-clone-2-0`, has been preserved, and\n`lt-if-then-else-0` has been updated accordingly. Regardless of the value of\nthe `if` condition (more precisely, variable `%1`), the control flow jumps to\n`lt-clone-2-0`.\n\n## FindFCmpEq\nThe **FindFCmpEq** pass finds all floating-point comparison operations that \ndirectly check for equality between two values. This is important because these\nsorts of comparisons can sometimes be indicators of logical issues due to \n[rounding errors](https://en.wikipedia.org/wiki/Machine_epsilon) inherent in \nfloating-point arithmetic.\n\n**FindFCmpEq** is implemented as two passes: an analysis pass (`FindFCmpEq`) and a \nprinting pass (`FindFCmpEqPrinter`). 
The legacy implementation (`FindFCmpEqWrapper`) \nmakes use of both of these passes.\n\n### Run the pass\nWe will use [input_for_fcmp_eq.c](https://github.com/banach-space/llvm-tutor/blob/main/inputs/input_for_fcmp_eq.c)\nto test **FindFCmpEq**:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n# Generate the input file\n$LLVM_DIR/bin/clang -emit-llvm -S -Xclang -disable-O0-optnone -c \u003csource_dir\u003e/inputs/input_for_fcmp_eq.c -o input_for_fcmp_eq.ll\n# Run the pass\n$LLVM_DIR/bin/opt --load-pass-plugin \u003cbuild_dir\u003e/lib/libFindFCmpEq.so --passes=\"print\u003cfind-fcmp-eq\u003e\" -disable-output input_for_fcmp_eq.ll\n```\n\nYou should see the following output, which lists the direct floating-point equality comparison instructions found:\n\n```llvm\nFloating-point equality comparisons in \"sqrt_impl\":\n  %11 = fcmp oeq double %9, %10\nFloating-point equality comparisons in \"main\":\n  %9 = fcmp oeq double %8, 1.000000e+00\n  %13 = fcmp oeq double %11, %12\n  %19 = fcmp oeq double %17, %18\n```\n\n## ConvertFCmpEq\nThe **ConvertFCmpEq** pass is a transformation that uses the analysis results\nof [**FindFCmpEq**](#FindFCmpEq) to convert direct floating-point equality\ncomparison instructions into logically equivalent ones that use a\npre-calculated rounding threshold.\n\n### Run the pass\nAs with [**FindFCmpEq**](#FindFCmpEq), we will use\n[input_for_fcmp_eq.c](https://github.com/banach-space/llvm-tutor/blob/main/inputs/input_for_fcmp_eq.c)\nto test **ConvertFCmpEq**:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n$LLVM_DIR/bin/clang -emit-llvm -S -Xclang -disable-O0-optnone \\\n  -c \u003csource_dir\u003e/inputs/input_for_fcmp_eq.c -o input_for_fcmp_eq.ll\n$LLVM_DIR/bin/opt --load-pass-plugin \u003cbuild_dir\u003e/lib/libFindFCmpEq.so \\\n  --load-pass-plugin \u003cbuild_dir\u003e/lib/libConvertFCmpEq.so \\\n  --passes=convert-fcmp-eq -S input_for_fcmp_eq.ll -o fcmp_eq_after_conversion.ll\n```\n\nFor the 
legacy implementation, the `opt` command would be changed to the following:\n\n```bash\n$LLVM_DIR/bin/opt -load \u003cbuild_dir\u003e/lib/libFindFCmpEq.so \\\n  \u003cbuild_dir\u003e/lib/libConvertFCmpEq.so -convert-fcmp-eq \\\n  -S input_for_fcmp_eq.ll -o fcmp_eq_after_conversion.ll\n```\n\nNotice that both `libFindFCmpEq.so` _and_ `libConvertFCmpEq.so` must be loaded\n-- and the load order matters. Since **ConvertFCmpEq** requires\n[**FindFCmpEq**](#FindFCmpEq), the **FindFCmpEq** library must be loaded before\nthe **ConvertFCmpEq** library. If both passes were built as part of the same library, this\nwould not be required.\n\nAfter transformation, the `fcmp oeq` instructions will have been converted to\ndifference-based `fcmp olt` instructions using the IEEE 754 double-precision\nmachine epsilon constant as the round-off threshold:\n\n```llvm\n  %cmp = fcmp oeq double %0, %1\n```\n\n... has now become\n\n```llvm\n  %3 = fsub double %0, %1\n  %4 = bitcast double %3 to i64\n  %5 = and i64 %4, 9223372036854775807\n  %6 = bitcast i64 %5 to double\n  %cmp = fcmp olt double %6, 0x3CB0000000000000\n```\n\nThe values are subtracted from each other and the absolute value of their\ndifference is calculated. If this absolute difference is less than the value of\nthe machine epsilon, the original two floating-point values are considered to\nbe equal.\n\nDebugging\n==========\nBefore running a debugger, you may want to analyze the output from\n[LLVM_DEBUG](http://llvm.org/docs/ProgrammersManual.html#the-llvm-debug-macro-and-debug-option)\nand\n[STATISTIC](http://llvm.org/docs/ProgrammersManual.html#the-statistic-class-stats-option)\nmacros. 
For example, for **MBAAdd**:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n$LLVM_DIR/bin/clang -emit-llvm -S -O1 \u003csource_dir\u003e/inputs/input_for_mba.c -o input_for_mba.ll\n$LLVM_DIR/bin/opt -S -load-pass-plugin \u003cbuild_dir\u003e/lib/libMBAAdd.so -passes=mba-add input_for_mba.ll -debug-only=mba-add -stats -o out.ll\n```\nNote the `-debug-only=mba-add` and `-stats` flags in the command line - that's\nwhat enables the following output:\n\n```bash\n  %12 = add i8 %1, %0 -\u003e   \u003cbadref\u003e = add i8 111, %11\n  %20 = add i8 %12, %2 -\u003e   \u003cbadref\u003e = add i8 111, %19\n  %28 = add i8 %20, %3 -\u003e   \u003cbadref\u003e = add i8 111, %27\n===-------------------------------------------------------------------------===\n                          ... Statistics Collected ...\n===-------------------------------------------------------------------------===\n\n3 mba-add - The # of substituted instructions\n```\nAs you can see, you get a nice summary from **MBAAdd**. In many cases this will\nbe sufficient to understand what might be going wrong. Note that for these\nmacros to work you need a debug build of LLVM (i.e. **opt**) and **llvm-tutor**\n(i.e. use `-DCMAKE_BUILD_TYPE=Debug` instead of `-DCMAKE_BUILD_TYPE=Release`).\n\nFor trickier issues just use a debugger. Below I demonstrate how to debug\n[**MBAAdd**](#mbaadd). More specifically, how to set up a breakpoint on entry\nto `MBAAdd::run`. Hopefully that will be sufficient for you to start.\n\n## Mac OS X\nThe default debugger on OS X is [LLDB](http://lldb.llvm.org). 
You will\nnormally use it like this:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n$LLVM_DIR/bin/clang -emit-llvm -S -O1 \u003csource_dir\u003e/inputs/input_for_mba.c -o input_for_mba.ll\nlldb -- $LLVM_DIR/bin/opt -S -load-pass-plugin \u003cbuild_dir\u003e/lib/libMBAAdd.dylib -passes=mba-add input_for_mba.ll -o out.ll\n(lldb) breakpoint set --name MBAAdd::run\n(lldb) process launch\n```\nor, equivalently, by using LLDB's aliases:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n$LLVM_DIR/bin/clang -emit-llvm -S -O1 \u003csource_dir\u003e/inputs/input_for_mba.c -o input_for_mba.ll\nlldb -- $LLVM_DIR/bin/opt -S -load-pass-plugin \u003cbuild_dir\u003e/lib/libMBAAdd.dylib -passes=mba-add input_for_mba.ll -o out.ll\n(lldb) b MBAAdd::run\n(lldb) r\n```\nAt this point, LLDB should break at the entry to `MBAAdd::run`.\n\n## Ubuntu\nOn most Linux systems, [GDB](https://www.gnu.org/software/gdb/) is the most\npopular debugger. A typical session will look like this:\n\n```bash\nexport LLVM_DIR=\u003cinstallation/dir/of/llvm/19\u003e\n$LLVM_DIR/bin/clang -emit-llvm -S -O1 \u003csource_dir\u003e/inputs/input_for_mba.c -o input_for_mba.ll\ngdb --args $LLVM_DIR/bin/opt -S -load-pass-plugin \u003cbuild_dir\u003e/lib/libMBAAdd.so -passes=mba-add input_for_mba.ll -o out.ll\n(gdb) b MBAAdd.cpp:MBAAdd::run\n(gdb) r\n```\nAt this point, GDB should break at the entry to `MBAAdd::run`.\n\nAnalysis vs Transformation Pass\n===============================\nThe implementation of a pass depends on whether it is an Analysis or a\nTransformation pass:\n\n* a transformation pass will normally inherit from [PassInfoMixin](https://github.com/llvm/llvm-project/blob/release/19.x/llvm/include/llvm/IR/PassManager.h#L371),\n* an analysis pass will inherit from [AnalysisInfoMixin](https://github.com/llvm/llvm-project/blob/release/19.x/llvm/include/llvm/IR/PassManager.h#L394).\n\nThis is one of the key characteristics of the New Pass Manager - it makes 
the\nsplit into Analysis and Transformation passes very explicit. An Analysis pass\nrequires a bit more bookkeeping and hence a bit more code.  For example, you\nneed to add an instance of\n[AnalysisKey](https://github.com/llvm/llvm-project/blob/release/19.x/llvm/include/llvm/IR/PassManager.h#L410)\nso that it can be identified by the New Pass Manager.\n\nNote that for small standalone examples, the difference between Analysis and\nTransformation passes becomes less relevant.\n[**HelloWorld**](#helloworld-your-first-pass) is a good example. It does not\ntransform the input module, so in practice it is an Analysis pass. However, in\norder to keep the implementation as simple as possible, I used the API for\nTransformation passes.\n\nWithin **llvm-tutor** the following passes can be used as reference Analysis\nand Transformation examples:\n\n* [**OpcodeCounter**](#opcodecounter) - analysis pass\n* [**MBASub**](#mbasub) - transformation pass\n\nOther examples also adhere to LLVM's convention, but may contain other\ncomplexities. However, only in the case of\n[**HelloWorld**](#helloworld-your-first-pass) simplicity was favoured over\nstrictness (i.e. it is neither a transformation nor analysis pass).\n\n### Printing passes for the new pass manager\nA printing pass for an Analysis pass is basically a Transformation pass that:\n\n* requests the results of the analysis from the original pass, and\n* prints these results.\n\nIn other words, it's just a wrapper pass. There's a convention to register such\npasses under the `print\u003canalysis-pass-name\u003e` command line option.\n\nDynamic vs Static Plugins\n=========================\nBy default, all examples in **llvm-tutor** are built as\n[dynamic plugins](#llvm-plugins-as-shared-objecs). 
However, LLVM provides\ninfrastructure for both _dynamic_ and _static_ plugins\n([documentation](https://llvm.org/docs/WritingAnLLVMPass.html#building-pass-plugins)).\nStatic plugins are simply libraries linked into your executable (e.g. **opt**)\nstatically. This way, unlike dynamic plugins, they don't need to be loaded at\nruntime with `-load-pass-plugin`.\n\nStatic plugins are normally developed in-tree, i.e. within `llvm-project/llvm`,\nand all examples in **llvm-tutor** can be adapted to work this way. You can use\n[static_registration.sh](https://github.com/banach-space/llvm-tutor/blob/main/utils/static_registration.sh)\nto see how it can be done for [**MBASub**](#mbasub). This script will:\n\n* copy the required source and test files into `llvm-project/llvm`\n* adapt in-tree CMake scripts so that the in-tree version of **MBASub** is actually built\n* remove `-load` and `-load-pass-plugin` from the in-tree tests for **MBASub**\n\nNote that this script will modify `llvm-project/llvm`, but leave **llvm-tutor**\nintact. After running the script you will have to re-build **opt**. Two\nadditional CMake flags have to be set: `LLVM_BUILD_EXAMPLES` and\n`LLVM_MBASUB_LINK_INTO_TOOLS`:\n\n```bash\n# LLVM_TUTOR_DIR: directory in which you cloned llvm-tutor\ncd $LLVM_TUTOR_DIR\n# LLVM_PROJECT_DIR: directory in which you cloned llvm-project\nbash utils/static_registration.sh --llvm_project_dir $LLVM_PROJECT_DIR\n# LLVM_BUILD_DIR: directory in which you previously built opt\ncd $LLVM_BUILD_DIR\ncmake -DLLVM_BUILD_EXAMPLES=On -DLLVM_MBASUB_LINK_INTO_TOOLS=On .\ncmake --build . --target opt\n```\n\nOnce **opt** is re-built, **MBASub** will be statically linked\ninto **opt**. Now you can run it like this:\n\n```bash\n$LLVM_BUILD_DIR/bin/opt --passes=mba-sub -S $LLVM_TUTOR_DIR/test/MBA_sub.ll\n```\n\nNote that this time we didn't have to use `-load-pass-plugin` to load\n**MBASub**. 
If you want to dive deeper into the required steps for static\nregistration, you can scan `static_registration.sh` or run:\n\n```bash\ncd $LLVM_PROJECT_DIR\ngit diff\ngit status\n```\n\nThis will print all the changes within `llvm-project/llvm` introduced by the\nscript.\n\nOptimisation Passes Inside LLVM\n=================================\nApart from writing your own transformations and analyses, you may want to\nfamiliarize yourself with [the passes available within\nLLVM](https://llvm.org/docs/Passes.html). It is a great resource for learning\nhow LLVM works and what makes it so powerful and successful. It is also a great\nresource for discovering how compilers work in general. Indeed, many of the\npasses implement general concepts known from the theory of compiler development.\n\nThe list of the available passes in LLVM can be a bit daunting. Below is a list\nof the selected few that are a good starting point. Each entry contains a link\nto the implementation in LLVM, a short description and a link to test files\navailable within **llvm-tutor**. These test files contain a collection of\nannotated test cases for the corresponding pass. The goal of these tests is to\ndemonstrate the functionality of the tested pass through relatively simple\nexamples.\n\n| Name      | Description     | Test files in llvm-tutor |\n|-----------|-----------------|--------------------------|\n|[**dce**](https://github.com/llvm/llvm-project/blob/release/19.x/llvm/lib/Transforms/Scalar/DCE.cpp) | Dead Code Elimination | [dce.ll](https://github.com/banach-space/llvm-tutor/blob/main/test/llvm/dce.ll) |\n|[**memcpyopt**](https://github.com/llvm/llvm-project/blob/release/19.x/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp) | Optimise calls to `memcpy` (e.g. 
replace them with `memset`) | [memcpyopt.ll](https://github.com/banach-space/llvm-tutor/blob/main/test/llvm/memcpyopt.ll) |\n|[**reassociate**](https://github.com/llvm/llvm-project/blob/release/19.x/llvm/lib/Transforms/Scalar/Reassociate.cpp) | Reassociate (e.g. 4 + (x + 5) -\u003e x + (4 + 5)). This enables further optimisations, e.g. LICM. | [reassociate.ll](https://github.com/banach-space/llvm-tutor/blob/main/test/llvm/reassociate.ll) |\n|[**always-inline**](https://github.com/llvm/llvm-project/blob/release/19.x/llvm/lib/Transforms/IPO/AlwaysInliner.cpp) | Always inlines functions decorated with [`alwaysinline`](https://llvm.org/docs/LangRef.html#function-attributes) | [always-inline.ll](https://github.com/banach-space/llvm-tutor/blob/main/test/llvm/always-inline.ll) |\n|[**loop-deletion**](https://github.com/llvm/llvm-project/blob/release/19.x/llvm/lib/Transforms/Scalar/LoopDeletion.cpp) | Delete unused loops | [loop-deletion.ll](https://github.com/banach-space/llvm-tutor/blob/main/test/llvm/loop-deletion.ll) |\n|[**licm**](https://github.com/llvm/llvm-project/blob/release/19.x/llvm/lib/Transforms/Scalar/LICM.cpp) | [Loop-Invariant Code Motion](https://en.wikipedia.org/wiki/Loop-invariant_code_motion) (a.k.a. LICM) | [licm.ll](https://github.com/banach-space/llvm-tutor/blob/main/test/llvm/licm.ll) |\n|[**slp**](https://github.com/llvm/llvm-project/blob/release/19.x/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp) | [Superword-level parallelism vectorisation](https://llvm.org/docs/Vectorizers.html#the-slp-vectorizer) | [slp\\_x86.ll](https://github.com/banach-space/llvm-tutor/blob/main/test/llvm/slp_x86.ll), [slp\\_aarch64.ll](https://github.com/banach-space/llvm-tutor/blob/main/test/llvm/slp_aarch64.ll)  |\n\nThis list focuses on [LLVM's transform\npasses](https://llvm.org/docs/Passes.html#transform-passes) that are relatively\neasy to demonstrate through small, standalone examples. 
You can run an\nindividual test like this:\n\n```bash\nlit \u003csource/dir/llvm/tutor\u003e/test/llvm/always-inline.ll\n```\n\nTo run an individual pass, extract one [RUN line](https://github.com/banach-space/llvm-tutor/blob/main/test/llvm/always-inline.ll#L2)\nfrom the test file and run it:\n\n```bash\n$LLVM_DIR/bin/opt -inline-threshold=0 -passes=always-inline -S \u003csource/dir/llvm/tutor\u003e/test/llvm/always-inline.ll\n```\n\nReferences\n===========\nBelow is a list of LLVM resources available outside the official online\ndocumentation that I have found very helpful. Where possible, the items are sorted by\ndate.\n\n* **LLVM IR**\n  * _\"LLVM IR Tutorial - Phis, GEPs and other things, oh my!\"_, V. Bridgers, F.\nPiovezan, EuroLLVM, ([slides](https://llvm.org/devmtg/2019-04/slides/Tutorial-Bridgers-LLVM_IR_tutorial.pdf),\n  [video](https://www.youtube.com/watch?v=m8G_S5LwlTo\u0026feature=youtu.be))\n  * _\"Mapping High Level Constructs to LLVM IR\"_, M. Rodler ([link](https://mapping-high-level-constructs-to-llvm-ir.readthedocs.io/en/latest/))\n* **Examples in LLVM**\n  * Control Flow Graph simplifications:\n    [llvm/examples/IRTransforms/](https://github.com/llvm/llvm-project/tree/release/19.x/llvm/examples/IRTransforms)\n  * Hello World Pass:\n    [llvm/lib/Transforms/Hello/](https://github.com/llvm/llvm-project/blob/release/19.x/llvm/lib/Transforms/Hello)\n  * Good Bye World Pass:\n    [llvm/examples/Bye/](https://github.com/llvm/llvm-project/tree/release/19.x/llvm/examples/Bye)\n* **LLVM Pass Development**\n  * _\"Writing an LLVM Optimization\"_,  Jonathan Smith [video](https://www.youtube.com/watch?v=MagR2KY8MQI\u0026t)\n  * _\"Getting Started With LLVM: Basics \"_, J. Paquette, F. Hahn, LLVM Dev Meeting 2019 [video](https://www.youtube.com/watch?v=3QQuhL-dSys\u0026t=826s)\n  * _\"Writing an LLVM Pass: 101\"_, A. 
Warzyński, LLVM Dev Meeting 2019 [video](https://www.youtube.com/watch?v=ar7cJl2aBuU)\n  * _\"Writing LLVM Pass in 2018\"_, Min-Yih Hsu [blog](https://medium.com/@mshockwave/writing-llvm-pass-in-2018-preface-6b90fa67ae82)\n  * _\"Building, Testing and Debugging a Simple out-of-tree LLVM Pass\"_ Serge Guelton, Adrien Guinet, LLVM Dev Meeting 2015 ([slides](https://llvm.org/devmtg/2015-10/slides/GueltonGuinet-BuildingTestingDebuggingASimpleOutOfTreePass.pdf), [video](https://www.youtube.com/watch?v=BnlG-owSVTk\u0026index=8\u0026list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21))\n* **LLVM Based Tools Development**\n  * _\"Introduction to LLVM\"_, M. Shah, Fosdem 2018, [link](http://www.mshah.io/fosdem18.html)\n  * _\"Building an LLVM-based tool. Lessons learned\"_, A. Denisov, [blog](https://lowlevelbits.org/building-an-llvm-based-tool.-lessons-learned/), [video](https://www.youtube.com/watch?reload=9\u0026v=Yvj4G9B6pcU)\n\nCredits\n========\nThis is first and foremost a community effort. This project wouldn't be\npossible without the amazing LLVM [online\ndocumentation](http://llvm.org/docs/), the plethora of great comments in the\nsource code, and the llvm-dev mailing list. Thank you!\n\nIt goes without saying that there's plenty of great presentations on YouTube,\nblog posts and GitHub projects that cover similar subjects. I've learnt a great\ndeal from them - thank you all for sharing! 
There's one presentation/tutorial\nthat has been particularly important in my journey as an aspiring LLVM\ndeveloper and that helped to _democratise_ out-of-source pass development:\n\n* \"Building, Testing and Debugging a Simple out-of-tree LLVM Pass\" Serge\n  Guelton, Adrien Guinet\n  ([slides](https://llvm.org/devmtg/2015-10/slides/GueltonGuinet-BuildingTestingDebuggingASimpleOutOfTreePass.pdf),\n  [video](https://www.youtube.com/watch?v=BnlG-owSVTk\u0026index=8\u0026list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21))\n\nAdrien and Serge came up with some great, illustrative and self-contained\nexamples that are great for learning and tutoring LLVM pass development. You'll\nnotice that there are similar transformation and analysis passes available in\nthis project. The implementations available here reflect what **I** found most\nchallenging while studying them.\n\nLicense\n========\nThe MIT License (MIT)\n\nCopyright (c) 2019 Andrzej Warzyński\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of\nthis software and associated documentation files (the \"Software\"), to deal in\nthe Software without restriction, including without limitation the rights to\nuse, copy, modify, merge, publish, distribute, sublicense, and/or sell copies\nof the Software, and to permit persons to whom the Software is furnished to do\nso, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbanach-space%2Fllvm-tutor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbanach-space%2Fllvm-tutor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbanach-space%2Fllvm-tutor/lists"}