{"id":18397005,"url":"https://github.com/iith-compilers/ir2vec","last_synced_at":"2025-04-12T15:43:43.120Z","repository":{"id":39710403,"uuid":"333716228","full_name":"IITH-Compilers/IR2Vec","owner":"IITH-Compilers","description":"Implementation of IR2Vec, LLVM IR Based Scalable Program Embeddings","archived":false,"fork":false,"pushed_at":"2025-03-26T15:01:16.000Z","size":457550,"stargazers_count":92,"open_issues_count":8,"forks_count":43,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-03T20:09:23.317Z","etag":null,"topics":["embeddings","llvm"],"latest_commit_sha":null,"homepage":"","language":"LLVM","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IITH-Compilers.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-28T09:58:11.000Z","updated_at":"2025-03-19T12:25:18.000Z","dependencies_parsed_at":"2023-02-09T13:15:52.763Z","dependency_job_id":"29406435-3446-4574-8cac-426a8040738f","html_url":"https://github.com/IITH-Compilers/IR2Vec","commit_stats":{"total_commits":400,"total_committers":24,"mean_commits":"16.666666666666668","dds":0.795,"last_synced_commit":"b35627cf3571d84b233a313a072947b4ed7d33ab"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IITH-Compilers%2FIR2Vec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IITH-Compilers%2FIR2Vec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IITH-Compilers%2FIR2Vec/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IITH-Compilers%2FIR2Vec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IITH-Compilers","download_url":"https://codeload.github.com/IITH-Compilers/IR2Vec/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248590958,"owners_count":21129921,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embeddings","llvm"],"created_at":"2024-11-06T02:15:25.320Z","updated_at":"2025-04-12T15:43:43.081Z","avatar_url":"https://github.com/IITH-Compilers.png","language":"LLVM","funding_links":[],"categories":[],"sub_categories":[],"readme":"# IR2Vec\n`IR2Vec` is an LLVM IR-based framework that generates distributed representations of source code in an unsupervised manner. These representations can be used as input to machine learning tasks that operate on programs.\n\nThis repo contains the source code and relevant information described in the [paper](https://doi.org/10.1145/3418463) ([arXiv](https://arxiv.org/abs/1909.06228)).\nPlease see [here](https://compilers.cse.iith.ac.in/projects/ir2vec/) for more details.\n\n\u003e IR2Vec: LLVM IR Based Scalable Program Embeddings, S. VenkataKeerthy, Rohit Aggarwal, Shalini Jain, Maunendra Sankar Desarkar, Ramakrishna Upadrasta, and Y. N. 
Srikant\n\n[![LLVM](https://img.shields.io/badge/LLVM-v20.1.0-blue)](https://github.com/llvm/llvm-project/releases/tag/llvmorg-20.1.0)\n[![PyPI Version](https://img.shields.io/pypi/v/IR2Vec)](https://pypi.org/project/IR2Vec/)\n![Tests](https://github.com/IITH-Compilers/IR2Vec/workflows/Tests/badge.svg)\n![Publish](https://github.com/IITH-Compilers/IR2Vec/workflows/Publish/badge.svg)\n![pre-commit checks](https://github.com/IITH-Compilers/IR2Vec/workflows/pre-commit%20checks/badge.svg)\n\n![Image](images/ir2vec.jpg)\n\n## LLVM Version Archive\n\n| LLVM Version | Branch |\n| ------------ | ------ |\n| LLVM 20.1.0 | [main](https://github.com/IITH-Compilers/IR2Vec) |\n| LLVM 19.1.7 | [llvm19](https://github.com/IITH-Compilers/IR2Vec/tree/llvm19) |\n| LLVM 18.1.8 | [llvm18](https://github.com/IITH-Compilers/IR2Vec/tree/llvm18) |\n| LLVM 17.0.6 | [llvm17](https://github.com/IITH-Compilers/IR2Vec/tree/llvm17) |\n| LLVM 16.0.1 | [llvm16](https://github.com/IITH-Compilers/IR2Vec/tree/llvm16) |\n| LLVM 14.0.1 | [llvm14](https://github.com/IITH-Compilers/IR2Vec/tree/llvm14) |\n| LLVM 12.0.0 | [llvm12](https://github.com/IITH-Compilers/IR2Vec/tree/llvm12) |\n| LLVM 10.0.1 | [llvm10](https://github.com/IITH-Compilers/IR2Vec/tree/llvm10) |\n| LLVM 8.0.1 | [llvm8](https://github.com/IITH-Compilers/IR2Vec/tree/llvm8) |\n\n## Table Of Contents\n- [IR2Vec](#ir2vec)\n    - [LLVM Version Archive](#llvm-version-archive)\n    - [Table Of Contents](#table-of-contents)\n    - [Installation](#installation)\n    - [Python](#python)\n    - [Cpp](#cpp)\n    - [Requirements](#requirements)\n    - [Building from source](#building-from-source)\n    - [Generating program representations](#generating-program-representations)\n        - [Using Binary](#using-binary)\n            - [Command-Line options](#command-line-options)\n            - [Flow-Aware Embeddings](#flow-aware-embeddings)\n            - [Symbolic Embeddings](#symbolic-embeddings)\n    - [Using Libraries](#using-libraries)\n    - 
[Using Python package (IR2Vec-Wheels)](#using-python-package-ir2vec-wheels)\n        - [Initialization -ir2vec.initEmbedding](#initialization--ir2vecinitembedding)\n        - [getProgramVector](#getprogramvector)\n        - [getFunctionVectors](#getfunctionvectors)\n        - [getInstructionVectors](#getinstructionvectors)\n    - [Example](#example)\n    - [Binaries, Libraries and Wheels - Artifacts](#binaries-libraries-and-wheels---artifacts)\n    - [Experiments](#experiments)\n        - [Note](#note)\n    - [Citation](#citation)\n    - [Contributions](#contributions)\n    - [License](#license)\n\n## Installation\n\n`IR2Vec` can be installed in two ways. Python users can install the Python wheel with `pip`; users who want to integrate with a compiler pass or need finer control can opt for the C++ based installation. The detailed setup steps are described in the following sections.\n\n## Python\n\nIf you prefer working with Python, you can install `IR2Vec` using `pip`.\n\n```\npip install -U ir2vec\n```\nNow, you can import and use IR2Vec in your Python projects. This assumes familiarity with Python and its package management system.\n\nWe are actively working on improving the Python interfaces and providing better support. If an interface you need for your use case is missing, please feel free to raise a request.\n\n## Cpp\n\nIf you're a C++ developer and require low-level control, optimization, or integration with C++ projects, you can build `IR2Vec` from source. 
First, ensure the requirements below are satisfied, then follow the steps in the [Building from source](#building-from-source) section.\n\n## Requirements\n* cmake (\u003e= 3.13.4)\n* GNU Make (4.2.1)\n* LLVM (20.1.0) - [src](https://github.com/llvm/llvm-project/tree/release/20.x), [release](https://releases.llvm.org/download.html#20.1.0)\n    * Support for newer LLVM versions will be added soon\n* Eigen library (3.3.7) (Optional)\n* Python (3.6.7)\n* Other Python requirements\n    * Requirements for training the vocabulary are listed in [seed_embeddings/OpenKE/requirements.txt](./seed_embeddings/OpenKE/requirements.txt), and\n    * Requirements for running experiments are listed in [experiments/exp_requirements.yaml](./experiments/exp_requirements.yaml)\n    * A Conda/Anaconda based virtual environment is assumed\n* LIT and FileCheck\n    * To install LIT, run `pip3 install --user lit`\n    * To install FileCheck, run `pip3 install --user filecheck`\n\n(Experiments were run on an Ubuntu 20.04 machine.)\n\n\n## Building from source\n1. `mkdir build \u0026\u0026 cd build`\n2. IR2Vec uses the Eigen library. If your system already has Eigen (3.3.7) set up, you can skip this step.\n    1. Download and extract the released version.\n        * `wget https://gitlab.com/libeigen/eigen/-/archive/3.3.7/eigen-3.3.7.tar.gz`\n        * `tar -xvzf eigen-3.3.7.tar.gz`\n    2. `mkdir eigen-build \u0026\u0026 cd eigen-build`\n    3. `cmake ../eigen-3.3.7 \u0026\u0026 make`\n    4. `cd ../`\n3. `cmake -DLT_LLVM_INSTALL_DIR=\u003cpath_to_LLVM_build_dir\u003e -DEigen3_DIR=\u003cpath_to_eigen_build_dir\u003e [-DCMAKE_INSTALL_PREFIX=\u003cinstall_dir\u003e] ..`\n4. 
`make [\u0026\u0026 make install]`\n\nThis process generates the `ir2vec` binary under the `build/bin` directory, and `libIR2Vec.a` and `libIR2Vec.so` under the `build/lib` directory.\n\nTo verify correctness, run `make check_ir2vec`.\n\n\n\n## Generating program representations\n`IR2Vec` can be used either as a stand-alone tool through the binary or integrated with third-party tools through the libraries. See below for usage instructions.\n\n### Using Binary\n\u003e ir2vec -\\\u003cmode\\\u003e -dim \\\u003cdimensions\\\u003e -o \\\u003coutput-file\\\u003e -level \\\u003cp|f\\\u003e -class \\\u003cclass-number\\\u003e -funcName=\\\u003cfunction-name\\\u003e \\\u003cinput-ll-file\\\u003e\n\n#### Command-Line options\n\n- `mode` - can be one of `sym`/`fa`\n    - `sym` denotes Symbolic representation\n    - `fa` denotes Flow-Aware representation\n- `dim` - Dimensions of embeddings\n    - This is an optional argument. Defaults to `300`.\n    - Other supported dimensions are `75` and `100`\n- `o` - file to which the embeddings are written (Note: if the file doesn’t exist, a new file is created; otherwise the embeddings are appended)\n- `level` - can be one of chars `p`/`f`.\n    - `p` denotes `program level` encoding\n    - `f` denotes `function level` encoding\n- `class` - optional argument. Specifies the class label for *classification tasks* (to be used with `level p`). Defaults to *-1*. When not equal to -1, the pass prints `class-number` followed by the corresponding embeddings\n- `funcName` - also an optional argument. Generates embeddings only for the functions with the given name. 
`level` should be `f` while using this option\n\nPlease use `--help` for further details.\n\n**Format of the output embeddings in `output_file`**\n- If the `level` is `p`:\n\n\u003e     \u003cclass-number\u003e \u003cEmbeddings\u003e\n\n*class-number is printed only if it is not -1*\n\n- If the `level` is `f`:\n\n\u003e     \u003cfunction-name\u003e = \u003cEmbeddings\u003e\n\n#### Flow-Aware Embeddings\nFor all functions\n* `` ir2vec -fa -dim \u003cdimension\u003e -o \u003coutput_file\u003e -level \u003cp|f\u003e -class \u003cclass-number\u003e \u003cinput_ll_file\u003e``\n\nFor a specific function\n* `` ir2vec -fa -dim \u003cdimension\u003e -o \u003coutput_file\u003e -level f -class \u003cclass-number\u003e -funcName=\u003cfunction-name\u003e \u003cinput_ll_file\u003e``\n\n#### Symbolic Embeddings\nFor all functions\n* `` ir2vec -sym -dim \u003cdimension\u003e -o \u003coutput_file\u003e -level \u003cp|f\u003e -class \u003cclass-number\u003e \u003cinput_ll_file\u003e``\n\nFor a specific function\n* `` ir2vec -sym -dim \u003cdimension\u003e -o \u003coutput_file\u003e -level f -class \u003cclass-number\u003e -funcName=\u003cfunction-name\u003e \u003cinput_ll_file\u003e``\n\n## Using Libraries\nThe libraries can be installed by passing the installation location to the `CMAKE_INSTALL_PREFIX` flag during `cmake` followed by `make install`.\nThe interfaces are available in [`IR2Vec.h`](./src/include/IR2Vec.h). 
External projects can access `IR2Vec` functionality through these interfaces by including `IR2Vec.h` from the installed location and linking against the library statically or dynamically.\n\n* If the project does not use LLVM, LLVM dependencies have to be linked and included separately.\n* Please ensure that the IR2Vec libraries are compiled with a compatible LLVM version.\n   * If you are getting errors, please recompile IR2Vec by passing the current LLVM install directory path to `LT_LLVM_INSTALL_DIR` during cmake.\n\nThe following template can be used to link IR2Vec libraries in a CMake-based project.\n\n```cmake\nset(IR2VEC_INSTALL_DIR \"\" CACHE PATH \"IR2Vec installation directory\")\ninclude_directories(\"${IR2VEC_INSTALL_DIR}/include\")\ntarget_link_libraries(\u003cyour_executable_or_library\u003e PUBLIC ${IR2VEC_INSTALL_DIR}/lib/\u003clibIR2Vec.a or libIR2Vec.so\u003e)\n```\n\nThen pass the location of IR2Vec's install prefix as `-DIR2VEC_INSTALL_DIR` during cmake.\n\nThe following example snippet shows how to query the exposed vector representations.\n\n```c++\n#include \"IR2Vec.h\"\n\n// Creating object to generate FlowAware representation\nauto ir2vec =\n      IR2Vec::Embeddings(\u003cLLVM Module\u003e, IR2Vec::IR2VecMode::FlowAware, \u003cDIM\u003e);\n\n// Getting Instruction vectors corresponding to the instructions in \u003cLLVM Module\u003e\nauto instVecMap = ir2vec.getInstVecMap();\n// Access the generated vectors\nfor (auto instVec : instVecMap) {\n    outs() \u003c\u003c \"Instruction : \";\n    instVec.first-\u003eprint(outs());\n    outs() \u003c\u003c \": \";\n\n    for (auto val : instVec.second)\n      outs() \u003c\u003c val \u003c\u003c \"\\t\";\n}\n\n// Getting vectors corresponding to the functions in \u003cLLVM Module\u003e\nauto funcVecMap = ir2vec.getFunctionVecMap();\n// Access the generated vectors\nfor (auto funcVec : funcVecMap) {\n    outs() \u003c\u003c \"Function : \" \u003c\u003c funcVec.first-\u003egetName() 
\u003c\u003c \"\\n\";\n    for (auto val : funcVec.second)\n      outs() \u003c\u003c val \u003c\u003c \"\\t\";\n  }\n\n// Getting the program vector\nauto pgmVec = ir2vec.getProgramVector();\n// Access the generated vector\nfor (auto val : pgmVec)\n    outs() \u003c\u003c val \u003c\u003c \"\\t\";\n```\n\n## Using Python package (IR2Vec-Wheels)\n### Initialization -ir2vec.initEmbedding\n\n**Description:** Initialize IR2Vec embedding for an LLVM IR file.\n\n**Parameters:**\n\n* `file_path`: str - Path to the `.ll` or `.bc` file.\n* `encoding_type`: str - Choose `fa` (Flow-Aware) or `sym` (Symbolic).\n* `level`: str - Choose `p` for program-level or `f` for function-level.\n* `dim`: uint - Choose from `[300, 100, 75]`. Default value is `300`\n* `output_file`: str - If provided, embeddings are saved to this file. Default is an empty string.\n\n**Returns:**\n\n* `IR2VecObject`: Initialized object for accessing embeddings.\n\n**Example:**\n\n```python\nimport ir2vec\n\n# Approach 1\ninitObj = ir2vec.initEmbedding(\"/path/to/file.ll\", \"fa\", \"p\")\n\n# Approach 2\ninitObj = ir2vec.initEmbedding(\"/path/to/file.ll\", \"fa\", \"p\", 100)\n\n# Approach 3\ninitObj = ir2vec.initEmbedding(\"/path/to/file.ll\", \"fa\", \"p\", 100, \"output.txt\")\n```\n\n### getProgramVector\n\n**Description:** Gets the program-level vector representation.\n\n**Parameters:** optional\n\n**Returns:**\n\n- `progVector`: ndarray - The program-level embedding vector.\n\n**Example:**\n\n```python\n# Getting the program-level vector\nprogVector = initObj.getProgramVector()\n```\n### getFunctionVectors\n\n**Description:** Gets function-level vectors for all functions in the LLVM IR file.\n\n**Parameters:** optional\n\n**Returns:**\n\n- `functionVectorMap`: dict - A dictionary where keys are function names and values are ndarrays containing function-level embedding vectors.\n\n**Example:**\n\n```python\n# Getting function-level vectors\nfunctionVectorMap = initObj.getFunctionVectors()\n```\n\n### 
getInstructionVectors\n\n**Description:** Gets instruction-level vectors for all instructions in the LLVM IR file.\n\n**Parameters:** optional\n\n**Returns:**\n\n- `instructionVectorsList`: list - A list of lists, where each inner list contains the embedding vector values of the corresponding instruction.\n\n**Example:**\n\n```python\n# Getting instruction-level vectors\ninstructionVectorsList = initObj.getInstructionVectors()\n```\n## Example\nThe following code snippet demonstrates the usage of the package.\n\n```python\nimport ir2vec\nimport numpy as np\n\n# IR2Vec Python APIs can be used in two ways, as shown below.\ninitObj = ir2vec.initEmbedding(\"/path/to/file.ll\", \"fa\", \"p\")\n\n# Approach 1: module-level functions that take the initialized object\nprogVector1 = ir2vec.getProgramVector(initObj)\nfunctionVectorMap1 = ir2vec.getFunctionVectors(initObj)\ninstructionVectorsList1 = ir2vec.getInstructionVectors(initObj)\n\n# Approach 2: methods called on the initialized object\nprogVector2 = initObj.getProgramVector()\nfunctionVectorMap2 = initObj.getFunctionVectors()\ninstructionVectorsList2 = initObj.getInstructionVectors()\n\n# Both approaches produce the same results\nassert np.allclose(progVector1, progVector2)\n\n# Querying a single function by its actual name also matches across approaches\nfor fun, funcObj in functionVectorMap1.items():\n    assert fun == funcObj[\"demangledName\"]\n    functionOutput1 = ir2vec.getFunctionVectors(\n        initObj,\n        funcObj[\"actualName\"],\n    )\n    functionOutput2 = initObj.getFunctionVectors(\n        funcObj[\"actualName\"]\n    )\n    assert np.allclose(functionOutput1[fun][\"vector\"], functionOutput2[fun][\"vector\"])\n```\n## Binaries, Libraries and Wheels - Artifacts\nBinaries, Libraries (.a and .so), and whl files are autogenerated for every relevant check-in using GitHub Actions. 
These artifacts are available with the successful runs of the [`Publish`](https://github.com/IITH-Compilers/IR2Vec/actions?query=workflow%3APublish) and [`Build Wheels`](https://github.com/IITH-Compilers/IR2Vec/actions/workflows/wheel.yml) actions.\n\n## Experiments\n\n### Note\n\u003ccode\u003e The results reported in the experiment scripts and in the published version are not updated for this branch. The experimental results for this branch differ from those in the published version. For comparison, use the release corresponding to [v0.1.0](https://github.com/IITH-Compilers/IR2Vec/releases/tag/v0.1.0). \u003c/code\u003e\n\n* [Device Mapping](./experiments/Device_Mapping)\n* [Thread Coarsening](./experiments/Thread_Coarsening)\n* [OOV](./experiments/Out_Of_Vocabulary)\n* [Time Taken](./experiments/TimeTaken)\n\n## Citation\n```\n@article{VenkataKeerthy-2020-IR2Vec,\nauthor = {VenkataKeerthy, S. and Aggarwal, Rohit and Jain, Shalini and Desarkar, Maunendra Sankar and Upadrasta, Ramakrishna and Srikant, Y. N.},\ntitle = {{IR2Vec: LLVM IR Based Scalable Program Embeddings}},\nyear = {2020},\nissue_date = {December 2020},\npublisher = {Association for Computing Machinery},\naddress = {New York, NY, USA},\nvolume = {17},\nnumber = {4},\nissn = {1544-3566},\nurl = {https://doi.org/10.1145/3418463},\ndoi = {10.1145/3418463},\njournal = {ACM Trans. Archit. Code Optim.},\nmonth = dec,\narticleno = {32},\nnumpages = {27},\nkeywords = {heterogeneous systems, representation learning, compiler optimizations, LLVM, intermediate representations}\n}\n```\n## Contributions\nPlease feel free to raise issues to file a bug, pose a question, or initiate any related discussions. Pull requests are welcome :)\n\n## License\nIR2Vec is released under the Apache License v2.0 with LLVM Exceptions. 
See the LICENSE file for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiith-compilers%2Fir2vec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiith-compilers%2Fir2vec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiith-compilers%2Fir2vec/lists"}