Tensor Program Optimization with Auto-Batching (Master Thesis, ETH Zürich, 2025)
https://github.com/llm-db/tensor-program-optimization-with-auto-batching
- Host: GitHub
- URL: https://github.com/llm-db/tensor-program-optimization-with-auto-batching
- Owner: llm-db
- Created: 2025-03-31T11:16:10.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-03-31T11:22:01.000Z (2 months ago)
- Last Synced: 2025-03-31T12:26:18.414Z (2 months ago)
- Topics: inference, llm, lora, peft, tvm
- Language: Python
- Homepage:
- Size: 59.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
This repository contains the code for Luca Strässle's master's thesis, [Tensor Program Optimization with Auto-Batching]().
# Getting Started
```
conda create -n AutoPEFT python=3.12
conda activate AutoPEFT
pip install -r requirements.txt
```
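Before moving on, a quick check can confirm the environment is usable. This is a minimal sketch that assumes `requirements.txt` pulls in PyTorch (an assumption, not verified against the repository); a CUDA-capable GPU is needed for the TVM build below:
```
# Quick environment check -- assumes requirements.txt installs PyTorch.
import torch

print(torch.__version__)
print(torch.cuda.is_available())  # the TVM CUDA build below needs a GPU
```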
## PEFT Installation
```
cd ~  # clone PEFT into the home directory
git clone -b v0.15.1 https://github.com/huggingface/peft.git
cd peft
pip install -e .
```
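A short sketch to confirm the editable install is the one being imported (the expected version follows from the `v0.15.1` tag checked out above):
```
# Verify that the editable PEFT checkout is active.
import peft

print(peft.__version__)  # expect 0.15.1, matching the tag cloned above
print(peft.__file__)     # should point into the cloned peft/ directory
```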
## TVM Installation (Nvidia GPU)
```
conda install -c conda-forge -c anaconda "llvmdev==19.1.4" "cmake==3.31.1" git libxml2
cd ~  # clone TVM into the home directory
git clone --recursive -b v0.18.0 https://github.com/apache/tvm tvm
export LD_LIBRARY_PATH=$(conda info --base)/envs/AutoPEFT/lib:$LD_LIBRARY_PATH
export LIBRARY_PATH=$(conda info --base)/envs/AutoPEFT/lib:$LIBRARY_PATH
export CPATH=$(conda info --base)/envs/AutoPEFT/include:$CPATH
cd tvm
rm -rf build && mkdir build && cd build
cp ../cmake/config.cmake .
echo "set(CMAKE_BUILD_TYPE RelWithDebInfo)" >> config.cmake
echo "set(HIDE_PRIVATE_SYMBOLS ON)" >> config.cmake
```
A few lines in `config.cmake` need to be checked and possibly changed. Open it (e.g., with `vim config.cmake`) and make sure the following options are set:
```
set(USE_LLVM "llvm-config --ignore-libllvm --link-static")
set(USE_CUDA ON)
set(USE_METAL OFF)
set(USE_VULKAN OFF)
set(USE_OPENCL OFF)
set(USE_CUBLAS ON)
set(USE_CUDNN ON)
set(USE_CUTLASS ON)
```
Now continue with the build:
```
cmake -DCMAKE_CUDA_ARCHITECTURES=86 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12/bin/nvcc -DCMAKE_PREFIX_PATH=$(conda info --base)/envs/AutoPEFT -DLIBXML2_LIBRARIES=$(conda info --base)/envs/AutoPEFT/lib/libxml2.so .. && cmake --build . --parallel $(nproc)
export TVM_LIBRARY_PATH=$HOME/tvm/build  # the build directory created above
cd ../python
pip install -e .
```
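Once the Python bindings are installed, a short smoke test (a sketch, assuming the CUDA build above succeeded) confirms that TVM loads and sees the GPU:
```
# TVM smoke test: import the freshly built package and probe the GPU.
import tvm

print(tvm.__version__)    # expect 0.18.0, matching the tag cloned above
print(tvm.cuda(0).exist)  # True if the CUDA runtime was compiled in
```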
# Repository Structure

The repository contains the following folders:
- **huggingface**: contains the implementations using HuggingFace's `transformers` and `peft` libraries
- **init_peft_weights**: contains code to randomly generate LoRA weights (a sketch of this idea follows the list)
- **prompts**: contains the default prompt (512 tokens) that we used for our experiments
- **tvm**: contains the implementations using TVM

The **huggingface** and **tvm** folders contain READMEs with detailed execution instructions.
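For illustration, random LoRA weights of the kind **init_peft_weights** produces can be generated with `peft` itself. This is a hypothetical sketch, not the repository's actual script; the base model and target module are placeholder choices:
```
# Sketch: build a randomly initialized LoRA adapter and save it to disk.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],   # GPT-2's fused attention projection
    init_lora_weights=False,     # random A *and* B (normally B starts at zero)
)
model = get_peft_model(base, config)
model.save_pretrained("random_lora_adapter")  # writes adapter weights + config
```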