{"id":27102374,"url":"https://github.com/llm-db/tensor-program-optimization-with-auto-batching","last_synced_at":"2025-04-06T15:37:39.755Z","repository":{"id":285379431,"uuid":"957922848","full_name":"llm-db/tensor-program-optimization-with-auto-batching","owner":"llm-db","description":"Tensor Program Optimization with Auto-Batching (Master Thesis, ETH Zürich, 2025)","archived":false,"fork":false,"pushed_at":"2025-03-31T11:22:01.000Z","size":61,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-31T12:26:18.414Z","etag":null,"topics":["inference","llm","lora","peft","tvm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/llm-db.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-31T11:16:10.000Z","updated_at":"2025-03-31T11:25:43.000Z","dependencies_parsed_at":"2025-03-31T12:26:32.074Z","dependency_job_id":"347d6a15-d57a-41da-9cac-73e03ce53d93","html_url":"https://github.com/llm-db/tensor-program-optimization-with-auto-batching","commit_stats":null,"previous_names":["llm-db/tensor-program-optimization-with-auto-batching"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/llm-db%2Ftensor-program-optimization-with-auto-batching","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/llm-db%2Ftensor-program-optimization-with-auto-batching/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/llm-db%2Ftensor-program-optimization-with-auto-batching/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/llm-db%2Ftensor-program-optimization-with-auto-batching/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/llm-db","download_url":"https://codeload.github.com/llm-db/tensor-program-optimization-with-auto-batching/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247505193,"owners_count":20949785,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["inference","llm","lora","peft","tvm"],"created_at":"2025-04-06T15:37:39.192Z","updated_at":"2025-04-06T15:37:39.743Z","avatar_url":"https://github.com/llm-db.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"This repository contains the code for Luca Strässle's master's thesis [Tensor Program Optimization with Auto-Batching]()\n\n# Getting Started\n```\nconda create -n AutoPEFT python=3.12\nconda activate AutoPEFT\npip install -r requirements.txt\n```\n\n## PEFT installation\n```\ncd\ngit clone -b v0.15.1 https://github.com/huggingface/peft.git\ncd peft\npip install -e .\n```\n\n## TVM Installation (Nvidia GPU)\n```\nconda install -c conda-forge -c anaconda \"llvmdev==19.1.4\" \"cmake==3.31.1\" git libxml2\ncd\ngit clone --recursive -b v0.18.0 https://github.com/apache/tvm tvm\nexport LD_LIBRARY_PATH=$(conda info --base)/envs/AutoPEFT/lib:$LD_LIBRARY_PATH\nexport LIBRARY_PATH=$(conda info --base)/envs/AutoPEFT/lib:$LIBRARY_PATH\nexport CPATH=$(conda info --base)/envs/AutoPEFT/include:$CPATH\ncd tvm\nrm -rf build \u0026\u0026 mkdir build \u0026\u0026 cd build\ncp ../cmake/config.cmake .\necho \"set(CMAKE_BUILD_TYPE RelWithDebInfo)\" \u003e\u003e config.cmake\necho \"set(HIDE_PRIVATE_SYMBOLS ON)\" \u003e\u003e config.cmake\n```\nSome lines in `config.cmake` have to be checked and potentially modified. Open it with `vim config.cmake` and make sure the following are set correctly:\n```\nset(USE_LLVM \"llvm-config --ignore-libllvm --link-static\")\nset(USE_CUDA ON)\nset(USE_METAL OFF)\nset(USE_VULKAN OFF)\nset(USE_OPENCL OFF)\nset(USE_CUBLAS ON)\nset(USE_CUDNN ON)\nset(USE_CUTLASS ON)\n```\nNow continue with the build:\n```\ncmake -DCMAKE_CUDA_ARCHITECTURES=86 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12/bin/nvcc -DCMAKE_PREFIX_PATH=$(conda info --base)/envs/AutoPEFT -DLIBXML2_LIBRARIES=$(conda info --base)/envs/AutoPEFT/lib/libxml2.so .. \u0026\u0026 cmake --build . --parallel $(nproc)\nexport TVM_LIBRARY_PATH=/home/\u003cuser\u003e/tvm/build\ncd ../python\npip install -e .\n```\n\n# Repository Structure\nThe repository contains the following folders:\n- **huggingface**: contains the implementations using HuggingFace's `transformers` and `peft` libraries\n- **init_peft_weights**: contains code to randomly generate LoRA weights\n- **prompts**: contains the default prompt (512 tokens) that we used for our experiments\n- **tvm**: contains the implementations using TVM\n\nThe **huggingface** and **tvm** folders contain READMEs with detailed execution instructions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fllm-db%2Ftensor-program-optimization-with-auto-batching","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fllm-db%2Ftensor-program-optimization-with-auto-batching","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fllm-db%2Ftensor-program-optimization-with-auto-batching/lists"}