{"id":16272176,"url":"https://github.com/eliranwong/multiamdgpu_aidev_ubuntu","last_synced_at":"2025-04-08T15:45:27.524Z","repository":{"id":233526721,"uuid":"787385834","full_name":"eliranwong/MultiAMDGPU_AIDev_Ubuntu","owner":"eliranwong","description":"Multi AMD GPU Setup for AI Development on Ubuntu with ROCM","archived":false,"fork":false,"pushed_at":"2024-05-22T19:44:56.000Z","size":488,"stargazers_count":5,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-05-22T20:43:18.597Z","etag":null,"topics":["ai","amd","amd-gpu","amdgpu","freegenius","gpu","rocm","ubuntu"],"latest_commit_sha":null,"homepage":"https://letmedoit.ai","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eliranwong.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-16T12:28:32.000Z","updated_at":"2024-05-28T00:36:07.781Z","dependencies_parsed_at":"2024-05-28T00:36:03.747Z","dependency_job_id":null,"html_url":"https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu","commit_stats":null,"previous_names":["eliranwong/multiamdgpu_aidev_ubuntu"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eliranwong%2FMultiAMDGPU_AIDev_Ubuntu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eliranwong%2FMultiAMDGPU_AIDev_Ubuntu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eliranwong%2FMultiAMDGPU_AIDev_Ubuntu/releases","manifests_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/eliranwong%2FMultiAMDGPU_AIDev_Ubuntu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eliranwong","download_url":"https://codeload.github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247874298,"owners_count":21010634,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","amd","amd-gpu","amdgpu","freegenius","gpu","rocm","ubuntu"],"created_at":"2024-10-10T18:16:35.025Z","updated_at":"2025-04-08T15:45:27.514Z","avatar_url":"https://github.com/eliranwong.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multi AMD GPU Setup for AI Development on Ubuntu\n\nWelcome to this repository, where I share my notes and insights on \nsetting up multiple AMD GPUs on Ubuntu for AI development. This initiative stems\nfrom the noticeable gap in resources and discussions around AMD GPU setups for \nAI, as most online documentation and forums predominantly focus on Nvidia GPUs. \nThis repository aims to bridge that gap, providing an overview and step-by-step guide based on my personal experience and research.\n\n![result2](https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/assets/25262722/29e067dc-2bbd-4140-b0e3-39206a72c4b1)\n\n### What You Will Find Here\n\nThis repository contains notes, guides, and scripts that I've compiled during my\nsetup process. 
While my setup specifically involves Ubuntu 22.04, kernel 6.5, \nwith 2 AMD RX 7900 XTX GPUs, the information provided here should be applicable \nor easily adaptable to similar configurations. Here's what to expect:\n\n- **Installation Guides:** Detailed steps on installing and configuring the \nnecessary drivers and tools for AMD GPUs on Ubuntu.\n- **Troubleshooting Tips:** Solutions to common issues that might arise during \nthe setup process.\n- **Performance Optimization:** Tips on optimizing your AMD GPU setup for better\nperformance in AI development tasks.\n- **Useful Resources:** A curated list of resources that I found invaluable \nduring my setup process.\n\n### Disclaimer\n\nThe notes and guides in this repository are based on my personal experience and \nresearch. While I strive to provide accurate and up-to-date information, I \ncannot guarantee that everything will work perfectly for every setup. Always \nback up your data and proceed with caution.\n\n### iGPU setup instead?\n\nFor an iGPU setup, please visit https://github.com/eliranwong/AMD_iGPU_AI_Setup\n\n# Hardware Configurations for Multi-GPUs\n\n* PCIe® slots connected to the GPU must have identical PCIe lane width or bifurcation settings, and support PCIe 3.0 Atomics.\n\n* Only use PCIe slots connected directly to the CPU, and avoid PCIe slots connected via the chipset. 
Refer to product-specific motherboard documentation for PCIe electrical configuration.\n\n* Ensure the PSU has sufficient wattage to support multiple GPUs.\n\n* Enable either iGPU or Discrete GPU\n\nRead more at: https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/native_linux/mgpu.html\n\n# Select Ubuntu and Kernel Versions\n\nRead https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-distributions\n\nIf you need to install an additional kernel, read:\n\nhttps://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/blob/main/Install_Ubuntu_Kernel.md\n\n## Tested Device\n\n\u003e cat /etc/os-release\n\n```\nPRETTY_NAME=\"Ubuntu 24.04.2 LTS\"\nNAME=\"Ubuntu\"\nVERSION_ID=\"24.04\"\nVERSION=\"24.04.2 LTS (Noble Numbat)\"\nVERSION_CODENAME=noble\nID=ubuntu\nID_LIKE=debian\nHOME_URL=\"https://www.ubuntu.com/\"\nSUPPORT_URL=\"https://help.ubuntu.com/\"\nBUG_REPORT_URL=\"https://bugs.launchpad.net/ubuntu/\"\nPRIVACY_POLICY_URL=\"https://www.ubuntu.com/legal/terms-and-policies/privacy-policy\"\nUBUNTU_CODENAME=noble\nLOGO=ubuntu-logo\n```\n\n\u003e uname -srmv\n\n```\nLinux 6.11.0-17-generic #17~24.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jan 20 22:48:29 UTC 2 x86_64\n```\n\n# Add User to Groups for GPU Access\n\nAdd current user to groups 'render' and 'video'\n\n\u003e sudo usermod -a -G render,video $LOGNAME\n\nTo add all future users to the video and render groups by default, run the following commands:\n\n```\necho 'ADD_EXTRA_GROUPS=1' | sudo tee -a /etc/adduser.conf\necho 'EXTRA_GROUPS=video' | sudo tee -a /etc/adduser.conf\necho 'EXTRA_GROUPS=render' | sudo tee -a /etc/adduser.conf\n```\n\n\u003cdetails\u003e\u003csummary\u003eExplanation\u003c/summary\u003e\n\nFor AMD GPUs on Linux, the groups you might need to add users to for proper GPU access are similar to those for NVIDIA GPUs. 
Here are the key groups:\n\n- video: This group grants access to video devices and may include GPU devices.\n- render: This group allows access to GPU rendering devices.\n\nWhen using AMD GPUs, especially with ROCm (Radeon Open Compute), you may also need to add users to these groups to ensure they have the necessary permissions to access the GPU for computing tasks. [The ROCm documentation](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/install-radeon.html) specifically mentions adding users to both the render and video groups to set the correct permissions.\n\nTo add a user to these groups, you can use the following command:\n\n\u003e sudo usermod -a -G render,video username\n\nReplace `username` with the actual username of the user you want to add to the groups. After adding the user to these groups, they should have the necessary permissions to access the GPU resources on your system. Group membership changes apply only to new login sessions, so log out and log back in, or reboot the system, for the changes to take effect.\n\nIf you're using specific AMDGPU control applications or tools, they might have their own group requirements or recommendations, so it's a good idea to check the documentation for those tools as well. 
Remember, managing user access to GPUs is an important aspect of system administration, especially in multi-user environments or when dealing with sensitive compute tasks.\n\n\u003c/details\u003e\n\n# Install ROCM 6.3.4\n\nVersion 6.3.4 is preferred, as it officially supports AMD Radeon™ 7000 series GPUs.\n\nRead more at: https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/native_linux/howto_native_linux.html\n\n## Uninstall Old Copies\n\n```\namdgpu-install --uninstall\nsudo apt remove --purge amdgpu-install\n```\n\n## Install via package amdgpu-install\n\n```\nsudo apt update\nsudo apt install -y libstdc++-12-dev\nwget https://repo.radeon.com/amdgpu-install/6.3.4/ubuntu/noble/amdgpu-install_6.3.60304-1_all.deb\nsudo apt install ./amdgpu-install_6.3.60304-1_all.deb\nsudo amdgpu-install --usecase=graphics,multimedia,rocm,rocmdev,rocmdevtools,lrt,opencl,openclsdk,hip,hiplibsdk,openmpsdk,mllib,mlsdk --no-dkms -y\n```\n\nFor more use-case options, read https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/amdgpu-install.html#use-cases\n\nTo install ROCm inside a container, read: [https://github.com/eliranwong/incus_container_gui_setup](https://github.com/eliranwong/incus_container_gui_setup/blob/main/ubuntu_22.04_LTS_rocm_6.1.3_tested.md)\n\n## Modify Grub to Avoid a Known Hang Issue\n\nIssue: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/native-install/install-faq.html#issue-5-application-hangs-on-multi-gpu-systems\n\nSolution: Add \"iommu=pt\" to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. 
\n\nFor example, \n\n\u003e sudo nano /etc/default/grub\n\nChanged from:\n\n```\nGRUB_CMDLINE_LINUX_DEFAULT=\"quiet splash\"\n```\n\nTo:\n\n```\nGRUB_CMDLINE_LINUX_DEFAULT=\"quiet splash iommu=pt\"\n```\n\nUpdate grub\n\n\u003e sudo update-grub\n\n## Verify\n\nRestart to make changes effective:\n\n\u003e sudo reboot\n\nTo verify, run:\n\n\u003e rocminfo\n\nRead https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/native_linux/install-radeon.html#post-install-verification-checks\n\n# Fix Xorg issue\n\nIf you use Xorg instead of Wayland and have the issue where the mouse cursor is invisible, you can try to create the file /etc/X11/xorg.conf.d/99-modesetting.conf with the following content:\n\n\u003e sudo nano /etc/X11/xorg.conf.d/99-modesetting.conf\n\n```\nSection \"Device\"\n      Identifier \"modesetting\"\n      Driver \"modesetting\"\nEndSection\n```\n\n# Disable Integrated GPU\n\nTo avoid [a known bug](https://github.com/vosen/ZLUDA#hardware) in underlying ROCm/HIP runtime, disable the integrated GPU.\n\nThe file `/etc/default/grub` is a configuration file for the GRUB bootloader on Ubuntu. The line `GRUB_CMDLINE_LINUX_DEFAULT` in this file is used to pass arguments to the Linux kernel at boot time. To disable an integrated GPU system-wide, you can do so by adding a specific argument to this line. The argument you need is `pci-stub.ids=\\\u003cDEVICE_VENDOR\\\u003e:\\\u003cDEVICE_CODE\\\u003e`, where `\\\u003cDEVICE_VENDOR\\\u003e:\\\u003cDEVICE_CODE\\\u003e` is the vendor and device code of your GPU.\n\nCheck the vendor and device code with 'lspci', e.g.:\n\n\u003e lspci -k | grep -A 2 -i \"VGA\"\n\nFor example, the following output tells that the vendor and device code is '1f66:0001':\n\n```\n05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. 
[AMD/ATI] Rembrandt (rev c7)\n\tSubsystem: Device 1f66:0001\n\tKernel driver in use: amdgpu\n```\n\nEdit GRUB bootloader configuration file:\n\n\u003e sudo nano /etc/default/grub\n\n```\nGRUB_CMDLINE_LINUX_DEFAULT=\"quiet splash pci-stub.ids=1f66:0001\"\n```\n\nUpdate GRUB for the changes to take effect\n\n\u003e sudo update-grub\n\n\u003cdetails\u003e\u003csummary\u003eExplanation\u003c/summary\u003e\n\nThe GRUB_CMDLINE_LINUX_DEFAULT line in the GRUB configuration file is used to set the default kernel boot parameters. These parameters are passed to the Linux kernel at boot time and can be used to control various aspects of the system’s behavior.\n\nThe `pci-stub.ids=\\\u003cDEVICE_VENDOR\\\u003e:\\\u003cDEVICE_CODE\\\u003e` argument doesn't exactly \"disable\" the integrated graphics card, but rather it prevents the Linux kernel from loading a specific driver for that device during the boot process.\n\nHere's how it works:\n\n- Each hardware device in your computer, including your integrated graphics card, has a unique vendor and device ID. This ID is used by the operating system to identify the device and load the appropriate driver for it.\n\n- When you pass `pci-stub.ids=\\\u003cDEVICE_VENDOR\\\u003e:\\\u003cDEVICE_CODE\\\u003e` to the kernel at boot time, you're telling the kernel to reserve the specified device (in this case, your integrated graphics card) for the `pci-stub` driver.\n\n- The `pci-stub` driver is a \"dummy\" driver that doesn't do anything—it simply claims the device and prevents other drivers from being able to use it. This is useful in situations where you want to prevent the operating system from interacting with a device, such as when you're setting up a device for pass-through to a virtual machine.\n\nSo, while the integrated graphics card is still physically present and powered on, the operating system is unable to interact with it because the `pci-stub` driver has claimed it. 
This effectively \"disables\" the integrated graphics card from the operating system's perspective.\n\nRemember, this is a low-level operation that can have significant effects on your system, so it should be done with caution. Always make sure to consult the relevant documentation or seek expert advice if you're unsure. And don't forget to run `sudo update-grub` and reboot your system after making changes to the GRUB configuration file.\n\n\u003c/details\u003e\n\n# UPDATE GRUB\n\n**Configure Vulkan to use AMD graphics card** [optional]:\n\n    - Edit the file `/etc/default/grub` and in the line that reads `GRUB_CMDLINE_LINUX_DEFAULT`, add: `radeon.cik_support=0 amdgpu.cik_support=1 radeon.si_support=0 amdgpu.si_support=1`.\n    - Create a new file `/etc/modprobe.d/amdgpu.conf` and add the following lines to it:\n        ```\n        options amdgpu si_support=1\n        options amdgpu cik_support=1\n        ```\n    - Create a new file `/etc/modprobe.d/radeon.conf` and add the following lines to it:\n        ```\n        options radeon si_support=0\n        options radeon cik_support=0\n        ```\n    - Create a new file `/etc/modprobe.d/blacklist.conf` and add the following lines to it:\n        ```\n\tblacklist radeon\n\t```\n    - Run `sudo update-grub` and then reboot your system.\n\n# Hardware Detection\n\nWith GUI:\n\n1. Launch Settings application\n2. 
Select About \u003e Graphics\n\nWith CLI:\n\n\u003e rocminfo\n\nLook for information like:\n\n```\n*******                  \nAgent 2                  \n*******                  \n  Name:                    gfx1100                            \n  Uuid:                    GPU-b54ca445df90862b               \n  Marketing Name:          Radeon RX 7900 XTX                 \n  Vendor Name:             AMD \n  Feature:                 KERNEL_DISPATCH                    \n  Profile:                 BASE_PROFILE                       \n  Float Round Mode:        NEAR                               \n  Max Queue Number:        128(0x80)                          \n  Queue Min Size:          64(0x40)                           \n  Queue Max Size:          131072(0x20000)                    \n  Queue Type:              MULTI                              \n  Node:                    1 \n\n*******                  \nAgent 3                  \n*******                  \n  Name:                    gfx1100                            \n  Uuid:                    GPU-2ff163adb661d5fb               \n  Marketing Name:          Radeon RX 7900 XTX                 \n  Vendor Name:             AMD        \n  Feature:                 KERNEL_DISPATCH                    \n  Profile:                 BASE_PROFILE                       \n  Float Round Mode:        NEAR                               \n  Max Queue Number:        128(0x80)                          \n  Queue Min Size:          64(0x40)                           \n  Queue Max Size:          131072(0x20000)                    \n  Queue Type:              MULTI                              \n  Node:                    2\n```\n\nIn this case, there are two GPUs, which are referred to as devices 0 and 1 later.\n\n# Environment Variables\n\nModify the values to suit your case.\n\nThe following examples assume:\n\n* ROCm version 6.3.2 installed\n\n* No integrated GPU\n\n* Two AMD RX 7900 XTX GPUs installed.\n\nNote: You may run `rocm-smi` to find 
the mapping of node numbers to device numbers.\n\n## Overview\n\nI use my case as an example:\n\nRemarks:\n* The following settings assume `/opt/rocm` points to `/opt/rocm-6.3.2`.\n* Replace the values of ROCR_VISIBLE_DEVICES with your own.\n\n```\n# rocm\nexport GFX_ARCH=gfx1100\nexport HCC_AMDGPU_TARGET=gfx1100\nexport CUPY_INSTALL_USE_HIP=1\nexport ROCM_VERSION=6.3\nexport ROCM_HOME=/opt/rocm\nexport LD_LIBRARY_PATH=/usr/include/vulkan:/opt/rocm/include:/opt/rocm/lib:/opt/rocm/lib/llvm/lib:/opt/rocm/lib/migraphx/lib:$LD_LIBRARY_PATH\nexport PATH=/home/eliran/.local/bin:/opt/rocm/bin:/opt/rocm/llvm/bin:$PATH\nexport HSA_OVERRIDE_GFX_VERSION=11.0.0\nexport ROCR_VISIBLE_DEVICES=GPU-b54ca445df90862b,GPU-2ff163adb661d5fb\nexport GPU_DEVICE_ORDINAL=0,1\nexport HIP_VISIBLE_DEVICES=0,1\nexport CUDA_VISIBLE_DEVICES=0,1\nexport LLAMA_HIPLAS=0,1\nexport DRI_PRIME=1\nexport OMP_DEFAULT_DEVICE=1\n# vulkan\nexport GGML_VULKAN_DEVICE=0,1\nexport GGML_VK_VISIBLE_DEVICES=0,1\nexport VULKAN_SDK=/usr/share/vulkan\nexport VK_LAYER_PATH=$VULKAN_SDK/explicit_layer.d\n```\n\nROCM_HOME - tells AI libraries where ROCm is installed; typically somewhere in /opt, e.g.:\n\n\u003e export ROCM_HOME=/opt/rocm\n\n\u003cdetails\u003e\u003csummary\u003eExplanation\u003c/summary\u003e\n\n`ROCM_HOME` is an environment variable in Linux that is used to specify the location of the ROCm (Radeon Open Compute) software on your system. ROCm is a set of open-source libraries and tools that are used to create high-performance machine learning applications on AMD GPUs.\n\nWhen you install ROCm, it is typically installed in a directory under `/opt`. For example, if you installed version 6.1.3 of ROCm, it might be installed in `/opt/rocm-6.1.3`, with `/opt/rocm` pointing to it.\n\nThe `export` command in Linux is used to set environment variables. 
So, when you run the command `export ROCM_HOME=/opt/rocm`, you are telling the system \"Whenever I refer to `ROCM_HOME`, I actually mean `/opt/rocm`\".\n\nThis is useful because many AI libraries that use ROCm will look for the `ROCM_HOME` environment variable to know where to find the ROCm software. By setting `ROCM_HOME`, you ensure that these libraries can find and use ROCm correctly.\n\n\u003c/details\u003e\n\nLD_LIBRARY_PATH - loader library path; typically this is set to $ROCM_HOME/lib. An indication you’re missing this flag is if you import pytorch and see an error like undefined reference to...\n\n\u003e export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH\n\n\u003e export PATH=/opt/rocm/bin:/opt/rocm/opencl/bin:$PATH\n\n\u003cdetails\u003e\u003csummary\u003eExplanation\u003c/summary\u003e\n\n`LD_LIBRARY_PATH` is an environment variable that specifies a list of directories where the dynamic linker should look for dynamically linked libraries. When a program is launched, the dynamic linker checks the `LD_LIBRARY_PATH` to find the libraries that the program needs to run.\n\nIn this example, `LD_LIBRARY_PATH` is being set to include the `/opt/rocm/lib` directory. This is likely where the ROCm (Radeon Open Compute) libraries are installed. If you're trying to use PyTorch and it's built with ROCm support, it will need to know where these libraries are. If `LD_LIBRARY_PATH` doesn't include the ROCm library directory, you might see errors like \"undefined reference to...\" when you try to import PyTorch.\n\nThe command `export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH` is adding `/opt/rocm/lib` to your existing `LD_LIBRARY_PATH`.\n\nThe `PATH` environment variable is similar, but it's used to tell the shell where to look for executable files. The command `export PATH=/opt/rocm/bin:/opt/rocm/opencl/bin:$PATH` is adding the `/opt/rocm/bin` and `/opt/rocm/opencl/bin` directories to your `PATH`. 
This means that when you type a command, the shell will also look in these directories to find it.\n\n\u003c/details\u003e\n\nHSA_OVERRIDE_GFX_VERSION - workaround for software that doesn’t yet fully support the installed gpu. \n\n\u003e rocminfo | grep gfx\n\nFor example, gfx1100 used by AMD RX 7900 XTX:\n\n\u003e export HSA_OVERRIDE_GFX_VERSION=11.0.0\n\n\u003cdetails\u003e\u003csummary\u003eExplanation\u003c/summary\u003e\n\nHSA_OVERRIDE_GFX_VERSION: This is an environment variable used as a workaround for software that doesn’t yet fully support the installed GPU. It's used to override the Graphics/Compute version. For example, if you have a GPU that is not yet fully supported by the ROCm software (like the AMD Radeon RX 7900 XTX which uses gfx1100), you can set this environment variable to tell the ROCm software to treat your GPU as if it were a different, fully supported version.\n\nrocminfo | grep gfx: This is a command-line instruction. `rocminfo` is a tool that provides information about the HSA (Heterogeneous System Architecture) system attributes and agents. The `grep gfx` part of the command filters the output of `rocminfo` to only show lines that contain 'gfx', which are the lines that tell you the version of the AMD GCN ISA or architecture names.\n\nexport HSA_OVERRIDE_GFX_VERSION=11.0.0: This is a command that sets the `HSA_OVERRIDE_GFX_VERSION` environment variable to '11.0.0'. This tells the ROCm software to treat your GPU as if it were a gfx1100, even if it's actually a different version. This can be useful if your GPU is not yet fully supported by the ROCm software.\n\nIn summary, these commands and variables are used to help ensure compatibility between your GPU and the ROCm software, even if your GPU is not yet fully supported. They allow you to override the reported version of your GPU so that the ROCm software treats it as a fully supported version. This can be particularly useful when working with newer GPUs like the AMD Radeon RX 7900 XTX. 
Please note that while this can enable you to use the ROCm software with unsupported GPUs, it may not provide optimal performance or full functionality. It's always best to check the official ROCm documentation or the GPU manufacturer's documentation for the most accurate and up-to-date information.\n\nRemarks: Set HSA_OVERRIDE_GFX_VERSION=10.3.0 for 680M, and HSA_OVERRIDE_GFX_VERSION=11.0.0 for 780M.\n\n\u003c/details\u003e\n\nROCR_VISIBLE_DEVICES - device indices or UUIDs that will be exposed to applications, e.g.:\n\n\u003e export ROCR_VISIBLE_DEVICES=0,1\n\nRemarks: Although the documentation states that device indices are accepted, device indices do not work in my case. I have to use UUIDs for my system to detect the GPUs correctly. The UUIDs of individual GPUs can be found with the command 'rocminfo'.\n\n\u003cdetails\u003e\u003csummary\u003eExplanation\u003c/summary\u003e\n\n`ROCR_VISIBLE_DEVICES` is an environment variable used in the ROCm (Radeon Open Compute) software stack. It specifies which GPU devices will be exposed to applications.\n\nDevice Indices or UUIDs - These are identifiers for the GPUs in your system. A device index is a numerical value assigned to each GPU, starting from 0. A UUID (Universally Unique Identifier) is a unique string that can also be used to identify a GPU.\n\nYou set this environment variable to a list of device indices or UUIDs that you want to expose to applications. For example, `export ROCR_VISIBLE_DEVICES=0,1` means that only the first and second GPUs (indices start from 0) will be visible to applications.\n\nApplications running on the ROCm platform will only be able to see and use the GPUs specified in `ROCR_VISIBLE_DEVICES`. Other GPUs in the system will be hidden from these applications.\n\nThis feature is useful for isolating GPU resources, especially in multi-GPU systems. 
For instance, you might want certain applications to only use specific GPUs, while other GPUs are reserved for different tasks.\n\n\u003c/details\u003e\n\nGPU_DEVICE_ORDINAL - devices indices exposed to OpenCL and HIP applications, e.g.:\n\n\u003e export GPU_DEVICE_ORDINAL=0,1\n\n\u003cdetails\u003e\u003csummary\u003eExplanation\u003c/summary\u003e\n\n`GPU_DEVICE_ORDINAL` is an environment variable used in both OpenCL and HIP (Heterogeneous-Compute Interface for Portability) applications. It's used to control the visibility of devices to these applications, similar to `HIP_VISIBLE_DEVICES`.\n\nThis specific environment variable, `GPU_DEVICE_ORDINAL`, is used to control which devices are available for OpenCL and HIP programs to use when they are run.\n\nDevice Indices are the numerical identifiers assigned to each device. In a system with multiple GPUs, each GPU is assigned a unique device index.\n\nThe value `0,1` means that the first and second devices (as device indices start from 0) are visible to OpenCL and HIP applications. So, if you have two GPUs in your system, both will be available for use by these programs.\n\n\u003c/details\u003e\n\nHIP_VISIBLE_DEVICES - Device indices exposed to HIP applications, e.g.:\n\n\u003e export HIP_VISIBLE_DEVICES=0,1\n\n\u003cdetails\u003e\u003csummary\u003eExplanation\u003c/summary\u003e\n\n`HIP_VISIBLE_DEVICES` is an environment variable used in HIP (Heterogeneous-Compute Interface for Portability) applications. HIP is a part of AMD's ROCm (Radeon Open Compute) platform designed to ease the task of porting CUDA applications to AMD's hardware.\n\nThis specific environment variable, `HIP_VISIBLE_DEVICES`, is used to control the visibility of devices to HIP applications. It determines which devices are available for HIP programs to use when they are run.\n\nDevice Indices are the numerical identifiers assigned to each device. 
In a system with multiple GPUs, each GPU is assigned a unique device index.\n\nThe value `0,1` means that the first and second devices (as device indices start from 0) are visible to HIP applications. So, if you have two GPUs in your system, both will be available for use by HIP programs.\n\n\u003c/details\u003e\n\nCUDA_VISIBLE_DEVICES - provided for CUDA compatibility, e.g.:\n\n\u003e export CUDA_VISIBLE_DEVICES=0,1\n\n\u003cdetails\u003e\u003csummary\u003eExplanation\u003c/summary\u003e\n\n`CUDA_VISIBLE_DEVICES` is an environment variable in Linux that CUDA applications use to control which GPUs they can use. This variable is provided for CUDA compatibility.\n\nWhen you run a CUDA application, it can see all the GPUs in your system by default. However, there might be situations where you want to limit an application to only use certain GPUs. This is where `CUDA_VISIBLE_DEVICES` comes into play.\n\nThe `export CUDA_VISIBLE_DEVICES=0,1` command is an example of how you can use this environment variable. Here's what it does:\n\n- `export`: This is a command in Linux that sets environment variables.\n- `CUDA_VISIBLE_DEVICES`: This is the environment variable we're setting. It controls which GPUs a CUDA application can use.\n- `0,1`: This is the value we're setting for `CUDA_VISIBLE_DEVICES`. The numbers correspond to the IDs of the GPUs. `0,1` means the application can use GPU 0 and GPU 1.\n\nSo, `export CUDA_VISIBLE_DEVICES=0,1` means \"set the `CUDA_VISIBLE_DEVICES` environment variable to `0,1`\". After running this command, any CUDA application you run will only be able to see and use GPU 0 and GPU 1, even if there are more GPUs in the system.\n\nThis can be particularly useful in multi-GPU systems, where you might want to reserve certain GPUs for specific tasks or users. By setting `CUDA_VISIBLE_DEVICES`, you can control the resources that different applications or users can access. 
\n\nPlease note that the GPU IDs are not necessarily fixed; they can change based on the system configuration, GPU topology, or after a system reboot. So, it's always a good idea to verify the GPU IDs before setting `CUDA_VISIBLE_DEVICES`. On an AMD system you can do this using the `rocm-smi` command (`nvidia-smi` is the equivalent on NVIDIA systems), which provides information about the GPUs in your system, including their IDs. \n\n\u003c/details\u003e\n\nLLAMA_HIPLAS - applicable to Llama.cpp setup\n\n\u003e export LLAMA_HIPLAS=0,1\n\n\u003cdetails\u003e\u003csummary\u003eExplanation\u003c/summary\u003e\n\nThe LLAMA_HIPLAS=0,1 setting is likely related to the configuration of the llama.cpp project. However, the exact meaning of this setting is unclear.\n\n\u003c/details\u003e\n\nDRI_PRIME - optional in case with no integrated GPU\n\n\u003e export DRI_PRIME=0\n\n\u003cdetails\u003e\u003csummary\u003eExplanation\u003c/summary\u003e\n\n`DRI_PRIME` is an environment variable used in Linux systems to manage hybrid graphics. Hybrid graphics are found on recent desktops and laptops, where there are two graphics cards: an integrated one (usually Intel) and a discrete one (like NVIDIA or AMD Radeon). The integrated card is used for regular tasks to save power, while the discrete card is used for GPU-intensive applications like gaming or 3D rendering.\n\nWhen you run a command with `DRI_PRIME=1`, it tells the system to use the discrete GPU for that particular application. For example, if you want to run Firefox using the discrete GPU, you would use the command `DRI_PRIME=1 firefox`.\n\nOn the other hand, `export DRI_PRIME=0` sets the `DRI_PRIME` environment variable to `0` for the entire session or script where the command is run. 
This means that the integrated GPU (which is usually less powerful but more energy-efficient) will be used for all applications run in that session or script.\n\nPlease note that the actual GPU used can depend on your specific system configuration. You can check which GPU an application is using with the command `glxinfo | grep \"OpenGL renderer\"`.\n\n\u003c/details\u003e\n\nOMP_DEFAULT_DEVICE - default device used for OpenMP target offloading, e.g.:\n\n\u003e export OMP_DEFAULT_DEVICE=1\n\n\u003cdetails\u003e\u003csummary\u003eExplanation\u003c/summary\u003e\n\n`OMP_DEFAULT_DEVICE` is an environment variable used in the context of OpenMP, a parallel programming model. This variable is used to specify the default device for OpenMP target offloading.\n\nIn OpenMP, \"offloading\" refers to the process of transferring computation from the host (usually a CPU) to a device (usually a GPU or another accelerator). This is particularly useful in high-performance computing scenarios where you want to leverage the power of GPUs for certain parts of your computation.\n\nThe value of `OMP_DEFAULT_DEVICE` is an integer that corresponds to the device ID. Device IDs usually start from 0 and increment for each additional device. So, if you have two devices and you want to use the second device as the default for offloading, you would set `OMP_DEFAULT_DEVICE=\"1\"` (since we start counting from 0).\n\nIt's not uncommon to skip the first device for offloading. The first device (device 0) could be reserved for other tasks, such as rendering graphics in a desktop environment. Offloading compute-intensive tasks to other devices can help ensure that the system remains responsive. However, this can vary based on the specific system configuration and the requirements of the application. 
It's always a good idea to check the documentation for your specific hardware and software setup to understand the best practices for your situation.\n\n\u003c/details\u003e\n\nGGML_VULKAN_DEVICE \u0026 GGML_VK_VISIBLE_DEVICES - use Llama.cpp via Vulkan backend\n\n\u003e export GGML_VULKAN_DEVICE=0,1\n\n\u003e export GGML_VK_VISIBLE_DEVICES=0,1\n\n\u003cdetails\u003e\u003csummary\u003eExplanation\u003c/summary\u003e\n\nhttps://github.com/ggerganov/llama.cpp/issues/6166\n\n\u003c/details\u003e\n\nRead more at:\n\n- https://rocmdocs.amd.com/en/latest/conceptual/gpu-isolation.html#environment-variables\n\n- https://rocmdocs.amd.com/projects/HIP/en/develop/how-to/debugging.html#useful-environment-variables\n\n- https://medium.com/@damngoodtech/amd-rocm-pytorch-and-ai-on-ubuntu-the-rules-of-the-jungle-24a7ab280b17\n\n# Install Vulkan Tools\n\nThis is optional if you don't use vulkan.\n\n```\nsudo apt install vulkan-tools libvulkan-dev vulkan-validationlayers vulkan-validationlayers-dev\n```\n\nTo verify:\n\n```\nvulkaninfo\n```\n\n## Check the path of $VULKAN_SDK\n\ne.g.\n\n\u003e locate explicit_layer.d # /usr/share/vulkan/explicit_layer.d\n\nTo set related Vulkan SDK environment variables:\n\n\u003e export VULKAN_SDK=/usr/share/vulkan\n\n\u003e export VK_LAYER_PATH=$VULKAN_SDK/explicit_layer.d\n\nReference: https://github.com/ggerganov/llama.cpp#vulkan\n\n# MIGraphX\n\nInstall ROCm before installing MIGraphX.  
To install MIGraphX:\n\n```\nsudo apt update \u0026\u0026 sudo apt install -y migraphx half\n```\n\nTo test:\n\n```\n/opt/rocm/bin/migraphx-driver perf --test\n```\n\nHeader files and libraries are installed under /opt/rocm-\\\u003cversion\\\u003e, where \\\u003cversion\\\u003e is the ROCm version.\n\nRead: https://github.com/ROCm/AMDMIGraphX#amd-migraphx\n\n# Set up Python\n\nUse Python 3.10.x to work with the wheel files available at https://repo.radeon.com/rocm/manylinux/rocm-rel-6.1.3/\n\nTo install a specific version with pyenv, read https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/blob/main/ubuntu_desktop/basic.md#pyenv\n\n```\nsudo apt update\nsudo apt install -y make build-essential python3 python3-setuptools libjpeg-dev python3-pip python3-dev python3-venv libssl-dev libffi-dev libnss3 zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev python3-wheel python3-wheel-whl twine\n```\n\n# Python libraries\n\nFor a clean setup, uninstall any old copies first:\n\n\u003e pip uninstall torch torchaudio torchvision cupy spacy numpy protobuf -y\n\n# Set up a python virtual environment\n\n```\npython3 -m venv ai\nsource ai/bin/activate\npip3 install --upgrade pip wheel twine setuptools\n```\n\n# Install Compatible Versions of numpy and protobuf\n\nRun or re-run this line if you hit version conflicts with numpy or protobuf:\n\n```\npip install numpy==1.26.4 protobuf==4.25.3\n```\n\nRemarks: protobuf==5.29.1\n\n# Install PyTorch \u0026 Triton\n\nRead https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/native_linux/install-pytorch.html\n\n```\nwget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.2/torch-2.4.0%2Brocm6.3.2-cp312-cp312-linux_x86_64.whl\nwget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.2/torchvision-0.19.0%2Brocm6.3.2-cp312-cp312-linux_x86_64.whl\nwget 
https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.2/pytorch_triton_rocm-3.0.0%2Brocm6.3.2.75cc27c26a-cp312-cp312-linux_x86_64.whl\nwget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.2/torchaudio-2.4.0%2Brocm6.3.2-cp312-cp312-linux_x86_64.whl\npip3 uninstall torch torchvision pytorch-triton-rocm\npip3 install torch-2.4.0+rocm6.3.2-cp312-cp312-linux_x86_64.whl torchvision-0.19.0+rocm6.3.2-cp312-cp312-linux_x86_64.whl torchaudio-2.4.0+rocm6.3.2-cp312-cp312-linux_x86_64.whl pytorch_triton_rocm-3.0.0+rocm6.3.2.75cc27c26a-cp312-cp312-linux_x86_64.whl\n```\n\nTo verify:\n\n```\npython3 -c 'import torch' 2\u003e /dev/null \u0026\u0026 echo 'Success' || echo 'Failure'\npython3 -c 'import torch; print(torch.cuda.is_available())'\npython3 -c \"import torch; print(f'device name [0]:', torch.cuda.get_device_name(0))\"\npython3 -c \"import torch; print(f'device name [1]:', torch.cuda.get_device_name(1))\"\npython3 -m torch.utils.collect_env\n```\n\nAlternately, run:\n\n\u003e python3\n\n```\n\u003e\u003e\u003e import torch\n\u003e\u003e\u003e torch.__version__\n'2.1.2+rocm6.1.3'\n\u003e\u003e\u003e torch.cuda.is_available()\nTrue\n\u003e\u003e\u003e torch.cuda.device_count()\n2\n\u003e\u003e\u003e torch.cuda.get_device_properties(0).total_memory\n25753026560\n\u003e\u003e\u003e torch.cuda.get_device_properties(1).total_memory\n25753026560\n\u003e\u003e\u003e torch.cuda.current_device()\n0\n\u003e\u003e\u003e torch.cuda.get_device_name(torch.cuda.current_device())\n'Radeon RX 7900 XTX'\n```\n\nRead more at https://pytorch.org/get-started/locally/#linux-verification\n\n# Install ONNX Runtime\n\nInstall `migraphx` FIRST!\n\n```\npip3 uninstall onnxruntime-rocm\npip3 install onnxruntime-rocm -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.2/\n```\n\nTo verify:\n\n```\npython3 -c \"import onnxruntime; print(onnxruntime.get_available_providers())\"\n```\n\nExpected output:\n\n```\n['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']\n```\n\n# 
Work with ONNX ExecutionProviders\n\nTo use MIGraphX ExecutionProvider:\n\nRead: https://onnxruntime.ai/docs/execution-providers/MIGraphX-ExecutionProvider.html\n\n```\nproviders = [\n    'MIGraphXExecutionProvider',\n    'CPUExecutionProvider',\n]\n```\n\nTo use ROCm ExecutionProvider:\n\nRead: https://onnxruntime.ai/docs/execution-providers/ROCm-ExecutionProvider.html\n\n```\nproviders = [\n    'ROCMExecutionProvider',\n    'CPUExecutionProvider',\n]\n```\n\nOption: user_compute_stream\n\n```\nimport torch  # needed for the current device and stream\nproviders = [(\"ROCMExecutionProvider\", {\"device_id\": torch.cuda.current_device(), \"user_compute_stream\": str(torch.cuda.current_stream().cuda_stream)})]\n```\n\n# Install TensorFlow\n\n```\npip install tf-keras --no-deps\npip3 uninstall tensorflow-rocm\npip3 install https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.2/tensorflow_rocm-2.17.0-cp312-cp312-manylinux_2_28_x86_64.whl\n```\n\nTo verify:\n\n```\npython3 -c 'import tensorflow' 2\u003e /dev/null \u0026\u0026 echo 'Success' || echo 'Failure'\n```\n\n# Install Flash Attention 2 (pending update)\n\nLinks available at: https://github.com/ROCm/flash-attention/releases/\n\nRemarks: The cxx11abi part of the filename indicates whether the package was built with the C++11 ABI (Application Binary Interface) enabled or not. 
The C++11 ABI is a set of rules that defines how different parts of a C++ program interact at the binary level.\n\n# Install CuPy (pending update)\n\nExport required variables, if you haven't:\n\n```\nexport CUPY_INSTALL_USE_HIP=1\nexport ROCM_HOME=/opt/rocm\nexport HCC_AMDGPU_TARGET=gfx1100\n```\n\nInstall from source:\n\n```\ngit clone https://github.com/cupy/cupy.git\ncd cupy\ngit checkout rocm-ci-6.1\ngit submodule update --init\npip install git+https://github.com/ROCmSoftwarePlatform/hipify_torch.git\npip install .\n```\n\nTo fix a circular import error, run again:\n\n```\npip3 install torch-2.3.0+rocm6.3.2-cp310-cp310-linux_x86_64.whl torchvision-0.18.0+rocm6.3.2-cp310-cp310-linux_x86_64.whl torchaudio-2.5.0+rocm6.3.2-cp310-cp310-linux_x86_64.whl pytorch_triton_rocm-2.3.0+rocm6.3.2.5a02332983-cp310-cp310-linux_x86_64.whl\n```\n\nTo verify:\n\n```\npython3 -c \"import cupy; print(cupy.__version__)\"\n```\n\n# Install spaCy (pending update)\n\nWith PyTorch \u0026 CuPy installed first, run:\n\n\u003e pip install spacy\n\n# Install Piper Text-to-Speech\n\nPiper-tts is a good offline TTS engine that supports Linux. Its ONNX voice models are small and run smoothly even without GPUs.\n\nAn [issue](https://github.com/rhasspy/piper/issues/483) and a [pull request](https://github.com/rhasspy/piper/pull/512) have been opened to let Piper accelerate on AMD GPUs.\n\nMeanwhile, AMD GPU users can still work around the issue with the following setup:\n\nTo support ROCm-enabled GPUs via 'ROCMExecutionProvider' or 'MIGraphXExecutionProvider':\n\n1. Install piper-tts\n\n\u003e pip install piper-tts\n\n2. Uninstall onnxruntime\n\n\u003e pip uninstall onnxruntime\n\n3. Re-install onnxruntime-rocm\n\n\u003e pip install --force-reinstall onnxruntime_rocm-1.19.0-cp310-cp310-linux_x86_64.whl\n\n4. 
Fix numpy and protobuf versions\n\n\u003e pip install numpy==1.26.4 protobuf==4.25.3\n\nTo verify:\n\n\u003e python3\n\n```\n\u003e\u003e\u003e import onnxruntime\n\u003e\u003e\u003e onnxruntime.get_available_providers()\n```\n\nOutput:\n\n```\n['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']\n```\n\nWorkaround:\n\nManually edit the 'load' function in the file ../site-packages/piper/voice.py:\n\nFrom:\n\n```\nproviders=[\"CPUExecutionProvider\"]\nif not use_cuda\nelse [\"CUDAExecutionProvider\"],\n```\n\nTo:\n\n```\nproviders=[\"MIGraphXExecutionProvider\"],\n```\n\nTo upgrade piper-tts, follow this order:\n\n1. Upgrade piper-tts\n\n\u003e pip install --upgrade piper-tts --no-cache-dir\n\n2. Uninstall onnxruntime\n\n\u003e pip uninstall onnxruntime\n\n3. Re-install onnxruntime-rocm\n\n\u003e pip install --force-reinstall onnxruntime_rocm-1.19.0-cp310-cp310-linux_x86_64.whl\n\n4. Manually edit the 'load' function in the file ../site-packages/piper/voice.py as described above.\n\n# Check Versions of Installed Packages\n\nCheck the combination of installed versions up to this point:\n\n\u003e pip list \u003e pip_list.txt\n\nhttps://github.com/eliranwong/pip_list.txt\n\n# DeepSpeed\n\nhttps://cloudblogs.microsoft.com/opensource/2022/03/21/supporting-efficient-large-model-training-on-amd-instinct-gpus-with-deepspeed/\n\n# ollama\n\nStandard installation: https://ollama.com/download\n\n\u003e curl -fsSL https://ollama.com/install.sh | sh\n\nTo configure Ollama, run:\n\n\u003e sudo nano /etc/systemd/system/ollama.service\n\nAdd the following three lines at the end of the [Service] section:\n\n```\nEnvironment=\"OLLAMA_NUM_PARALLEL=2\"\nEnvironment=\"OLLAMA_MAX_LOADED_MODELS=2\"\nEnvironment=\"OLLAMA_HOST=0.0.0.0\"\n```\n\nTo reload Ollama, run:\n\n\u003e sudo systemctl daemon-reload\n\n\u003e sudo systemctl restart ollama\n\nAdd your user to the `ollama` group for access to the Ollama directory:\n\n\u003e sudo usermod -a -G ollama $LOGNAME\n\n\u003e sudo reboot\n\n## VS Code Plugin 
with Ollama\n\nInstall the VS Code plugin `twinny` by `rjmacarthy`.\n\nDownload LLMs to work with twinny, e.g.:\n\n```\nollama pull codellama:7b-instruct\nollama pull codellama:7b-code\n```\n\nClick \"Manage twinny providers\" for more options.\n\n# Build llama.cpp that runs ROCm backend\n\nRun in terminal:\n\n```\ngit clone https://github.com/ggml-org/llama.cpp\nmv llama.cpp/ llamacpp_rocm/\ncd llamacpp_rocm\nHIPCXX=\"$(hipconfig -l)/clang\" HIP_PATH=\"$(hipconfig -R)\" cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release \u0026\u0026 cmake --build build --config Release -- -j $(lscpu | grep -m 1 '^Core(s)' | awk '{print $NF}')\n```\n\nExpected lines in the terminal output:\n\n```\n...\n-- Adding CPU backend variant ggml-cpu: -march=native \n-- The HIP compiler identification is Clang 18.0.0\n-- Detecting HIP compiler ABI info\n-- Detecting HIP compiler ABI info - done\n-- Check for working HIP compiler: /opt/rocm-6.3.2/lib/llvm/bin/clang - skipped\n-- Detecting HIP compile features\n-- Detecting HIP compile features - done\n-- HIP and hipBLAS found\n-- Including HIP backend\n...\n```\n\n# Build llama.cpp that runs Vulkan backend\n\nAs an alternative to the ROCm backend, you may build a copy of llama.cpp that runs the Vulkan backend.\n\nTo set up the Vulkan driver:\n\n```\nsudo apt install -y glslc glslang-tools glslang-dev mesa-vulkan-drivers vulkan-amdgpu vulkan-tools libvulkan-dev vulkan-validationlayers vulkan-utility-libraries-dev\n```\n\nTo build, run:\n\n```\ngit clone https://github.com/ggml-org/llama.cpp\nmv llama.cpp/ llamacpp_vulkan/\ncd llamacpp_vulkan\ncmake -S . 
-B build -DGGML_VULKAN=ON  -DCMAKE_BUILD_TYPE=Release \u0026\u0026 cmake --build build --config Release -- -j $(lscpu | grep -m 1 '^Core(s)' | awk '{print $NF}')\n```\n\nExpected lines in the terminal output:\n\n```\n...\n-- Adding CPU backend variant ggml-cpu: -march=native \n-- Found Vulkan: /usr/lib/x86_64-linux-gnu/libvulkan.so (found version \"1.3.275\") found components: glslc glslangValidator \n-- Vulkan found\n-- GL_KHR_cooperative_matrix supported by glslc\n-- GL_NV_cooperative_matrix2 not supported by glslc\n-- Including Vulkan backend\n...\n```\n\nMake sure you set the vulkan-related variables, e.g. https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu#overview\n\n## Alias for launching llama-server with ROCm backend\n\nRun in terminal:\n\n```\ncd llamacpp_rocm\necho \"alias llamacpp=\\\"cd /home/$USER/agentmake/models/gguf/ \u0026\u0026 $(pwd)/build/bin/llama-server --threads $(lscpu | grep -m 1 '^Core(s)' | awk '{print $NF}') -ngl 99 --model\\\"\" \u003e\u003e $HOME/.bashrc\n```\n\nRemarks: We add `-ngl 99` in the alias to offload as many layers as available to GPU. 
Depending on your device hardware, you may need to reduce the value of ngl to load large-sized models.\n\n## Working with Large-size Files\n\n* Adjust the number of layers with `-ngl` to the maximum possible value for loading the files on your devices.\n* You may want to control the context size and the output tokens too.\n\nFor example:\n\n\u003e ./llama-cli -m ../gguf/command-r-plus.gguf -p \"What is machine learning?\" --temp 0.0 -c 2048 -n 2048 -t 24 -ngl 48\n\n\u003e ./llama-cli -m ../gguf/wizardlm2_8x22b.gguf -p \"What is machine learning?\" --temp 0.0 -c 2048 -n 2048 -t 24 -ngl 34\n\n## Speed Test: CPU vs CPU+GPUx2\n\nhttps://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/blob/main/cpu_vs_gpux2.md\n\n## Speed Test: Vulkan vs ROCm\n\nhttps://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/blob/main/vulkan_vs_rocm.md\n\n## More Benchmark\n\nhttps://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/blob/main/benchmark.md\n\n# Install Llama-cpp-python Packages\n\nThe author managed to install llama-cpp-python with \n\n\u003e CMAKE_ARGS=\"-DLLAMA_CLBLAST=on\" pip install llama-cpp-python\n\nAlternatively,\n\nUse hipBLAS (ROCm) as backend:\n\n\u003e sudo apt install libc6-dev libstdc++-12-dev\n\n\u003e CMAKE_ARGS=\"-DLLAMA_HIPBLAS=on\" pip install llama-cpp-python\n\nUse Vulkan as backend:\n\n\u003e CMAKE_ARGS=\"-DLLAMA_VULKAN=on\" pip install llama-cpp-python\n\nRead more at: https://llama-cpp-python.readthedocs.io/en/stable/\n\n## Troubleshoot Tensor Split Issue\n\nSee [an issue regarding the tensor_split feature](https://github.com/abetlen/llama-cpp-python/issues/1166)\n\n![amdgpu_llamacpp](https://github.com/eliranwong/freegenius/assets/25262722/6d227573-eef9-49ea-9239-59cae140a8d2)\n\nManually edit the file \"llama_cpp.py\", in this case located at '~/apps/freegenius/lib/python3.10/site-packages/llama_cpp/llama_cpp.py'\n\nChange:\n\nfrom\n\n```\nLLAMA_MAX_DEVICES = 
_lib.llama_max_devices()\n```\n\nto\n\n```\n#LLAMA_MAX_DEVICES = _lib.llama_max_devices()\nLLAMA_MAX_DEVICES = 2\n```\n\n# Stable-diffusion-cpp-python\n\n```\nCMAKE_ARGS=\"-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=gfx1100\" pip install stable-diffusion-cpp-python --no-cache-dir\n```\n\n# stable-diffusion-webui\n\nSetting up stable-diffusion-webui is straightforward:\n\n```\nsudo apt install google-perftools libgl1\ngit clone https://github.com/AUTOMATIC1111/stable-diffusion-webui\ncd stable-diffusion-webui\npython3 -m venv venv\n./webui.sh\n\n# Set up an alias [optional]\necho 'alias sdwebui=\"'$(pwd)'/webui.sh\"' \u003e\u003e ~/.bashrc\n```\n\n```\nopen http://127.0.0.1:7860\n```\n\nRead more at: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs\n\n# ComfyUI\n\nThe [official setup instructions](https://github.com/comfyanonymous/ComfyUI) work:\n\n```\ngit clone https://github.com/comfyanonymous/ComfyUI\ncd ComfyUI/custom_nodes\ngit clone https://github.com/ltdrdata/ComfyUI-Manager\ncd ..\npython3 -m venv venv\nsource venv/bin/activate\npip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3\npython3 -m pip install -r requirements.txt  --extra-index-url https://download.pytorch.org/whl/nightly/rocm6.3\npython3 -m pip install -r custom_nodes/ComfyUI-Manager/requirements.txt --extra-index-url https://download.pytorch.org/whl/nightly/rocm6.3\npython3 main.py\n\n# Set up an alias [optional]\necho 'alias comfyui=\"'$(pwd)'/venv/bin/python3 '$(pwd)'/main.py\"' \u003e\u003e ~/.bashrc\n```\n\n```\nopen http://127.0.0.1:8188\n```\n\n# SwarmUI + Flux in GGUF Formats\n\n1. 
Install SwarmUI\n\nRead the latest instructions at: https://github.com/mcmonkeyprojects/SwarmUI#installing-on-linux\n\n```\nwget https://github.com/mcmonkeyprojects/SwarmUI/releases/download/0.6.5-Beta/install-linux.sh -O install-linux.sh\nchmod +x install-linux.sh\n./install-linux.sh\n```\n2. Select the AMD version during the installation process\n\n![amd_version](https://github.com/user-attachments/assets/2ccc0862-e4e3-4d15-85a7-5d604d74044b)\n\n3. After SwarmUI is installed, stop the server first.\n\n4. Download a gguf file from GGUF Quantized \"unet\" models repositories, such as Flux Schnell https://huggingface.co/city96/FLUX.1-schnell-gguf/tree/main or Flux Dev https://huggingface.co/city96/FLUX.1-dev-gguf/tree/main\n\n5. Place the downloaded file(s) in the `Models/unet` folder inside the SwarmUI directory\n\n6. Download the file `ae.safetensors` from https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main and place it in `Models/VAE` inside the SwarmUI directory\n\n7. Run `./launch-linux.sh` to launch SwarmUI\n\n8. In the \"Models\" tab, edit the model metadata of the gguf file and select the correct architecture, e.g. Flux.1 Dev\n\n![edit_model_metadata](https://github.com/user-attachments/assets/fdcde050-476e-4a6c-8d58-0ee41f37ecb3)\n\n9. Enter a prompt and generate an image. Confirm the installation of GGUF support when prompted.\n\n![install_GGUF_support](https://github.com/user-attachments/assets/97ac8b50-2e84-45e9-a69f-d1d8b7080e84)\n\nRemarks:\n\n* a cfg_scale of 1 is recommended for FLUX\n* euler sampling method is recommended for FLUX\n\n10. 
Set an alias, assuming your current directory is the SwarmUI directory:\n\n\u003e echo 'alias swarmui='$(pwd)'/launch-linux.sh' \u003e\u003e ~/.bashrc\n\n# fabric\n\nInstall pipx, xsel and ffmpeg to work with fabric:\n\n```\nsudo apt install -y pipx xsel ffmpeg\ngit clone https://github.com/danielmiessler/fabric.git\ncd fabric\npipx install .\ntee --append $HOME/.bashrc \u003c\u003cEOF\nalias pbcopy='xsel -b -i'\nalias pbpaste='xsel -b -o'\nEOF\nfabric --setup\nsource $HOME/.bashrc\n```\n\n# perplexica\n\nInstall Docker first, then run:\n\n```\nsudo apt install -y git\ngit clone https://github.com/ItzCrazyKns/Perplexica.git\ncd Perplexica\ncp sample.config.toml config.toml\ndocker compose up -d\nopen localhost:3000\n```\n\n# CLI and Desktop Integration with AgentMake AI\n\nRun in terminal:\n\n```\n# optional: navigate to home directory\ncd\n# install in a virtual environment\npython3 -m venv ai\nsource ai/bin/activate\npip install --upgrade \"agentmake[genai]\"\necho \". /home/$USER/ai/bin/activate\" \u003e\u003e ~/.bashrc\n# To test\nai Hi!\n```\n\n## Edit Configurations\n\nTo edit configurations or add API keys, run in terminal:\n\n\u003e ai -ec\n\n## Test with Ollama\n\n\u003e ai Hi!\n\nRemarks: Ollama is set as the default backend, so you can use the `ai` or `aic` commands without specifying the backend option. 
Run `ai -ec` to edit configurations.\n\n## Test with Chat Feature\n\nUse the `aic` command, which has chat features enabled, e.g.:\n\n\u003e aic Tell me a joke.\n\nClose the terminal app and reopen it, then run:\n\n\u003e aic Tell me one more.\n\nChat history is saved locally and recalled even after the terminal session has ended.\n\nBegin a new conversation with the `-n` option, e.g.:\n\n\u003e aic -n Hi!\n\n## Test with Llama.cpp\n\nYou can run the llama.cpp server with the model files downloaded via Ollama.\n\nTo access Ollama model files, add your user to the `ollama` group:\n\n\u003e sudo usermod -a -G ollama $LOGNAME\n\n\u003e sudo reboot\n\nTo download a model via Ollama and save a copy of it in `~/agentmake/models/gguf/` by default, e.g.:\n\n\u003e ai --get_model deepseek-r1 -gm llama3.3:70b -gm aya-expanse\n\nTo run an instance of llama-server, assuming that you have set up an alias as mentioned [here](https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu#alias-for-launching-llama-server-with-rocm-backend), e.g.:\n\n\u003e llamacpp deepseek-r1.gguf\n\nTo run agentmake with llama.cpp, e.g.:\n\n\u003e ai -b llamacpp Hi!\n\n## Test with Perplexica\n\nTo list available tools that work with perplexica, run:\n\n\u003e ai -lt | grep perplexica\n\nExpected output:\n\n```\nperplexica/openai\nperplexica/groq\nperplexica/xai\nperplexica/googleai\nperplexica/anthropic\nperplexica/github\n```\n\nTo use one of them, e.g.:\n\n\u003e ai -t perplexica/github What is AgentMake AI?\n\n## Test with SearXNG\n\nSearXNG is installed automatically with Perplexica. To get real-time information, e.g.:\n\n\u003e ai -t search/searxng Give me news updates in London today.\n\n## Test with Fabric Integration\n\nAssuming fabric patterns are downloaded, e.g.:\n\n\u003e ai What are AI agents? 
-sys fabric.write_micro_essay -b genai\n\n## Test with Selected Text in Any Applications\n\nFirst, make sure `xsel` is installed:\n\n\u003e sudo apt install xsel\n\nLaunch `Settings` \u003e Keyboard \u003e View and Customise Shortcuts \u003e Custom Shortcuts \u003e +\n\nFill in the fields as below (replace `username` with your username):\n\n```\nName: AgentMake AI\nCommand: gnome-terminal -- bash -c \"/home/username/ai/bin/ai -i -eo -py\"\nShortcut: Shift+Ctrl+A\n```\n\n![Image](https://github.com/user-attachments/assets/d21fea9a-2288-4e85-96ad-dfbee7ce160d)\n\nSelect some text in an application, then press `Shift+Ctrl+A`.\n\nChoose a predefined instruction:\n\n![Image](https://github.com/user-attachments/assets/e4872498-0cef-48e7-a550-55c0c4234929)\n\nThe assistant's response is automatically copied to the clipboard.\n\nRemarks: You can define up to 10 custom instructions to choose from in the dialog by specifying the values of `CUSTOM_INSTRUCTION_1`, `CUSTOM_INSTRUCTION_2`, `CUSTOM_INSTRUCTION_3`, ... 
`CUSTOM_INSTRUCTION_10` in AgentMake configurations (run `ai -ec` to edit).\n\n## Test with Tool in Custom instruction\n\nYou can specify a tool in a custom instruction by prefixing the tool name with the symbol `@`.\n\nFor example, if you want to extract a YouTube URL from the selected text, download the video and convert it into mp3:\n\nEdit configuration:\n\n\u003e ai -ec\n\nEdit the item `CUSTOM_INSTRUCTION_1`:\n\n```\nCUSTOM_INSTRUCTION_1=\"@youtube/download_audio\"\n```\n\nHighlight text that contains a YouTube URL in any application, then press `Shift+Ctrl+A`.\n\n## Test with Image Creation with Flux\n\nRequirement: To run the following example, you need to manually download the file `ae.safetensors` from https://huggingface.co/black-forest-labs/FLUX.1-dev and place it in `~/agentmake/models/flux`.\n\nTo check available tools that work with Flux:\n\n\u003e ai -lt | grep flux\n\n```output\nimages/create_flux_portrait\nimages/create_flux_landscape\nimages/create_flux\n```\n\nTo create an image, e.g.:\n\n\u003e ai -t images/create_flux a cute cat\n\n![Image](https://github.com/user-attachments/assets/441d6ac8-4f2d-449a-a188-616529a595e9)\n\nRemarks: There are `iw`, `ih` and `iss` for adjusting the image output.\n\n## Note about Azure AI Setup\n\nAn easy way to deploy AI models via Azure service:\n\n1. Sign in at https://ai.azure.com/github\n2. All resources \u003e Create New\n3. 
Overview \u003e copy an API key and the Azure OpenAI Service and Azure AI inference endpoints\n\n* Use the Azure OpenAI Service endpoint for running OpenAI models; the endpoint should look like https://resource_name.openai.azure.com/\n\n* Use the Azure AI inference endpoint for running DeepSeek-R1 and Phi-4; the endpoint should look like https://resource_name.services.ai.azure.com/models\n\nTo configure AgentMake AI, run:\n\n\u003e ai -ec\n\n## Note about Vertex AI\n\nMake sure the extra package `genai` is installed with the command mentioned above:\n\n\u003e pip install --upgrade \"agentmake[genai]\"\n\nTo configure, run:\n\n\u003e ai -ec\n\nEnter the path of your Google application credentials JSON file as the value of `VERTEXAI_API_KEY`. You also need to specify your project ID and service location in the configurations, e.g.:\n\n```\nVERTEXAI_API_KEY=~/agentmake/google_application_credentials.json\nVERTEXAI_API_PROJECT_ID=my_project_id\nVERTEXAI_API_SERVICE_LOCATION=us-central1\n```\n\nTo test Gemini 2.0 with Vertex AI, e.g.:\n\n\u003e ai -b vertexai -m gemini-2.0-flash Hi!\n\n## Using other backends and tools\n\nTo list all available tools:\n\n\u003e ai -lt\n\nFor all options, run:\n\n\u003e ai -h\n\nTo edit configurations, run:\n\n\u003e ai -ec\n\nAgentMake AI supports 14 AI backends and 7 agentic components.\n\nRead more at https://github.com/eliranwong/agentmake\n\n# Llama Factory\n\n```\ngit clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git\ncd LLaMA-Factory\npython3 -m venv rocm\nsource rocm/bin/activate\npip install -e \".[metrics]\"\npip uninstall torch triton -y\nwget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.2/torch-2.3.0%2Brocm6.3.2-cp310-cp310-linux_x86_64.whl\nwget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.2/torchvision-0.18.0%2Brocm6.3.2-cp310-cp310-linux_x86_64.whl\nwget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.2/pytorch_triton_rocm-2.3.0%2Brocm6.3.2.5a02332983-cp310-cp310-linux_x86_64.whl\npip3 install 
torch-2.3.0+rocm6.3.2-cp310-cp310-linux_x86_64.whl torchvision-0.18.0+rocm6.3.2-cp310-cp310-linux_x86_64.whl pytorch_triton_rocm-2.3.0+rocm6.3.2.5a02332983-cp310-cp310-linux_x86_64.whl\npip install --upgrade huggingface_hub\n```\n\nTo log in to Hugging Face:\n\n```\nhuggingface-cli login\n```\n\nTo run the web UI:\n\n\u003e env HIP_VISIBLE_DEVICES=0 llamafactory-cli webui\n\nNote: Llama Factory currently fails to run training when multiple GPUs are used, e.g.:\n\n\u003e env HIP_VISIBLE_DEVICES=0,1 llamafactory-cli webui\n\n\n# Performance Optimization\n\nFor performance optimization, you may read:\n\nhttps://huggingface.co/docs/optimum/main/en/amd/amdgpu/overview\n\nhttps://github.com/nktice/AMD-AI/blob/main/performance-tuning.md\n\n# JAX\n\nhttps://jax.readthedocs.io/en/latest/developer.html#additional-notes-for-building-a-rocm-jaxlib-for-amd-gpus\n\nhttps://keras.io/guides/distributed_training_with_jax/\n\n# CUDA-compatible Alternative\n\nhttps://github.com/vosen/ZLUDA\n\nCurrent known issues of ZLUDA: https://github.com/vosen/ZLUDA#known-issues\n\n# References\n\nhttps://rocm.docs.amd.com/projects/install-on-linux/en/latest/index.html\n\nhttps://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/index.html\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feliranwong%2Fmultiamdgpu_aidev_ubuntu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feliranwong%2Fmultiamdgpu_aidev_ubuntu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feliranwong%2Fmultiamdgpu_aidev_ubuntu/lists"}