https://github.com/alenzenx/windowseasyfintunellm

Finetune LLM using Torchtune on Windows

Host: GitHub
URL: https://github.com/alenzenx/windowseasyfintunellm
Owner: alenzenx
License: mit
Created: 2025-03-16T09:08:03.000Z (3 months ago)
Default Branch: main
Last Pushed: 2025-03-17T17:42:38.000Z (3 months ago)
Last Synced: 2025-03-17T18:24:27.979Z (3 months ago)
Topics: finetune, finetune-llm, finetuning, llama, llm, torchtune, windows, windows10, windows11
Language: Python
Homepage:
Size: 126 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Windows Easy Fintune LLM

### **"A simple tutorial for fine-tuning an LLM on Windows with Torchtune"**

![image](https://github.com/alenzenx/WindowsEasyFintuneLLM/blob/main/goodjob.jpg)

## **Software Configuration**

**OS : Windows 11**

**CUDA : 12.4**

**cuDNN : v8.9.7**

**python : 3.11.0**

**pytorch : 2.6.0+cu124**

**triton-windows : 3.2.0.post12**

**torchtune : 0.5.0+cu124**

**LLM : LLaMA-2-7B** *(Used in this tutorial)*

**Finetune method : QLoRA** *(Used in this tutorial)*
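
To compare an existing environment against the versions listed above, a minimal sketch using only the standard library is shown here. The distribution names (`torch`, `triton-windows`, `torchtune`) are assumed from the list above, and `installed_versions` is a hypothetical helper, not part of this repository.

```python
from importlib import metadata

# Versions taken from the Software Configuration list above.
# The distribution names are assumptions about how pip reports them.
EXPECTED = {
    "torch": "2.6.0+cu124",
    "triton-windows": "3.2.0.post12",
    "torchtune": "0.5.0+cu124",
}


def installed_versions(names):
    """Return {name: installed version, or None if the package is missing}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions(EXPECTED).items():
        print(f"{name}: installed={version}, tutorial={EXPECTED[name]}")
```

Run it inside the virtual environment after Step 4; a value of `None` means the package is not installed there.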

## **Hardware Configuration**

**GPU : NVIDIA GeForce RTX 3060 12GB** *(Important: 12GB of VRAM is the minimum requirement.)*

**RAM : 16GB** *(more is better)*

## **1. Download and Install**

### *python 3.11.0 + CUDA 12.4 + cuDNN v8.9.7*

#### python 3.11.0

https://www.python.org/downloads/release/python-3110/

#### CUDA and cuDNN installation tutorial (written for a similar version)

https://medium.com/@alenzenx/安裝-cuda12-6-與-cudnn-8-9-7-34f95ef8ce7f

## **2. Install MSVC for Triton-Windows**

#### Add a Windows system variable (the path below is the default install path)

Variable name

```

CUDA_PATH

```    

Variable value

```

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4

```
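
A quick way to confirm the variable is visible to new processes is sketched below. `check_cuda_path` is a hypothetical helper; run it from a newly opened terminal, since terminals that were already open do not pick up the change.

```python
import os
from pathlib import Path


def check_cuda_path(env=os.environ):
    """Report whether CUDA_PATH is set and points to an existing directory."""
    value = env.get("CUDA_PATH")
    if not value:
        return "CUDA_PATH is not set"
    if not Path(value).is_dir():
        return f"CUDA_PATH points to a missing directory: {value}"
    return f"CUDA_PATH OK: {value}"


if __name__ == "__main__":
    print(check_cuda_path())
```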

#### Download Visual Studio Installer

https://visualstudio.microsoft.com/downloads/

#### Open Visual Studio Installer

#### Install Visual Studio Build Tools 2022 (version 17.13.2)

#### Install MSVC (selected from within Visual Studio Build Tools 2022)

##### Click "Modify" 

##### Click "Individual components"

##### Select

1. MSVC v143 - VS 2022 C++ x64/x86 build tools(Latest)

2. Windows 11 SDK(10.0.22621.0)

3. C++ CMake tools for Windows

4. MSBuild support for LLVM(clang-cl) toolset 

#### Add the following path to the Path entry under Windows user variables (the path below is the default install path)

```

C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.43.34808\bin\Hostx64\x64

```
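
The directory above contains the MSVC compiler `cl.exe`. A minimal sketch for checking that it can be found on `Path` from a new terminal follows; `find_compiler` is a hypothetical helper name.

```python
import shutil


def find_compiler(name="cl", path=None):
    """Return the full path of the compiler executable, or None if it is not on PATH."""
    return shutil.which(name, path=path)


if __name__ == "__main__":
    location = find_compiler()
    if location is None:
        print("cl.exe was not found on PATH; re-check the user variable above")
    else:
        print(f"cl.exe found at: {location}")
```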

## **3. Create Virtual Environment**

#### (Change "C:\Users\User\Desktop\ourllm" to your project path)

```

python -m venv C:\Users\User\Desktop\ourllm

```

```

C:\Users\User\Desktop\ourllm\Scripts\Activate.ps1

```
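
To confirm that the activated environment is the one Python is actually using, a small sketch is shown below. It relies on the standard `sys.prefix` / `sys.base_prefix` comparison; `in_virtualenv` is a hypothetical helper and its parameters exist only to make it easy to test.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter is running inside a venv."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```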

## **4. Install requirements**

```

pip install -r requirements.txt

```

#### Verify CUDA GPU execution

```

python GPUtest.py

```
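
The contents of `GPUtest.py` are not shown in this README. As a rough idea of what such a check can look like, the sketch below queries PyTorch for the CUDA device; it is an illustration only, not the repository's script, and `cuda_summary` is a hypothetical name.

```python
def cuda_summary():
    """Collect basic CUDA information from PyTorch, if PyTorch is installed."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["device_name"] = props.name
        # total_memory is reported in bytes; convert to GiB for readability.
        info["vram_gb"] = round(props.total_memory / 1024**3, 1)
    return info


if __name__ == "__main__":
    for key, value in cuda_summary().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False`, training would fall back to the CPU or fail, so it is worth resolving before continuing.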

#### Verify Triton execution

```

python test_triton.py

```

## **5. Download the raw LLaMA-2-7B weights**

```

pip install llama-stack

```

```

llama model list --show-all

```

```

llama download --source meta --model-id Llama-2-7b

```

#### LLaMA-2 verification key (a URL starting with https)

##### You must request the key from the following website

https://www.llama.com/llama-downloads/

##### Click "Previous language & safety models"

##### Choose "Llama 2"

**(The key is a URL starting with https, similar to this one)**

```

https://download.llamameta.net/*?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV9oYXNoIjoiMWh0d3JyeWVxOXE1cWpjMTQ5aDQ2OWx5IiwiUmVzb3VyY2UiOiJodHRwczpcL1wvZG93bmxvYWQubGxhbWFtZXRhLm5ldFwvKiIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0MTk1ODM1MH19fV19&Signature=nCSq%7ECseY3cvvI5w7THDAAXAvaiqP81ibq5nLCztW1efQmL-f67TvxGrblYUGV5Kg7URAsDxJNp5NFdOVoyOX5E5fpFm1Dzi2xAfsrunyGVnud-uliH8HdHoEwT9Pmin5qSt4slG9v2n4hSw7t-htP4dd5yh69rpf7GJWH02QKc66Axf4%7EoQ1AhFc0cLpSpS3MUMDp7D1m2jEjT98J4Ee3Hj1eH%7EtU0mGytyncEb-W1bNEZt8TdTIDwE8pY2S9sXpzGkbQrHv5A4QvR0fqEcvio47uvVjYqSH7ExCHJP5WeYEuT6lXNFgfn59oe0coyliIseAXLQet7X7Jbh2m64Tw__&Key-Pair-Id=K15QRJLYKIFSLZ&Download-Request-ID=587287740993231

```
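
The example key appears to follow the CloudFront-style signed-URL layout (`Policy`, `Signature` and `Key-Pair-Id` query parameters), with the expiry time stored inside the base64-encoded `Policy`. Under that assumption, the sketch below reads the expiry time so you can tell whether a key is still valid before starting a long download. `policy_expiry` is a hypothetical helper and is not part of llama-stack.

```python
import base64
import json
import sys
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def policy_expiry(signed_url):
    """Return the expiry (Unix epoch seconds) stored in the URL's Policy parameter."""
    policy = parse_qs(urlsplit(signed_url).query)["Policy"][0]
    # Assumption: CloudFront-style URL-safe substitutions were applied to the base64 text.
    policy = policy.replace("-", "+").replace("_", "=").replace("~", "/")
    policy += "=" * (-len(policy) % 4)  # restore any missing padding
    statement = json.loads(base64.b64decode(policy))["Statement"][0]
    return statement["Condition"]["DateLessThan"]["AWS:EpochTime"]


if __name__ == "__main__" and len(sys.argv) > 1:
    expires = datetime.fromtimestamp(policy_expiry(sys.argv[1]), tz=timezone.utc)
    print("key expires at (UTC):", expires.isoformat())
```

Pass the full URL in quotes as the only command-line argument. If the key has expired, request a new one from the website above.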

## **6. Convert the raw LLaMA-2 model to hf format (hf = Hugging Face)**

```

pip install protobuf sentencepiece

```

```

python convert_llama_weights_to_hf.py --input_dir "Llama-2-7b" --model_size 7B --output_dir "Llama-2-7b-hf" --llama_version 2

```

## **7. Fine-tune LLaMA-2 using Torchtune.**

#### List the recipes and configs that ship with torchtune

    tune ls

#### Copy the default QLoRA single-GPU training config file from tune

    tune cp llama2/7B_qlora_single_device custom_config.yaml

### **Edit custom_config.yaml**

#### Change output_dir path

    output_dir: qlora_output

#### Change tokenizer path

    tokenizer:

      _component_: torchtune.models.llama2.llama2_tokenizer

      path: Llama-2-7b-hf/tokenizer.model

      max_seq_len: null

#### Change the checkpointer path and save only the QLoRA adapter weights

    checkpointer:

      _component_: torchtune.training.FullModelHFCheckpointer

      checkpoint_dir: Llama-2-7b-hf

      checkpoint_files: [

          model-00001-of-00003.safetensors,

          model-00002-of-00003.safetensors,

          model-00003-of-00003.safetensors

      ]

      adapter_checkpoint: null

      recipe_checkpoint: null

      output_dir: ${output_dir}

      model_type: LLAMA2

    resume_from_checkpoint: False

    save_adapter_weights_only: True
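
Before training, it can help to confirm that every file named in `checkpoint_files` really exists in `checkpoint_dir`, since the shard names produced by the conversion step may differ from the three listed here. A minimal sketch, with `missing_files` as a hypothetical helper:

```python
from pathlib import Path

# Values copied from the checkpointer section above.
CHECKPOINT_DIR = "Llama-2-7b-hf"
CHECKPOINT_FILES = [
    "model-00001-of-00003.safetensors",
    "model-00002-of-00003.safetensors",
    "model-00003-of-00003.safetensors",
]


def missing_files(directory, names):
    """Return the names that do not exist as files inside directory."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(CHECKPOINT_DIR, CHECKPOINT_FILES)
    if missing:
        print("missing checkpoint files:", ", ".join(missing))
    else:
        print("all checkpoint files found")
```

If files are reported missing, list the contents of `Llama-2-7b-hf` and update `checkpoint_files` to match the actual shard names.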

#### Change the floating-point format from bf16 to fp32 (needed on the GeForce GPU used here)

    dtype: fp32

#### Change batch size

    dataset:

      _component_: torchtune.datasets.alpaca_cleaned_dataset

      packed: False  # True increases speed

    seed: null

    shuffle: True

    batch_size: 4

#### Verify custom_config.yaml

    tune validate custom_config.yaml

## **To run the real fine-tune instead of the dummy test, skip to Step 9.**

## **8. Create Dummy Test**

#### **Edit custom_config.yaml**

#### Change the batch size and point the dataset at the dummy test data

    dataset:

      _component_: my_dummy_dataset.MyDummyDataset

      data_file: "./dummy_alpaca.json"

      packed: False  # True increases speed

    seed: null

    shuffle: True

    batch_size: 4

#### Verify custom_config.yaml

    tune validate custom_config.yaml

#### Create dummy_alpaca.json

    [

        {

        "instruction": "Dummy Instruction",

        "input": "Dummy Input",

        "output": "Dummy Output"

        }

    ]
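
If you would rather generate the file than type it by hand, a minimal sketch follows. `write_dummy_alpaca` and its `n` parameter are hypothetical and only illustrate producing one or more identical records in the format above.

```python
import json


def write_dummy_alpaca(path="dummy_alpaca.json", n=1):
    """Write n identical Alpaca-style records to path and return them."""
    records = [
        {
            "instruction": "Dummy Instruction",
            "input": "Dummy Input",
            "output": "Dummy Output",
        }
        for _ in range(n)
    ]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=4, ensure_ascii=False)
    return records


if __name__ == "__main__":
    write_dummy_alpaca()
    print("wrote dummy_alpaca.json")
```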

#### Create my_dummy_dataset.py

    import json

    from torch.utils.data import Dataset


    class MyDummyDataset(Dataset):
        def __init__(self, tokenizer=None, data_file=None, packed=False):
            """
            tokenizer: passed in by torchtune as a positional argument
            data_file: keyword argument specified in the YAML config
            """
            super().__init__()
            self.tokenizer = tokenizer
            self.packed = packed  # keyword argument that receives "packed" from the YAML
            # Load the data if data_file was specified
            if data_file is not None:
                with open(data_file, "r", encoding="utf-8") as f:
                    self.data = json.load(f)
            else:
                # No file was provided, so fall back to an empty list
                self.data = []

        def __len__(self):
            return len(self.data)

        def __getitem__(self, idx):
            example = self.data[idx]
            return {
                "instruction": example.get("instruction", ""),
                "input": example.get("input", ""),
                "output": example.get("output", ""),
            }

## **9. Fine-tune the LLM on a single GPU (Train)**

    tune run lora_finetune_single_device --config custom_config.yaml

## **10. Inference (QLoRA version)**

    python test.py
