{"id":28384451,"url":"https://github.com/runpod-workers/llm-fine-tuning","last_synced_at":"2025-06-25T23:30:49.007Z","repository":{"id":278239690,"uuid":"914557829","full_name":"runpod-workers/llm-fine-tuning","owner":"runpod-workers","description":"Large Language model fine tuning on runpod serverless using axolotl. ","archived":false,"fork":false,"pushed_at":"2025-03-18T22:07:35.000Z","size":75,"stargazers_count":9,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-06T12:05:40.284Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/runpod-workers.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-09T20:42:42.000Z","updated_at":"2025-05-29T18:46:00.000Z","dependencies_parsed_at":"2025-02-18T18:46:26.731Z","dependency_job_id":"701d15b7-6883-4422-8f97-4c63a0360d64","html_url":"https://github.com/runpod-workers/llm-fine-tuning","commit_stats":null,"previous_names":["runpod-workers/llm-fine-tuning"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/runpod-workers/llm-fine-tuning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runpod-workers%2Fllm-fine-tuning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runpod-workers%2Fllm-fine-tuning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runpod-workers%2Fllm-fine-tuning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/runpod-workers%2Fllm-fine-tuning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/runpod-workers","download_url":"https://codeload.github.com/runpod-workers/llm-fine-tuning/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runpod-workers%2Fllm-fine-tuning/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261972470,"owners_count":23238536,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-30T08:38:43.196Z","updated_at":"2025-06-25T23:30:48.998Z","avatar_url":"https://github.com/runpod-workers.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n\u003ch1\u003eLLM Training: Full fine-tune, LoRA, QLoRA, etc. for Llama/Mistral/Gemma\u003c/h1\u003e\n\n## RunPod Worker Images\n\nBelow is a summary of the available RunPod Worker images, categorized by image stability and CUDA version compatibility.\n\n| Preview Image Tag                  | Development Image Tag             |\n|-----------------------------------|-----------------------------------|\n| `runpod/llm-finetuning:preview` | `runpod/llm-finetuning:dev` |\n\n# Configuration Options\n\nThis document outlines all available configuration options for training models. The configuration can be provided as a JSON request.\n\n## Usage\n\nYou can provide these configuration options:\n\n1. 
As a JSON request body:\n```json\n{\n  \"input\": {\n    \"user_id\": \"user\",\n    \"model_id\": \"model-name\",\n    \"run_id\": \"run-id\",\n    \"credentials\": {\n      \"wandb_api_key\": \"\", // Add your Weights \u0026 Biases API key. TODO: you will be able to set this in environment variables.\n      \"hf_token\": \"\" // Add your Hugging Face token. TODO: you will be able to set this in environment variables.\n    },\n    \"args\": {\n      \"base_model\": \"NousResearch/Llama-3.2-1B\",\n      // ... other options\n    }\n  }\n}\n```\n\n## Configuration Options\n\n### Model Configuration\n\n| Option | Description | Default |\n|--------|-------------|---------|\n| `base_model` | Path to the base model (local or HuggingFace) | Required |\n| `base_model_config` | Configuration path for the base model | Same as base_model |\n| `revision_of_model` | Specific model revision from HuggingFace hub | Latest |\n| `tokenizer_config` | Custom tokenizer configuration path | Optional |\n| `model_type` | Type of model to load | AutoModelForCausalLM |\n| `tokenizer_type` | Type of tokenizer to use | AutoTokenizer |\n| `hub_model_id` | Repository ID where the model will be pushed on Hugging Face Hub (format: username/repo-name) | Optional |\n\n\n\n## Model Family Identification\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `is_falcon_derived_model` | `false` | Whether model is Falcon-based |\n| `is_llama_derived_model` | `false` | Whether model is LLaMA-based |\n| `is_qwen_derived_model` | `false` | Whether model is Qwen-based |\n| `is_mistral_derived_model` | `false` | Whether model is Mistral-based |\n\n## Model Configuration Overrides\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `overrides_of_model_config.rope_scaling.type` | `\"linear\"` | RoPE scaling type (linear/dynamic) |\n| `overrides_of_model_config.rope_scaling.factor` | `1.0` | RoPE scaling factor |\n\n### Model Loading Options\n\n| Option | Description | 
Default |\n|--------|-------------|---------|\n| `load_in_8bit` | Load model in 8-bit precision | false |\n| `load_in_4bit` | Load model in 4-bit precision | false |\n| `bf16` | Use bfloat16 precision | false |\n| `fp16` | Use float16 precision | false |\n| `tf32` | Use tensor float 32 precision | false |\n\n\n## Memory and Device Settings\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `gpu_memory_limit` | `\"20GiB\"` | GPU memory limit |\n| `lora_on_cpu` | `false` | Load LoRA on CPU |\n| `device_map` | `\"auto\"` | Device mapping strategy |\n| `max_memory` | `null` | Max memory per device |\n\n## Training Hyperparameters\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `gradient_accumulation_steps` | `1` | Gradient accumulation steps |\n| `micro_batch_size` | `2` | Batch size per GPU |\n| `eval_batch_size` | `null` | Evaluation batch size |\n| `num_epochs` | `4` | Number of training epochs |\n| `warmup_steps` | `100` | Warmup steps |\n| `warmup_ratio` | `0.05` | Warmup ratio |\n| `learning_rate` | `0.00003` | Learning rate |\n| `lr_quadratic_warmup` | `false` | Quadratic warmup |\n| `logging_steps` | `null` | Logging frequency |\n| `eval_steps` | `null` | Evaluation frequency |\n| `evals_per_epoch` | `null` | Evaluations per epoch |\n| `save_strategy` | `\"epoch\"` | Checkpoint saving strategy |\n| `save_steps` | `null` | Saving frequency |\n| `saves_per_epoch` | `null` | Saves per epoch |\n| `save_total_limit` | `null` | Maximum checkpoints to keep |\n| `max_steps` | `null` | Maximum training steps |\n\n### Dataset Configuration\n\n```yaml\ndatasets:\n  - path: vicgalle/alpaca-gpt4  # HuggingFace dataset or TODO: You will be able to add the local path. 
\n    type: alpaca               # Format type (alpaca, gpteacher, oasst, etc.)\n    ds_type: json             # Dataset type\n    data_files: path/to/data  # Source data files\n    train_on_split: train     # Dataset split to use\n```\n\n\n## Chat Template Settings\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `chat_template` | `\"tokenizer_default\"` | Chat template type |\n| `chat_template_jinja` | `null` | Custom Jinja template |\n| `default_system_message` | `\"You are a helpful assistant.\"` | Default system message |\n\n## Dataset Processing\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `dataset_prepared_path` | `\"data/last_run_prepared\"` | Path for prepared dataset |\n| `push_dataset_to_hub` | `\"\"` | Push dataset to HF hub |\n| `dataset_processes` | `4` | Number of preprocessing processes |\n| `dataset_keep_in_memory` | `false` | Keep dataset in memory |\n| `shuffle_merged_datasets` | `true` | Shuffle merged datasets |\n| `dataset_exact_deduplication` | `true` | Deduplicate datasets |\n\n## LoRA Configuration\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `adapter` | `\"lora\"` | Adapter type (lora/qlora) |\n| `lora_model_dir` | `\"\"` | Directory with pretrained LoRA |\n| `lora_r` | `8` | LoRA attention dimension |\n| `lora_alpha` | `16` | LoRA alpha parameter |\n| `lora_dropout` | `0.05` | LoRA dropout |\n| `lora_target_modules` | `[\"q_proj\", \"v_proj\"]` | Modules to apply LoRA |\n| `lora_target_linear` | `false` | Target all linear modules |\n| `peft_layers_to_transform` | `[]` | Layers to transform |\n| `lora_modules_to_save` | `[]` | Modules to save |\n| `lora_fan_in_fan_out` | `false` | Fan in/out structure |\n\n\n## Optimization Settings\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `train_on_inputs` | `false` | Train on input prompts |\n| `group_by_length` | `false` | Group by sequence length |\n| `gradient_checkpointing` 
| `false` | Use gradient checkpointing |\n| `early_stopping_patience` | `3` | Early stopping patience |\n\n## Learning Rate Scheduling\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `lr_scheduler` | `\"cosine\"` | Scheduler type |\n| `lr_scheduler_kwargs` | `{}` | Scheduler parameters |\n| `cosine_min_lr_ratio` | `null` | Minimum LR ratio |\n| `cosine_constant_lr_ratio` | `null` | Constant LR ratio |\n| `lr_div_factor` | `null` | LR division factor |\n\n## Optimizer Settings\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `optimizer` | `\"adamw_hf\"` | Optimizer choice |\n| `optim_args` | `{}` | Optimizer arguments |\n| `optim_target_modules` | `[]` | Target modules |\n| `weight_decay` | `null` | Weight decay |\n| `adam_beta1` | `null` | Adam beta1 |\n| `adam_beta2` | `null` | Adam beta2 |\n| `adam_epsilon` | `null` | Adam epsilon |\n| `max_grad_norm` | `null` | Gradient clipping |\n\n## Attention Implementations\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `flash_optimum` | `false` | Use better transformers |\n| `xformers_attention` | `false` | Use xformers |\n| `flash_attention` | `false` | Use flash attention |\n| `flash_attn_cross_entropy` | `false` | Flash attention cross entropy |\n| `flash_attn_rms_norm` | `false` | Flash attention RMS norm |\n| `flash_attn_fuse_qkv` | `false` | Fuse QKV operations |\n| `flash_attn_fuse_mlp` | `false` | Fuse MLP operations |\n| `sdp_attention` | `false` | Use scaled dot product |\n| `s2_attention` | `false` | Use shifted sparse attention |\n\n\n## Tokenizer Modifications\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `special_tokens` | - | Special tokens to add/modify |\n| `tokens` | `[]` | Additional tokens |\n\n## Distributed Training\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `fsdp` | `null` | FSDP configuration |\n| `fsdp_config` | `null` | FSDP config options |\n| 
`deepspeed` | `null` | Deepspeed config path |\n| `ddp_timeout` | `null` | DDP timeout |\n| `ddp_bucket_cap_mb` | `null` | DDP bucket capacity |\n| `ddp_broadcast_buffers` | `null` | DDP broadcast buffers |\n\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ch3\u003eExample Configuration Request:\u003c/h3\u003e\u003c/summary\u003e\n\nHere's a complete example for fine-tuning a LLaMA model using LoRA:\n\n```json\n{\n  \"input\": {\n    \"user_id\": \"user\",\n    \"model_id\": \"llama-test\",\n    \"run_id\": \"test-run\",\n    \"credentials\": {\n      \"wandb_api_key\": \"\",\n      \"hf_token\": \"\"\n    },\n    \"args\": {\n      \"base_model\": \"NousResearch/Llama-3.2-1B\",\n      \"load_in_8bit\": false,\n      \"load_in_4bit\": false,\n      \"strict\": false,\n      \"datasets\": [\n        {\n          \"path\": \"teknium/GPT4-LLM-Cleaned\",\n          \"type\": \"alpaca\"\n        }\n      ],\n      \"dataset_prepared_path\": \"last_run_prepared\",\n      \"val_set_size\": 0.1,\n      \"output_dir\": \"./outputs/lora-out\",\n      \"adapter\": \"lora\",\n      \"sequence_len\": 2048,\n      \"sample_packing\": true,\n      \"eval_sample_packing\": true,\n      \"pad_to_sequence_len\": true,\n      \"lora_r\": 16,\n      \"lora_alpha\": 32,\n      \"lora_dropout\": 0.05,\n      \"lora_target_modules\": [\n        \"gate_proj\",\n        \"down_proj\",\n        \"up_proj\",\n        \"q_proj\",\n        \"v_proj\",\n        \"k_proj\",\n        \"o_proj\"\n      ],\n      \"gradient_accumulation_steps\": 2,\n      \"micro_batch_size\": 2,\n      \"num_epochs\": 1,\n      \"optimizer\": \"adamw_8bit\",\n      \"lr_scheduler\": \"cosine\",\n      \"learning_rate\": 0.0002,\n      \"train_on_inputs\": false,\n      \"group_by_length\": false,\n      \"bf16\": \"auto\",\n      \"tf32\": false,\n      \"gradient_checkpointing\": true,\n      \"logging_steps\": 1,\n      \"flash_attention\": true,\n      \"loss_watchdog_threshold\": 5,\n      
\"loss_watchdog_patience\": 3,\n      \"warmup_steps\": 10,\n      \"evals_per_epoch\": 4,\n      \"saves_per_epoch\": 1,\n      \"weight_decay\": 0,\n      \"hub_model_id\": \"runpod/llama-fr-lora\",\n      \"wandb_name\": \"test-run-1\",\n      \"wandb_project\": \"test-run-1\",\n      \"wandb_entity\": \"axo-test\",\n      \"special_tokens\": {\n        \"pad_token\": \"\u003c|end_of_text|\u003e\"\n      }\n    }\n  }\n}\n```\n\u003c/details\u003e\n\n### Advanced Features\n\n#### Wandb Integration\n- `wandb_project`: Project name for Weights \u0026 Biases\n- `wandb_entity`: Team name in W\u0026B\n- `wandb_watch`: Monitor model with W\u0026B\n- `wandb_name`: Name of the W\u0026B run\n- `wandb_run_id`: ID for the W\u0026B run\n\n\n\n#### Performance Optimization\n- `sample_packing`: Enable efficient sequence packing\n- `eval_sample_packing`: Use sequence packing during evaluation\n- `torch_compile`: Enable PyTorch 2.0 compilation\n- `flash_attention`: Use Flash Attention implementation\n- `xformers_attention`: Use xFormers attention implementation\n\n### Available Optimizers\n\nThe following optimizers are supported:\n\n- `adamw_hf`: HuggingFace's AdamW implementation\n- `adamw_torch`: PyTorch's AdamW\n- `adamw_torch_fused`: Fused AdamW implementation\n- `adamw_torch_xla`: XLA-optimized AdamW\n- `adamw_apex_fused`: NVIDIA Apex fused AdamW\n- `adafactor`: Adafactor optimizer\n- `adamw_anyprecision`: Anyprecision AdamW\n- `adamw_bnb_8bit`: 8-bit AdamW from bitsandbytes\n- `lion_8bit`: 8-bit Lion optimizer\n- `lion_32bit`: 32-bit Lion optimizer\n- `sgd`: Stochastic Gradient Descent\n- `adagrad`: Adagrad optimizer\n\n\n\n## Notes\n\n- Set `load_in_8bit: true` or `load_in_4bit: true` for memory-efficient training\n- Enable `flash_attention: true` for faster training on modern GPUs\n- Use `gradient_checkpointing: true` to reduce memory usage\n- Adjust `micro_batch_size` and `gradient_accumulation_steps` based on your GPU memory\n\nFor more detailed information, please 
refer to the [documentation](https://axolotl-ai-cloud.github.io/axolotl/docs/config.html).\n\n\n### Errors:\n- If you face any issues with Flash Attention 2, delete your worker and restart.\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frunpod-workers%2Fllm-fine-tuning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frunpod-workers%2Fllm-fine-tuning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frunpod-workers%2Fllm-fine-tuning/lists"}