{"id":13606509,"url":"https://github.com/arielnlee/Platypus","last_synced_at":"2025-04-12T08:31:24.470Z","repository":{"id":177464543,"uuid":"658571084","full_name":"arielnlee/Platypus","owner":"arielnlee","description":"Code for fine-tuning Platypus fam LLMs using LoRA","archived":false,"fork":false,"pushed_at":"2024-02-04T09:50:42.000Z","size":1358,"stargazers_count":622,"open_issues_count":16,"forks_count":60,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-11-07T11:44:26.526Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arielnlee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-26T04:27:07.000Z","updated_at":"2024-10-28T23:15:20.000Z","dependencies_parsed_at":"2024-08-01T19:53:31.372Z","dependency_job_id":null,"html_url":"https://github.com/arielnlee/Platypus","commit_stats":null,"previous_names":["arielnlee/platypus-30b","arielnlee/platypus"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arielnlee%2FPlatypus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arielnlee%2FPlatypus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arielnlee%2FPlatypus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arielnlee%2FPlatypus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arielnlee","download_url":"https://codeload.github.com
/arielnlee/Platypus/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248539827,"owners_count":21121239,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T19:01:09.774Z","updated_at":"2025-04-12T08:31:19.461Z","avatar_url":"https://github.com/arielnlee.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话","4. Fine-Tuning","GitHub projects"],"sub_categories":["大语言对话模型及数据","Frameworks"],"readme":"# Platypus: Quick, Cheap, and Powerful Refinement of LLMs (https://platypus-llm.github.io)\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"./assets/Best_Platty.png\" alt=\"Platypus\" width=\"300\"/\u003e\n\u003c/p\u003e\n\nThe Platypus models are a series of fine-tuned and merged variants based on the LLaMA and LLaMa-2 transformer architectures. Platypus takes advantage of [LoRA](https://arxiv.org/pdf/2106.09685.pdf) and [PEFT](https://github.com/huggingface/peft). \n\nAll models and the dataset are available via HuggingFace: [`garage-bAInd`](https://huggingface.co/garage-bAInd)\n\n## Updates\n\n**8/21/23**: If you're fine-tuning LLaMa-2 7B, please set `bf16=True` and `fp16=False` in the HF trainer. LLaMa-1 7B works as is. **This only applies to LLaMa-2 7B.** Additionally, if you are using 1 GPU, please set `ddp_find_unused_parameters=False` in the HF trainer. We will be updating the fine-tuning script to handle these changes automatically. \n\n**8/14/23**: We have cleaned up our pipeline and added data refinement and similarity code. 
Within the next few days we'll have a script to reproduce our exact dataset from 11 open-source datasets.\n\n**8/13/23**: An unquantized GPU chatbot of OpenOrca-Platypus2-13B, our most recent collab, is available via Hugging Face Spaces, courtesy of OpenOrca: [Chat now!](https://huggingface.co/spaces/Open-Orca/OpenOrca-Platypus2-13B)\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"./assets/orca_platty.jpeg\" alt=\"Platypus\" width=\"120\"/\u003e\n\u003c/p\u003e\n\n**8/11/23**: Our [paper](https://arxiv.org/abs/2308.07317) and [project website](https://platypus-llm.github.io) have been released!\n\n## CLI \n\n[FastChat](https://github.com/lm-sys/FastChat) provides a simple setup for those interested in running the model. After downloading the model through HuggingFace, clone the FastChat repository:\n\n```\ngit clone https://github.com/lm-sys/FastChat.git\ncd FastChat\n```\n\nInstall the required packages:\n\n```\npip3 install --upgrade pip  # enable PEP 660 support\npip3 install -e .\n```\n\nFinally, run the following:\n\n```\npython3 -m fastchat.serve.cli --model-path garage-bAInd/Platypus-30B --conv_template alpaca\n```\n\n## Local Setup\n\nThis repository is multi-GPU friendly and provides code for model or data parallelism, depending on your computational resources. \n\n1. Install dependencies\n\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n2. Be sure to use these exact requirements or you may run into model saving or OOM issues.\n\n## Fine-tuning (`finetune.py`)\n\nRun `fine-tuning.sh`.\n\nNote: The script above uses `torchrun` for data parallelism. PyTorch is not in `requirements.txt` since technically you can run fine-tuning without it (after a few minor changes to the .py file). To use `fine-tuning.sh`, please install [PyTorch](https://pytorch.org/get-started/locally/). We recommend using `torchrun` and PyTorch 2.0+ for speed + `torch.compile`. 
If you do not install PyTorch, or use an alternative method like `accelerate launch`, please take time to comment out any torch-related lines in the scripts.\n\nHyperparameters used to fine-tune Platypus:\n\n| Hyperparameter      | Value 13B / 70B  |\n|---------------------|--------|\n| learning rate       | 4e-4 / 3e-4   |\n| batch size          | 16     |\n| microbatch  size    | 1      |\n| warmup steps        | 100    |\n| epochs              | 1      |\n| weight decay        | 0.     |\n| lr scheduler        | cosine |\n| lora alpha          | 16     |\n| lora rank           | 16     |\n| lora dropout        | 0.05   |\n| lora target modules | gate_proj, up_proj, down_proj|\n| cutoff length       | 4096   |\n| train on inputs     | False  |\n| group by length     | False  |\n| add eos token       | False  |\n\nExample of how to calculate gradient accumulation steps using 2 GPUs: gradient_accumulation_steps = global_batch_size / micro_batch_size / num_gpus = 16 / 1 / 2 = 8.\n\nIf your model **cannot** fit in the memory of each GPU, please use the alternative fine-tuning option below (or utilize accelerate, FSDP, etc.) to take advantage of model parallelism. A good alternative to torchrun is accelerate. 
\n\n```bash\npython finetune.py \\\n    --base_model meta-llama/Llama-2-70b-hf \\\n    --data_path ./final_data.json \\\n    --output_dir ./llama2-platypus-70b \\\n    --batch_size 16 \\\n    --micro_batch_size 1 \\\n    --num_epochs 1 \\\n    --learning_rate 0.0003 \\\n    --cutoff_len 4096 \\\n    --val_set_size 0 \\\n    --lora_r 16 \\\n    --lora_alpha 16 \\\n    --lora_dropout 0.05 \\\n    --lora_target_modules '[gate_proj, down_proj, up_proj]' \\\n    --train_on_inputs False \\\n    --add_eos_token False \\\n    --group_by_length False \\\n    --prompt_template_name alpaca \\\n    --lr_scheduler 'cosine' \\\n    --warmup_steps 100\n```\n\n## Merging\n\nOnce you've completed fine-tuning, use `merge.sh` to merge the LoRA weights back into the base LLaMa model (or base model of your choice) for export to HuggingFace format.\n\nWhile we are experimenting with better and alternative ways to merge (stay tuned!), our current merging process relies on the basic linear merge provided by PEFT. Before we fine-tune, we search for possible models to merge with and the datasets used to create them (to the best of our ability). The success of our LoRA merges stems from using the right data. Our most successful merges have little to no overlap in fine-tuning data. For example, GPlatty-30B is a merge of Platypus-30B and gpt4-alpaca-lora-30b. We saw a 2% jump in accuracy for GPlatty, and the datasets used to fine-tune the aforementioned two LoRA-based models had very low similarity scores. Please see [our paper](https://arxiv.org/abs/2308.07317) for additional information. \n\n**NOTE:** If you encounter any errors while merging, please try uninstalling bitsandbytes and peft, then reinstalling with the newest versions (peft should always be installed from source).\n\n## Dataset Refinement\n\nWe used keyword search to find STEM and logic questions in the 11 open-source datasets that make up [Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus). 
Then, to remove duplicates and redundancy, we perform a cosine similarity check of the questions using SentenceTransformers embeddings. Lastly, we do a similarity check to remove any questions from our training set that are too similar to the test set.\n\nYou can access all of the related code in the `data_pipeline` folder of this repo.\n\n## Reproducing Benchmark Eval Results\nInstall LM Evaluation Harness:\n```\ngit clone https://github.com/EleutherAI/lm-evaluation-harness\ncd lm-evaluation-harness\ngit checkout b281b0921b636bc36ad05c0b0b0763bd6dd43463 # The commit used by the Open LLM Leaderboard\npip install -e .\n```\nEach task was evaluated on a single A100 80GB GPU for 13B, and 2 A100s for 70B.\n\nARC:\n```\npython main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus-13B,use_accelerate=True --tasks arc_challenge --batch_size 2 --no_cache --write_out --output_path results/Platypus-13B/arc_challenge_25shot.json --device cuda --num_fewshot 25\n```\n\nHellaSwag:\n```\npython main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus-13B,use_accelerate=True --tasks hellaswag --batch_size 2 --no_cache --write_out --output_path results/Platypus-13B/hellaswag_10shot.json --device cuda --num_fewshot 10\n```\n\nMMLU:\n```\npython main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus-13B,use_accelerate=True --tasks hendrycksTest-* --batch_size 2 --no_cache --write_out --output_path results/Platypus-13B/mmlu_5shot.json --device cuda --num_fewshot 5\n```\n\nTruthfulQA:\n```\npython main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus-13B,use_accelerate=True --tasks truthfulqa_mc --batch_size 2 --no_cache --write_out --output_path results/Platypus-13B/truthfulqa_0shot.json --device cuda\n```\n## Inference for Adapters (`inference.py`)\n\nThis is a basic example script for running inference directly using fine-tuned adapters and/or local data. 
The current version reads data from a csv file. You can easily edit this to pull from HF or use a json file. Please make any necessary edits before using this script (it assumes alpaca formatting).\n\n## BibTeX\n\n```\n@article{platypus2023,\n    title={Platypus: Quick, Cheap, and Powerful Refinement of LLMs}, \n    author={Ariel N. Lee and Cole J. Hunter and Nataniel Ruiz},\n    booktitle={arXiv preprint arxiv:2308.07317},\n    year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farielnlee%2FPlatypus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farielnlee%2FPlatypus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farielnlee%2FPlatypus/lists"}