{"id":25757610,"url":"https://github.com/hsj576/griffin","last_synced_at":"2025-05-13T00:12:13.262Z","repository":{"id":279359855,"uuid":"937889108","full_name":"hsj576/GRIFFIN","owner":"hsj576","description":"Official Implementation of \"GRIFFIN: Effective Token Alignment for Faster Speculative Decoding\"","archived":false,"fork":false,"pushed_at":"2025-05-12T06:57:32.000Z","size":12483,"stargazers_count":6,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-13T00:12:03.415Z","etag":null,"topics":["large-language-models","llm-inference","speculative-decoding"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2502.11018","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hsj576.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-02-24T04:30:50.000Z","updated_at":"2025-05-12T06:57:37.000Z","dependencies_parsed_at":null,"dependency_job_id":"9e3d4c0e-99c2-496e-94b4-90ee2325fd22","html_url":"https://github.com/hsj576/GRIFFIN","commit_stats":null,"previous_names":["hsj576/griffin"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hsj576%2FGRIFFIN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hsj576%2FGRIFFIN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hsj576%2FGRIFFIN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hsj576%2FGRIFFIN/manifests","owne
r_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hsj576","download_url":"https://codeload.github.com/hsj576/GRIFFIN/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253843219,"owners_count":21972874,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["large-language-models","llm-inference","speculative-decoding"],"created_at":"2025-02-26T16:31:17.277Z","updated_at":"2025-05-13T00:12:13.249Z","avatar_url":"https://github.com/hsj576.png","language":"Python","readme":"\u003cimg src=\"figs/griffin-logo.png\" alt=\"GRIFFIN\" width=\"220\" align=\"left\"\u003e\u003cdiv align=\"center\"\u003e\u003ch1\u003e\u0026nbsp;GRIFFIN\u003c/h1\u003e\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n| \u003ca href=\"https://arxiv.org/abs/2502.11018\"\u003e\u003cb\u003ePaper (GRIFFIN)\u003c/b\u003e\u003c/a\u003e | \n\u003c/p\u003e\n\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Version-v1.0.0-orange.svg\" alt=\"Version\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://opensource.org/licenses/Apache-2.0\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/License-Apache_2.0-blue.svg\" alt=\"License\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/hsj576/GRIFFIN/issues\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Maintained%3F-yes-green.svg\" alt=\"Maintenance\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/hsj576/GRIFFIN/pulls\"\u003e\n    \u003cimg 
src=\"https://img.shields.io/badge/Contributions-welcome-brightgreen.svg?style=flat\" alt=\"Contributions welcome\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\n##\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./figs/griffin_benchmark_t=0.png\" alt=\"benchmark\" width=\"790\"\u003e\n\u003c/p\u003e\nSpeed-up ratios of GRIFFIN at temperature = 0.\n\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./figs/griffin_benchmark_t=1.png\" alt=\"benchmark\" width=\"790\"\u003e\n\u003c/p\u003e\nSpeed-up ratios of GRIFFIN at temperature = 1.\n\n## Overview\n\n**GRIFFIN** is a novel framework designed to address **token misalignment** in speculative decoding. This repository provides the implementation of GRIFFIN, including its token-alignable training strategy and token-alignable draft model.\n\n- GRIFFIN is:\n  - **4.2x** faster than vanilla decoding.\n  - **1.3x** faster than EAGLE-2.\n\n\n\n### Acceleration demo of GRIFFIN for LLaMA3-8B on a 4090 GPU\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./figs/GRIFFIN-acceleration.gif\" alt=\"demogif\"\u003e\n\u003c/p\u003e\n\n## Updates\n\n**2025.5.12**: We now support Qwen2.\n\n**2025.2.24**: GRIFFIN v1.0 is released.\n\n## Setup \u0026 Installation\n\nTo set up the environment, follow these steps:\n\n1. Clone the repository and navigate to the `GRIFFIN` directory:\n```bash\n   git clone https://github.com/hsj576/GRIFFIN.git\n   cd GRIFFIN\n```\n\n2. Install the required dependencies:\n\n```bash\npip install -r requirements.txt\n```\n\n3. 
Update the paths in the code: replace placeholders like `\"your-model-paths\"` and `\"your-datasets-path\"` with the actual paths to your models and datasets.\n\n## GRIFFIN Weights\n\n| Base Model  | GRIFFIN on Hugging Face | Base Model  | GRIFFIN on Hugging Face |\n|------|------|------|------|\n| Vicuna-7B-v1.5 | [husj576/GRIFFIN-Vicuna-7B-v1.5](https://huggingface.co/husj576/GRIFFIN-Vicuna-7B-v1.5) | LLaMA2-Chat 7B | [husj576/GRIFFIN-llama2-chat-7B](https://huggingface.co/husj576/GRIFFIN-llama2-chat-7B) |\n| LLaMA3-Instruct 8B | [husj576/GRIFFIN-llama3-instruct-8B](https://huggingface.co/husj576/GRIFFIN-llama3-instruct-8B) | LLaMA2-Chat 13B | [husj576/GRIFFIN-llama2-chat-13B](https://huggingface.co/husj576/GRIFFIN-llama2-chat-13B) |\n| LLaMA3-Instruct 70B | [husj576/GRIFFIN-llama3-instruct-70B](https://huggingface.co/husj576/GRIFFIN-llama3-instruct-70B) | Qwen2-Instruct 7B | [husj576/GRIFFIN-qwen2-instruct-7B](https://huggingface.co/husj576/GRIFFIN-qwen2-instruct-7B) |\n\n## Inference\nThe inference code automatically allocates model weights across multiple GPUs, allowing you to run models that exceed the memory of a single GPU.\n\nWe provide a web interface, which you can launch with the following command. After the model is fully loaded, a URL is printed in the terminal; open it in your browser to access the interface.\n```bash\npython -m application.webui --ea-model-path [path of GRIFFIN weight] \\\n    --base-model-path [path of the original model] \\\n    --model-type [vicuna|llama2|llama3] \\\n    --total-token [int]\n```\nThe *total-token* argument sets the number of draft tokens. For smaller models and faster GPUs, this value can be set higher; tuning it to your specific device and model yields better speed-ups. 
\n\n## Training\n\nTo train GRIFFIN's token-alignable draft model, first generate the training data and then run the multi-step training process.\n\n### Generate Training Data\nRun the following command to generate the training data:\n```bash\npython -m ge_data.allocation --outdir [path of data]\n```\n### Train the Draft Model\n\nGRIFFIN's token-alignable training involves multiple training steps. For every step beyond the first, the model is trained incrementally from the checkpoint of the previous step.\n\n#### Training step 1\n\nRun the following command for the first training step:\n\n```bash\nexport PYTHONPATH=\"/your-GRIFFIN-path/GRIFFIN:$PYTHONPATH\"\n\naccelerate launch -m --mixed_precision=bf16 train.main_griffin_1 \\\n--tmpdir [path to training data] \\\n--cpdir [path to save checkpoints] \\\n--configpath [path to configuration file]\n```\n#### Training steps $j \\ge 2$\n\nFor subsequent training steps $j \\ge 2$, use the following command:\n\n```bash\nexport PYTHONPATH=\"/your-GRIFFIN-path/GRIFFIN:$PYTHONPATH\"\n\naccelerate launch -m --mixed_precision=bf16 train.main_griffin_2 \\\n--tmpdir [path to training data] \\\n--cpdir [path to save checkpoints] \\\n--configpath [path to configuration file] \\\n--forward_num_total j \\\n--griffinpath [path to previous GRIFFIN model checkpoint]\n```\n\nExample configuration files can be found in the `GRIFFIN/train` directory.\n\n## Evaluation\nTo evaluate the performance and speed of GRIFFIN, use the provided scripts for the different models. 
Run the following commands:\n```bash\n./scripts/llama3_test_8b.sh\n./scripts/llama3_test_70b.sh\n./scripts/llama2_test_7b.sh\n./scripts/llama2_test_13b.sh\n./scripts/vicuna_test_7b.sh\n./scripts/qwen2_test_7b.sh\n```\n\n## Reference\nFor technical details and full experimental results, please see [the GRIFFIN paper](https://arxiv.org/abs/2502.11018).\n```bibtex\n@misc{hu2025griffineffectivetokenalignment,\n      title={GRIFFIN: Effective Token Alignment for Faster Speculative Decoding},\n      author={Shijing Hu and Jingyang Li and Xingyu Xie and Zhihui Lu and Kim-Chuan Toh and Pan Zhou},\n      year={2025},\n      eprint={2502.11018},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https://arxiv.org/abs/2502.11018},\n}\n```\n\n## Acknowledgements\n\nOur implementation is based on the open-source repository of [EAGLE](https://github.com/SafeAILab/EAGLE/tree/main). This project has been influenced by many excellent projects in the LLM community, such as [HASS](https://github.com/HArmonizedSS/HASS) and [FSPAD](https://github.com/Luc4Gui/FSPAD), among others.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhsj576%2Fgriffin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhsj576%2Fgriffin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhsj576%2Fgriffin/lists"}