{"id":18064300,"url":"https://github.com/changwoolee/blast","last_synced_at":"2025-04-11T18:10:20.650Z","repository":{"id":260161094,"uuid":"864201479","full_name":"changwoolee/BLAST","owner":"changwoolee","description":"[NeurIPS 2024] BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference","archived":false,"fork":false,"pushed_at":"2024-11-06T02:08:33.000Z","size":1502,"stargazers_count":10,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-25T14:04:46.165Z","etag":null,"topics":["efficient-inference","large-language-models","llama","matrix-factorization","matrix-multiplication","model-compression"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/changwoolee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-27T17:24:51.000Z","updated_at":"2025-03-04T19:21:20.000Z","dependencies_parsed_at":"2024-11-06T03:01:47.064Z","dependency_job_id":null,"html_url":"https://github.com/changwoolee/BLAST","commit_stats":null,"previous_names":["changwoolee/blast"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/changwoolee%2FBLAST","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/changwoolee%2FBLAST/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/changwoolee%2FBLAST/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ch
angwoolee%2FBLAST/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/changwoolee","download_url":"https://codeload.github.com/changwoolee/BLAST/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248456365,"owners_count":21106603,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["efficient-inference","large-language-models","llama","matrix-factorization","matrix-multiplication","model-compression"],"created_at":"2024-10-31T06:05:24.955Z","updated_at":"2025-04-11T18:10:20.607Z","avatar_url":"https://github.com/changwoolee.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n \n# BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference\n\n**[Changwoo Lee](http://changwoolee.github.io), [Soo Min Kwon](https://soominkwon.github.io), [Qing Qu](https://qingqu.engin.umich.edu), and [Hun-Seok Kim](https://kim.engin.umich.edu)**\n\nUniversity of Michigan\n\n\u003cimg src=\"https://github.com/changwoolee/BLAST/blob/main/imgs/blast.png?raw=true\" alt=\"blast\" width=\"200\"/\u003e\n\n**[[Paper](https://arxiv.org/abs/2410.21262)]**\n\n\u003c/div\u003e\n\n## Notice\nThis repo is being actively updated.\n* [Blast-Llama-4B](https://huggingface.co/cwoolee/blast-llama-4B) is now available on Hugging Face! 
🤗 \n* [arXiv](https://arxiv.org/abs/2410.21262) version is available!\n* The paper has been accepted to NeurIPS 2024.\n\n## Dependencies\n\nThe packages can be installed via `conda env create --file environment.yml`.\n\nAdditionally, install `lm-evaluation-harness` with the BLAST implementation:\n```bash\ncd lm-evaluation-harness\npip install -e .\n```\n\n## Blast-Llama-4B Model\n\nBlast-Llama-4B is a Llama-7B model compressed by 50% via the procedure described below.\nThe model can be loaded using the `transformers` library.\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\n\ntokenizer = AutoTokenizer.from_pretrained(\"huggyllama/llama-7b\")\nmodel = AutoModelForCausalLM.from_pretrained(\"cwoolee/blast-llama-4B\", trust_remote_code=True)\n```\n\n## Llama Decomposition\n\nRun `bash ./scripts/decompose_llama.sh 0-31`.\n\n## Blast-Llama Retraining\nRun `bash ./scripts/train_blast.sh`. The script assumes that 4 GPUs are available.\n\nWe re-trained the compressed Llama model for 400 steps on a subset of the SlimPajama dataset, available [here](https://huggingface.co/datasets/DKYoon/SlimPajama-6B).\n\n## Evaluation using `lm-evaluation-harness`\nRun `bash scripts/lm-eval-blast.sh`.\n\n## Acknowledgment\n\nThis repo is highly inspired by [huggingface/transformers](https://github.com/huggingface/transformers/tree/main) and [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).\n\n## Citation\n\nPlease cite our paper if you find this repo or our paper useful:\n```\n@inproceedings{\n    lee2024blast,\n    title={{BLAST}: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference},\n    author={Lee, Changwoo and Kwon, Soo Min and Qu, Qing and Kim, Hun-Seok},\n    booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},\n    
year={2024},\n}\n```\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchangwoolee%2Fblast","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchangwoolee%2Fblast","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchangwoolee%2Fblast/lists"}