{"id":21389451,"url":"https://github.com/feifeibear/llmroofline","last_synced_at":"2025-07-13T15:33:14.168Z","repository":{"id":227069759,"uuid":"770345229","full_name":"feifeibear/LLMRoofline","owner":"feifeibear","description":"Compare different hardware platforms via the Roofline Model for LLM inference tasks.","archived":false,"fork":false,"pushed_at":"2024-03-13T02:29:22.000Z","size":3250,"stargazers_count":8,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-03-13T13:38:21.459Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/feifeibear.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-03-11T11:47:15.000Z","updated_at":"2024-03-13T11:27:09.000Z","dependencies_parsed_at":"2024-03-11T13:45:55.294Z","dependency_job_id":null,"html_url":"https://github.com/feifeibear/LLMRoofline","commit_stats":null,"previous_names":["feifeibear/llmhardwarecompare"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feifeibear%2FLLMRoofline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feifeibear%2FLLMRoofline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feifeibear%2FLLMRoofline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feifeibear%2FLLMRoofline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/feifeibear","download_url":"htt
ps://codeload.github.com/feifeibear/LLMRoofline/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225896453,"owners_count":17541499,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-22T12:26:36.579Z","updated_at":"2024-11-22T12:26:37.138Z","avatar_url":"https://github.com/feifeibear.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# A Simplified Roofline Model for Comparing Hardware Performance on LLM Inference Tasks\n\nThis repository provides scripts that leverage the roofline model to compare the performance of Large Language Model (LLM) inference tasks across various hardware platforms.\n\nIn LLM inference tasks, several factors can influence performance on different hardware. 
These factors include:\n\n- Batch Size: The batch size usually refers to the number of requests forwarded through the model at the same time.\n\n- KV Cache Length: In Transformer models, the KV cache length refers to the length of the cache used to store key-value pairs.\n\n- Hidden Size: The hidden size refers to the dimension of the hidden layers in the model.\n\n- Intermediate Size: The intermediate size refers to the dimension of the fully connected layers in the model.\n\n- GQA: The number of groups used in Grouped Query Attention.\n\n- MoE: The Mixture of Experts configuration.\n\n\nFor the Mixtral-8x7B MoE model, the A800 always performs worse than the H20.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./media/mistral7b_moe.png\" alt=\"The speedup of A800/H20 on Mixtral-8x7B\" width=\"50%\"\u003e\n\u003c/p\u003e\n\nWe present the speedup between two hardware platforms in a mesh and compare our naive roofline model with a more complex one from [LLM-Viewer](https://github.com/hahnyuan/LLM-Viewer).\nAlthough the two models are not exactly the same, they exhibit similar distribution patterns.\nFor instance:\n\nFor LLAMA2 70B, the A800 outperforms the H20 in some scenarios, especially at large batch sizes.\n\n![The speedup of A800/H20 on LLAMA2 70B](./media/llama2_70b.png)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeifeibear%2Fllmroofline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffeifeibear%2Fllmroofline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeifeibear%2Fllmroofline/lists"}