{"id":17989468,"url":"https://github.com/evilfreelancer/benchmarking-llms","last_synced_at":"2025-03-25T23:30:44.274Z","repository":{"id":188530601,"uuid":"678924392","full_name":"EvilFreelancer/benchmarking-llms","owner":"EvilFreelancer","description":"Comprehensive benchmarks and evaluations of Large Language Models (LLMs) with a focus on hardware usage, generation speed, and memory requirements.","archived":false,"fork":false,"pushed_at":"2023-08-31T07:28:15.000Z","size":31,"stargazers_count":12,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-17T20:40:04.515Z","etag":null,"topics":["benchmark","llama","llm","mgpt","mpt","rugpt"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EvilFreelancer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-15T17:44:40.000Z","updated_at":"2024-06-20T18:16:20.000Z","dependencies_parsed_at":"2024-08-03T07:59:16.237Z","dependency_job_id":null,"html_url":"https://github.com/EvilFreelancer/benchmarking-llms","commit_stats":{"total_commits":10,"total_committers":1,"mean_commits":10.0,"dds":0.0,"last_synced_commit":"15fe12504f5ae527ac7eccc74292a1a911a90ff8"},"previous_names":["evilfreelancer/benchmarking-llms"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fbenchmarking-llms","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fbenchmarking-llms/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fbenchmarking-llms/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fbenchmarking-llms/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EvilFreelancer","download_url":"https://codeload.github.com/EvilFreelancer/benchmarking-llms/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245561115,"owners_count":20635677,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","llama","llm","mgpt","mpt","rugpt"],"created_at":"2024-10-29T19:14:45.612Z","updated_at":"2025-03-25T23:30:42.131Z","avatar_url":"https://github.com/EvilFreelancer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Benchmarking Large Language Models (LLMs)\n\nThis comparison evaluates various large language models (LLMs) based on their hardware usage,\nnumber of parameters, and context size.\n\nTest Environment:\n\n* Graphics Card: RTX 4090 24Gb\n* CUDA Version: 11.7 (for ruGPT3 family) and 11.8 
## Evaluation Parameters

To maintain consistency across evaluations, I used the following generation parameters:

* dtype: float16 (LLaMA), bfloat16 (MPT), 8-bit (Saiga-2, ruGPT-3.5)
* Maximum new tokens: 1024
* Top-k: 20
* Top-p: 0.9
* Repetition penalty: 1.1
* Sampling: enabled
* Caching: disabled

I chose these parameters to:

* Determine each model's verbosity.
* Measure its generation speed.
* Most crucially, understand its memory requirements.

Through my testing, I discovered that clearing the CUDA cache with `torch.cuda.empty_cache()` reduces
generation speed by 15-25% on average.
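In Hugging Face `transformers` terms, these settings map onto a generation call roughly as follows. This is a minimal
sketch: the model name is a placeholder, and the repo's actual scripts (`llama.py`, `mpt.py`, etc.) may structure the
call differently.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # placeholder; each benchmark loads its own model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # float16 for LLaMA; bfloat16 for MPT; 8-bit for Saiga-2/ruGPT-3.5
    device_map="auto",          # requires the `accelerate` package
)

inputs = tokenizer("Alexander Sergeevich Pushkin was born on ", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,          # sampling enabled
    top_k=20,
    top_p=0.9,
    repetition_penalty=1.1,
    use_cache=False,         # caching disabled
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```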
## Results

The table below provides a detailed comparison and performance metrics for the tested models. Empty cells mark runs
that could not be completed on this hardware.

| Name | Size | Context | MAX VRAM (GB) | MAX Init RAM (GB) | AVG GenTime (s) | AVG Tokens | AVG t/s |
|------|------|---------|---------------|-------------------|-----------------|------------|---------|
| [StableBeluga 7b](https://huggingface.co/stabilityai/StableBeluga-7B) | 7b | 4096 | ~22.5 | ~22.7 | ~31.25 | ~529.7 | ~16.9 |
| [LLaMA 7b](https://huggingface.co/huggyllama/llama-7b) | 7b | 4096 | ~22.47 | ~22.7 | ~34.52 | ~545.5 | ~15.8 |
| [LLaMA 2 7b](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 7b | 4096 | ~22.78 | ~22.7 | ~63.99 | ~768.6 | ~12.0 |
| [LLaMA 2 7b-chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | 7b-chat | 4096 | ~22.51 | ~21.8 | ~17.38 | ~356.8 | ~20.5 |
| [LLaMA 2 7b 32k](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) | 7b-32k | 32768 | ~21.5 | ~22.7 | ~56.63 | ~868.5 | ~15.3 |
| [LLaMA 2 13b](https://huggingface.co/meta-llama/Llama-2-13b-hf) | 13b | 4096 | | ~30.7 | | | |
| [LLaMA 2 13b-chat](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) | 13b-chat | 4096 | | | | | |
| [MosaicML 7b](https://huggingface.co/mosaicml/mpt-7b) | 7b | 8192 | ~22.6 | ~9.8 | ~87.27 | ~1046.2 | ~12.0 |
| [MosaicML 7b-storywriter](https://huggingface.co/mosaicml/mpt-7b-storywriter) | 7b-storywriter | 65536 | ~22.9 | ~10.4 | ~109.12 | ~1048.2 | ~9.6 |
| [MosaicML 7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct) | 7b-instruct | 4096 | ~22.93 | ~9.8 | ~110.47 | ~1045.2 | ~9.5 |
| [MosaicML 7b-instruct-8k](https://huggingface.co/mosaicml/mpt-7b-instruct-8k) | 7b-instruct-8k | 8192 | ~22.66 | ~10.5 | ~84.32 | ~1045.5 | ~12.4 |
| [Saiga 2 LoRa 7b](https://huggingface.co/IlyaGusev/saiga2_7b_lora) | 7b_lora | 2048 | ~7.9 | ~8.9 | ~13.34 | ~86.1 | ~6.5 |
| [Saiga 2 LoRa 13b](https://huggingface.co/IlyaGusev/saiga2_13b_lora) | 13b_lora | 2048 | ~14.25 | ~8.3 | ~35.36 | ~171.5 | ~4.9 |
| [ruGPT 3 small](https://huggingface.co/ai-forever/rugpt3small_based_on_gpt2) | 125m | 2048 | ~6.18 | ~1.3 | ~6.4 | ~1041.8 | ~162.7 |
| [ruGPT 3 medium](https://huggingface.co/ai-forever/rugpt3medium_based_on_gpt2) | 410m | 2048 | ~6.66 | ~2.6 | ~12.74 | ~1044.3 | ~82.0 |
| [ruGPT 3 large](https://huggingface.co/ai-forever/rugpt3large_based_on_gpt2) | 750m | 2048 | ~7.48 | ~5.2 | ~15.19 | ~1045.5 | ~68.8 |
| [ruGPT 3 xl](https://huggingface.co/ai-forever/rugpt3xl) | 1.3B | 2048 | ~13.76 | ~4.7 | ~13.38 | ~567.1 | ~42.4 |
| [ruGPT 3.5 13b](https://huggingface.co/ai-forever/ruGPT-3.5-13B) | 13b | 2048 | | | | | |
| [ruGPT-3.5 13b (load_in_8bit)](https://huggingface.co/pe4enov/ruGPT-3.5-13B-8bit) | 13b-8bit | 2048 | ~14.41 | ~11.2 | ~187.96 | ~1043.7 | ~5.6 |
| [ruGPT-3.5 13b-8bit](https://huggingface.co/Gaivoronsky/ruGPT-3.5-13B-8bit) | 13b-8bit (q8) | 2048 | ~14.96 | ~25.4 | ~666.14 | ~1042.1 | ~1.5 |
| [ruGPT-3.5 13b-fp16](https://huggingface.co/Gaivoronsky/ruGPT-3.5-13B-fp16) | 13b-fp16 | 2048 | | ~57.9 | | | |
| [mGPT](https://huggingface.co/ai-forever/mGPT) | 1.3b | 2048 | ~22.96 | ~7.01 | ~24.72 | ~1046.8 | ~42.3 |
| [mGPT 13b](https://huggingface.co/ai-forever/mGPT-13B) | 13b | 2048 | | | | | |
| [mGPT 13b (load_in_8bit)](https://huggingface.co/ai-forever/mGPT-13B) | 13b-8bit | 2048 | ~20.06 | ~12.5 | ~155.81 | ~1042.5 | ~6.7 |
| [Qwen VL](https://huggingface.co/Qwen/Qwen-VL) | 7B | 8192 | ~22.6 | ~5.6 | ~93.62 | ~1034.7 | ~11.1 |
| [Qwen VL Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) | 7B-chat | 8192 | ~22.6 | ~5.3 | ~95.21 | ~1037.7 | ~10.9 |
| [Qwen 7B](https://huggingface.co/Qwen/Qwen-7B) | 7B | 8192 | ~17.32 | ~3.6 | ~89.84 | ~1037.6 | ~11.5 |
| [Qwen 7B Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) | 7B-chat | 8192 | ~18.22 | ~3.3 | ~92.76 | ~944.8 | ~10.2 |
| [Qwen 7B Chat q4](https://huggingface.co/Qwen/Qwen-7B-Chat-Int4) | 7B-chat-int4 | 8192 | ~7.38 | ~9.26 | ~77.22 | ~930.8 | ~12.1 |

* **Name** - The model's name, linked to its model card or documentation.
* **Size** - The number of parameters, typically in millions (m) or billions (b), plus any variant suffix.
* **Context** - The maximum number of tokens the model can consider from previous inputs in a conversation or text
  sequence.
* **MAX VRAM (GB)** - The peak video RAM (in gigabytes) required to run the model.
* **MAX Init RAM (GB)** - The peak system RAM (in gigabytes) used during the model's initialization.
* **AVG GenTime (s)** - The average time (in seconds) the model takes to generate a response.
* **AVG Tokens** - The average number of tokens generated per response.
* **AVG t/s** - The average number of tokens generated by the model per second.
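For reference, metrics like these can be collected with a loop along the following lines. This is a minimal sketch
using standard `torch` and `transformers` APIs, not the exact measurement code from this repo's scripts:

```python
import time
import torch

def benchmark(model, tokenizer, prompts):
    """Measure average generation time, token count, t/s, and peak VRAM (a sketch)."""
    torch.cuda.reset_peak_memory_stats()
    times, new_tokens = [], []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        start = time.perf_counter()
        outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True,
                                 top_k=20, top_p=0.9, repetition_penalty=1.1,
                                 use_cache=False)
        times.append(time.perf_counter() - start)
        # count only newly generated tokens, excluding the prompt
        new_tokens.append(outputs.shape[-1] - inputs["input_ids"].shape[-1])
    avg_time = sum(times) / len(times)              # AVG GenTime (s)
    avg_tokens = sum(new_tokens) / len(new_tokens)  # AVG Tokens
    return {
        "avg_gen_time_s": avg_time,
        "avg_tokens": avg_tokens,
        "avg_t_s": avg_tokens / avg_time,           # AVG t/s
        "max_vram_gb": torch.cuda.max_memory_allocated() / 1024**3,  # MAX VRAM (GB)
    }
```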
## Scripts

* **llama.py** - A script to test LLaMA and LLaMA 2 models and models based on them.
* **mpt.py** - A script to test MosaicML models.
* **rugpt.py** - A script to test ruGPT3small, ruGPT3medium, ruGPT3large, and mGPT.
* **rugpt3xl.py** - A script to test ruGPT3XL only.
    * Dockerfile - A Dockerfile to run rugpt3xl.py in a container.
    * docker-compose.yml - A docker-compose file to run rugpt3xl.py in a container.
    * requirements-xl.txt - A list of Python packages required to run rugpt3xl.py in a container.

## Links

* My Telegram channel: https://t.me/evilfreelancer
* Salute AI Community Telegram channel: https://t.me/SaluteTechGroup