{"id":21423920,"url":"https://github.com/picovoice/llm-compression-benchmark","last_synced_at":"2025-07-14T08:31:40.717Z","repository":{"id":240600526,"uuid":"794648845","full_name":"Picovoice/llm-compression-benchmark","owner":"Picovoice","description":"LLM Compression Benchmark","archived":false,"fork":false,"pushed_at":"2025-07-08T17:39:01.000Z","size":14371,"stargazers_count":22,"open_issues_count":1,"forks_count":1,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-07-08T18:53:15.655Z","etag":null,"topics":["llm","llm-compression","llm-inference"],"latest_commit_sha":null,"homepage":"https://picovoice.ai/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Picovoice.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-05-01T16:58:09.000Z","updated_at":"2025-07-08T17:38:36.000Z","dependencies_parsed_at":"2025-04-29T17:46:38.739Z","dependency_job_id":null,"html_url":"https://github.com/Picovoice/llm-compression-benchmark","commit_stats":null,"previous_names":["picovoice/llm-compression-benchmark"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Picovoice/llm-compression-benchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Picovoice%2Fllm-compression-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Picovoice%2Fllm-compression-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Picovoice%2Fllm-compression-benchmark/releases","manifests_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Picovoice%2Fllm-compression-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Picovoice","download_url":"https://codeload.github.com/Picovoice/llm-compression-benchmark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Picovoice%2Fllm-compression-benchmark/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265262609,"owners_count":23736428,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","llm-compression","llm-inference"],"created_at":"2024-11-22T21:18:52.112Z","updated_at":"2025-07-14T08:31:35.705Z","avatar_url":"https://github.com/Picovoice.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LLM Compression Benchmark\n\nMade in Vancouver, Canada by [Picovoice](https://picovoice.ai)\n\nThis repository is a minimalist and extensible framework for benchmarking LLM compression algorithms.\n\n## Table of Contents\n\n- [Algorithms](#algorithms)\n    - [GPTQ](#gptq)\n    - [picoLLM Compression](#picollm-compression)\n- [Tasks](#tasks)\n    - [MMLU Score](#mmlu-score)\n    - [ARC Score](#arc-score)\n    - [Perplexity Loss](#perplexity-loss)\n- [Data](#data)\n    - [MMLU](#mmlu)\n    - [ARC](#arc)\n    - [Perplexity (C4)](#perplexity-c4)\n    - [Quantization (C4)](#quantization-c4)\n- [Models](#models)\n- [Usage](#usage)\n- [Results](#results)\n    - [MMLU](#mmlu-1)\n    - [ARC-Easy](#arc-easy)\n    - [ARC-Challenge](#arc-challenge)\n    - 
[Perplexity](#perplexity)\n\n## Algorithms\n\n### GPTQ\n\n[GPTQ](https://arxiv.org/abs/2210.17323) is arguably the most popular quantization algorithm for LLMs. GPTQ fully\nreconstructs weights so that the quantized version closely mimics the full-precision one.\n\n### picoLLM Compression\n\npicoLLM Compression is Picovoice's in-house LLM compression algorithm. Given a target size, picoLLM optimally\ndistributes available bits within and across an LLM's weights.\n\n## Tasks\n\n### MMLU Score\n\n[MMLU](https://huggingface.co/datasets/lukaemon/mmlu) (Massive Multitask Language Understanding) is a\nmultiple-choice dataset that measures the models' ability to understand natural language.\n\n### ARC Score\n\n[ARC](https://allenai.org/data/arc) (AI2 Reasoning Challenge) is a multiple-choice dataset that measures\nthe models' reasoning ability. The ARC dataset has two partitions: `Easy` and `Challenge`. We perform the benchmark on\nboth partitions and report the results separately.\n\n### Perplexity Loss\n\nPerplexity measures the models' language modeling capabilities. Lower perplexity is better.\n\n## Data\n\nThe `/res` folder contains all required data for the benchmark. To reproduce it, follow the sections below.\n\n### MMLU\n\nDownload the [MMLU dataset](https://huggingface.co/datasets/lukaemon/mmlu) and run the following from the\nrepository's root to extract and format it:\n\n```console\npython3 data/mmlu.py --dataset-folder ${DATASET_FOLDER}\n```\n\n### ARC\n\nDownload the [ARC dataset](https://allenai.org/data/arc) and run the following from the repository's root to extract and\nformat the `Challenge` portion:\n\n```console\npython3 data/arc.py --dataset-folder ${DATASET_FOLDER}\n```\n\nRepeat the above for the `Easy` portion:\n\n```console\npython3 data/arc.py --dataset-folder ${DATASET_FOLDER} --easy\n```\n\n### Perplexity (C4)\n\nFor the perplexity measurement, we use 128 randomly selected text snippets from the validation portion of the\n[C4 dataset](https://huggingface.co/datasets/c4). 
Once you download the dataset, run the following from the root of the\nrepository to extract and normalize the data:\n\n```console\npython3 data/c4-normalize.py \\\n--repository-folder ${REPOSITORY_FOLDER} \\\n--normalized-folder ${VALIDATION_FOLDER} \\\n--portion validation\n```\n\nReplace `${REPOSITORY_FOLDER}` with the path to the downloaded dataset repository and `${VALIDATION_FOLDER}` with a\nfolder to hold the normalized data.\n\nThen sample 128 sequences from the normalized data:\n\n```console\npython3 data/c4-sample.py \\\n--dataset-folder ${VALIDATION_FOLDER} \\\n--portion valid\n```\n\n### Quantization (C4)\n\nWe need a sample dataset for quantization algorithms (GPTQ, picoLLM). We use 128 randomly selected text snippets from\nthe train portion of the [C4 dataset](https://huggingface.co/datasets/c4). Once you download the dataset, run the\nfollowing from the root of the repository to extract and normalize the data:\n\n```console\npython3 data/c4-normalize.py \\\n--repository-folder ${REPOSITORY_FOLDER} \\\n--normalized-folder ${TRAIN_FOLDER} \\\n--portion train\n```\n\nReplace `${REPOSITORY_FOLDER}` with the path to the downloaded dataset repository and `${TRAIN_FOLDER}` with a\nfolder to hold the normalized data.\n\nThen sample 128 sequences from the normalized data:\n\n```console\npython3 data/c4-sample.py \\\n--dataset-folder ${TRAIN_FOLDER} \\\n--portion train\n```\n\n## Models\n\nWe use six models:\n\n- `Gemma-2b`\n- `Gemma-7b`\n- `Llama-2-7b`\n- `Llama-3-8b`\n- `Mistral-7b-v0.1`\n- `Phi-2`\n\nThe corresponding picoLLM compressed models are on [Picovoice Console](https://console.picovoice.ai/). We create GPTQ\nmodels using the [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ) package. 
You can quantize the models by running the\nfollowing:\n\n```console\npython3 model/autogptq.py \\\n--model-uri ${MODEL_URI} \\\n--quantized-model-folder ${QUANTIZED_MODEL_FOLDER} \\\n--bits ${BITS}\n```\n\n## Usage\n\nTo measure the MMLU score for a given model, run the following:\n\n```console\npython3 mmlu.py \\\n--compression ${COMPRESSION} \\\n--model-uri ${MODEL_URI}\n```\n\nReplace `${COMPRESSION}` with the model's compression method: `NONE` for full-precision models, `GPTQ`, or `picoLLM`.\n\nTo measure the ARC score for a given model, run the following:\n\n```console\npython3 arc.py \\\n--compression ${COMPRESSION} \\\n--model-uri ${MODEL_URI}\n```\n\nReplace `${COMPRESSION}` with the model's compression method: `NONE` for full-precision models, `GPTQ`, or `picoLLM`.\n\nTo measure the perplexity for a given model, run the following:\n\n```console\npython3 perplexity.py \\\n--compression ${COMPRESSION} \\\n--model-uri ${MODEL_URI}\n```\n\nReplace `${COMPRESSION}` with the model's compression method: `NONE` for full-precision models, `GPTQ`, or `picoLLM`.\n\nWhen running picoLLM Compressed models, you must also provide your Picovoice AccessKey, which is available on\n[Picovoice Console](https://console.picovoice.ai/).\n\n```console\n... --picollm-access-key ${PICOLLM_ACCESS_KEY}\n```\n\n## Results\n\nBelow are our benchmark results comparing GPTQ against picoLLM for all [models](model). We perform 2-, 3-, and 4-bit\nquantization using GPTQ, then find the model size in GB and set that as the target size for picoLLM Compression. Hence,\nthe GPTQ and picoLLM models in each comparison have the same size in bytes. 
When performing GPTQ, we set the group size parameter to\n128, set the damp percent to 0.1 and enabled activation reordering.\n\n### MMLU\n\nThe table below depicts the MMLU score of the original models.\n\n\u003ctable\u003e\n\u003ctbody\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eModel\u003c/td\u003e\n    \u003ctd\u003eMMLU\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 5.0G\u003c/td\u003e\n    \u003ctd\u003e40.21\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 17.1G\u003c/td\u003e\n    \u003ctd\u003e64.48\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 16.1G\u003c/td\u003e\n    \u003ctd\u003e64.88\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 13.5G\u003c/td\u003e\n    \u003ctd\u003e46.38\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 15.0G\u003c/td\u003e\n    \u003ctd\u003e62.41\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 5.6G\u003c/td\u003e\n    \u003ctd\u003e56.04\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\nThe table below depicts the MMLU score of the quantized models.\n\n\u003ctable\u003e\n\u003ctbody\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eModel\u003c/td\u003e\n    \u003ctd\u003eGPTQ\u003c/td\u003e\n    \u003ctd\u003epicoLLM\u003c/td\u003e\n  \u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd\u003eGemma-2b 3.1G\u003c/td\u003e\n    \u003ctd\u003e39.07\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e41.12\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 2.9G\u003c/td\u003e\n    \u003ctd\u003e27.51\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e41.12\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 2.6G\u003c/td\u003e\n    \u003ctd\u003e24.93\u003c/td\u003e\n    
\u003ctd\u003e\u003cstrong\u003e41.12\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 7.2G\u003c/td\u003e\n    \u003ctd\u003e62.58\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e64.98\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 6.2G\u003c/td\u003e\n    \u003ctd\u003e53.30\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e64.57\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 5.2G\u003c/td\u003e\n    \u003ctd\u003e25.58\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e64.32\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 3.9G\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e45.26\u003c/strong\u003e\u003c/td\u003e\n    \u003ctd\u003e44.99\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 3.1G\u003c/td\u003e\n    \u003ctd\u003e40.40\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e40.68\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 2.3G\u003c/td\u003e\n    \u003ctd\u003e25.36\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e28.72\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 5.7G\u003c/td\u003e\n    \u003ctd\u003e63.09\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e64.96\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 4.9G\u003c/td\u003e\n    \u003ctd\u003e53.86\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e64.76\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 4.0G\u003c/td\u003e\n    \u003ctd\u003e25.05\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e61.26\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 4.2G\u003c/td\u003e\n    
\u003ctd\u003e\u003cstrong\u003e61.00\u003c/strong\u003e\u003c/td\u003e\n    \u003ctd\u003e59.19\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 3.3G\u003c/td\u003e\n    \u003ctd\u003e23.73\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e57.72\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 2.4G\u003c/td\u003e\n    \u003ctd\u003e25.70\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e43.53\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 1.8G\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e54.61\u003c/strong\u003e\u003c/td\u003e\n    \u003ctd\u003e54.11\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 1.5G\u003c/td\u003e\n    \u003ctd\u003e50.64\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e52.24\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 1.2G\u003c/td\u003e\n    \u003ctd\u003e26.05\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e48.86\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n### ARC Easy\n\nThe table below depicts the ARC Easy score of the original models.\n\n\u003ctable\u003e\n\u003ctbody\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eModel\u003c/td\u003e\n    \u003ctd\u003eARC Easy\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 5.0G\u003c/td\u003e\n    \u003ctd\u003e33.75\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 17.1G\u003c/td\u003e\n    \u003ctd\u003e75.51\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 13.5G\u003c/td\u003e\n    \u003ctd\u003e44.87\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 16.1G\u003c/td\u003e\n    \u003ctd\u003e75.80\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 
15.0G\u003c/td\u003e\n    \u003ctd\u003e80.56\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 5.6G\u003c/td\u003e\n    \u003ctd\u003e75.25\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\nThe table below depicts the ARC Easy score of the quantized models.\n\n\u003ctable\u003e\n\u003ctbody\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eModel\u003c/td\u003e\n    \u003ctd\u003eGPTQ\u003c/td\u003e\n    \u003ctd\u003epicoLLM\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 3.1G\u003c/td\u003e\n    \u003ctd\u003e30.39\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e34.39\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 2.9G\u003c/td\u003e\n    \u003ctd\u003e24.37\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e34.39\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 2.6G\u003c/td\u003e\n    \u003ctd\u003e23.82\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e34.39\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 7.2G\u003c/td\u003e\n    \u003ctd\u003e76.52\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e84.18\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 6.2G\u003c/td\u003e\n    \u003ctd\u003e44.28\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e84.51\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 5.2G\u003c/td\u003e\n    \u003ctd\u003e23.95\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e84.13\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 3.9G\u003c/td\u003e\n    \u003ctd\u003e39.23\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e41.96\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 3.1G\u003c/td\u003e\n    
\u003ctd\u003e32.95\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e33.96\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 2.3G\u003c/td\u003e\n    \u003ctd\u003e23.91\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e24.49\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 5.7G\u003c/td\u003e\n    \u003ctd\u003e72.85\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e78.83\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 4.9G\u003c/td\u003e\n    \u003ctd\u003e43.39\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e77.02\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 4.0G\u003c/td\u003e\n    \u003ctd\u003e24.71\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e71.76\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 4.2G\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e77.27\u003c/strong\u003e\u003c/td\u003e\n    \u003ctd\u003e73.95\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 3.3G\u003c/td\u003e\n    \u003ctd\u003e23.91\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e72.10\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 2.4G\u003c/td\u003e\n    \u003ctd\u003e24.92\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e46.46\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 1.8G\u003c/td\u003e\n    \u003ctd\u003e70.45\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e75.04\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 1.5G\u003c/td\u003e\n    \u003ctd\u003e56.61\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e70.66\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    
\u003ctd\u003ePhi-2 1.2G\u003c/td\u003e\n    \u003ctd\u003e22.10\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e62.42\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n### ARC Challenge\n\nThe table below depicts the ARC Challenge score of the original models.\n\n\u003ctable\u003e\n\u003ctbody\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eModel\u003c/td\u003e\n    \u003ctd\u003eARC Challenge\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 5.0G\u003c/td\u003e\n    \u003ctd\u003e30.38\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 17.1G\u003c/td\u003e\n    \u003ctd\u003e64.93\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 13.5G\u003c/td\u003e\n    \u003ctd\u003e37.03\u003c/td\u003e\n  \u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 16.1G\u003c/td\u003e\n    \u003ctd\u003e63.05\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 15.0G\u003c/td\u003e\n    \u003ctd\u003e67.49\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 5.6G\u003c/td\u003e\n    \u003ctd\u003e61.60\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\nThe table below depicts the ARC Challenge score of the quantized models.\n\n\u003ctable\u003e\n\u003ctbody\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eModel\u003c/td\u003e\n    \u003ctd\u003eGPTQ\u003c/td\u003e\n    \u003ctd\u003epicoLLM\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 3.1G\u003c/td\u003e\n    \u003ctd\u003e26.37\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e30.97\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 2.9G\u003c/td\u003e\n    \u003ctd\u003e23.55\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e30.97\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    
\u003ctd\u003eGemma-2b 2.6G\u003c/td\u003e\n    \u003ctd\u003e24.83\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e30.97\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 7.2G\u003c/td\u003e\n    \u003ctd\u003e66.30\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e72.35\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 6.2G\u003c/td\u003e\n    \u003ctd\u003e33.62\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e72.35\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 5.2G\u003c/td\u003e\n    \u003ctd\u003e24.06\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e72.61\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 3.9G\u003c/td\u003e\n    \u003ctd\u003e32.42\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e34.30\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 3.1G\u003c/td\u003e\n    \u003ctd\u003e27.56\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e28.24\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 2.3G\u003c/td\u003e\n    \u003ctd\u003e21.16\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e23.63\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 5.7G\u003c/td\u003e\n    \u003ctd\u003e60.24\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e64.33\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 4.9G\u003c/td\u003e\n    \u003ctd\u003e36.18\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e63.48\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 4.0G\u003c/td\u003e\n    \u003ctd\u003e23.29\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e57.85\u003c/strong\u003e\u003c/td\u003e\n  
\u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 4.2G\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e64.42\u003c/strong\u003e\u003c/td\u003e\n    \u003ctd\u003e60.49\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 3.3G\u003c/td\u003e\n    \u003ctd\u003e24.06\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e59.04\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 2.4G\u003c/td\u003e\n    \u003ctd\u003e23.21\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e37.80\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 1.8G\u003c/td\u003e\n    \u003ctd\u003e57.42\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e62.46\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 1.5G\u003c/td\u003e\n    \u003ctd\u003e44.97\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e57.51\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 1.2G\u003c/td\u003e\n    \u003ctd\u003e24.49\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e47.87\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n### Perplexity\n\nThe table below depicts the perplexity of the original models.\n\n\u003ctable\u003e\n\u003ctbody\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eModel\u003c/td\u003e\n    \u003ctd\u003ePerplexity\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 5.0G\u003c/td\u003e\n    \u003ctd\u003e16.79\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 17.1G\u003c/td\u003e\n    \u003ctd\u003e14.67\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 13.5G\u003c/td\u003e\n    \u003ctd\u003e8.40\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 16.1G\u003c/td\u003e\n    
\u003ctd\u003e11.61\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 15.0G\u003c/td\u003e\n    \u003ctd\u003e10.50\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 5.6G\u003c/td\u003e\n    \u003ctd\u003e17.38\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\nThe table below depicts the perplexity of the quantized models.\n\n\u003ctable\u003e\n\u003ctbody\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eModel\u003c/td\u003e\n    \u003ctd\u003eGPTQ\u003c/td\u003e\n    \u003ctd\u003epicoLLM\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 3.1G\u003c/td\u003e\n    \u003ctd\u003e17.85\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e16.86\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 2.9G\u003c/td\u003e\n    \u003ctd\u003e24.11\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e16.86\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-2b 2.6G\u003c/td\u003e\n    \u003ctd\u003e8377.74\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e16.86\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 7.2G\u003c/td\u003e\n    \u003ctd\u003e15.47\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e14.82\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 6.2G\u003c/td\u003e\n    \u003ctd\u003e27.29\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e14.84\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eGemma-7b 5.2G\u003c/td\u003e\n    \u003ctd\u003e33370970.40\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e15.08\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 3.9G\u003c/td\u003e\n    \u003ctd\u003e8.59\u003c/td\u003e\n    
\u003ctd\u003e\u003cstrong\u003e8.50\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 3.1G\u003c/td\u003e\n    \u003ctd\u003e9.66\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e8.86\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-2-7b 2.3G\u003c/td\u003e\n    \u003ctd\u003e67.43\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e10.87\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 5.7G\u003c/td\u003e\n    \u003ctd\u003e12.31\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e11.73\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 4.9G\u003c/td\u003e\n    \u003ctd\u003e17.47\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e11.90\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLlama-3-8b 4.0G\u003c/td\u003e\n    \u003ctd\u003e712.70\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e12.67\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 4.2G\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e10.43\u003c/strong\u003e\u003c/td\u003e\n    \u003ctd\u003e10.62\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 3.3G\u003c/td\u003e\n    \u003ctd\u003e2909.83\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e10.81\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMistral-7b-v0.1 2.4G\u003c/td\u003e\n    \u003ctd\u003e1176.43\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e14.87\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 1.8G\u003c/td\u003e\n    \u003ctd\u003e18.15\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e17.76\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 
1.5G\u003c/td\u003e\n    \u003ctd\u003e19.94\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e18.14\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePhi-2 1.2G\u003c/td\u003e\n    \u003ctd\u003e76.55\u003c/td\u003e\n    \u003ctd\u003e\u003cstrong\u003e20.22\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpicovoice%2Fllm-compression-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpicovoice%2Fllm-compression-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpicovoice%2Fllm-compression-benchmark/lists"}