https://github.com/liyucheng09/llm-compressive
Longitudinal Evaluation of LLMs via Data Compression
- Host: GitHub
- URL: https://github.com/liyucheng09/llm-compressive
- Owner: liyucheng09
- Created: 2023-12-26T21:37:15.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-29T02:33:30.000Z (7 months ago)
- Last Synced: 2024-05-29T16:06:03.838Z (7 months ago)
- Topics: benchmark, evaluation, llm, llms, nlp
- Language: Python
- Homepage: https://liyucheng09.github.io/llm-compressive/
- Size: 667 KB
- Stars: 19
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: readme.md
# LLM-Compressive: Longitudinal Evaluation of LLMs via Data Compression
Compression is believed to be the key feature of intelligence. LLM-Compressive lets you evaluate Large Language Models (LLMs) for generalization and robustness via **data compression**.
LLM-Compressive tests LLMs with data compression along a timeline to understand how well they generalize over time.
For example, it tests open-source LLMs on Wikipedia across 83 months from 2017 to 2023.
**Mistral** and **Baichuan2** show steady performance across all time periods, indicating promising generalization over time. In contrast, the other models show curves that worsen linearly.
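
As a rough illustration of the idea, the sketch below turns a model's per-token log-probabilities into a compression rate, assuming an ideal entropy coder whose code length equals the negative log2-likelihood of the text. The function name `compression_rate` and the toy numbers are illustrative assumptions, not code or results from this repository.

```python
import math

def compression_rate(token_logprobs, raw_num_bytes):
    """Estimate compressed size / raw size for one piece of text.

    token_logprobs: natural-log probabilities the model assigned to each
        token of the text (hypothetical input, e.g. from a causal LM).
    raw_num_bytes: size of the original text in bytes (e.g. UTF-8 length).
    """
    if raw_num_bytes <= 0:
        raise ValueError("raw_num_bytes must be positive")
    # An ideal entropy coder spends about -log2 p(token) bits per token.
    compressed_bits = -sum(lp / math.log(2) for lp in token_logprobs)
    compressed_bytes = compressed_bits / 8
    return compressed_bytes / raw_num_bytes

# Toy example (assumed numbers): 4 tokens, each predicted with
# probability 0.5, covering 16 bytes of raw text.
logprobs = [math.log(0.5)] * 4
rate = compression_rate(logprobs, raw_num_bytes=16)
print(rate)
```

Under this definition a lower rate means the model predicts the text better, so a rate that rises on newer data points to weaker generalization over time.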
More results on coding, arxiv, news, image, and audio in the paper: [Evaluating Large Language Models for Generalization and Robustness via Data Compression](https://arxiv.org/pdf/2402.00861.pdf).

**Updates**:
- 27 Feb 2024, try the interactive leaderboard at [LLM-Compressive](https://liyucheng09.github.io/llm-compressive/).

# Getting Started
0. Clone and install requirements.
```
git clone https://github.com/liyucheng09/llm-compressive.git
cd llm-compressive
pip install -r requirements.txt
```
1. Run the main test script.
```
python main.py
```
- `model_name`: the name of the model on the HF Hub. See the supported [models](#models).
- `dataset_name`: the name of the dataset. Choose from `wikitext`, `math`, `bbc_news`, `code`, `arxiv`, `audio`, `bbc_image`.
- `save_path`: the path to save the results.
- `context_size`: the context size used for compression. Choose from `2048`, `4096`, `8192`, `max_length`, or `stride`.
- `batch_size`: the batch size. This depends on the model scale and your GPU memory.

**Attention!!** If you need to use a Hugging Face mirror (that is, you have problems accessing huggingface.co directly), add `HF_ENDPOINT=https://hf-mirror.com` to your environment variables.
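
If you would rather set the mirror from Python than from the shell, one option is a small launcher helper like the sketch below. The helper name `with_hf_mirror` is hypothetical; only the `HF_ENDPOINT` variable and the mirror URL come from the note above.

```python
import os

HF_MIRROR = "https://hf-mirror.com"  # mirror URL from the note above

def with_hf_mirror(env=None):
    """Return a copy of the environment with HF_ENDPOINT set to the mirror.

    env: optional mapping to start from; defaults to the current process
        environment. The input mapping is not modified.
    """
    new_env = dict(os.environ if env is None else env)
    new_env["HF_ENDPOINT"] = HF_MIRROR
    return new_env

# Example: build the environment for a child process.
child_env = with_hf_mirror({"PATH": "/usr/bin"})
print(child_env["HF_ENDPOINT"])
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching `main.py`, which leaves the environment of the parent process untouched.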
2. Aggregate the results.
```
python results/aggregate_all_results.py
```
- `save_path`: the path you saved the results in.
3. Visualize the results.
```
python visualise/timeline_vis.py
```
This will generate a figure visualizing the trend of models' compression rate over time.
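
To put a number on such a trend, one simple option is to fit a straight line to a model's monthly compression rates. Assuming a lower rate is better, a slope near zero suggests stable performance, while a positive slope means compression gets worse on newer data. This is a minimal sketch with made-up numbers; `trend_slope` is a hypothetical helper and is not part of `visualise/timeline_vis.py`.

```python
def trend_slope(rates):
    """Least-squares slope of compression rate versus month index.

    rates: compression rates ordered by time, one value per month.
    """
    n = len(rates)
    if n < 2:
        raise ValueError("need at least two time points")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(rates) / n
    # Slope = covariance(x, y) / variance(x) for an ordinary least-squares fit.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rates))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Made-up example data, not results from the paper.
stable = [0.080, 0.081, 0.080, 0.079]
worsening = [0.080, 0.082, 0.084, 0.086]
print(trend_slope(stable), trend_slope(worsening))
```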
```
python visualise/big_table.py
```
This will 1) generate the big table in the paper; 2) generate a figure showing the performance-robustness trade-off of models.
See the explanation of this figure in the [paper](https://arxiv.org/pdf/2402.00861.pdf).
# Models
We have tested the following models:
- codellama/CodeLlama-7b-hf
- baichuan-inc/Baichuan2-7B-Base
- mistralai/Mistral-7B-v0.1
- huggyllama/llama-7b
- huggyllama/llama-13b
- huggyllama/llama-65b
- meta-llama/Llama-2-7b-hf
- meta-llama/Llama-2-13b-hf
- meta-llama/Llama-2-70b-hf
- Qwen/Qwen-7B
- internlm/internlm-7b
- THUDM/chatglm3-6b-base
- 01-ai/Yi-6B-200K
- 01-ai/Yi-34B-200K
- google/gemma-7b
- Qwen/Qwen1.5-7B

And any GPTQ version of the above models, such as:
- TheBloke/CodeLlama-70B-hf-GPTQ
- TheBloke/Llama-2-70B-GPTQ
- TheBloke/Yi-34B-200K-GPTQ
- ...

# Issues
Send me an email or open an issue if you have any questions.
# Citation
If you find this repo helpful, please consider citing our paper:
```
@article{Li2024EvaluatingLL,
title={Evaluating Large Language Models for Generalization and Robustness via Data Compression},
author={Yucheng Li and Yunhao Guo and Frank Guerin and Chenghua Lin},
year={2024},
journal={arXiv preprint arXiv:2402.00861}
}
```