https://github.com/LiuYuWei/llm_model_evaluation

LLM Model Evaluation for tmmluplus datasets

Host: GitHub
URL: https://github.com/LiuYuWei/llm_model_evaluation
Owner: LiuYuWei
License: apache-2.0
Created: 2024-01-09T01:07:37.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-01-23T02:44:35.000Z (over 1 year ago)
Last Synced: 2024-01-23T08:58:09.814Z (over 1 year ago)
Language: Jupyter Notebook
Size: 80.1 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome_ai_agents - Llm_Model_Evaluation - LLM Model Evaluation for tmmluplus datasets (Building / Datasets)

README

# llm_model_evaluation

## Description

Python scripts for evaluating large language models (LLMs) on multiple-choice benchmark datasets.

## Supported Datasets

### I. mmlu dataset

- Introduction from Papers with Code:

[Papers with Code](https://paperswithcode.com/dataset/mmlu)

### II. tmmluplus dataset

- Introduction:

[Medium Article](https://medium.com/infuseai/tmmluplus-dataset-brief-introduction-ecfd00297838)

- Hugging Face dataset:

[Hugging Face Dataset](https://huggingface.co/datasets/ikala/tmmluplus)

## How to use it?

- Step 1: Download the model from Hugging Face.

The following commands download the Mistral-7B-v0.1 model as an example:

```bash
git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-v0.1
```

- Step 2: Arrange the dataset files from the tmmluplus data folder into the data_arrange folder.

- Step 3: Run the following command to generate the predictions:

```bash
python3 evaluation_hf_testing.py \
    --model ./models/llama2-7b-hf \
    --data_dir ./llm_evaluation_tmmluplus/data_arrange/ \
    --save_dir ./llm_evaluation_tmmluplus/results/
```

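- Step 4: Run the evaluation script to produce the output JSON file. The command below is written for a Google Colab cell; when running it from a regular shell, drop the leading `!` and adjust the `/content/` path to where the repository is cloned.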

```
!python /content/llm_model_evaluation/catogories_result_eval.py \
    --catogory "mmlu" \
    --model ./models/llama2-7b-hf \
    --save_dir "./results/results_llama2-7b-hf"
```
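Step 2 does not spell out the layout that `data_arrange` should have. The sketch below shows one way to do the arranging, under two assumptions that are not confirmed by this README: that the source files are named `<subject>_<split>.csv` with splits `dev`, `val`, and `test`, and that the target layout is one subfolder per split. The function name `arrange_by_split` is hypothetical and is not part of this repository; check the Colab notebooks for the layout the scripts actually expect.

```python
import shutil
from pathlib import Path

# Assumed split suffixes; not confirmed by the README.
SPLITS = ("dev", "val", "test")


def arrange_by_split(src_dir, dst_dir):
    """Copy <subject>_<split>.csv files from src_dir into dst_dir/<split>/.

    Hypothetical helper: the naming scheme and target layout are assumptions.
    Returns the list of copied file paths.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    copied = []
    for csv_path in sorted(src.glob("*.csv")):
        # Split on the last underscore so multi-word subjects stay intact,
        # e.g. "basic_medical_science_test" -> split "test".
        _subject, sep, split = csv_path.stem.rpartition("_")
        if not sep or split not in SPLITS:
            continue  # skip files that do not follow the assumed naming
        target_dir = dst / split
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / csv_path.name
        shutil.copy2(csv_path, target)
        copied.append(target)
    return copied
```

A call could look like `arrange_by_split("./tmmluplus/data", "./llm_evaluation_tmmluplus/data_arrange")`, where the source path is a placeholder for wherever the dataset was downloaded.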

## Example Google Colab notebooks

- mmlu dataset:

1. [Google Colab - mmlu](https://colab.research.google.com/github/LiuYuWei/llm_model_evaluation/blob/main/llm_evaluation_mmlu.ipynb)

2. [Google Colab - mmlu with the phi-2 model](https://colab.research.google.com/github/LiuYuWei/llm_model_evaluation/blob/main/llm_evaluation_mmlu_phi_2.ipynb) (this notebook can run on the Colab free tier)

- tmmluplus dataset:

1. [Google Colab - tmmluplus](https://colab.research.google.com/github/LiuYuWei/llm_model_evaluation/blob/main/llm_evaluation_tmmluplus.ipynb)

## Evaluation Result

- mmlu dataset:

| Model | Weighted Accuracy | STEM | humanities | social sciences | other | Inference Time (s) |
|  ----  |  ----  |  ----  |  ----  |  ----  |  ----  |  ----  |
| Mistral-7B-v0.1 | 0.6254094858282296 | 0.5251822398939695 | 0.5636556854410202 | 0.7357816054598635 | 0.703578038247995 | 15624.038010835648 |

- tmmluplus dataset:

| Model | Weighted Accuracy | STEM | humanities | social sciences | other | Inference Time (s) |
|  ----  |  ----  |  ----  |  ----  |  ----  |  ----  |  ----  |
| Mistral-7B-v0.1 | - | - | - | - | - | - |
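
The README does not define how the "Weighted Accuracy" column is computed. A common convention for MMLU-style benchmarks is to weight each subject's accuracy by its number of questions, which is the same as total correct answers divided by total questions. The sketch below illustrates that convention only; it is an assumption about the metric, and the function name `weighted_accuracy` is hypothetical rather than taken from this repository.

```python
def weighted_accuracy(per_subject):
    """Question-count-weighted mean accuracy.

    per_subject maps a subject name to (accuracy, n_questions).
    Assumed convention, not confirmed by the repository's scripts.
    """
    total = sum(n for _acc, n in per_subject.values())
    if total == 0:
        raise ValueError("no questions to average over")
    # Each subject contributes accuracy * n correct answers.
    return sum(acc * n for acc, n in per_subject.values()) / total


# Toy numbers for illustration only (not real benchmark results).
toy = {"subject_a": (0.5, 100), "subject_b": (0.75, 300)}
print(weighted_accuracy(toy))  # 0.6875
```

With these toy values the unweighted mean would be 0.625, so the larger subject pulls the weighted figure upward.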
