Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/LiuYuWei/llm_model_evaluation
LLM Model Evaluation for tmmluplus datasets
- Host: GitHub
- URL: https://github.com/LiuYuWei/llm_model_evaluation
- Owner: LiuYuWei
- License: apache-2.0
- Created: 2024-01-09T01:07:37.000Z (almost 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-23T02:44:35.000Z (12 months ago)
- Last Synced: 2024-01-23T08:58:09.814Z (12 months ago)
- Language: Jupyter Notebook
- Size: 80.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome_ai_agents - Llm_Model_Evaluation - LLM Model Evaluation for tmmluplus datasets (Building / Datasets)
README
# llm_model_evaluation
## Description
Use a Python script to run LLM model evaluation.

## Support Dataset
### I. mmlu dataset
- Introduction from Papers with Code:
[Papers with Code](https://paperswithcode.com/dataset/mmlu)

### II. tmmluplus dataset
- Introduction:
[Medium Article](https://medium.com/infuseai/tmmluplus-dataset-brief-introduction-ecfd00297838)
- Hugging Face dataset:
[Hugging Face Dataset](https://huggingface.co/datasets/ikala/tmmluplus)

## How to use it?
- Step 1: Download the model from Hugging Face.
The following command is an example for the Mistral-7B-v0.1 model:
```bash
git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-v0.1
```

- Step 2: Arrange the dataset from the tmmluplus data folder into the data_arrange folder.
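The README does not spell out how this arrangement step works, so here is a rough sketch under assumed conventions: each subject CSV from the downloaded tmmluplus data folder is copied into a `data_arrange/test/` subfolder. The folder layout and the `arrange_dataset` helper are hypothetical, not taken from the repository.

```python
# Hypothetical sketch of Step 2: copy every subject CSV from the
# tmmluplus data folder into data_arrange/test/. The "test"
# subfolder name is an assumption about what the evaluation
# script expects.
import shutil
from pathlib import Path


def arrange_dataset(src_dir: str, dst_dir: str) -> list:
    """Copy every *.csv file from src_dir into dst_dir/test/ and
    return the copied file names in sorted order."""
    dst = Path(dst_dir) / "test"
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for csv_file in sorted(Path(src_dir).glob("*.csv")):
        shutil.copy(csv_file, dst / csv_file.name)
        copied.append(csv_file.name)
    return copied
```

In practice you would call something like `arrange_dataset("./tmmluplus/data", "./llm_evaluation_tmmluplus/data_arrange")` before running Step 3.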
- Step 3: Run the following command to generate predictions:
```bash
python3 evaluation_hf_testing.py \
--model ./models/llama2-7b-hf \
--data_dir ./llm_evaluation_tmmluplus/data_arrange/ \
--save_dir ./llm_evaluation_tmmluplus/results/
```

- Step 4: Run the evaluation script to produce the output JSON file:
```bash
python3 /content/llm_model_evaluation/catogories_result_eval.py \
--catogory "mmlu" \
--model ./models/llama2-7b-hf \
--save_dir "./results/results_llama2-7b-hf"
```

## Example Google Colab notebooks
- mmlu dataset:
1. [Google Colab - mmlu](https://colab.research.google.com/github/LiuYuWei/llm_model_evaluation/blob/main/llm_evaluation_mmlu.ipynb)
2. [Google Colab - mmlu in phi-2 model](https://colab.research.google.com/github/LiuYuWei/llm_model_evaluation/blob/main/llm_evaluation_mmlu_phi_2.ipynb) (this example also runs on the Colab free tier)
- tmmluplus dataset:
1. [Google Colab - tmmluplus](https://colab.research.google.com/github/LiuYuWei/llm_model_evaluation/blob/main/llm_evaluation_tmmluplus.ipynb)

## Evaluation Result
- mmlu dataset:
| Model | Weighted Accuracy | STEM | Humanities | Social Sciences | Other | Inference Time (s) |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Mistral-7B-v0.1 | 0.6254 | 0.5252 | 0.5637 | 0.7358 | 0.7036 | 15624.04 |

- tmmluplus dataset:
| Model | Weighted Accuracy | STEM | Humanities | Social Sciences | Other | Inference Time (s) |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Mistral-7B-v0.1 | - | - | - | - | - | - |
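The Weighted Accuracy column above can be read as overall accuracy weighted by the number of questions in each category. A minimal sketch of that computation follows; the `weighted_accuracy` helper and the per-category counts in the example are illustrative, not the real MMLU subject sizes.

```python
# Sketch of weighted accuracy across evaluation categories:
# total correct answers divided by total questions, which is
# equivalent to a per-category accuracy weighted by category size.
def weighted_accuracy(results: dict) -> float:
    """results maps category name -> (num_correct, num_questions)."""
    total_correct = sum(correct for correct, _ in results.values())
    total_questions = sum(n for _, n in results.values())
    return total_correct / total_questions


# Illustrative counts only -- not the actual MMLU subject sizes.
example = {
    "STEM": (420, 800),
    "humanities": (530, 940),
    "social sciences": (660, 900),
    "other": (630, 900),
}
print(weighted_accuracy(example))
```

Note that this weights every question equally, so large categories dominate the score, unlike a simple mean of the four category accuracies.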