https://github.com/zjunlp/easyeval
An Easy-to-use Intelligence Evaluation Framework for LLMs.
- Host: GitHub
- URL: https://github.com/zjunlp/easyeval
- Owner: zjunlp
- License: mit
- Created: 2023-09-01T08:50:14.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-16T08:27:35.000Z (over 2 years ago)
- Last Synced: 2025-07-11T00:04:05.249Z (7 months ago)
- Language: Python
- Size: 162 KB
- Stars: 6
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# EasyEval
---
## 🔧Installation
**Installation for local development:**
```bash
git clone https://github.com/zjunlp/EasyEval
cd EasyEval
pip install -e .
```
**Installation using PyPI:**
```bash
pip install easyeval-tool
```
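To quickly confirm the install worked, you can try importing the evaluator class used in the examples below (this assumes the PyPI package exposes the same `EasyEval.eval` module as the repository checkout):
```python
# Quick post-install check: import the evaluator class used later in this README
from EasyEval.eval import FairEval

print(FairEval)
```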
---
## 📌Use EasyEval
### FairEval
> `FairEval` implements two simple yet effective strategies, Multiple Evidence Calibration (MEC) and Balanced Position Calibration (BPC), for calibrating the positional bias of LLM evaluators.
> Refer to the paper: [Large Language Models are not Fair Evaluators](https://arxiv.org/pdf/2305.17926v1.pdf).
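As a rough mental model only (this is not EasyEval's implementation, and `judge` is a hypothetical function), BPC scores the answer pair in both presentation orders and MEC averages over several sampled judgments:
```python
# Illustrative sketch of the two calibration ideas; `judge(order, sample_idx)` is a
# hypothetical LLM-judge call returning (score_answer_1, score_answer_2) for one
# ordering of the two answers and one sampled evaluation.

def calibrated_scores(judge, k=3):
    total_1, total_2, n = 0.0, 0.0, 0
    for order in ("answer1_first", "answer2_first"):   # Balanced Position Calibration
        for sample_idx in range(k):                    # Multiple Evidence Calibration
            s1, s2 = judge(order, sample_idx)
            total_1 += s1
            total_2 += s2
            n += 1
    # Averaging over both orderings and k samples reduces positional bias
    return total_1 / n, total_2 / n
```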
**Example**
**Step 1: Provide the question JSON file for evaluation.** Each line is a standalone JSON object. Here is an example of the data:
```json
{"question_id": 1, "text": "How can I improve my time management skills?"}
{"question_id": 2, "text": "What are the most effective ways to deal with stress?"}
```
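If you build this file programmatically, the line-delimited format above can be written with the standard library (the file name is a placeholder):
```python
import json

questions = [
    {"question_id": 1, "text": "How can I improve my time management skills?"},
    {"question_id": 2, "text": "What are the most effective ways to deal with stress?"},
]

# Write one JSON object per line, matching the format shown above
with open("questions.jsonl", "w", encoding="utf-8") as f:
    for question in questions:
        f.write(json.dumps(question) + "\n")
```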
**Step 2: Provide the answer JSON files for evaluation.** Note that each `question_id` must be consistent with the question file. Here is an example of the data:
```json
{"question_id": 1, "text": "Here are some tips to improve your time management skills:\n\n1. Create a schedule: Make a to-do list for the day ..."}
{"question_id": 2, "text": "Here are some effective ways to deal with stress:\n\n1. Exercise regularly: Physical activity can help reduce stress and improve mood ..."}
```
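Because the IDs must line up, a quick sanity check before running the evaluation can catch mismatches early; the file names below are placeholders:
```python
import json

def load_question_ids(path):
    # Collect question_id values from a line-delimited JSON file
    with open(path, encoding="utf-8") as f:
        return {json.loads(line)["question_id"] for line in f if line.strip()}

question_ids = load_question_ids("questions.jsonl")
for answer_file in ("answers_model_a.jsonl", "answers_model_b.jsonl"):
    missing = question_ids - load_question_ids(answer_file)
    if missing:
        print(f"{answer_file} is missing question_ids: {sorted(missing)}")
```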
**Step 3: Evaluation**
```python
from EasyEval.eval import FairEval
# Instantiate the evaluator with your question and answer files
evaluator = FairEval(
    answer_file_list=["YOUR-ANSWER-FILE1", "YOUR-ANSWER-FILE2"],
    question_file="YOUR-QUESTION-FILE",
    output="YOUR-OUTPUT-FILE",
    api_key="YOUR-KEY",
    eval_model="gpt-4",
    bpc=1,
    k=3,
)

# Get the result from the LLM API service
evaluator.fair_eval()
```
---
## 🎉Contributors