https://github.com/zjunlp/easyeval
An Easy-to-use Intelligence Evaluation Framework for LLMs.
- Host: GitHub
- URL: https://github.com/zjunlp/easyeval
- Owner: zjunlp
- License: mit
- Created: 2023-09-01T08:50:14.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-16T08:27:35.000Z (over 2 years ago)
- Last Synced: 2025-07-11T00:04:05.249Z (7 months ago)
- Language: Python
- Size: 162 KB
- Stars: 6
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# EasyEval
---
## 🔧Installation
**Installation for local development:**
```bash
git clone https://github.com/zjunlp/EasyEval
cd EasyEval
pip install -e .
```
**Installation using PyPI:**
```bash
pip install easyeval-tool
```
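To quickly confirm the install worked, you can try importing the evaluator class used in the examples below (this assumes the PyPI package exposes the same `EasyEval.eval` module as the repository checkout):
```python
# Quick post-install check: import the evaluator class used later in this README
from EasyEval.eval import FairEval

print(FairEval)
```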
---
## 📌Use EasyEval
### FairEval
> `FairEval` implements two simple yet effective strategies, Multiple Evidence Calibration (MEC) and Balanced Position Calibration (BPC), for calibrating the positional bias of LLM evaluators.
> Refer to the paper: [Large Language Models are not Fair Evaluators](https://arxiv.org/pdf/2305.17926v1.pdf).
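As a rough mental model only (this is not EasyEval's implementation, and `judge` is a hypothetical function), BPC scores the answer pair in both presentation orders and MEC averages over several sampled judgments:
```python
# Illustrative sketch of the two calibration ideas; `judge(order, sample_idx)` is a
# hypothetical LLM-judge call returning (score_answer_1, score_answer_2) for one
# ordering of the two answers and one sampled evaluation.

def calibrated_scores(judge, k=3):
    total_1, total_2, n = 0.0, 0.0, 0
    for order in ("answer1_first", "answer2_first"):   # Balanced Position Calibration
        for sample_idx in range(k):                    # Multiple Evidence Calibration
            s1, s2 = judge(order, sample_idx)
            total_1 += s1
            total_2 += s2
            n += 1
    # Averaging over both orderings and k samples reduces positional bias
    return total_1 / n, total_2 / n
```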
**Example**
**Step 1: Provide the question JSON file for evaluation.** Each line is a standalone JSON object. Here is an example of the data:
```json
{"question_id": 1, "text": "How can I improve my time management skills?"}
{"question_id": 2, "text": "What are the most effective ways to deal with stress?"}
```
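If you build this file programmatically, the line-delimited format above can be written with the standard library (the file name is a placeholder):
```python
import json

questions = [
    {"question_id": 1, "text": "How can I improve my time management skills?"},
    {"question_id": 2, "text": "What are the most effective ways to deal with stress?"},
]

# Write one JSON object per line, matching the format shown above
with open("questions.jsonl", "w", encoding="utf-8") as f:
    for question in questions:
        f.write(json.dumps(question) + "\n")
```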
**Step 2: Provide the answer JSON files for evaluation.** Note that each `question_id` must be consistent with the question file. Here is an example of the data:
```json
{"question_id": 1, "text": "Here are some tips to improve your time management skills:\n\n1. Create a schedule: Make a to-do list for the day ..."}
{"question_id": 2, "text": "Here are some effective ways to deal with stress:\n\n1. Exercise regularly: Physical activity can help reduce stress and improve mood ..."}
```
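Because the IDs must line up, a quick sanity check before running the evaluation can catch mismatches early; the file names below are placeholders:
```python
import json

def load_question_ids(path):
    # Collect question_id values from a line-delimited JSON file
    with open(path, encoding="utf-8") as f:
        return {json.loads(line)["question_id"] for line in f if line.strip()}

question_ids = load_question_ids("questions.jsonl")
for answer_file in ("answers_model_a.jsonl", "answers_model_b.jsonl"):
    missing = question_ids - load_question_ids(answer_file)
    if missing:
        print(f"{answer_file} is missing question_ids: {sorted(missing)}")
```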
**Step 3: Evaluation**
```python
from EasyEval.eval import FairEval
# Instantiate the evaluator with your question and answer files
evaluator = FairEval(
    answer_file_list=["YOUR-ANSWER-FILE1", "YOUR-ANSWER-FILE2"],
    question_file="YOUR-QUESTION-FILE",
    output="YOUR-OUTPUT-FILE",
    api_key="YOUR-KEY",
    eval_model="gpt-4",
    bpc=1,
    k=3,
)

# Get the result from the LLM API service
evaluator.fair_eval()
```
---
## 🎉Contributors