https://github.com/da03/epanadiplosis_benchmark

Benchmarking the performance of various language models in generating epanadiplosis, i.e., generating sentences that start with and end with the same word.

nlp

Last synced: 11 months ago

Host: GitHub
URL: https://github.com/da03/epanadiplosis_benchmark
Owner: da03
Created: 2023-03-17T20:28:14.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-03-18T00:20:13.000Z (over 3 years ago)
Last Synced: 2025-03-17T19:11:40.622Z (over 1 year ago)
Topics: nlp
Language: Python
Homepage:
Size: 197 KB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Epanadiplosis Benchmark

![ChatGPT Demo](./epanadiplosis_chatgpt.png)
![GPT-4 Demo](./epanadiplosis_gpt4.png)

Benchmarking the performance of various language models in generating [epanadiplosis](https://en.wiktionary.org/wiki/epanadiplosis), i.e., generating sentences that start with and end with the same word. Below is an example from Philippians 4:4:

```
Rejoice in the Lord always: and again I say, Rejoice.
```

## Dependencies

Install the required dependencies with:

```
pip install -r requirements.txt
```

## Usage

First, make sure that your OpenAI API key is set as an environment variable `API_KEY`:

```
export API_KEY="my_api_key"
```

Then, run the script using the following command:

```
python main.py --api_key ${API_KEY} --num_generations 100
```

## Results

| Model | Success Rate↑ (%) | Repetitiveness↓ (%) |
|------------------|--------------------|----------------------|
| code-davinci-002 | 4 | 63 |
| text-davinci-002 | 18 | 89 |
| text-davinci-003 | 43 | 67 |
| gpt-3.5-turbo | 22 | 63 |
| gpt-4 | 92 | 98 |

Evaluation metrics:

* Success rate: The proportion of generated sentences that demonstrate epanadiplosis (i.e., using the same word as both the first and last word in the sentence).
* Repetitiveness: The proportion of first words (counting from the second generation) that have been used in previous generations. For example, if all generated sentences start with the same word "Dreams", then this number will be 100%.
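The two metrics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in `main.py`: the helper names (`first_and_last_word`, `is_epanadiplosis`, `success_rate`, `repetitiveness`) are hypothetical, and the choice to compare words case-insensitively after stripping surrounding ASCII punctuation is an assumption about how "same word" might reasonably be judged.

```python
import string


def first_and_last_word(sentence):
    """Return (first, last) word, lowercased and stripped of surrounding
    ASCII punctuation, or None if the sentence contains no words.
    Assumption: this normalization is illustrative, not the repo's own."""
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    words = [w for w in words if w]
    if not words:
        return None
    return words[0], words[-1]


def is_epanadiplosis(sentence):
    """True if the sentence starts and ends with the same word."""
    pair = first_and_last_word(sentence)
    return pair is not None and pair[0] == pair[1]


def success_rate(generations):
    """Percentage of generations that exhibit epanadiplosis."""
    if not generations:
        return 0.0
    hits = sum(is_epanadiplosis(g) for g in generations)
    return 100.0 * hits / len(generations)


def repetitiveness(generations):
    """Percentage of first words, counting from the second generation,
    that already appeared as the first word of an earlier generation."""
    seen = set()
    repeats = 0
    counted = 0
    for i, g in enumerate(generations):
        pair = first_and_last_word(g)
        first = pair[0] if pair else ""
        if i > 0:
            counted += 1
            if first in seen:
                repeats += 1
        seen.add(first)
    return 100.0 * repeats / counted if counted else 0.0
```

With this sketch, `is_epanadiplosis("Rejoice in the Lord always: and again I say, Rejoice.")` evaluates to `True`, because the trailing period is stripped and the comparison ignores case. Non-ASCII punctuation such as curly quotes is not handled here.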

The raw generations corresponding to the above results are included in [evaluation_results.json](./evaluation_results.json) as part of the repo. Note that each run will result in different generations since we cannot set the random seed using the OpenAI API.

## Prompt

The prompt used to obtain the results in this benchmark is:

```
Write a sentence with at least 10 words that begins with and ends with the same word:
```

The requirement of "at least 10 words" helps ensure that the generated sentences are of a reasonable length. Without this constraint, some models tend to produce short and ungrammatical phrases.

## Acknowledgements

This repository was inspired by Yao Fu and Litu Ou's [Chain-of-Thought Hub: Measuring LLMs' Reasoning Performance](https://github.com/FranxYao/chain-of-thought-hub).
