# quintd
Data-to-text generation with LLMs using ad-hoc datasets.

Project website: https://d2t-llm.github.io
---
**[NEW]** Our annotation framework is now available as an open-source library: https://github.com/kasnerz/factgenie/

---
## Datasets
Quintd is a tool for downloading structured data for five data-to-text generation tasks. Each task uses data from a distinct domain:
| Domain | Task | Task Description | Source | Format |
| ----------- | ------------- | ------------------------------------------------------- | -------------------------------------------- | -------- |
| Weather | `openweather` | Generating a weather forecast from weather data. | [OpenWeather](https://openweathermap.org) | JSON |
| Technology | `gsmarena` | Describing a product based on its attributes. | [GSMArena](https://www.gsmarena.com) | JSON |
| Sport | `ice_hockey` | Describing the outcome of an ice-hockey game. | [RapidAPI](https://rapidapi.com) | JSON |
| Health | `owid` | Generating a caption for a time series. | [OurWorldInData](https://ourworldindata.org) | CSV |
| World facts | `wikidata` | Describing entities and relations in a knowledge graph. | [Wikidata](https://wikidata.org) | Markdown |

We use the data to evaluate the semantic accuracy of data-to-text generation outputs.
As the data comes *without reference outputs*, the evaluation is done using referenceless metrics (checking whether the outputs are grounded in the input data).
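As a toy illustration of the referenceless idea (not the metric used in the paper), one can check whether the numbers mentioned in a generated text actually occur in the input data:

```python
import json
import re

def numbers_grounded(output_text: str, data: dict) -> bool:
    """Toy check: every number in the output must appear somewhere in the input data."""
    data_str = json.dumps(data)
    out_numbers = re.findall(r"\d+(?:\.\d+)?", output_text)
    return all(num in data_str for num in out_numbers)

# Hypothetical input/output pair for the ice_hockey domain
data = {"team_home": "HC Sparta", "score_home": 3, "score_away": 2}
print(numbers_grounded("HC Sparta won 3:2.", data))  # True
print(numbers_grounded("HC Sparta won 4:2.", data))  # False -> ungrounded
```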
## Preliminaries
The code is tested with Python 3.10. Make sure to install the required packages first:
```bash
pip install -r requirements.txt
```

## Project Overview
The code allows you to replicate the experiments described in [our paper](https://arxiv.org/abs/2401.10186):
> Zdeněk Kasner & Ondřej Dušek: Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024).

The experiments consist of three stages:
- **Collecting structured data** using Quintd.
- **Generating data-based reports** using LLMs.
- **Annotating outputs** using LLMs and human annotators.
## Collecting structured data
Quintd is a tool for downloading "ad hoc" test sets. The goal is to have evaluation data not included in the training data of the LLMs.
There are two ways that Quintd achieves that:
- downloading up-to-date data (for the weather forecasts, partially also for products and game results),
- selecting different examples based on the random seed.

The recommended approach is to **download a fresh test set for each set of experiments**.
### How-to
Here is how to generate a new dataset with Quintd:
```bash
python data/generate_dataset.py -d [DOMAIN] -n [EXAMPLES] -r [SEED] -o [OUT_DIR]
```

A basic setting that generates a small dataset (10 examples per domain) with the random seed 7331:
```bash
NUM_EXAMPLES=10
SEED=7331

python data/generate_dataset.py -n $NUM_EXAMPLES -r $SEED
```

In our experiments, we generated one such dataset, which we dubbed Quintd-1.
The dataset is available in [data/quintd-1/data](data/quintd-1/data).

The following code will try to replicate the data collection for Quintd-1 (up to some differences in API responses):
```bash
python data/generate_dataset.py --replicate
```

## Generating data-based reports
Data-to-text generation requires access to an LLM through an API. The code we provide here is tested with:
- [text-generation-webui](https://github.com/oobabooga/text-generation-webui) (commit [d01c68f](https://github.com/oobabooga/text-generation-webui/commit/d01c68f2a3084b35b0faca799814feb4a92b2287)) - used for open LLMs.
- [OpenAI API](https://platform.openai.com/docs/overview) (as of 13/07/2024) - used for GPT-3.5.

It should be fairly easy to connect the code with any other OpenAI-like API.
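For illustration, here is a minimal sketch of querying an OpenAI-compatible completions endpoint with `requests`; the endpoint path and payload fields are assumptions based on the OpenAI API, so adjust them to your server:

```python
import requests

# Hypothetical endpoint: an OpenAI-compatible server
# (e.g., text-generation-webui with its API enabled).
API_URL = "http://localhost:5000"

def generate(prompt: str) -> str:
    resp = requests.post(
        f"{API_URL}/v1/completions",
        json={"prompt": prompt, "max_tokens": 512, "temperature": 0.0},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]

print(generate("Describe the following weather data: ..."))
```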
### How-to
Here is how to generate outputs for a particular domain, model, and setup:
```bash
python model/generate.py -d [DOMAIN] -s [SETUP] -m [MODEL] -p [DATASET_PATH] -a [API_URL]
```

Example:
```bash
export TG_WEBUI_API_URL="http://tdll-8gpu1.ufal.hide.ms.mff.cuni.cz:5000"
python model/generate.py -d ice_hockey -s direct -m zephyr -p data/quintd-1 -a $TG_WEBUI_API_URL
```
You can also run the experiments with a single command:
```bash
python run_experiments.py
```

See the config file [model/setups/direct.yaml](model/setups/direct.yaml) for the model prompt and other hyperparameters.
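If you want to inspect the prompt and hyperparameters programmatically, a quick sketch (assuming the setup file is standard YAML):

```python
import yaml  # pip install pyyaml

with open("model/setups/direct.yaml") as f:
    setup = yaml.safe_load(f)

# Inspect the prompt template and generation hyperparameters
print(setup)
```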
The generated outputs for Quintd-1 are available in [data/quintd-1/outputs](data/quintd-1/outputs).
## Annotating outputs
We published our framework for annotating errors in LLM outputs as a standalone project:

**[factgenie](https://github.com/kasnerz/factgenie)**

We highly recommend using factgenie for your own experiments instead of the code in this repository.
### How-to
Here is how to replicate the LLM annotation process (without factgenie, using only the code in this repository).

To annotate outputs for a particular domain, model, and setup:
```bash
python evaluation/evaluate.py -d [DOMAIN] -s [SETUP] -m [MODEL] -p [DATASET_PATH]
```
Example:
```bash
python evaluation/evaluate.py -d ice_hockey -s direct -m zephyr -p data/quintd-1
```
You can also run the evaluation with a single command:
```bash
python run_gpt4_eval.py
```
The error annotations for Quintd-1 are available in [data/quintd-1/annotations](data/quintd-1/annotations).

Note that error annotation using GPT-4 requires access to the [OpenAI API](https://platform.openai.com/docs/api-reference).
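For reference, a minimal sketch of a GPT-4 call via the `openai` Python client (v1 interface); the prompt here is a placeholder, not the one used by the evaluation scripts:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You annotate factual errors in generated texts."},
        {"role": "user", "content": "Data: {...}\nOutput: {...}\nList the errors."},
    ],
)
print(response.choices[0].message.content)
```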
## Code structure overview
- [data](data/) - **Quintd** data collection framework.
- [generate_dataset.py](data/generate_dataset.py) - A script for generating a new dataset & replicating the collection of Quintd-1.
- [data/quintd-1](data/quintd-1) - Resources for the **Quintd-1** dataset.
- [annotations](data/quintd-1/annotations) - Error annotations (GPT-4 / human).
- [data](data/quintd-1/data) - Data inputs.
- [outputs](data/quintd-1/outputs) - Generated model outputs.
- [run_experiments.py](run_experiments.py) - A wrapper script for running **text generation**.
- [run_gpt4_eval.py](run_gpt4_eval.py) - A wrapper script for running **GPT-4 evaluation**.

## Cite us
If you build upon our work, please cite our ACL paper:
```
@inproceedings{kasner2024beyond,
title = "Beyond Traditional Benchmarks: Analyzing Behaviors of Open {LLM}s on Data-to-Text Generation",
author = "Kasner, Zden{\v{e}}k and
Du{\v{s}}ek, Ond{\v{r}}ej",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.acl-long.651",
pages = "12045--12072",
}
```

## Acknowledgements
This work was co-funded by the European Union (ERC, NG-NLG, 101039303).