https://github.com/amazon-science/idioms-incontext-mt

## Idioms in Context Dataset

This repository contains the "Idioms in Context" dataset used in our ACL 2024 paper: [The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities](https://arxiv.org/abs/2405.20089).

### Description
The dataset consists of idiomatic expressions in context and their human-written translations. It covers 2 language pairs (English-German and English-Russian) with 3 translation directions:
1. English → German
2. German → English
3. Russian → English

The dataset is designed to evaluate how well large language models and machine translation systems handle idiomatic expressions, which are difficult to translate because their meaning is non-literal.
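
As a rough illustration of how such an evaluation could be organised, here is a minimal Python sketch that averages a segment-level score separately for each translation direction. It rests on several assumptions: the `Example` record layout, the `translate` callable, and the `exact_match` metric are all hypothetical and are not taken from this repository or the paper. The actual data files may be structured differently, and `exact_match` is only a stand-in for a proper translation quality metric.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Example:
    # Hypothetical record layout (an assumption, not the repository's schema).
    src_lang: str   # e.g. "en", "de", "ru"
    tgt_lang: str   # e.g. "de", "en"
    source: str     # source sentence containing the idiom in context
    reference: str  # human-written translation


def exact_match(hypothesis: str, reference: str) -> float:
    # Placeholder metric: 1.0 if the strings match after trimming, else 0.0.
    # Swap in a real translation quality metric for any serious evaluation.
    return float(hypothesis.strip() == reference.strip())


def score_by_direction(
    examples: Iterable[Example],
    translate: Callable[[str, str, str], str],
    metric: Callable[[str, str], float] = exact_match,
) -> Dict[str, float]:
    """Average a segment-level metric per direction, keyed like "en-de"."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for ex in examples:
        key = f"{ex.src_lang}-{ex.tgt_lang}"
        # `translate` is the system under test: (text, src_lang, tgt_lang) -> text.
        hypothesis = translate(ex.source, ex.src_lang, ex.tgt_lang)
        totals[key] += metric(hypothesis, ex.reference)
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}
```

Reporting one score per direction, rather than a single pooled average, keeps the results for the three directions from masking one another.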

### Usage
If you use this dataset in your work, please cite our paper:

```
@misc{stap2024-idioms,
  title={The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities},
  author={David Stap and Eva Hasler and Bill Byrne and Christof Monz and Ke Tran},
  year={2024},
  eprint={2405.20089},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2405.20089},
}
```

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This dataset is licensed under the CC-BY-NC-4.0 License.