https://github.com/amazon-science/idioms-incontext-mt
idioms in context dataset
idiomatic-expressions llm-evaluation machine-translation
- Host: GitHub
- URL: https://github.com/amazon-science/idioms-incontext-mt
- Owner: amazon-science
- License: other
- Created: 2024-08-06T06:37:38.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-08-06T12:29:18.000Z (10 months ago)
- Last Synced: 2025-01-12T01:33:36.351Z (5 months ago)
- Topics: idiomatic-expressions, llm-evaluation, machine-translation
- Homepage:
- Size: 225 KB
- Stars: 2
- Watchers: 9
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
## Idioms in Context Dataset
This repository contains the "Idioms in Context" dataset used in our ACL 2024 paper: [The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities](https://arxiv.org/abs/2405.20089).
### Description
The dataset consists of idiomatic expressions in context and their human-written translations. It covers 2 language pairs (English-German and English-Russian) with 3 translation directions:
1. English → German
2. German → English
3. Russian → English

The dataset is designed to evaluate the performance of large language models and machine translation systems in handling idiomatic expressions, which can be challenging due to their non-literal meanings.
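
The three translation directions above can be written down as (source, target) pairs. The following is a minimal sketch only: the ISO 639-1 codes (`en`, `de`, `ru`) and the names `DIRECTIONS` and `language_pairs` are assumptions made for illustration and are not taken from the repository, whose file layout is not described here.

```python
# Hypothetical representation of the directions listed in this README.
# The language codes are assumed ISO 639-1 codes, not names from the dataset files.
DIRECTIONS = [
    ("en", "de"),  # English -> German
    ("de", "en"),  # German -> English
    ("ru", "en"),  # Russian -> English
]


def language_pairs(directions):
    """Collapse directed (source, target) tuples into unordered language pairs."""
    return {frozenset(direction) for direction in directions}


pairs = language_pairs(DIRECTIONS)
print(f"{len(DIRECTIONS)} directions over {len(pairs)} language pairs")

# English -> Russian is not among the listed directions.
print(("en", "ru") in DIRECTIONS)
```

Treating a pair as an unordered set makes the README's counts explicit: English-German contributes two directions, while English-Russian contributes only Russian → English.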
### Usage
If you use this dataset in your work, please cite our paper:
```
@misc{stap2024-idioms,
title={The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities},
author={David Stap and Eva Hasler and Bill Byrne and Christof Monz and Ke Tran},
year={2024},
eprint={2405.20089},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2405.20089},
}
```
## Security
See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
## License
This dataset is licensed under the CC-BY-NC-4.0 License.