awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
https://github.com/lyy1994/awesome-data-contamination
🤝 Acknowledgement
📜 Papers
🎯 The List
- [paper
- [paper
- [paper
- [paper - crfm/helm)] [[website](https://crfm.stanford.edu/helm/classic/latest/)]
- [blog - arena)]
- [paper
- [paper
- [paper
- [paper
- [paper - memorization)]
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper - NLP/DARG)]
- [paper
- [paper
- [paper - sys/llm-decontaminator)]
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper - Lab/CLEVA)] [[website](http://www.lavicleva.com/)]
- [paper - travel-in-llms)]
- [paper
- [paper
- [paper
- [paper
- [paper - pretrain-code)] [[dataset](https://huggingface.co/datasets/swj0419/WikiMIA)] [[website](https://swj0419.github.io/detect-pretrain.github.io/)]
- [paper - lab/test_set_contamination)]
- [paper
- [paper
- [paper - zentroa/lm-contamination)] [[website](https://hitz-zentroa.github.io/lm-contamination/)]
- [paper
- [paper
- [paper
- [paper - Tabular-Memorization-Checker)]
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper - llm.github.io/)]
- [paper
- [paper - Evolving-Benchmark)]
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper - eval/evoeval)]
- [paper
- [paper
- [paper - plus-plus/)] [[code](https://github.com/zjysteven/mink-plus-plus)]
- [paper
- [paper - NLP/benbench)] [[website](https://gair-nlp.github.io/benbench/)]
- [paper
- [paper - KEG/DICE)]
- [paper
- [paper - COP)] [[data1](https://huggingface.co/datasets/avduarte333/BookTection)] [[data2](https://huggingface.co/datasets/avduarte333/arXivTection)]
- [paper
🧰 Resources
🛠️ Tools
- Language Model Evaluation Harness - evaluation-harness/blob/main/docs/decontamination.md)]
- LLM Decontaminator
📊 Datasets