Projects in Awesome Lists tagged with data-contamination
A curated list of projects in awesome lists tagged with data-contamination.
https://github.com/mravanelli/pyspeechrev
Python code that efficiently reverberates speech, starting from a dataset of close-talking speech signals and a collection of acoustic impulse responses.
convolution data-contamination distant-speech-recognition impulse-response speech-recognition speech-reverberation
Last synced: 27 Jul 2025
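For readers unfamiliar with the technique, reverberation of this kind is typically modeled as a convolution of the dry (close-talking) signal with a room impulse response. The sketch below is a minimal pure-Python illustration of that idea, not code from pyspeechrev; the function name `reverberate` and the toy sample values are assumptions made for the example.

```python
def reverberate(signal, impulse_response):
    """Full discrete convolution of a dry signal with an impulse response.

    Hypothetical helper for illustration only; a real pipeline would
    normally use an FFT-based convolution for speed.
    """
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample excites a scaled, delayed copy of the response.
            out[i + j] += s * h
    return out


# Toy data: a short "dry" signal and a response with one echo two samples later.
dry = [1.0, 0.5, 0.0, 0.0]
rir = [1.0, 0.0, 0.3]
wet = reverberate(dry, rir)
```

The output has `len(dry) + len(rir) - 1` samples, so the echo tail extends past the end of the original signal.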
https://github.com/nlx-group/overlapy
A Python package for evaluating textual overlap (n-grams) between two volumes of text.
data-contamination nlp textual-analysis
Last synced: 14 Jan 2026
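As a rough illustration of what n-gram overlap between two texts means, the sketch below computes the fraction of a candidate text's word n-grams that also occur in a reference text. It does not use or reproduce the overlapy API; the names `ngrams` and `overlap_ratio`, the whitespace tokenization, and the default `n=3` are all assumptions chosen for the example.

```python
def ngrams(text, n):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(candidate, reference, n=3):
    """Fraction of the candidate's n-grams that also appear in the reference."""
    cand = ngrams(candidate, n)
    if not cand:
        # Candidate is shorter than n tokens, so there is nothing to compare.
        return 0.0
    return len(cand & ngrams(reference, n)) / len(cand)


benchmark_item = "the quick brown fox jumps"
corpus_text = "a quick brown fox jumps high"
ratio = overlap_ratio(benchmark_item, corpus_text, n=3)
```

In this toy case two of the three candidate trigrams appear in the reference. A real contamination check would usually need a proper tokenizer and an index over a much larger corpus.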
https://github.com/thu-keg/dice
DICE: Detecting In-distribution Data Contamination with LLM's Internal State
benchmark data-contamination fine-tuning-llm gsm8k llm sft
Last synced: 13 May 2025
https://github.com/auraoneai/contamination-audit
Local contamination checks for eval data overlap, hashes, and n-gram leakage.
ai-evaluation data-contamination evals leakage
Last synced: 28 May 2026
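A hash-based overlap check of the kind mentioned above can be sketched as follows: normalize each item, hash it, and flag evaluation items whose hash also appears in the training set. This is a generic illustration, not the contamination-audit implementation; `fingerprint`, `exact_leaks`, and the normalization rule (lowercasing and collapsing whitespace) are assumptions.

```python
import hashlib


def fingerprint(text):
    """SHA-256 of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def exact_leaks(eval_items, train_items):
    """Return eval items whose normalized form also occurs in train_items."""
    train_hashes = {fingerprint(t) for t in train_items}
    return [item for item in eval_items if fingerprint(item) in train_hashes]


leaks = exact_leaks(
    ["What is 2+2?", "Name a prime."],
    ["what is  2+2?", "Unrelated training text"],
)
```

Hashing only catches exact matches after normalization, so it is usually paired with n-gram checks to find partial or paraphrased leakage.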