An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-contamination

A curated list of projects in awesome lists tagged with data-contamination.

https://github.com/mravanelli/pyspeechrev

This Python code efficiently adds reverberation to speech, starting from a dataset of close-talking speech signals and a collection of acoustic impulse responses.

convolution data-contamination distant-speech-recognition impulse-response speech-recognition speech-reverberation

Last synced: 27 Jul 2025
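
The description above rests on one operation: convolving a clean (close-talking) signal with an acoustic impulse response. The following is a minimal sketch of that idea only, not pyspeechrev's own code or API. The function name `reverberate` and the toy sample values are hypothetical, and the plain nested loop stands in for whatever faster method the project actually uses.

```python
def reverberate(dry, impulse_response):
    """Return the full linear convolution of a dry signal with an impulse response.

    Hypothetical helper for illustration; signals are plain lists of floats.
    """
    if not dry or not impulse_response:
        return []
    # Full convolution has len(dry) + len(impulse_response) - 1 samples.
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, sample in enumerate(dry):
        for j, tap in enumerate(impulse_response):
            # Each input sample is spread over time by every tap of the response.
            out[i + j] += sample * tap
    return out


# Toy data (assumed): a short dry signal and an impulse response with
# a direct path plus one quieter echo two samples later.
dry = [1.0, 0.5, 0.0, -0.5]
impulse_response = [1.0, 0.0, 0.25]
wet = reverberate(dry, impulse_response)
print(wet)
```

The direct loop costs time proportional to the product of the two lengths, so it only suits short signals. For real recordings, an FFT-based convolution is the usual choice.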

https://github.com/nlx-group/overlapy

A Python package for evaluating textual overlap (n-grams) between two bodies of text.

data-contamination nlp textual-analysis

Last synced: 14 Jan 2026
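
To make the idea of n-gram overlap concrete, here is a small sketch that measures what fraction of one text's word n-grams also appear in another. It does not use overlapy's API. The names `ngrams` and `ngram_overlap`, the whitespace tokenization, and the choice of n = 3 are all assumptions made for illustration.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(text_a, text_b, n=3):
    """Fraction of text_a's distinct n-grams that also occur in text_b.

    Hypothetical helper: lowercases and splits on whitespace only.
    """
    grams_a = ngrams(text_a.lower().split(), n)
    grams_b = ngrams(text_b.lower().split(), n)
    if not grams_a:
        # text_a is shorter than n tokens, so there is nothing to compare.
        return 0.0
    return len(grams_a & grams_b) / len(grams_a)


# Toy example (assumed data): one of three trigrams is shared.
score = ngram_overlap("the quick brown fox jumps", "a quick brown fox sleeps")
print(score)
```

The score is asymmetric by design: it asks how much of the first text is covered by the second, which matches the usual contamination question of how much of a benchmark appears in a training corpus.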

https://github.com/thu-keg/dice

DICE: Detecting In-distribution Data Contamination with LLM's Internal State

benchmark data-contamination fine-tuning-llm gsm8k llm sft

Last synced: 13 May 2025

https://github.com/auraoneai/contamination-audit

Local contamination checks for eval data overlap, hashes, and n-gram leakage.

ai-evaluation data-contamination evals leakage

Last synced: 28 May 2026
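
A hash-based check of the kind mentioned above can be sketched as follows: normalize each item, fingerprint it with SHA-256, and flag evaluation items whose fingerprint also appears in the training set. This is an illustrative sketch, not contamination-audit's implementation. The function names and the normalization rule (lowercasing and collapsing whitespace) are assumptions.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse runs of whitespace (assumed normalization rule)."""
    return " ".join(text.lower().split())


def fingerprint(text):
    """SHA-256 hex digest of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def find_exact_leaks(eval_items, training_items):
    """Return eval items whose normalized form also appears in the training data."""
    train_hashes = {fingerprint(item) for item in training_items}
    return [item for item in eval_items if fingerprint(item) in train_hashes]


# Toy data (assumed): the first eval item differs from a training item
# only in letter case and spacing.
eval_items = ["What is 2 + 2?", "Name the capital of France."]
training_items = ["what is  2 + 2?", "Unrelated text"]
leaks = find_exact_leaks(eval_items, training_items)
print(leaks)
```

Hashing only catches matches that are identical after normalization. Paraphrased or partially copied items need a fuzzier signal such as n-gram overlap.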