An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-selection

A curated list of projects in awesome lists tagged with data-selection.

https://github.com/princeton-nlp/less

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

data data-selection influence instruction-tuning llama llm mistral

Last synced: 05 Apr 2025

https://github.com/alon-albalak/data-selection-survey

A Survey on Data Selection for Language Models

data-selection language-model llm survey

Last synced: 04 Mar 2026

https://github.com/reds-lab/projektor

Official repository for "Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources" (NeurIPS 2023).

data-selection performance-prediction projection scaling-law

Last synced: 05 Apr 2025

https://github.com/zincware/znnl

A Python package for studying neural learning

data-science data-selection machinelearning mathematics physics

Last synced: 09 Aug 2025

https://github.com/allo-media/cynical-selection

Allo-media data selection tool

data-selection language-model nlp

Last synced: 08 May 2025

https://github.com/4ai/generative_deduplication

Code for Generative Deduplication For Social Media Data Selection (Findings of EMNLP 2024)

data-selection deduplication emnlp2024 generative-deduplication nlp

Last synced: 23 Apr 2025

https://github.com/bessouat40/pdf-region-picker

A project for selecting only part of a PDF file. It is useful when you want to extract information with a Python library such as fitz (PyMuPDF).

data-extraction data-selection extract-data fitz javascript parsing pdf region-picker

Last synced: 06 Mar 2025

https://github.com/tigureis/data-preparation-from-kickstarter-campaigns

Kickstarter Data Prep: A hands-on guide to basic data cleaning and transformation.

data-cleaning data-construction data-integration data-science data-selection numpy pandas

Last synced: 19 Apr 2026