Projects in Awesome Lists tagged with data-selection
A curated list of projects in awesome lists tagged with data-selection.
https://github.com/princeton-nlp/less
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
data data-selection influence instruction-tuning llama llm mistral
Last synced: 05 Apr 2025
https://github.com/alon-albalak/data-selection-survey
A Survey on Data Selection for Language Models
data-selection language-model llm survey
Last synced: 04 Mar 2026
https://github.com/reds-lab/projektor
This is an official repository for "Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources" (NeurIPS 2023).
data-selection performance-prediction projection scaling-law
Last synced: 05 Apr 2025
https://github.com/zincware/znnl
A Python package for studying neural learning
data-science data-selection machinelearning mathematics physics
Last synced: 09 Aug 2025
https://github.com/allo-media/cynical-selection
Allo-media data selection tool
data-selection language-model nlp
Last synced: 08 May 2025
https://github.com/4ai/generative_deduplication
Code for Generative Deduplication For Social Media Data Selection (Findings of EMNLP 2024)
data-selection deduplication emnlp2024 generative-deduplication nlp
Last synced: 23 Apr 2025
https://github.com/bessouat40/pdf-region-picker
A project for selecting only part of a PDF file. It's useful when you want to extract information with a Python library such as fitz.
data-extraction data-selection extract-data fitz javascript parsing pdf region-picker
Last synced: 06 Mar 2025
https://github.com/tigureis/data-preparation-from-kickstarter-campaigns
Kickstarter Data Prep: A hands-on guide to basic data cleaning and transformation.
data-cleaning data-construction data-integration data-science data-selection numpy pandas
Last synced: 19 Apr 2026