Projects in Awesome Lists tagged with data-deduplication
A curated list of projects in awesome lists tagged with data-deduplication.
https://github.com/dpc/rdedup
Data deduplication engine, supporting optional compression and public key encryption.
backup data-deduplication deduplication encryption
Last synced: 15 May 2025
https://github.com/sail-sg/sailcraft
🚢 Data Toolkit for Sailor Language Models
data-cleaning data-deduplication
Last synced: 05 Oct 2025
https://github.com/jchristn/watsondedupe
Self-contained C# library for data deduplication using SQLite
chunk chunk-data chunk-key compress compression data-deduplication dedupe deduplication duplicate-data nuget sqlite-database storage
Last synced: 28 Feb 2026
https://github.com/zabuzard/fastcdc4j
Fast and efficient content-defined chunking for data deduplication. A Java implementation of FastCDC, packaged as a library.
cdc chunking content-defined-chunking data-deduplication fastcdc java library
Last synced: 05 Mar 2026
https://github.com/gagan3012/polydedupe
PolyDeDupe: Multi-Lingual Data Deduplication
data-deduplication multilingual nlp
Last synced: 16 Mar 2025
https://github.com/fabriziosalmi/text-boundaries
A Python-based tool for preprocessing, cleaning, and analyzing text datasets, designed to filter, deduplicate, and sort data and to generate statistical insights.
data-automation data-deduplication data-preprocessing data-sorting data-statistics-generation data-validation dataset-boundaries dataset-cleaning machine-learning natural-language-processing text-data-analysis
Last synced: 07 Apr 2025
https://github.com/tracing-performance-labs/go-dedupe
Go library for deduplicating string data
Last synced: 10 Oct 2025
https://github.com/keerthanapalanikumar/data-cleaning-on-sql
This repository contains SQL scripts and documentation for cleaning and standardizing data in the NashvilleHousing table within the sqlproject2 database. The project aims to prepare the dataset for analysis by addressing inconsistencies, filling missing values, standardizing formats, and removing duplicates.
data-cleaning data-deduplication data-manipulation data-standardization database-management mssql ssms
Last synced: 27 Jan 2026