Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with deduplication
A curated list of projects in awesome lists tagged with deduplication.
https://github.com/restic/restic
Fast, secure, efficient backup program
backup dedupe deduplication go restic secure-by-default
Last synced: 29 Sep 2024
https://github.com/borgbackup/borg
Deduplicating archiver with compression and authenticated encryption.
backup borgbackup c compression cython dedupe deduplication encryption python python-3 ssh
Last synced: 01 Oct 2024
https://github.com/prometheus/alertmanager
Prometheus Alertmanager
alertmanager deduplication email hacktoberfest monitoring notifications opsgenie pagerduty slack
Last synced: 29 Sep 2024
https://github.com/kopia/kopia
Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.
backup cloud deduplication encryption google-cloud-storage hacktoberfest
Last synced: 31 Jul 2024
https://github.com/openvenues/libpostal
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
address address-parser c deduping deduplication international machine-learning natural-language-processing nlp record-linkage
Last synced: 30 Sep 2024
https://github.com/mhx/dwarfs
A fast, high-compression read-only file system for Linux, Windows and macOS
archiving compression cpp deduplication dwarfs filesystem flac fuse fuse-filesystem gpl-license linux lrzip lzma macfuse macos squashfs windows winfsp zpaq zstd
Last synced: 01 Oct 2024
https://github.com/sahib/rmlint
Extremely fast tool to remove duplicates and other lint from your filesystem
c deduplication duplicates fdupes filesystem lint python
Last synced: 30 Sep 2024
https://github.com/borgmatic-collective/borgmatic
Simple, configuration-driven backup software for servers and workstations
backup borg borgbackup borgbase compression cronhub cronitor deduplication healthchecks loki mariadb mongodb mysql ntfy pagerduty postgresql python servers sqlite
Last synced: 27 Sep 2024
https://github.com/rustic-rs/rustic
rustic - fast, encrypted, and deduplicated backups powered by Rust
backup deduplication encryption restic rust
Last synced: 30 Sep 2024
https://github.com/cupcakearmy/autorestic
Config-driven, easy backup CLI for restic.
backup cli config config-driven deduplication incremental incremental-backup pruning restic
Last synced: 01 Oct 2024
https://github.com/moj-analytical-services/splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
data-matching data-science deduplicate-data deduplication duckdb em-algorithm entity-resolution fuzzy-matching record-linkage spark uk-gov-data-science
Last synced: 28 Sep 2024
https://github.com/J535D165/recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
data-matching dedupe deduplication entity-resolution machine-learning privacy python python-library record-linkage similarity string-distance utrecht-university
Last synced: 31 Jul 2024
https://github.com/zinggai/zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
analytics analytics-engineering data-science data-transformation data-transformations dataengineering datalake dataquality dedupe deduplication entity-resolution etl fuzzy-matching fuzzymatch identity identity-resolution masterdata ml modern-data-stack spark
Last synced: 28 Sep 2024
https://github.com/dpc/rdedup
Data deduplication engine, supporting optional compression and public key encryption.
backup data-deduplication deduplication encryption
Last synced: 01 Aug 2024
https://github.com/Yomguithereal/talisman
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
clustering deduplication fuzzy-matching information-retrieval machine-learning natural-language-processing record-linkage
Last synced: 31 Jul 2024
https://github.com/sreedevk/deduplicator
Filter, Sort & Delete Duplicate Files Recursively
deduplication duplicate-detection duplicate-files duplicatefilefinder filesystem rust
Last synced: 01 Aug 2024
https://github.com/cargo-limit/cargo-limit
Productivity improvements for Rust ecosystem: warnings are skipped until errors are fixed, LSP-independent Neovim integration, etc.
build cargo cargo-plugin cargo-wrapper cli crates deduplication filter limit neovim neovim-plugin nvim plugin productivity runner rust wrapper
Last synced: 28 Sep 2024
https://github.com/dm-vdo/kvdo
A kernel module that provides a pool of deduplicated and/or compressed block storage.
compression deduplication kernel-modules linux-kernel storage vdo
Last synced: 28 Sep 2024
https://github.com/Jaskey/RocketMQDedupListener
Idempotent, deduplicating message consumer for RocketMQ; supports MySQL or Redis as the idempotency table and works out of the box.
deduplication rocketmq rocketmq-client
Last synced: 02 Aug 2024
https://github.com/opensanctions/nomenklatura
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
data-integration deduplication record-link
Last synced: 01 Aug 2024
https://github.com/F483/dejavu
Quickly detect already witnessed data.
command-line command-line-tool deduplication duplicate-values duplicates go golang history memory probabilistic
Last synced: 01 Aug 2024
https://github.com/elemental-lf/benji
Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
b2 backup block-based ceph deduplication iscsi kubernetes lvm s3
Last synced: 01 Aug 2024
https://github.com/nlfiedler/fastcdc-rs
FastCDC implementation in Rust
chunking-algorithm deduplication rust
Last synced: 06 Aug 2024
https://github.com/netinvent/npbackup
A secure and efficient file backup solution that fits both system administrators (CLI) and end users (GUI)
backup cli compression deduplication gui healthcheck monitoring prometheus-metrics restic vss
Last synced: 01 Aug 2024
https://github.com/deajan/backup-bench
Quick and dirty backup tool benchmark with reproducible results
backup benchmark benchmarking borgbackup bupstash compression deduplication duplicacy kopia restic
Last synced: 04 Aug 2024
https://github.com/jvirkki/dupd
CLI utility to find duplicate files
c deduplication duplicate-files duplicatefilefinder duplicates fdupes
Last synced: 31 Jul 2024
https://github.com/OpenGene/gencore
Generate duplex/single consensus reads to reduce sequencing noise and remove duplicates
bioinformatics consensus deduplication deep-sequencing duplex duplex-sequencing duplication ngs sequencing sequencing-error sequencing-noise somatic
Last synced: 03 Aug 2024
https://github.com/tsileo/blobstash
Your personal database. Mirror of https://git.sr.ht/~tsileo/blobstash
backup blob-store blobstash content-addressed deduplication document-store go storage
Last synced: 01 Aug 2024
https://github.com/lostatc/acid-store
[UNMAINTAINED] A transactional and deduplicating virtual file system
acid deduplication encryption filesystem fuse rclone redis rust s3 sftp sqlite storage
Last synced: 06 Aug 2024
https://github.com/unreadablewxy/fs-curator
Automation for the serious data hoarder that wants to have their data and use it
deduplication directory-tree file-renamer file-sorting hard-links organizer
Last synced: 13 Aug 2024
https://github.com/openvenues/lieu
Dedupe/batch geocode addresses and venues around the world with libpostal
address deduplication geocoding international venues
Last synced: 05 Aug 2024
https://github.com/lobocv/simpleflow
Generic simple workflows and concurrency patterns
batching concurrency counter deduplication generics go golang timeseries worflows workerpool
Last synced: 02 Oct 2024
https://github.com/davidsvy/Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
dataset deduplication fine-tuning fraud gpt2 huggingface lsh minhash nlp pytorch readability scam transformer web-scraping
Last synced: 05 Aug 2024
https://github.com/nebucatnetzer/borg-qt
A Qt frontend for the command line software BorgBackup.
backup borg borgbackup borgbackup-gui deduplication gplv3 pyqt5 python3 qt5
Last synced: 28 Sep 2024
https://github.com/InexplicableMagic/photodedupe
A utility for locating near duplicate photos irrespective of image resolution, compression settings or file format.
computer-vision computer-vision-tools deduplication duplicate-detection image-deduplication
Last synced: 01 Aug 2024
https://github.com/gerald-lnj/duplicate-video-finder
A python module to detect duplicate videos in a directory.
cleanup data-hoarder deduplication duplicate-detection python python-3 video-processing
Last synced: 04 Aug 2024
https://github.com/arbal/brave-control
Control Brave Browser from the command line. List, close, deduplicate and bring focus to open tabs. Also includes Alfred workflow integration.
alfred alfred-workflow automation brave brave-browser browser cli command-line command-line-tool deduplication focus jxa tabs workflow
Last synced: 01 Aug 2024
https://github.com/rosette-api/ruby
Rosette API Client Library for Ruby
deduplication entity-extraction language-identification machine-learning morphology named-entity-recognition natural-language-processing nlp ruby sentiment-analysis text-analytics text-embedding tokenization
Last synced: 03 Aug 2024
https://github.com/gblach/reflicate
Deduplicate data by creating reflinks between identical files.
btrfs deduplicate deduplication ocfs2 reflinks rust xfs
Last synced: 31 Jul 2024
https://github.com/homebase/bigpack
Blazing Fast Petabyte Scale Static Web Server + Tools. Serve Billion Files from an Indexed, Compressed and Deduplicated Archive.
archiver deduplication webserver
Last synced: 01 Aug 2024
https://github.com/gkjohnson/webpack-script-guard
Webpack loader for guarding against duplicate scripts in separate bundles
deduplicate deduplication html import imports javascript loader polymer webcomponents webpack
Last synced: 01 Oct 2024