https://github.com/dr8co/doppel
A simple, modern duplicate file detector.
https://github.com/dr8co/doppel
deduplication duplicate-detection duplicate-files duplicatefilefinder duplicates
Last synced: 5 months ago
JSON representation
A simple, modern duplicate file detector.
- Host: GitHub
- URL: https://github.com/dr8co/doppel
- Owner: dr8co
- License: mit
- Created: 2025-07-01T16:20:15.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-12-25T10:17:37.000Z (6 months ago)
- Last Synced: 2025-12-26T22:58:19.076Z (6 months ago)
- Topics: deduplication, duplicate-detection, duplicate-files, duplicatefilefinder, duplicates
- Language: Go
- Homepage: https://doppel.draco.codes
- Size: 1.72 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
README
Your filesystem has doppelgรคngers. Letโs hunt.
---
**doppel** is a blazing-fast, concurrent CLI tool written in Go for scanning directories
and finding duplicate files, aka _doppelgรคngers_! ๐ต๏ธโโ๏ธ๐๏ธ
Save disk space and keep your filesystem clean by quickly identifying and managing duplicate files.
Doppel is designed for speed, flexibility, and reliability.
---
## ๐ Table of Contents
* [๐ Table of Contents](#-table-of-contents)
* [โก๏ธ Quick Start](#%EF%B8%8F-quick-start)
* [๐ฎ Terminal Preview](#-terminal-preview)
* [โจ Features](#-features)
* [๐ฆ Installation](#-installation)
* [๐ Usage](#-usage)
* [๐ ๏ธ Command-Line Interface](#%EF%B8%8F-command-line-interface)
* [โ๏ธ Configuration Files](#%EF%B8%8F-configuration-files)
* [Environment Variables](#environment-variables)
* [Automatic Completion](#automatic-completion)
* [๐ Find Command](#-find-command)
* [โ๏ธ Find Command Options](#%EF%B8%8F-find-command-options)
* [๐๏ธ Preset Command](#%EF%B8%8F-preset-command)
* [๐งฌ How It Works](#-how-it-works)
* [๐๏ธ Development](#%EF%B8%8F-development)
* [๐ License](#-license)
* [๐ค Contributing](#-contributing)
## โก๏ธ Quick Start
Install (requires Go 1.25+):
```sh
go install github.com/dr8co/doppel@latest
```
Scan your home directory for duplicates:
```sh
doppel find ~
```
Or use a preset for common scenarios:
```sh
doppel preset media ~/Pictures
```
## ๐ฎ Terminal Preview

## โจ Features
* โก๏ธ **Fast scanning** with parallel hashing (Blake3, configurable workers)
* ๐ **Flexible filtering** by file size, glob patterns, and regular expressions
* ๐ **Noise reduction** with path and file exclusions
* ๐ **Detailed statistics** and verbose output
* ๐ ๏ธ **Dry-run mode** to preview filters
* ๐ **Structured output** for easy integration with other tools. Supported formats:
* JSON
* YAML
* Text (default)
* ๐งฉ **Extensible presets** for common use cases (media, dev, docs, clean)
* ๐งช **Tested** with unit tests and integration tests
* ๐ป **Cross-platform**: Works on Linux, macOS, and Windows
* ๐ ๏ธ **Automatic completion** for bash, zsh, fish, and PowerShell
* ๐ **Structured logging** for better automation, debugging, and monitoring. Formats:
* JSON
* Text
* Pretty (default)
## ๐ฆ Installation
**With Go:**
```sh
go install github.com/dr8co/doppel@latest
```
**From source:**
```sh
git clone https://github.com/dr8co/doppel.git
cd doppel
go build -o doppel main.go
```
**Pre-built binaries:**
See the [๐ releases page](https://github.com/dr8co/doppel/releases).
## ๐ Usage
### ๐ ๏ธ Command-Line Interface
Doppel provides a simple CLI interface. The main command is `doppel`,
with subcommands for different operations.
```sh
doppel [global options] [command [command options]]
```
Run `doppel --help` to see global options and available commands.
> [!NOTE]
> Running `doppel` with no command defaults to `find`.
### โ๏ธ Configuration Files
Doppel supports configuration through TOML (recommended), YAML, or JSON files.
Configuration files are automatically loaded from:
* `$CONFIG_DIR/doppel/config.toml`
* `$CONFIG_DIR/doppel/config.yaml`
* `$CONFIG_DIR/doppel/config.json`
* `$CONFIG_DIR/doppel/config` (Assume TOML if no extension)
where `$CONFIG_DIR` is your system's user configuration directory:
* Linux: `~/.config`
* macOS: `~/Library/Application Support`
* Windows: `%AppData%`
* Plan 9: `~/lib`
* Other Unix: `~/.config`
The configuration files can be used to set default values for any command-line options.
The key names in the configuration file match the long option names for each command,
with dashes replaced with underscores.
For example, to set the default minimum file size for the `find` command to 1.5MB,
you would add the following to your TOML configuration file:
```toml
[find]
min_size = "1.5MB"
```
For more details on the TOML format,
see the [TOML spec](https://toml.io/en/v1.0.0 "TOML v1.0.0").
> [!NOTE]
> Command-line arguments take precedence over configuration file values.
### Environment Variables
Doppel also supports configuration through environment variables.
Environment variable names are derived from the command and option names,
with the following rules:
* The prefix `DOPPEL_` is added to all environment variable names.
* The command name is added after the prefix (if applicable).
* The option name is added after the command name.
* All names are converted to uppercase.
* Dashes (`-`) in option names are replaced with underscores (`_`).
For example, to set the default minimum file size for the `find` command to 1.5MB,
you would set the following environment variable:
```bash
DOPPEL_FIND_MIN_SIZE=1.5MB
```
> [!NOTE]
> Environment variables take precedence over configuration file values,
> but are overridden by command-line arguments.
#### Automatic Completion
Doppel supports automatic completion for various shells. To generate completion scripts, run:
```sh
doppel completion
```
Where `` is one of: `bash`, `zsh`, `fish`, or `pwsh`.
This will print the completion script to stdout.
You can redirect it to a file or source it directly in your shell.
### ๐ Find Command
Scan for duplicate files in the current directory:
```sh
doppel find
# or simply
doppel
```
Scan specific directories:
```sh
doppel find /path/to/dir1 /path/to/dir2
```
#### โ๏ธ Find Command Options
* `-w, --workers `: Number of parallel hashing workers (default: number of CPUs)
* `-v, --verbose`: Enable verbose output
* `--min-size `: Minimum file size to consider (default: 0 = no limit)
* `--max-size `: Maximum file size to consider (default: 0 = no limit)
* `--exclude-dirs `: Comma-separated glob patterns for directories to exclude
* `--exclude-files `: Comma-separated glob patterns for files to exclude
* `--exclude-dirs-regex `: Comma-separated regex patterns for directories to exclude
* `--exclude-files-regex `: Comma-separated regex patterns for files to exclude
* `--show-filters`: Show active filters and exit
* `--output-format `: Output format for duplicate groups (default: pretty, options: `pretty`, `json`, `yaml`)
* `--output-file `: Write output to a file instead of stdout
For more details, run:
```sh
doppel find --help
# or
doppel find help
```
**Examples:**
Find duplicates in `~/Downloads` and `~/Documents`, excluding `.git` directories and files smaller than 1MB:
```sh
doppel find ~/Downloads ~/Documents --exclude-dirs=.git --min-size=1000000 --verbose
# or
doppel find ~/Downloads ~/Documents --exclude-dirs=.git --min-size=1MB --verbose
```
`--min-size` and `--max-size` support the following formats:
* Bytes: `100`, `100B`, `100b` are all equivalent
* Kilobytes: `10KB`, `10kB`, `10Kb`, `10kb`, `10000` are all equivalent
* Kibibytes: `10KiB`, `10kiB`, `10KIB`, `10240` are all equivalent
* Megabytes: `1MB`, `1mB`, `1Mb`, `1mb`, `1000000` are all equivalent
* Mebibytes: `1MiB`, `1miB`, `1MIB`. (same as `1048576`)
* Gigabytes: `1GB`, `1gB`, `1Gb`, `1gb`. (`1000000000`)
* Gibibytes: `1GiB`, `1giB`, `1gIB`. (`1073741824`)
* Terabytes: `1TB`, `1tB`, `1Tb`, `1tb`. (`1000000000000`)
* Tebibytes: `1TiB`, `1tiB`, `1TIb`. (`1099511627776`)
* Petabytes: `1PB`, `1pB`, `1Pb`, `1pb`. (`1000000000000000`)
* Pebibytes: `1PiB`, `1piB`, `1PIB`. (`1125899906842624`)
* Exabytes: `1EB`, `1eB`, `1Eb`, `1eb`. (`1000000000000000000`)
* Exbibytes: `1EiB`, `1eiB`, `1EIB`. (`1152921504606846976`)
Find duplicates in `/var/logs`, excluding all `.log` files and directories starting with `temp`,
and ignoring empty files:
```sh
doppel find /var/logs --min-size=1 --exclude-files="*.log" --exclude-dirs="temp*" # Be sure to quote patterns!
```
> [!NOTE]
> When using glob patterns and regexes, be sure to quote (and escape, if necessary) them to prevent shell expansion.
### ๐๏ธ Preset Command
Use presets for common duplicate-hunting scenarios:
* `dev`: Skip development directories and files (e.g., build, temp, version control)
* `media`: Focus on media files (images/videos), skip small files
* `docs`: Focus on document files
* `clean`: Skip temporary and cache files
**Usage:**
```sh
doppel preset [options]
```
Where `` is one of: `dev`, `media`, `docs`, or `clean`.
Preset options are the same as for `find`.
**Example:**
Find duplicate media files in your `~/Pictures` folder:
```sh
doppel preset media ~/Pictures
```
## ๐งฌ How It Works
1. **File Discovery**: Recursively scans specified directories (and their subdirectories), applying filters.
2. **Grouping**: Groups files by size to quickly eliminate non-duplicates.
3. **Hashing**: Computes Blake3 hashes for files with matching sizes.
4. **Reporting**: Displays groups of duplicate files and optional statistics.
## ๐๏ธ Development
* ๐ Code is organized in `cmd/`, `internal/`, and `assets/` directories.
* ๐งฉ Uses [urfave/cli/v3](https://github.com/urfave/cli) for CLI parsing.
* ๐ Uses [blake3](https://github.com/lukechampine/blake3) for fast hashing.
* ๐งช Run tests with:
```sh
go test -race -v ./...
```
## ๐ License
This project is licensed under the MIT License. See [LICENSE](LICENSE) for details.
## ๐ค Contributing
Contributions, issues, and feature requests are welcome!
Please open an issue or pull request on [GitHub](https://github.com/dr8co/doppel).
See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.
---
**doppel** โ Find your duplicate files, fast and reliably. โจ