{"id":30275176,"url":"https://github.com/thushan/smash","last_synced_at":"2025-10-26T19:35:20.897Z","repository":{"id":206663440,"uuid":"714099935","full_name":"thushan/smash","owner":"thushan","description":"Smash through to find duplicate files super fast by slicing files intelligently!","archived":false,"fork":false,"pushed_at":"2025-07-12T02:13:11.000Z","size":8110,"stargazers_count":16,"open_issues_count":7,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-12T03:04:58.348Z","etag":null,"topics":["cli","cli-tool","duplicate-files","freebsd","go","linux","macos","windows"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thushan.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-11-03T23:16:27.000Z","updated_at":"2025-04-18T23:10:51.000Z","dependencies_parsed_at":"2023-11-11T13:29:20.401Z","dependency_job_id":"e8be2270-62bb-447b-a182-1d9ff0769a4c","html_url":"https://github.com/thushan/smash","commit_stats":{"total_commits":101,"total_committers":3,"mean_commits":"33.666666666666664","dds":0.3168316831683168,"last_synced_commit":"92b610e66f464f3cd542aa5f43282861b0e365c4"},"previous_names":["thushan/smash"],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/thushan/smash","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thushan%2Fsmash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thushan%2Fsmash/tags","releases_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thushan%2Fsmash/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thushan%2Fsmash/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thushan","download_url":"https://codeload.github.com/thushan/smash/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thushan%2Fsmash/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270691550,"owners_count":24629097,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-16T02:00:11.002Z","response_time":91,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","cli-tool","duplicate-files","freebsd","go","linux","macos","windows"],"created_at":"2025-08-16T09:13:40.512Z","updated_at":"2025-10-26T19:35:20.836Z","avatar_url":"https://github.com/thushan.png","language":"Go","funding_links":[],"categories":["\u003ca name=\"file-dir-cleanup\"\u003e\u003c/a\u003eClean up of files and directories"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cp\u003e\n    \u003cimg src=\"assets/banner.png\" width=\"392\" height=\"146\" alt=\"Smash - Deduplicate files fast!\" /\u003e \u003cbr/\u003e\n    \u003ca href=\"https://github.com/thushan/smash/blob/master/LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/github/license/thushan/smash\" 
alt=\"License\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/thushan/smash/actions/workflows/ci.yml\"\u003e\u003cimg src=\"https://github.com/thushan/smash/actions/workflows/ci.yml/badge.svg?branch=main\" alt=\"CI\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://goreportcard.com/report/github.com/thushan/smash\"\u003e\u003cimg src=\"https://goreportcard.com/badge/github.com/thushan/smash\" alt=\"Go Report Card\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/thushan/smash/releases/latest\"\u003e\u003cimg src=\"https://img.shields.io/github/release/thushan/smash\" alt=\"Latest Release\"\u003e\u003c/a\u003e\n  \u003c/p\u003e\n\u003c/div\u003e\n\n**Smash** is a high-performance CLI tool for detecting duplicate files — fast. It works by **slicing files or blobs into segments** and hashing them with blazing-fast, non-cryptographic algorithms like [xxhash](https://xxhash.com/) or [murmur3](https://en.wikipedia.org/wiki/MurmurHash).\n\nBuilt for speed and scale, `smash` is ideal for everything from low-bandwidth deduplication to analysing multi-terabyte datasets.\n\n### Key Features\n* **Fast**: Handles large files quickly via [slicing](./docs/slicing.md)\n* **Efficient**: Optimised for low I/O and bandwidth-constrained environments\n* **Smart hashing**: Supports [multiple algorithms](./docs/algorithms.md) like `xxhash`, `murmur3`, and more\n* **Safe**: Performs read-only scans of the filesystem\n* **Comprehensive**: Detects duplicate and empty (0-byte) files\n* **Machine-friendly**: JSON output compatible with tools like [`jq`](https://github.com/jqlang/jq) — [examples](#examples), [demos](./docs/demos.md)\n* **Proven**: Used to dedupe multi-terabyte astrophysics, image, and video datasets\n\n`smash` does **not** delete duplicates. 
It generates detailed reports for you to safely review and act on.\n\u003cp align=\"center\"\u003e\n \u003cimg src=\"https://vhs.charm.sh/vhs-6UTX5Yc6CIQ6Y3lzulLKYF.gif\" alt=\"Made with VHS\"\u003e\u003cbr/\u003e\n    \u003csub\u003e\n        \u003csup\u003eFind duplicates in the \u003ca href=\"https://github.com/torvalds/linux\"\u003elinux/drivers\u003c/a\u003e source tree with \u003ccode\u003esmash\u003c/code\u003e (see our \u003ca href=\"docs/demos.md\"\u003e🍿 other demos\u003c/a\u003e). Made with \u003ca href=\"https://vhs.charm.sh\" target=\"_blank\"\u003evhs\u003c/a\u003e!\u003c/sup\u003e\n    \u003c/sub\u003e\n\u003c/p\u003e\n\nThe name comes from a prototype tool called SmartHash (written many years ago in C/ASM; the source is now lost \u0026 too hard to modernise). It operated on a similar concept of slicing and hashing (first with CRC32, later MD5).\n\n# Installation\n\n[![Operating Systems](https://img.shields.io/badge/platform-windows%20%7C%20macos%20%7C%20linux%20%7C%20freebsd-informational?style=for-the-badge)](https://github.com/thushan/smash/releases/latest)\n\nYou can download the latest binaries from [GitHub Releases](https://github.com/thushan/smash/releases) or via our [simple installer script](https://raw.githubusercontent.com/thushan/smash/main/install.sh) - which currently supports Linux, macOS, FreeBSD \u0026 Windows:\n\n```bash\nbash \u003c(curl -s https://raw.githubusercontent.com/thushan/smash/main/install.sh)\n```\n\nIt will download the latest version \u0026 extract it to its own folder for you.\n\nAlternatively, you can install it via Go:\n\n```bash\ngo install github.com/thushan/smash@latest\n```\n\n`smash` has been developed on Linux (Pop!_OS \u0026 Fedora) and tested on macOS, FreeBSD \u0026 Windows.\n\n## Docker\n\nYou can also run `smash` using Docker. 
Multi-architecture images (amd64/arm64) are available on GitHub Container Registry:\n\n\u003e [!TIP]\n\u003e Use the `-t` flag to allocate a pseudo-TTY for better output formatting with Docker.\n\u003e \n\u003e We use the `--rm` flag to automatically remove the container after it exits, keeping \n\u003e your environment clean in these examples.\n\n```bash\n# Pull the latest image\ndocker pull ghcr.io/thushan/smash:latest\n\n# Scan current directory\ndocker run -t --rm -v \"$PWD:/data\" ghcr.io/thushan/smash:latest -r /data\n\n# Scan with output file (saves to current directory)\ndocker run -t --rm -v \"$PWD:/data\" ghcr.io/thushan/smash:latest -r --silent -o /data/report.json /data\n\n# Use the built-in /output directory (container includes a writable /output)\ndocker run -t --rm -v \"$PWD:/data\" -v \"$PWD/output:/output\" ghcr.io/thushan/smash:latest \\\n  -r --silent -o /output/report.json /data\n\n# Or create your own output directory\nmkdir -p my-reports\ndocker run -t --rm -v \"$PWD:/data\" -v \"$PWD/my-reports:/output\" ghcr.io/thushan/smash:latest \\\n  -r --silent -o /output/report.json /data\n\n# Scan multiple directories with output\ndocker run -t --rm \\\n  -v \"$HOME/Documents:/docs:ro\" \\\n  -v \"$HOME/Pictures:/pics:ro\" \\\n  -v \"$PWD/output:/output\" \\\n  ghcr.io/thushan/smash:latest -r -o /output/report.json /docs /pics\n\n# Windows PowerShell example\ndocker run --rm -v \"${PWD}:/data\" -v \"${PWD}/output:/output\" ghcr.io/thushan/smash:latest `\n  -r --silent -o /output/report.json /data\n\n# Use a specific version\ndocker pull ghcr.io/thushan/smash:v1.0.0\n```\n\n**Important notes:**\n- Output files must be written to mounted volumes (e.g., `/data` or `/output`)\n- Use `:ro` for read-only mounts when you only need to scan directories\n- The container runs as non-root user, so ensure output directories are writable\n\nThe Docker image is based on Alpine Linux for a minimal footprint (~8MB) and runs as a non-root user for security.\n\n# 
Usage\n\n```bash\n# Basic usage - scan current directory\nsmash\n\n# Recursive scan\nsmash -r\n\n# Scan multiple directories\nsmash -r ~/Documents ~/Downloads\n\n# Silent mode with report\nsmash -r --silent -o report.json ~/data\n```\n\nFor detailed usage, see the [User Guide](./docs/user-guide.md).\n\n## Command Line Options\n\nKey flags:\n- `-r, --recurse` - Scan subdirectories (required for recursive scanning)\n- `-o, --output-file` - Save results to JSON file\n- `--silent` - Suppress all output except errors\n- `--algorithm` - Choose hash algorithm (default: xxhash)\n- `--exclude-dir` - Skip directories (comma-separated)\n- `--exclude-file` - Skip files (comma-separated patterns)\n\nRun `smash --help` for complete options.\n\n## Quick Examples\n\n### Find Duplicates\n```bash\n# In photos directory\nsmash -r ~/photos -o duplicates.json\n\n# Across multiple drives\nsmash -r ~/Documents /mnt/backup/Documents\n\n# Large video files only\nsmash -r --min-size=104857600 ~/Videos\n```\n\n### Filter and Exclude\n```bash\n# Skip git and node_modules\nsmash -r --exclude-dir=.git,node_modules ~/projects\n\n# Include empty files\nsmash -r --ignore-empty=false ~/data\n```\n\n### Performance Tuning\n```bash\n# For network drives\nsmash -r --max-workers=4 /mnt/nas\n\n# For many small files\nsmash -r --disable-slicing ~/documents\n```\n\n### Working with Reports\n```bash\n# Generate report\nsmash -r ~/data -o report.json\n\n# List all duplicates\njq -r '.analysis.dupes[].files[].path' report.json\n\n# Show space wasted\njq '.analysis.summary.spaceWasted' report.json\n```\n\nSee the [User Guide](./docs/user-guide.md) for detailed examples and advanced usage.\n\n# Contributing\n\nWe welcome contributions! 
Please see our [Developer Guide](./docs/developer.md) for information on:\n- Building from source\n- Running tests\n- Development workflow\n- Docker development\n- Release process\n\n# Acknowledgements\n\nThis project was possible thanks to the following projects or folks.\n\n* [@jqlang/jq](https://github.com/jqlang/jq) - without `jq` we'd be a bit lost!\n* [@wader/fq](https://github.com/wader/fq) - countless nights of inspecting binary blobs!\n* [@cespare/xxhash](https://github.com/cespare/xxhash) - xxhash implementation\n* [@spaolacci/murmur3](https://github.com/spaolacci/murmur3) - murmur3 implementation\n* [@puzpuzpuz/xsync](https://github.com/puzpuzpuz/xsync) - Amazingly efficient map implementation\n* [@pterm/pterm](https://github.com/pterm/pterm) - Amazing TUI framework used\n* [@spf13/cobra](https://github.com/spf13/cobra) - CLI Magic with Cobra\n* [@golangci/golangci-lint](https://github.com/golangci/golangci-lint) - Go Linter\n* [@dkorunic/betteralign](https://github.com/dkorunic/betteralign) - Go alignment checker\n\nTesters - MarkB, JarredT, BenW, DencilW, JayT, ASV, TimW, RyanW, WilliamH, SpencerB, EmadA, ChrisE, AngelaB, LisaA, YousefI, JeffG, MattP\n\n# License\n\nCopyright (c) Thushan Fernando and licensed under Apache License 2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthushan%2Fsmash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthushan%2Fsmash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthushan%2Fsmash/lists"}