{"id":27772812,"url":"https://github.com/deplicate/deplicate","last_synced_at":"2025-04-30T00:01:52.635Z","repository":{"id":56160140,"uuid":"98133569","full_name":"vuolter/deplicate","owner":"vuolter","description":"Advanced Duplicate File Finder for Python","archived":true,"fork":false,"pushed_at":"2020-11-23T14:28:13.000Z","size":140,"stargazers_count":77,"open_issues_count":7,"forks_count":16,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-24T08:41:58.017Z","etag":null,"topics":["deplicate","duplicate","duplicate-detection","duplicate-files","duplicatefilefinder","duplicates","duplicates-removed","duplication-finder","finder","macosx","multi-filtering","purge-duplicate-files","pypi","python","scanning","unix","windows"],"latest_commit_sha":null,"homepage":"https://pypi.python.org/pypi/deplicate","language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vuolter.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-07-24T00:18:59.000Z","updated_at":"2025-03-01T17:56:24.000Z","dependencies_parsed_at":"2022-08-15T13:50:36.505Z","dependency_job_id":null,"html_url":"https://github.com/vuolter/deplicate","commit_stats":null,"previous_names":["deplicate/deplicate"],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vuolter%2Fdeplicate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vuolter%2Fdeplicate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vuolter%2Fdeplicate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/vuolter%2Fdeplicate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vuolter","download_url":"https://codeload.github.com/vuolter/deplicate/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251602807,"owners_count":21615963,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deplicate","duplicate","duplicate-detection","duplicate-files","duplicatefilefinder","duplicates","duplicates-removed","duplication-finder","finder","macosx","multi-filtering","purge-duplicate-files","pypi","python","scanning","unix","windows"],"created_at":"2025-04-30T00:01:29.127Z","updated_at":"2025-04-30T00:01:52.528Z","avatar_url":"https://github.com/vuolter.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"#\"\u003e\n    \u003cimg src=\"media/banner.png?raw=true\" alt=\"deplicate\" /\u003e\n  \u003c/a\u003e\n  \u003ch2\u003eAdvanced Duplicate File Finder for Python\u003c/h2\u003e\n  \u003ca href=\"https://pypi.python.org/pypi/deplicate\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/status/deplicate.svg\" alt=\"PyPI Status\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pypi.python.org/pypi/deplicate\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/v/deplicate.svg\" alt=\"PyPI Version\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pypi.python.org/pypi/deplicate\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/pyversions/deplicate.svg\" alt=\"PyPI Python Versions\" 
/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pypi.python.org/pypi/deplicate\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/l/deplicate.svg\" alt=\"PyPI License\" /\u003e\n  \u003c/a\u003e\n  \u003ch5\u003e\u003ci\u003eNothing is impossible to solve.\u003c/i\u003e\u003c/h5\u003e\n\u003c/div\u003e\n\n\nTable of contents\n-----------------\n\n- [Status](#status)\n- [Description](#description)\n- [Features](#features)\n- [Installation](#installation)\n  - [PIP Install](#pip-install)\n  - [Tarball Install](#tarball-install)\n- [Usage](#usage)\n  - [Quick Start](#quick-start)\n  - [Advanced Usage](#advanced-usage)\n- [API Reference](#api-reference)\n  - [Exceptions](#exceptions)\n  - [Classes](#classes)\n  - [Functions](#functions)\n\n\nStatus\n------\n\n[![Travis Build Status](https://travis-ci.org/deplicate/deplicate.svg?branch=master)](https://travis-ci.org/deplicate/deplicate)\n[![AppVeyor Build status](https://ci.appveyor.com/api/projects/status/liiymqadlm0hjbbb/branch/master?svg=true)](https://ci.appveyor.com/project/vuolter/deplicate/branch/master)\n[![Requirements Status](https://requires.io/github/deplicate/deplicate/requirements.svg?branch=master)](https://requires.io/github/deplicate/deplicate/requirements/?branch=master)\n[![Codacy Badge](https://api.codacy.com/project/badge/Grade/bc7b97415617404694a07f2529147f7e)](https://www.codacy.com/app/deplicate/deplicate?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=deplicate/deplicate\u0026amp;utm_campaign=Badge_Grade)\n[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/deplicate/deplicate/badges/quality-score.png?b=master)](https://scrutinizer-ci.com/g/deplicate/deplicate/?branch=master)\n\n\nDescription\n-----------\n\n`deplicate` is a high-performance duplicate file finder\nwritten in pure Python with low memory impact and several advanced filters.\n\nFind all the duplicate files in one or more directories;\nyou can also scan a set of files directly.\nLatest 
releases let you remove the spotted duplicates and/or apply a custom\naction to them.\n\n\nFeatures\n--------\n\n- [x] N-tree layout for low memory consumption\n- [x] Multi-threaded (partially)\n- [x] Raw drive access to maximize I/O performance (Unix only)\n- [x] xxHash algorithm for fast file identification\n- [x] File size and signature checking for quick duplicate exclusion\n- [x] Extended file attributes scanning\n- [x] Multi-filtering\n- [x] Full error handling\n- [x] Unicode decoding\n- [x] Safe from path-walking loops\n- [ ] SSD detection\n- [x] Duplicates purging\n- [x] Support for moving duplicates to trash/recycle bin\n- [x] Custom action handling over deletion\n- [x] **Command Line Interface** (https://github.com/deplicate/deplicate-cli)\n- [x] Unified structured result\n- [x] Support posix_fadvise\n- [ ] Graphical User Interface (https://github.com/deplicate/deplicate-gui)\n- [ ] Incremental file chunk checking\n- [ ] Hard-link scanning\n- [ ] Duplicate directories recognition\n- [ ] Multi-processing\n- [x] Fully documented\n- [ ] PyPy support\n- [ ] ~~Exif data scanning~~\n\n\nInstallation\n------------\n\n\u003e **Note:**\n\u003e This will install just `deplicate`, without its CLI and GUI.\n\u003e - CLI _(Command Line Interface)_: https://github.com/deplicate/deplicate-cli.\n\u003e - GUI _(Graphical User Interface)_: https://github.com/deplicate/deplicate-gui.\n\nThe easiest way to install `deplicate` on your system is the\n[PIP Install way](#pip-install),\nbut, if you want, you can try to install it from the sources as described in\nthe [Tarball Install section](#tarball-install).\n\n### PIP Install\n\nIf the command `pip` is not found on your system,\ninstall the latest `pip` distribution:\ndownload [get-pip.py](https://bootstrap.pypa.io/get-pip.py)\nand run it using the [Python Interpreter](https://www.python.org).\n\nThen, type in your command shell **with _administrator/root_ privileges**:\n\n    pip install deplicate\n\n\u003e 
**Note:**\n\u003e On Unix-based systems, you may have to type `sudo pip install deplicate`.\n\nIf the above command fails, consider installing with the option\n[`--user`](https://pip.pypa.io/en/latest/user_guide/#user-installs):\n\n    pip install --user deplicate\n\n### Tarball Install\n\n0. Make sure you have installed\nthe [Python Interpreter](https://www.python.org)\nwith the package `setuptools` **(\u003e=20.8.1)**.\n1. Download the latest source code archive in\n[ZIP](https://github.com/deplicate/deplicate/archive/master.zip) or\n[TAR](https://github.com/deplicate/deplicate/archive/master.tar.gz) format.\n2. Extract the downloaded archive.\n3. From the extracted path, execute the command\n`python setup.py install`.\n\n\nUsage\n-----\n\nIn your script, import the module `duplicate`.\n\n    import duplicate\n\nCall its function `find` to search for duplicate files in the given path:\n\n    duplicate.find('/path')\n\nOr call the function `purge` if you also want to remove them:\n\n    duplicate.purge('/path')\n\nYou'll get a `duplicate.ResultInfo` object as the result,\nwith the following properties:\n- `dups` – Tuples of paths of duplicate files.\n- `deldups` – Tuple of paths of purged duplicate files.\n- `duperrors` – Tuple of paths of files not filtered due to errors.\n- `scanerrors` – Tuple of paths of files not scanned due to errors.\n- `delerrors` – Tuple of paths of files not purged due to errors.\n\n\u003e **Note:**\n\u003e By default, directory paths are scanned recursively.\n\n\u003e **Note:**\n\u003e By default, files smaller than **100 KiB** or bigger than **100 GiB**\n\u003e are not scanned.\n\n\u003e **Note:**\n\u003e File paths are returned in canonical form.\n\n\u003e **Note:**\n\u003e Tuples of duplicate files are sorted in descending order according to\ninput priority, file modification time and name length.\n\n### Quick Start\n\nScan for duplicates a single directory:\n\n    import duplicate\n\n    duplicate.find('/path/to/dir')\n\nScan for duplicates 
two files (at least):\n\n    import duplicate\n\n    duplicate.find('/path/to/file1', '/path/to/file2')\n\nScan for duplicates a single directory and move them to the trash/recycle bin:\n\n    import duplicate\n\n    duplicate.purge('/path/to/dir')\n\nScan for duplicates a single directory and delete them:\n\n    import duplicate\n\n    duplicate.purge('/path/to/dir', trash=False)\n\nScan multiple directories together:\n\n    import duplicate\n\n    duplicate.find('/path/to/dir1', '/path/to/dir2', '/path/to/dir3')\n\nScan from an iterable:\n\n    import duplicate\n\n    iterable = ['/path/to/dir1', '/path/to/dir2', '/path/to/dir3']\n\n    duplicate.find.from_iterable(iterable)\n\nScan ignoring the minimum file size threshold:\n\n    import duplicate\n\n    duplicate.find('/path/to/dir', minsize=0)\n\n### Advanced Usage\n\nScan without recursing into directories:\n\n    import duplicate\n\n    duplicate.find('/path/to/file1', '/path/to/file2', '/path/to/dir1',\n                   recursive=False)\n\n\u003e **Note:**\n\u003e In _non-recursive mode_, as in the case above, directory paths are simply\n\u003e ignored.\n\nScan checking file names and hidden files:\n\n    import duplicate\n\n    duplicate.find('/path/to/file1', '/path/to/dir1',\n                   comparename=True, scanhidden=True)\n\nScan excluding files ending with the extension `.doc`:\n\n    import duplicate\n\n    duplicate.find('/path/to/dir', exclude=\"*.doc\")\n\nScan including file links:\n\n    import duplicate\n\n    duplicate.find('/path/to/file1', '/path/to/file2', '/path/to/file3',\n                   scanlinks=True)\n\nScan for duplicates, handling errors with a custom action (printing):\n\n    import duplicate\n\n    def error_callback(exc, filename):\n        print(filename)\n\n    duplicate.find('/path/to/dir', onerror=error_callback)\n\nScan for duplicates and apply a custom action (printing), instead of purging:\n\n    import duplicate\n\n    def purge_callback(filename):\n   
     print(filename)\n        raise duplicate.SkipException\n\n    duplicate.purge('/path/to/dir', ondel=purge_callback)\n\nScan for duplicates, apply a custom action (printing) and move them to\nthe trash/recycle bin:\n\n    import duplicate\n\n    def purge_callback(filename):\n        print(filename)\n\n    duplicate.purge('/path/to/dir', ondel=purge_callback)\n\nScan for duplicates, handling errors with a custom action (printing), and\napply a custom action (moving to a path), instead of purging:\n\n    import shutil\n    import duplicate\n\n    def error_callback(exc, filename):\n        print(filename)\n\n    def purge_callback(filename):\n        shutil.move(filename, '/path/to/custom-dir')\n        raise duplicate.SkipException\n\n    duplicate.purge('/path/to/dir',\n                    ondel=purge_callback, onerror=error_callback)\n\n\nAPI Reference\n-------------\n\n### Exceptions\n\n- duplicate.`SkipException`(_*args_, _**kwargs_)\n  - **Description**: Raised to skip file scanning, filtering or purging.\n  - **Return**: Self instance.\n  - **Parameters**: Same as built-in `Exception`.\n  - **Properties**: Same as built-in `Exception`.\n  - **Methods**: Same as built-in `Exception`.\n\n### Classes\n\n- duplicate.`Cache`(_maxlen_=`DEFAULT_MAXLEN`)\n  - **Description**: Internal shared cache class.\n  - **Return**: Self instance.\n  - **Parameters**:\n    - `maxlen` – Maximum number of entries stored.\n  - **Properties**:\n    - `DEFAULT_MAXLEN`\n      - **Description**: Default maximum number of entries stored.\n      - **Value**: `128`.\n  - **Methods**:\n    - ...\n    - `clear`(_self_)\n      - **Description**: Clear the cache if not acquired by any object.\n      - **Return**: `True` if the cache was cleared, otherwise `False`.\n      - **Parameters**: None.\n\n- duplicate.`Deplicate`(_paths_,\n    _minsize_=`DEFAULT_MINSIZE`,\n    _maxsize_=`DEFAULT_MAXSIZE`,\n    _include_=`None`, _exclude_=`None`,\n    _comparename_=`False`, _comparemtime_=`False`, 
_comparemode_=`False`,\n    _recursive_=`True`, _followlinks_=`False`, _scanlinks_=`False`,\n    _scanempties_=`False`,\n    _scansystem_=`True`, _scanarchived_=`True`, _scanhidden_=`True`)\n  - **Description**: Duplicate main class.\n  - **Return**: Self instance.\n  - **Parameters**:\n    - `paths` – Iterable of directory and/or file paths.\n    - `minsize` – _(optional)_ Minimum size in bytes of files to include\n      in scanning.\n    - `maxsize` – _(optional)_ Maximum size in bytes of files to include\n      in scanning.\n    - `include` – _(optional)_ Wildcard pattern of files to include\n      in scanning.\n    - `exclude` – _(optional)_ Wildcard pattern of files to exclude\n      from scanning.\n    - `comparename` – _(optional)_ Check file name.\n    - `comparemtime` – _(optional)_ Check file modification time.\n    - `comparemode` – _(optional)_ Check file mode (permissions).\n    - `recursive` – _(optional)_ Scan directory recursively.\n    - `followlinks` – _(optional)_ Follow symbolic links pointing to directories.\n    - `scanlinks` – _(optional)_ Scan symbolic links pointing to files\n      (hard links included).\n    - `scanempties` – _(optional)_ Scan empty files.\n    - `scansystem` – _(optional)_ Scan OS files.\n    - `scanarchived` – _(optional)_ Scan archived files.\n    - `scanhidden` – _(optional)_ Scan hidden files.\n  - **Properties**:\n    - `DEFAULT_MINSIZE`\n      - **Description**: Minimum size of files to include in scanning\n        (in bytes).\n      - **Value**: `102400`.\n    - `DEFAULT_MAXSIZE`\n      - **Description**: Maximum size of files to include in scanning\n        (in bytes).\n      - **Value**: `107374182400`.\n    - `result`\n      - **Description**: Result of `find` or `purge` invocation\n        (`None` by default).\n      - **Value**: `duplicate.ResultInfo`.\n  - **Methods**:\n    - `find`(_self_, _onerror_=`None`, _notify_=`None`)\n      - **Description**: Find duplicate files.\n      - **Return**: None.\n 
     - **Parameters**:\n        - `onerror` – _(optional)_ Callback function called with two arguments,\n          `exception` and `filename`, when an error occurs during file\n          scanning or filtering.\n        - `notify` – _(internal)_ Notifier callback.\n    - `purge`(_self_,\n        _trash_=`True`, _ondel_=`None`, _onerror_=`None`, _notify_=`None`)\n      - **Description**: Find and purge duplicate files.\n      - **Return**: None.\n      - **Parameters**:\n        - `trash` – _(optional)_ Move duplicate files to the trash/recycle bin\n          instead of deleting them.\n        - `ondel` – _(optional)_ Callback function called with one argument,\n          `filename`, before purging a duplicate file.\n        - `onerror` – _(optional)_ Callback function called with two arguments,\n          `exception` and `filename`, when an error occurs during file\n          scanning, filtering or purging.\n        - `notify` – _(internal)_ Notifier callback.\n\n- duplicate.`ResultInfo`(_dupinfo_, _delduplist_, _scnerrlist_, _delerrors_)\n  - **Description**: Duplicate result class.\n  - **Return**: `collections.namedtuple`(`'ResultInfo'`,\n    `'dups deldups duperrors scanerrors delerrors'`).\n  - **Parameters**:\n    - `dupinfo` – _(internal)_ Instance of `duplicate.structs.DupInfo`.\n    - `delduplist` – _(internal)_ Iterable of purged files\n      (deleted or trashed).\n    - `scnerrlist` – _(internal)_ Iterable of files not scanned (due to errors).\n    - `delerrors` – _(internal)_ Iterable of files not purged (due to errors).\n  - **Properties**: Same as `collections.namedtuple`.\n  - **Methods**: Same as `collections.namedtuple`.\n\n### Functions\n\n- duplicate.`find`(_*paths_,\n    _minsize_=`duplicate.Deplicate.DEFAULT_MINSIZE`,\n    _maxsize_=`duplicate.Deplicate.DEFAULT_MAXSIZE`,\n    _include_=`None`, _exclude_=`None`,\n    _comparename_=`False`, _comparemtime_=`False`, _comparemode_=`False`,\n    _recursive_=`True`, _followlinks_=`False`, _scanlinks_=`False`,\n    
_scanempties_=`False`,\n    _scansystem_=`True`, _scanarchived_=`True`, _scanhidden_=`True`,\n    _onerror_=`None`, _notify_=`None`)\n  - **Description**: Find duplicate files.\n  - **Return**: `duplicate.ResultInfo`.\n  - **Parameters**:\n    - `paths` – Directory and/or file paths, passed as positional arguments.\n    - `minsize` – _(optional)_ Minimum size in bytes of files to include\n      in scanning.\n    - `maxsize` – _(optional)_ Maximum size in bytes of files to include\n      in scanning.\n    - `include` – _(optional)_ Wildcard pattern of files to include\n      in scanning.\n    - `exclude` – _(optional)_ Wildcard pattern of files to exclude\n      from scanning.\n    - `comparename` – _(optional)_ Check file name.\n    - `comparemtime` – _(optional)_ Check file modification time.\n    - `comparemode` – _(optional)_ Check file mode (permissions).\n    - `recursive` – _(optional)_ Scan directory recursively.\n    - `followlinks` – _(optional)_ Follow symbolic links pointing to directories.\n    - `scanlinks` – _(optional)_ Scan symbolic links pointing to files\n      (hard links included).\n    - `scanempties` – _(optional)_ Scan empty files.\n    - `scansystem` – _(optional)_ Scan OS files.\n    - `scanarchived` – _(optional)_ Scan archived files.\n    - `scanhidden` – _(optional)_ Scan hidden files.\n    - `onerror` – _(optional)_ Callback function called with two arguments,\n      `exception` and `filename`, when an error occurs during file scanning or\n      filtering.\n    - `notify` – _(internal)_ _(optional)_ Notifier callback.\n\n- duplicate.`purge`(_*paths_,\n    _minsize_=`duplicate.Deplicate.DEFAULT_MINSIZE`,\n    _maxsize_=`duplicate.Deplicate.DEFAULT_MAXSIZE`,\n    _include_=`None`, _exclude_=`None`,\n    _comparename_=`False`, _comparemtime_=`False`, _comparemode_=`False`,\n    _recursive_=`True`, _followlinks_=`False`, _scanlinks_=`False`,\n    _scanempties_=`False`,\n    _scansystem_=`True`, _scanarchived_=`True`, _scanhidden_=`True`,\n    _trash_=`True`, 
_ondel_=`None`, _onerror_=`None`, _notify_=`None`)\n  - **Description**: Find and purge duplicate files.\n  - **Return**: `duplicate.ResultInfo`.\n  - **Parameters**:\n    - `paths` – Directory and/or file paths, passed as positional arguments.\n    - `minsize` – _(optional)_ Minimum size in bytes of files to include\n      in scanning.\n    - `maxsize` – _(optional)_ Maximum size in bytes of files to include\n      in scanning.\n    - `include` – _(optional)_ Wildcard pattern of files to include\n      in scanning.\n    - `exclude` – _(optional)_ Wildcard pattern of files to exclude\n      from scanning.\n    - `comparename` – _(optional)_ Check file name.\n    - `comparemtime` – _(optional)_ Check file modification time.\n    - `comparemode` – _(optional)_ Check file mode (permissions).\n    - `recursive` – _(optional)_ Scan directory recursively.\n    - `followlinks` – _(optional)_ Follow symbolic links pointing to directories.\n    - `scanlinks` – _(optional)_ Scan symbolic links pointing to files\n      (hard links included).\n    - `scanempties` – _(optional)_ Scan empty files.\n    - `scansystem` – _(optional)_ Scan OS files.\n    - `scanarchived` – _(optional)_ Scan archived files.\n    - `scanhidden` – _(optional)_ Scan hidden files.\n    - `trash` – _(optional)_ Move duplicate files to the trash/recycle bin\n      instead of deleting them.\n    - `ondel` – _(optional)_ Callback function called with one argument,\n      `filename`, before purging a duplicate file.\n    - `onerror` – _(optional)_ Callback function called with two arguments,\n      `exception` and `filename`, when an error occurs during file scanning,\n      filtering or purging.\n    - `notify` – _(internal)_ _(optional)_ Notifier callback.\n\n\n------------------------------------------------\n###### © 2017 Walter Purcaro 
\u003cvuolter@gmail.com\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeplicate%2Fdeplicate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeplicate%2Fdeplicate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeplicate%2Fdeplicate/lists"}