https://github.com/akcarsten/duplicate-finder
This Python package identifies duplicate files in a folder of interest.
- Host: GitHub
- URL: https://github.com/akcarsten/duplicate-finder
- Owner: akcarsten
- License: mit
- Created: 2020-12-30T23:40:47.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-01-04T12:20:15.000Z (about 3 years ago)
- Last Synced: 2025-02-12T16:17:17.955Z (11 months ago)
- Topics: duplicate-detection, python
- Language: Python
- Homepage:
- Size: 49.8 KB
- Stars: 25
- Watchers: 4
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# About Duplicates Finder
Duplicates Finder is a simple Python package that identifies duplicate files in and across folders.
There are three ways to search for identical files:
1. List all duplicate files in a folder of interest.
2. Pick a file and find all duplications in a folder.
3. Directly compare two folders against each other.
The results are returned as a pandas DataFrame and can also be exported as .csv files.
More information about the underlying concept can also be found in this [short article](https://towardsdatascience.com/find-duplicate-photos-and-other-files-88b0d07ef020).
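The underlying concept is to compare file content via hashes: two files with the same content yield the same hash. Below is a minimal sketch of that idea using only the standard library; it illustrates the concept and is not the package's internal code (the file names are placeholders):
```python
import hashlib

def file_hash(path, chunk_size=65536):
    """Return the MD5 hash of a file, reading it in chunks to keep memory use low."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()

# Files whose hashes match are duplicates (up to hash collisions, which are extremely unlikely)
print(file_hash('photo_a.jpg') == file_hash('photo_b.jpg'))
```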
---
## Installation
You can either clone the repository directly from the GitHub web page or run the following commands in your terminal:
Pip Installation:
```
pip install duplicate-finder
```
Alternatively you can clone the Git repository:
```
git clone https://github.com/akcarsten/duplicate-finder.git
```
Then go to the folder to which you cloned the repository and run:
```
python setup.py install
```
Now you can run Python and import Duplicates Finder.
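A quick way to verify the installation is to import the package in a Python session, for example:
```python
import duplicates as dup

# If the import succeeds, the main functions are available
help(dup.list_all_duplicates)
```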
---
## Examples of how to use the package
#### Example 1: List all duplicate files in a folder of interest.
```python
import duplicates as dup
folder_of_interest = 'C:/manyDuplicatesHere/'
dup.list_all_duplicates(folder_of_interest, to_csv=True, csv_path='C:/csvWithAllDuplicates/', fastscan=True)
```
Here the _fastscan_ parameter is set to _True_ (the default is _False_). This performs a pre-selection of potential duplicate files based on their file size, which speeds up the search; see the sketch below.
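As an illustration of that pre-selection step (not the package's internal implementation), files can first be grouped by size, and only groups with more than one file are passed on to the more expensive content comparison:
```python
import os
from collections import defaultdict

def size_candidates(folder):
    """Group files by size; only sizes that occur more than once can contain duplicates."""
    by_size = defaultdict(list)
    for root, _, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            by_size[os.path.getsize(path)].append(path)
    return [paths for paths in by_size.values() if len(paths) > 1]
```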
If only a specific file type is of interest, it can be selected with the _ext_ parameter. For example:
```python
df = dup.list_all_duplicates(folder_of_interest, to_csv=True, csv_path='C:/csvWithAllDuplicates/', ext='.jpg')
```
#### Example 2: Pick a file and find all duplications in a folder.
```python
import duplicates as dup
file_of_interest = 'C:/manyDuplicatesHere/thisFileExistsManyTimes.jpg'
folder_of_interest = 'C:/manyDuplicatesHere/'
df = dup.find_duplicates(file_of_interest, folder_of_interest)
```
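Because the result is a regular pandas DataFrame, it can be inspected and exported with standard pandas methods; continuing from the example above (the output path is just an example):
```python
print(df.head())
df.to_csv('C:/manyDuplicatesHere/duplicates_of_this_file.csv', index=False)
```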
#### Example 3: Directly compare two folders against each other.
```python
import duplicates as dup
folder_of_interest_1 = 'C:/noDuplicatesHere/'
folder_of_interest_2 = 'C:/noDuplicatesHereAsWell/'
df = dup.compare_folders(folder_of_interest_1, folder_of_interest_2)
```
As in *Example 1* above, a specific file type can be selected and the results can be written to a .csv file.
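Assuming _compare_folders_ accepts the same _to_csv_, _csv_path_ and _ext_ parameters as _list_all_duplicates_ (please check the function signature in the package), such a call could look like this:
```python
import duplicates as dup

# Compare only .jpg files and export the matches to a .csv file
# (to_csv, csv_path and ext are assumed to mirror list_all_duplicates)
df = dup.compare_folders('C:/noDuplicatesHere/', 'C:/noDuplicatesHereAsWell/',
                         to_csv=True, csv_path='C:/comparisonResults/', ext='.jpg')
```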