https://github.com/codecliff/fdupesanalyzer

A script to analyze the output of the fdupes Linux utility to find the level of overlap between directories. Written in R.

bash bash-script directory duplicates fdupes fdupes-linux-utility files r rstudio

FdupesAnalyzer


A utility to analyze the output of the fdupes Linux utility to find the level of overlap between directories. Written in R. https://github.com/codecliff/FdupesAnalyzer



Why:


fdupes by Adrián López gives you a file-by-file list of duplicates. It works very well even with renamed copies, since it compares file contents rather than names, and with files exported by image editors and the like. However, to clean up a large dump of files accumulated over the years by multiple users, I needed directory-level summaries such as "70% of the files in dir A are also in dir B" or "dir A has copies of all the files in dir B". This script creates a CSV file with all of this information.




How To Use:



  • Run fdupes recursively (-r), showing file sizes (-S), and redirect the results to a file: fdupes -Sr rootpath > fdupes_output.txt

  • Edit the R script FDupesParser.R and update the paths to the output file and to rootpath.

  • Run the R script, preferably interactively (for example, in RStudio).

  • Review the CSV file generated by the script.

  • (Optional) Generate fdupes commands for each directory pair and run them as a batch.




Output file formats:



Generated CSV file



  • “dir1”: directory 1

  • “dir2”: directory 2

  • “matchcnt”: number of files matching between dir1 and dir2

  • “acnt”: number of files in dir1

  • “bcnt”: number of files in dir2

  • “aprct”: percentage of files in dir1 that have a copy in dir2

  • “bprct”: percentage of files in dir2 that have a copy in dir1

  • “maxprct”: the larger of aprct and bprct
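
To show how these columns relate, here is a hypothetical POSIX shell sketch (not the project's R code) that builds a similar table directly from fdupes -Sr output. It assumes that matchcnt is the number of duplicate sets with at least one file in each directory, that aprct = 100 * matchcnt / acnt and bprct = 100 * matchcnt / bcnt, and that acnt and bcnt count only the regular files directly inside each directory. The function names pair_counts, file_count and overlap_csv are made up for this example.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the project's FDupesParser.R. Assumptions:
#   - fdupes prints each duplicate set on consecutive lines, with a blank
#     line between sets, and -S adds an "N bytes each:" header per set;
#   - matchcnt = duplicate sets with at least one file in each directory;
#   - acnt/bcnt = regular files directly inside each directory;
#   - paths contain no tabs or newlines.

# Print "dirA<TAB>dirB<TAB>matchcnt" for every directory pair (dirA < dirB).
pair_counts() {
  awk '
    function flush(   a, b) {
      for (a in seen)
        for (b in seen)
          if (a < b) cnt[a "\t" b]++   # one count per set for each pair
      split("", seen)                  # portable way to empty the array
    }
    /^$/                    { flush(); next }  # blank line ends a set
    /^[0-9]+ bytes? each:$/ { next }           # size header added by -S
    {
      d = $0                           # parent directory of this file
      if (d ~ /\//) { sub(/\/[^\/]*$/, "", d); if (d == "") d = "/" }
      else { d = "." }
      seen[d] = 1
    }
    END { flush(); for (k in cnt) print k "\t" cnt[k] }
  ' "$1" | sort
}

# Count regular files directly inside a directory (non-recursive).
file_count() {
  find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# overlap_csv FDUPES_OUTPUT: write the overlap table as CSV to stdout.
overlap_csv() {
  tab=$(printf '\t')
  echo 'dir1,dir2,matchcnt,acnt,bcnt,aprct,bprct,maxprct'
  pair_counts "$1" | while IFS=$tab read -r dir1 dir2 matchcnt; do
    acnt=$(file_count "$dir1")
    bcnt=$(file_count "$dir2")
    # aprct = 100 * matchcnt / acnt, bprct = 100 * matchcnt / bcnt
    pcts=$(awk -v m="$matchcnt" -v a="$acnt" -v b="$bcnt" 'BEGIN {
      pa = (a > 0) ? 100 * m / a : 0
      pb = (b > 0) ? 100 * m / b : 0
      mx = (pa > pb) ? pa : pb
      printf "%.1f,%.1f,%.1f", pa, pb, mx
    }')
    printf '"%s","%s",%s,%s,%s,%s\n' "$dir1" "$dir2" "$matchcnt" "$acnt" "$bcnt" "$pcts"
  done
}

# Usage: ./overlap.sh fdupes_output.txt > overlap.csv
if [ "$#" -ge 1 ]; then overlap_csv "$1"; fi
```

Under these assumptions, a pair whose maxprct is close to 100.0 suggests that one directory is largely contained in the other. Identical files inside the same directory count as a single set, so the percentages are approximate.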




Generated script file


sudo fdupes -dN "./imgs/music" "./imgs/2018-03-oldccombk/stuff/"

sudo fdupes -dN "./ntfs/2017-backup/weds" "./IMAGES/Pictures_2017/.mail_downloads"
sudo fdupes -dN "./IMAGES/Picture/weds" "./IMAGES/Pictures_2017/oldlaptop_hdd"
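
The optional batch step can be approximated with a small hypothetical helper. The name make_fdupes_batch is made up and is not part of FdupesAnalyzer. It reads the CSV and prints one command, in the format shown above, for each pair whose maxprct reaches a threshold. It assumes a header row, the column order listed under "Generated CSV file", and directory names that contain no commas or double quotes. fdupes -dN deletes duplicates without prompting and keeps only the first file of each set, as ordered by fdupes itself, so review the generated batch before running it.

```shell
#!/bin/sh
# Hypothetical helper, not part of FdupesAnalyzer. Assumptions:
#   - the CSV has a header row and maxprct in column 8;
#   - directory names contain no commas or double quotes
#     (double quotes around fields, if present, are stripped).

# make_fdupes_batch CSV [THRESHOLD]: print one fdupes command per
# directory pair whose maxprct >= THRESHOLD (default 100).
make_fdupes_batch() {
  awk -F',' -v t="${2:-100}" '
    NR == 1 { next }                          # skip the header row
    {
      d1 = $1; d2 = $2
      gsub(/"/, "", d1); gsub(/"/, "", d2)    # drop optional CSV quoting
      if ($8 + 0 >= t + 0)
        printf "sudo fdupes -dN \"%s\" \"%s\"\n", d1, d2
    }
  ' "$1"
}

# Usage: ./make_batch.sh overlap.csv 100 > dedupe.sh  (review before running)
if [ "$#" -ge 1 ]; then make_fdupes_batch "$1" "${2:-100}"; fi
```

A threshold of 100 keeps only the pairs in which one directory appears to be fully covered by the other. Lower values produce a larger and riskier batch.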




Prerequisites



  • R

  • R packages: data.table, tools

  • fdupes




Tested on



  • Ubuntu 18.04

  • R 3.6.2

  • RStudio 1.1.463




License


MIT



    © Rahul Singh 2020 https://github.com/codecliff/FdupesAnalyzer


