Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/timstaley/lofar_data_management
Scripts to ease removal of duplicate files from a multiple user system.
Last synced: about 11 hours ago
- Host: GitHub
- URL: https://github.com/timstaley/lofar_data_management
- Owner: timstaley
- Created: 2012-04-10T16:49:13.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2012-05-02T14:38:15.000Z (over 12 years ago)
- Last Synced: 2023-03-22T11:11:29.907Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 102 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: ReadMe
README
A few quick scripts to ease removal of duplicate files from a multiple user system.
Intended for use with 'findup' from the fslint package:
http://www.pixelbeat.org/fslint/
http://en.flossmanuals.net/FSlint/
'parse_findup_output.py' is a script to organize fslint/findup output and format it into a CSV file.
The files are scanned to identify who they belong to,
and then csv files are output to identify duplicates for which both/all copies belong to a single user.
This identifies the easy cases where a single user can decide which copy to retain.
In the case of LOFAR data, the script also scrapes some minimal tags (subband, obs. id) from the folder name.
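The owner-based filtering described above can be sketched roughly as follows. This is an illustration only, not the code in 'parse_findup_output.py'. It assumes the findup output lists one path per line with a blank line between duplicate groups, and the function names (parse_groups, file_owner, single_user_groups) are hypothetical.

```python
import os
import pwd
from collections import defaultdict


def parse_groups(text):
    """Split findup-style output into lists of paths.

    Assumption: one path per line, with a blank line between groups.
    """
    groups, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line:
            current.append(line)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def file_owner(path):
    """Return the login name owning `path`, or the numeric uid as a string."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # The uid has no entry in the password database.
        return str(uid)


def single_user_groups(groups, owner_of=file_owner):
    """Keep only the groups whose copies all belong to one user, keyed by user."""
    by_user = defaultdict(list)
    for paths in groups:
        owners = {owner_of(p) for p in paths}
        if len(owners) == 1:
            by_user[owners.pop()].append(paths)
    return dict(by_user)
```

Passing a different owner_of callable (for example a dictionary's get method) lets the grouping logic be tried out without touching real files.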
Usage:
Run findup, e.g. using:
./findup_script.sh /some_big_folder > big_folder_dupes.txt
or
./findup_script.sh /some_folder /some_other_folder > multiple_folder_dupes.txt
(You can tweak the search criteria as per the findup --help reference.)
Then run:
./parse_findup_output.py folder_dupes.txt
This outputs the summary file 'all_dupes.csv', and also a folder 'user_self_dupes' containing CSV files for the single-user duplicates.
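As a rough illustration of the per-user output step, the sketch below writes one CSV file per user into a 'user_self_dupes' folder. The column layout (group_id, path), the per-user file naming, and the function name write_dupe_csvs are assumptions made for this example; the actual script's CSV format may differ.

```python
import csv
import os


def write_dupe_csvs(groups_by_user, out_dir="user_self_dupes"):
    """Write one CSV per user listing that user's duplicate groups.

    `groups_by_user` maps a user name to a list of duplicate groups,
    where each group is a list of file paths.
    Hypothetical layout: one row per file, columns (group_id, path).
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for user, groups in sorted(groups_by_user.items()):
        csv_path = os.path.join(out_dir, f"{user}.csv")
        with open(csv_path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["group_id", "path"])
            for group_id, paths in enumerate(groups):
                for path in paths:
                    writer.writerow([group_id, path])
        written.append(csv_path)
    return written
```

Each user can then open their own file and decide which copy in each group to keep.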