{"id":23119542,"url":"https://github.com/programmerdan/fieldscan","last_synced_at":"2025-07-06T14:35:50.145Z","repository":{"id":14171773,"uuid":"16877841","full_name":"ProgrammerDan/fieldscan","owner":"ProgrammerDan","description":"File Duplication Detector and Drive Scanner","archived":false,"fork":false,"pushed_at":"2014-04-04T03:11:39.000Z","size":400,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-09T14:24:04.360Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ProgrammerDan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-02-16T03:58:36.000Z","updated_at":"2014-04-04T03:11:36.000Z","dependencies_parsed_at":"2022-09-23T20:11:11.341Z","dependency_job_id":null,"html_url":"https://github.com/ProgrammerDan/fieldscan","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProgrammerDan%2Ffieldscan","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProgrammerDan%2Ffieldscan/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProgrammerDan%2Ffieldscan/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProgrammerDan%2Ffieldscan/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ProgrammerDan","download_url":"https://codeload.github.com/ProgrammerDan/fieldscan/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://gi
thub.com","kind":"github","repositories_count":247112728,"owners_count":20885605,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-17T05:39:16.070Z","updated_at":"2025-04-04T02:41:31.973Z","avatar_url":"https://github.com/ProgrammerDan.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"fieldscan\n=========\n\nFile Duplication Detector and Drive Scanner\n\nby ProgrammerDan (Daniel Boston)\nStarted February 15, 2014\n\nOverview\n========\n\nI've played with this idea for a while. Basically, I need a simple tool\nto index a file system and flexibly identify duplicates.\n\nMany different architectures have come and gone from favor in my mind\nover the past few years. For this incarnation, I'm attempting a\nsimpler approach (believe it or not).\n\nUltimately I'll put a web service frontend on this, but to start, the\napp takes a folder as its starting point on invocation. It expects a\ncertain amount of configuration, such as a datasource configuration\nand the like. If that configuration is correct, it will recursively navigate\nthe folder and its child folders (without following links). If prior\ndata on these folders and files exists, it will update the existing data;\nif no data is found, it will create it. For each file node, it\nconstructs a set of datapoint \"features\" that can be used to identify\nduplicates. Which features are collected depends on the file type. Once the\nanalysis is complete (or simultaneously, if I get multi-core support going),\nthe collected data will be progressively scanned for duplicates.
\n\nAnyway, this is an ongoing adventure, one I've started and stopped many times\nalong the way. Hopefully I'll stick with this one for a bit.\n\nDetails\n=======\n\nAlthough overall I'm focusing on a simple implementation, simple means\nsomething different to me than to other people. I'm more interested in doing\nthings right than doing them quickly, and that leads me to spend more time\non details others might skip over.\n\nCase in point: I'm building a full-featured DAO framework for this project.\nI didn't intend to do so at first, but I've grown to truly love the\neventual simplicity of a DAO framework. I modelled mine after\na standard GenericDAO template: a client-facing set of interfaces is\nall the main classes use, and the implementations are fully\nhidden behind a DAO factory. This lets me pull out and rewrite the backend,\nwrite full-featured tests, and basically do anything I want to the storage\nlayer without touching any of the application code that uses it. Keeping the DAO\nseparate from the application logic is just a good idea, so good that,\nin spite of my intention to skip it this time, I just couldn't.\n\nOne thing I am avoiding is Spring. I've worked with it recently, and while\nit is a super powerful framework, I realized that (1) it's too magical,\n(2) it's too bloated, and (3) it was keeping me from truly understanding\nsome of the underlying wire-up mechanisms. So this project, instead, is\nforcing me to really look at, consider, and carefully curate my library\nchoices.\n\nIn addition to the DAO layer, my (current) database stack is Hibernate with\nDBCP connection pooling on PostgreSQL. This is to force myself to get\nintimately familiar with Hibernate, and because I want to learn DBCP better. 
PostgreSQL\nis my go-to database, and I have a feeling this project might eventually\nwind up hosted on AWS, which supports PostgreSQL via RDS, a technology\nI'm very interested in for both business and personal reasons.\n\nThis is also proving to be a great opportunity to start using the java.nio\nclasses. Although they've been around for quite a while now, I've\nnever had a good reason to truly learn them. What I'm finding is awesome\nso far, even if it's a lot to digest compared to the far simpler java.io\nclasses and how they deal with the file system. I already regret skipping\nthese classes all these years.\n\nUltimately, most of the code I'm writing will be the pieces that\nsurround the main algorithm, which is very simple (as described in the\noverview above). The true challenge will be in the NodeProcessors, which I'm\ntrying to make as flexible as possible, going so far as to use a registration\napproach. This will allow me to grow my file processing abilities as I have time.\nI'm still figuring out exactly how to implement deduplication scanning\nwithin this structure, but it'll work out, and in the meantime I've got a great\nframework for cataloging a filesystem.\n\nExpect more details as I continue.\n\nBuild\n=====\n\nI use jpa-modelgen, so be sure to include \"clean\" in your build command:\n\n    mvn clean package\n\netc.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprogrammerdan%2Ffieldscan","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprogrammerdan%2Ffieldscan","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprogrammerdan%2Ffieldscan/lists"}