{"id":17679992,"url":"https://github.com/costajob/file_scanner","last_synced_at":"2025-08-30T02:04:23.611Z","repository":{"id":56846398,"uuid":"97917590","full_name":"costajob/file_scanner","owner":"costajob","description":"A library to lazily collect a list of files by path and a set of filters.","archived":false,"fork":false,"pushed_at":"2017-08-28T13:13:35.000Z","size":59,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-07-27T17:22:34.729Z","etag":null,"topics":["filesystem","lazy-evaluation","ruby","scanner"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/costajob.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-07-21T07:09:36.000Z","updated_at":"2018-11-29T11:07:08.000Z","dependencies_parsed_at":"2022-09-09T01:01:08.091Z","dependency_job_id":null,"html_url":"https://github.com/costajob/file_scanner","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/costajob/file_scanner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Ffile_scanner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Ffile_scanner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Ffile_scanner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Ffile_scanner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/costajob","download_url":"https://codeload.github.com/costajob/file
_scanner/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Ffile_scanner/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272793018,"owners_count":24993830,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-30T02:00:09.474Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["filesystem","lazy-evaluation","ruby","scanner"],"created_at":"2024-10-24T09:05:03.807Z","updated_at":"2025-08-30T02:04:23.449Z","avatar_url":"https://github.com/costajob.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Table of Contents\n\n* [Scope](#scope)\n* [Motivation](#motivation)\n* [Installation](#installation)\n* [Usage](#usage)\n  * [Filters](#filters)\n    * [Defaults](#defaults)\n    * [Custom](#custom)\n  * [Worker](#worker)\n    * [Enumerator](#enumerator)\n    * [Consuming results](#consuming-results)\n    * [Mode](#mode)\n    * [File check](#file-check)\n    * [Logger](#logger)\n\n## Scope\nThis gem is aimed to lazily collect a list of files by path and a set of filters.\n\n## Motivation\nThis gem is helpful to purge obsolete files or to promote relevant ones, by calling external services (CDN APIs) and/or local file system actions (copy, move, delete, etc).  
\nBy working lazily, this library is designed to operate on a subset of a large list of files: just remember to apply a subset method to the final enumerator.\n\n## Installation\nAdd this line to your application's Gemfile:\n```ruby\ngem \"file_scanner\"\n```\n\nAnd then execute:\n```shell\nbundle\n```\n\nOr install it yourself as:\n```shell\ngem install file_scanner\n```\n\n## Usage\n\n### Filters\nThe first step is to provide the list of filters used to select file paths: a path is selected when a filter's `call` method returns a *truthy* value for it.  \n\n#### Defaults\nIf you specify no filters, the default ones are loaded, selecting files by:\n* checking whether the file is older than *30 days*\n* checking whether the file size is between *0KB and 5KB*\n* checking whether the file *basename matches* the specified *regexp* (if any)\n\nYou can change the behaviour of the default filters by passing custom arguments:\n```ruby\na_week_ago = FileScanner::Filters::LastAccess.new(Time.now-7*24*3600)\none_two_mb = FileScanner::Filters::SizeRange.new(min: 1024**2, max: 2*1024**2)\nhidden = FileScanner::Filters::MatchingName.new(/^\\./)\nfilters = [a_week_ago, one_two_mb, hidden]\n```\n\n#### Custom\nIt is convenient to create custom filters by using `Proc` instances that satisfy the `callable` protocol:\n```ruby\nfilters \u003c\u003c -\u003e(file) { File.directory?(file) }\n```\n\n### Worker\nThe second step is to create the `Worker` instance by providing the path to scan and the list of filters to apply.  
\n\n#### Enumerator\nThe `call` method of the worker returns a lazy enumerator with the filtered elements:\n```ruby\nworker = FileScanner::Worker.new(path: \"~/Downloads\", filters: filters, slice: 35)\np worker.call\n# =\u003e #\u003cEnumerator::Lazy: ...\n```\n\n#### Consuming results\nTo take advantage of the lazy behaviour, remember to call a subset method on the resulting enumerator:\n```ruby\nworker.call.take(1000).each do |file|\n  # perform action on filtered files\nend\n```\n\n#### Mode\nBy default the worker selects a path when any of the filters matches: a single matching filter is enough to select the path.  \nIf you want to restrict selection to paths that match all of the filters, specify the `all` option:\n```ruby\nworker = FileScanner::Worker.new(loader: loader, filters: filters, all: true)\nworker.call # will filter by applying all? predicate\n```\n\n#### File check\nBy default the worker collects both directories and files. \nIf you want to restrict selection to files only, specify the `filecheck` option:\n```ruby\nworker = FileScanner::Worker.new(loader: loader, filters: filters, filecheck: true)\nworker.call # skip directories\n```\n\n#### Logger\nIf you want to trace what the worker is doing (including errors), you can pass a logger to the worker:\n```ruby\nrequire \"logger\"\n\nmy_logger = Logger.new(\"my_file.log\")\nworker = FileScanner::Worker.new(loader: loader, logger: my_logger)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcostajob%2Ffile_scanner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcostajob%2Ffile_scanner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcostajob%2Ffile_scanner/lists"}