{"id":19909092,"url":"https://github.com/dyne/harvest","last_synced_at":"2025-09-02T19:34:13.190Z","repository":{"id":15024152,"uuid":"77450028","full_name":"dyne/harvest","owner":"dyne","description":"Tool to sort large collections of files according to common typologies","archived":false,"fork":false,"pushed_at":"2022-11-15T10:35:31.000Z","size":1316,"stargazers_count":40,"open_issues_count":3,"forks_count":3,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-07T10:35:46.822Z","etag":null,"topics":["file-analysis","file-type-detection","forensics","tmsu","typology"],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dyne.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":"dyne","patreon":"dyneorg","open_collective":"dyne"}},"created_at":"2016-12-27T11:20:19.000Z","updated_at":"2024-11-25T16:24:55.000Z","dependencies_parsed_at":"2023-01-13T18:13:36.879Z","dependency_job_id":null,"html_url":"https://github.com/dyne/harvest","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dyne%2Fharvest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dyne%2Fharvest/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dyne%2Fharvest/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dyne%2Fharvest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dyne","download_url":"https://codeload.github.com/dyne/harve
st/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252133723,"owners_count":21699586,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["file-analysis","file-type-detection","forensics","tmsu","typology"],"created_at":"2024-11-12T21:14:20.880Z","updated_at":"2025-05-03T02:31:33.851Z","avatar_url":"https://github.com/dyne.png","language":"Shell","funding_links":["https://github.com/sponsors/dyne","https://patreon.com/dyneorg","https://opencollective.com/dyne"],"categories":[],"sub_categories":[],"readme":"# Harvest - manage large collections of files and dirs\n\nHarvest makes it easy to list files and folders by type and copy or\nmove them around.\n\n![Kant handle my swag](docs/kant_handle_my_swag.jpeg)\n\nHarvest is a compact and portable script to scan files and folders and\nrecognise their typology. Scanning is based on [file\nextensions](https://github.com/dyne/file-extension-list) and a simple\nfuzzy logic analysis of **folder contents** (not just files) to\nrecognise if they are related to video, audio or text materials, etc.\n\nHarvest is **fast**: it can read approximately 1GB of stored filenames\nper second and is operated from the console terminal. 
It never\nmodifies the filesystem: that is done explicitly by the user piping\nshell commands.\n\n[![Software by Dyne.org ](https://files.dyne.org/software_by_dyne.png)](https://dyne.org)\n\nHarvest operates on folders containing files without exploding the\nfiles around: it assesses the typology of a folder from the files\ncontained, but does not move the files outside of that folder. For\ninstance, it works very well to move around large collections of\ndownloaded torrent folders.\n\n## :floppy_disk: Installation\n\nHarvest is a Zsh script and works on any POSIX platform where it can be installed, including GNU/Linux, Apple/OSX and MS/Windows.\n\nInstall the latest harvest with:\n```\ncurl https://raw.githubusercontent.com/dyne/harvest/main/harvest | sudo tee /usr/local/bin/harvest\n```\n\nDependencies: `zsh`\n\nOptional:\n- `fuse tmsu` for tagged filesystem\n- `setfattr` for setting file attributes\n\n## :video_game: Usage\n\nScan a folder /PATH/ to show and save results\n```\n harvest scan [PATH]\n```\n\nList of supported category types:\n```\n code image video book text font web archiv sheet exec slide audio\n```\n\nMove all scanned text files in /PATH/ to /DEST/\n```\n harvest scan [PATH] | grep ';text;' | xargs -rn1 -I% mv % [DEST]\n```\n\nTag all file attributes in /PATH/ with `harvest.type` categories\n```\n harvest attr [PATH]\n```\n\nTag all files for use with TMSU (see the TMSU section below)\n```\n harvest tmsu [PATH]\n```\n\n\n## TMSU\n\nThis implementation supports tagged filesystems using [TMSU](https://github.com/oniony/TMSU).\n\nTo allow the navigation of files in the style of a [Semantic Filesystem](https://en.wikipedia.org/wiki/Semantic_file_system), Harvest supports [TMSU](https://tmsu.org/), a small utility to maintain a database of tags inside a hidden directory `.tmsu` in each harvested folder.\n\nTo initialise a `tmsu` database bootstrapped with harvest's tags in the currently harvested folder, do:\n```\nharvest 
tmsu\n```\nDirectories indexed this way can then be \"mounted\" (using fuse) and navigated:\n```\nharvest mount\n```\nInside the `$harvest` hidden subfolder (pointing to `.mnt` inside the folder) tags will become folders containing symbolic links to the actual tagged files. Any file manager that follows symbolic links can be used to navigate tags; tags will also be set as bookmarks in graphical file managers (GTK3 supported).\n\nIn addition to the tags view, there is also a queries folder in which you can run view queries by listing or creating new folders:\n```\nls -l \"$harvest/queries/text and 2018\"\n```\nThis automatic creation of the query folders makes it possible to use new file queries within the file chooser of a graphical program simply by typing the query in. Unwanted query folders can be safely removed.\n\nLimited tag management is also possible via the virtual filesystem. For example, one can remove specific tags from a file by deleting the symbolic link in the tag folder, or delete a tag by performing a recursive delete.\n\nTo unmount all TMSU semantic filesystems currently mounted, just do:\n```\nharvest umount\n```\nFurther TMSU operations are possible by working directly inside the directories that have been indexed using `harvest tmsu`; for more information see `tmsu help`. For instance, TMSU also detects duplicate files using `tmsu dupes`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdyne%2Fharvest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdyne%2Fharvest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdyne%2Fharvest/lists"}