{"id":22780001,"url":"https://github.com/babab/tagfile","last_synced_at":"2025-06-14T16:04:07.697Z","repository":{"id":32452704,"uuid":"36032073","full_name":"babab/tagfile","owner":"babab","description":"Search, index and tag your files and find duplicates.","archived":false,"fork":false,"pushed_at":"2023-12-04T23:09:44.000Z","size":456,"stargazers_count":4,"open_issues_count":3,"forks_count":0,"subscribers_count":3,"default_branch":"devel","last_synced_at":"2025-05-21T19:49:45.163Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://babab.codeberg.page/tagfile","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"damienpontifex/SwiftOpenCL","license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/babab.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-05-21T19:06:01.000Z","updated_at":"2024-09-23T15:46:00.000Z","dependencies_parsed_at":"2025-04-15T14:42:01.019Z","dependency_job_id":"d842b4f6-1f9c-4b9e-b1ac-2c6f8c48d385","html_url":"https://github.com/babab/tagfile","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/babab/tagfile","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/babab%2Ftagfile","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/babab%2Ftagfile/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/babab%2Ftagfile/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/babab%2Ftagfile/manifests","owner_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/owners/babab","download_url":"https://codeload.github.com/babab/tagfile/tar.gz/refs/heads/devel","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/babab%2Ftagfile/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259843315,"owners_count":22920308,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-11T20:12:01.692Z","updated_at":"2025-06-14T16:04:07.685Z","avatar_url":"https://github.com/babab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tagfile\n\n## Introduction\n\nSearch, index and tag your files and find duplicates.\n\nThe goal of tagfile is to manage and organize any sort of file\n(documents, music, pictures and videos in particular) in a way that is\nnot tied to any file browser program, filesystem or operating system.\nThe metadata that tagfile creates and uses to keep track of these\nfiles should be portable for use in multiple computer systems and be\nindependent from any persistent mount points, filepaths or filenames.\n\nTagfile is primarily a unixy command-line application with a focus on\nsimplicity, interactivity and scriptability through (shell) scripts. 
It\nis shaping up to be an amalgamation of features from applications such\nas locate, ls, find, file, cksum, sort and grep performed in constrained\nscopes of specific sets of files that are defined and controlled by the\nuser in one or more sqlite databases.\n\nIt also is a package for Python (but the API is unstable at this point).\n\n## Index\n\n\u003c!-- vim-markdown-toc GFM --\u003e\n\n* [Homes](#homes)\n* [Features](#features)\n* [Quick Manual](#quick-manual)\n    * [Adding paths, indexing files and housekeeping](#adding-paths-indexing-files-and-housekeeping)\n    * [Finding duplicate files with clones command](#finding-duplicate-files-with-clones-command)\n    * [Usage examples: listing, searching and filtering files](#usage-examples-listing-searching-and-filtering-files)\n* [Help and synopses for commands](#help-and-synopses-for-commands)\n* [Installing tagfile](#installing-tagfile)\n* [Relation between media-paths, databases and config files](#relation-between-media-paths-databases-and-config-files)\n* [Status](#status)\n* [Software license](#software-license)\n\n\u003c!-- vim-markdown-toc --\u003e\n\n## Homes\n\n-   Docs: \u003chttps://babab.codeberg.page/tagfile\u003e\n-   Codeberg: \u003chttps://codeberg.org/babab/tagfile\u003e\n-   Github: \u003chttps://github.com/babab/tagfile\u003e\n\n## Features\n\n-   scan all files in a directory (media-path) recursively\n-   ignore files when scanning according to rules in user config\n-   maintain a list of media-paths to prune/scan on a regular basis\n-   index files with their checksums, size and MIME-type into a sqlite\n    database\n-   show a list of files in index, sortable by checksum, size and\n    mimetype\n-   find duplicate files, based on checksums\n-   find files by matching on checksum, mimetype, size, name or\n    substring of name and/or path\n-   prune index from files that got moved or deleted\n-   print results of *list* and *find* commands terminated with a null\n    character to use for 
piping to other utilities like xargs.\n-   configure aliases for certain commands and options (like git alias)\n\nFeatures to be implemented in later versions:\n\n-   remove duplicate files in the same directory\n-   remove duplicate files interactively across directories\n-   add user-defined tags to files (using checksums, independent from\n    filenames)\n\nIdeas that may or may not be implemented in later versions:\n\n-   ability to filter files using tags to create listings to use with\n    other programs\n-   ability to use tags to create directory structures of symlinked\n    content\n\n## Quick Manual\n\n### Adding paths, indexing files and housekeeping\n\nOpen a terminal, and add one or more media-paths to be scanned for files:\n\n``` console\ncd ~/Music\ntagfile add .\ntagfile add ~/Videos\n```\n\nThis will only save a reference to the directory. To actually walk\nthrough the directories and hash the files to get checksums, you can use\nthe `updatedb` command. This will recursively scan all media-paths\nyou've added and may take some time, especially the first time.\n\nFor both the prune and scan actions, progress bars will be shown with\nestimates of the remaining time to complete. Add a `--verbose` flag to\nalso output every filename and action performed. 
Use Ctrl+C to cancel.\nAll progress already done will be saved.\n\n``` console\ntagfile updatedb\n```\n\nTo see statistics of indexed files and a list of media-paths:\n\n``` console\ntagfile info\n```\n\n### Finding duplicate files with clones command\n\nShow duplicate files using the clones command with/without option flags\n(see `tagfile help clones` to see all available options):\n\n``` console\ntagfile clones\n```\n\nShow hash, path, size and full MIME-type (using long opts):\n\n``` console\ntagfile clones --show-size --show-mime\n```\n\nShow hash, path, size and first part of MIME-type (using short opts):\n\n``` console\ntagfile clones -st\n```\n\n### Usage examples: listing, searching and filtering files\n\nThe list and find commands are the most important part of tagfile and\nprobably the reason why you might want to use it. What follows are some\nusage examples with both short and long optional arguments.\n\nList all files sorted by filesize (showing checksum, filesize and\nmimetype columns):\n\n``` console\ntagfile list -aS size\ntagfile list --show-all --sort=size\n```\n\nList all files with MIME-type text/plain sorted by filesize from small\nto big (showing checksum, filesize and mimetype columns):\n\n``` console\ntagfile find --mime text/plain -a -S size\ntagfile find --mime=text/plain --show-all --sort=size\n```\n\nList all files, sorted by filetype (showing checksum, size and type):\n\n``` console\ntagfile list -HstS type\ntagfile list --show-hash --show-size --show-type --sort=type\n```\n\nList all videos larger than 100 MiB, sorted by filesize from big to small\n(showing type and filesize):\n\n``` console\ntagfile find --type video --size-gt 104857600 -stS size --reverse\ntagfile find --type video --size-gt 104857600 --show-size --show-type --sort=size --reverse\n```\n\n## Help and synopses for commands\n\n\u003cdetails\u003e\u003csummary\u003etagfile\u003c/summary\u003e\n\n``` console\nUsage: tagfile [--config \u003cfilename\u003e] [--db 
\u003cname\u003e] \u003ccommand\u003e\n   or: tagfile [-h | --help] | [-V | --version]\n\nSearch, index and tag your files and find duplicates\n\nOptions:\n--config=\u003cfilename\u003e  use specified config file\n--db=\u003cname\u003e          use database \u003cname\u003e, defined in config file\n-h, --help           show this help information\n-V, --version        show version and platform information\n\nCommands:\n  add        add a directory to media paths\n  clones     show all indexed duplicate files\n  find       find files according to certain criterias\n  help       show help information\n  info       show statistics for index and media paths\n  list       show all indexed files\n  updatedb   scan media paths and index newly added files\n  version    show version and platform information\n\nSee 'tagfile help \u003ccommand\u003e' for more information on a\nspecific command, before using it.\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003etagfile add\u003c/summary\u003e\n\n``` console\nusage: tagfile add [-q | --quiet] \u003cmedia-path\u003e\n   or: tagfile add [-h | --help]\n\nAdd a directory to media paths\n\nOptions:\n-h, --help   show this help information\n-q, --quiet  print nothing except fatal errors\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003etagfile clones\u003c/summary\u003e\n\n``` console\nusage: tagfile clones [-s | --show-size] [-t | --show-type] [-m | --show-mime]\n   or: tagfile clones [-h | --help]\n\nShow files with matching checksums. In this overview the column\nwith hashes is always printed. Add `-stm` flags to display more\ncolumns.\n\nBy default, an extra line is printed after each list of clones,\nshowing the total number of duplicates. 
This can be hidden with\n`--hide-sum`.\n\nOptions:\n-h, --help       show this help information\n-s, --show-size  display column with filesizes\n-t, --show-type  display column with MIME type\n-m, --show-mime  display column with MIME type/subtype\n--hide-sum       do not print \"X clones/duplicates\" line\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003etagfile find\u003c/summary\u003e\n\n``` console\nusage: tagfile find [--type=TYPE] [--mime=MIMETYPE] [--size-gt=BYTES]\n                    [--size-lt=BYTES] [--hash=HEX] [--in-path=STRING]\n                    [--name=NAME | --in-name=STRING] [-H | --show-hash]\n                    [-s | --show-size] [-t | --show-type] [-m | --show-mime]\n                    [-a | --show-all] [-S COL | --sort=COL] [--reverse]\n\n   or: tagfile find [--type=TYPE] [--mime=MIMETYPE] [--size-gt=BYTES]\n                    [--size-lt=BYTES] [--hash=HEX] [--in-path=STRING]\n                    [--name=NAME | --in-name=STRING] [-0 | --print0]\n                    [-S COL | --sort=COL] [--reverse]\n\n   or: tagfile find [-h | --help]\n\nFind files according to certain criterias\n\nOptions:\n-h, --help          show this help information\n--type=TYPE         match files on 1st part of MIME type\n--mime=MIMETYPE     match files on full MIME type/subtype\n--size-gt=BYTES     match files where size is greater than BYTES\n--size-lt=BYTES     match files where size is lesser than BYTES\n--hash=HEX          match files where checksum is (or starts with) HEX\n--in-path=STRING    match absolute paths with a substring of STRING\n--name=NAME         match filenames that are exactly NAME\n--in-name=STRING    match filenames with a substring of STRING\n-H, --show-hash     display column with checksum hash\n-s, --show-size     display column with filesizes\n-t, --show-type     display column with MIME type\n-m, --show-mime     display column with MIME type/subtype\n-a, --show-all      display hash, size, mime (same as -Hsm)\n-S 
COL, --sort=COL  sort on: name, hash, size, type or mime\n--reverse           reverse sort order\n-0, --print0        end lines with null instead of newline\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003etagfile help\u003c/summary\u003e\n\n``` console\nusage: tagfile help [\u003ccommand\u003e]\n\nShow usage information (for subcommands)\n\nOptions:\n-h, --help  show usage information for help command\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003etagfile info\u003c/summary\u003e\n\n``` console\nusage: tagfile info [-C | --show-config]\n   or: tagfile info [-h | --help]\n\nShow media paths, user config and statistics for index.\n\nOptions:\n-h, --help         show this help information\n-C, --show-config  pretty print active config in python\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003etagfile list\u003c/summary\u003e\n\n``` console\nusage: tagfile list [-H | --show-hash] [-s | --show-size] [-t | --show-type]\n                    [-m | --show-mime] [-a | --show-all] [-S COL | --sort=COL]\n                    [--reverse]\n\n   or: tagfile list [-0 | --print0] [-S COL | --sort=COL] [--reverse]\n\n   or: tagfile list [-h | --help]\n\nOutput a list of all indexed files.\nBy default, the list is sorted on file path.\n\nOptions:\n-h, --help          show this help information\n-H, --show-hash     display column with checksum hash\n-s, --show-size     display column with filesizes\n-t, --show-type     display column with MIME type\n-m, --show-mime     display column with MIME type/subtype\n-a, --show-all      display hash, size, mime (same as -Hsm)\n-S COL, --sort=COL  sort on: name, hash, size, type or mime\n--reverse           reverse sort order\n-0, --print0        end lines with null instead of newline\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003etagfile updatedb\u003c/summary\u003e\n\n``` console\nusage: tagfile updatedb [-v, --verbose] [-q, --quiet] [--prune] [--scan]\n    
                    [-n ID, --path-id=ID]\n\n   or: tagfile updatedb [-h | --help]\n\nScan media paths. Index added files and prune removed files.\n\nUse the option `--prune` if you only want to remove entries\nfrom the index if files are missing. Use the option `--scan`\nto only scan for newly added files without pruning.\n\nTo prune and/or scan for a single media-path only, use\n`--path-id=ID`. See tagfile info for an overview of paths/ID's.\n\nOptions:\n-h, --help           show this help information\n-v, --verbose        display a message for every action\n-q, --quiet          display nothing except fatal errors\n--prune              prune removed files only; don't scan\n--scan               scan for new files only; don't prune\n-n ID, --path-id=ID  prune/scan only files in path with this id\n\nWhen no options are specified, updatedb will both scan and prune.\nIt will always prune deleted files before scanning for new files.\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003etagfile version\u003c/summary\u003e\n\n``` console\nusage: tagfile version [-h | --help]\n\nShow version and platform information\n\nOptions:\n-h, --help  show this help information\n```\n\n\u003c/details\u003e\n\n## Installing tagfile\n\n**All commands should be run as a regular user (not root).**\n\nTagfile is a command-line end-user application written in Python that is\ndependent on packages from PyPI. You can install it using pip. 
But using\npipx (\u003chttps://pypa.github.io/pipx/\u003e) is recommended because it avoids\ndependency problems and/or clashes with Python packages from your\nsystem's package manager in the future.\n\nInstall latest **release** from PyPI:\n\n``` console\npipx install tagfile\n```\n\nInstall latest **development version** from git:\n\n``` console\npipx install git+https://github.com/babab/tagfile@devel\n```\n\nTo build and install **from source** you can use:\n\n``` console\nmake install\n```\n\nTo **upgrade** or **uninstall** tagfile in the future you can use:\n\n``` console\npipx upgrade tagfile\npipx uninstall tagfile\n```\n\n\n## Relation between media-paths, databases and config files\n\nBy default, tagfile uses one config file and one database.\n\nA config file:\n\n-   Contains a single set of ignore rules for all databases.\n-   Defines one or more databases. New databases must be defined in the\n    config `[databases]` section with a `name = \"location-path\"`\n    key-value pair.\n-   Can be specified with the tagfile `--config=FILENAME` option\n\nA database:\n\n-   Can contain zero, one or multiple media-paths.\n-   The most used commands/actions (add, find, list and updatedb) are\n    performed in a database-wide scope.\n-   The default database to use can be:\n\n    \u003e -   configured in the config file `default_database = \"name\"`\n    \u003e     setting.\n    \u003e -   specified with the tagfile `--db=NAME` option\n\nA media-path is a parent directory that contains one or more files you\nwant to index. By scanning with `updatedb`, tagfile will walk\nrecursively through all subdirectories and add any file that does not\nmatch the ignore rules from the config. 
Any files that are indexed but\nremoved in the filesystem itself afterwards, will be pruned from the\nindex on the next run of `updatedb`.\n\n## Status\n\n**Until a stable version 1.0.0 is ready, the API, CLI and config\nsettings are subject to change from 0.x version to 0.x version, likely\nwithout offering migrations.** Tagfile adheres to [Semantic\nVersioning](https://semver.org).\n\n-   Current stable release: **v0.1.0**\n-   Current dev/git version: *v0.2.0a13*\n\nTagfile has been written in a short time and used by me sporadically for\n8 years after that. All code was contained in a single file script in\n`~/bin`, available from Github only.\n\nStarting in March 2023 I've decided to properly release it to PyPI and\nflesh out the current project structure, command interface and database\nhandling before working on new features so it may live up to its name.\nSince at this moment in time, you cannot tag your files yet :)\n\nPrerequisites:\n\n-   Python 3.8 or later\n\nDependencies (automatically installed with pipx / pip):\n\n-   Peewee ORM (\u003chttps://peewee.readthedocs.org/en/latest/\u003e)\n-   pycommand (\u003chttps://babab.github.io/pycommand/\u003e)\n-   python-magic (\u003chttps://pypi.python.org/pypi/python-magic/\u003e)\n-   rich (\u003chttps://pypi.python.org/pypi/rich/\u003e)\n\n## Software license\n\nCopyright (c) 2015-2023 Benjamin Althues \\\u003cbenjamin at babab . nl\\\u003e\n\nTagfile is open source software, licensed under a BSD-3-Clause license.\nSee the [LICENSE](https://github.com/babab/tagfile/blob/devel/LICENSE)\nfile for the full license text.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbabab%2Ftagfile","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbabab%2Ftagfile","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbabab%2Ftagfile/lists"}