{"id":13737503,"url":"https://github.com/davanstrien/hugit-cli","last_synced_at":"2025-04-05T10:12:40.500Z","repository":{"id":37103304,"uuid":"475954371","full_name":"davanstrien/hugit-cli","owner":"davanstrien","description":"push ImageFolder style image datasets to the 🤗 Hub from the command line","archived":false,"fork":false,"pushed_at":"2023-03-02T06:00:54.000Z","size":969,"stargazers_count":2,"open_issues_count":16,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-31T09:13:53.306Z","etag":null,"topics":["cli","datasets","huggingface-datasets"],"latest_commit_sha":null,"homepage":"https://hugit-cli.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davanstrien.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-03-30T16:05:31.000Z","updated_at":"2022-11-08T12:33:52.000Z","dependencies_parsed_at":"2024-01-12T04:45:45.791Z","dependency_job_id":"677f974a-4e3f-4fc5-934f-7fbcdd919d1f","html_url":"https://github.com/davanstrien/hugit-cli","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davanstrien%2Fhugit-cli","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davanstrien%2Fhugit-cli/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davanstrien%2Fhugit-cli/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davanstrien%2Fhugit-cli/manifests","owner_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/davanstrien","download_url":"https://codeload.github.com/davanstrien/hugit-cli/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247318746,"owners_count":20919484,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","datasets","huggingface-datasets"],"created_at":"2024-08-03T03:01:51.148Z","updated_at":"2025-04-05T10:12:40.480Z","avatar_url":"https://github.com/davanstrien.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Hugit\n\n[![PyPI](https://img.shields.io/pypi/v/hugit.svg)][pypi_]\n[![Status](https://img.shields.io/pypi/status/hugit.svg)][status]\n[![Python Version](https://img.shields.io/pypi/pyversions/hugit)][python version]\n[![License](https://img.shields.io/pypi/l/hugit)][license]\n\n[![Read the documentation at https://hugit.readthedocs.io/](https://img.shields.io/readthedocs/hugit-cli/latest.svg?label=Read%20the%20Docs)][read the docs]\n[![Tests](https://github.com/davanstrien/hugit-cli/workflows/Tests/badge.svg)][tests]\n[![Codecov](https://codecov.io/gh/davanstrien/hugit-cli/branch/main/graph/badge.svg)][codecov]\n\n[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit\u0026logoColor=white)][pre-commit]\n[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)][black]\n\n[pypi_]: https://pypi.org/project/hugit/\n[status]: https://pypi.org/project/hugit/\n[python version]: https://pypi.org/project/hugit\n[license]: 
https://opensource.org/licenses/MIT\n[read the docs]: https://hugit-cli.readthedocs.io/\n[tests]: https://github.com/davanstrien/hugit/actions?workflow=Tests\n[codecov]: https://app.codecov.io/gh/davanstrien/hugit\n[pre-commit]: https://github.com/pre-commit/pre-commit\n[black]: https://github.com/psf/black\n\n**Warning**: this code is very much a work in progress and is primarily intended for a particular workflow. It may not work well (or at all) for your workflow.\n\n`hugit` is a command line tool for loading ImageFolder style datasets into a 🤗 `datasets` `Dataset` and pushing it to the 🤗 Hub.\n\nThe primary goal of `hugit` is to help quickly get a local dataset into a format that can be used for training computer vision models. `hugit` was developed to support the workflow for [`flyswot`](https://github.com/davanstrien/flyswot/) where we wanted quicker iteration between creating new training data, training a model, and using the new model inside [`flyswot`](https://github.com/davanstrien/flyswot/).\n\n![hugit workflow diagram](/docs/assets/hugit-workflow.png)\n\n## Supported formats\n\nAt the moment **hugit** supports ImageFolder style datasets, i.e.:\n\n```bash\ndata/\n    dog/\n        dog1.jpg\n    cat/\n        cat.1.jpg\n\n```\n\n## Features\n\n- A command line interface for quickly loading a dataset stored on disk into a 🤗 `datasets.Dataset`\n- Push your local dataset to the 🤗 Hub\n- Get statistics about your dataset. These statistics focus on 'high level' statistics that would be useful to include in Datasheets and Model Cards. 
Currently these statistics include:\n  - label frequencies, organised by split\n  - train, test, valid split sizes\n\n## Installation\n\nYou can install _Hugit_ via [pip] from [PyPI], inside a virtual environment install `hugit` using\n\n```console\n$ pip install hugit\n```\n\nAlternatively, you can use [pipx](https://pypa.github.io/pipx/) to install `hugit`\n\n```console\n$ pipx install hugit\n```\n\n## Usage\n\nYou can see help for `hugit` using `hugit --help`\n\n\u003c!-- [[[cog\nimport cog\nfrom hugit import cli\nfrom click.testing import CliRunner\nrunner = CliRunner()\nresult = runner.invoke(cli.cli, [\"--help\"])\nhelp = result.output.replace(\"Usage: cli\", \"Usage: hugit\")\ncog.out(\n    \"```\\n{}\\n```\".format(help)\n)\n]]] --\u003e\n\n```\n\n Usage: hugit [OPTIONS] COMMAND [ARGS]...\n\n Hugit Command Line\n\n╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ --help      Show this message and exit.                                                                                                                                  │\n╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ convert_images                                     Convert images in directory to `save_format`                                                                          │\n│ push_image_dataset                                 Load an ImageFolder style dataset.                                                                                    
│\n╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n\n\n```\n\n\u003c!-- [[[end]]] --\u003e\n\nTo load an ImageFolder style dataset onto the 🤗 Hub you can use the `push_image_dataset` command.\n\n\u003c!-- [[[cog\nimport cog\nfrom hugit import cli\nfrom click.testing import CliRunner\nrunner = CliRunner()\nresult = runner.invoke(cli.cli, [\"push_image_dataset\", \"--help\"])\nhelp = result.output.replace(\"Usage: cli\", \"Usage: hugit\")\ncog.out(\n    \"```\\n{}\\n```\".format(help)\n)\n]]] --\u003e\n\n```\n\n Usage: hugit push_image_dataset [OPTIONS] DIRECTORY\n\n Load an ImageFolder style dataset.\n\n╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ *  --repo-id                                           TEXT     Repo id for the Hugging Face Hub [required]                                                              │\n│    --private/--no-private                                       Whether to keep dataset private on the Hub [default: private]                                            │\n│    --do-resize/--no-do-resize                                   Whether to resize images before upload [default: no-do-resize]                                           │\n│    --size                                              INTEGER  Size to resize image. This will be used on the shortest side of the image i.e. 
the aspect ratio will be  │\n│                                                                 maintained                                                                                               │\n│                                                                 [default: 224]                                                                                           │\n│    --preserve-file-path/--no-preserve-file-path                 preserve original file path [default: preserve-file-path]                                                │\n│    --ignore-verifications/--no-ignore-verifications             Whether to perform verifications on the file before loading into dataset [default: ignore-verifications] │\n│    --huggingface-hub-token                             TEXT     Hugging Face Hub authentication token  [default: ***]                                                    │\n│    --help                                                       Show this message and exit.                                                                              │\n╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n\n\n```\n\n\u003c!-- [[[end]]] --\u003e\n\nUnder the hood `hugit` uses [`typed-settings`](https://typed-settings.readthedocs.io/en/latest/index.html), which means that configuration can either be done through the command line or through a `TOML` file. See [usage] for more detailed discussion of how to use `hugit`.\n\n## Contributing\n\nIt is likely that _Hugit_ may only work for our particular workflow. 
With that said, if you have suggestions, please open an issue.\n\n## License\n\nDistributed under the terms of the [MIT license],\n_Hugit_ is free and open source software.\n\n## Issues\n\nIf you encounter any problems,\nplease [file an issue] along with a detailed description.\n\n## Credits\n\nThis project was generated from [@cjolowicz]'s [Hypermodern Python Cookiecutter] template.\n\n[@cjolowicz]: https://github.com/cjolowicz\n[cookiecutter]: https://github.com/audreyr/cookiecutter\n[mit license]: https://opensource.org/licenses/MIT\n[pypi]: https://pypi.org/\n[hypermodern python cookiecutter]: https://github.com/cjolowicz/cookiecutter-hypermodern-python\n[file an issue]: https://github.com/davanstrien/hugit/issues\n[pip]: https://pip.pypa.io/\n\n\u003c!-- github-only --\u003e\n\n[contributor guide]: https://github.com/davanstrien/hugit/blob/main/CONTRIBUTING.md\n[usage]: https://hugit-cli.readthedocs.io/en/latest/usage.html\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavanstrien%2Fhugit-cli","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavanstrien%2Fhugit-cli","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavanstrien%2Fhugit-cli/lists"}