Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/davanstrien/hugit-cli
push ImageFolder style image datasets to the 🤗 Hub from the command line
- Host: GitHub
- URL: https://github.com/davanstrien/hugit-cli
- Owner: davanstrien
- License: mit
- Created: 2022-03-30T16:05:31.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-03-02T06:00:54.000Z (over 1 year ago)
- Last Synced: 2024-08-04T03:09:24.746Z (3 months ago)
- Topics: cli, datasets, huggingface-datasets
- Language: Python
- Homepage: https://hugit-cli.readthedocs.io/en/latest/
- Size: 946 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 16
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# Hugit
[![PyPI](https://img.shields.io/pypi/v/hugit.svg)][pypi_]
[![Status](https://img.shields.io/pypi/status/hugit.svg)][status]
[![Python Version](https://img.shields.io/pypi/pyversions/hugit)][python version]
[![License](https://img.shields.io/pypi/l/hugit)][license]
[![Read the documentation at https://hugit.readthedocs.io/](https://img.shields.io/readthedocs/hugit-cli/latest.svg?label=Read%20the%20Docs)][read the docs]
[![Tests](https://github.com/davanstrien/hugit-cli/workflows/Tests/badge.svg)][tests]
[![Codecov](https://codecov.io/gh/davanstrien/hugit-cli/branch/main/graph/badge.svg)][codecov]
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)][pre-commit]
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)][black]

[pypi_]: https://pypi.org/project/hugit/
[status]: https://pypi.org/project/hugit/
[python version]: https://pypi.org/project/hugit
[license]: https://opensource.org/licenses/MIT
[read the docs]: https://hugit-cli.readthedocs.io/
[tests]: https://github.com/davanstrien/hugit/actions?workflow=Tests
[codecov]: https://app.codecov.io/gh/davanstrien/hugit
[pre-commit]: https://github.com/pre-commit/pre-commit
[black]: https://github.com/psf/black

**Warning**: this code is very much a work in progress and is primarily intended for a particular workflow. It may not work well (or at all) for your workflow.
`hugit` is a command line tool for loading ImageFolder style datasets into a 🤗 `datasets` `Dataset` and pushing to the 🤗 hub.
The primary goal of `hugit` is to help quickly get a local dataset into a format that can be used for training computer vision models. `hugit` was developed to support the workflow for [`flyswot`](https://github.com/davanstrien/flyswot/) where we wanted a quicker iteration between creating new training data, training a model, and using the new model inside [`flyswot`](https://github.com/davanstrien/flyswot/).
![hugit workflow diagram](/docs/assets/hugit-workflow.png)
## Supported formats
At the moment **hugit** supports ImageFolder style datasets, i.e.:
```bash
data/
    dog/
        dog1.jpg
    cat/
        cat.1.jpg
```
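
To make the layout above concrete, here is a minimal sketch of how labels can be read off an ImageFolder style directory using only the Python standard library. It is not how `hugit` itself loads data; the function names and the set of image extensions are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path

# Assumption: the extensions treated as images here are illustrative,
# not the list that hugit itself accepts.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_labelled_images(root):
    """Return sorted (label, path) pairs; the label is the parent folder name."""
    root = Path(root)
    pairs = []
    # An ImageFolder layout is exactly two levels deep: <root>/<label>/<image>.
    for path in root.glob("*/*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            pairs.append((path.parent.name, path))
    return sorted(pairs)


def label_counts(root):
    """Count how many images sit under each label folder."""
    return Counter(label for label, _ in list_labelled_images(root))
```

For the example tree, `label_counts("data")` would report one image each for `dog` and `cat`.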
## Features
- A command line interface for quickly loading a dataset stored on disk into a 🤗 `datasets.Dataset`
- Push your local dataset to the 🤗 hub
- Get statistics about your dataset. These focus on 'high level' statistics that would be useful to include in Datasheets and Model Cards. Currently these statistics include:
  - label frequencies, organised by split
  - train, test, valid split sizes

## Installation
You can install _Hugit_ via [pip] from [PyPI]. Inside a virtual environment, install `hugit` using:
```console
$ pip install hugit
```

Alternatively, you can use [pipx](https://pypa.github.io/pipx/) to install `hugit`:
```console
$ pipx install hugit
```

## Usage

You can see help for `hugit` using `hugit --help`:
```
Usage: hugit [OPTIONS] COMMAND [ARGS]...

Hugit Command Line

╭─ Options ──────────────────────────────────────────────────────────╮
│ --help              Show this message and exit.                    │
╰────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────╮
│ convert_images      Convert images in directory to `save_format`   │
│ push_image_dataset  Load an ImageFolder style dataset.             │
╰────────────────────────────────────────────────────────────────────╯
```
To load an ImageFolder style dataset onto the 🤗 Hub you can use the `push_image_dataset` command.
```
Usage: hugit push_image_dataset [OPTIONS] DIRECTORY

Load an ImageFolder style dataset.

╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * --repo-id                                         TEXT     Repo id for the Hugging Face Hub [required]                      │
│   --private/--no-private                                     Whether to keep dataset private on the Hub [default: private]    │
│   --do-resize/--no-do-resize                                 Whether to resize images before upload [default: no-do-resize]   │
│   --size                                            INTEGER  Size to resize image. This will be used on the shortest side of  │
│                                                              the image i.e. the aspect ratio will be maintained               │
│                                                              [default: 224]                                                   │
│   --preserve-file-path/--no-preserve-file-path               preserve original file path [default: preserve-file-path]        │
│   --ignore-verifications/--no-ignore-verifications           Whether to perform verifications on the file before loading into │
│                                                              dataset [default: ignore-verifications]                          │
│   --huggingface-hub-token                           TEXT     Hugging Face Hub authentication token [default: ***]             │
│   --help                                                     Show this message and exit.                                      │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
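
The `--size` option above is described as applying to the shortest side of the image while keeping the aspect ratio. As a rough sketch of that arithmetic (the function name and the rounding behaviour are assumptions, not taken from the `hugit` source), the target dimensions could be computed like this:

```python
def shortest_side_resize(width, height, size=224):
    """Return (new_width, new_height) with the shorter side scaled to `size`."""
    if width <= 0 or height <= 0 or size <= 0:
        raise ValueError("width, height and size must be positive")
    # One scale factor for both sides keeps the aspect ratio unchanged.
    scale = size / min(width, height)
    # The shorter side becomes exactly `size`; the longer side is rounded.
    # Assumption: hugit may round differently.
    if width <= height:
        return size, max(1, round(height * scale))
    return max(1, round(width * scale)), size
```

Under these assumptions a 640x480 image with the default size of 224 would come out as 299x224.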
Under the hood `hugit` uses [`typed-settings`](https://typed-settings.readthedocs.io/en/latest/index.html), which means that configuration can be done either through the command line or through a `TOML` file. See [usage] for a more detailed discussion of how to use `hugit`.
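
For the command line route, the sketch below assembles a `push_image_dataset` invocation from the options listed in the help output. It only builds and prints the command rather than running it. The helper name is hypothetical, and `data/` and `username/my-dataset` are placeholder values.

```python
import shlex


def build_push_command(directory, repo_id, private=True, do_resize=False, size=224):
    """Assemble a `hugit push_image_dataset` command as a list of arguments."""
    command = ["hugit", "push_image_dataset", "--repo-id", repo_id]
    command.append("--private" if private else "--no-private")
    if do_resize:
        # --size only matters when resizing is switched on.
        command += ["--do-resize", "--size", str(size)]
    # The usage line puts DIRECTORY after the options.
    command.append(directory)
    return command


# Placeholder values: substitute your own directory and Hub repo id.
command = build_push_command("data/", "username/my-dataset", do_resize=True)
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run` would execute the command, provided `hugit` is installed and a Hub token is available.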
## Contributing
It is likely that _Hugit_ will only work for our particular workflow. With that said, if you have suggestions, please open an issue.
## License
Distributed under the terms of the [MIT license],
_Hugit_ is free and open source software.

## Issues
If you encounter any problems,
please [file an issue] along with a detailed description.

## Credits
This project was generated from [@cjolowicz]'s [Hypermodern Python Cookiecutter] template.
[@cjolowicz]: https://github.com/cjolowicz
[cookiecutter]: https://github.com/audreyr/cookiecutter
[mit license]: https://opensource.org/licenses/MIT
[pypi]: https://pypi.org/
[hypermodern python cookiecutter]: https://github.com/cjolowicz/cookiecutter-hypermodern-python
[file an issue]: https://github.com/davanstrien/hugit/issues
[pip]: https://pip.pypa.io/
[contributor guide]: https://github.com/davanstrien/hugit/blob/main/CONTRIBUTING.md
[usage]: https://hugit-cli.readthedocs.io/en/latest/usage.html