Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.
Awesome Lists | Featured Topics | Projects
https://github.com/waikato-datamining/image-dataset-converter

For converting image annotation datasets from one format into another.
https://github.com/waikato-datamining/image-dataset-converter
conversion deep-learning image-dataset
Last synced: about 2 months ago
JSON representation
For converting image annotation datasets from one format into another.
Host: GitHub
URL: https://github.com/waikato-datamining/image-dataset-converter
Owner: waikato-datamining
License: mit
Created: 2024-02-06T22:52:39.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-05-16T23:15:07.000Z (7 months ago)
Last Synced: 2024-05-17T23:46:12.634Z (7 months ago)
Topics: conversion, deep-learning, image-dataset
Language: Python
Homepage:
Size: 287 KB
Stars: 0
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.rst
- License: LICENSE
Awesome Lists containing this project

README

        # image-dataset-converter

For converting image annotations datasets from one format into another.

Filters can be supplied as well, e.g., for cleaning up the data.

## Installation

Via PyPI:

```bash

pip install image-dataset-converter

```

The latest code straight from the repository:

```bash

pip install git+https://github.com/waikato-datamining/image-dataset-converter.git

```

## Docker

Docker images are available as well. Please see the following page por more information:

https://github.com/waikato-datamining/image-dataset-converter-all/tree/main/docker

## Dataset formats

The following dataset formats are supported:

| Domain               | Format                                                                        | Read                                       | Write                                | 

|:---------------------|:------------------------------------------------------------------------------|:-------------------------------------------|:-------------------------------------| 

| Image classification | [ADAMS](formats/adams.md)                                                     | [Y](plugins/from-adams-ic.md)              | [Y](plugins/to-adams-ic.md)          | 

| Image classification | [subdir](formats/subdir.md)                                                   | [Y](plugins/from-subdir-ic.md)             | [Y](plugins/to-subdir-ic.md)         | 

| Image segmentation   | [Blue-channel](formats/bluechannel.md)                                        | [Y](plugins/from-blue-channel-is.md)       | [Y](plugins/to-blue-channel-is.md)   | 

| Image segmentation   | [Grayscale](formats/grayscale.md)                                             | [Y](plugins/from-grayscale-is.md)          | [Y](plugins/to-grayscale-is.md)      | 

| Image segmentation   | [Indexed PNG](formats/indexedpng.md)                                          | [Y](plugins/from-indexed-png-is.md)        | [Y](plugins/to-indexed-png-is.md)    | 

| Image segmentation   | [Layer segments](formats/layersegments.md)                                    | [Y](plugins/from-layer-segments-is.md) | [Y](plugins/to-layer-segments-is.md) | 

| Object detection     | [ADAMS](formats/adams.md)                                                     | [Y](plugins/from-adams-od.md)              | [Y](plugins/to-adams-od.md)          | 

| Object detection     | [COCO](https://cocodataset.org/#format-data)                                  | [Y](plugins/from-coco-od.md)               | [Y](plugins/to-coco-od.md)           | 

| Object detection     | [OPEX](https://github.com/WaikatoLink2020/objdet-predictions-exchange-format) | [Y](plugins/from-opex-od.md)               | [Y](plugins/to-opex-od.md)           | 

| Object detection     | [ROI CSV](formats/roicsv.md)                                                  | [Y](plugins/from-roicsv-od.md)             | [Y](plugins/to-roicsv-od.md)         | 

| Object detection     | [VOC](formats/voc.md)                                                         | [Y](plugins/from-voc-od.md)                | [Y](plugins/to-voc-od.md)            | 

| Object detection     | [YOLO](formats/yolo.md)                                                       | [Y](plugins/from-yolo-od.md)               | [Y](plugins/to-yolo-od.md)           | 

## Tools

### Dataset conversion

```

usage: idc-convert [-h|--help|--help-all|--help-plugin NAME] [-u INTERVAL]

                   [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]

                   reader

                   [filter [filter [...]]]

                   [writer]

Tool for converting between image annotation dataset formats.

readers (15):

   from-adams-ic, from-adams-od, from-blue-channel-is, from-coco-od, 

   from-data, from-grayscale-is, from-indexed-png-is, 

   from-layer-segments-is, from-opex-od, from-pyfunc, from-roicsv-od, 

   from-subdir-ic, from-voc-od, from-yolo-od, poll-dir

filters (30):

   check-duplicate-filenames, coerce-box, coerce-mask, 

   convert-image-format, dimension-discarder, discard-invalid-images, 

   discard-negatives, filter-labels, inspect, label-from-name, 

   label-present, map-labels, max-records, metadata, metadata-from-name, 

   od-to-ic, od-to-is, passthrough, polygon-discarder, 

   polygon-simplifier, pyfunc-filter, randomize-records, record-window, 

   remove-classes, rename, sample, split-records, strip-annotations, 

   tee, write-labels

writers (14):

   to-adams-ic, to-adams-od, to-blue-channel-is, to-coco-od, to-data, 

   to-grayscale-is, to-indexed-png-is, to-layer-segments-is, to-opex-od, 

   to-pyfunc, to-roicsv-od, to-subdir-ic, to-voc-od, to-yolo-od

optional arguments:

  -h, --help            show basic help message and exit

  --help-all            show basic help message plus help on all plugins and exit

  --help-plugin NAME    show help message for plugin NAME and exit

  -u INTERVAL, --update_interval INTERVAL

                        outputs the progress every INTERVAL records (default: 1000)

  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}

                        the logging level to use (default: WARN)

  -b, --force_batch     processes the data in batches

```

### Executing pipeline multiple times

```

usage: idc-exec [-h] -p PIPELINE -g GENERATOR [-n] [-P PREFIX]

                [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]

Tool for executing a pipeline multiple times, each time with a different set

of variables expanded. A variable is surrounded by curly quotes (e.g.,

variable 'i' gets referenced with '{i}'). Available generators: dirs, list,

null, range

optional arguments:

  -h, --help            show this help message and exit

  -p PIPELINE, --pipeline PIPELINE

                        The pipeline template with variables to expand and

                        then execute. (default: None)

  -g GENERATOR, --generator GENERATOR

                        The generator plugin to use. (default: None)

  -n, --dry_run         Applies the generator to the pipeline template and

                        only outputs it on stdout. (default: False)

  -P PREFIX, --prefix PREFIX

                        The string to prefix the pipeline with when in dry-run

                        mode. (default: None)

  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}

                        The logging level to use. (default: WARN)

```

### Locating files

Readers tend to support input via file lists. The `idc-find` tool can generate

these.

```

usage: idc-find [-h] -i DIR [DIR ...] [-r] -o FILE [-m [REGEXP [REGEXP ...]]]

                [-n [REGEXP [REGEXP ...]]]

                [--split_ratios [SPLIT_RATIOS [SPLIT_RATIOS ...]]]

                [--split_names [SPLIT_NAMES [SPLIT_NAMES ...]]]

                [--split_name_separator SPLIT_NAME_SEPARATOR]

                [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]

Tool for locating files in directories that match certain patterns and store

them in files.

optional arguments:

  -h, --help            show this help message and exit

  -i DIR [DIR ...], --input DIR [DIR ...]

                        The dir(s) to scan for files. (default: None)

  -r, --recursive       Whether to search the directories recursively

                        (default: False)

  -o FILE, --output FILE

                        The file to store the located file names in (default:

                        None)

  -m [REGEXP [REGEXP ...]], --match [REGEXP [REGEXP ...]]

                        The regular expression that the (full) file names must

                        match to be included (default: None)

  -n [REGEXP [REGEXP ...]], --not-match [REGEXP [REGEXP ...]]

                        The regular expression that the (full) file names must

                        match to be excluded (default: None)

  --split_ratios [SPLIT_RATIOS [SPLIT_RATIOS ...]]

                        The split ratios to use for generating the splits

                        (int; must sum up to 100) (default: None)

  --split_names [SPLIT_NAMES [SPLIT_NAMES ...]]

                        The split names to use as filename suffixes for the

                        generated splits (before .ext) (default: None)

  --split_name_separator SPLIT_NAME_SEPARATOR

                        The separator to use between file name and split name

                        (default: -)

  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}

                        The logging level to use. (default: WARN)

```

### Generating help screens for plugins

```

usage: idc-help [-h] [-c [PACKAGE [PACKAGE ...]]] [-e EXCLUDED_CLASS_LISTERS]

                [-T {pipeline,generator}] [-p NAME] [-f {text,markdown}]

                [-L INT] [-o PATH] [-i FILE] [-t TITLE]

                [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]

Tool for outputting help for plugins in various formats.

optional arguments:

  -h, --help            show this help message and exit

  -c [PACKAGE [PACKAGE ...]], --custom_class_listers [PACKAGE [PACKAGE ...]]

                        The custom class listers to use, uses the default ones

                        if not provided. (default: None)

  -e EXCLUDED_CLASS_LISTERS, --excluded_class_listers EXCLUDED_CLASS_LISTERS

                        The comma-separated list of class listers to exclude.

                        (default: None)

  -T {pipeline,generator}, --plugin_type {pipeline,generator}

                        The types of plugins to generate the help for.

                        (default: pipeline)

  -p NAME, --plugin_name NAME

                        The name of the plugin to generate the help for,

                        generates it for all if not specified (default: None)

  -f {text,markdown}, --help_format {text,markdown}

                        The output format to generate (default: text)

  -L INT, --heading_level INT

                        The level to use for the heading (default: 1)

  -o PATH, --output PATH

                        The directory or file to store the help in; outputs it

                        to stdout if not supplied; if pointing to a directory,

                        automatically generates file name from plugin name and

                        help format (default: None)

  -i FILE, --index_file FILE

                        The file in the output directory to generate with an

                        overview of all plugins, grouped by type (in markdown

                        format, links them to the other generated files)

                        (default: None)

  -t TITLE, --index_title TITLE

                        The title to use in the index file (default: image-

                        dataset-converter plugins)

  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}

                        The logging level to use. (default: WARN)

```

### Plugin registry

```

usage: idc-registry [-h] [-c CUSTOM_CLASS_LISTERS] [-e EXCLUDED_CLASS_LISTERS]

                    [-l {plugins,pipeline,custom-class-listers,env-class-listers,readers,filters,writers,generators}]

For inspecting/querying the registry.

optional arguments:

  -h, --help            show this help message and exit

  -c CUSTOM_CLASS_LISTERS, --custom_class_listers CUSTOM_CLASS_LISTERS

                        The comma-separated list of custom class listers to

                        use. (default: None)

  -e EXCLUDED_CLASS_LISTERS, --excluded_class_listers EXCLUDED_CLASS_LISTERS

                        The comma-separated list of class listers to exclude.

                        (default: None)

  -l {plugins,pipeline,custom-class-listers,env-class-listers,readers,filters,writers,generators}, --list {plugins,pipeline,custom-class-listers,env-class-listers,readers,filters,writers,generators}

                        For outputting various lists on stdout. (default:

                        None)

```

### Testing generators

```

usage: idc-test-generator [-h] -g GENERATOR

                          [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]

Tool for testing generators by outputting the generated variables and their

associatd values. Available generators: dirs, list, null, range

optional arguments:

  -h, --help            show this help message and exit

  -g GENERATOR, --generator GENERATOR

                        The generator plugin to use. (default: None)

  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}

                        The logging level to use. (default: WARN)

```

## Plugins

You can find help screens for the plugins here:

* [Pipeline plugins](plugins/README.md) (readers/filters/writers)

* [Generator plugins](generators/README.md) (used by `idc-exec`)

## Command-line examples

Examples can be found on the [image-dataset-converter-examples](https://waikato-datamining.github.io/image-dataset-converter-examples/)

website.

## Class listers

The *image-dataset-converter* uses the *class lister registry* provided 

by the [seppl](https://github.com/waikato-datamining/seppl) library.

Each module defines a function, typically called `list_classes` that returns

a dictionary of names of superclasses associated with a list of modules that

should be scanned for derived classes. Here is an example:

```python

from typing import List, Dict

def list_classes() -> Dict[str, List[str]]:

    return {

        "seppl.io.Reader": [

            "mod.ule1",

            "mod.ule2",

        ],

        "seppl.io.Filter": [

            "mod.ule3",

            "mod.ule4",

        ],

        "seppl.io.Writer": [

            "mod.ule5",

        ],

    }

```

Such a class lister gets referenced in the `entry_points` section of the `setup.py` file:

```python

    entry_points={

        "class_lister": [

            "unique_string=module_name:function_name",

        ],

    },

```

`:function_name` can be omitted if `:list_classes`.

The following environment variables can be used to influence the class listers:

* `IDC_CLASS_LISTERS`

* `IDC_CLASS_LISTERS_EXCL`

Each variable is a comma-separated list of `module_name:function_name`, defining the class listers.

## Additional libraries

* [Image augmentation](https://github.com/waikato-datamining/image-dataset-converter-imgaug)

* [Image statistics](https://github.com/waikato-datamining/image-dataset-converter-imgstats)

* [Image visualizations](https://github.com/waikato-datamining/image-dataset-converter-imgvis)

* [Redis](https://github.com/waikato-datamining/image-dataset-converter-redis)

* [Video](https://github.com/waikato-datamining/image-dataset-converter-video)