https://github.com/soroushj/image-dataset-loader

Load image datasets as NumPy arrays

Host: GitHub
URL: https://github.com/soroushj/image-dataset-loader
Owner: soroushj
License: mit
Created: 2019-06-26T12:22:22.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2024-03-26T21:11:10.000Z (over 1 year ago)
Last Synced: 2025-04-03T20:18:42.795Z (9 months ago)
Topics: dataset, image-dataset, machine-learning, numpy-arrays, numpy-data
Language: Python
Homepage:
Size: 31.3 KB
Stars: 6
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# image-dataset-loader: Load image datasets as NumPy arrays

[![PyPI](https://img.shields.io/pypi/v/image-dataset-loader.svg)](https://pypi.org/project/image-dataset-loader/)

[![MIT license](https://img.shields.io/badge/license-MIT-brightgreen.svg)](https://opensource.org/licenses/MIT)

## Installation

```bash
pip install image-dataset-loader
```

## Overview

Suppose you have an image dataset in a directory which looks like this:

```
data/
  train/
    cats/
      cat0001.jpg
      cat0002.jpg
      ...
    dogs/
      dog0001.jpg
      dog0002.jpg
      ...
  test/
    cats/
      cat0001.jpg
      cat0002.jpg
      ...
    dogs/
      dog0001.jpg
      dog0002.jpg
      ...
```

You can use the `image_dataset_loader.load` function to load this dataset as NumPy arrays:

```python
import image_dataset_loader

(x_train, y_train), (x_test, y_test) = image_dataset_loader.load('path/to/data', ['train', 'test'])
```

The shape of the `x_*` arrays will be `(instances, rows, cols, channels)` for color images and `(instances, rows, cols)` for grayscale images.

Also, the shape of the `y_*` arrays will be `(instances,)`.

All images in the dataset must have the same shape.
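To make these shapes concrete, here is a small sketch that uses stand-in NumPy arrays instead of a real dataset. The 32x32 image size and the instance count are arbitrary assumptions, and `np.stack` is only one way to build such arrays, not necessarily what the library does internally.

```python
import numpy as np

# Stand-in data: 4 color images, 32 rows x 32 cols, 3 channels each.
color_images = [np.zeros((32, 32, 3), dtype='uint8') for _ in range(4)]
x_color = np.stack(color_images)
print(x_color.shape)  # (4, 32, 32, 3)

# Stand-in data: 4 grayscale images have no channel axis.
gray_images = [np.zeros((32, 32), dtype='uint8') for _ in range(4)]
x_gray = np.stack(gray_images)
print(x_gray.shape)  # (4, 32, 32)

# One integer label per instance.
y = np.array([0, 0, 1, 1], dtype='uint32')
print(y.shape)  # (4,)

# Images with different shapes cannot form one regular array,
# which is why a uniform image shape is required.
try:
    np.stack([np.zeros((32, 32, 3)), np.zeros((64, 64, 3))])
    mixed_ok = True
except ValueError:
    mixed_ok = False
print(mixed_ok)  # False
```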

Also, all data subsets (i.e., `train` and `test` in this example) must contain the same set of classes.

Class names will be sorted alphabetically.

So, in this example, `cats` and `dogs` will be represented by `0` and `1`, respectively.
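The class-to-label rule can be sketched with the standard library alone. The helper `class_labels` below is hypothetical and not part of this package; it assumes every subdirectory of a subset is a class, checks that all subsets agree, and numbers the sorted names from zero. Python's `sorted` orders strings by code point, which matches alphabetical order for plain lowercase names like these.

```python
import os
import tempfile

def class_labels(dataset_path, set_names):
    """Map class names to integer labels, checking that all subsets agree."""
    class_lists = []
    for set_name in set_names:
        set_path = os.path.join(dataset_path, set_name)
        # Treat every subdirectory of the subset as one class.
        classes = sorted(
            entry for entry in os.listdir(set_path)
            if os.path.isdir(os.path.join(set_path, entry))
        )
        class_lists.append(classes)
    if any(classes != class_lists[0] for classes in class_lists):
        raise ValueError('all subsets must contain the same classes')
    return {name: label for label, name in enumerate(class_lists[0])}

# Build a throwaway directory layout matching the example above.
with tempfile.TemporaryDirectory() as root:
    for set_name in ('train', 'test'):
        for class_name in ('dogs', 'cats'):
            os.makedirs(os.path.join(root, set_name, class_name))
    labels = class_labels(root, ['train', 'test'])

print(labels)  # {'cats': 0, 'dogs': 1}
```

Running a check like this before loading can surface a missing class directory early.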

You can also load a single data subset. For example:

```python
(x_train, y_train), = image_dataset_loader.load('path/to/data', ['train'])
```

Note that the comma after `(x_train, y_train)` is required, because the function always returns a tuple of tuples.
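The unpacking behavior is plain Python and can be tried without any images. In this sketch, `fake_load` is a hypothetical stand-in that only mimics the tuple-of-tuples return structure; it returns strings rather than arrays.

```python
def fake_load(set_names):
    """Hypothetical stand-in that mimics the return structure of load."""
    return tuple((f'x_{name}', f'y_{name}') for name in set_names)

# With the trailing comma, the single (x, y) pair is unpacked correctly.
(x_train, y_train), = fake_load(['train'])
print(x_train, y_train)  # x_train y_train

# Without the comma, Python tries to unpack the outer 1-tuple into two names.
try:
    (x_bad, y_bad) = fake_load(['train'])
    unpack_failed = False
except ValueError:
    unpack_failed = True
print(unpack_failed)  # True

# Indexing the first element is an equivalent alternative.
x_first, y_first = fake_load(['train'])[0]
```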

## API

```python
load(dataset_path, set_names,
     shuffle=True, seed=None,
     x_dtype='uint8', y_dtype='uint32')
```

- **`dataset_path:`** Path to the dataset directory.

- **`set_names:`** List of the data subsets (subdirectories of the dataset directory).

- **`shuffle:`** Whether to shuffle the instances. If `False`, instances will be sorted by class name and then by file name.

- **`seed:`** Random seed used for shuffling (see the [docs](https://docs.python.org/3/library/random.html#random.seed)).

- **`x_dtype:`** NumPy data type for the X arrays (see the [docs](https://numpy.org/devdocs/user/basics.types.html)).

- **`y_dtype:`** NumPy data type for the Y arrays (see the [docs](https://numpy.org/devdocs/user/basics.types.html)).

- Returns a tuple of `(x, y)` tuples corresponding to `set_names`.
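The interaction between `shuffle` and `seed` can be illustrated with the standard `random` module. This is a sketch under assumptions, not the library's code: `order_instances` is a hypothetical helper, and it uses a private `random.Random(seed)` instance to avoid touching the global random state, which may differ from what the package does.

```python
import random

def order_instances(instances, shuffle=True, seed=None):
    """Order (class_name, file_name) pairs as the parameters above describe."""
    # Tuples sort by class name first, then by file name.
    ordered = sorted(instances)
    if shuffle:
        # The same seed always produces the same permutation.
        random.Random(seed).shuffle(ordered)
    return ordered

instances = [
    ('dogs', 'dog0002.jpg'),
    ('cats', 'cat0002.jpg'),
    ('dogs', 'dog0001.jpg'),
    ('cats', 'cat0001.jpg'),
]

unshuffled = order_instances(instances, shuffle=False)
print(unshuffled[0])  # ('cats', 'cat0001.jpg')

first = order_instances(instances, seed=42)
second = order_instances(instances, seed=42)
print(first == second)  # True
```

Passing a fixed `seed` is useful when experiments need to be repeatable across runs.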
