https://github.com/soroushj/image-dataset-loader
Load image datasets as NumPy arrays
dataset image-dataset machine-learning numpy-arrays numpy-data
- Host: GitHub
- URL: https://github.com/soroushj/image-dataset-loader
- Owner: soroushj
- License: mit
- Created: 2019-06-26T12:22:22.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-03-26T21:11:10.000Z (over 1 year ago)
- Last Synced: 2025-04-03T20:18:42.795Z (8 months ago)
- Topics: dataset, image-dataset, machine-learning, numpy-arrays, numpy-data
- Language: Python
- Homepage:
- Size: 31.3 KB
- Stars: 6
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# image-dataset-loader: Load image datasets as NumPy arrays
[PyPI](https://pypi.org/project/image-dataset-loader/)
[License: MIT](https://opensource.org/licenses/MIT)
## Installation
```bash
pip install image-dataset-loader
```
## Overview
Suppose you have an image dataset in a directory which looks like this:
```
data/
    train/
        cats/
            cat0001.jpg
            cat0002.jpg
            ...
        dogs/
            dog0001.jpg
            dog0002.jpg
            ...
    test/
        cats/
            cat0001.jpg
            cat0002.jpg
            ...
        dogs/
            dog0001.jpg
            dog0002.jpg
            ...
```
You can use the `image_dataset_loader.load` function to load this dataset as NumPy arrays:
```python
import image_dataset_loader
(x_train, y_train), (x_test, y_test) = image_dataset_loader.load('path/to/data', ['train', 'test'])
```
The `x_*` arrays have shape `(instances, rows, cols, channels)` for color images and `(instances, rows, cols)` for grayscale images.
The `y_*` arrays have shape `(instances,)`.
All images in the dataset must have the same shape, and all data subsets (`train` and `test` in this example) must contain the same set of classes.
Class names are sorted alphabetically, so in this example `cats` and `dogs` are represented by `0` and `1`, respectively.
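
The two rules above (identical class sets across subsets, labels assigned in alphabetical order) can be checked before loading. The following is a minimal sketch using only the standard library; `class_indices` is a hypothetical helper written for illustration, not part of `image_dataset_loader`, and the library's own validation may differ.

```python
from pathlib import Path


def class_indices(dataset_path, set_names):
    """Map each class name to an integer label, checking that all subsets agree.

    Hypothetical helper for illustration; not part of image_dataset_loader.
    """
    root = Path(dataset_path)
    reference = None
    for set_name in set_names:
        # Class names are the subdirectories of each subset, sorted alphabetically.
        classes = sorted(p.name for p in (root / set_name).iterdir() if p.is_dir())
        if reference is None:
            reference = classes
        elif classes != reference:
            raise ValueError(
                f"{set_name!r} has classes {classes}, expected {reference}"
            )
    # The position in the sorted list becomes the integer label.
    return {name: index for index, name in enumerate(reference or [])}
```

For the directory layout shown earlier, `class_indices('path/to/data', ['train', 'test'])` would return `{'cats': 0, 'dogs': 1}`.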
You can also load a single data subset. For example:
```python
(x_train, y_train), = image_dataset_loader.load('path/to/data', ['train'])
```
Note that the comma after `(x_train, y_train)` is required, because the function always returns a tuple of tuples.
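
To see why the trailing comma matters, here is a small sketch with a stand-in function. `fake_load` is hypothetical and only mimics the return structure described above (one `(x, y)` pair per requested subset); it does not read any images.

```python
def fake_load(set_names):
    """Stand-in with the same return structure as load: one (x, y) pair per subset."""
    return tuple((f"x_{name}", f"y_{name}") for name in set_names)


result = fake_load(["train"])

# result is a 1-tuple that contains a single (x, y) pair.
(x_train, y_train), = result   # trailing comma unpacks the outer 1-tuple
x_same, y_same = result[0]     # equivalent: index the outer tuple first
[(x_list, y_list)] = result    # equivalent: list-style unpacking target
```

Without the comma, `(x_train, y_train) = result` would try to unpack a one-element tuple into two names and raise a `ValueError`.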
## API
```python
load(dataset_path, set_names,
     shuffle=True, seed=None,
     x_dtype='uint8', y_dtype='uint32')
```
- **`dataset_path:`** Path to the dataset directory.
- **`set_names:`** List of the data subsets (subdirectories of the dataset directory).
- **`shuffle:`** Whether to shuffle the samples. If `False`, instances are sorted by class name and then by file name.
- **`seed:`** Random seed used for shuffling (see the [docs](https://docs.python.org/3/library/random.html#random.seed)).
- **`x_dtype:`** NumPy data type for the X arrays (see the [docs](https://numpy.org/devdocs/user/basics.types.html)).
- **`y_dtype:`** NumPy data type for the Y arrays (see the [docs](https://numpy.org/devdocs/user/basics.types.html)).
- Returns a tuple of `(x, y)` tuples corresponding to `set_names`.
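
The interaction of `shuffle` and `seed` can be illustrated with a short sketch. It assumes the shuffle is driven by Python's `random` module, as the link for `seed` suggests; `shuffled_in_unison` is a hypothetical helper, and the library's internal steps may differ.

```python
import random


def shuffled_in_unison(x, y, seed=None):
    """Shuffle two equal-length sequences with one shared permutation."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    order = list(range(len(x)))
    # A local generator keeps the global random state untouched.
    random.Random(seed).shuffle(order)
    return [x[i] for i in order], [y[i] for i in order]


# Unshuffled order: sorted by class name, then by file name.
files = ["cat0001.jpg", "cat0002.jpg", "dog0001.jpg", "dog0002.jpg"]
labels = [0, 0, 1, 1]

# The same seed gives the same order on every call.
x_a, y_a = shuffled_in_unison(files, labels, seed=42)
x_b, y_b = shuffled_in_unison(files, labels, seed=42)
```

Because samples and labels share one permutation, each image stays paired with its label. To skip shuffling entirely, the documented keyword can be passed directly, for example `image_dataset_loader.load('path/to/data', ['train'], shuffle=False)`.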