https://github.com/imdeepmind/processed-imdb-wiki-dataset
Processed IMDB WIKI dataset, ready to be used in any project
age-classification computer-vision dataset deep-learning gender-classification imdb-wiki-dataset machine-learning
- Host: GitHub
- URL: https://github.com/imdeepmind/processed-imdb-wiki-dataset
- Owner: imdeepmind
- License: mit
- Created: 2019-01-12T16:27:45.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-03-14T10:27:49.000Z (about 7 years ago)
- Last Synced: 2025-03-17T19:52:29.322Z (about 1 year ago)
- Topics: age-classification, computer-vision, dataset, deep-learning, gender-classification, imdb-wiki-dataset, machine-learning
- Language: Jupyter Notebook
- Size: 47.9 MB
- Stars: 62
- Watchers: 1
- Forks: 12
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Processed IMDB WIKI Dataset
This GitHub repository contains a preprocessed IMDB WIKI dataset.
## Table of contents:
- [Introduction](#introduction)
- [IMDB WIKI Dataset](#imdb-wiki-dataset)
- [The Problem](#the-problem)
- [The Solution](#the-solution)
- [File Structure](#file-structure)
- [How to Run Locally](#how-to-run-locally)
- [Dependencies](#dependencies)
- [Acknowledgments](#acknowledgments)
## Introduction
The IMDB WIKI dataset is the largest dataset of human faces with gender, name, and age information. In this project, I preprocessed the entire dataset so that it can be used directly, without any further cleanup.
## IMDB WIKI Dataset
The IMDB WIKI dataset is the largest publicly available dataset of human faces with gender, age, and name. It contains more than `500 thousand` images, each with its meta information. All the images are in `.jpg` format.
For more information about the dataset, please visit [this website](https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/).
## The Problem
The dataset is great for research purposes. It contains more than `500 thousand` images of faces. However, it is not ready to be fed to a machine learning algorithm as is. It has the following problems:
- The images are of different sizes
- Some of the images are completely corrupted
- Some images don't contain any faces
- Some of the ages are invalid
- The gender distribution is not balanced (there are more male faces than female faces)
- The meta information is in `.mat` format, and reading `.mat` files in Python is a tedious process
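As a rough illustration of the invalid-age problem, here is a minimal sketch of how an age could be derived and checked. It assumes the metadata stores the date of birth as a MATLAB serial date number and the year the photo was taken, as described on the dataset website. The function names `estimate_age` and `is_valid_age`, the mid-year convention, and the `0` to `100` range are hypothetical choices for this example, not necessarily what the scripts in this repository do.

```python
from datetime import date

# MATLAB serial dates count from year 0000, Python ordinals from year 0001,
# so the two differ by 366 days.
MATLAB_TO_PYTHON_OFFSET = 366


def estimate_age(dob_datenum, photo_taken_year):
    """Return an approximate age in years, or None if the birth date is unusable."""
    try:
        birth = date.fromordinal(int(dob_datenum) - MATLAB_TO_PYTHON_OFFSET)
    except (ValueError, OverflowError):
        # NaN, infinite, or out-of-range serial dates cannot be converted.
        return None
    age = photo_taken_year - birth.year
    # Only the year of the photo is known, so assume it was taken mid-year:
    # people born in July or later have not had their birthday yet.
    if birth.month >= 7:
        age -= 1
    return age


def is_valid_age(age, min_age=0, max_age=100):
    """Check that an estimated age falls inside a plausible range (assumed bounds)."""
    return age is not None and min_age <= age <= max_age
```

For example, `estimate_age(730486, 2010)` corresponds to a birth date of 2000-01-01 and returns `10`, while a serial date of `0` returns `None` and is rejected by `is_valid_age`.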
## The Solution
In this project, I filter the images, resize them all to `128x128`, remove all the images with an invalid age, fix the gender distribution problem, and save the results in the proper format. Along with that, I also processed the `.mat` files and converted them to `.csv` files.
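The age filtering and gender balancing described above can be sketched with pandas. This is only an illustration under assumptions: the column names `age` and `gender`, the `0` to `100` age range, and balancing by downsampling the larger class are all hypothetical, since this README does not state how the scripts fix the distribution.

```python
import pandas as pd


def clean_and_balance(meta, min_age=0, max_age=100, seed=0):
    """Drop rows with invalid age or gender, then downsample to equal class sizes."""
    # Keep rows whose age is inside the assumed range and whose gender is 0 or 1.
    # Missing values fail both checks and are dropped as well.
    valid = meta[meta["age"].between(min_age, max_age) & meta["gender"].isin([0, 1])]
    if valid.empty:
        return valid.reset_index(drop=True)
    # Size of the smallest gender group; every group is cut down to this size.
    n = int(valid["gender"].value_counts().min())
    parts = [
        group.sample(n=n, random_state=seed)
        for _, group in valid.groupby("gender")
    ]
    return pd.concat(parts).reset_index(drop=True)
```

Downsampling discards data from the larger class. Oversampling or class weights are alternatives if keeping every image matters more than having equal class sizes.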
## File Structure
This repository contains 3 files:
- `mat.py`
- `gender.py`
- `age.py`
The first file, `mat.py`, converts the `.mat` files of the IMDB and WIKI datasets to `.csv` format and merges them into one file.
The last two files process the images for gender and age classification.
**As the size of the dataset is huge, I cannot upload it here on GitHub**
## How to Run Locally
Follow these steps to run it locally:
- Download the dataset from [this link](https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/)
- Extract the archives and save the contents in the project directory
- After that, you should have the following folders:
  - `imdb_crop`
  - `wiki_crop`
- Run the `mat.py` file
- Run the `age.py` and `gender.py` files
- Now the dataset is preprocessed and ready for your project
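The steps above can also be driven from a small helper script. The sketch below is not part of this repository: `missing_dirs` and `run_pipeline` are hypothetical names, and running `age.py` before `gender.py` is an arbitrary choice, since the steps only require that `mat.py` runs first.

```python
import subprocess
import sys
from pathlib import Path

# Folders expected after extracting the dataset, and the scripts to run in order.
EXPECTED_DIRS = ("imdb_crop", "wiki_crop")
SCRIPTS = ("mat.py", "age.py", "gender.py")


def missing_dirs(project_dir):
    """Return the names of the expected dataset folders that are not present."""
    root = Path(project_dir)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


def run_pipeline(project_dir="."):
    """Check the dataset folders, then run each script with the current interpreter."""
    missing = missing_dirs(project_dir)
    if missing:
        raise FileNotFoundError(f"Missing dataset folders: {', '.join(missing)}")
    for script in SCRIPTS:
        # check=True stops the pipeline as soon as one script fails.
        subprocess.run([sys.executable, script], cwd=project_dir, check=True)
```

Calling `run_pipeline()` from the project directory fails early with a clear message if the dataset has not been extracted yet, instead of failing partway through `mat.py`.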
## Dependencies
- `Numpy=1.15.4`
- `Scipy=1.2.0`
- `pandas=0.23.4`
- `cv2=4.0.0` (OpenCV, available from PyPI as `opencv-python`)
## Acknowledgments
I am really thankful to these people for providing this amazing dataset:
- [IMDB-WIKI – 500k+ face images with age and gender labels](https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/)
- [yu4u/age-gender-estimation](https://github.com/yu4u/age-gender-estimation)