https://github.com/dynamicanupam/hand_gesture_recognition_using_cnn-rnn
Build deep learning model for detecting hand gestures for Smart TVs using CNN and RNN
- Host: GitHub
- URL: https://github.com/dynamicanupam/hand_gesture_recognition_using_cnn-rnn
- Owner: dynamicanupam
- Created: 2024-04-29T05:18:46.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-29T09:07:24.000Z (over 1 year ago)
- Last Synced: 2025-03-01T04:42:07.486Z (over 1 year ago)
- Topics: 3d-convs, cnn-rnn, generator-functions, model-building-and-training, model-evaluation, model-tuning
- Language: Jupyter Notebook
- Homepage:
- Size: 342 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Gesture Recognition
> Build a Smart TV system that lets users control the TV with hand gestures instead of a remote control.
---
## Problem Statement
Imagine you are working as a data scientist at a home electronics company that manufactures state-of-the-art smart televisions. You want to develop a feature for the smart TV that recognises five different gestures performed by the user, so that users can control the TV without a remote.
The gestures are continuously monitored by the webcam mounted on the TV. Each gesture corresponds to a specific command:
- Thumbs up: Increase the volume
- Thumbs down: Decrease the volume
- Left swipe: 'Jump' backwards 10 seconds
- Right swipe: 'Jump' forward 10 seconds
- Stop: Pause the movie
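
Because the model outputs one probability per gesture class, its prediction still has to be turned into a TV command. The sketch below shows one way to do that. The class order, the command names and the 0.8 confidence threshold are hypothetical and not taken from the project. The threshold makes the TV do nothing when no gesture is clearly recognised, which matters because the webcam monitors continuously.

```python
# Hypothetical mapping: the project does not document its label order.
GESTURE_COMMANDS = {
    "thumbs_up": "volume_up",
    "thumbs_down": "volume_down",
    "left_swipe": "seek_back_10s",
    "right_swipe": "seek_forward_10s",
    "stop": "pause",
}
CLASS_NAMES = list(GESTURE_COMMANDS)  # index i of the model output -> gesture name


def command_for(probabilities, threshold=0.8):
    """Return the command for the most likely gesture, or None if unsure."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] < threshold:
        return None  # no confident gesture: do nothing
    return GESTURE_COMMANDS[CLASS_NAMES[best]]


print(command_for([0.05, 0.90, 0.02, 0.02, 0.01]))  # volume_down
print(command_for([0.30, 0.20, 0.20, 0.20, 0.10]))  # None
```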
## Understanding the Dataset
The training data consists of a few hundred videos, each categorised into one of the five classes. Each video (typically 2-3 seconds long) is divided into a sequence of 30 frames (images). These videos were recorded by various people performing one of the five gestures in front of a webcam, similar to the one the smart TV will use.
The data is provided as a zip file. It contains a 'train' and a 'val' folder, along with one CSV file for each folder.
These folders are in turn divided into subfolders, where each subfolder represents one video of a particular gesture. Each subfolder (i.e. each video) contains 30 frames (images). All images in a given video subfolder have the same dimensions, but different videos may have different dimensions. Specifically, videos come in one of two sizes, either 360x360 or 120x160, depending on the webcam used to record them. You will therefore need some pre-processing to standardise the videos.
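
One possible approach to standardising the clips is sketched below in NumPy, under stated assumptions. It samples 16 evenly spaced frames out of 30, which matches the frame count of the selected model. It centre-crops each frame to a square so that 120x160 and 360x360 videos share an aspect ratio. It then resizes to a hypothetical 120x120 using nearest-neighbour indexing and scales pixels to [0, 1]. A generator loads clips lazily and yields shuffled batches of 5 with one-hot labels. The function names, the target size and the `(load_frames, label)` sample format are illustrative assumptions, and the project's actual preprocessing may differ.

```python
import numpy as np

NUM_FRAMES = 16    # frames kept per video (out of 30)
TARGET_SIZE = 120  # hypothetical common height and width, in pixels


def sample_frame_indices(total=30, num=NUM_FRAMES):
    """Evenly spaced frame indices, e.g. 16 out of 30."""
    return np.linspace(0, total - 1, num).round().astype(int)


def center_crop_square(frame):
    """Crop the central square so 120x160 and 360x360 frames share an aspect ratio."""
    h, w = frame.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return frame[top:top + side, left:left + side]


def resize_nearest(frame, size=TARGET_SIZE):
    """Nearest-neighbour resize using integer index arrays."""
    h, w = frame.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return frame[rows[:, None], cols]


def standardise_video(frames):
    """(30, H, W, 3) uint8 -> (NUM_FRAMES, TARGET_SIZE, TARGET_SIZE, 3) float32 in [0, 1]."""
    idx = sample_frame_indices(len(frames))
    clips = [resize_nearest(center_crop_square(frames[i])) for i in idx]
    return np.stack(clips).astype(np.float32) / 255.0


def batch_generator(samples, batch_size=5, num_classes=5, seed=0):
    """Yield (x, y) batches forever; samples is a list of (load_frames, label) pairs.

    load_frames is a zero-argument callable (e.g. one that reads a video's 30
    images from disk), so only one batch of videos is in memory at a time.
    """
    rng = np.random.default_rng(seed)
    while True:
        order = rng.permutation(len(samples))  # reshuffle every epoch
        for start in range(0, len(order), batch_size):
            chunk = order[start:start + batch_size]
            x = np.stack([standardise_video(samples[i][0]()) for i in chunk])
            y = np.zeros((len(chunk), num_classes), dtype=np.float32)
            y[np.arange(len(chunk)), [samples[i][1] for i in chunk]] = 1.0  # one-hot
            yield x, y
```

Because the generator loops forever, Keras needs to be told how many batches make up an epoch. A typical way to do that is to pass `steps_per_epoch = ceil(num_train_videos / batch_size)` to `model.fit`.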
## Deep Learning Models used
1. Conv3D
2. CNN with LSTM
3. CNN with GRU
4. CNN with GRU (with trainable weights of Transfer Learning)
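
The README lists the models but does not show their architectures. To illustrate the general CNN + RNN idea, here is a minimal Keras sketch of a CNN with GRU, assuming TensorFlow 2.x. A per-frame CNN encoder, wrapped in `TimeDistributed`, feeds a GRU that classifies the whole clip. All layer sizes, `IMG_SIZE`, and the MobileNetV2 backbone in the `use_pretrained` branch are assumptions for illustration. They are not the author's configuration, and the README does not say which pre-trained network was used.

```python
import tensorflow as tf

NUM_CLASSES = 5   # thumbs up, thumbs down, left swipe, right swipe, stop
NUM_FRAMES = 16   # frames per clip, as in the selected model's settings
IMG_SIZE = 120    # hypothetical standardised frame size


def build_cnn_gru(use_pretrained: bool = False) -> tf.keras.Model:
    """Per-frame CNN encoder wrapped in TimeDistributed, followed by a GRU."""
    frame_in = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    if use_pretrained:
        # Assumption: MobileNetV2 stands in for the unnamed transfer-learning
        # backbone. Frames are expected in [0, 1]; MobileNetV2 expects [-1, 1].
        x = tf.keras.layers.Rescaling(scale=2.0, offset=-1.0)(frame_in)
        backbone = tf.keras.applications.MobileNetV2(
            input_shape=(IMG_SIZE, IMG_SIZE, 3),
            include_top=False,
            weights="imagenet",
            pooling="avg",
        )
        backbone.trainable = True  # fine-tune the pre-trained weights
        features = backbone(x)
    else:
        # Small CNN trained from scratch.
        x = tf.keras.layers.Conv2D(16, 3, activation="relu")(frame_in)
        x = tf.keras.layers.MaxPooling2D()(x)
        x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
        x = tf.keras.layers.MaxPooling2D()(x)
        features = tf.keras.layers.GlobalAveragePooling2D()(x)
    frame_encoder = tf.keras.Model(frame_in, features, name="frame_encoder")

    clip_in = tf.keras.Input(shape=(NUM_FRAMES, IMG_SIZE, IMG_SIZE, 3))
    x = tf.keras.layers.TimeDistributed(frame_encoder)(clip_in)  # (batch, frames, features)
    x = tf.keras.layers.GRU(64)(x)  # final hidden state summarises the gesture
    x = tf.keras.layers.Dropout(0.3)(x)
    out = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(clip_in, out, name="cnn_gru")
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["categorical_accuracy"],
    )
    return model
```

Replacing `GRU` with `tf.keras.layers.LSTM` gives a CNN with LSTM variant. A Conv3D model takes a different approach: it convolves over time and space jointly using `tf.keras.layers.Conv3D` instead of a separate per-frame CNN and recurrent layer.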
## Conclusion
Based on the results, the CNN with GRU (with trainable weights of Transfer Learning) model performed well on the dataset provided for Gesture Recognition.
Selected model: **CNN with GRU (with trainable weights of Transfer Learning)**
- Training Accuracy: 0.99
- Validation Accuracy: 0.99
- Batch Size: 5
- Frames per video: 16
- Number of epochs: 20
- Model File Name: model-00020-0.01408-0.99698-0.02186-1.00000.keras
- Model loss and accuracy comparison between the training and validation sets:
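
The model file name appears to encode the epoch followed by four metric values. The README does not say what they are. One plausible reading follows a common Keras convention: epoch, training loss, training accuracy, validation loss, validation accuracy. Under that assumption, and assuming TensorFlow 2.x with `categorical_accuracy` as the tracked metric, a `ModelCheckpoint` callback that produces names of this shape could be configured as follows.

```python
import tensorflow as tf

# Assumed pattern (inferred from the file name's shape, not stated in the README):
# epoch, training loss, training accuracy, validation loss, validation accuracy.
CHECKPOINT_PATTERN = (
    "model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}"
    "-{val_loss:.5f}-{val_categorical_accuracy:.5f}.keras"
)

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath=CHECKPOINT_PATTERN,
    monitor="val_loss",
    save_best_only=True,  # only write a file when validation loss improves
)
```

Keras fills the placeholders from the epoch number and the metrics logged at the end of each epoch. This means the metric names in the pattern must match the names passed to `model.compile`.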
