https://github.com/kuldeep-gif/interactive-gesture-speech-system

An interactive AI system that translates real-time hand gestures into audible speech and converts spoken words into visual gestures using OpenCV and MediaPipe.
computer-vision gesture-recognition hci machine-learning mediapipe opencv python scikit-learn speech-recognition

# Interactive AI Gesture & Speech Recognition System

A real-time, interactive AI system that translates hand gestures into audible speech and spoken words into visual gestures, creating a seamless, two-way communication experience.

## 🌟 Features

* **Gesture-to-Speech**: Perform a gesture (e.g., thumbs up 👍) and the system responds instantly with a synthesized voice ("Great job!").
* **Speech-to-Gesture**: Speak a command (e.g., "Hello") and the system displays a corresponding image of the gesture (e.g., a wave 👋).
* **Real-Time Recognition**: Low-latency processing for a natural and seamless interaction.
* **Multi-Hand Tracking**: Capable of recognizing gestures performed with one or two hands.
* **Extendable**: Easily customizable to add new gestures and voice commands.
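
As a rough illustration of how the two directions could be kept extendable, the sketch below stores the mappings in plain dictionaries. The names `GESTURE_PHRASES`, `VOICE_COMMANDS`, `phrase_for`, and `gesture_for` are hypothetical and are not taken from the project's scripts; only the thumbs up / "Great job!" and "Hello" / wave pairs come from the examples above.

```python
# Hypothetical lookup tables; the real project may store these differently.
# Gesture label predicted by the classifier -> phrase to speak.
GESTURE_PHRASES = {
    "thumbs_up": "Great job!",
}

# Spoken command (lowercase) -> name of the gesture image to display.
VOICE_COMMANDS = {
    "hello": "wave",
}


def phrase_for(gesture_label):
    """Return the phrase to speak for a gesture label, or None if unmapped."""
    return GESTURE_PHRASES.get(gesture_label)


def gesture_for(spoken_text):
    """Return the gesture name for a spoken command, or None if unmapped."""
    return VOICE_COMMANDS.get(spoken_text.strip().lower())
```

Under this assumption, adding a new gesture or voice command would only require a new dictionary entry, plus retraining the classifier when a new gesture class is introduced.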

## Tech Stack

* **Programming Language**: Python
* **Core Libraries**:
  * **Computer Vision**: OpenCV, MediaPipe
  * **Machine Learning**: scikit-learn
  * **Speech Synthesis (TTS)**: gTTS, playsound
  * **Speech Recognition (STT)**: SpeechRecognition, PyAudio

## 🚀 Getting Started
Follow these instructions to set up and run the project on your local machine.

### 1. Prerequisites

* Python 3.8+
* A webcam and a microphone

### 2. Installation & Setup

**a. Project Structure:**

    Project Interactive-Gesture-Speech-System/
    ├── .venv/
    ├── data/
    │   ├── 0/
    │   │   ├── 0.jpg
    │   │   ├── 1.jpg
    │   │   └── ...
    │   ├── 1/
    │   │   └── ...
    │   └── (etc.)
    ├── gesture_images/
    │   ├── A.jpg
    │   ├── B.jpg
    │   ├── thumbs_up.jpg
    │   └── (etc.)
    ├── collect_imgs.py
    ├── create_dataset.py
    ├── train_classifier.py
    ├── synapse_interactive.py
    ├── model.p
    ├── data.pickle
    └── README.md

**b. Create and activate a virtual environment:**

```bash
python -m venv .venv
# Activate on Windows
.venv\Scripts\activate
# Activate on macOS/Linux
source .venv/bin/activate
```

**c. Install the required libraries:**

**Core Libraries**

These libraries provide the main functionalities for computer vision, machine learning, and user interaction.

* **opencv-python:** The primary library for all computer vision tasks, including capturing webcam video and displaying images.

* **mediapipe:** Used for real-time hand tracking and landmark detection.

* **scikit-learn:** Used to train the RandomForestClassifier for gesture recognition.

* **numpy:** A fundamental library for numerical operations, used to handle the data arrays.

**Speech Functionality Libraries**

These libraries were added to handle the text-to-speech and speech-to-text features.

* **gTTS:** (Google Text-to-Speech) Used to convert text phrases into audible speech.

* **playsound:** A simple library used to play the audio files generated by gTTS.

* **SpeechRecognition:** The main library for capturing microphone audio and converting speech to text.

* **PyAudio:** Required by SpeechRecognition to access the microphone's audio stream.
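
The libraries above are all published on PyPI under the names listed, so a typical install would be `pip install opencv-python mediapipe scikit-learn numpy gTTS playsound SpeechRecognition PyAudio`. The sketch below is one way to check which of them are still missing from the active environment. The `REQUIRED` table and `missing_packages` helper are illustrative names, not part of the project.

```python
import importlib.util

# pip package name -> module name used in `import` statements
REQUIRED = {
    "opencv-python": "cv2",
    "mediapipe": "mediapipe",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "gTTS": "gtts",
    "playsound": "playsound",
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages. Install them with:")
    print("pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

Note that PyAudio may need the PortAudio system library on some platforms before `pip` can build it.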

### 3. Folder Structure

Make sure you create a folder named **gesture_images** and place one clear .jpg or .png file for each gesture you want to be displayed.
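
To illustrate how the application might pick the file to display, here is a minimal sketch that looks up an image by gesture name and accepts either extension. The function names `find_gesture_image` and `show_gesture` are hypothetical, and the assumption that each file is named after its gesture (as in `thumbs_up.jpg`) is inferred from the project structure rather than confirmed by the code.

```python
from pathlib import Path


def find_gesture_image(gesture_name, image_dir="gesture_images",
                       extensions=(".jpg", ".png")):
    """Return the path of the image for a gesture, or None if no file exists."""
    folder = Path(image_dir)
    for ext in extensions:
        candidate = folder / f"{gesture_name}{ext}"
        if candidate.is_file():
            return candidate
    return None


def show_gesture(gesture_name, window_name="Gesture"):
    """Display the gesture image in an OpenCV window; return False if missing."""
    import cv2  # imported lazily so the lookup above works without OpenCV

    path = find_gesture_image(gesture_name)
    if path is None:
        return False
    image = cv2.imread(str(path))
    if image is None:  # file exists but could not be decoded
        return False
    cv2.imshow(window_name, image)
    cv2.waitKey(1)  # give the window a chance to refresh
    return True
```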

## Usage

The project is divided into four main steps: data collection, dataset creation, model training, and running the interactive application.

**1. Collect Gesture Data:** Run the **collect_imgs.py** script to capture images for your gestures. You will be prompted to **press 'q'** to start capturing for each class.

```bash
python collect_imgs.py
```
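
For readers who want to see what a collection step of this kind might look like, the sketch below waits for 'q' and then saves webcam frames into `data/<class>/<index>.jpg`, matching the folder layout shown earlier. It is not the contents of `collect_imgs.py`; `frame_path`, `collect_class`, and the default of 100 images per class are assumptions made for illustration.

```python
from pathlib import Path


def frame_path(data_dir, class_id, index):
    """Build the output path data_dir/<class_id>/<index>.jpg."""
    return Path(data_dir) / str(class_id) / f"{index}.jpg"


def collect_class(class_id, n_images=100, data_dir="data", camera_index=0):
    """Capture n_images webcam frames for one gesture class."""
    import cv2  # imported lazily so frame_path works without OpenCV

    (Path(data_dir) / str(class_id)).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(camera_index)
    saved = 0
    try:
        # Show a live preview until the user presses 'q' to start capturing.
        while True:
            ok, frame = cap.read()
            if not ok:
                raise RuntimeError("Could not read from the webcam")
            cv2.putText(frame, "Press 'q' to start", (20, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
            cv2.imshow("collect", frame)
            if cv2.waitKey(25) & 0xFF == ord("q"):
                break

        # Save one frame per loop iteration until the quota is reached.
        while saved < n_images:
            ok, frame = cap.read()
            if not ok:
                break
            cv2.imshow("collect", frame)
            cv2.waitKey(25)
            cv2.imwrite(str(frame_path(data_dir, class_id, saved)), frame)
            saved += 1
    finally:
        cap.release()
        cv2.destroyAllWindows()
    return saved
```

Calling `collect_class(0)` and then `collect_class(1)` would fill `data/0/` and `data/1/` in turn.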

**2. Create the Dataset:** Run the **create_dataset.py** script. This will process all the images in the **data** folder and create a **data.pickle** file.

```bash
python create_dataset.py
```
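
The following sketch shows one plausible way such a dataset step could work: detect hand landmarks in each image with MediaPipe and store them as feature vectors. Several details are assumptions rather than facts about `create_dataset.py`: the use of the `mp.solutions.hands` API, keeping only the first detected hand, shifting coordinates so the smallest x and y become zero, and the `{"data": ..., "labels": ...}` pickle layout.

```python
import pickle
from pathlib import Path


def landmarks_to_features(points):
    """Flatten (x, y) landmarks into [x0, y0, x1, y1, ...] relative to the
    minimum x and y, so the features do not depend on where the hand sits
    in the frame."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    features = []
    for x, y in points:
        features.extend([x - min_x, y - min_y])
    return features


def extract_points(image_path, hands):
    """Return (x, y) landmarks of the first detected hand, or None."""
    import cv2

    image = cv2.imread(str(image_path))
    if image is None:
        return None
    # MediaPipe expects RGB input, while OpenCV loads images as BGR.
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    hand = results.multi_hand_landmarks[0]
    return [(lm.x, lm.y) for lm in hand.landmark]


def build_dataset(data_dir="data", out_path="data.pickle"):
    """Process every image under data_dir/<label>/ and pickle the features."""
    import mediapipe as mp

    data, labels = [], []
    with mp.solutions.hands.Hands(static_image_mode=True,
                                  min_detection_confidence=0.3) as hands:
        class_dirs = sorted(p for p in Path(data_dir).iterdir() if p.is_dir())
        for class_dir in class_dirs:
            for image_path in sorted(class_dir.iterdir()):
                points = extract_points(image_path, hands)
                if points is not None:  # skip images with no detected hand
                    data.append(landmarks_to_features(points))
                    labels.append(class_dir.name)
    with open(out_path, "wb") as f:
        pickle.dump({"data": data, "labels": labels}, f)
    return len(data)
```

Because MediaPipe returns 21 landmarks per hand, each sample in this sketch has 42 features.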

**3. Train the Model:** Run the **train_classifier.py** script. This will use **data.pickle** to train the gesture recognition model and save it as **model.p**.

```bash
python train_classifier.py
```
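
A training step along these lines could be sketched as follows. The `RandomForestClassifier` is named in the library list above; everything else is an assumption for illustration, including the `{"data", "labels"}` and `{"model": ...}` pickle layouts, the 80/20 split, and the hypothetical `keep_consistent` helper that drops samples whose feature length differs from the most common one (which can happen if some images contain two hands).

```python
import pickle
from collections import Counter


def keep_consistent(data, labels):
    """Keep only samples whose feature vector has the most common length,
    since the classifier needs a rectangular feature matrix."""
    target_len = Counter(len(sample) for sample in data).most_common(1)[0][0]
    kept = [(s, l) for s, l in zip(data, labels) if len(s) == target_len]
    return [s for s, _ in kept], [l for _, l in kept]


def train(data_path="data.pickle", model_path="model.p"):
    """Train a random forest on the pickled features and save the model."""
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    with open(data_path, "rb") as f:
        dataset = pickle.load(f)
    data, labels = keep_consistent(dataset["data"], dataset["labels"])

    # Hold out 20% of the samples, stratified by class, to estimate accuracy.
    x_train, x_test, y_train, y_test = train_test_split(
        np.asarray(data), np.asarray(labels),
        test_size=0.2, shuffle=True, stratify=labels,
    )
    model = RandomForestClassifier()
    model.fit(x_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(x_test))

    with open(model_path, "wb") as f:
        pickle.dump({"model": model}, f)
    return accuracy
```

The returned accuracy is measured on the held-out samples only, so it gives a rough check that the gestures are distinguishable before running the live application.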

**4. Run the Interactive Application:** You are now ready to run the main application!

```bash
python synapse_interactive.py
```
Look at your webcam, perform gestures to hear the speech output, and speak commands to see the gesture images appear. Press 'q' to quit.
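
The two speech directions described above could be sketched roughly as below, using gTTS with playsound for output and SpeechRecognition for input. This is not the code of `synapse_interactive.py`. The `SpeechGate` class, the 2-second cooldown, the `speech.mp3` file name, and the choice of the Google Web Speech recognizer are assumptions made for illustration.

```python
import time


class SpeechGate:
    """Suppress repeated speech while the same gesture is held across frames."""

    def __init__(self, cooldown=2.0):
        self.cooldown = cooldown
        self.last_label = None
        self.last_time = float("-inf")

    def should_speak(self, label, now=None):
        """Return True if the label is new or the cooldown has elapsed."""
        now = time.monotonic() if now is None else now
        if label != self.last_label or now - self.last_time >= self.cooldown:
            self.last_label, self.last_time = label, now
            return True
        return False


def speak(text, audio_path="speech.mp3"):
    """Synthesize text with gTTS (requires internet) and play the result."""
    from gtts import gTTS
    from playsound import playsound

    gTTS(text=text, lang="en").save(audio_path)
    playsound(audio_path)  # blocks until playback finishes


def listen_for_command(timeout=5, phrase_time_limit=4):
    """Record one phrase from the microphone; return lowercase text or None."""
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        try:
            audio = recognizer.listen(source, timeout=timeout,
                                      phrase_time_limit=phrase_time_limit)
        except sr.WaitTimeoutError:
            return None  # nobody spoke within the timeout
    try:
        return recognizer.recognize_google(audio).lower()
    except (sr.UnknownValueError, sr.RequestError):
        return None  # speech was unintelligible or the service was unreachable
```

Both `speak` and `listen_for_command` block while they run, so a real-time loop may want to call them from a background thread to keep the webcam preview responsive.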