https://github.com/kuldeep-gif/interactive-gesture-speech-system

An interactive AI system that translates real-time hand gestures into audible speech and converts spoken words into visual gestures using OpenCV and MediaPipe.

computer-vision gesture-recognition hci machine-learning mediapipe opencv python scikit-learn speech-recognition

Last synced: 2 months ago

Host: GitHub
URL: https://github.com/kuldeep-gif/interactive-gesture-speech-system
Owner: Kuldeep-gif
Created: 2025-10-07T18:08:10.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-10-07T18:24:56.000Z (9 months ago)
Last Synced: 2025-10-07T18:40:56.413Z (9 months ago)
Topics: computer-vision, gesture-recognition, hci, machine-learning, mediapipe, opencv, python, scikit-learn, speech-recognition
Language: Jupyter Notebook
Homepage:
Size: 0 Bytes
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README(1).md

# Interactive AI Gesture & Speech Recognition System

A real-time, interactive AI system that translates hand gestures into audible speech and spoken words into visual gestures, creating a seamless, two-way communication experience.

## 🌟 Features

* **Gesture-to-Speech**: Perform a gesture (e.g., thumbs up 👍) and the system responds instantly with a synthesized voice ("Great job!").
* **Speech-to-Gesture**: Speak a command (e.g., "Hello") and the system displays an image of the corresponding gesture (e.g., a wave 👋).
* **Real-Time Recognition**: Low-latency processing for a natural and seamless interaction.
* **Multi-Hand Tracking**: Capable of recognizing gestures performed with one or two hands.
* **Extendable**: Easily customizable to add new gestures and voice commands.

## Tech Stack
* **Programming Language**: Python
* **Core Libraries**:
  * **Computer Vision**: OpenCV, MediaPipe
  * **Machine Learning**: scikit-learn
  * **Speech Synthesis (TTS)**: gTTS, playsound
  * **Speech Recognition (STT)**: SpeechRecognition, PyAudio

## 🚀 Getting Started
Follow these instructions to set up and run the project on your local machine.

### 1. Prerequisites

* Python 3.8+
* A webcam and a microphone

### 2. Installation & Setup

**a. Project Structure:**

Project Interactive-Gesture-Speech-System/
│
├── .venv/
│
├── data/
│ ├── 0/
│ │ ├── 0.jpg
│ │ ├── 1.jpg
│ │ └── ...
│ ├── 1/
│ │ └── ...
│ └── (etc.)
│
├── gesture_images/
│ ├── A.jpg
│ ├── B.jpg
│ ├── thumbs_up.jpg
│ └── (etc.)
│
├── collect_imgs.py
├── create_dataset.py
├── train_classifier.py
├── synapse_interactive.py
│
├── model.p
├── data.pickle
│
└── README.md

**b. Create and activate a virtual environment:**

```bash
# Windows
python -m venv .venv
.venv\Scripts\activate
# macOS / Linux: source .venv/bin/activate
```
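
**c. Install the required libraries:**

With the virtual environment active, install everything with `pip install opencv-python mediapipe scikit-learn numpy gTTS playsound SpeechRecognition PyAudio`.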

**Core Libraries**

These libraries provide the main functionalities for computer vision, machine learning, and user interaction.

* **opencv-python:** The primary library for all computer vision tasks, including capturing webcam video and displaying images.

* **mediapipe:** Used for real-time hand tracking and landmark detection.

* **scikit-learn:** Used to train the RandomForestClassifier for gesture recognition.

* **numpy:** A fundamental library for numerical operations, used to handle the data arrays.

**Speech Functionality Libraries**

These libraries handle the text-to-speech and speech-to-text features.

* **gTTS:** (Google Text-to-Speech) Used to convert text phrases into audible speech.

* **playsound:** A simple library used to play the audio files generated by gTTS.

* **SpeechRecognition:** The main library for capturing microphone audio and converting speech to text.

* **PyAudio:** Required by SpeechRecognition to access the microphone's audio stream.
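
To confirm the environment is complete before running any script, a small check like the one below can help. It is a minimal sketch, not part of the project: the `missing_packages` helper is hypothetical, and the mapping reflects that several of these packages are imported under a different name than the one used with pip.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import` statements.
REQUIRED_MODULES = {
    "opencv-python": "cv2",
    "mediapipe": "mediapipe",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "gTTS": "gtts",
    "playsound": "playsound",
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip package names whose import module cannot be found."""
    return [pkg for pkg, mod in modules.items() if find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

Running the file inside the activated virtual environment prints either the packages still to install or a confirmation that everything is importable.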

### 3. Folder Structure

Create a folder named **gesture_images** and place one clear .jpg or .png file in it for each gesture you want displayed.
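
One way to check that the folder is populated as expected is to index it by gesture name. The sketch below assumes the file stem is the gesture name (for example `thumbs_up.jpg` maps to `thumbs_up`); the `index_gesture_images` helper is hypothetical and is not taken from the project's scripts.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_gesture_images(folder="gesture_images"):
    """Map a lowercase gesture name (the file stem) to its image path."""
    root = Path(folder)
    if not root.is_dir():
        return {}
    return {
        path.stem.lower(): path
        for path in sorted(root.iterdir())
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    }
```

Calling `index_gesture_images()` from the project root returns an empty dictionary when the folder is missing, which makes a forgotten setup step easy to spot.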

## Usage
The project is divided into four main steps: data collection, dataset creation, model training, and running the interactive application:

**1. Collect Gesture Data:** Run the **collect_imgs.py** script to capture images for your gestures. You will be prompted to **press 'q'** to start capturing for each class.

```bash
python collect_imgs.py
```
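
For orientation, a collection script of this kind could look roughly like the sketch below. It is not the project's `collect_imgs.py`: the `data/<class>/<index>.jpg` layout follows the project structure shown earlier, while `num_classes`, `images_per_class`, and both function names are illustrative assumptions.

```python
import os


def image_path(data_dir, class_id, index):
    """Path for one captured frame, e.g. data/0/12.jpg."""
    return os.path.join(data_dir, str(class_id), f"{index}.jpg")


def collect_images(data_dir="data", num_classes=3, images_per_class=100):
    """Capture frames from the default webcam into one folder per class."""
    import cv2  # imported here so image_path works without OpenCV installed

    cap = cv2.VideoCapture(0)
    try:
        for class_id in range(num_classes):
            os.makedirs(os.path.join(data_dir, str(class_id)), exist_ok=True)
            # Show a live preview until the user presses 'q' to start this class.
            while True:
                ok, frame = cap.read()
                if not ok:
                    raise RuntimeError("Could not read from the webcam")
                cv2.putText(frame, f"Class {class_id}: press 'q' to start", (30, 40),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
                cv2.imshow("collect", frame)
                if cv2.waitKey(25) & 0xFF == ord("q"):
                    break
            # Save a fixed number of frames for this class.
            for index in range(images_per_class):
                ok, frame = cap.read()
                if not ok:
                    raise RuntimeError("Could not read from the webcam")
                cv2.imshow("collect", frame)
                cv2.waitKey(25)
                cv2.imwrite(image_path(data_dir, class_id, index), frame)
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Varying the hand's position, distance, and angle while frames are being saved tends to give the classifier more useful training data than holding one fixed pose.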

**2. Create the Dataset:** Run the **create_dataset.py** script. It processes all the images in the **data** folder and writes a **data.pickle** file.

```bash
python create_dataset.py
```
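
The README does not say which features are stored in data.pickle. A common approach with hand landmarks, sketched below as an assumption, is to flatten the (x, y) coordinates and make them relative to the hand's top-left corner so the features do not depend on where the hand sits in the frame. The function names and the `{"data", "labels"}` pickle layout are hypothetical.

```python
import pickle


def landmarks_to_features(points):
    """Flatten (x, y) landmarks into [x0, y0, x1, y1, ...], shifted so the
    smallest x and the smallest y in the hand both become 0."""
    if not points:
        return []
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    features = []
    for x, y in points:
        features.extend((x - min_x, y - min_y))
    return features


def save_dataset(features, labels, path="data.pickle"):
    """Store samples and labels together (assumed layout, not confirmed)."""
    with open(path, "wb") as f:
        pickle.dump({"data": features, "labels": labels}, f)
```

With MediaPipe's `mp.solutions.hands` API, the input points for one detected hand can be built as `[(lm.x, lm.y) for lm in hand_landmarks.landmark]`, which yields 21 landmarks and therefore 42 features per hand.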

**3. Train the Model:** Run the **train_classifier.py** script. It uses **data.pickle** to train the gesture recognition model and saves it as **model.p**.

```bash
python train_classifier.py
```
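
As an illustration of this step, the sketch below trains a `RandomForestClassifier` (the model named in the installation notes) on the pickled dataset. The pickle layout, the `{"model": ...}` output format, the 80/20 split, and the `fit_length` helper are assumptions. The helper is included because frames with two detected hands produce longer feature vectors than single-hand frames, and samples of mixed length cannot be stacked into one array.

```python
import pickle


def fit_length(features, length=42):
    """Pad with zeros or truncate so every sample has the same length.
    42 = 21 hand landmarks x 2 coordinates for a single hand."""
    return list(features[:length]) + [0.0] * max(0, length - len(features))


def train(dataset_path="data.pickle", model_path="model.p"):
    # Imported lazily so fit_length can be used without scikit-learn installed.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    with open(dataset_path, "rb") as f:
        dataset = pickle.load(f)  # assumed layout: {"data": [...], "labels": [...]}

    X = np.asarray([fit_length(sample) for sample in dataset["data"]])
    y = np.asarray(dataset["labels"])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=True, stratify=y, random_state=0
    )

    model = RandomForestClassifier(random_state=0)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))

    with open(model_path, "wb") as f:
        pickle.dump({"model": model}, f)
    return accuracy
```

Truncating to 42 features keeps only the first detected hand; passing `length=84` to `fit_length` instead would keep both hands and zero-pad single-hand samples.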

**4. Run the Interactive Application:** You are now ready to run the main application!

```bash
python synapse_interactive.py
```
Look at your webcam, perform gestures to hear the speech output, and speak commands to see the gesture images appear. Press 'q' to quit.
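
The two directions of the application can be thought of as two lookups plus a guard that stops the same phrase from being spoken on every frame. The sketch below is illustrative only and is not taken from `synapse_interactive.py`: the mappings, the `wave.jpg` file name, the `ChangeDetector` class, and the `hold` threshold are all hypothetical.

```python
# Hypothetical mappings; labels, phrases, and file names are placeholders.
GESTURE_PHRASES = {"thumbs_up": "Great job!"}
COMMAND_IMAGES = {"hello": "gesture_images/wave.jpg"}


class ChangeDetector:
    """Report a label once it has been seen `hold` frames in a row and
    differs from the label reported last."""

    def __init__(self, hold=10):
        self.hold = hold
        self.candidate = None
        self.count = 0
        self.last_reported = None

    def update(self, label):
        if label == self.candidate:
            self.count += 1
        else:
            self.candidate, self.count = label, 1
        if self.count >= self.hold and label != self.last_reported:
            self.last_reported = label
            return label
        return None


def image_for_command(text, images=COMMAND_IMAGES):
    """Return the image path for the first known command word in `text`."""
    for word in text.lower().split():
        if word in images:
            return images[word]
    return None


def speak(text, path="speech.mp3"):
    """Synthesize `text` with gTTS and play it (gTTS needs internet access)."""
    from gtts import gTTS
    from playsound import playsound

    gTTS(text=text, lang="en").save(path)
    playsound(path)
```

In a frame loop, the predicted label (or `None` when no hand is visible) would be passed to `update`, and `speak(GESTURE_PHRASES[label])` called only when a label is returned. Holding `None` for `hold` frames resets the detector, so repeating a gesture after lowering the hand triggers speech again.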
