https://github.com/kuldeep-gif/interactive-gesture-speech-system
An interactive AI system that translates real-time hand gestures into audible speech and converts spoken words into visual gestures using OpenCV and MediaPipe.
computer-vision gesture-recognition hci machine-learning mediapipe opencv python scikit-learn speech-recognition
- Host: GitHub
- URL: https://github.com/kuldeep-gif/interactive-gesture-speech-system
- Owner: Kuldeep-gif
- Created: 2025-10-07T18:08:10.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-10-07T18:24:56.000Z (9 months ago)
- Last Synced: 2025-10-07T18:40:56.413Z (9 months ago)
- Topics: computer-vision, gesture-recognition, hci, machine-learning, mediapipe, opencv, python, scikit-learn, speech-recognition
- Language: Jupyter Notebook
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README(1).md
# Interactive AI Gesture & Speech Recognition System
A real-time, interactive AI system that translates hand gestures into audible speech and spoken words into visual gestures, creating a seamless, two-way communication experience.
## 🌟 Features
* **Gesture-to-Speech**: Perform a gesture (e.g., thumbs up 👍) and the system responds instantly with a synthesized voice ("Great job!").
* **Speech-to-Gesture**: Speak a command (e.g., "Hello") and the system displays an image of the corresponding gesture (e.g., a wave 👋).
* **Real-Time Recognition**: Low-latency processing for a natural and seamless interaction.
* **Multi-Hand Tracking**: Capable of recognizing gestures performed with one or two hands.
* **Extendable**: Easily customizable to add new gestures and voice commands.
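The listening half of the Speech-to-Gesture feature could look roughly like the sketch below. It is not taken from the repository: `normalize_command` and `listen_for_command` are hypothetical helper names, and using `recognize_google` (the SpeechRecognition wrapper around Google's web speech API, which needs an internet connection) is an assumption about which recognizer the project relies on.

```python
import string


def normalize_command(text):
    """Lowercase a transcript and strip punctuation and extra spaces."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def listen_for_command(timeout=5, phrase_time_limit=3):
    """Record one phrase from the default microphone and return it normalized.

    Returns None when nothing was heard or the audio was not understood.
    """
    # Imported here so normalize_command stays usable without audio libraries.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise briefly so quiet speech is still detected.
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        try:
            audio = recognizer.listen(
                source, timeout=timeout, phrase_time_limit=phrase_time_limit
            )
        except sr.WaitTimeoutError:
            return None
    try:
        return normalize_command(recognizer.recognize_google(audio))
    except (sr.UnknownValueError, sr.RequestError):
        return None
```

Normalizing the transcript before looking it up makes "Hello!" and "hello" resolve to the same gesture.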
## Tech Stack
* **Programming Language**: Python
* **Core Libraries**:
  * **Computer Vision**: OpenCV, MediaPipe
  * **Machine Learning**: scikit-learn
  * **Speech Synthesis (TTS)**: gTTS, playsound
  * **Speech Recognition (STT)**: SpeechRecognition, PyAudio
## 🚀 Getting Started
Follow these instructions to set up and run the project on your local machine.
### 1. Prerequisites
* Python 3.8+
* A webcam and a microphone
### 2. Installation & Setup
**a. Project Structure:**
Project Interactive-Gesture-Speech-System/
│
├── .venv/
│
├── data/
│   ├── 0/
│   │   ├── 0.jpg
│   │   ├── 1.jpg
│   │   └── ...
│   ├── 1/
│   │   └── ...
│   └── (etc.)
│
├── gesture_images/
│   ├── A.jpg
│   ├── B.jpg
│   ├── thumbs_up.jpg
│   └── (etc.)
│
├── collect_imgs.py
├── create_dataset.py
├── train_classifier.py
├── synapse_interactive.py
│
├── model.p
├── data.pickle
│
└── README.md
**b. Create and activate a virtual environment:**
```bash
# For Windows
python -m venv .venv
.venv\Scripts\activate
```
**c. Install the required libraries:**
**Core Libraries**
These libraries provide the main functionalities for computer vision, machine learning, and user interaction.
* **opencv-python:** The primary library for all computer vision tasks, including capturing webcam video and displaying images.
* **mediapipe:** Used for real-time hand tracking and landmark detection.
* **scikit-learn:** Used to train the RandomForestClassifier for gesture recognition.
* **numpy:** A fundamental library for numerical operations, used to handle the data arrays.
**Speech Functionality Libraries**
These libraries were added to handle the text-to-speech and speech-to-text features.
* **gTTS:** (Google Text-to-Speech) Used to convert text phrases into audible speech.
* **playsound:** A simple library used to play the audio files generated by gTTS.
* **SpeechRecognition:** The main library for capturing microphone audio and converting speech to text.
* **PyAudio:** Required by SpeechRecognition to access the microphone's audio stream.
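Assuming the libraries are installed with pip under the names listed above (for example `pip install opencv-python mediapipe scikit-learn numpy gTTS playsound SpeechRecognition PyAudio`), a short script can confirm that each one is importable before you run the project. The `REQUIRED` table and `missing_packages` helper are illustrative and not part of the repository.

```python
from importlib.util import find_spec

# PyPI package name -> module name used in `import` statements.
REQUIRED = {
    "opencv-python": "cv2",
    "mediapipe": "mediapipe",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "gTTS": "gtts",
    "playsound": "playsound",
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",
}


def missing_packages(required=REQUIRED):
    """Return the PyPI names whose import module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; try: pip install " + " ".join(missing))
    else:
        print("All required libraries are importable.")
```

The mapping is needed because several packages are imported under a different name than the one used to install them (for example, `opencv-python` is imported as `cv2`).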
### 3. Folder Structure
Create a folder named **gesture_images** and place in it one clear .jpg or .png file for each gesture you want the system to display.
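One way the application might map a recognized command to a file in **gesture_images** is sketched below. The `find_gesture_image` helper and its matching rule (case-insensitive, spaces treated as underscores) are assumptions made for illustration; the repository may name or match files differently.

```python
from pathlib import Path

# The two image formats the folder is expected to contain.
IMAGE_SUFFIXES = {".jpg", ".png"}


def find_gesture_image(command, image_dir="gesture_images"):
    """Return the image whose file name matches the command, or None.

    Matching ignores case and treats spaces like underscores, so
    "thumbs up" finds thumbs_up.jpg.
    """
    wanted = command.strip().lower().replace(" ", "_")
    folder = Path(image_dir)
    if not folder.is_dir():
        return None
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES and path.stem.lower() == wanted:
            return path
    return None
```

The returned path could then be loaded with `cv2.imread(str(path))` and shown with `cv2.imshow`.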
## Usage
The project is divided into four main steps: data collection, dataset creation, model training, and running the interactive application.
**1. Collect Gesture Data:** Run the **collect_imgs.py** script to capture images for your gestures. You will be prompted to **press 'q'** to start capturing for each class.
```bash
python collect_imgs.py
```
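For orientation, a collection script of this kind might work roughly as follows. This is a minimal sketch under stated assumptions, not the contents of `collect_imgs.py`: the `image_path` and `collect_class` helpers, the default of 100 images per class, and the camera index are all hypothetical. Only the `data/<class>/<n>.jpg` layout and the 'q' prompt come from this README.

```python
from pathlib import Path


def image_path(data_dir, class_id, index):
    """Build the data/<class>/<index>.jpg path used by the dataset layout."""
    return Path(data_dir) / str(class_id) / f"{index}.jpg"


def collect_class(class_id, n_images=100, data_dir="data", camera_index=0):
    """Wait for 'q', then save n_images webcam frames for one class."""
    # Imported here so image_path stays usable without OpenCV or a camera.
    import cv2

    (Path(data_dir) / str(class_id)).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(camera_index)
    saved = 0
    try:
        # Preview loop: show the feed until the user presses 'q'.
        while True:
            ok, frame = cap.read()
            if not ok:
                raise RuntimeError("Could not read a frame from the webcam")
            cv2.putText(frame, "Press 'q' to start", (30, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
            cv2.imshow("collect", frame)
            if cv2.waitKey(25) & 0xFF == ord("q"):
                break
        # Capture loop: write one frame per iteration.
        while saved < n_images:
            ok, frame = cap.read()
            if not ok:
                break
            cv2.imshow("collect", frame)
            cv2.waitKey(25)
            cv2.imwrite(str(image_path(data_dir, class_id, saved)), frame)
            saved += 1
    finally:
        cap.release()
        cv2.destroyAllWindows()
    return saved
```

Calling `collect_class(0)` and then `collect_class(1)` would fill `data/0/` and `data/1/`.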
**2. Create the Dataset:** Run the **create_dataset.py** script. This will process all the images in the data folder and create a data.pickle file.
```bash
python create_dataset.py
```
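The dataset step can be sketched as below, assuming each image is reduced to MediaPipe hand landmarks that are shifted so the hand's position in the frame does not matter. This is an illustration, not the contents of `create_dataset.py`: the helper names, the `{"data": ..., "labels": ...}` pickle layout, the single-hand limit, and the use of the legacy `mp.solutions.hands` API are all assumptions.

```python
import pickle
from pathlib import Path


def landmarks_to_features(points):
    """Flatten (x, y) landmarks, shifted so the smallest x and y become 0."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    features = []
    for x, y in points:
        features.extend([x - min_x, y - min_y])
    return features


def build_dataset(data_dir="data", out_path="data.pickle"):
    """Extract features from every data/<class>/*.jpg and pickle them."""
    # Imported here so landmarks_to_features works without these libraries.
    import cv2
    import mediapipe as mp

    data, labels = [], []
    with mp.solutions.hands.Hands(
        static_image_mode=True, max_num_hands=1, min_detection_confidence=0.3
    ) as hands:
        class_dirs = sorted(p for p in Path(data_dir).iterdir() if p.is_dir())
        for class_dir in class_dirs:
            for img_path in sorted(class_dir.glob("*.jpg")):
                image = cv2.imread(str(img_path))
                if image is None:
                    continue  # unreadable file
                # MediaPipe expects RGB; OpenCV loads BGR.
                results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
                if not results.multi_hand_landmarks:
                    continue  # no hand found in this image
                hand = results.multi_hand_landmarks[0]
                points = [(lm.x, lm.y) for lm in hand.landmark]
                data.append(landmarks_to_features(points))
                labels.append(class_dir.name)
    with open(out_path, "wb") as f:
        pickle.dump({"data": data, "labels": labels}, f)
    return len(data)
```

Images in which no hand is detected are skipped, so the number of samples can be smaller than the number of images collected.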
**3. Train the Model:** Run the **train_classifier.py** script. This uses **data.pickle** to train the gesture recognition model and saves it as **model.p**.
```bash
python train_classifier.py
```
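Training might look roughly like the sketch below, which fits the `RandomForestClassifier` mentioned earlier on the pickled features and reports held-out accuracy. The `load_dataset` and `train_and_save` helpers, the pickle keys, the 80/20 split, and the `{"model": ...}` layout of `model.p` are assumptions rather than the repository's actual code. The sketch also assumes every feature vector has the same length.

```python
import pickle


def load_dataset(path="data.pickle"):
    """Load features and labels, checking that they line up."""
    with open(path, "rb") as f:
        payload = pickle.load(f)
    data, labels = payload["data"], payload["labels"]
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    return data, labels


def train_and_save(dataset_path="data.pickle", model_path="model.p"):
    """Train a random forest, save it, and return the test accuracy."""
    # Imported here so load_dataset works without scikit-learn installed.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    data, labels = load_dataset(dataset_path)
    # Stratify so every gesture class appears in both splits.
    x_train, x_test, y_train, y_test = train_test_split(
        data, labels, test_size=0.2, shuffle=True, stratify=labels, random_state=0
    )
    model = RandomForestClassifier(random_state=0)
    model.fit(x_train, y_train)
    score = accuracy_score(y_test, model.predict(x_test))
    with open(model_path, "wb") as f:
        pickle.dump({"model": model}, f)
    return score
```

A low accuracy here usually suggests collecting more, or more varied, images for the gestures that are confused with each other.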
**4. Run the Interactive Application:** You are now ready to run the main application!
```bash
python synapse_interactive.py
```
Look at your webcam, perform gestures to hear the speech output, and speak commands to see the gesture images appear. Press 'q' to quit.
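Because the classifier produces a prediction on every frame, the speech output needs some throttling so that a held gesture is not spoken dozens of times per second. The sketch below shows one possible approach. `PHRASES`, `SpeechGate`, `speak`, and `on_prediction` are hypothetical names; only the thumbs-up phrase comes from this README, and gTTS needs an internet connection to synthesize audio.

```python
import os
import tempfile
import time

# Hypothetical label -> phrase table; extend it with your own gestures.
PHRASES = {"thumbs_up": "Great job!"}


class SpeechGate:
    """Let a label through if it is new or its cooldown has elapsed."""

    def __init__(self, cooldown_seconds=2.0):
        self.cooldown_seconds = cooldown_seconds
        self._last_label = None
        self._last_time = float("-inf")

    def should_speak(self, label, now=None):
        now = time.monotonic() if now is None else now
        repeated = label == self._last_label
        if repeated and now - self._last_time < self.cooldown_seconds:
            return False
        self._last_label, self._last_time = label, now
        return True


def speak(text, lang="en"):
    """Synthesize text with gTTS and play it, blocking until done."""
    # Imported here so SpeechGate works without the audio libraries.
    from gtts import gTTS
    from playsound import playsound

    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)
    try:
        gTTS(text=text, lang=lang).save(path)
        playsound(path)
    finally:
        os.remove(path)


def on_prediction(label, gate):
    """Speak the phrase for a predicted label unless it was just spoken."""
    phrase = PHRASES.get(label)
    if phrase is not None and gate.should_speak(label):
        speak(phrase)
```

In the main loop, you would create one `SpeechGate()` and call `on_prediction(label, gate)` after each frame's prediction.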