Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/amiriiw/speech_recognition
Welcome to the Speech Recognition Project! This project is designed to train a model for recognizing voice commands and using it to control a simple Snake game in real-time. The project involves training a TensorFlow model for speech recognition and integrating it with a Pygame-based game.
https://github.com/amiriiw/speech_recognition
key-word-recognition numpy pygame queue random sounddevice speech-recognition tensorflow threading
Last synced: about 3 hours ago
JSON representation
Welcome to the Speech Recognition Project! This project is designed to train a model for recognizing voice commands and using it to control a simple Snake game in real-time. The project involves training a TensorFlow model for speech recognition and integrating it with a Pygame-based game.
- Host: GitHub
- URL: https://github.com/amiriiw/speech_recognition
- Owner: amiriiw
- License: mit
- Created: 2024-07-21T05:55:31.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-08-12T05:59:25.000Z (3 months ago)
- Last Synced: 2024-08-12T07:01:00.470Z (3 months ago)
- Topics: key-word-recognition, numpy, pygame, queue, random, sounddevice, speech-recognition, tensorflow, threading
- Language: Python
- Homepage:
- Size: 32.2 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Speech Recognition Project
Welcome to the **Speech Recognition Project**! This project is designed to train a model for recognizing voice commands and using it to control a simple Snake game in real-time. The project involves training a TensorFlow model for speech recognition and integrating it with a Pygame-based game.
## Overview
This project consists of two main components:
1. **speech_recognition_model_trainer.py**: This script is responsible for training a TensorFlow model to recognize voice commands. The model is trained using audio data to classify commands such as 'left,' 'right,' 'up,' and 'down.'
2. **speech_recognition_game.py**: This script uses the trained model to control a Snake game based on voice commands. It captures audio from a microphone, processes it to make predictions, and translates those predictions into game controls.## Libraries Used
The following libraries are used in this project:
- **[tensorflow](https://www.tensorflow.org/)**: TensorFlow is used for building and training the speech recognition model.
- **[numpy](https://numpy.org/)**: NumPy is used for numerical operations and handling arrays.
- **[pygame](https://www.pygame.org/docs/)**: Pygame is used to create the Snake game and handle graphics and user input.
- **[sounddevice](https://python-sounddevice.readthedocs.io/en/0.4.7/)**: Sounddevice is used to capture audio from the microphone in real-time.
- **[queue](https://docs.python.org/3/library/queue.html)**: The Queue module is used for handling audio data in a thread-safe manner.
- **[random](https://docs.python.org/3/library/random.html)**: The Random module is used to generate random positions for the Snake game food.
- **[threading](https://docs.python.org/3/library/threading.html)**: Threading is used to process audio data in parallel with the game loop.## Detailed Explanation
### `speech_recognition_model_trainer.py`
This script is the core of the project, responsible for training the speech recognition model. The key components of the script are:
- **TrainModel Class**: This class handles the training process for the speech recognition model. It includes methods to load datasets, prepare audio data, build the model, train the model, and evaluate its performance.
- **_load_commands() Method**: Loads command names from the dataset directory.
- **load_datasets() Method**: Loads and splits audio datasets into training, validation, and test sets.
- **_make_spec_ds() Method**: Converts audio data into spectrograms for model input.
- **build_model() Method**: Defines and compiles the CNN model architecture for speech recognition.
- **train_model() Method**: Trains the model on the prepared dataset.
- **evaluate_model() Method**: Evaluates the model's performance on the test set.
- **ExportModel Class**: This class is used to export the trained model for inference. It includes methods to preprocess audio data and make predictions using the trained model.### `speech_recognition_game.py`
This script integrates the trained model with a real-time game controlled by voice commands. The key components of the script are:
- **SnakeGame Class**: This class sets up and runs the Snake game using Pygame. It handles game initialization, drawing, scoring, and game logic.
- **VoiceControl Class**: This class handles voice command processing and game control. It includes methods for audio processing, making predictions with the model, and translating predictions into game actions.
- **audio_callback() Method**: Captures audio data from the microphone and adds it to a queue.
- **process_audio() Method**: Processes audio data, makes predictions using the trained model, and updates the game state based on the predicted commands.### How It Works
1. **Model Training**:
- The `speech_recognition_model_trainer.py` script reads audio data from the specified dataset directory.
- The audio is converted into spectrograms, and the model is trained to classify different voice commands.
- The trained model is saved for later use.2. **Voice-Controlled Game**:
- The `speech_recognition_game.py` script loads the trained model and starts a microphone stream to capture real-time audio.
- The captured audio is processed and passed through the model to predict voice commands.
- The predicted commands are used to control the Snake game in real-time.### Dataset
The dataset used for training the model can be accessed via this [Dataset](https://drive.google.com/drive/folders/1aAST8IX1-3Ri1eBdhq-4oyZVY8gjuuub?usp=sharing).
## Installation and Setup
To use this project, follow these steps:
1. Clone the repository:
```bash
git clone https://github.com/amiriiw/speech_recognition
cd speech_recognition
```2. Install the required libraries:
```bash
pip install tensorflow numpy pygame sounddevice
```3. Prepare your dataset (an audio dataset with voice commands).
4. Train the model:
```bash
python speech_recognition_model_trainer.py
```5. Run the voice-controlled game:
```bash
python speech_recognition_game.py
```## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.