An open API service indexing awesome lists of open source software.

https://github.com/alihassanml/speech-recognition-system

This project implements a speech recognition system using the LibriSpeech dataset and the `librosa` library for feature extraction, alongside a deep learning model built with TensorFlow/Keras.
https://github.com/alihassanml/speech-recognition-system

deep-learning librosa speech-recognition speech-to-text

Last synced: 6 months ago
JSON representation

This project implements a speech recognition system using the LibriSpeech dataset and the `librosa` library for feature extraction, alongside a deep learning model built with TensorFlow/Keras.

Awesome Lists containing this project

README

          

# Speech Recognition System

This project implements a speech recognition system using the LibriSpeech dataset and the `librosa` library for feature extraction, alongside a deep learning model built with TensorFlow/Keras.

---

## Table of Contents
- [Overview](#overview)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Results](#results)
- [Contributing](#contributing)
- [License](#license)

---

## Overview
This speech recognition system is designed to process audio files, extract meaningful features, and train a deep learning model using LSTMs to predict sequences. The project leverages the LibriSpeech dataset and the `librosa` library for audio analysis.

---

## Features
- Audio preprocessing and feature extraction using `librosa`.
- LSTM-based sequence model for speech recognition.
- End-to-end training pipeline with data preprocessing, model training, and evaluation.

---

## Installation
Clone the repository and install the required dependencies:

```bash
git clone https://github.com/alihassanml/Speech-Recognition-System.git
cd Speech-Recognition-System
pip install -r requirements.txt
```

---

## Usage
1. **Preprocess Data**: Extract MFCC features using `librosa`.
2. **Train Model**: Train the LSTM model with your processed dataset.
3. **Evaluate Performance**: Test the trained model on unseen audio samples.

### Feature Extraction Example
```python
import librosa
import numpy as np

audio_path = 'path_to_audio_file.wav'
y, sr = librosa.load(audio_path, sr=None)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
```

### Model Training Example
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout

model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(length, 20)))
model.add(Dropout(0.2))
model.add(LSTM(128, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
```

---

## Model Architecture
The model is built using a stack of LSTMs with dropout layers to prevent overfitting. The architecture is as follows:
- **LSTM (128 units)**: Captures temporal dependencies in the feature sequences.
- **Dropout (20%)**: Regularization to reduce overfitting.
- **Dense Layer**: Final layer with a softmax activation for classification.

---

## Dataset
- **LibriSpeech**: A large-scale corpus of English speech data.
- Download and preprocess the dataset using your preferred method.

---

## Results
Add your training accuracy, evaluation metrics, or sample predictions here.

---

## Contributing
Contributions are welcome! Please fork the repository, create a branch, and submit a pull request.

---

## License
This project is licensed under the MIT License.