https://github.com/alihassanml/speech-recognition-system
This project implements a speech recognition system using the LibriSpeech dataset and the `librosa` library for feature extraction, alongside a deep learning model built with TensorFlow/Keras.
- Host: GitHub
- URL: https://github.com/alihassanml/speech-recognition-system
- Owner: alihassanml
- License: mit
- Created: 2024-12-02T16:42:13.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-12-02T16:48:49.000Z (10 months ago)
- Last Synced: 2025-02-06T11:57:00.155Z (8 months ago)
- Topics: deep-learning, librosa, speech-recognition, speech-to-text
- Language: Jupyter Notebook
- Homepage:
- Size: 466 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Speech Recognition System
This project implements a speech recognition system using the LibriSpeech dataset and the `librosa` library for feature extraction, alongside a deep learning model built with TensorFlow/Keras.
---
## Table of Contents
- [Overview](#overview)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Results](#results)
- [Contributing](#contributing)
- [License](#license)

---
## Overview
This speech recognition system is designed to process audio files, extract meaningful features, and train a deep learning model using LSTMs to predict sequences. The project leverages the LibriSpeech dataset and the `librosa` library for audio analysis.

---
## Features
- Audio preprocessing and feature extraction using `librosa`.
- LSTM-based sequence model for speech recognition.
- End-to-end training pipeline with data preprocessing, model training, and evaluation.

---
## Installation
Clone the repository and install the required dependencies:

```bash
git clone https://github.com/alihassanml/Speech-Recognition-System.git
cd Speech-Recognition-System
pip install -r requirements.txt
```

---
## Usage
1. **Preprocess Data**: Extract MFCC features using `librosa`.
2. **Train Model**: Train the LSTM model with your processed dataset.
3. **Evaluate Performance**: Test the trained model on unseen audio samples.

### Feature Extraction Example
```python
import librosa
import numpy as np

audio_path = 'path_to_audio_file.wav'
y, sr = librosa.load(audio_path, sr=None)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
```

### Model Training Example
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout

model = Sequential()
# `length` is the fixed number of time steps (MFCC frames) per sequence
model.add(LSTM(128, return_sequences=True, input_shape=(length, 20)))
model.add(Dropout(0.2))
model.add(LSTM(128, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128))
model.add(Dropout(0.2))
# `num_classes` is the number of output labels in your dataset
model.add(Dense(num_classes, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
```

---
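Because recordings vary in duration, the MFCC matrices extracted above must be padded or truncated to a common number of frames before they can be batched into the LSTM. A minimal sketch (the fixed length of 200 frames and the helper name `pad_mfcc` are illustrative assumptions, not values from this repo):

```python
import numpy as np

def pad_mfcc(mfcc, max_len=200):
    """Pad or truncate an (n_mfcc, time) MFCC matrix to max_len frames,
    returning a (max_len, n_mfcc) array ready for an LSTM input."""
    mfcc = mfcc.T  # -> (time, n_mfcc), the layout Keras LSTMs expect
    if mfcc.shape[0] < max_len:
        pad = np.zeros((max_len - mfcc.shape[0], mfcc.shape[1]))
        mfcc = np.vstack([mfcc, pad])
    return mfcc[:max_len]

# Example: two clips of different lengths become one uniform batch
short = np.random.randn(20, 150)   # 150 frames
long_ = np.random.randn(20, 320)   # 320 frames
batch = np.stack([pad_mfcc(short), pad_mfcc(long_)])
print(batch.shape)  # (2, 200, 20)
```

The resulting batch shape `(batch, length, 20)` matches the `input_shape=(length, 20)` expected by the first LSTM layer.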
## Model Architecture
The model is built using a stack of LSTMs with dropout layers to prevent overfitting. The architecture is as follows:
- **LSTM (128 units)**: Captures temporal dependencies in the feature sequences.
- **Dropout (20%)**: Regularization to reduce overfitting.
- **Dense Layer**: Final layer with a softmax activation for classification.

---
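As a sanity check against `model.summary()`, an LSTM layer's trainable parameter count is `4 * (units * (units + input_dim) + units)`: four gates (input, forget, cell, output), each with input weights, recurrent weights, and a bias. For the architecture above:

```python
def lstm_params(units, input_dim):
    # 4 gates, each with input weights (units * input_dim),
    # recurrent weights (units * units), and a bias vector (units)
    return 4 * (units * (units + input_dim) + units)

print(lstm_params(128, 20))   # first LSTM layer (20 MFCCs in): 76288
print(lstm_params(128, 128))  # each stacked LSTM layer: 131584
```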
## Dataset
- **LibriSpeech**: A large-scale corpus of English speech data.
- Download and preprocess the dataset using your preferred method.

---
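LibriSpeech ships as `.flac` files grouped into speaker/chapter directories, with one `<speaker>-<chapter>.trans.txt` transcript file per chapter. A sketch of pairing each utterance with its transcript (the layout follows the official corpus; the helper name and root path are placeholders):

```python
from pathlib import Path

def load_transcripts(root):
    """Map utterance ID -> (flac_path, transcript) by walking a
    LibriSpeech split such as train-clean-100."""
    pairs = {}
    for trans in Path(root).rglob('*.trans.txt'):
        for line in trans.read_text().splitlines():
            utt_id, _, text = line.partition(' ')
            flac = trans.parent / f'{utt_id}.flac'
            if flac.exists():
                pairs[utt_id] = (flac, text)
    return pairs

# pairs = load_transcripts('LibriSpeech/train-clean-100')
# y, sr = librosa.load(next(iter(pairs.values()))[0], sr=None)
```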
## Results
Add your training accuracy, evaluation metrics, or sample predictions here.

---
## Contributing
Contributions are welcome! Please fork the repository, create a branch, and submit a pull request.

---
## License
This project is licensed under the MIT License.