https://github.com/alihassanml/speech-recognition-system

This project implements a speech recognition system using the LibriSpeech dataset and the `librosa` library for feature extraction, alongside a deep learning model built with TensorFlow/Keras.


Host: GitHub
URL: https://github.com/alihassanml/speech-recognition-system
Owner: alihassanml
License: mit
Created: 2024-12-02T16:42:13.000Z
Default Branch: main
Last Pushed: 2024-12-02T16:48:49.000Z
Last Synced: 2025-02-06T11:57:00.155Z
Topics: deep-learning, librosa, speech-recognition, speech-to-text
Language: Jupyter Notebook
Homepage:
Size: 466 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Speech Recognition System


---

## Table of Contents

- [Overview](#overview)

- [Features](#features)

- [Installation](#installation)

- [Usage](#usage)

- [Model Architecture](#model-architecture)

- [Dataset](#dataset)

- [Results](#results)

- [Contributing](#contributing)

- [License](#license)

---

## Overview

This speech recognition system processes audio files, extracts MFCC features from them, and trains an LSTM-based deep learning model on the resulting feature sequences. It uses the LibriSpeech dataset for training data and the `librosa` library for audio analysis.

---

## Features

- Audio preprocessing and feature extraction using `librosa`.

- LSTM-based sequence model for speech recognition.

- End-to-end training pipeline with data preprocessing, model training, and evaluation.

---

## Installation

Clone the repository and install the required dependencies:

```bash
git clone https://github.com/alihassanml/Speech-Recognition-System.git
cd Speech-Recognition-System
pip install -r requirements.txt
```

---

## Usage

1. **Preprocess Data**: Extract MFCC features using `librosa`.

2. **Train Model**: Train the LSTM model with your processed dataset.

3. **Evaluate Performance**: Test the trained model on unseen audio samples.

### Feature Extraction Example

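```python
import librosa

# Placeholder path; LibriSpeech ships .flac files, which librosa.load can also read.
audio_path = 'path_to_audio_file.wav'

# sr=None keeps the file's native sampling rate instead of resampling to 22,050 Hz.
y, sr = librosa.load(audio_path, sr=None)

# 20 MFCCs per frame; the result has shape (n_mfcc, frames) = (20, frames).
# Transpose it (mfcc.T) to get the time-major (frames, 20) layout the LSTM expects.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
```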
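The model in the training example expects every input to have the same shape, `(length, 20)`, but `librosa.feature.mfcc` returns an array of shape `(n_mfcc, frames)` whose frame count depends on the clip's duration. Below is a minimal sketch of one way to bridge that gap, assuming end-of-clip zero-padding and truncation; the helper name `pad_or_truncate` and the value `length=200` are hypothetical, not taken from the project. Random arrays stand in for real MFCCs so the example runs without audio files.

```python
import numpy as np

def pad_or_truncate(mfcc: np.ndarray, length: int) -> np.ndarray:
    """Convert an (n_mfcc, frames) MFCC matrix into a fixed (length, n_mfcc) array.

    Hypothetical helper: clips longer than `length` frames are cut at the end,
    shorter ones are zero-padded at the end.
    """
    features = mfcc.T  # time-major layout: (frames, n_mfcc)
    frames, n_mfcc = features.shape
    if frames >= length:
        return features[:length]
    padded = np.zeros((length, n_mfcc), dtype=features.dtype)
    padded[:frames] = features
    return padded

# Random stand-ins for the MFCCs of two clips of different duration.
rng = np.random.default_rng(0)
clips = [rng.standard_normal((20, 150)), rng.standard_normal((20, 320))]
batch = np.stack([pad_or_truncate(m, length=200) for m in clips])
print(batch.shape)  # (2, 200, 20)
```

Whether to pad at the start or the end, and whether to hide the padded frames from the LSTM (for example with a Keras `Masking` layer), are design choices the project does not specify.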

### Model Training Example

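```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, LSTM, Dropout

# Hypothetical placeholder values: set these to match your preprocessed data.
length = 200       # MFCC frames per (padded) sample
num_classes = 10   # number of target classes

model = Sequential()
model.add(Input(shape=(length, 20)))           # (time steps, 20 MFCCs per step)
model.add(LSTM(128, return_sequences=True))    # pass the full sequence to the next LSTM
model.add(Dropout(0.2))
model.add(LSTM(128, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128))                           # returns only the final hidden state
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
```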
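Because the model is compiled with `loss='categorical_crossentropy'`, the training targets must be one-hot vectors of length `num_classes` rather than integer class IDs. A minimal NumPy sketch, assuming integer labels in `range(num_classes)`; `one_hot` is a hypothetical helper, and `tf.keras.utils.to_categorical` does the same job inside TensorFlow.

```python
import numpy as np

def one_hot(labels, num_classes: int) -> np.ndarray:
    """Map integer class IDs of shape (n,) to one-hot rows of shape (n, num_classes)."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Set a single 1.0 per row, in the column given by that sample's label.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

y = one_hot([2, 0, 1], num_classes=3)
print(y)
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```

With inputs `X` of shape `(n, length, 20)`, training would then be a call such as `model.fit(X, y, epochs=..., validation_split=0.1)`. Alternatively, compiling with `loss='sparse_categorical_crossentropy'` accepts the integer labels directly and skips this step.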

---

## Model Architecture

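The model stacks three LSTM layers, each followed by a dropout layer to reduce overfitting:

- **LSTM (128 units, three layers)**: Capture temporal dependencies in the MFCC sequences. The first two return their full output sequences so the next LSTM can consume them; the last returns only its final hidden state.

- **Dropout (20%)**: After each LSTM, randomly zeroes 20% of the activations during training as regularization.

- **Dense Layer**: Final layer with a softmax activation over `num_classes`, so the model predicts one class per input sequence.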
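As a sanity check on this architecture, the trainable parameter counts reported by `model.summary()` can be worked out by hand. A Keras LSTM with `u` units and input size `d` has four gates, each with an input kernel, a recurrent kernel and a bias, giving `4 * (u * (d + u) + u)` parameters; the Dense layer adds `128 * num_classes + num_classes`. The sketch below applies these formulas to the three 128-unit LSTMs over 20 MFCC inputs; `num_classes = 10` is an arbitrary example value, not the project's.

```python
def lstm_params(input_dim: int, units: int) -> int:
    # 4 gates, each with an input kernel (input_dim x units),
    # a recurrent kernel (units x units) and a bias vector (units).
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim: int, units: int) -> int:
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units

n_mfcc, units, num_classes = 20, 128, 10  # num_classes is an example value

layers = {
    "lstm_1": lstm_params(n_mfcc, units),
    "lstm_2": lstm_params(units, units),
    "lstm_3": lstm_params(units, units),
    "dense": dense_params(units, num_classes),
}
for name, count in layers.items():
    print(f"{name}: {count:,}")
print(f"total: {sum(layers.values()):,}")
# lstm_1: 76,288
# lstm_2: 131,584
# lstm_3: 131,584
# dense: 1,290
# total: 340,746
```

The sequence length `length` does not appear in any formula because an LSTM reuses the same weights at every time step; the Dropout layers add no parameters.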

---

## Dataset

- **LibriSpeech**: A corpus of roughly 1,000 hours of read English speech sampled at 16 kHz, derived from LibriVox audiobooks.

- Download and preprocess the dataset using your preferred method.
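LibriSpeech is distributed as subsets (for example `train-clean-100` or `test-clean`) in which each `<speaker>/<chapter>/` directory holds `.flac` utterances named `<speaker>-<chapter>-<utterance>.flac`, plus a `<speaker>-<chapter>.trans.txt` file with one `<utterance-id> <TRANSCRIPT>` line per utterance. One possible way to pair audio files with their transcripts is sketched below. The function name `load_transcripts` and the fake IDs and text in the demo are illustrative assumptions, not part of this project; the demo builds a tiny directory with the same layout so it runs without the real download.

```python
import tempfile
from pathlib import Path

def load_transcripts(root: Path) -> dict[str, tuple[Path, str]]:
    """Map utterance ID -> (audio path, transcript) for one LibriSpeech subset directory."""
    pairs = {}
    # Layout assumed: <root>/<speaker>/<chapter>/<speaker>-<chapter>.trans.txt
    for trans_file in root.glob("*/*/*.trans.txt"):
        for line in trans_file.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            utt_id, text = line.split(" ", 1)  # ID, then the transcript
            pairs[utt_id] = (trans_file.parent / f"{utt_id}.flac", text)
    return pairs

# Demo with a fake speaker "1234" and chapter "5678" (made-up data).
with tempfile.TemporaryDirectory() as tmp:
    chapter_dir = Path(tmp) / "1234" / "5678"
    chapter_dir.mkdir(parents=True)
    (chapter_dir / "1234-5678.trans.txt").write_text(
        "1234-5678-0000 HELLO WORLD\n1234-5678-0001 THIS IS A TEST\n",
        encoding="utf-8",
    )
    transcripts = load_transcripts(Path(tmp))
    for utt_id, (audio, text) in sorted(transcripts.items()):
        print(utt_id, audio.name, text)
```

Each resulting audio path can then be passed to `librosa.load` as in the feature extraction example.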

---

## Results

Add your training accuracy, evaluation metrics, or sample predictions here.
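As a starting point for filling in this section, the sketch below shows one way to turn the model's softmax outputs into the accuracy metric it is compiled with, plus a confusion matrix. It assumes `probs` has shape `(n, num_classes)`, as returned by `model.predict`, and that `labels` holds integer class IDs; `evaluate` is a hypothetical helper and the arrays are made-up toy values, not project results.

```python
import numpy as np

def evaluate(probs: np.ndarray, labels: np.ndarray, num_classes: int):
    """Return (accuracy, confusion matrix) for softmax outputs and integer labels."""
    preds = probs.argmax(axis=1)                  # most likely class per sample
    accuracy = float((preds == labels).mean())
    confusion = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(confusion, (labels, preds), 1)      # rows: true class, columns: predicted
    return accuracy, confusion

# Toy values for illustration only.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
labels = np.array([0, 1, 2, 1])
acc, cm = evaluate(probs, labels, num_classes=3)
print(acc)  # 0.75
print(cm)
```

If your labels are one-hot encoded, recover the integer IDs with `labels.argmax(axis=1)` first.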

---

## Contributing

Contributions are welcome! Please fork the repository, create a branch, and submit a pull request.

---

## License

This project is licensed under the MIT License.
