An open API service indexing awesome lists of open source software.

https://github.com/aryansk/speech-to-text-converter

This GitHub repository contains a Python Streamlit app that utilizes machine learning to convert speech to text. The app allows users to upload audio files (in WAV or MP3 format) and uses the Speech Recognition library to perform speech recognition and convert the speech into text.
https://github.com/aryansk/speech-to-text-converter

machine-learning python speech-recognition speech-to-text streamlit

Last synced: about 2 months ago
JSON representation

This GitHub repository contains a Python Streamlit app that utilizes machine learning to convert speech to text. The app allows users to upload audio files (in WAV or MP3 format) and uses the Speech Recognition library to perform speech recognition and convert the speech into text.

Awesome Lists containing this project

README

          

# Audio Text Transcriber 🎙️📝

![Python](https://img.shields.io/badge/Python-3.7+-blue.svg)
![Streamlit](https://img.shields.io/badge/Streamlit-1.0+-red.svg)
![SpeechRecognition](https://img.shields.io/badge/SpeechRecognition-3.8+-green.svg)
![PyDub](https://img.shields.io/badge/PyDub-0.25+-yellow.svg)
![License](https://img.shields.io/badge/License-MIT-lightgrey.svg)
![Maintenance](https://img.shields.io/badge/Maintenance-Active-brightgreen.svg)

A powerful web application built with Streamlit that converts audio files to text using advanced speech recognition technology.

## 📖 Table of Contents
- [Features](#-features)
- [Technical Architecture](#-technical-architecture)
- [Installation & Setup](#-installation--setup)
- [Usage Guide](#-usage-guide)
- [Implementation Details](#-implementation-details)
- [Performance](#-performance)
- [Development](#-development)
- [Contributing](#-contributing)
- [License](#-license)

## 🌟 Features

### 🎵 Audio Processing
- **File Format Support**
- MP3 file processing
- WAV file processing
- Automatic format conversion
- Multi-channel audio support
- **Audio Handling**
- Sample rate normalization
- Audio chunk processing
- Noise reduction
- Volume normalization

### 🔍 Speech Recognition
- **Recognition Engine**
- Google Speech Recognition integration
- Offline fallback capability
- Multiple language support
- Confidence scoring
- **Processing Features**
- Real-time transcription
- Batch processing support
- Error handling
- Progress tracking

### 💻 User Interface
- **Interface Elements**
- Drag-and-drop file upload
- Progress indicators
- Error notifications
- Results display
- **Export Options**
- Text file export
- Copy to clipboard
- Download transcription
- Format preservation

## 🛠 Technical Architecture

### System Components
```mermaid
graph TD
A[Audio Input] --> B[Format Conversion]
B --> C[Audio Processing]
C --> D[Speech Recognition]
D --> E[Text Output]
E --> F[Export Options]
```

### Dependencies
```python
# requirements.txt
streamlit>=1.0.0
SpeechRecognition>=3.8.1
pydub>=0.25.1
numpy>=1.20.0
pandas>=1.3.0
```

## 💻 Installation & Setup

### System Requirements
- **Minimum Specifications**
- Python 3.7+
- 4GB RAM
- 2GB storage
- **Recommended Specifications**
- Python 3.9+
- 8GB RAM
- 5GB storage
- Stable internet connection

### Quick Start
```bash
# Clone repository
git clone https://github.com/yourusername/audio-text-transcriber.git

# Navigate to project
cd audio-text-transcriber

# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
.\venv\Scripts\activate # Windows

# Install dependencies
pip install -r requirements.txt
```

### Configuration
```python
# config.py
CONFIG = {
'audio': {
'supported_formats': ['mp3', 'wav'],
'max_file_size': 50 * 1024 * 1024, # 50MB
'sample_rate': 16000,
'chunk_size': 30000
},
'recognition': {
'language': 'en-US',
'enable_offline': False,
'confidence_threshold': 0.8
},
'interface': {
'theme': 'light',
'max_display_length': 1000,
'auto_scroll': True
}
}
```

## 🚀 Usage Guide

### Basic Implementation
```python
import streamlit as st
from audio_transcriber import AudioTranscriber

def main():
st.title("Audio Text Transcriber 🎙️📝")

uploaded_file = st.file_uploader(
"Upload your audio file",
type=['mp3', 'wav']
)

if uploaded_file is not None:
transcriber = AudioTranscriber()

with st.spinner('Transcribing...'):
text = transcriber.process_audio(uploaded_file)

st.success('Transcription Complete!')
st.text_area("Transcribed Text:", text)

if __name__ == "__main__":
main()
```

### Advanced Usage
```python
class AudioTranscriber:
def process_audio(self, audio_file):
"""
Process audio file and return transcribed text.

Args:
audio_file: UploadedFile object from Streamlit

Returns:
str: Transcribed text
"""
# Convert audio if needed
audio = self._convert_audio(audio_file)

# Initialize recognizer
recognizer = sr.Recognizer()

# Process audio in chunks
text = ""
for chunk in self._get_chunks(audio):
try:
chunk_text = recognizer.recognize_google(chunk)
text += chunk_text + " "
except sr.UnknownValueError:
st.warning("Could not understand audio")
except sr.RequestError:
st.error("API unavailable")

return text.strip()
```

## ⚡ Performance

### Optimization Techniques
- Audio chunking
- Parallel processing
- Caching mechanisms
- Memory management

### Benchmarks
| File Size | Processing Time | Memory Usage |
|-----------|----------------|--------------|
| 1MB | 2s | 100MB |
| 5MB | 8s | 250MB |
| 10MB | 15s | 400MB |
| 50MB | 60s | 1GB |

## 👨‍💻 Development

### Project Structure
```
audio-text-transcriber/
├── src/
│ ├── transcriber.py
│ ├── audio_processing.py
│ └── utils.py
├── app/
│ ├── main.py
│ └── components/
├── tests/
│ └── test_transcriber.py
├── config.py
├── requirements.txt
└── README.md
```

### Testing
```bash
# Run all tests
python -m pytest

# Run specific test file
python -m pytest tests/test_transcriber.py

# Run with coverage
python -m pytest --cov=src
```

## 🤝 Contributing

### Development Workflow
1. Fork repository
2. Create feature branch
3. Implement changes
4. Add tests
5. Submit pull request

### Code Style Guidelines
- Follow PEP 8
- Write docstrings
- Add type hints
- Include unit tests

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Google Speech Recognition API
- Streamlit team
- PyDub developers
- Open source community