An open API service indexing awesome lists of open source software.

https://github.com/mxcaoalina/speech_recognition


https://github.com/mxcaoalina/speech_recognition

Last synced: 12 months ago
JSON representation

Awesome Lists containing this project

README

          

# Speech Recognition Project

This project provides various speech recognition and audio processing capabilities using AssemblyAI and OpenAI APIs.

## Features

- Basic audio file processing (WAV/MP3)
- Real-time speech recognition
- Sentiment analysis of speech
- Podcast summarization
- Real-time transcription with OpenAI integration

## Prerequisites

- Python 3.8 or higher
- AssemblyAI API key
- OpenAI API key (for some features)

## Installation

1. Clone the repository:
```bash
git clone https://github.com/yourusername/speech-recognition-python.git
cd speech-recognition-python
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Set up environment variables:
Create a `.env` file in the root directory with the following content:
```
ASSEMBLYAI_API_KEY=your_assemblyai_api_key
OPENAI_API_KEY=your_openai_api_key
```

## Project Structure

- `1.basic framework/`: Basic audio file processing
- `2.simple recognition/`: Simple speech recognition
- `3.sentiment-analysis/`: Sentiment analysis of speech
- `4.podcast summarization/`: Podcast transcription and summarization
- `5.realtime-openai/`: Real-time transcription with OpenAI integration
- `shared/`: Shared modules and configuration

## Usage

### Basic Audio Processing
```bash
python "1.basic framework/load_mp3.py"
```

### Simple Speech Recognition
```bash
python "2.simple recognition/main.py"
```

### Sentiment Analysis
```bash
python "3.sentiment-analysis/main.py"
```

### Podcast Summarization
```bash
python "4.podcast summarization/main.py"
```

### Real-time Transcription
```bash
python "5.realtime-openai/main.py"
```

## Configuration

The project uses a centralized configuration system in `shared/config.py`. You can modify audio and API settings there.

## Error Handling

The project includes comprehensive error handling for:
- API communication issues
- File operations
- Audio stream management
- Resource cleanup

## Contributing

1. Fork the repository
2. Create your feature branch
3. Commit your changes
4. Push to the branch
5. Create a new Pull Request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- AssemblyAI for speech recognition capabilities
- OpenAI for language model integration
- PyAudio for audio processing