https://github.com/mxcaoalina/speech_recognition
- Host: GitHub
- URL: https://github.com/mxcaoalina/speech_recognition
- Owner: mxcaoalina
- Created: 2025-03-24T21:43:37.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-24T21:46:36.000Z (over 1 year ago)
- Last Synced: 2025-06-18T15:49:18.135Z (about 1 year ago)
- Language: Python
- Size: 760 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Speech Recognition Project
This project collects Python scripts for speech recognition and audio processing, built on the AssemblyAI and OpenAI APIs.
## Features
- Basic audio file processing (WAV/MP3)
- Real-time speech recognition
- Sentiment analysis of speech
- Podcast summarization
- Real-time transcription with OpenAI integration
## Prerequisites
- Python 3.8 or higher
- AssemblyAI API key
- OpenAI API key (for some features)
## Installation
1. Clone the repository:
```bash
git clone https://github.com/mxcaoalina/speech_recognition.git
cd speech_recognition
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Set up environment variables:
Create a `.env` file in the root directory with the following content:
```
ASSEMBLYAI_API_KEY=your_assemblyai_api_key
OPENAI_API_KEY=your_openai_api_key
```
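The scripts need these keys at runtime. As a minimal sketch of how a `.env` file like the one above could be read with only the standard library: `load_env_file` and `require_key` are hypothetical helper names, not functions from this repository, and the project itself may rely on a package such as `python-dotenv` instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=value pairs from a .env file into os.environ.

    Values already present in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        cleaned = value.strip().strip('"').strip("'")
        os.environ.setdefault(key.strip(), cleaned)


def require_key(name):
    """Return the named variable, or fail early with a readable message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `load_env_file()` once at startup and then `require_key("ASSEMBLYAI_API_KEY")` surfaces a missing key before any audio is recorded or uploaded.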
## Project Structure
- `1.basic framework/`: Basic audio file processing
- `2.simple recognition/`: Simple speech recognition
- `3.sentiment-analysis/`: Sentiment analysis of speech
- `4.podcast summarization/`: Podcast transcription and summarization
- `5.realtime-openai/`: Real-time transcription with OpenAI integration
- `shared/`: Shared modules and configuration
## Usage
### Basic Audio Processing
```bash
python "1.basic framework/load_mp3.py"
```
### Simple Speech Recognition
```bash
python "2.simple recognition/main.py"
```
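For orientation, a file-based transcription against AssemblyAI's REST API generally means submitting an audio URL and then polling until the job finishes. The sketch below shows only that shape, assuming direct HTTP calls; the actual `main.py` may use the official SDK or a different flow. `build_transcript_request`, `wait_for_transcript`, and the `fetch_status` callable are hypothetical names introduced for illustration.

```python
import json
import time
import urllib.request

API_BASE = "https://api.assemblyai.com/v2"


def build_transcript_request(api_key, audio_url):
    """Build (but do not send) the POST request that starts a transcription."""
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/transcript",
        data=body,
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


def wait_for_transcript(fetch_status, poll_seconds=3.0, max_polls=100):
    """Poll until the job completes and return its text.

    fetch_status is a hypothetical callable that returns the decoded JSON
    of GET /v2/transcript/{id}; injecting it keeps this loop testable.
    """
    for _ in range(max_polls):
        result = fetch_status()
        if result["status"] == "completed":
            return result["text"]
        if result["status"] == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        time.sleep(poll_seconds)
    raise TimeoutError("transcript was not ready after polling")
```

Keeping the polling loop separate from the HTTP call makes it possible to exercise the completed, error, and timeout paths without network access.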
### Sentiment Analysis
```bash
python "3.sentiment-analysis/main.py"
```
### Podcast Summarization
```bash
python "4.podcast summarization/main.py"
```
### Real-time Transcription
```bash
python "5.realtime-openai/main.py"
```
## Configuration
The project uses a centralized configuration system in `shared/config.py`. You can modify audio and API settings there.
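To illustrate what a centralized configuration module of this kind can look like, here is a minimal sketch. It is not the contents of `shared/config.py`: the class names, field names, and default values are all assumptions chosen for the example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSettings:
    # Illustrative defaults only; the repository's real values may differ.
    sample_rate: int = 16000
    channels: int = 1
    frames_per_buffer: int = 3200


@dataclass(frozen=True)
class ApiSettings:
    assemblyai_api_key: str
    openai_api_key: str = ""


def load_api_settings(environ=None):
    """Read API keys from the environment (or from a mapping passed in)."""
    env = os.environ if environ is None else environ
    return ApiSettings(
        assemblyai_api_key=env.get("ASSEMBLYAI_API_KEY", ""),
        openai_api_key=env.get("OPENAI_API_KEY", ""),
    )
```

Frozen dataclasses keep the settings read-only once loaded, so every script sees the same values for the lifetime of a run.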
## Error Handling
The project includes comprehensive error handling for:
- API communication issues
- File operations
- Audio stream management
- Resource cleanup
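Resource cleanup for audio streams is commonly handled with a context manager, so the stream is closed even when reading or an API call fails. The following is a generic sketch of that pattern and not the repository's code: `managed_stream` and `FakeStream` are hypothetical, and a real PyAudio stream may need additional teardown steps beyond `close()`.

```python
from contextlib import contextmanager


@contextmanager
def managed_stream(open_stream):
    """Open a stream via the given callable and always close it afterwards."""
    stream = open_stream()
    try:
        yield stream
    finally:
        # Runs on normal exit and when the body raises.
        stream.close()


class FakeStream:
    """Stand-in for an audio input stream, used only to demonstrate cleanup."""

    def __init__(self):
        self.closed = False

    def read(self, num_bytes):
        return b"\x00" * num_bytes

    def close(self):
        self.closed = True
```

Used as `with managed_stream(FakeStream) as stream: ...`, the `finally` clause guarantees that `close()` runs whether the body returns normally or raises.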
## Contributing
1. Fork the repository
2. Create your feature branch
3. Commit your changes
4. Push to the branch
5. Create a new Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- AssemblyAI for speech recognition capabilities
- OpenAI for language model integration
- PyAudio for audio processing