https://github.com/djleamen/renamer
Utility to rename mp3 files based on speech content
https://github.com/djleamen/renamer
ffmpeg google-speech-recognition googlespeechapi mp3 openai pyaudio pydub python speech-recognition speech-to-text torch util utility wav whisper whisper-ai
Last synced: 2 months ago
JSON representation
Utility to rename mp3 files based on speech content
- Host: GitHub
- URL: https://github.com/djleamen/renamer
- Owner: djleamen
- License: mit
- Created: 2025-06-22T05:25:00.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-07-26T04:46:32.000Z (11 months ago)
- Last Synced: 2025-07-26T10:56:01.318Z (11 months ago)
- Topics: ffmpeg, google-speech-recognition, googlespeechapi, mp3, openai, pyaudio, pydub, python, speech-recognition, speech-to-text, torch, util, utility, wav, whisper, whisper-ai
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# MP3 Renamer
[](https://www.python.org/downloads/)
[](LICENSE)

[](https://github.com/djleamen/renamer/commits)
A Python tool that automatically renames MP3 files based on their speech content using AI-powered Speech-to-Text technology.
## Features
- Scans a directory for MP3 files
- Converts each MP3 to WAV format (temporarily)
- Uses OpenAI's Whisper model for high-quality transcription (with Google Speech API as fallback)
- Extracts the first sentence or a specified number of words from the transcription
- Renames the MP3 file based on this extracted text
- Cleans filenames to ensure they are valid
## Requirements
- Python 3.6 or later
- Required Python packages (install with `pip install -r requirements.txt`):
- openai-whisper (for state-of-the-art speech recognition)
- torch (required for Whisper)
- SpeechRecognition (fallback recognition)
- pydub (audio file manipulation)
- PyAudio (for audio processing)
- FFmpeg (for MP3 conversion, used by pydub)
## Installation
1. Clone or download this repository
2. Install required packages:
```bash
pip install -r requirements.txt
```
3. Install FFmpeg (if not already installed):
- macOS: `brew install ffmpeg`
- Linux: `apt-get install ffmpeg`
- Windows: Download from [ffmpeg.org](https://ffmpeg.org/download.html)
## Usage
### Basic Usage
Run the script with the path to the directory containing MP3 files:
```bash
python mp3_renamer.py /path/to/mp3/folder
```
### Advanced Options
```bash
python mp3_renamer.py /path/to/mp3/folder [options]
```
Available options:
- `-d, --duration SECONDS` - Process only a specific duration (default: 10 seconds)
- `-s, --start SECONDS` - Start processing from this time (default: 0 seconds)
- `-f, --first N` - Use the first N words for the filename instead of a sentence
- `-v, --verbose` - Enable verbose output for debugging
- `-e, --engine {whisper,google,both}` - Choose the speech recognition engine (default: whisper)
- `-m, --model {tiny,base,small,medium,large}` - Select Whisper model size (default: base)
### Examples
Use Whisper with a larger model for better accuracy:
```bash
python mp3_renamer.py /path/to/mp3/folder --model medium
```
Process only the first 5 seconds of each file:
```bash
python mp3_renamer.py /path/to/mp3/folder --duration 5
```
Use the first 10 words instead of trying to detect a sentence:
```bash
python mp3_renamer.py /path/to/mp3/folder --first 10
```
## How It Works
1. The script scans the specified directory for MP3 files
2. For each MP3 file:
- Converts it to WAV format (needed for speech recognition)
- Uses OpenAI's Whisper model to transcribe the audio
- Extracts the first sentence or specified number of words
- Cleans the text to make it a valid filename
- Renames the MP3 file with the new name
- Removes the temporary WAV file
## Choosing a Whisper Model
- `tiny`: Fastest, lowest accuracy, minimal resource usage
- `base`: Good balance of speed and accuracy (default)
- `small`: Better accuracy, moderate resource usage
- `medium`: High accuracy, higher resource usage
- `large`: Best accuracy, significant resource usage
## Troubleshooting
If you encounter issues:
1. Try running with `--verbose` to see detailed logs
2. Use `--model small` or `--model medium` for better transcription
3. Adjust the `--start` and `--duration` parameters to capture the correct part of audio
4. Use `--first N` to bypass sentence detection if it's not working well