https://github.com/bradsec/audionorm
Command line audio processing and transcription tools
- Host: GitHub
- URL: https://github.com/bradsec/audionorm
- Owner: bradsec
- License: mit
- Created: 2025-09-23T10:21:14.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-09-23T10:21:49.000Z (9 months ago)
- Last Synced: 2025-09-23T12:28:48.888Z (9 months ago)
- Topics: audio-processing
- Language: Python
- Homepage:
- Size: 36.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# AudioNorm & AudioScribe
Command-line audio processing and transcription tools that combine FFmpeg,
AI-based denoising, and speech recognition. They are suited to TTS-generated audio, podcasts, and voice recordings.
## What These Tools Do
### AudioNorm - Audio Enhancement & Cleanup
- **AI-powered denoising** using SpeechBrain MTL-MIMIC and Demucs models
- **Professional normalization** with EBU R128 loudness standards
- **Vocal/background separation** with stem export functionality
### AudioScribe - Audio Transcription
- **Whisper-based transcription** using Faster-Whisper models
- **Multi-language support** with auto-detection
- **Timestamps and formatting** options
## System Requirements
- **Python 3.8+**
- **FFmpeg** (must be in system PATH)
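
Both requirements can be checked before installing anything else. The following is a minimal sketch, not part of the repository; the function name `check_requirements` is hypothetical, and it only confirms that an `ffmpeg` executable is discoverable on `PATH`, not that it is a working build.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version from the requirements list


def check_requirements(min_python=MIN_PYTHON, tool="ffmpeg"):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    found = tuple(sys.version_info[:2])
    if found < tuple(min_python):
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {found[0]}.{found[1]}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(tool) is None:
        problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("MISSING:", problem)
```

An empty result means the interpreter is new enough and `ffmpeg` resolves on `PATH`.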
## Installation
### System Dependencies
```bash
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
# macOS (with Homebrew)
brew install ffmpeg
# Windows (with Chocolatey)
choco install ffmpeg
# Verify installation
ffmpeg -version
```
### Clone the Repo and Set Up a Python Environment
#### Option 1: Python conda environment
```bash
# Create conda virtual environment
conda create -n audionorm-env python=3.12
conda activate audionorm-env
git clone https://github.com/bradsec/audionorm.git
cd audionorm
# Install dependencies
pip install -r requirements.txt
# Verify installation
python audionorm.py --help
python audioscribe.py --help
```
#### Option 2: Python venv environment
```bash
# Create virtual environment
python -m venv audionorm-env
source audionorm-env/bin/activate # Linux/macOS
# audionorm-env\Scripts\activate # Windows
git clone https://github.com/bradsec/audionorm.git
cd audionorm
# Install dependencies
pip install -r requirements.txt
# Verify installation
python audionorm.py --help
python audioscribe.py --help
```
## AudioNorm - Audio Enhancement
### AudioNorm Features
- **SpeechBrain Enhancement**: MTL-MIMIC model for voice quality improvement
- **Demucs AI Denoising**: htdemucs_ft model optimized for vocal separation
- **Stem Separation**: Extract vocals and background tracks separately
- **Professional Normalization**: EBU R128 loudness standards
- **Intelligent Processing**: Auto-detects optimal processing pipeline
- **Batch Processing**: Handle entire directories recursively
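
To illustrate what two-pass EBU R128 normalization involves, here is a minimal sketch built on FFmpeg's `loudnorm` filter. It is not taken from `audionorm.py`, whose internals are not shown in this README. The -18.0 LUFS target is the default stated in the tool's help text; the true-peak (-1.5 dBTP) and loudness-range (11 LU) values are illustrative assumptions, and all function names are hypothetical. The first pass measures the file and prints statistics as JSON; the second pass feeds those measurements back so the filter can apply a linear gain.

```python
import json
import re


def loudnorm_targets(target_lufs=-18.0, true_peak=-1.5, lra=11.0):
    """Base loudnorm filter string (TP and LRA values are assumptions)."""
    return f"loudnorm=I={target_lufs}:TP={true_peak}:LRA={lra}"


def measure_cmd(src, **targets):
    """Pass 1: analyse only, discard audio, print stats as JSON on stderr."""
    flt = loudnorm_targets(**targets) + ":print_format=json"
    return ["ffmpeg", "-hide_banner", "-nostats", "-i", src,
            "-af", flt, "-f", "null", "-"]


def parse_measurements(stderr_text):
    """Return the last flat JSON object found in FFmpeg's log output."""
    blocks = re.findall(r"\{[^{}]*\}", stderr_text)
    if not blocks:
        raise ValueError("no loudnorm JSON block found")
    return json.loads(blocks[-1])


def apply_cmd(src, dst, stats, **targets):
    """Pass 2: normalize using the values measured in pass 1."""
    flt = (
        loudnorm_targets(**targets)
        + f":measured_I={stats['input_i']}"
        + f":measured_TP={stats['input_tp']}"
        + f":measured_LRA={stats['input_lra']}"
        + f":measured_thresh={stats['input_thresh']}"
        + f":offset={stats['target_offset']}:linear=true"
    )
    return ["ffmpeg", "-hide_banner", "-i", src, "-af", flt, dst]
```

In use, the first command would be run with `subprocess.run(measure_cmd("in.wav"), capture_output=True, text=True)`, its `stderr` passed to `parse_measurements`, and the resulting dictionary handed to `apply_cmd`.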
### AudioNorm Usage
```bash
usage: audionorm.py [-h] [--target-lufs TARGET_LUFS] [--model MODEL]
                    [--device {cpu,cuda}] [--recursive] [--keep-temp]
                    [--skip-demucs] [--enhanced-cleaning] [--basic-cleaning]
                    [--python-restoration] [--verbose] [--quiet]
                    [--trim-silence] [--silence-threshold SILENCE_THRESHOLD]
                    [--intensive-cleanup] [--use-loudnorm] [--overwrite]
                    [--format {wav,mp3}] [--two-pass-loudnorm]
                    [--single-pass-loudnorm] [--no-speechbrain] [--save-stems]
                    [--stems-only] [--voice-consistent]
                    input

Audio normalization and cleanup for TTS-generated audio

positional arguments:
  input                 Input audio file or directory

options:
  -h, --help            show this help message and exit
  --target-lufs TARGET_LUFS
                        Target loudness in LUFS (default: -18.0 optimized for
                        voice content)
  --model MODEL         Demucs model (default: htdemucs_ft for better vocal
                        separation)
  --device {cpu,cuda}   Processing device (auto-detect if not specified)
  --recursive, -r       Process directories recursively
  --keep-temp           Keep temporary files for debugging
  --skip-demucs         Skip Demucs denoising (FFmpeg normalization only)
  --enhanced-cleaning   Enable enhanced FFmpeg cleaning (gentle)
  --basic-cleaning      Use basic cleaning only (faster processing)
  --python-restoration  Use Python-based restoration (librosa + noisereduce)
                        instead of Demucs
  --verbose, -v         Enable verbose logging
  --quiet, -q           Minimal output mode
  --trim-silence        Trim extended silences (>1s) to natural pause lengths
                        (0.2-0.8s)
  --silence-threshold SILENCE_THRESHOLD
                        Silence detection threshold in dB (default: -30,
                        lower=more sensitive)
  --intensive-cleanup   Enable intensive post-AI cleanup (may cause reverb
                        artifacts, disabled by default)
  --use-loudnorm        Use loudnorm instead of dynaudnorm (may sound over-
                        processed)
  --overwrite           Overwrite existing output files (default: skip
                        existing files)
  --format {wav,mp3}, -f {wav,mp3}
                        Output audio format (default: wav)
  --two-pass-loudnorm   Use professional two-pass loudnorm for superior volume
                        consistency (default: enabled)
  --single-pass-loudnorm
                        Use single-pass loudnorm instead of two-pass (faster
                        but less accurate)
  --no-speechbrain      Disable SpeechBrain MTL-MIMIC voice enhancement
                        (default: enabled)
  --save-stems          Save all Demucs stems (vocals, drums, bass, other)
                        when using Demucs pipeline
  --stems-only          Only separate into vocal and background stems without
                        any processing or normalization
  --voice-consistent    Use voice-consistent normalization for uniform voice
                        levels throughout audio (fixes level variations)

Examples:
  audionorm.py audio.wav
  audionorm.py /path/to/audio/folder --recursive
  audionorm.py audio.wav --target-lufs -23 --model htdemucs
  audionorm.py folder/ --device cuda --keep-temp --quiet
```
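
For scripted use, the options above can be assembled programmatically. This is a minimal sketch of a wrapper, assuming `audionorm.py` is in the current working directory; the helper names `build_audionorm_cmd` and `run_audionorm` are hypothetical, and only flags listed in the help text are used. The README does not document exit codes, so treating 0 as success is an assumption.

```python
import subprocess
import sys

FORMATS = ("wav", "mp3")   # choices listed for --format
DEVICES = ("cpu", "cuda")  # choices listed for --device


def build_audionorm_cmd(input_path, target_lufs=None, fmt=None, device=None,
                        recursive=False, overwrite=False,
                        script="audionorm.py"):
    """Assemble an argument list for audionorm.py from keyword options."""
    if fmt is not None and fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if device is not None and device not in DEVICES:
        raise ValueError(f"unsupported device: {device!r}")

    cmd = [sys.executable, script, str(input_path)]
    if target_lufs is not None:
        # The "=" form keeps a negative value from being read as a flag.
        cmd.append(f"--target-lufs={target_lufs}")
    if fmt is not None:
        cmd += ["--format", fmt]
    if device is not None:
        cmd += ["--device", device]
    if recursive:
        cmd.append("--recursive")
    if overwrite:
        cmd.append("--overwrite")
    return cmd


def run_audionorm(input_path, **options):
    """Run audionorm.py and return its exit code (0 assumed to mean success)."""
    return subprocess.run(build_audionorm_cmd(input_path, **options)).returncode
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.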
## AudioScribe - Speech Transcription
### Core Features
- **Faster-Whisper Models**: Optimized Whisper implementation
- **Multi-language Support**: 60+ languages with auto-detection
- **Flexible Output**: Plain text, timestamps, or structured formats
- **Batch Transcription**: Process entire directories
- **Model Selection**: From tiny (fast) to large-v3 (most accurate)
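
For orientation, the sketch below shows how a transcription with timestamps can be produced with the `faster-whisper` library directly. It is an illustration under assumptions, not the code in `audioscribe.py`: the helper names and the timestamp layout are hypothetical, while the `large-v2` model and beam size of 5 mirror the defaults stated in the tool's help text. The library import is deferred so the formatting helpers work without the package installed.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def format_segments(segments, timestamps=True):
    """Join segments (objects with .start, .end, .text) into output text."""
    lines = []
    for seg in segments:
        text = seg.text.strip()
        if timestamps:
            span = f"[{format_timestamp(seg.start)} -> {format_timestamp(seg.end)}]"
            lines.append(f"{span} {text}")
        else:
            lines.append(text)
    return "\n".join(lines)


def transcribe_file(path, model_size="large-v2", device="cpu",
                    beam_size=5, language=None, timestamps=True):
    """Transcribe one file; returns (detected_language, formatted_text)."""
    # Imported lazily: requires `pip install faster-whisper`.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device)
    # language=None lets the model auto-detect the spoken language.
    segments, info = model.transcribe(path, beam_size=beam_size,
                                      language=language)
    # `segments` is a generator; decoding happens as it is consumed.
    return info.language, format_segments(segments, timestamps)
```

The first call to `transcribe_file` downloads the chosen model if it is not already cached, which can take a while for the larger sizes.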
### Usage
```bash
usage: audioscribe.py [-h]
                      [--model {tiny,tiny.en,base,base.en,small,small.en,medium,medium.en,large-v1,large-v2,large-v3,distil-large-v2,distil-large-v3}]
                      [--language LANGUAGE] [--task {transcribe,translate}] [--device {cpu,cuda}]
                      [--beam-size BEAM_SIZE] [--recursive] [--no-timestamps] [--keep-temp] [--verbose] [--quiet]
                      [--list-models] [--overwrite]
                      input

High-quality audio transcription with Faster-Whisper

positional arguments:
  input                 Input audio file or directory

options:
  -h, --help            show this help message and exit
  --model {tiny,tiny.en,base,base.en,small,small.en,medium,medium.en,large-v1,large-v2,large-v3,distil-large-v2,distil-large-v3}
                        Whisper model size (default: large-v2 for best balance)
  --language LANGUAGE   Source language code (auto-detect if not specified). Examples: en, es, fr, de, it, pt, ru, ja,
                        ko, zh
  --task {transcribe,translate}
                        Task to perform (default: transcribe)
  --device {cpu,cuda}   Processing device (auto-detect if not specified)
  --beam-size BEAM_SIZE
                        Beam size for decoding (higher = more accurate but slower, default: 5)
  --recursive, -r       Process directories recursively
  --no-timestamps       Disable timestamps in output (plain text only)
  --keep-temp           Keep temporary files for debugging
  --verbose, -v         Enable verbose logging
  --quiet, -q           Minimal output mode
  --list-models         List available models and exit
  --overwrite           Overwrite existing transcript files (default: skip existing files)

Examples:
  audioscribe.py audio.wav
  audioscribe.py /path/to/audio/folder --recursive
  audioscribe.py audio.wav --model large-v3 --language en
  audioscribe.py folder/ --device cuda --no-timestamps --quiet
```
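
When driving `audioscribe.py` from another script, it can help to validate options before spawning the process. The sketch below is hypothetical (the name `build_audioscribe_cmd` is not from the repository) and assumes `audioscribe.py` is in the working directory. The model and task choices are copied from the help text; the check that `.en` models are only paired with English is an extra guard added here, based on those Whisper variants being English-only, and may not match what the tool itself enforces.

```python
import sys

# Choices copied from the --model and --task entries in the help text.
MODELS = (
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large-v1", "large-v2", "large-v3",
    "distil-large-v2", "distil-large-v3",
)
TASKS = ("transcribe", "translate")


def build_audioscribe_cmd(input_path, model=None, language=None, task=None,
                          beam_size=None, timestamps=True,
                          script="audioscribe.py"):
    """Validate options and assemble an argument list for audioscribe.py."""
    if model is not None and model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if task is not None and task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    # Extra guard (an assumption of this sketch): .en models are English-only.
    if model and model.endswith(".en") and language not in (None, "en"):
        raise ValueError(f"{model} is English-only, got language {language!r}")
    if beam_size is not None and beam_size < 1:
        raise ValueError("beam_size must be at least 1")

    cmd = [sys.executable, script, str(input_path)]
    if model is not None:
        cmd += ["--model", model]
    if language is not None:
        cmd += ["--language", language]
    if task is not None:
        cmd += ["--task", task]
    if beam_size is not None:
        cmd += ["--beam-size", str(beam_size)]
    if not timestamps:
        cmd.append("--no-timestamps")
    return cmd
```

The returned list can be passed straight to `subprocess.run`.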