https://github.com/busrarafa/podcast-autopilot


Host: GitHub
URL: https://github.com/busrarafa/podcast-autopilot
Owner: BusraRafa
License: mit
Created: 2026-05-19T16:24:03.000Z (28 days ago)
Default Branch: main
Last Pushed: 2026-05-19T17:03:23.000Z (28 days ago)
Last Synced: 2026-05-19T20:39:48.807Z (28 days ago)
Language: Python
Size: 31.3 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# 🎙️ Podcast Autopilot

> **AI-powered podcast summarization pipeline** — transcribe, intelligently trim, and export a condensed highlight reel from any audio file using OpenAI Whisper and GPT-4 Turbo.

---

## 📌 Overview

**Podcast Autopilot** is an end-to-end audio processing pipeline that automatically transforms long-form podcast or lecture recordings into tight, high-signal highlight clips — without touching a single audio editor. You drop in an MP3 or WAV file, set a target length (e.g., 30% of the original), and the system handles everything: transcription, intelligent segment selection, audio cropping, and final export.

The tool is designed for content creators, researchers, and developers who want to extract the most valuable moments from hours of audio — automatically and at scale.

---

## ✨ Features

- **Automatic Speech-to-Text** — Transcribes audio using OpenAI Whisper with word-level timestamps
- **Timestamped Transcript Formatting** — Groups words into readable lines, split on natural speech pauses (>1 second)
- **AI-Driven Segment Selection** — Uses GPT-4 Turbo to identify and select only the highest-value segments from the transcript
- **Target Duration Control** — Specify an exact percentage of the original runtime to keep (default: 30%)
- **Retry & Validation Logic** — Automatically re-queries the model if the selected duration is off-target, ensuring accuracy within ±10%
- **Audio Cropping & Stitching** — Extracts the selected segments and joins them with smooth 800ms silence padding
- **Dual Export** — Outputs both a high-quality WAV and a 192kbps MP3 of the final edited audio
- **Django-Ready Utility Module** — `utils.py` is structured for integration into a Django web application with settings-based API key management
- **WAV → MP3 Auto-Conversion** — Accepts both WAV and MP3 inputs; WAV files are automatically converted before processing

---

## 🗂️ Project Structure

```
podcast-autopilot/
│
├── main.py                      # Standalone pipeline script with retry/validation logic
├── utils.py                     # Django-integrated version of the pipeline
├── test_final_1.py              # Test script variant 1
├── test_final_2.py              # Test script variant 2
├── requirements.txt             # All Python dependencies
├── .gitignore
│
├── Bishop Varden Lecture.mp3    # Sample audio file
└── Tucker.mp3                   # Sample audio file
```

---

## 🔄 How It Works

```
Audio File (MP3/WAV)
        │
        ▼
[1] Whisper Transcription
    (word-level timestamps)
        │
        ▼
[2] Timestamp Formatting
    (pause-aware line grouping)
        │
        ▼
[3] GPT-4 Turbo Summarization
    (select key segments → JSON)
        │
        ▼
[4] Duration Validation & Retry
    (re-query if off target by >10%)
        │
        ▼
[5] Audio Cropping & Stitching
    (pydub segment extraction)
        │
        ▼
[6] Export: WAV + MP3
```
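
Step [2] can be illustrated with a minimal sketch of pause-aware grouping. It assumes each word arrives as a dict with `word`, `start`, and `end` keys (times in seconds); the function names and the output line format are hypothetical and not taken from `main.py`. The 1-second threshold comes from the feature list above.

```python
from typing import Dict, List


def group_words_into_lines(words: List[Dict], max_pause: float = 1.0) -> List[Dict]:
    """Group timestamped words into lines, starting a new line after a long pause."""
    lines: List[List[Dict]] = []
    current: List[Dict] = []
    for word in words:
        # A gap longer than max_pause between the previous word's end and
        # this word's start closes the current line.
        if current and word["start"] - current[-1]["end"] > max_pause:
            lines.append(current)
            current = []
        current.append(word)
    if current:
        lines.append(current)
    return [
        {
            "start": line[0]["start"],
            "end": line[-1]["end"],
            "text": " ".join(w["word"].strip() for w in line),
        }
        for line in lines
    ]


def format_line(line: Dict) -> str:
    """Render one grouped line with its start and end time (hypothetical format)."""
    return f"[{line['start']:.2f} - {line['end']:.2f}] {line['text']}"
```

Each returned line keeps the start of its first word and the end of its last word, so later steps can map selected text back to audio positions.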

---

## 🛠️ Tech Stack

| Technology | Purpose |
|---|---|
| **Python 3.9+** | Core language |
| **OpenAI Whisper** (`whisper-1`) | Speech-to-text transcription with word timestamps |
| **GPT-4 Turbo** | Intelligent segment selection and summarization |
| **pydub** | Audio cropping, stitching, and format conversion |
| **ffmpeg** | Audio backend for pydub |
| **python-dotenv** | Environment variable management |
| **Django** | Web framework integration (via `utils.py`) |

---

## ⚙️ Setup & Installation

### Prerequisites

- Python 3.9+
- [ffmpeg](https://ffmpeg.org/download.html) installed and available in your system PATH
- An OpenAI API key

### 1. Clone the repository

```bash
git clone https://github.com/BusraRafa/podcast-autopilot.git
cd podcast-autopilot
```

### 2. Install dependencies

```bash
pip install -r requirements.txt
```

### 3. Configure your API key

Create a `.env` file in the project root:

```env
OPENAI_API_KEY=your_openai_api_key_here
```
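
As a rough sketch of how the key might be read at startup: the helper below is hypothetical (the name `load_api_key` does not come from the repository). It calls `load_dotenv()` from python-dotenv when the package is importable, then reads the variable from the environment and fails early if it is missing.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, loading .env first if possible."""
    try:
        # python-dotenv copies entries from a .env file into os.environ.
        # By default it does not override variables that are already set.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        # Without python-dotenv, fall back to the plain process environment.
        pass
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to .env or export it")
    return key
```

Failing at startup gives a clearer error than a rejected API request later in the run.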

---

## 🚀 Usage

### Standalone Script (`main.py`)

Edit the bottom of `main.py` to point to your audio file:

```python
if __name__ == "__main__":
    audio_file = "./your_podcast.mp3"  # path to your MP3 or WAV file
    output_folder = "./output"         # directory that receives the output files

    result = process_audio_pipeline(audio_file, output_folder, target_percentage=30)
```

Then run:

```bash
python main.py
```

### Function Signature

```python
process_audio_pipeline(
    audio_file_path: str,         # Path to your MP3 or WAV file
    output_folder: str,           # Directory for all output files
    target_percentage: int = 30,  # % of original duration to keep (default: 30)
)
```

### Output Files

After running, the `output_folder` will contain:

| File | Description |
|---|---|
| `demo_transcription_formatted_output.txt` | Full timestamped transcript |
| `output.json` | JSON array of selected segments with timestamps |
| `_FINAL_EDITED.wav` | Final highlight reel (WAV) |
| `_FINAL_EDITED.mp3` | Final highlight reel (MP3, 192kbps) |
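
If you want to sanity-check `output.json` before reusing it, a sketch along these lines may help. The exact schema is not documented here, so the `start` and `end` keys (in seconds) are an assumption, and `load_segments` is a hypothetical helper rather than part of the repository.

```python
import json
from typing import Dict, List


def load_segments(json_text: str) -> List[Dict]:
    """Parse a JSON array of segments and check that the timestamps are sane."""
    segments = json.loads(json_text)
    if not isinstance(segments, list):
        raise ValueError("expected a JSON array of segments")
    previous_end = 0.0
    for index, segment in enumerate(segments):
        start, end = float(segment["start"]), float(segment["end"])
        if start >= end:
            raise ValueError(f"segment {index} has start >= end")
        if start < previous_end:
            raise ValueError(f"segment {index} overlaps or is out of order")
        previous_end = end
    return segments
```

For a file on disk, pass the file's text, for example `load_segments(open("output/output.json").read())`.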

---

## 📦 Key Dependencies

```
openai==2.9.0
openai-whisper==20250625
pydub==0.25.1
ffmpeg-python==0.2.0
python-dotenv==1.2.1
torch==2.9.1
```

See `requirements.txt` for the full list.

---

## 🧠 Design Decisions

**Why keep exact wording from the transcript?**
The pipeline instructs GPT to never paraphrase or modify source text — all selected segments are verbatim excerpts. This ensures the cropped audio matches the selected text exactly, making the JSON-to-audio alignment reliable.
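
One way to check the verbatim constraint after the fact is a substring test against the transcript. This is only a sketch: it assumes each selected segment carries a `text` key, it ignores whitespace differences, and the helper names are hypothetical.

```python
from typing import Dict, List


def normalize(text: str) -> str:
    """Collapse all runs of whitespace so line breaks do not affect matching."""
    return " ".join(text.split())


def non_verbatim_segments(segments: List[Dict], transcript: str) -> List[Dict]:
    """Return the segments whose text does not appear verbatim in the transcript."""
    haystack = normalize(transcript)
    return [seg for seg in segments if normalize(seg["text"]) not in haystack]
```

An empty result means every selected excerpt was found in the transcript; any returned segment was paraphrased or altered and could be rejected or re-requested.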

**Why retry logic?**
LLMs don't always produce outputs of a precise length on the first attempt. The pipeline calculates the total duration of selected segments after each response and retries with a stricter prompt if the result deviates more than 10% from the target.
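
The check described above can be sketched as follows. This is an illustration, not the code in `main.py`: it assumes segments are dicts with `start` and `end` in seconds, that the 10% tolerance is measured relative to the target duration, and that `select` is a hypothetical callable wrapping the model request (it receives the attempt number so a caller can tighten the prompt).

```python
from typing import Callable, Dict, List


def selected_duration(segments: List[Dict]) -> float:
    """Total length in seconds of all selected segments."""
    return sum(seg["end"] - seg["start"] for seg in segments)


def within_tolerance(segments: List[Dict], original_duration: float,
                     target_percentage: float = 30, tolerance: float = 0.10) -> bool:
    """True if the selection is within +/- tolerance of the target duration."""
    target = original_duration * target_percentage / 100
    return abs(selected_duration(segments) - target) <= tolerance * target


def select_with_retry(select: Callable[[int], List[Dict]], original_duration: float,
                      target_percentage: float = 30, max_attempts: int = 3) -> List[Dict]:
    """Call select(attempt) until the result is on target or attempts run out."""
    segments: List[Dict] = []
    for attempt in range(1, max_attempts + 1):
        segments = select(attempt)
        if within_tolerance(segments, original_duration, target_percentage):
            break
    # If no attempt met the tolerance, the last attempt's result is returned.
    return segments
```

For a 600-second recording and a 30% target, the target is 180 seconds, so any selection totalling between 162 and 198 seconds is accepted.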

**Why 800ms silence padding?**
Short silence gaps between stitched segments make the final audio sound natural rather than abruptly cut. This value is configurable in the code.
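
To see how the padding affects the final length, here is a small arithmetic sketch. It assumes segments are dicts with `start` and `end` in seconds and that the gap is inserted only between consecutive segments, not before the first or after the last; `build_cut_plan` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def build_cut_plan(segments: List[Dict], gap_ms: int = 800) -> Tuple[List[Tuple[int, int]], int]:
    """Convert segments to millisecond ranges and compute the stitched length."""
    ranges = [
        (round(seg["start"] * 1000), round(seg["end"] * 1000))
        for seg in segments
    ]
    audio_ms = sum(end - start for start, end in ranges)
    # One gap between each pair of neighbouring segments.
    silence_ms = gap_ms * max(len(ranges) - 1, 0)
    return ranges, audio_ms + silence_ms
```

With pydub, each range would correspond to a slice such as `audio[start_ms:end_ms]`, and the gap to `AudioSegment.silent(duration=800)`. Two segments of 3.5 and 2.0 seconds would give a 6.3-second result.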

---

## 🔮 Potential Extensions

- Web UI via Django for drag-and-drop audio upload
- Support for YouTube URL input (via `yt-dlp`)
- Chapter-aware summarization for structured podcasts
- Speaker diarization to preserve only a specific speaker
- Batch processing for entire podcast RSS feeds

---

## 📄 License

This project is open source under the MIT License (see `LICENSE`). Feel free to use, modify, and build upon it.

---

*Built with OpenAI Whisper + GPT-4 Turbo + pydub*
