https://github.com/avanigupta06/chaptify



# 🔊 Chatify: YouTube Video Chaptering & Summarization Tool

**Chatify** is an AI-powered Flask web application that takes a YouTube video URL and automatically generates meaningful **chapter-wise summaries** and **titles** from spoken Hindi or Hinglish content. It uses **OpenAI's Whisper**, **mBART**, and **KeyBERT** to convert speech to text, summarize it, and generate chapter titles. The output is a clean, timestamped JSON file, ideal for content indexing, accessibility, or quick navigation.

---

## 🧰 Tech Stack

### πŸ–₯️ Backend
- **Flask** β€” Lightweight Python web framework for handling routes and requests.

### 🧠 Machine Learning & NLP
- **Whisper (by OpenAI)** β€” For speech-to-text transcription from Hindi/Hinglish audio.
- **mBART (by Facebook AI)** β€” For abstractive summarization and Hindi β†’ English translation.
- **KeyBERT** β€” For keyword-based title generation using BERT embeddings.

### πŸŽ₯ Audio & Video Processing
- **yt-dlp** β€” For downloading audio from YouTube videos.
- **ffmpeg** β€” For converting and processing audio formats (MP4 β†’ MP3/WAV).

### πŸ“ File Handling & Utilities
- **uuid** β€” For generating unique job identifiers.
- **pathlib / os / json** β€” For safe file and directory operations.

### 🌐 Frontend
- **HTML** β€” For rendering dynamic content using Flask templates.

### πŸ“ Output
- **JSON** β€” Chapters with timestamps, titles, and summaries.

---

## 🚀 Features

- 🎥 Accepts a YouTube video URL as input
- 🧠 Converts spoken Hindi/Hinglish content into English summaries
- 🕐 Breaks videos into timestamped chapters (default: every 5 minutes)
- 📝 Generates meaningful chapter titles using keyword extraction
- 📁 Generates a structured `.json` file containing start time, title, and summary
- 🌐 Simple Flask UI to interact with the tool via browser

---

## 📂 Project Structure

```text
chatify/
├── app.py                  # Flask application entry point
├── workspace/              # Temporary folder to store job-specific files
├── templates/
│   └── index.html          # Main web interface
├── static/
│   └── style.css           # Web design
├── trail/                  # Demo files (sample output)
│   ├── try.ipynb
│   └── chapters.ipynb
└── pipeline/
    ├── downloader.py       # Uses yt-dlp to extract audio from YouTube
    ├── transcriber.py      # Whisper transcription + transcript saver
    ├── chapterizer.py      # Chunking + summarization + title generation
    └── utils.py            # Time conversion utilities
```

---

## 🔧 Pipeline Explanation

### 1. 🎥 Input: YouTube Video Link
- The user provides a YouTube video URL.
- The audio stream is extracted and saved as an MP3 using `yt-dlp` and `ffmpeg`.
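
The download step might look roughly like the sketch below, which uses yt-dlp's Python API with its `FFmpegExtractAudio` post-processor. The function names, the `audio.mp3` file name, and the 192 kbps quality setting are assumptions for illustration; the project's `downloader.py` may be organized differently.

```python
from pathlib import Path


def build_ydl_opts(job_dir):
    """Return yt-dlp options that save the best audio stream as MP3."""
    return {
        "format": "bestaudio/best",
        # yt-dlp fills in %(ext)s; the post-processor then writes audio.mp3
        "outtmpl": str(Path(job_dir) / "audio.%(ext)s"),
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",  # requires ffmpeg on PATH
                "preferredcodec": "mp3",
                "preferredquality": "192",  # assumed bitrate
            }
        ],
        "quiet": True,
    }


def download_audio(url, job_dir):
    """Download the audio of `url` into `job_dir` and return the MP3 path."""
    # Imported lazily so build_ydl_opts works without yt-dlp installed.
    from yt_dlp import YoutubeDL

    Path(job_dir).mkdir(parents=True, exist_ok=True)
    with YoutubeDL(build_ydl_opts(job_dir)) as ydl:
        ydl.download([url])
    return Path(job_dir) / "audio.mp3"
```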

### 2. 🗣️ ASR (Automatic Speech Recognition) with Whisper
- Audio is transcribed using OpenAI's **Whisper** model.
- **Output**: Timestamped transcript in Hindi/Hinglish.
- **Format**: `[start_time - end_time]: text`
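
A minimal sketch of this step, assuming the `openai-whisper` package: `transcribe` returns Whisper's segment dictionaries, and `format_segment` renders one of them in the `[start_time - end_time]: text` format. The `small` model size, the `hi` language hint, and the helper names are assumptions, not settings confirmed by the project.

```python
from datetime import timedelta


def to_timestamp(seconds):
    """Convert seconds to an H:MM:SS string, dropping fractions of a second."""
    return str(timedelta(seconds=int(seconds)))


def format_segment(segment):
    """Render one Whisper segment as '[start - end]: text'."""
    start = to_timestamp(segment["start"])
    end = to_timestamp(segment["end"])
    return f"[{start} - {end}]: {segment['text'].strip()}"


def transcribe(audio_path, model_size="small", language="hi"):
    """Run Whisper and return its list of timestamped segments."""
    import whisper  # lazy import; provided by the openai-whisper package

    model = whisper.load_model(model_size)
    result = model.transcribe(str(audio_path), language=language)
    return result["segments"]  # each has "start", "end", and "text"
```

For example, `format_segment({"start": 0.0, "end": 4.6, "text": " namaste dosto "})` yields `[0:00:00 - 0:00:04]: namaste dosto`.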

### 3. 🧹 Preprocessing
- The transcript is cleaned and formatted.
- Each segment includes a timestamp and its corresponding spoken content.
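
The README does not say which cleaning rules are applied, so the following is only one plausible sketch: it collapses repeated whitespace and drops segments that end up empty, keeping each segment's timestamps. The function name `clean_segments` is hypothetical.

```python
import re


def clean_segments(segments):
    """Normalize whitespace and drop empty segments, keeping timestamps."""
    cleaned = []
    for seg in segments:
        text = re.sub(r"\s+", " ", seg["text"]).strip()
        if text:  # skip segments that contain only whitespace
            cleaned.append(
                {"start": float(seg["start"]), "end": float(seg["end"]), "text": text}
            )
    return cleaned
```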

### 4. 🧩 Chunking into Segments
- The transcript is split into fixed-length chunks (e.g., 300 seconds = 5 minutes).
- Timestamp alignment is preserved.
- Each chunk is treated as a potential chapter.
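
Fixed-length chunking can be sketched as below. Each segment is assigned to a window by its start time, so a segment that straddles a boundary stays in the window where it began; that rule, and the name `chunk_segments`, are assumptions rather than the project's confirmed behavior.

```python
def chunk_segments(segments, chunk_seconds=300):
    """Group transcript segments into fixed-length windows by start time."""
    buckets = {}
    for seg in segments:
        index = int(seg["start"] // chunk_seconds)  # window this segment starts in
        buckets.setdefault(index, []).append(seg["text"].strip())
    return [
        {"start": index * chunk_seconds, "text": " ".join(buckets[index])}
        for index in sorted(buckets)
    ]
```

Windows that contain no speech produce no chunk, so the output can skip a timestamp when a stretch of the video is silent.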

### 5. 🧠 Summarization using mBART
- Each chunk is summarized using **mBART**, a multilingual transformer fine-tuned for Hindi-to-English summarization.
- **Output**: Concise English summary of the chunk’s content.
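
The README does not name the mBART checkpoint, so the sketch below substitutes the public `facebook/mbart-large-50-many-to-many-mmt` model from Hugging Face Transformers. That checkpoint translates Hindi to English rather than summarizing, so treat this as an illustration of the call pattern only. Because a five-minute chunk can exceed the model's input limit, the text is first split into smaller pieces; the 200-word piece size and both function names are assumptions.

```python
def split_into_pieces(text, max_words=200):
    """Split text into pieces of at most `max_words` words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)
    ]


def hindi_to_english(text, model_name="facebook/mbart-large-50-many-to-many-mmt"):
    """Translate Hindi text to English piece by piece (assumed checkpoint)."""
    # Lazy import so split_into_pieces works without transformers installed.
    from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

    tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
    model = MBartForConditionalGeneration.from_pretrained(model_name)
    tokenizer.src_lang = "hi_IN"
    english_id = tokenizer.convert_tokens_to_ids("en_XX")

    outputs = []
    for piece in split_into_pieces(text):
        encoded = tokenizer(
            piece, return_tensors="pt", truncation=True, max_length=1024
        )
        generated = model.generate(
            **encoded, forced_bos_token_id=english_id, max_length=128
        )
        outputs.append(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
    return " ".join(outputs)
```

A checkpoint fine-tuned for summarization could be passed through `model_name` with the same surrounding code.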

### 6. 🏷️ Chapter Title Generation with KeyBERT
- Using **KeyBERT**, important keywords are extracted from each summary.
- The most relevant keyword or phrase is selected as the chapter title.
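
A minimal sketch of title generation, assuming KeyBERT's `extract_keywords` method, which returns `(phrase, score)` pairs. The n-gram range, `top_n` value, title-casing, and the `Untitled` fallback are illustrative choices, and the helper names are hypothetical.

```python
def pick_title(keywords, fallback="Untitled"):
    """Pick the highest-scoring (phrase, score) pair and title-case it."""
    if not keywords:
        return fallback
    phrase, _score = max(keywords, key=lambda pair: pair[1])
    return phrase.title()


def generate_title(summary):
    """Extract key phrases from an English summary and choose one as the title."""
    from keybert import KeyBERT  # lazy import; loads an embedding model

    kw_model = KeyBERT()
    keywords = kw_model.extract_keywords(
        summary,
        keyphrase_ngram_range=(1, 2),  # allow one- or two-word phrases
        stop_words="english",
        top_n=5,
    )
    return pick_title(keywords)
```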

### 7. 📦 Chapter Assembly
- For each chunk, the following are saved:
  - `start_time`
  - `summary`
  - `title`
- Final output is stored as a structured `.json` file.
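
Assembly can be sketched as follows, assuming each chunk carries its start offset in seconds under a `start` key and that summaries and titles arrive as parallel lists. Both function names are hypothetical.

```python
import json
from datetime import timedelta


def assemble_chapters(chunks, summaries, titles):
    """Combine chunks, summaries, and titles into chapter records."""
    chapters = []
    for chunk, summary, title in zip(chunks, summaries, titles):
        chapters.append(
            {
                "start_time": str(timedelta(seconds=int(chunk["start"]))),
                "title": title,
                "summary": summary,
            }
        )
    return chapters


def save_chapters(chapters, path):
    """Write the chapter list to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(chapters, handle, ensure_ascii=False, indent=2)
```

Setting `ensure_ascii=False` keeps any Devanagari text readable in the saved file instead of escaping it.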

---

## ✅ Example Output

```json
[
  {
    "start_time": "0:00:00",
    "title": "Social Professions",
    "summary": "The speaker discusses how certain professions like tea vendors, garbage collectors, and dancers are perceived with bias in Indian society..."
  },
  {
    "start_time": "0:05:00",
    "title": "Education Challenges",
    "summary": "The video highlights problems in the Indian education system including outdated curriculum, exam pressure, and limited access in rural areas..."
  }
]