https://github.com/avanigupta06/chaptify
- Host: GitHub
- URL: https://github.com/avanigupta06/chaptify
- Owner: avanigupta06
- License: mit
- Created: 2025-06-09T04:44:08.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-09T05:30:19.000Z (about 1 year ago)
- Last Synced: 2025-06-18T07:01:43.315Z (12 months ago)
- Topics: ffmpeg, flask, html, keybert, mbart, whisper-ai, yt-dlp
- Language: Jupyter Notebook
- Homepage:
- Size: 30.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Chatify: YouTube Video Chaptering & Summarization Tool
**Chatify** is an AI-powered Flask web application that takes a YouTube video URL and automatically generates **chapter-wise summaries** and **titles** from spoken Hindi or Hinglish content. It uses **OpenAI's Whisper** to convert speech to text, **mBART** to summarize it, and **KeyBERT** to generate chapter titles. The output is a clean, timestamped JSON file, suitable for content indexing, accessibility, or quick navigation.
---
## Tech Stack
### Backend
- **Flask**: Lightweight Python web framework for handling routes and requests.
### Machine Learning & NLP
- **Whisper (by OpenAI)**: For speech-to-text transcription from Hindi/Hinglish audio.
- **mBART (by Facebook AI)**: For abstractive summarization and Hindi-to-English translation.
- **KeyBERT**: For keyword-based title generation using BERT embeddings.
### Audio & Video Processing
- **yt-dlp**: For downloading audio from YouTube videos.
- **ffmpeg**: For converting and processing audio formats (MP4 to MP3/WAV).
### File Handling & Utilities
- **uuid**: For generating unique job identifiers.
- **pathlib / os / json**: For safe file and directory operations.
### Frontend
- **HTML**: For rendering dynamic content using Flask templates.
### Output
- **JSON**: Chapters with timestamps, titles, and summaries.
---
## Features
- Accepts a YouTube video URL as input
- Converts spoken Hindi/Hinglish content into English summaries
- Breaks videos into timestamped chapters (default: every 5 minutes)
- Generates meaningful chapter titles using keyword extraction
- Generates a structured `.json` file containing start time, title, and summary
- Simple Flask UI to interact with the tool via browser
---
## Project Structure
```text
chatify/
├── app.py                 # Flask application entry point
├── workspace/             # Temporary folder to store job-specific files
├── templates/
│   └── index.html         # Main web interface
├── static/
│   └── style.css          # Web design
├── trail/                 # Demo files (sample output)
│   ├── try.ipynb
│   └── chapters.ipynb
└── pipeline/
    ├── downloader.py      # Uses yt-dlp to extract audio from YouTube
    ├── transcriber.py     # Whisper transcription + transcript saver
    ├── chapterizer.py     # Chunking + summarization + title generation
    └── utils.py           # Time conversion utilities
```
---
## Pipeline Explanation
### 1. Input: YouTube Video Link
- The user provides a YouTube video URL.
- The audio stream is extracted and saved as an MP3 using `yt-dlp` and `ffmpeg`.
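The download step can be sketched with the `yt-dlp` command-line tool, which calls `ffmpeg` for the MP3 conversion. This is a minimal illustration, not the contents of `pipeline/downloader.py`: the function names and the `audio.mp3` file name are assumptions, and both `yt-dlp` and `ffmpeg` are assumed to be on `PATH`.

```python
import subprocess
from pathlib import Path


def build_download_command(url: str, out_dir: Path) -> list[str]:
    """Build a yt-dlp command line that saves a video's audio track as MP3."""
    return [
        "yt-dlp",
        "--extract-audio",           # keep only the audio stream
        "--audio-format", "mp3",     # yt-dlp delegates the conversion to ffmpeg
        "--output", str(out_dir / "audio.%(ext)s"),  # hypothetical file name
        url,
    ]


def download_audio(url: str, out_dir: Path) -> Path:
    """Run yt-dlp and return the path where the MP3 is expected to be written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_download_command(url, out_dir), check=True)
    return out_dir / "audio.mp3"
```

Keeping the command construction separate from the subprocess call makes the options easy to inspect without touching the network.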
### 2. ASR (Automatic Speech Recognition) with Whisper
- Audio is transcribed using OpenAI's **Whisper** model.
- **Output**: Timestamped transcript in Hindi/Hinglish.
- **Format**: `[start_time - end_time]: text`
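As an illustration of that format, the sketch below renders Whisper-style segments (dictionaries with `start`, `end`, and `text` keys, as found in the `segments` list of a Whisper transcription result) into `[start_time - end_time]: text` lines. The `H:MM:SS` rendering and the function names are assumptions; the project may format timestamps differently.

```python
from datetime import timedelta


def format_timestamp(seconds: float) -> str:
    """Render seconds as H:MM:SS, dropping fractional seconds (assumed format)."""
    return str(timedelta(seconds=int(seconds)))


def format_transcript(segments: list[dict]) -> str:
    """Render one `[start - end]: text` line per transcript segment."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}]: {seg['text'].strip()}")
    return "\n".join(lines)
```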
### 3. Preprocessing
- The transcript is cleaned and formatted.
- Each segment includes a timestamp and its corresponding spoken content.
### 4. Chunking into Segments
- The transcript is split into fixed-length chunks (e.g., 300 seconds = 5 minutes).
- Timestamp alignment is preserved.
- Each chunk is treated as a potential chapter.
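One way to implement fixed-length chunking is to assign each segment to a window based on its start time. The sketch below assumes segments shaped like Whisper's (`start` in seconds plus `text`); the function name and the output keys are hypothetical, not taken from `pipeline/chapterizer.py`.

```python
def chunk_segments(segments: list[dict], chunk_seconds: int = 300) -> list[dict]:
    """Group segments into fixed windows keyed by each segment's start time."""
    buckets: dict[int, list[str]] = {}
    for seg in segments:
        # A segment belongs to the window in which it starts.
        index = int(seg["start"] // chunk_seconds)
        buckets.setdefault(index, []).append(seg["text"].strip())
    return [
        {"start": index * chunk_seconds, "text": " ".join(texts)}
        for index, texts in sorted(buckets.items())
    ]
```

In this version a segment that straddles a boundary stays with the window it starts in, and windows containing no speech produce no chunk.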
### 5. Summarization using mBART
- Each chunk is summarized using **mBART**, a multilingual transformer fine-tuned for Hindi-to-English summarization.
- **Output**: Concise English summary of the chunk's content.
### 6. Chapter Title Generation with KeyBERT
- Using **KeyBERT**, important keywords are extracted from each summary.
- The most relevant keyword or phrase is selected as the chapter title.
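The selection step can be sketched as follows. KeyBERT's `extract_keywords` returns a list of `(phrase, score)` tuples; the helper below takes such a list and picks the highest-scoring phrase. The helper name, the title-casing, and the fallback value are assumptions made for illustration.

```python
def pick_title(keywords: list[tuple[str, float]], fallback: str = "Untitled") -> str:
    """Return the highest-scoring phrase in title case, or a fallback if none."""
    if not keywords:
        return fallback
    best_phrase, _score = max(keywords, key=lambda pair: pair[1])
    return best_phrase.title()


# With KeyBERT installed, the input could come from a call such as:
#   KeyBERT().extract_keywords(summary, keyphrase_ngram_range=(1, 2),
#                              stop_words="english", top_n=5)
```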
### 7. Chapter Assembly
- For each chunk, the following are saved:
  - `start_time`
  - `summary`
  - `title`
- Final output is stored as a structured `.json` file.
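A minimal sketch of the assembly step, assuming each processed chunk is a dictionary with a `start` offset in seconds plus its `title` and `summary`. The function names and the input shape are hypothetical; the `H:MM:SS` start time is produced with `datetime.timedelta`.

```python
import json
from datetime import timedelta
from pathlib import Path


def assemble_chapters(chunks: list[dict]) -> list[dict]:
    """Turn processed chunks into chapter records with the three fields above."""
    return [
        {
            "start_time": str(timedelta(seconds=int(chunk["start"]))),
            "title": chunk["title"],
            "summary": chunk["summary"],
        }
        for chunk in chunks
    ]


def save_chapters(chapters: list[dict], path: Path) -> None:
    """Write chapters as UTF-8 JSON, keeping any non-ASCII text readable."""
    path.write_text(
        json.dumps(chapters, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```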
---
## Example Output
```json
[
  {
    "start_time": "0:00:00",
    "title": "Social Professions",
    "summary": "The speaker discusses how certain professions like tea vendors, garbage collectors, and dancers are perceived with bias in Indian society..."
  },
  {
    "start_time": "0:05:00",
    "title": "Education Challenges",
    "summary": "The video highlights problems in the Indian education system including outdated curriculum, exam pressure, and limited access in rural areas..."
  }
]