https://github.com/wendy7756/AI-Video-Transcriber
Transcribe and summarize video content using AI. Open-source, multi-platform, and supports multiple languages.
https://github.com/wendy7756/AI-Video-Transcriber
aitool tiktok transcribe videototext youtube
Last synced: 9 months ago
JSON representation
Transcribe and summarize video content using AI. Open-source, multi-platform, and supports multiple languages.
- Host: GitHub
- URL: https://github.com/wendy7756/AI-Video-Transcriber
- Owner: wendy7756
- License: apache-2.0
- Created: 2025-08-28T06:27:27.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-09-05T17:54:29.000Z (9 months ago)
- Last Synced: 2025-09-05T19:36:53.087Z (9 months ago)
- Topics: aitool, tiktok, transcribe, videototext, youtube
- Language: Python
- Homepage:
- Size: 279 KB
- Stars: 120
- Watchers: 0
- Forks: 6
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# AI Video Transcriber
English | [δΈζ](README_ZH.md)
An AI-powered video transcription and summarization tool that supports multiple video platforms including YouTube, Tiktok, Bilibili, and 30+ platforms.

## β¨ Features
- π₯ **Multi-Platform Support**: Works with YouTube, Tiktok, Bilibili, and 30+ more
- π£οΈ **Intelligent Transcription**: High-accuracy speech-to-text using Faster-Whisper
- π€ **AI Text Optimization**: Automatic typo correction, sentence completion, and intelligent paragraphing
- π **Multi-Language Summaries**: Generate intelligent summaries in multiple languages
- β‘ **Real-Time Progress**: Live progress tracking and status updates
- βοΈ **Conditional Translation**: When the selected summary language differs from the detected transcript language, the system auto-translates with GPTβ4o
- π± **Mobile-Friendly**: Perfect support for mobile devices
## π Quick Start
### Prerequisites
- Python 3.8+
- FFmpeg
- Optional: OpenAI API key (for AI summary features)
### Installation
#### Method 1: Automatic Installation
```bash
# Clone the repository
git clone https://github.com/wendy7756/AI-Video-Transcriber.git
cd AI-Video-Transcriber
# Run installation script
chmod +x install.sh
./install.sh
```
#### Method 2: Docker
```bash
# Clone the repository
git clone https://github.com/wendy7756/AI-Video-Transcriber.git
cd AI-Video-Transcriber
# Using Docker Compose (easiest)
cp .env.example .env
# Edit .env file and set your OPENAI_API_KEY
docker-compose up -d
# Or using Docker directly
docker build -t ai-video-transcriber .
docker run -p 8000:8000 -e OPENAI_API_KEY="your_api_key_here" ai-video-transcriber
```
#### Method 3: Manual Installation
1. **Install Python Dependencies**
```bash
# macOS (PEP 668) strongly recommends using a virtualenv
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
```
2. **Install FFmpeg**
```bash
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
# CentOS/RHEL
sudo yum install ffmpeg
```
3. **Configure Environment Variables**
```bash
# Required for AI summary/translation features
export OPENAI_API_KEY="your_api_key_here"
# Optional: only if you use a custom OpenAI-compatible gateway
### Start the Service
```bash
python3 start.py
```
After the service starts, open your browser and visit `http://localhost:8000`
#### Production Mode (Recommended for long videos)
To avoid SSE disconnections during long processing, start in production mode (hot-reload disabled):
```bash
python3 start.py --prod
```
This keeps the SSE connection stable throughout long tasks (30β60+ min).
#### Run with explicit env (example)
```bash
source .venv/bin/activate
export OPENAI_API_KEY=your_api_key_here
# export OPENAI_BASE_URL=https://oneapi.basevec.com/v1 # if using a custom endpoint
python3 start.py --prod
```
## π Usage Guide
1. **Enter Video URL**: Paste a video link from YouTube, Bilibili, or other supported platforms
2. **Select Summary Language**: Choose the language for the generated summary
3. **Start Processing**: Click the "Start" button
4. **Monitor Progress**: Watch real-time progress through multiple stages:
- Video download and parsing
- Audio transcription with Faster-Whisper
- AI-powered transcript optimization (typo correction, sentence completion, intelligent paragraphing)
- AI summary generation in selected language
5. **View Results**: Review the optimized transcript and intelligent summary
- If transcript language β selected summary language, a third tab βTranslationβ is shown containing a translated transcript
6. **Download Files**: Click download buttons to save Markdown-formatted files (Transcript / Translation / Summary)
## π οΈ Technical Architecture
### Backend Stack
- **FastAPI**: Modern Python web framework
- **yt-dlp**: Video downloading and processing
- **Faster-Whisper**: Efficient speech transcription
- **OpenAI API**: Intelligent text summarization
### Frontend Stack
- **HTML5 + CSS3**: Responsive interface design
- **JavaScript (ES6+)**: Modern frontend interactions
- **Marked.js**: Markdown rendering
- **Font Awesome**: Icon library
### Project Structure
```
AI-Video-Transcriber/
βββ backend/ # Backend code
β βββ main.py # FastAPI main application
β βββ video_processor.py # Video processing module
β βββ transcriber.py # Transcription module
β βββ summarizer.py # Summary module
β βββ translator.py # Translation module
βββ static/ # Frontend files
β βββ index.html # Main page
β βββ app.js # Frontend logic
βββ temp/ # Temporary files directory
βββ Dockerfile # Docker image configuration
βββ docker-compose.yml # Docker Compose configuration
βββ .dockerignore # Docker ignore rules
βββ .env.example # Environment variables template
βββ requirements.txt # Python dependencies
βββ start.py # Startup script
βββ README.md # Project documentation
```
## βοΈ Configuration Options
### Environment Variables
| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `OPENAI_API_KEY` | OpenAI API key | - | Yes (for AI features) |
| `HOST` | Server address | `0.0.0.0` | No |
| `PORT` | Server port | `8000` | No |
| `WHISPER_MODEL_SIZE` | Whisper model size | `base` | No |
### Whisper Model Size Options
| Model | Parameters | English-only | Multilingual | Speed | Memory Usage |
|-------|------------|--------------|--------------|-------|--------------|
| tiny | 39 M | β | β | Fast | Low |
| base | 74 M | β | β | Medium | Low |
| small | 244 M | β | β | Medium | Medium |
| medium | 769 M | β | β | Slow | Medium |
| large | 1550 M | β | β | Very Slow | High |
## π§ FAQ
### Q: Why is transcription slow?
A: Transcription speed depends on video length, Whisper model size, and hardware performance. Try using smaller models (like tiny or base) to improve speed.
### Q: Which video platforms are supported?
A: All platforms supported by yt-dlp, including but not limited to: YouTube, TikTok, Facebook, Instagram, Twitter, Bilibili, Youku, iQiyi, Tencent Video, etc.
### Q: What if the AI optimization features are unavailable?
A: Both transcript optimization and summary generation require an OpenAI API key. Without it, the system provides the raw transcript from Whisper and a simplified summary.
### Q: I get HTTP 500 errors when starting/using the service. Why?
A: In most cases this is an environment configuration issue rather than a code bug. Please check:
- Ensure a virtualenv is activated: `source .venv/bin/activate`
- Install deps inside the venv: `pip install -r requirements.txt`
- Set `OPENAI_API_KEY` (required for summary/translation)
- If using a custom gateway, set `OPENAI_BASE_URL` correctly and ensure network access
- Install FFmpeg: `brew install ffmpeg` (macOS) / `sudo apt install ffmpeg` (Debian/Ubuntu)
- If port 8000 is occupied, stop the old process or change `PORT`
### Q: How to handle long videos?
A: The system can process videos of any length, but processing time will increase accordingly. For very long videos, consider using smaller Whisper models.
### Q: How to use Docker for deployment?
A: Docker provides the easiest deployment method:
**Prerequisites:**
- Install Docker Desktop from https://www.docker.com/products/docker-desktop/
- Ensure Docker service is running
**Quick Start:**
```bash
# Clone and setup
git clone https://github.com/wendy7756/AI-Video-Transcriber.git
cd AI-Video-Transcriber
cp .env.example .env
# Edit .env file to set your OPENAI_API_KEY
# Start with Docker Compose (recommended)
docker-compose up -d
# Or build and run manually
docker build -t ai-video-transcriber .
docker run -p 8000:8000 --env-file .env ai-video-transcriber
```
**Common Docker Issues:**
- **Port conflict**: Change port mapping `-p 8001:8000` if 8000 is occupied
- **Permission denied**: Ensure Docker Desktop is running and you have proper permissions
- **Build fails**: Check disk space (need ~2GB free) and network connection
- **Container won't start**: Verify .env file exists and contains valid OPENAI_API_KEY
**Docker Commands:**
```bash
# View running containers
docker ps
# Check container logs
docker logs ai-video-transcriber-ai-video-transcriber-1
# Stop service
docker-compose down
# Rebuild after changes
docker-compose build --no-cache
```
### Q: What are the memory requirements?
A: Memory usage varies depending on the deployment method and workload:
**Docker Deployment:**
- **Base memory**: ~128MB for idle container
- **During processing**: 500MB - 2GB depending on video length and Whisper model
- **Docker image size**: ~1.6GB disk space required
- **Recommended**: 4GB+ RAM for smooth operation
**Traditional Deployment:**
- **Base memory**: ~50-100MB for FastAPI server
- **Whisper models memory usage**:
- `tiny`: ~150MB
- `base`: ~250MB
- `small`: ~750MB
- `medium`: ~1.5GB
- `large`: ~3GB
- **Peak usage**: Base + Model + Video processing (~500MB additional)
**Memory Optimization Tips:**
```bash
# Use smaller Whisper model to reduce memory usage
WHISPER_MODEL_SIZE=tiny # or base
# For Docker, limit container memory if needed
docker run -m 1g -p 8000:8000 --env-file .env ai-video-transcriber
# Monitor memory usage
docker stats ai-video-transcriber-ai-video-transcriber-1
```
### Q: Network connection errors or timeouts?
A: If you encounter network-related errors during video downloading or API calls, try these solutions:
**Common Network Issues:**
- Video download fails with "Unable to extract" or timeout errors
- OpenAI API calls return connection timeout or DNS resolution failures
- Docker image pull fails or is extremely slow
**Solutions:**
1. **Switch VPN/Proxy**: Try connecting to a different VPN server or switch your proxy settings
2. **Check Network Stability**: Ensure your internet connection is stable
3. **Retry After Network Change**: Wait 30-60 seconds after changing network settings before retrying
4. **Use Alternative Endpoints**: If using custom OpenAI endpoints, verify they're accessible from your network
5. **Docker Network Issues**: Restart Docker Desktop if container networking fails
**Quick Network Test:**
```bash
# Test video platform access
curl -I https://www.youtube.com/
# Test OpenAI API access (replace with your endpoint)
curl -I https://api.openai.com
# Test Docker Hub access
docker pull hello-world
``
## π― Supported Languages
### Transcription
- Supports 100+ languages through Whisper
- Automatic language detection
- High accuracy for major languages
### Summary Generation
- English
- Chinese (Simplified)
- Japanese
- Korean
- Spanish
- French
- German
- Portuguese
- Russian
- Arabic
- And more...
## π Performance Tips
- **Hardware Requirements**:
- Minimum: 4GB RAM, dual-core CPU
- Recommended: 8GB RAM, quad-core CPU
- Ideal: 16GB RAM, multi-core CPU, SSD storage
- **Processing Time Estimates**:
| Video Length | Estimated Time | Notes |
|-------------|----------------|-------|
| 1 minute | 30s-1 minute | Depends on network and hardware |
| 5 minutes | 2-5 minutes | Recommended for first-time testing |
| 15 minutes | 5-15 minutes | Suitable for regular use |
## π€ Contributing
We welcome Issues and Pull Requests!
1. Fork the project
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## Acknowledgments
- [yt-dlp](https://github.com/yt-dlp/yt-dlp) - Powerful video downloading tool
- [Faster-Whisper](https://github.com/guillaumekln/faster-whisper) - Efficient Whisper implementation
- [FastAPI](https://fastapi.tiangolo.com/) - Modern Python web framework
- [OpenAI](https://openai.com/) - Intelligent text processing API
## π Contact
For questions or suggestions, please submit an Issue or contact Wendy.
## β Star History
If you find this project helpful, please consider giving it a star!