https://github.com/krviolent/subtitles_extract
Tool for extraction hard-coded (hardsub) Chinese subtitles from video files with 720p resolution
https://github.com/krviolent/subtitles_extract
chinese chinese-translation easyocr machine-learning ocr python srt-subtitles subtitles video
Last synced: 11 months ago
JSON representation
Tool for extraction hard-coded (hardsub) Chinese subtitles from video files with 720p resolution
- Host: GitHub
- URL: https://github.com/krviolent/subtitles_extract
- Owner: krviolent
- License: apache-2.0
- Created: 2021-10-19T07:03:09.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-05-12T18:32:36.000Z (about 4 years ago)
- Last Synced: 2024-11-29T07:36:59.721Z (over 1 year ago)
- Topics: chinese, chinese-translation, easyocr, machine-learning, ocr, python, srt-subtitles, subtitles, video
- Language: Python
- Homepage:
- Size: 17.9 MB
- Stars: 12
- Watchers: 1
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# subtitles_extract
Tool for extraction hardcoded chinese subtitles from video files with 720p resolution (1280 × 720) based on [EasyOCR](https://github.com/JaidedAI/EasyOCR) tool by [JaidedAI](https://github.com/JaidedAI)
Inspride by [Entrepreneurial Age/创业时代 (2018)](https://www.imdb.com/title/tt9085276/)
# Download:
git clone https://github.com/krviolent/subtitles_extract.git
or tap Code -> Download ZIP and extract
# Install requirements:
OS: Windows 10/WSL
Instructions: [Enable and install WSL](https://www.windowscentral.com/install-windows-subsystem-linux-windows-10)
Install python3, ffmpeg, easyocr (https://github.com/JaidedAI/EasyOCR):
sudo apt install python3
sudo apt install ffmpeg
git clone https://github.com/JaidedAI/EasyOCR.git
cd EasyOCR
sudo python3 setup.py install
# Use:
Tested on WSL Ubuntu 20.04. Meet some difficulties running CUDA on Windows to use GPU for OCR.
bash scripts/run_extract_subs.sh [video.mp4] [episode_number] [duration_of_video_in_seconds] [frame_rate]
[duration_of_video_in_seconds] - optional argument
[frame_rate] = 1
Example:
bash scripts/run_extract_subs.sh video_ep34.mp4 34 2600
Divide subs_file_[EP].txt into the timestamps.txt and textonly.txt:
bash scripts/divide_timestamp_and_text.py [episode_number]
# Steps to extract subtitles into the text file:
1. crop.sh -> frame_xx/*.jpg
2. 2580 - 43 minites, 2600 - ok
python3 easyocr_test.py [episode_number] [duration_in_seconds]
Output files will saved in files:
subs/subs_file_[episode_number].txt
subs/EP.A.[episode_number]/subs_[episode_number].srt
3. Auto-translate obtained subs using https://translatesubtitles.co/
# Optional (replace names, for example):
bash scripts/replace.sh
command to replace A -> B:
sed -i -e 's/[A]/[B]/g' subs_file.srt
This might not work quite right.
# Info
Duplicated subs not removed during extraction, because same phrases might be repeated during video.
Also sometimes recognition accuracy is not sophisticated.