Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/drogbadvc/extract-sub-video-streamlit
Extract Sub from Video for Streamlit : Extract hardcoded subtitles from videos and Youtube.
https://github.com/drogbadvc/extract-sub-video-streamlit
machine-learning ocr streamlit subtitles
Last synced: about 1 month ago
JSON representation
Extract Sub from Video for Streamlit : Extract hardcoded subtitles from videos and Youtube.
- Host: GitHub
- URL: https://github.com/drogbadvc/extract-sub-video-streamlit
- Owner: drogbadvc
- Created: 2023-05-10T22:06:46.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-05-10T22:19:58.000Z (over 1 year ago)
- Last Synced: 2024-11-08T06:44:22.124Z (3 months ago)
- Topics: machine-learning, ocr, streamlit, subtitles
- Language: Python
- Homepage:
- Size: 62.5 KB
- Stars: 5
- Watchers: 1
- Forks: 3
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
## Introduction
Extract Sub from Video for Streamlit : Extract hardcoded subtitles from videos and Youtube.
Using a Streamlit interface, Youtube video download + video cropping and subtitle extraction.
Program works with machine learning, multilingual OCR toolkits, ffmpeg, a flask server, docker...
##what technology and program used
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6) (PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and apply them into practice.)
- Paddle works on docker because it requires a specific version of Linux. But a flask API is developed to make it work. You just have to launch the docker command. See the explanations about the installation.
- [Videocr](https://github.com/oliverfei/videocr-PaddleOCR) Extract hardcoded (burned-in) subtitles from videos using the PaddleOCR OCR engine with Python
- videocr is integrated in the flask api which is launched with the docker.
- [Streamlit](https://streamlit.io/) a graphical interface allows to manage all the process of file upload, Youtube video download as well as cropping and subtitle extraction.##how to install it
**Note: you must have docker and python3.7 minimum installed on your machine**
```bash
sudo docker build -t your_name .
sudo docker run -p 5000:5000 --name your_name_container your_name
```From there, you should have the api that handles OCR working well
```bash
cd app
pip install -r requirements.txt
streamlit run app_streamlit.py
```**Note: modify pip by pip3 for example depending on what you have**
Streamlit server starts, you can start using the application and play with the settings.
![Demo Image](./assets/demo.png)
## API
1. `filename`
File name uploaded to the docker server, either a cropped video or an original video
2. `lang`
The language of the subtitles. (French, Chinese, English, Latin)
3. `gpu`
if 1 the OCR process uses the GPU (it is more powerful and faster than the CPU.
4. `sim`
Similarity threshold for subtitle lines.
5. `conf`
Confidence threshold for word predictions. Words with lower confidence than this value will be discarded.6. `start_time` and `end_time`
Extract subtitles from only a clip of the video. The subtitle timestamps are still calculated according to the full video length.