https://github.com/profusion/pf-video-transcribe
Transcribe videos and create html pages with that
- Host: GitHub
- URL: https://github.com/profusion/pf-video-transcribe
- Owner: profusion
- Created: 2023-05-29T14:50:39.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-05-29T21:20:24.000Z (about 3 years ago)
- Last Synced: 2025-11-11T08:03:03.052Z (8 months ago)
- Language: Python
- Size: 55.7 KB
- Stars: 1
- Watchers: 16
- Forks: 0
- Open Issues: 0
ProFUSION Video Transcribe
==========================
Install
-------
Install the project using Poetry:

.. code-block:: console

    $ poetry install --with dev
    Installing dependencies from lock file
    ...
    Installing the current project: pf-video-transcribe

This project uses Faster Whisper, a faster implementation of OpenAI's Whisper
built on top of CTranslate2. Its hardware optimizations require the
**NVIDIA CUDA libraries** to be installed; see their installation instructions.
Run
---
Run the command line tool:

.. code-block:: console

    $ pf-video-transcribe --help

All commands accept ``--log=LEVEL`` or ``--log=DOMAIN:LEVEL`` to change the
log level of a package (log domain), such as ``pf_video_transcribe.transcribe``,
``faster_whisper`` and so on. If no domain is given, the provided level
applies to all log domains. This is a global option and must be specified
before the subcommand.
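The ``LEVEL`` / ``DOMAIN:LEVEL`` convention described above can be sketched with the standard ``logging`` module. This is only an illustration, not the tool's actual implementation: the function name ``apply_log_option`` is hypothetical, and it assumes a log domain maps directly to a Python logger name.

```python
import logging


def apply_log_option(value: str) -> None:
    """Apply one --log value, given as either LEVEL or DOMAIN:LEVEL."""
    # Split on the last colon: "faster_whisper:DEBUG" -> ("faster_whisper", "DEBUG")
    domain, sep, level_name = value.rpartition(":")
    if not sep:
        domain = ""  # no domain given: target the root logger

    level = logging.getLevelName(level_name.upper())
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {level_name!r}")

    # getLogger("") returns the root logger; loggers without their own
    # level fall back to it, so the level effectively applies everywhere.
    logging.getLogger(domain).setLevel(level)
```

Calling ``apply_log_option("faster_whisper:DEBUG")`` would raise verbosity only for that package, while ``apply_log_option("WARNING")`` would change the default for every logger that has no level of its own.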
Subcommands are explained in the next sections.
Transcription
=============
Transcription is a heavy process: loading the model takes a long time, and so
does processing each media file. It is therefore implemented as a batch operation
that generates an intermediate file in the JSON Lines (``".jsonl"``) format, with a
``"header"`` line followed by all the ``"segment"`` lines and ended by a
``"finished"`` line with a success or failure indicator. Each ``segment``
carries the useful information extracted by OpenAI Whisper:

.. code-block:: console

    $ pf-video-transcribe transcribe videos/my-video.mp4 videos/other-video.mp4

This will generate ``videos/my-video.jsonl`` and ``videos/other-video.jsonl``.
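To make the header / segment / finished layout concrete, here is a minimal sketch of a reader for such a file. The exact schema is not documented here, so everything about the record shape is an assumption: the ``"type"`` discriminator key and the ``"language"``, ``"start"``, ``"end"``, ``"text"`` and ``"success"`` fields are hypothetical, as is the function name ``read_transcription``.

```python
import io
import json
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def read_transcription(lines: Iterable[str]) -> Tuple[Record, List[Record], Record]:
    """Split JSON Lines records into (header, segments, finished)."""
    header: Record = {}
    finished: Record = {}
    segments: List[Record] = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)  # one JSON document per line
        kind = record.get("type")  # hypothetical discriminator key
        if kind == "header":
            header = record
        elif kind == "segment":
            segments.append(record)
        elif kind == "finished":
            finished = record
    return header, segments, finished


# Hypothetical sample content, only to exercise the reader.
sample = io.StringIO(
    '{"type": "header", "language": "pt"}\n'
    '{"type": "segment", "start": 0.0, "end": 2.5, "text": "Hello"}\n'
    '{"type": "finished", "success": true}\n'
)
header, segments, finished = read_transcription(sample)
```

Because each line is an independent JSON document, a consumer can process segments as they arrive and only trust the result once the final ``"finished"`` line has been seen.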
Note that the first run takes a long time because the model is downloaded from the
internet. Later runs use the local model, but it is first checked against the remote
one, which can also take time. Using the ``--local`` flag skips that check.

The language is auto-detected from the first 30 seconds of actual sound (silence is
ignored), but if you already know the language, use the ``--language=LANG`` flag.
Automatic Speech Recognition (ASR) models work on slices of the media, producing
segments that are shorter than an actual human-language sentence or phrase.
The ``--merge-threshold=SECONDS`` option merges adjacent segments when
``next_segment.start - last_segment.end <= merge_threshold``. The default is 1 second.
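The merge rule can be sketched as follows. This is an illustration under assumptions, not the tool's code: the ``Segment`` class and ``merge_segments`` function are hypothetical names, and joining the texts with a single space is a guess at how merged text is combined.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    start: float  # seconds
    end: float  # seconds
    text: str


def merge_segments(segments: Iterable[Segment], merge_threshold: float = 1.0) -> List[Segment]:
    """Merge each segment into the previous one when the silent gap is small."""
    merged: List[Segment] = []
    for seg in segments:
        last = merged[-1] if merged else None
        if last is not None and seg.start - last.end <= merge_threshold:
            # Gap within the threshold: extend the previous segment.
            last.end = seg.end
            last.text = f"{last.text} {seg.text}".strip()  # assumed join rule
        else:
            # Copy so the caller's segments are never mutated.
            merged.append(Segment(seg.start, seg.end, seg.text))
    return merged
```

With the default threshold, segments ending at 1.0 s and starting at 1.5 s are merged (gap of 0.5 s), while a segment starting 2 s after the previous one ends stays separate. A larger value such as ``--merge-threshold=5`` yields fewer, longer segments.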
A more complex example:

.. code-block:: console

    $ pf-video-transcribe \
        --log=DEBUG \
        transcribe \
        --local \
        --language=pt \
        --merge-threshold=5 \
        videos/my-video.mp4 videos/other-video.mp4

The transcribed ``".jsonl"`` file can then be converted to more usable formats;
see the next sections.
Convert to HTML
===============
This generates an HTML page meant for easy viewing of the result: a media element
linking to the transcribed media alongside a track element linking to the
subtitles, the thumbnail to be used by OpenGraph ``og:image``,
and the actual transcription segments.
Note: both ``.vtt`` (subtitles) and ``.jpeg`` (thumbnail) are auto-generated
if they don't exist or if they are older than the actual input ``.jsonl``.
Convert to VTT
==============
Web Video Text Tracks (WebVTT) is a subtitle format specified by the
W3C and used by web browsers whenever it is referenced from inside
an HTML media element.

The conversion takes the ``--duration-threshold=SECONDS`` parameter to control the
maximum duration of a single subtitle entry.

.. code-block:: console

    $ pf-video-transcribe vtt videos/*.jsonl

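For reference, a WebVTT file starts with a ``WEBVTT`` line, followed by cues whose timings use the ``HH:MM:SS.mmm --> HH:MM:SS.mmm`` form. The sketch below shows that layout only; ``vtt_timestamp`` and ``to_vtt`` are hypothetical helpers, not functions exported by this project, and cues are assumed to be ``(start, end, text)`` tuples in seconds.

```python
from typing import Iterable, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm, using a dot before the milliseconds."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_vtt(cues: Iterable[Cue]) -> str:
    """Render cues as a WebVTT document, with blank lines between blocks."""
    blocks = ["WEBVTT"]
    for start, end, text in cues:
        blocks.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"
```

For example, ``to_vtt([(0, 2.5, "Hello")])`` produces the ``WEBVTT`` header, a blank line, and then a single cue timed ``00:00:00.000 --> 00:00:02.500``.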
Convert to SRT
==============
SRT, or SubRip, is a de facto standard subtitle format that most media players accept.

The conversion takes the ``--duration-threshold=SECONDS`` parameter to control the
maximum duration of a single subtitle entry.

.. code-block:: console

    $ pf-video-transcribe srt videos/*.jsonl

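SRT entries are numbered from 1 and use a comma before the milliseconds. The sketch below illustrates that layout together with one possible reading of the duration threshold: clamping each entry's end time. That clamping is an assumption (the tool may instead split long entries), and ``srt_timestamp`` and ``to_srt`` are hypothetical names.

```python
from typing import Iterable, Optional, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, using a comma before the milliseconds."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: Iterable[Cue], duration_threshold: Optional[float] = None) -> str:
    """Render cues as numbered SRT entries separated by blank lines."""
    entries = []
    for number, (start, end, text) in enumerate(cues, start=1):
        if duration_threshold is not None:
            # Assumed behaviour: cap how long a single entry stays on screen.
            end = min(end, start + duration_threshold)
        entries.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(entries)
```

With ``duration_threshold=5``, an entry spanning 0 to 10 seconds would be written as ``00:00:00,000 --> 00:00:05,000``, while shorter entries are left untouched.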
Create Thumbnail
================
Uses FFmpeg to generate a thumbnail from the video or
its transcription. The ``--size=WIDTHxHEIGHT`` option overrides the default
``320x-1`` (``-1`` means that dimension is calculated from the other one, keeping
the aspect ratio).

.. code-block:: console

    $ pf-video-transcribe thumbnail videos/*.jsonl

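The ``WIDTHxHEIGHT`` value maps naturally onto FFmpeg's ``scale`` filter, where ``-1`` likewise preserves the aspect ratio. The sketch below only builds an argument list; it is an assumption about how such a thumbnail could be produced, not the command this tool actually runs, and ``thumbnail_command`` is a hypothetical helper.

```python
from typing import List


def thumbnail_command(video: str, output: str, size: str = "320x-1") -> List[str]:
    """Build an ffmpeg argument list that writes one scaled frame to `output`."""
    width, height = size.split("x", 1)
    int(width), int(height)  # raises ValueError if either part is not an integer
    return [
        "ffmpeg",
        "-y",  # overwrite the output file if it exists
        "-i", video,
        "-frames:v", "1",  # emit a single video frame
        "-vf", f"scale={width}:{height}",  # -1 keeps the aspect ratio
        output,
    ]
```

The resulting list can be passed to ``subprocess.run(..., check=True)``. Which frame the real tool picks for the thumbnail is not stated here, so this sketch simply takes the first one.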
Creating Index HTML
===================
Recursively scans the given directories looking for ``.html`` files, which
may or may not have been produced by this tool. The generated index reads
metadata elements from each file to gather its actual title and preview.
It's a very simple way to generate a landing page.

.. code-block:: console

    $ pf-video-transcribe index_html videos/

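A recursive scan of this kind can be sketched with the standard library alone. Which elements the tool actually reads is not spelled out here, so this example assumes the page title comes from the HTML ``<title>`` element and falls back to the file name; ``TitleParser`` and ``scan_html`` are hypothetical names.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import List, Tuple


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element (assumed source)."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scan_html(root: Path) -> List[Tuple[Path, str]]:
    """Return (path relative to root, title) for every .html file under root."""
    found = []
    for path in sorted(root.rglob("*.html")):
        parser = TitleParser()
        parser.feed(path.read_text(encoding="utf-8"))
        # Fall back to the file name when the page has no title.
        found.append((path.relative_to(root), parser.title.strip() or path.stem))
    return found
```

Each returned pair is enough to render one link on a landing page; a preview image could be collected the same way from whichever metadata element carries it.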
Serving (Development)
=====================
While developing this tool or experimenting with parameters, it's useful to serve
the files over ``http://``, since ``file://`` URLs have some issues with
video files (security limitations). By default it serves at ``--port=8000``.

.. code-block:: console

    $ pf-video-transcribe serve videos/

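For comparison, serving a directory over HTTP for local development can be sketched with Python's built-in ``http.server``. This is not necessarily how the ``serve`` subcommand is implemented; ``make_server`` is a hypothetical helper, and binding to ``127.0.0.1`` is a choice made for this example.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Create (but do not start) a development server rooted at `directory`."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword argument,
    # so partial() pins the served root without changing the process cwd.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    # Bind to localhost only: this is a development aid, not a public server.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling ``make_server("videos/").serve_forever()`` would then expose the generated pages at ``http://127.0.0.1:8000/`` until interrupted.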
Development
-----------
Install the project with development dependencies:

.. code-block:: console

    $ poetry install --with dev
    Installing dependencies from lock file
    ...
    Installing the current project: pf-video-transcribe

Install pre-commit on your machine, then install the Git hooks:

.. code-block:: console

    $ pre-commit install
    pre-commit installed at .git/hooks/pre-commit
    pre-commit installed at .git/hooks/pre-push
    pre-commit installed at .git/hooks/pre-merge-commit

Used tools:

* Code Formatter: Black
* Static Type Checker: MyPy
* Style Enforcement/Linter: Flake8