https://github.com/profusion/pf-video-transcribe

Transcribe videos and create HTML pages from the transcriptions.

ProFUSION Video Transcribe
==========================

Install
-------

Install the project using `Poetry `_:

.. code-block:: console

   $ poetry install --with dev
   Installing dependencies from lock file
   ...
   Installing the current project: pf-video-transcribe

This project uses `Faster Whisper `_,
a faster implementation of `OpenAI's Whisper `_
built on top of the `CTranslate2 `_ inference engine and its hardware
optimizations, which require the **NVIDIA CUDA libraries** to be installed; see
`their installation instructions `_.

Run
---

Run the command line tool:

.. code-block:: console

   $ pf-video-transcribe --help

All commands accept ``--log=LEVEL`` or ``--log=DOMAIN:LEVEL`` to change the
log level, where ``DOMAIN`` is a package such as ``pf_video_transcribe.transcribe``,
``faster_whisper`` and so on. If no domain is given, the level applies to all
log domains. This is a global option, so it should be given before the
subcommand.
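
The ``--log`` semantics described above can be sketched with Python's standard
``logging`` module. This is a minimal illustration, not the project's actual code:
the function name ``apply_log_option`` is hypothetical, and it assumes each log
domain is simply a logger name.

.. code-block:: python

   import logging


   def apply_log_option(value: str) -> None:
       """Apply a hypothetical --log=LEVEL or --log=DOMAIN:LEVEL value."""
       domain, sep, level_name = value.rpartition(":")
       level = logging.getLevelName(level_name.upper())
       if not isinstance(level, int):
           raise ValueError(f"unknown log level: {level_name!r}")
       if sep:
           # DOMAIN:LEVEL -> only that logger (and its children) changes.
           logging.getLogger(domain).setLevel(level)
       else:
           # LEVEL alone -> the root logger, inherited by every logger
           # that has no level of its own.
           logging.getLogger().setLevel(level)

For example, ``apply_log_option("faster_whisper:WARNING")`` quiets only that
package, while ``apply_log_option("DEBUG")`` changes the default for everything.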

Subcommands are explained in the next sections.

Transcription
~~~~~~~~~~~~~

Because transcription is a heavy process (loading the model takes a long time,
and so does processing each media file), it is implemented as a batch operation
that generates an intermediate file in the `JSON Lines `_
(``.jsonl``) format: a ``"header"`` line, followed by all the ``"segment"`` lines,
and ending with a ``"finished"`` line that indicates success or failure. Each
``"segment"`` carries the useful information extracted by
`OpenAI Whisper `_. To transcribe some videos:

.. code-block:: console

   $ pf-video-transcribe transcribe videos/my-video.mp4 videos/other-video.mp4

This will generate ``videos/my-video.jsonl`` and ``videos/other-video.jsonl``.
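
A minimal sketch of reading such a file back, assuming (this is not confirmed by
the project) that every line is a JSON object whose ``"type"`` key is
``"header"``, ``"segment"`` or ``"finished"``. The real schema may differ, and
``read_segments`` is a hypothetical helper.

.. code-block:: python

   import json
   from typing import Iterator


   def read_segments(path: str) -> Iterator[dict]:
       """Yield segment records from a transcription .jsonl file.

       ASSUMPTION: each line has a "type" key; the real schema may differ.
       """
       finished = False
       with open(path, encoding="utf-8") as f:
           for line in f:
               if not line.strip():
                   continue  # tolerate blank lines
               record = json.loads(line)
               kind = record.get("type")
               if kind == "segment":
                   yield record
               elif kind == "finished":
                   finished = True
       if not finished:
           # A missing "finished" line suggests the batch was interrupted.
           raise ValueError(f"{path}: no 'finished' line")

Reading line by line keeps memory use flat even for long recordings, which is one
of the practical benefits of JSON Lines over a single JSON document.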

Note that the first run takes a long time, since the model must be downloaded
from the internet. Subsequent runs use the local copy of the model, but it is
first checked against the remote repository, which can also take time. The
``--local`` flag skips that check.

The language is auto-detected from the first 30 seconds of actual sound (silence
is ignored), but if you already know the language, use the ``--language=LANG`` flag.

Automatic Speech Recognition (ASR) models process the media in slices, producing
segments that are shorter than an actual human sentence or phrase.
The ``--merge-threshold=SECONDS`` option merges sibling segments when
``next_segment.start - last_segment.end <= merge_threshold``. The default is 1 second.
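
The merge rule above can be sketched as a single pass over the segments. This is
an illustration under assumptions: the ``Segment`` type and ``merge_segments``
function are hypothetical, and joining texts with a single space is a guess at
how the tool combines them. ``last_segment`` is the previous, possibly already
merged, segment.

.. code-block:: python

   from __future__ import annotations

   from dataclasses import dataclass


   @dataclass
   class Segment:
       start: float  # seconds
       end: float  # seconds
       text: str


   def merge_segments(segments: list[Segment], merge_threshold: float = 1.0) -> list[Segment]:
       """Merge sibling segments whose gap is at most merge_threshold seconds."""
       merged: list[Segment] = []
       for segment in segments:
           last = merged[-1] if merged else None
           if last is not None and segment.start - last.end <= merge_threshold:
               # Extend the previous segment instead of starting a new one.
               last.end = segment.end
               last.text = f"{last.text} {segment.text}"
           else:
               # Copy so the caller's segments are never mutated.
               merged.append(Segment(segment.start, segment.end, segment.text))
       return merged

With the default threshold, segments at 0.0-2.0 s and 2.5-4.0 s are merged (the
gap is 0.5 s), while one starting at 6.0 s stays separate (the gap is 2.0 s).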

A more complex example:

.. code-block:: console

   $ pf-video-transcribe \
       --log=DEBUG \
       transcribe \
       --local \
       --language=pt \
       --merge-threshold=5 \
       videos/my-video.mp4 videos/other-video.mp4

The transcribed ``.jsonl`` file can then be converted to more usable formats,
as described in the next sections.

Convert to HTML
~~~~~~~~~~~~~~~

This generates an HTML page meant for easy viewing of the result. It contains a
``<video>`` element pointing to the transcribed media, with a ``<track>`` pointing
to the subtitles, the thumbnail used as the `OpenGraph `_ ``og:image``,
and the transcription segments themselves.
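
A minimal sketch of a page with that structure, using only the standard library.
The function name ``render_page``, the ``(start, end, text)`` segment tuples and
the exact markup (such as the ``data-start`` attribute) are hypothetical; the
tool's real template is likely richer.

.. code-block:: python

   from __future__ import annotations

   from html import escape


   def render_page(title: str, video_url: str, vtt_url: str, thumbnail_url: str,
                   segments: list[tuple[float, float, str]]) -> str:
       """Render a hypothetical transcription page as an HTML string."""
       # One list item per segment; text is escaped to keep the HTML valid.
       items = [
           f'<li data-start="{start:.2f}">{escape(text)}</li>'
           for start, _end, text in segments
       ]
       lines = [
           "<!DOCTYPE html>",
           "<html>",
           "<head>",
           '<meta charset="utf-8">',
           f"<title>{escape(title)}</title>",
           f'<meta property="og:image" content="{escape(thumbnail_url)}">',
           "</head>",
           "<body>",
           f'<video controls src="{escape(video_url)}">',
           f'<track kind="subtitles" src="{escape(vtt_url)}" default>',
           "</video>",
           "<ol>",
           *items,
           "</ol>",
           "</body>",
           "</html>",
       ]
       return "\n".join(lines) + "\n"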

Note: both ``.vtt`` (subtitles) and ``.jpeg`` (thumbnail) are auto-generated
if they don't exist or if they are older than the actual input ``.jsonl``.
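
That regeneration rule is the same freshness check ``make`` uses. A minimal
sketch based on file modification times; ``needs_regeneration`` is a hypothetical
name, and the tool may implement the check differently.

.. code-block:: python

   import os


   def needs_regeneration(output_path: str, input_path: str) -> bool:
       """True if output_path is missing or older than input_path."""
       if not os.path.exists(output_path):
           return True
       # Compare modification times: a newer .jsonl makes the output stale.
       return os.path.getmtime(output_path) < os.path.getmtime(input_path)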

Convert to VTT
~~~~~~~~~~~~~~

Web Video Text Tracks (WebVTT) is a subtitle format specified by the
`W3C `_ and supported by all major web browsers when referenced by a
``<track>`` inside a ``<video>`` element.

The conversion takes the ``--duration-threshold=SECONDS`` parameter to control the
maximum duration of a single subtitle entry.

.. code-block:: console

   $ pf-video-transcribe vtt videos/*.jsonl
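
For reference, a WebVTT file starts with a ``WEBVTT`` line, followed by cues whose
timing line uses ``HH:MM:SS.mmm --> HH:MM:SS.mmm``. A minimal sketch of producing
that from ``(start, end, text)`` segments; the function names are hypothetical and
the sketch does not apply ``--duration-threshold``.

.. code-block:: python

   from __future__ import annotations


   def vtt_timestamp(seconds: float) -> str:
       """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
       total_ms = round(seconds * 1000)
       hours, rest = divmod(total_ms, 3_600_000)
       minutes, rest = divmod(rest, 60_000)
       secs, ms = divmod(rest, 1000)
       return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


   def to_vtt(segments: list[tuple[float, float, str]]) -> str:
       """Render (start, end, text) segments as a WebVTT document."""
       cues = [
           f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n{text}"
           for start, end, text in segments
       ]
       # Cues are separated by blank lines after the WEBVTT header.
       return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"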

Convert to SRT
~~~~~~~~~~~~~~

SRT (SubRip) is a de facto standard subtitle format supported by most media players.
The conversion takes the ``--duration-threshold=SECONDS`` parameter to control the
maximum duration of a single subtitle entry.

.. code-block:: console

   $ pf-video-transcribe srt videos/*.jsonl
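
SRT differs from WebVTT mainly in that entries are numbered from 1, there is no
header, and timestamps use a comma before the milliseconds (``HH:MM:SS,mmm``).
A minimal sketch with hypothetical function names, again without
``--duration-threshold`` handling:

.. code-block:: python

   from __future__ import annotations


   def srt_timestamp(seconds: float) -> str:
       """Format seconds as an SRT timestamp, HH:MM:SS,mmm (comma before ms)."""
       total_ms = round(seconds * 1000)
       hours, rest = divmod(total_ms, 3_600_000)
       minutes, rest = divmod(rest, 60_000)
       secs, ms = divmod(rest, 1000)
       return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


   def to_srt(segments: list[tuple[float, float, str]]) -> str:
       """Render (start, end, text) segments as numbered SRT entries."""
       entries = [
           f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
           for index, (start, end, text) in enumerate(segments, start=1)
       ]
       # A blank line separates consecutive entries.
       return "\n".join(entries)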

Create Thumbnail
~~~~~~~~~~~~~~~~

Uses `FFmpeg `_ to generate a thumbnail, given either the video or
its transcription. The ``--size=WIDTHxHEIGHT`` option overrides the default
``320x-1`` (``-1`` means that dimension is computed from the other one, keeping the
aspect ratio).

.. code-block:: console

   $ pf-video-transcribe thumbnail videos/*.jsonl
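
A sketch of an FFmpeg invocation that grabs a single frame and scales it, mapping
``--size=WIDTHxHEIGHT`` onto FFmpeg's ``scale`` filter (where ``-1`` likewise
preserves the aspect ratio). The function name and the ``at_seconds`` frame
position are hypothetical; the tool may choose the frame differently.

.. code-block:: python

   from __future__ import annotations


   def thumbnail_command(video: str, output: str, size: str = "320x-1",
                         at_seconds: float = 0.0) -> list[str]:
       """Build an ffmpeg command line that saves one frame as a thumbnail."""
       width, height = size.split("x")
       return [
           "ffmpeg",
           "-y",  # overwrite an existing thumbnail
           "-ss", f"{at_seconds:.3f}",  # seek on the input, before decoding
           "-i", video,
           "-frames:v", "1",  # write a single video frame
           "-vf", f"scale={width}:{height}",  # -1 keeps the aspect ratio
           output,
       ]

Run it with ``subprocess.run(thumbnail_command("videos/my-video.mp4",
"videos/my-video.jpeg"), check=True)``; passing a list avoids shell quoting issues
with file names.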

Creating Index HTML
~~~~~~~~~~~~~~~~~~~

Recursively scans the given directories for ``.html`` files, whether or not they
were produced by this tool. The generated index uses each page's ``<title>`` and
``<meta>`` tags to gather its title and preview.

It's a very simple way to generate a landing page.

.. code-block:: console

   $ pf-video-transcribe index_html videos/
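
A minimal sketch of the scanning step, covering only the ``<title>`` part and
using the standard library's ``pathlib`` and ``html.parser``. The names are
hypothetical, and falling back to the file name when a page has no title is an
assumption.

.. code-block:: python

   from __future__ import annotations

   from html.parser import HTMLParser
   from pathlib import Path


   class _TitleParser(HTMLParser):
       """Collect the text of the first <title> element."""

       def __init__(self) -> None:
           super().__init__()
           self._in_title = False
           self.title: str | None = None

       def handle_starttag(self, tag, attrs):
           if tag == "title" and self.title is None:
               self._in_title = True
               self.title = ""

       def handle_endtag(self, tag):
           if tag == "title":
               self._in_title = False

       def handle_data(self, data):
           if self._in_title:
               self.title += data


   def collect_titles(root: str) -> dict[str, str]:
       """Map each .html file under root (recursively) to its <title>."""
       titles = {}
       for path in sorted(Path(root).rglob("*.html")):
           parser = _TitleParser()
           parser.feed(path.read_text(encoding="utf-8", errors="replace"))
           # Fall back to the file name when the page has no <title>.
           titles[str(path.relative_to(root))] = (parser.title or path.stem).strip()
       return titles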

Serving (Development)
~~~~~~~~~~~~~~~~~~~~~

While developing this tool or experimenting with parameters, it is useful to serve
the files over ``http://``, because browsers apply security restrictions to video
files opened from ``file://`` URLs. By default it serves on ``--port=8000``.

.. code-block:: console

   $ pf-video-transcribe serve videos/
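
Outside this tool, a comparable development server can be sketched with Python's
built-in ``http.server``; this is not how ``serve`` is necessarily implemented,
and ``make_server`` is a hypothetical name. Note that the built-in handler does
not implement HTTP range requests, so seeking in long videos may behave worse
than with a full web server.

.. code-block:: python

   import functools
   from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


   def make_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
       """Create an HTTP server that serves files from directory."""
       # Bind the handler to the directory instead of the current working dir.
       handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
       return ThreadingHTTPServer(("127.0.0.1", port), handler)

Call ``make_server("videos/").serve_forever()`` to run it; the stdlib one-liner
``python -m http.server 8000 --directory videos/`` does the same.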

Development
-----------

Install the project with development dependencies:

.. code-block:: console

   $ poetry install --with dev
   Installing dependencies from lock file
   ...
   Installing the current project: pf-video-transcribe

Install `pre-commit `_ on your machine, then install the Git hooks:

.. code-block:: console

   $ pre-commit install
   pre-commit installed at .git/hooks/pre-commit
   pre-commit installed at .git/hooks/pre-push
   pre-commit installed at .git/hooks/pre-merge-commit

Tools used:

* Code Formatter: `Black `_
* Static Type Checker: `MyPy `_
* Style Enforcement/Linter: `Flake8 `_