https://github.com/oelmekki/yt-ttml2txt

Convert YouTube's TTML subtitle files to plain text

# yt-ttml2txt

> Note to GitHub users: development is happening on [GitLab](https://gitlab.com/oelmekki/yt-ttml2txt),
> please go there if you want to open issues or submit merge requests.

Converts a YouTube TTML subtitle file to a text file.

## Install

```
make # build with gcc
sudo make install # install in /usr/local/bin/

# Alternatives
make CC=clang # build with another compiler, here clang
sudo make install PREFIX=/usr/bin/ # install in another place, here /usr/bin/
```

## Usage

```
yt-ttml2txt [-1] <file>

Converts a TTML subtitle file to a text file, written on STDOUT.

Options:

`-1`: print everything on one line (e.g. to facilitate grepping).
```
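
To see why one-line output helps with grepping, here is a small sketch that needs neither the tool nor a TTML file. The `cues` and `one_line` helpers are made up for illustration, and `tr` only mimics the shape of `-1` output; it is not what yt-ttml2txt does internally.

```shell
#!/bin/sh
# Stand-in for subtitle text printed one cue per line (no real TTML involved).
cues() {
    printf '%s\n' 'the quick brown' 'fox jumps over' 'the lazy dog'
}

# Stand-in for the one-line form: newlines are replaced by spaces.
one_line() {
    cues | tr '\n' ' '
}

# grep matches line by line, so a phrase split across two cues is missed...
cues | grep -c 'brown fox' || true   # prints 0
# ...while the joined text matches.
one_line | grep -c 'brown fox'       # prints 1
```

A phrase spoken across two subtitle cues is only found once the cues are joined, which is presumably the case `-1` is meant for.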

Note that this is very simple parsing, tested only against YouTube's TTML
files and probably only working with them.

My goal is not to fully support any valid TTML file, just what YouTube
produces. If this simple parsing turns out to be unstable, I'll rewrite it
into a full-blown AST parser.
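
To give a feel for what "simple parsing" can mean here, below is a rough `sed` sketch. It is not the code yt-ttml2txt uses. The sample document is hand-written rather than real YouTube output, and `sample_ttml` and `ttml_text` are hypothetical names. It assumes each cue sits on its own line as a single `<p ...>...</p>` element.

```shell
#!/bin/sh
# Hand-written sample in the rough shape of a TTML file (not real YouTube output).
sample_ttml() {
    cat <<'EOF'
<?xml version="1.0" encoding="utf-8" ?>
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
<body><div>
<p begin="00:00:00.000" end="00:00:02.000">hello <span>there</span></p>
<p begin="00:00:02.000" end="00:00:04.000">second cue</p>
</div></body>
</tt>
EOF
}

# Keep only the body of each <p ...>...</p> line, then drop any inline tags.
# Assumes one cue per line; entities such as &amp; are left untouched.
ttml_text() {
    sed -n 's:.*<p [^>]*>\(.*\)</p>.*:\1:p' | sed 's/<[^>]*>//g'
}

sample_ttml | ttml_text
```

A line-oriented approach like this breaks as soon as a cue spans several lines or two cues share one line, which is the kind of instability that would justify a real parser.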

## Why?

I wrote this to be able to dump the text content of YouTube videos and then
grep it, providing local full-text search for the YouTube videos I care
about. Here is how I do it (you need [yt-dlp](https://github.com/yt-dlp/yt-dlp)
or similar for that):

```
yt-dlp --skip-download --write-auto-sub --sub-format ttml <url>
yt-ttml2txt -1 <file.ttml> > <dir>/<file.txt>
grep -r "<pattern>" <dir>/
```

You can now grep your favorite videos' content locally for the low price of
a few text files in storage.
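
If you have many subtitle files, the conversion step can be wrapped in a loop. This is only a sketch: `convert_subs`, `search_subs` and the `TTML2TXT` variable are names made up here, and it assumes yt-ttml2txt takes the subtitle file as an argument and that the downloaded files end in `.ttml`.

```shell
#!/bin/sh
# Convert every .ttml file in a directory to a one-line .txt file in another.
# TTML2TXT is a hypothetical override for the converter command.
convert_subs() {
    subs_dir=$1
    text_dir=$2
    mkdir -p "$text_dir"
    for ttml in "$subs_dir"/*.ttml; do
        [ -e "$ttml" ] || continue   # no match: the glob stays literal
        name=$(basename "$ttml" .ttml)
        "${TTML2TXT:-yt-ttml2txt}" -1 "$ttml" > "$text_dir/$name.txt"
    done
}

# List the text files that mention a phrase (case-insensitive).
search_subs() {
    grep -ril -- "$1" "$2"
}
```

For example, with hypothetical `./subs` and `./text` directories: `convert_subs ./subs ./text`, then `search_subs "some phrase" ./text`.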

## Credits

This idea came after reading Jeff Atwood mention
[his use of YouTube subtitles to access content](https://infosec.exchange/@codinghorror/111433268487559172).