https://github.com/fedecalendino/pysub-parser

Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt).
https://github.com/fedecalendino/pysub-parser

extract parser parsing pysub-parser srt ssa sub subtitle subtitles timestamp

Last synced: 5 months ago
JSON representation

Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt).

Host: GitHub
URL: https://github.com/fedecalendino/pysub-parser
Owner: fedecalendino
License: mit
Created: 2018-10-14T05:13:26.000Z (over 7 years ago)
Default Branch: main
Last Pushed: 2024-03-20T16:53:53.000Z (over 2 years ago)
Last Synced: 2026-01-11T00:15:55.271Z (6 months ago)
Topics: extract, parser, parsing, pysub-parser, srt, ssa, sub, subtitle, subtitles, timestamp
Language: Python
Homepage: https://pypi.org/project/pysub-parser/
Size: 85 KB
Stars: 53
Watchers: 4
Forks: 4
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          ## pysub-parser

[![Version](https://img.shields.io/pypi/v/pysub-parser?logo=pypi)](https://pypi.org/project/pysub-parser)

[![Quality Gate Status](https://img.shields.io/sonar/alert_status/fedecalendino_pysub-parser?logo=sonarcloud&server=https://sonarcloud.io)](https://sonarcloud.io/dashboard?id=fedecalendino_pysub-parser)

[![CodeCoverage](https://img.shields.io/sonar/coverage/fedecalendino_pysub-parser?logo=sonarcloud&server=https://sonarcloud.io)](https://sonarcloud.io/dashboard?id=fedecalendino_pysub-parser)

Utility to extract the contents of a subtitle file.

Supported types:

* `ass`: [Advanced SubStation Alpha](https://en.wikipedia.org/wiki/SubStation_Alpha#Advanced_SubStation_Alpha)

* `ssa`: [SubStation Alpha](https://en.wikipedia.org/wiki/SubStation_Alpha)

* `srt`: [SubRip](https://en.wikipedia.org/wiki/SubRip)

* `sub`: [MicroDVD](https://en.wikipedia.org/wiki/MicroDVD)

* `txt`: [Sub Viewer](https://en.wikipedia.org/wiki/SubViewer)

> For more information: http://write.flossmanuals.net/video-subtitling/file-formats

### Usage

The method parse requires the following parameters:

* `path`: location of the subtitle file.

* `subtype`: one of the supported file types, by default file extension is used.

* `encoding`: encoding of the file, `utf-8` by default.

* `**kwargs`: optional parameters.

  * `fps`: framerate (only used by `sub` files), `23.976` by default.

```python

from pysubparser import parser

subtitles = parser.parse('./files/space-jam.srt')

for subtitle in subtitles:

    print(subtitle)

```

Output:

```text

0 > [BALL BOUNCING]

1 > Michael?

2 > What are you doing out here, son? It's after midnight.

3 > MICHAEL: Couldn't sleep, Pops.

```

___

### Subtitle Class

Each line of a dialogue is represented with a `Subtitle` object with the following properties:

* `index`: position in the file.

* `start`: timestamp of the start of the dialog.

* `end`: timestamp of the end of the dialog.

* `text`: dialog contents.

```python

for subtitle in subtitles:

    print(f'{subtitle.start} > {subtitle.end}')

    print(subtitle.text)

    print()

```

Output:

```text

00:00:36.328000 > 00:00:38.329000

[BALL BOUNCING]

00:01:03.814000 > 00:01:05.189000

Michael?

00:01:08.402000 > 00:01:11.404000

What are you doing out here, son? It's after midnight.

00:01:11.572000 > 00:01:13.072000

MICHAEL: Couldn't sleep, Pops.

```

### Cleaners

Currently, 4 cleaners are provided:

* `ascii` will translate every unicode character to its ascii equivalent.

* `brackets` will remove anything between them (e.g., `[BALL BOUNCING]`)

* `formatting` will remove formatting keys like `` and ``.

* `lower_case` will lower case all text. 

```python

from pysubparser.cleaners import ascii, brackets, formatting, lower_case

subtitles = brackets.clean(

    lower_case.clean(

        subtitles

    )

)

for subtitle in subtitles:

    print(subtitle)

```

```text

0 > 

1 > michael?

2 > what are you doing out here, son? it's after midnight.

3 > michael: couldn't sleep, pops.

```

### Writers

Given any list of `Subtitle` and a path it will output those subtitles in a `srt` format.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/fedecalendino/pysub-parser

Awesome Lists containing this project

README