Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fedecalendino/pysub-parser
Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt).
- Host: GitHub
- URL: https://github.com/fedecalendino/pysub-parser
- Owner: fedecalendino
- License: mit
- Created: 2018-10-14T05:13:26.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2024-03-20T16:53:53.000Z (10 months ago)
- Last Synced: 2024-05-18T18:03:27.606Z (8 months ago)
- Topics: extract, parser, parsing, pysub-parser, srt, ssa, sub, subtitle, subtitles, timestamp
- Language: Python
- Homepage: https://pypi.org/project/pysub-parser/
- Size: 85 KB
- Stars: 51
- Watchers: 4
- Forks: 4
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
## pysub-parser
[![Version](https://img.shields.io/pypi/v/pysub-parser?logo=pypi)](https://pypi.org/project/pysub-parser)
[![Quality Gate Status](https://img.shields.io/sonar/alert_status/fedecalendino_pysub-parser?logo=sonarcloud&server=https://sonarcloud.io)](https://sonarcloud.io/dashboard?id=fedecalendino_pysub-parser)
[![CodeCoverage](https://img.shields.io/sonar/coverage/fedecalendino_pysub-parser?logo=sonarcloud&server=https://sonarcloud.io)](https://sonarcloud.io/dashboard?id=fedecalendino_pysub-parser)

Utility to extract the contents of a subtitle file.
Supported types:
* `ass`: [Advanced SubStation Alpha](https://en.wikipedia.org/wiki/SubStation_Alpha#Advanced_SubStation_Alpha)
* `ssa`: [SubStation Alpha](https://en.wikipedia.org/wiki/SubStation_Alpha)
* `srt`: [SubRip](https://en.wikipedia.org/wiki/SubRip)
* `sub`: [MicroDVD](https://en.wikipedia.org/wiki/MicroDVD)
* `txt`: [SubViewer](https://en.wikipedia.org/wiki/SubViewer)

> For more information: http://write.flossmanuals.net/video-subtitling/file-formats
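Each supported type corresponds to a file extension, and `parse` falls back to the extension when no `subtype` is given (see Usage). The sketch below shows one way such a lookup could work. `SUPPORTED_TYPES` and `infer_subtype` are hypothetical names for illustration, not part of pysub-parser's API, and the library's own logic may differ.

```python
from pathlib import Path

# Hypothetical mapping of extensions to format names (mirrors the list above).
SUPPORTED_TYPES = {
    "ass": "Advanced SubStation Alpha",
    "ssa": "SubStation Alpha",
    "srt": "SubRip",
    "sub": "MicroDVD",
    "txt": "SubViewer",
}


def infer_subtype(path, subtype=None):
    """Return the subtitle type to parse, falling back to the file extension.

    Hypothetical helper; not pysub-parser's actual implementation.
    """
    # Prefer an explicit subtype; otherwise use the extension without its dot.
    chosen = (subtype or Path(path).suffix.lstrip(".")).lower()
    if chosen not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported subtitle type: {chosen!r}")
    return chosen


print(infer_subtype("./files/space-jam.srt"))  # srt
print(infer_subtype("./files/movie.txt", subtype="sub"))  # sub
```

An explicit `subtype` is useful when a file's extension does not match its contents, for example a MicroDVD file saved as `.txt`.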
### Usage
The `parse` method accepts the following parameters:
* `path`: location of the subtitle file.
* `subtype`: one of the supported file types; defaults to the file's extension.
* `encoding`: encoding of the file; `utf-8` by default.
* `**kwargs`: optional, format-specific parameters:
  * `fps`: framerate (only used by `sub` files); `23.976` by default.

```python
from pysubparser import parser

# The subtitle type is inferred from the ".srt" extension.
subtitles = parser.parse('./files/space-jam.srt')

for subtitle in subtitles:
    print(subtitle)
```

Output:

```text
0 > [BALL BOUNCING]
1 > Michael?
2 > What are you doing out here, son? It's after midnight.
3 > MICHAEL: Couldn't sleep, Pops.
```

___
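MicroDVD (`sub`) files store frame numbers instead of timestamps, which is why `parse` needs a framerate for them. A minimal sketch of that conversion, assuming a timestamp is simply `frame / fps` seconds from the start; `frame_to_time` is a hypothetical helper, not part of the library.

```python
from datetime import datetime, timedelta


def frame_to_time(frame, fps=23.976):
    """Convert a MicroDVD frame number into a datetime.time (hypothetical helper).

    Valid for offsets under 24 hours.
    """
    # A frame's position in seconds is its number divided by the framerate.
    offset = timedelta(seconds=frame / fps)
    # Add the offset to midnight and keep only the time-of-day part.
    return (datetime.min + offset).time()


# At the default 23.976 fps, frame 871 lands roughly 36.3 seconds in.
print(frame_to_time(871))
# At 25 fps, frame 2250 is exactly 90 seconds in.
print(frame_to_time(2250, fps=25))  # 00:01:30
```

Because every timestamp is divided by the same framerate, using the wrong `fps` shifts all timestamps proportionally, and the drift grows over the length of the file.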
### Subtitle Class
Each line of dialogue is represented by a `Subtitle` object with the following properties:
* `index`: position in the file.
* `start`: timestamp of the start of the dialogue.
* `end`: timestamp of the end of the dialogue.
* `text`: dialogue contents.

```python
from pysubparser import parser

subtitles = parser.parse('./files/space-jam.srt')

for subtitle in subtitles:
    print(f'{subtitle.start} > {subtitle.end}')
    print(subtitle.text)
    print()
```

Output:

```text
00:00:36.328000 > 00:00:38.329000
[BALL BOUNCING]

00:01:03.814000 > 00:01:05.189000
Michael?

00:01:08.402000 > 00:01:11.404000
What are you doing out here, son? It's after midnight.

00:01:11.572000 > 00:01:13.072000
MICHAEL: Couldn't sleep, Pops.
```

### Cleaners
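Four cleaners are currently provided:
* `ascii` translates every Unicode character to its ASCII equivalent.
* `brackets` removes anything enclosed in brackets, brackets included (e.g., `[BALL BOUNCING]`).
* `formatting` removes formatting keys (HTML-style tags) from the text.
* `lower_case` converts all text to lower case.

Cleaners can be chained by passing the output of one into another:

```python
from pysubparser import parser
from pysubparser.cleaners import brackets, lower_case

subtitles = parser.parse('./files/space-jam.srt')

# Lower-case the text first, then strip bracketed sound effects.
subtitles = brackets.clean(
    lower_case.clean(
        subtitles
    )
)

for subtitle in subtitles:
    print(subtitle)
```

Output:

```text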
0 >
1 > michael?
2 > what are you doing out here, son? it's after midnight.
3 > michael: couldn't sleep, pops.
```

### Writers
Given any list of `Subtitle` objects and a path, the writer saves those subtitles to that path in `srt` format.
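SubRip output numbers each cue, writes its `start --> end` timestamps as `HH:MM:SS,mmm`, and separates cues with a blank line. A minimal sketch of that formatting, assuming each subtitle exposes `index`, `start`, `end` and `text` as described above, with `start`/`end` being `datetime.time` values (as the printed output suggests). `Cue`, `srt_timestamp`, `to_srt` and `write_srt` are hypothetical stand-ins, not pysub-parser's writer.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Cue:
    """Hypothetical stand-in for pysub-parser's Subtitle."""
    index: int
    start: time
    end: time
    text: str


def srt_timestamp(t):
    """Format a datetime.time as an SRT timestamp: HH:MM:SS,mmm."""
    return f"{t.hour:02}:{t.minute:02}:{t.second:02},{t.microsecond // 1000:03}"


def to_srt(subtitles):
    """Render subtitles as SRT text, numbering cues from 1."""
    blocks = []
    for number, subtitle in enumerate(subtitles, start=1):
        blocks.append(
            f"{number}\n"
            f"{srt_timestamp(subtitle.start)} --> {srt_timestamp(subtitle.end)}\n"
            f"{subtitle.text}\n"
        )
    # Each block ends with a newline, so joining on "\n" leaves a blank line between cues.
    return "\n".join(blocks)


def write_srt(subtitles, path, encoding="utf-8"):
    """Write subtitles to `path` in SRT format (hypothetical helper)."""
    with open(path, "w", encoding=encoding) as file:
        file.write(to_srt(subtitles))


cues = [
    Cue(0, time(0, 0, 36, 328000), time(0, 0, 38, 329000), "[BALL BOUNCING]"),
    Cue(1, time(0, 1, 3, 814000), time(0, 1, 5, 189000), "Michael?"),
]
print(to_srt(cues))
```

The sketch renumbers cues from 1 rather than reusing `index`, since SRT cue numbers conventionally start at 1 while the printed indices above start at 0.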