{"id":24178774,"url":"https://github.com/peterk/srt_equalizer","last_synced_at":"2025-04-09T13:10:19.228Z","repository":{"id":152580806,"uuid":"613851920","full_name":"peterk/srt_equalizer","owner":"peterk","description":"A Python module to transform subtitle line lengths, splitting into multiple subtitle fragments if necessary.","archived":false,"fork":false,"pushed_at":"2025-03-03T03:22:19.000Z","size":58,"stargazers_count":30,"open_issues_count":1,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-02T09:07:02.342Z","etag":null,"topics":["closed-captions","srt","subtitles","whisper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/peterk.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-14T11:59:30.000Z","updated_at":"2025-02-26T12:56:54.000Z","dependencies_parsed_at":"2024-04-02T20:36:55.600Z","dependency_job_id":"ba0b9290-635c-4b5e-9d9e-96893a1876b7","html_url":"https://github.com/peterk/srt_equalizer","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peterk%2Fsrt_equalizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peterk%2Fsrt_equalizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peterk%2Fsrt_equalizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peterk%2Fsrt_equalizer/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/peterk","download_url":"https://codeload.github.com/peterk/srt_equalizer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248045245,"owners_count":21038554,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["closed-captions","srt","subtitles","whisper"],"created_at":"2025-01-13T05:13:23.421Z","updated_at":"2025-04-09T13:10:19.210Z","avatar_url":"https://github.com/peterk.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/peterk/srt_equalizer/badge)](https://securityscorecards.dev/viewer/?uri=github.com/peterk/srt_equalizer) ![PyPI - Downloads](https://img.shields.io/pypi/dm/srt_equalizer)\n\n\n# SRT Equalizer\n\nA Python module to transform subtitle line lengths, splitting into multiple subtitle\nfragments if necessary. Useful to adjust automatic speech recognition outputs from e.g. [Whisper](https://github.com/openai/whisper) to a more convenient size.\n\nThis library works for all languages where spaces separate words.\n\n## Installing\n\n`pip install srt_equalizer`\n\n## Example\n\nAn SRT file containing lines over a certain length can be adjusted to a maximum line length for better readability on screen.\n\n```\n1\n00:00:00,000 --\u003e 00:00:04,000\nGood evening. 
I appreciate you giving me a few minutes of your time tonight\n\n2\n00:00:04,000 --\u003e 00:00:11,000\nso I can discuss with you a complex and difficult issue, an issue that is one of the most profound of our time.\n```\n\nTo adjust line length to a maximum length of 42 chars you can use SRT equalizer like this:\n\n```python\n\nfrom srt_equalizer import srt_equalizer\n\nsrt_equalizer.equalize_srt_file(\"test.srt\", \"shortened.srt\", 42)\n```\n\nLines over the limit are split into multiple fragments, and the time codes are adjusted to the\napproximate proportional length of each fragment while staying inside the time\nslot of the original subtitle.\n\n```\n1\n00:00:00,000 --\u003e 00:00:02,132\nGood evening. I appreciate you giving me\n\n2\n00:00:02,132 --\u003e 00:00:04,000\na few minutes of your time tonight\n\n3\n00:00:04,000 --\u003e 00:00:06,458\nso I can discuss with you a complex and\n\n4\n00:00:06,458 --\u003e 00:00:08,979\ndifficult issue, an issue that is one of\n\n5\n00:00:08,979 --\u003e 00:00:11,000\nthe most profound of our time.\n```\n\n### Algorithms\nBy default, this script uses the `greedy` algorithm, which splits the text at the rightmost possible space.\n\nAn alternative splitting algorithm is `halving`, which splits longer lines more evenly instead of always trying to use the maximum line length. This avoids producing lines with isolated word remainders.\n\nAnother alternative is the `punctuation` algorithm, which takes punctuation (commas, periods, etc.) into account. 
\n\n```python\n\nfrom srt_equalizer import srt_equalizer\n\n# use \"greedy\", \"halving\" or \"punctuation\" for the method parameter\nsrt_equalizer.equalize_srt_file(\"test.srt\", \"shortened.srt\", 42, method='halving')\n```\n\n## Adjust Whisper subtitle lengths\nIt is also possible to work with subtitle items produced from [Whisper](https://github.com/openai/whisper) with the following utility methods:\n\n```python\nsplit_subtitle(sub: srt.Subtitle, target_chars: int=42, start_from_index: int=1) -\u003e list[srt.Subtitle]:\n\nwhisper_result_to_srt(segments: list[dict]) -\u003e list[srt.Subtitle]:\n```\n\nHere is an example of how to reduce the length of subtitles created by Whisper. It assumes you have an audio file to transcribe called gwb.wav.\n\n```python\nimport whisper\nfrom srt_equalizer import srt_equalizer\nimport srt\nfrom datetime import timedelta\n\noptions_dict = {\"task\" : \"transcribe\", \"language\": \"en\"}\nmodel = whisper.load_model(\"small\")\nresult = model.transcribe(\"gwb.wav\", language=\"en\")\nsegments = result[\"segments\"]\nsubs = srt_equalizer.whisper_result_to_srt(segments)\n\n# Reduce line length in the whisper result to \u003c= 42 chars\nequalized = []\nfor sub in subs:\n    equalized.extend(srt_equalizer.split_subtitle(sub, 42))\n\nfor i in equalized:\n    print(i.content)\n```\n\n## Contributing\n\nThis library is built with [Poetry](https://python-poetry.org). Check out this repo and run `poetry install` in the source folder. 
To run tests, use `poetry run pytest tests`.\n\nTo build a new release, create a new tag, build it and publish to PyPI:\n```\npoetry run pytest tests\ngit tag v0.1.2\npoetry build\npoetry publish\n```\n\nIf you want to explore the library, start a `poetry shell`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpeterk%2Fsrt_equalizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpeterk%2Fsrt_equalizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpeterk%2Fsrt_equalizer/lists"}