{"id":13613533,"url":"https://github.com/oseiskar/autosubsync","last_synced_at":"2025-04-04T14:04:02.093Z","repository":{"id":57412961,"uuid":"150953681","full_name":"oseiskar/autosubsync","owner":"oseiskar","description":"Automatically synchronize subtitles with audio using machine learning","archived":false,"fork":false,"pushed_at":"2023-06-21T06:14:26.000Z","size":85,"stargazers_count":404,"open_issues_count":3,"forks_count":36,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-03-28T13:08:00.297Z","etag":null,"topics":["command-line-tool","ffmpeg","machine-learning","python","subtitles"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oseiskar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-09-30T09:41:16.000Z","updated_at":"2025-03-25T11:50:55.000Z","dependencies_parsed_at":"2024-01-16T23:30:32.875Z","dependency_job_id":"1c9967db-2cf5-4069-9163-932c59e0d429","html_url":"https://github.com/oseiskar/autosubsync","commit_stats":{"total_commits":57,"total_committers":2,"mean_commits":28.5,"dds":0.01754385964912286,"last_synced_commit":"ed9f53057505805394d0626d831bab6486c0658b"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oseiskar%2Fautosubsync","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oseiskar%2Fautosubsync/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oseiskar%2Fautosubsync/releases","manifests_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/oseiskar%2Fautosubsync/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oseiskar","download_url":"https://codeload.github.com/oseiskar/autosubsync/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247190242,"owners_count":20898702,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line-tool","ffmpeg","machine-learning","python","subtitles"],"created_at":"2024-08-01T20:00:49.487Z","updated_at":"2025-04-04T14:04:02.069Z","avatar_url":"https://github.com/oseiskar.png","language":"Python","readme":"# Automatic subtitle synchronization tool\n\n[![PyPI](https://img.shields.io/pypi/v/autosubsync.svg)](https://pypi.python.org/pypi/autosubsync)\n\nDid you know that hundreds of movies, especially from the 1950s and '60s,\nare now in the public domain and available online? Great! Let's download\n_Plan 9 from Outer Space_. As a non-native English speaker, I prefer watching\nmovies with subtitles, which can also be found online for free. However, sometimes\nthere is a problem: the subtitles are not in sync with the movie.\n\nBut fear not. This tool can resynchronize the subtitles without any human input.\nA correction for both shift and playing speed can be found automatically...\n[using \"AI \u0026 machine learning\"](#methods)\n\n## Installation\n\n### macOS / OSX\nPrerequisites: Install [Homebrew](https://brew.sh/) and [pip](https://stackoverflow.com/questions/17271319/how-do-i-install-pip-on-macos-or-os-x). 
Then install FFmpeg and this package\n\n```\nbrew install ffmpeg\npip install autosubsync\n```\n\n### Linux (Debian \u0026 Ubuntu)\n\nMake sure you have pip, e.g., `sudo apt-get install python-pip`.\nThen install [FFmpeg](https://www.ffmpeg.org/) and this package\n```\nsudo apt install ffmpeg\nsudo apt install libsndfile1 # sometimes optional\nsudo pip install autosubsync\n```\n\nThe `libsndfile1` package is sometimes, but not always, needed due to https://github.com/bastibe/python-soundfile/issues/258.\n\n## Usage\n\n```\nautosubsync [input movie] [input subtitles] [output subs]\n\n# for example\nautosubsync plan-9-from-outer-space.avi \\\n  plan-9-out-of-sync-subs.srt \\\n  plan-9-subtitles-synced.srt\n```\nSee `autosubsync --help` for more details.\n\n## Features\n\n * Automatic speed and shift correction\n * Typical synchronization accuracy ~0.15 seconds (see [performance](#performance))\n * Wide video format support through [ffmpeg](https://www.ffmpeg.org/)\n * Supports all reasonably encoded SRT files in any language\n * Should work with any language in the audio (only tested with a few though)\n * Quality-of-fit metric for checking sync success\n * Python API. Example (save as `batch_sync.py`):\n\n    ```python\n    \"Batch synchronize video files in a folder: python batch_sync.py /path/to/folder\"\n\n    import autosubsync\n    import glob, os, sys\n\n    if __name__ == '__main__':\n        for video_file in glob.glob(os.path.join(sys.argv[1], '*.mp4')):\n            base = video_file.rpartition('.')[0]\n            srt_file = base + '.srt'\n            synced_srt_file = base + '_synced.srt'\n\n            # see help(autosubsync.synchronize) for more details\n            autosubsync.synchronize(video_file, srt_file, synced_srt_file)\n    ```\n\n## Development\n\n### Training the model\n\n 1. Collect a bunch of well-synchronized video and subtitle files and put them\n    in a file called `training/sources.csv` (see `training/sources.csv.example`)\n 2. 
Run (and see) `train_and_test.sh`. This\n    * populates the `training/data` folder\n    * creates `trained-model.bin`\n    * runs cross-validation\n\n### Synchronization (predict)\n\nAssumes the trained model is available as `trained-model.bin`\n\n    python3 autosubsync/main.py input-video-file input-subs.srt synced-subs.srt\n\n### Build and distribution\n\n * Create virtualenv: `python3 -m venv venvs/test-python3`\n * Activate venv: `source venvs/test-python3/bin/activate`\n * `pip install -e .`\n * `pip install wheel`\n * `python setup.py bdist_wheel`\n\n## Methods\n\nThe basic idea is to first detect speech on the audio track, that is, for each\npoint in time, _t_, in the film, to estimate whether speech is heard. The method\n[described below](#speech-detection) produces this estimate as a probability\nof speech _p(t)_.\nAnother input to the program is the unsynchronized subtitle file containing the\ntimestamps of the actual subtitle intervals.\n\nSynchronization is done by finding a time transformation _t_ → _f(t)_ that\nmakes _s(f(t))_, the synchronized subtitles, best [match](#loss-function)\n_p(t)_, the detected speech. Here _s(t)_ is the (unsynchronized) subtitle\nindicator function whose value is 1 if any subtitles are visible at time _t_\nand 0 otherwise.\n\n### Speech detection (VAD)\n\n[Speech detection][4] is done by first computing a [spectrogram][2] of the audio,\nthat is, a matrix of features, where each column corresponds to a frame of\nduration _Δt_ and each row to a certain frequency band. Additional features are\nengineered by computing a rolling maximum of the spectrogram with a few\ndifferent periods.\n\nUsing a collection of correctly synchronized media files, one can create a\ntraining data set, where each feature column is associated with a correct\nlabel. 
This allows training a machine learning model to predict the labels, that\nis, detect speech, on any previously unseen audio track, as the probability of\nspeech _p_(_iΔt_) on frame number _i_.\n\nThe weapon of choice in this project is [logistic regression][3], a common\nbaseline method in machine learning, which is simple to implement.\nThe accuracy of speech detection achieved with this model is not very good, only\naround 72% (AUROC). However, the speech detection results are not the final\noutput of this program but just an input to the synchronization parameter\nsearch. As mentioned in the [performance](#performance) section, the overall\n_synchronization accuracy_ is quite good even though the speech detection\nis not.\n\n### Synchronization parameter search\n\nThis program only searches for linear transformations of the form\n_f_(_t_) = _a t + b_, where _b_ is the shift and _a_ is the speed correction.\nThe optimization method is brute-force grid search, where _b_ is limited to a\ncertain range and _a_ is one of the [common skew factors](#speed-correction).\nThe parameters minimizing the loss function are selected.\n\n### Loss function\n\nThe data produced by the speech detection phase is a vector representing the\nspeech probabilities in frames of duration _Δt_. 
The metric used for evaluating\nmatch quality is expected linear loss:\n\n\u0026nbsp; \u0026nbsp; loss(_f_) = Σ\u003csub\u003e_i_\u003c/sub\u003e _s_(_f\u003csub\u003ei\u003c/sub\u003e_)\n(1 - _p\u003csub\u003ei\u003c/sub\u003e_) + (1 - _s_(_f\u003csub\u003ei\u003c/sub\u003e_)) _p\u003csub\u003ei\u003c/sub\u003e_,\n\nwhere _p\u003csub\u003ei\u003c/sub\u003e_ = _p_(_iΔt_) is the probability of speech and\n_s_(_f\u003csub\u003ei\u003c/sub\u003e_) = _s_(_f_(_iΔt_)) = _s_(_a iΔt + b_) is the subtitle\nindicator resynchronized using the transformation _f_ at frame number _i_.\n\n### Speed correction\n\nSpeed/skew detection is based on the assumption that an error in playing speed\nis not an arbitrary number but is caused by a frame rate mismatch, which constrains\nthe possible playing speed multiplier to be a ratio of two common frame rates\nsufficiently close to one. In particular, it must be one of the following values:\n\n * 24/23.976 = 30/29.97 = 60/59.94 = 1001/1000\n * 25/24\n * 25/23.976\n\nor the reciprocal (1/x).\n\nThe reasoning behind this is that if the frame rate of (digital) video footage\nneeds to be changed and the target and source frame rates are close enough,\nthe conversion is often done by skipping any re-sampling and just changing the\nnominal frame rate. This effectively changes the playing speed of the video\nand the pitch of the audio by a small factor, which is the ratio\nof these frame rates.\n\n### Performance\n\nBased on somewhat limited testing, the typical shift error in auto-synchronization\nseems to be around 0.15 seconds (cross-validation RMSE) and generally below 0.5\nseconds. In other words, it seems to work well enough in most cases but could be\nbetter. 
[Speed correction](#speed-correction) errors did not occur.\n\nAuto-syncing a full-length movie currently takes about 3 minutes and utilizes\naround 1.5 GB of RAM.\n\n## References\n\nI first checked Google to see if someone had already tried to solve the same problem and found\n[this great blog post][1] whose author had implemented a solution using more or less the\nsame approach that I had in mind. The post also included good points that I had not realized,\nsuch as using correctly synchronized subtitles as training data for speech detection.\n\nInstead of starting from the code linked in that blog post, I decided to implement my\nown version from scratch, partly because this seemed like a good application for trying out\nRNNs. They turned out to be unnecessary, but this was a nice project nevertheless.\n\n  [1]: https://albertosabater.github.io/Automatic-Subtitle-Synchronization/\n  [2]: https://en.wikipedia.org/wiki/Spectrogram\n  [3]: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html\n  [4]: https://en.wikipedia.org/wiki/Voice_activity_detection\n\n### Other similar projects\n\n * https://github.com/tympanix/subsync Apparently based on the blog post above, looks good\n * https://github.com/smacke/subsync Newer project, uses WebRTC VAD\n    (instead of DIY machine learning) for speech detection\n * https://github.com/Koenkk/PyAMC/blob/master/autosubsync.py\n * https://github.com/pulasthi7/AutoSubSync-old \u0026 https://github.com/pulasthi7/AutoSubSync (looks inactive)\n","funding_links":[],"categories":["HarmonyOS"],"sub_categories":["Windows Manager"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foseiskar%2Fautosubsync","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foseiskar%2Fautosubsync","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foseiskar%2Fautosubsync/lists"}