{"id":13434171,"url":"https://github.com/feldberlin/timething","last_synced_at":"2025-03-17T14:30:51.980Z","repository":{"id":37790015,"uuid":"490308568","full_name":"feldberlin/timething","owner":"feldberlin","description":"Timething is a library for aligning text transcripts with their audio recordings.","archived":false,"fork":false,"pushed_at":"2024-12-03T10:09:28.000Z","size":31223,"stargazers_count":114,"open_issues_count":7,"forks_count":11,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-03T17:04:32.446Z","etag":null,"topics":["alignment","audio","cli","forced-alignment","huggingface","nlp","python","speech","speech-recognition","tts"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/feldberlin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-05-09T14:04:06.000Z","updated_at":"2025-02-23T01:56:18.000Z","dependencies_parsed_at":"2023-11-12T23:39:41.527Z","dependency_job_id":null,"html_url":"https://github.com/feldberlin/timething","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feldberlin%2Ftimething","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feldberlin%2Ftimething/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feldberlin%2Ftimething/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feldberlin%2Ftimething/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owner
s/feldberlin","download_url":"https://codeload.github.com/feldberlin/timething/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244050058,"owners_count":20389630,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment","audio","cli","forced-alignment","huggingface","nlp","python","speech","speech-recognition","tts"],"created_at":"2024-07-31T02:01:48.549Z","updated_at":"2025-03-17T14:30:51.974Z","avatar_url":"https://github.com/feldberlin.png","language":"Jupyter Notebook","readme":"[![Build](https://github.com/feldberlin/timething/workflows/CI/badge.svg)](https://github.com/feldberlin/timething/actions)\n[![PyPI version](https://badge.fury.io/py/timething.svg)](https://badge.fury.io/py/timething)\n\n# Timething\n\nTimething is a library for aligning text transcripts with audio. You provide\nan audio file, as well as a text file with the complete text transcript.\nTimething will output a list of time codes for each word and character that\nindicate when that word or character was spoken in the audio you provided.\nTimething strives to be fast and accurate, and can run on either GPUs or CPUs.\n\nTimething uses powerful Wav2Vec-based speech recognition models hosted by the\nHugging Face AI community. The approach is described in this [PyTorch\nTutorial](https://pytorch.org/audio/main/tutorials/forced_alignment_tutorial.html),\nas well as in this [paper](https://arxiv.org/abs/2007.09127).\n\n## Installation\n\nTo install Timething, you'll need Python 3.7 or 3.8. You\ncan then install it using pip:\n\n```bash\npip install timething\n```\n\n## Aligning recordings and transcripts\n\n### Long form alignment for media content\n\nThere are many cases where you might want to align long audio content with\na corresponding transcript. For example, you might want to align a podcast with\nits transcription. The episode might be a few hours long, and the\ntranscription is provided by the podcaster. Or you might want to\nalign an audiobook with its written text.\n\n```bash\ntimething align-long \\\n    --audio-file fixtures/audio/keanu.mp3 \\\n    --transcript-file fixtures/keanu.cleaned.txt \\\n    --alignments-dir aligned \\\n    --batch-size 10 \\\n    --n-workers 5\n```\n\n### Short form alignment for machine learning\n\nTimething can align a dataset of utterance-level audio snippets with their\ntext transcriptions. This is particularly useful in a machine learning\nsetting, where Timething can be used to clean up datasets.\n\nTimething currently expects to find a folder containing one or more chapters\nin the following form:\n\n\n    └── dir/\n        ├── text.csv\n        ├── aligned/\n        └── audio/\n            ├── chapter01.mp3\n            ├── chapter02.mp3\n            └── chapter03.mp3\n\n\nTimething can process many audio formats, including MP3, WAV, FLAC and\nOGG/VORBIS.\n\nThe file `text.csv` should contain one line per audio file, with the audio\npath and its transcript separated by a pipe character (`|`):\n\n```csv\naudio/chapter01.mp3|The transcript for chapter01 on a single line here\naudio/chapter02.mp3|The transcript for chapter02 on a single line here\naudio/chapter03.mp3|The transcript for chapter03 on a single line here\n```\n\nYou can now run Timething on your CPU or GPU, for example:\n\n```bash\ntimething align-short --metadata text.csv --alignments-dir aligned\n```\n\nYou can also specify more options, e.g.:\n\n```bash\ntimething align-short \\\n  --language german \\\n  --metadata text.csv \\\n  --alignments-dir aligned \\\n  --batch-size 8 \\\n  
--n-workers 8\n```\n\nRun `timething --help` for a full description of the available options.\n\n### Alignment results\n\nResults will be written into the given folder, e.g. `aligned`, as one JSON\nfile per audio file, named after its audio id. Each file contains both the\ncharacter-level and the word-level alignments. For word-level alignments,\neach word has a label, a start time and an end time in seconds, and a\nconfidence score. Character-level alignments have the same fields. The\n`chars_cleaned` and `words_cleaned` lists repeat these alignments with\nnormalized labels; in the example below, they are lowercased and stripped of\npunctuation.\n\nYou can find an example dataset with its alignment output in\n[`fixtures/`](https://github.com/feldberlin/timething/blob/main/fixtures).\nHere's what the alignment for \"one.mp3\", which contains only the word \"one\",\nlooks like:\n\n```json\n{\n    \"n_model_frames\": 72,\n    \"n_audio_samples\": 23392,\n    \"sampling_rate\": 16000,\n    \"chars\": [\n        {\n            \"label\": \"O\",\n            \"start\": 0.5888611111111111,\n            \"end\": 0.6497777777777777,\n            \"score\": 0.9999777873357137\n        },\n        {\n            \"label\": \"n\",\n            \"start\": 0.6497777777777777,\n            \"end\": 0.7106944444444444,\n            \"score\": 0.99994424978892\n        },\n        {\n            \"label\": \"e!\",\n            \"start\": 0.7106944444444444,\n            \"end\": 0.731,\n            \"score\": 0.9999799728393555\n        }\n    ],\n    \"chars_cleaned\": [\n        {\n            \"label\": \"o\",\n            \"start\": 0.5888611111111111,\n            \"end\": 0.6497777777777777,\n            \"score\": 0.9999777873357137\n        },\n        {\n            \"label\": \"n\",\n            \"start\": 0.6497777777777777,\n            \"end\": 0.7106944444444444,\n            \"score\": 0.99994424978892\n        },\n        {\n            \"label\": \"e\",\n            \"start\": 0.7106944444444444,\n            \"end\": 0.731,\n            \"score\": 0.9999799728393555\n        }\n    ],\n    \"words\": [\n        {\n            \"label\": \"One!\",\n            \"start\": 0.5888611111111111,\n            \"end\": 0.731,\n            \"score\": 0.9999637263161796\n        }\n    ],\n    \"words_cleaned\": [\n        {\n            \"label\": \"one\",\n            \"start\": 0.5888611111111111,\n            \"end\": 0.731,\n            \"score\": 0.9999637263161796\n        }\n    ]\n}\n```\n\n## Re-cutting recordings\n\nOnce you've run alignment, you can cut your recordings into shorter files and\nwrite the results into a new folder. For example, if you don't want any of\nyour recordings to exceed 8 seconds, you can create a new directory and\nre-cut your data into it like this:\n\n```bash\ntimething recut \\\n  --from-metadata text.csv \\\n  --to-metadata ~/smaller-recordings/text.csv \\\n  --alignments-dir aligned \\\n  --cut-threshold-seconds 8.0\n```\n\nIn this example, the re-cut recordings and their `text.csv` metadata are\nwritten into `~/smaller-recordings`.\n\n## Supported languages\n\nCurrently supported languages can be found [in\nmodels.yaml](https://github.com/feldberlin/timething/blob/main/src/timething/models.yaml).\nThese currently include English, German, Dutch, Polish, Italian, Portuguese,\nSpanish, French, Russian, Japanese, Greek and Arabic models. We have only\ntested the German model so far.\n\nBecause so many CTC speech models are available from the Hugging Face AI\ncommunity, new languages can easily be added to Timething. Alternatively,\nWav2Vec can be fine-tuned as described\n[here](https://huggingface.co/blog/fine-tune-wav2vec2-english), using any of\nthe [Common Voice](https://commonvoice.mozilla.org/en/languages) languages (87\nat the time of writing).\n\nSupport for text cleaning is currently minimal, and may need to be extended\nfor new languages.\n\n## Alternatives\n\nSeveral mature libraries can already perform forced alignment like\nTimething, e.g. the Montreal Forced Aligner or Aeneas. A list of forced\nalignment tools is maintained [here](https://github.com/pettarin/forced-alignment-tools).\n\n## Thanks\n\nThanks to [why do birds](http://www.whydobirds.de) for allowing the initial\nwork on this library to be open-sourced.\n","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeldberlin%2Ftimething","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffeldberlin%2Ftimething","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeldberlin%2Ftimething/lists"}