{"id":13516203,"url":"https://github.com/pettarin/forced-alignment-tools","last_synced_at":"2026-02-14T20:48:24.946Z","repository":{"id":37451188,"uuid":"64680442","full_name":"pettarin/forced-alignment-tools","owner":"pettarin","description":"A collection of links and notes on forced alignment tools","archived":false,"fork":false,"pushed_at":"2021-11-10T13:47:24.000Z","size":31,"stargazers_count":871,"open_issues_count":6,"forks_count":86,"subscribers_count":38,"default_branch":"master","last_synced_at":"2024-11-01T20:36:20.465Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pettarin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-08-01T15:50:38.000Z","updated_at":"2024-11-01T20:29:14.000Z","dependencies_parsed_at":"2022-08-02T12:07:16.003Z","dependency_job_id":null,"html_url":"https://github.com/pettarin/forced-alignment-tools","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pettarin%2Fforced-alignment-tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pettarin%2Fforced-alignment-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pettarin%2Fforced-alignment-tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pettarin%2Fforced-alignment-tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pettarin","download_url":"https://codeload.github.com/pettarin/forced-alignment-tools/t
ar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246429459,"owners_count":20775805,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T05:01:20.247Z","updated_at":"2026-02-14T20:48:24.917Z","avatar_url":"https://github.com/pettarin.png","language":"Python","readme":"# forced-alignment-tools\n\nA collection of links and notes on forced alignment tools\n\n* Version: 1.0.9\n* Date: 2018-07-10\n* Author: [Alberto Pettarin](http://www.albertopettarin.it/) ([contact](http://www.albertopettarin.it/contact.html))\n* License: [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/legalcode)\n\nDid I miss an aligner? 
Please open an issue or directly fork-commit-pullrequest.\n\n## Definition of Forced Alignment\n\nGiven an audio file containing speech,\nand the corresponding transcript,\ncomputing a **forced alignment** is the process of\ndetermining, for each fragment of the transcript,\nthe **time interval** (in the audio file)\ncontaining the spoken text of the fragment.\n\nA text fragment can have arbitrary granularity:\n\n* a paragraph,\n* a sentence,\n* a portion of a sentence (i.e., a group of words),\n* a word, or\n* a phoneme (i.e., a single sound).\n\nFor example, given\n[this text file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.xhtml)\nand\n[this audio file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.mp3),\na forced alignment at verse level can be the following:\n\n```\n1                                                     =\u003e [00:00:00.000, 00:00:02.640]\nFrom fairest creatures we desire increase,            =\u003e [00:00:02.640, 00:00:05.880]\nThat thereby beauty's rose might never die,           =\u003e [00:00:05.880, 00:00:09.240]\nBut as the riper should by time decease,              =\u003e [00:00:09.240, 00:00:11.920]\nHis tender heir might bear his memory:                =\u003e [00:00:11.920, 00:00:15.280]\n...\nPity the world, or else this glutton be,              =\u003e [00:00:43.640, 00:00:48.080]\nTo eat the world's due, by the grave and thee.        
=\u003e [00:00:48.080, 00:00:53.240]\n```\n\nTypical **applications** of forced alignment include\n[Audio-eBooks](https://www.readbeyond.it/audioebooks.html),\n[closed captioning](https://en.wikipedia.org/wiki/Closed_captioning),\nand automating the creation of training data\nfor automated speech recognition systems.\n\n\n## Programs and Libraries\n\nThe following matrix contains **open source** programs and libraries\nfor computing forced alignments\nthat have been actually **proven to install and run**\n(albeit the installation procedure for some of them is pretty complex).\n\nAll tools, except **aeneas**, are based on speech recognition algorithms;\nall tools, except **aeneas** and **gentle**,\nare maintained by research groups or individuals in academia.\n\nMost tools are based on the [HTK](http://htk.eng.cam.ac.uk/),\nwhich is not free for commercial purposes,\nalthough a commercial license can be purchased\nfrom the University of Cambridge.\n\nYou can also download the [raw data file in JSON format](data.json).\n\n| Name | Algorithm | Supported Language(s) | Interface | Code Language(s) | License | Documentation | Mailing List/Forum | Active | Notes |\n| ---- | --------- | --------------------- | --------- | ---------------- | ------- | ------------- | ------------------ | ------ | ----- |\n| [aeneas](https://www.readbeyond.it/aeneas/) | DTW | 30+ | CLI, LIB, Web | Python, C | AGPL | Y | Y | Y | Not based on ASR |\n| [CMU Sphinx](http://cmusphinx.sourceforge.net/) | HMM (own), RNN | 11 | CLI, LIB | C, Java, Python | MIT-like | Y | Y | Y |  |\n| [DARLA](http://darla.dartmouth.edu/cave) | HMM (HTK) | English | Web | ? | ? | Y | N | N? 
| Based on Prosodylab-Aligner or YouTube ASR |\n| [FAVE-align](https://github.com/JoFrhwld/FAVE/) | HMM (HTK) | English | CLI, (Web) | Python | GPL | Y | Y | Y | Acoustic models from P2FA; GitHub code updated more frequently than Web |\n| [Gentle](https://lowerquality.com/gentle/) | HMM (Kaldi) | English | CLI, Web | Python | MIT | N | N | Y | Based on Kaldi |\n| [Julius](http://julius.osdn.jp/en_index.php) | HMM (own) | English, Japanese | CLI, LIB | C | MIT-like | Y | Y | N? |  |\n| [Kaldi](http://kaldi-asr.org/) | HMM (own), DNN, RNN | English | CLI, LIB | C++ | Apache | Y | Y | Y | CUDA support |\n| [kaldi-dnn-ali-gop](https://github.com/tbright17/kaldi-dnn-ali-gop) | HMM (Kaldi), DNN (Kaldi nnet3) | English | CLI, LIB | Shell Script, C++, Python | GPL | N | N | Y | Works with other languages given Kaldi acoustic models |\n| [LaBB-CAT](http://labbcat.sourceforge.net/) | HMM (HTK) | English | Web | Java | GPL | Y | Y | Y |  |\n| [MAUS](https://www.phonetik.uni-muenchen.de/forschung/Verbmobil/VM14.7eng.html) | HMM (HTK) | 21 | CLI, Web | C | All rights reserved | README | Y | Y |  |\n| [Montreal Forced Aligner](https://montrealcorpustools.github.io/Montreal-Forced-Aligner/) | HMM (Kaldi) | English | CLI | Python | MIT | Y | N | Y | Can train other languages |\n| [Penn Forced Aligner (P2FA)](https://www.ling.upenn.edu/phonetics/old_website_2015/p2fa/) | HMM (HTK) | English | CLI, Web | Python | ? | README, Tutorial | N | N? |  |\n| [Prosodylab-Aligner](http://prosodylab.org/tools/aligner/) | HMM (HTK) | English | CLI | Python | MIT | README, Tutorial | N | Y | Can train other languages |\n| [SailAlign](https://github.com/nassosoassos/sail_align) | HMM (HTK) | English, Greek, Spanish | CLI | Perl | GPL | README | N | N? 
|  |\n| [SPPAS](http://www.sppas.org/index.html) | HMM (Julius) | 12+ | CLI, GUI | Python | GPL | Y | Y | Y | Can train other languages; several plugins |\n\n* AGPL: [GNU Affero General Public License](https://www.gnu.org/licenses/agpl-3.0.html)\n* Apache: [Apache License](http://www.apache.org/licenses/LICENSE-2.0)\n* CLI: command line interface\n* DNN: [Deep Neural Network](https://en.wikipedia.org/wiki/Deep_learning)\n* DTW: [Dynamic Time Warping](https://en.wikipedia.org/wiki/Dynamic_time_warping)\n* GPL: [GNU General Public License](https://www.gnu.org/licenses/gpl.html)\n* GUI: graphical interface\n* HMM: [Hidden Markov Model](https://en.wikipedia.org/wiki/Hidden_Markov_model)\n* LIB: library callable by third party software\n* MFCC: [Mel-frequency Cepstral Coefficients](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)\n* MIT: [MIT License](https://opensource.org/licenses/MIT)\n* RNN: [Recurrent Neural Network](https://en.wikipedia.org/wiki/Recurrent_neural_network)\n* Web: Web-based graphical interface, local and/or remote\n\n## Additional Pointers\n\n* [AZP2FA](https://github.com/myedibleenso/AZP2FA) (fork of P2FA)\n* [Automated Audio Segmentation Using Forced Alignment](http://www.voxforge.org/home/dev/autoaudioseg)\n* [Automatic and Accurate Captioning](http://www.nmsl.cs.ucsb.edu/proj/autocap/) (based on CMU Sphinx)\n* [Berkeley Phonetics Machine](http://linguistics.berkeley.edu/plab/guestwiki/index.php?title=Berkeley_Phonetics_Machine)\n* [Building Acoustic Models using Kaldi Voxforge recipe to obtain word level transcripts for long video files](http://forcedalignment.blogspot.it/2015/06/building-acoustic-models-using-kaldi.html)\n* [DARLA](http://darla.dartmouth.edu/cave)\n* [EasyAlign: phonetic alignment with Praat](http://latlcui.unige.ch/phonetique/easyalign.php)\n* [FAVE-align](http://fave.ling.upenn.edu/) (the Web interface for the Penn Forced Aligner)\n* [FAVE-align](https://github.com/JoFrhwld/FAVE/) (source code)\n* [Forced Alignment 
Overview (ISIP)](https://www.isip.piconepress.com/projects/speech/software/tutorials/production/fundamentals/v1.0/section_04/s04_04_p01.html)\n* [Forced Alignment and Speech Recognition Systems (Oxford)](http://www.phon.ox.ac.uk/jcoleman/BAAP_ASR.pdf)\n* [Forced Alignment of Spoken Audio](https://www.clarin.eu/sites/default/files/Joe_Fruehwald_Oxford_2016.pdf)\n* [Forced Alignment with InproTK (and Sphinx)](http://www.dsg-bielefeld.de/dsg_wp/forced-alignment-with-inprotk-and-sphinx/)\n* [Gentle](https://lowerquality.com/gentle/) (based on Kaldi)\n* [HTKBook](http://htk.eng.cam.ac.uk/docs/docs.shtml) (has a chapter on computing forced alignments with HTK, requires registration)\n* [InproTK](https://bitbucket.org/inpro/inprotk)\n* [Introduction to Speech Analysis with FAVE](https://jofrhwld.github.io/workshop/fave2015.html)\n* [Julius](http://julius.osdn.jp/en_index.php)\n* [Kaldi Forced Alignment](http://pages.jh.edu/~echodro1/tutorial/kaldi/kaldi-forcedalignment.html)\n* [Kaldi](http://kaldi-asr.org/)\n* [Korean Phonetic Aligner](http://korean.utsc.utoronto.ca/kpa/) (Web only, Korean only)\n* [LaBB-CAT](http://labbcat.sourceforge.net/)\n* [Long Audio Aligner Landed in Trunk (Sphinx)](http://cmusphinx.sourceforge.net/2014/07/long-audio-aligner-landed-in-trunk/)\n* [MAUS](https://www.phonetik.uni-muenchen.de/forschung/Verbmobil/VM14.7eng.html)\n* [Montreal Forced Aligner](https://montrealcorpustools.github.io/Montreal-Forced-Aligner/)\n* [Penn Forced Aligner](http://pages.jh.edu/~echodro1/tutorial/pfa/pfa-intro.html)\n* [Penn Forced Aligner](https://www.ling.upenn.edu/phonetics/old_website_2015/p2fa/)\n* [Praatalign: an interactive Praat plug-in for performing phonetic forced alignment](https://github.com/dopefishh/praatalign)\n* [ProsodyLab-Aligner](http://prosodylab.org/tools/aligner/)\n* [Robust Automatic Transcription of Speech (RATS)](http://opencatalog.darpa.mil/RATS.html)\n* [SPPAS Automatic Annotation of Speech](http://www.sppas.org/index.html) (based on 
Julius)\n* [Simple English Forced Alignment (UPenn LING521)](http://www.ling.upenn.edu/courses/ling521/NewAligner1a.html)\n* [VoxForge](http://www.voxforge.org/)\n* [WebMAUS](https://clarin.phonetik.uni-muenchen.de/BASWebServices/index.html#/services/WebMAUSBasic) (the Web interface for MAUS)\n* [What is forced alignment? (ICSI)](http://www1.icsi.berkeley.edu/Speech/faq/forcedalign.html)\n* [What is forced alignment? (VoxForge)](http://www.voxforge.org/home/docs/faq/faq/what-is-forced-alignment)\n* [aeneas](https://www.readbeyond.it/aeneas/)\n* [speech.zone](http://www.speech.zone/)\n\n\n\n","funding_links":[],"categories":["Technical","Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpettarin%2Fforced-alignment-tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpettarin%2Fforced-alignment-tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpettarin%2Fforced-alignment-tools/lists"}