{"id":13586288,"url":"https://github.com/readbeyond/aeneas","last_synced_at":"2025-05-14T09:07:28.110Z","repository":{"id":31852134,"uuid":"35419406","full_name":"readbeyond/aeneas","owner":"readbeyond","description":"aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)","archived":false,"fork":false,"pushed_at":"2024-06-22T01:35:46.000Z","size":30537,"stargazers_count":2645,"open_issues_count":73,"forks_count":249,"subscribers_count":70,"default_branch":"master","last_synced_at":"2025-05-14T09:06:55.734Z","etag":null,"topics":["alignment","audio","cli","dtw","espeak","espeak-ng","festival","ffmpeg","forced-alignment","linux","macos","nlp","python","smil","speech","srt","text","text-to-speech","tts","windows"],"latest_commit_sha":null,"homepage":"http://www.readbeyond.it/aeneas/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/readbeyond.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-05-11T11:19:46.000Z","updated_at":"2025-05-14T03:21:59.000Z","dependencies_parsed_at":"2024-09-25T00:30:31.936Z","dependency_job_id":null,"html_url":"https://github.com/readbeyond/aeneas","commit_stats":{"total_commits":190,"total_committers":8,"mean_commits":23.75,"dds":0.05789473684210522,"last_synced_commit":"4d200a050690903b30b3d885b44714fecb23f18a"},"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/readbeyond%2Faeneas","tags_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/readbeyond%2Faeneas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/readbeyond%2Faeneas/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/readbeyond%2Faeneas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/readbeyond","download_url":"https://codeload.github.com/readbeyond/aeneas/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254110374,"owners_count":22016391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment","audio","cli","dtw","espeak","espeak-ng","festival","ffmpeg","forced-alignment","linux","macos","nlp","python","smil","speech","srt","text","text-to-speech","tts","windows"],"created_at":"2024-08-01T15:05:27.115Z","updated_at":"2025-05-14T09:07:23.084Z","avatar_url":"https://github.com/readbeyond.png","language":"Python","funding_links":[],"categories":["Python","语音合成","Subtitles"],"sub_categories":["资源传输下载"],"readme":"# aeneas\n\n**aeneas** is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).\n\n* Version: 1.7.3\n* Date: 2017-03-15\n* Developed by: [ReadBeyond](http://www.readbeyond.it/)\n* Lead Developer: [Alberto Pettarin](http://www.albertopettarin.it/)\n* License: the GNU Affero General Public License Version 3 (AGPL v3)\n* Contact: [aeneas@readbeyond.it](mailto:aeneas@readbeyond.it)\n* Quick Links: [Home](http://www.readbeyond.it/aeneas/) - [GitHub](https://github.com/readbeyond/aeneas/) - 
[PyPI](https://pypi.python.org/pypi/aeneas/) - [Docs](http://www.readbeyond.it/aeneas/docs/) - [Tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html) - [Benchmark](https://readbeyond.github.io/aeneas-benchmark/) - [Mailing List](https://groups.google.com/d/forum/aeneas-forced-alignment) - [Web App](http://aeneasweb.org)\n\n\n## Goal\n\n**aeneas** automatically generates a **synchronization map**\nbetween a list of text fragments\nand an audio file containing the narration of the text.\nIn computer science this task is known as\n(automatically computing a) **forced alignment**.\n\nFor example, given\n[this text file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.xhtml)\nand\n[this audio file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.mp3),\n**aeneas** determines, for each fragment, the corresponding time interval in the audio file:\n\n```\n1                                                     =\u003e [00:00:00.000, 00:00:02.640]\nFrom fairest creatures we desire increase,            =\u003e [00:00:02.640, 00:00:05.880]\nThat thereby beauty's rose might never die,           =\u003e [00:00:05.880, 00:00:09.240]\nBut as the riper should by time decease,              =\u003e [00:00:09.240, 00:00:11.920]\nHis tender heir might bear his memory:                =\u003e [00:00:11.920, 00:00:15.280]\nBut thou contracted to thine own bright eyes,         =\u003e [00:00:15.280, 00:00:18.800]\nFeed'st thy light's flame with self-substantial fuel, =\u003e [00:00:18.800, 00:00:22.760]\nMaking a famine where abundance lies,                 =\u003e [00:00:22.760, 00:00:25.680]\nThy self thy foe, to thy sweet self too cruel:        =\u003e [00:00:25.680, 00:00:31.240]\nThou that art now the world's fresh ornament,         =\u003e [00:00:31.240, 00:00:34.400]\nAnd only herald to the gaudy spring,                  =\u003e [00:00:34.400, 00:00:36.920]\nWithin 
thine own bud buriest thy content,             =\u003e [00:00:36.920, 00:00:40.640]\nAnd tender churl mak'st waste in niggarding:          =\u003e [00:00:40.640, 00:00:43.640]\nPity the world, or else this glutton be,              =\u003e [00:00:43.640, 00:00:48.080]\nTo eat the world's due, by the grave and thee.        =\u003e [00:00:48.080, 00:00:53.240]\n```\n\n![Waveform with aligned labels, detail](wiki/align.png)\n\nThis synchronization map can be output to file\nin several formats, depending on its application:\n\n* research: Audacity (AUD), ELAN (EAF), TextGrid;\n* digital publishing: SMIL for EPUB 3;\n* closed captioning: SubRip (SRT), SubViewer (SBV/SUB), TTML, WebVTT (VTT);\n* Web: JSON;\n* further processing: CSV, SSV, TSV, TXT, XML.\n\n\n## System Requirements, Supported Platforms and Installation\n\n### System Requirements\n\n1. a reasonably recent machine (recommended 4 GB RAM, 2 GHz 64bit CPU)\n2. [Python](https://python.org/) 2.7 (Linux, OS X, Windows) or 3.5 or later (Linux, OS X)\n3. [FFmpeg](https://www.ffmpeg.org/)\n4. [eSpeak](http://espeak.sourceforge.net/)\n5. Python packages `BeautifulSoup4`, `lxml`, and `numpy`\n6. Python headers to compile the Python C/C++ extensions (optional but strongly recommended)\n7. 
A shell supporting UTF-8 (optional but strongly recommended)\n\n### Supported Platforms\n\n**aeneas** has been developed and tested on **Debian 64bit**,\nwith **Python 2.7** and **Python 3.5**,\nwhich are the **only supported platforms** at the moment.\nNevertheless, **aeneas** has been confirmed to work on\nother Linux distributions, Mac OS X, and Windows.\nSee the\n[PLATFORMS file](https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md)\nfor details.\n\nIf installing **aeneas** natively on your OS proves difficult,\nyou are strongly encouraged to use\n[aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant),\nwhich provides **aeneas** inside a virtualized Debian image\nrunning under\n[VirtualBox](https://www.virtualbox.org/)\nand\n[Vagrant](http://www.vagrantup.com/),\nwhich can be installed on any modern OS (Linux, Mac OS X, Windows).\n\n### Installation\n\nAll-in-one installers are available for Mac OS X and Windows,\nand a Bash script for deb-based Linux distributions (Debian, Ubuntu)\nis provided in this repository.\nIt is also possible to download a VirtualBox+Vagrant virtual machine.\nPlease see the\n[INSTALL file](https://github.com/readbeyond/aeneas/blob/master/wiki/INSTALL.md)\nfor detailed, step-by-step installation procedures for different operating systems.\n\nThe generic OS-independent procedure is simple:\n\n1. **Install**\n   [Python](https://python.org/) (2.7.x preferred),\n   [FFmpeg](https://www.ffmpeg.org/), and\n   [eSpeak](http://espeak.sourceforge.net/)\n\n2. Make sure the following **executables** can be called from your **shell**:\n   `espeak`, `ffmpeg`, `ffprobe`, `pip`, and `python`\n\n3. First install `numpy` with `pip` and then `aeneas` (this order is important):\n\n    ```bash\n    pip install numpy\n    pip install aeneas\n    ```\n\n4. To **check** whether you installed **aeneas** correctly, run:\n\n   ```bash\n    python -m aeneas.diagnostics\n    ```\n\n\n## Usage\n\n1. 
Run without arguments to get the **usage message**:\n\n    ```bash\n    python -m aeneas.tools.execute_task\n    python -m aeneas.tools.execute_job\n    ```\n\n    You can also get a list of **live examples**\n    that you can immediately run on your machine\n    thanks to the included files:\n\n    ```bash\n    python -m aeneas.tools.execute_task --examples\n    python -m aeneas.tools.execute_task --examples-all\n    ```\n\n2. To **compute a synchronization map** `map.json` for a pair\n   (`audio.mp3`, `text.txt` in\n   [plain](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN)\n   text format), you can run:\n\n    ```bash\n    python -m aeneas.tools.execute_task \\\n        audio.mp3 \\\n        text.txt \\\n        \"task_language=eng|os_task_file_format=json|is_text_type=plain\" \\\n        map.json\n    ```\n\n   (The command has been split into lines with `\\` for visual clarity;\n   in production you can have the entire command on a single line\n   and/or you can use shell variables.)\n\n   To **compute a synchronization map** `map.smil` for a pair\n   (`audio.mp3`,\n   [page.xhtml](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED)\n   containing fragments marked by `id` attributes like `f001`),\n   you can run:\n\n    ```bash\n    python -m aeneas.tools.execute_task \\\n        audio.mp3 \\\n        page.xhtml \\\n        \"task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric\" \\\n        map.smil\n    ```\n\n   As you can see, the third argument (the _configuration string_)\n   specifies the parameters controlling the I/O formats\n   and the processing options for the task.\n   Consult the\n   [documentation](http://www.readbeyond.it/aeneas/docs/)\n   for details.\n\n3. 
If you have several tasks to process,\n   you can create a **job container**\n   to batch process them:\n\n    ```bash\n    python -m aeneas.tools.execute_job job.zip output_directory\n    ```\n\n   File `job.zip` should contain a `config.txt` or `config.xml`\n   configuration file, providing **aeneas**\n   with all the information needed to parse the input assets\n   and format the output sync map files.\n   Consult the\n   [documentation](http://www.readbeyond.it/aeneas/docs/)\n   for details.\n\nThe\n[documentation](http://www.readbeyond.it/aeneas/docs/)\ncontains a highly suggested\n[tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html)\nwhich explains how to use the built-in command line tools.\n\n\n## Documentation and Support\n\n* Documentation:\n  [http://www.readbeyond.it/aeneas/docs/](http://www.readbeyond.it/aeneas/docs/)\n* Command line tools tutorial:\n  [http://www.readbeyond.it/aeneas/docs/clitutorial.html](http://www.readbeyond.it/aeneas/docs/clitutorial.html)\n* Library tutorial:\n  [http://www.readbeyond.it/aeneas/docs/libtutorial.html](http://www.readbeyond.it/aeneas/docs/libtutorial.html)\n* Old, verbose tutorial:\n  [A Practical Introduction To The aeneas Package](http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html)\n* Mailing list:\n  [https://groups.google.com/d/forum/aeneas-forced-alignment](https://groups.google.com/d/forum/aeneas-forced-alignment)\n* Changelog:\n  [http://www.readbeyond.it/aeneas/docs/changelog.html](http://www.readbeyond.it/aeneas/docs/changelog.html)\n* High level description of how aeneas works:\n  [HOWITWORKS](https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md)\n* Development history:\n  [HISTORY](https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md)\n* Testing:\n  [TESTING](https://github.com/readbeyond/aeneas/blob/master/wiki/TESTING.md)\n* Benchmark suite:\n  
[https://readbeyond.github.io/aeneas-benchmark/](https://readbeyond.github.io/aeneas-benchmark/)\n\n\n## Supported Features\n\n* Input text files in `parsed`, `plain`, `subtitles`, or `unparsed` (XML) format\n* Multilevel input text files in `mplain` and `munparsed` (XML) format\n* Text extraction from XML (e.g., XHTML) files using `id` and `class` attributes\n* Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.)\n* Input audio file formats: all those readable by `ffmpeg`\n* Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TEXTGRID, TSV, TTML, TXT, VTT, XML\n* Confirmed working on 38 languages: AFR, ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR\n* MFCC and DTW computed via Python C extensions to reduce the processing time\n* Several built-in TTS engine wrappers: AWS Polly TTS API, eSpeak (default), eSpeak-ng, Festival, MacOS (via say), Nuance TTS API\n* Default TTS (eSpeak) called via a Python C extension for fast audio synthesis\n* Possibility of running a custom, user-provided TTS engine Python wrapper (e.g., included example for speect)\n* Batch processing of multiple audio/text pairs\n* Download audio from a YouTube video\n* In multilevel mode, recursive alignment from paragraph to sentence to word level\n* In multilevel mode, MFCC resolution, MFCC masking, DTW margin, and TTS engine can be specified for each level independently\n* Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes\n* Adjustable splitting times, including a max character/second constraint for CC applications\n* Automated detection of audio head/tail\n* Output an HTML file for fine tuning the sync map manually (`finetuneas` project)\n* Execution parameters tunable at runtime\n* Code suitable for Web app deployment (e.g., on-demand cloud computing 
instances)\n* Extensive test suite including 1,200+ unit/integration/performance tests that run and must pass before each release\n\n\n## Limitations and Missing Features\n\n* Audio should match the text: large portions of spurious text or audio might produce a wrong sync map\n* Audio is assumed to be spoken: not suitable for song captioning, YMMV for CC applications\n* No protection against memory swapping: be sure your amount of RAM is adequate for the maximum duration of a single audio file (e.g., 4 GB RAM =\u003e max 2h audio; 16 GB RAM =\u003e max 10h audio)\n* [Open issues](https://github.com/readbeyond/aeneas/issues)\n\n### A Note on Word-Level Alignment\n\nA significant number of users run **aeneas** to align audio and text\nat word-level (i.e., each fragment is a word).\nAlthough **aeneas** was not designed with word-level alignment in mind\nand the results might be inferior to\n[ASR-based forced aligners](https://github.com/pettarin/forced-alignment-tools)\nfor languages with good ASR models,\n**aeneas** offers some options to improve\nthe quality of the alignment at word-level:\n\n* multilevel text (since v1.5.1),\n* MFCC nonspeech masking (since v1.7.0, disabled by default),\n* use better TTS engines, like Festival or AWS/Nuance TTS API (since v1.5.0).\n\nIf you use the ``aeneas.tools.execute_task`` command line tool,\nyou can add the ``--presets-word`` switch to enable MFCC nonspeech masking, for example:\n\n```bash\n$ python -m aeneas.tools.execute_task --example-words --presets-word\n$ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word\n```\n\nIf you use **aeneas** as a library, just set the appropriate\n``RuntimeConfiguration`` parameters.\nPlease see the\n[command line tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html)\nfor details.\n\n\n## License\n\n**aeneas** is released under the terms of the\nGNU Affero General Public License Version 3.\nSee the\n[LICENSE 
file](https://github.com/readbeyond/aeneas/blob/master/LICENSE) for details.\n\nLicenses for third party code and files included in **aeneas**\ncan be found in the\n[licenses](https://github.com/readbeyond/aeneas/blob/master/licenses/README.md) directory.\n\nNo copy rights were harmed in the making of this project.\n\n\n## Supporting and Contributing\n\n### Sponsors\n\n* **July 2015**: [Michele Gianella](https://plus.google.com/+michelegianella/about) generously supported the development of the boundary adjustment code (v1.0.4)\n\n* **August 2015**: [Michele Gianella](https://plus.google.com/+michelegianella/about) partially sponsored the port of the MFCC/DTW code to C (v1.1.0)\n\n* **September 2015**: friends in West Africa partially sponsored the development of the head/tail detection code (v1.2.0)\n\n* **October 2015**: an anonymous donation sponsored the development of the \"YouTube downloader\" option (v1.3.0)\n\n* **April 2016**: the Fruch Foundation kindly sponsored the development and documentation of v1.5.0\n\n* **December 2016**: the [Centro Internazionale Del Libro Parlato \"Adriano Sernagiotto\"](http://www.libroparlato.org/) (Feltre, Italy) partially sponsored the development of the v1.7 series\n\n### Supporting\n\nWould you like to support the development of **aeneas**?\n\nI accept sponsorships to\n\n* fix bugs,\n* add new features,\n* improve the quality and the performance of the code,\n* port the code to other languages/platforms, and\n* improve the documentation.\n\nFeel free to\n[get in touch](mailto:aeneas@readbeyond.it).\n\n### Contributing\n\nIf you think you found a bug\nor you have a feature request,\nplease use the\n[GitHub issue tracker](https://github.com/readbeyond/aeneas/issues)\nto submit it.\n\nIf you want to ask a question\nabout using **aeneas**,\nyour best option is to send an email to the\n[mailing list](https://groups.google.com/d/forum/aeneas-forced-alignment).\n\nFinally, code contributions are welcome!\nPlease refer 
to the\n[Code Contribution Guide](https://github.com/readbeyond/aeneas/blob/master/wiki/CONTRIBUTING.md)\nfor details about the branch policies and the code style to follow.\n\n\n## Acknowledgments\n\nMany thanks to **Nicola Montecchio**,\nwho suggested using MFCCs and DTW,\nand co-developed the first experimental code\nfor aligning audio and text.\n\n**Paolo Bertasi**, who developed the\nAPIs and Web application for ReadBeyond Sync,\nhelped shape the structure of this package\nfor its asynchronous usage.\n\n**Chris Hubbard** prepared the files for\npackaging aeneas as a Debian/Ubuntu `.deb`.\n\n**Daniel Bair** prepared the `brew` formula\nfor installing **aeneas** and its dependencies on Mac OS X.\n\n**Daniel Bair**, **Chris Hubbard**, and **Richard Margetts**\npackaged the installers for Mac OS X and Windows.\n\n**Firat Ozdemir** contributed the `finetuneas`\nHTML/JS code for fine tuning sync maps in the browser.\n\n**Willem van der Walt** contributed the code snippet\nto output a sync map in TextGrid format.\n\n**Chris Vaughn** contributed the MacOS TTS wrapper.\n\nAll the mighty\n[GitHub contributors](https://github.com/readbeyond/aeneas/graphs/contributors),\nand the members of the\n[Google Group](https://groups.google.com/d/forum/aeneas-forced-alignment).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freadbeyond%2Faeneas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Freadbeyond%2Faeneas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freadbeyond%2Faeneas/lists"}