{"id":18439584,"url":"https://github.com/idiap/apam","last_synced_at":"2025-09-23T07:33:18.949Z","repository":{"id":144961364,"uuid":"335578798","full_name":"idiap/apam","owner":"idiap","description":"APAM toolkit is built on PyTorch and provides recipes to adapt pretrained acoustic models with a variety of sequence discriminative training criterions.","archived":false,"fork":false,"pushed_at":"2021-02-15T11:01:27.000Z","size":40,"stargazers_count":14,"open_issues_count":0,"forks_count":1,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-23T01:02:36.217Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/idiap.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-02-03T09:55:51.000Z","updated_at":"2023-06-05T06:38:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"2fd7bc6e-51bf-4779-b13d-27249c77c4f7","html_url":"https://github.com/idiap/apam","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fapam","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fapam/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fapam/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fapam/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/idiap","download_url":"https://codeload.git
hub.com/idiap/apam/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247732740,"owners_count":20986913,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T06:25:37.586Z","updated_at":"2025-09-23T07:33:13.873Z","avatar_url":"https://github.com/idiap.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# APAM - Adaptation of Pretrained Acoustic Models\n\nAPAM toolkit is built on PyTorch and provides recipes to adapt pretrained\nacoustic models with a variety of sequence discriminative training criterions.\n\n------------------------------------\n\nTable of Contents\n------------------------------------\n\n\u003c!--ts--\u003e\n   * [Table of contents](#table-of-contents)\n   * [Introduction](#introduction)\n   * [High-Level Library Structure](#high-level-library-structure)\n   * [Installation](#installation)\n       * [Dependencies](#dependencies)\n   * [Pretrained Models Supported](#pretrained-models-supported)\n       * [Masked Acoustic Model](#masked-acoustic-model)\n   * [Current Recipes](#current-recipes)\n       * [Librispeech 100h](#librispeech-100h)\n   * [References](#references)\n   * [Citation](#citation)\n\u003c!--te--\u003e\n\n\n------------------------------------\nIntroduction\n------------------------------------\nThe library structure is inspired from the [S3PRL\nlibrary](https://github.com/s3prl/s3prl/). In keeping up with the terminology\nin S3PRL, the pretrained models are referred to as *upstream* models. 
A\nseparate *downstream* model is added on top of the *upstream* model to be used\nas the acoustic model for ASR training.\n\n\n------------------------------------\nHigh-Level Library Structure\n------------------------------------\nThe library provides various runners (trainers) that take care of training\nacoustic models.\n\nThe runner takes as input:\n\n- *asr_config*: defines experiment parameters such as the learning\n  rate, optimizer, and number of epochs.\n- *ckpt*: path to the pretrained model checkpoint.\n- *upconfig*: configuration for the pretrained *upstream* model.\n- *get_model*: function that creates the *upstream* and *downstream* models using\n  the above parameters.\n\n\nThe idea is to reuse various pretrained models, such as TERA and wav2vec,\nthrough decoupled *upstream* and *downstream* models. This is enabled by writing\nsimple scripts to load these pretrained models. Examples of these can be found\nin the *pretrained* folder in the source code.\n\n\n------------------------------------\nInstallation\n------------------------------------\n\n### Dependencies\n- **Python** 3 or above\n- Required packages and their uses are listed below:\n```\ntorch                        # deep neural networks\npytorch-fast-transformers    # fast clustered attention\npkwrap                       # LF-MMI loss\nlibrosa                      # audio file reading\nyaml                         # config parser\n```\n\nWe recommend installing the latest version of [fast\ntransformers](https://github.com/idiap/fast-transformers) using the following\ncommand:\n```\npip install git+https://github.com/idiap/fast-transformers\n```\n\nTo install Pkwrap, follow the instructions in the\n[Pkwrap repository](https://github.com/idiap/pkwrap).\n\n\n------------------------------------\nPretrained Models Supported\n------------------------------------\n\nAt the moment, we support the following pretrained models:\n\n### Masked Acoustic Model\n\nWe provide a pretrained model trained with a masked language 
modeling\nobjective as described in [\"TERA: Self-Supervised Learning of Transformer\nEncoder Representation for Speech\"](https://arxiv.org/abs/2007.06028).\n\nThe pretrained model is available [here](https://zenodo.org/record/4541045#.YCpThmgzaiw).\n\n------------------------------------\nCurrent Recipes\n------------------------------------\n\nAt the moment, we only support [flat-start lattice-free\nMMI](https://www.danielpovey.com/files/2018_interspeech_end2end.pdf) training.\nThe following recipes can be found in the *examples* folder. For more details\non how to run them, follow the steps in the README files in that folder.\n\n### Librispeech 100h\nWe provide recipes to train an acoustic model using 100 hours of\nLibrispeech data and pretrained acoustic models based on:\n\n1. Masked Acoustic Model\n\n\n------------------------------------\nCitation\n------------------------------------\nIf you find this library useful, please cite the relevant work(s) below:\n```bibtex\n@misc{vyas2020latticefree,\n    title={Lattice-Free MMI Adaptation Of Self-Supervised Pretrained Acoustic Models},\n    author={Apoorv Vyas and Srikanth Madikeri and Hervé Bourlard},\n    year={2020},\n    eprint={2012.14252},\n    archivePrefix={arXiv},\n    primaryClass={cs.LG}\n}\n```\n\n------------------------------------\nReferences\n------------------------------------\n\nPlease note that this list is not exhaustive. We are only providing\nreferences to a few key works that this library uses. For a more exhaustive list,\nplease take a look at our published reports based on this library.\n\n\n```bibtex\n@inproceedings{paszke2019pytorch,\n    title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},\n    author = {Paszke, Adam et. 
al.},\n    booktitle = {Advances in Neural Information Processing Systems 32},\n    year = {2019},\n}\n```\n\n```bibtex\n@article{hadian2018flat,\n    author={Hossein Hadian and others},\n    title={Flat-Start Single-Stage Discriminatively Trained HMM-Based Models for ASR},\n    year={2018},\n    journal={IEEE ACM Transactions on Audio, Speech, and Language Processing},\n}\n```\n\n```bibtex\n@misc{madikeri2020pkwrap,\n    title={Pkwrap: a PyTorch Package for LF-MMI Training of Acoustic Models}, \n    author={Srikanth Madikeri and Sibo Tong and Juan Zuluaga-Gomez and Apoorv Vyas and Petr Motlicek and Herv{\\'e} Bourlard},\n    year={2020},\n    eprint={2010.03466},\n    archivePrefix={arXiv},\n    primaryClass={eess.AS}\n}\n```\n\n```bibtex\n@inproceedings{vyas2020fast,\n    author = {Vyas, Apoorv and Katharopoulos, Angelos and Fleuret, Fran\\c{c}ois},\n    title = {Fast Transformers with Clustered Attention},\n    booktitle = {Proceedings of the international conference on Neural Information Processing Systems (NeurIPS)},\n    year = {2020}\n}\n```\n\n```bibtex\n@misc{\n    S3PRL,\n    author = {Andy T. Liu and Yang Shu-wen},\n    title = {S3PRL: The Self-Supervised Speech Pre-training and Representation Learning Toolkit},\n    year = {2020},\n    publisher = {GitHub},\n    journal = {GitHub repository},\n    url = {https://github.com/s3prl/s3prl}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidiap%2Fapam","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fidiap%2Fapam","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidiap%2Fapam/lists"}