{"id":21299844,"url":"https://github.com/ddlbojack/mt4ssl","last_synced_at":"2025-07-11T19:30:36.670Z","repository":{"id":62412314,"uuid":"556641066","full_name":"ddlBoJack/MT4SSL","owner":"ddlBoJack","description":"Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets","archived":false,"fork":false,"pushed_at":"2024-03-25T04:25:01.000Z","size":356,"stargazers_count":38,"open_issues_count":0,"forks_count":3,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-03-25T05:28:57.864Z","etag":null,"topics":["multi-task-learning","self-supervised-learning","speech-recognition","unsupervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ddlBoJack.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-10-24T08:27:18.000Z","updated_at":"2024-03-06T01:04:29.000Z","dependencies_parsed_at":"2024-03-25T05:28:42.764Z","dependency_job_id":"ab692cde-1b87-446f-969b-7a83af126afe","html_url":"https://github.com/ddlBoJack/MT4SSL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddlBoJack%2FMT4SSL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddlBoJack%2FMT4SSL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddlBoJack%2FMT4SSL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddlBoJack%2FMT4SSL/manifests","owner_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ddlBoJack","download_url":"https://codeload.github.com/ddlBoJack/MT4SSL/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225750119,"owners_count":17518315,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["multi-task-learning","self-supervised-learning","speech-recognition","unsupervised-learning"],"created_at":"2024-11-21T15:06:28.558Z","updated_at":"2024-11-21T15:06:29.656Z","avatar_url":"https://github.com/ddlBoJack.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n    \u003ch1\u003e\n        MT4SSL\n    \u003c/h1\u003e\n    \u003cp\u003e\n    Official PyTorch implementation of \u003cb\u003e\u003cem\u003eMT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets\u003c/em\u003e\u003c/b\u003e\n    \u003c/p\u003e\n    \u003ca href=\"https://github.com/ddlBoJack/MT4SSL\"\u003e\u003cimg src=\"https://img.shields.io/badge/Platform-linux-lightgrey\" alt=\"version\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/ddlBoJack/MT4SSL\"\u003e\u003cimg src=\"https://img.shields.io/badge/Python-3.8-orange\" alt=\"version\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/ddlBoJack/MT4SSL\"\u003e\u003cimg src=\"https://img.shields.io/badge/PyTorch-1.12-brightgreen\" alt=\"python\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/ddlBoJack/MT4SSL\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-red.svg\" 
alt=\"mit\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n## Guides\n\nMT4SSL is a multi-task learning framework for speech-based self-supervised learning. \n\nMT4SSL optimizes the model with offline targets and online targets simultaneously. \n\nMT4SSL achieves good performance and fast convergence: a relatively low WER on the speech recognition task can be obtained with only a few pre-training steps. \n\n\n\n## Model\n\n![](./src/MT4SSL.png)\n\n\n\n## Implementation\n\n### Setup\n\nThe implementation is mainly based on the [fairseq](https://github.com/facebookresearch/fairseq) codebase. \n\n```bash\ngit clone https://github.com/facebookresearch/fairseq\ncd fairseq\npip install --editable ./\ngit clone https://github.com/ddlBoJack/MT4SSL\n```\n\n\n\n### Data Preparation\n\nWe use the [LibriSpeech](http://www.openslr.org/12) dataset in this implementation.\n\nPlease follow the steps [here](https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec#prepare-training-data-manifest) to prepare `*.tsv` (sources), [here](https://github.com/facebookresearch/fairseq/tree/main/examples/hubert#data-preparation) to prepare `*.km` (K-means targets), and [here](https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec#prepare-training-data-manifest) to prepare `*.ltr` (supervised targets). 
\n\n\n\n### Pre-train the proposed MT4SSL\n\n```bash\npython fairseq_cli/hydra_train.py \\\n    --config-dir MT4SSL/code/config/pretraining \\\n    --config-name base_librispeech \\\n    checkpoint.save_dir=${pretrain_dir} \\\n    task.data=${data_dir} \\\n    task.label_dir=${label_dir} \\\n    task.label_type=km \\\n    common.user_dir=MT4SSL/code\n```\n\nTo simulate $16$ GPUs with $k$ GPUs, add the command-line parameters `distributed_training.distributed_world_size=k` and `+optimization.update_freq='[x]'`, where $x = 16/k$.\n\n\n\n### Fine-tune a CTC model\n\n```bash\npython fairseq_cli/hydra_train.py \\\n    --config-dir MT4SSL/code/config/finetuning \\\n    --config-name base_10h \\\n    checkpoint.save_dir=${finetune_dir} \\\n    task.data=${data_dir} \\\n    model.w2v_path=${pretrain_dir} \\\n    common.user_dir=MT4SSL/code\n```\n\nTo simulate $8$ GPUs with $k$ GPUs, add the command-line parameters `distributed_training.distributed_world_size=k` and `+optimization.update_freq='[x]'`, where $x = 8/k$.\n\n\n\n### Decode\n\nDecode with the Viterbi algorithm:\n\n```bash\npython examples/speech_recognition/new/infer.py \\\n    --config-dir examples/speech_recognition/new/conf \\\n    --config-name infer \\\n    task=audio_finetuning \\\n    task.data=${data_dir} \\\n    task.labels=ltr \\\n    task.normalize=true \\\n    dataset.gen_subset=dev_clean,dev_other,test_clean,test_other \\\n    decoding.type=viterbi \\\n    decoding.beam=1500 \\\n    common_eval.path=${finetune_dir}/checkpoint_best.pt \\\n    common.user_dir=MT4SSL/code\n```\n\n\n\nDecode with the 4-gram language model using [flashlight](https://github.com/flashlight/flashlight/tree/main/bindings/python) and [kenlm](https://github.com/kpu/kenlm): \n\n```bash\npython examples/speech_recognition/new/infer.py \\\n    --config-dir examples/speech_recognition/new/conf \\\n    --config-name infer \\\n    task=audio_finetuning \\\n    task.data=${data_dir} \\\n    task.labels=ltr 
\\\n    task.normalize=true \\\n    dataset.gen_subset=dev_clean,dev_other,test_clean,test_other \\\n    decoding.type=kenlm \\\n    decoding.lmweight=2 decoding.wordscore=-1 decoding.silweight=0 \\\n    decoding.beam=1500 \\\n    decoding.lexicon=${lexicon_dir} \\\n    decoding.lmpath=${lm_dir} \\\n    common_eval.path=${finetune_dir}/checkpoint_best.pt \\\n    common.user_dir=MT4SSL/code\n```\n\n\n\n## Citation\n\n``` latex\n@inproceedings{ma2022mt4ssl,\n  title={MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets},\n  author={Ma, Ziyang and Zheng, Zhisheng and Tang, Changli and Wang, Yujin and Chen, Xie},\n  booktitle={Proc. Interspeech},\n  year={2023}\n}\n```\n\n## License\n\nThis repository is released under the [MIT license](https://github.com/ddlBoJack/MT4SSL/blob/main/LICENSE). \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fddlbojack%2Fmt4ssl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fddlbojack%2Fmt4ssl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fddlbojack%2Fmt4ssl/lists"}