{"id":15691061,"url":"https://github.com/labbeti/sslh","last_synced_at":"2025-05-08T00:51:04.393Z","repository":{"id":60490382,"uuid":"301330454","full_name":"Labbeti/SSLH","owner":"Labbeti","description":"Deep Semi-Supervised Learning with Holistic methods for audio classification.","archived":false,"fork":false,"pushed_at":"2024-12-14T09:43:49.000Z","size":3171,"stargazers_count":10,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-12T23:22:06.276Z","etag":null,"topics":["audio-classification","deep-learning","machine-learning","pytorch","pytorch-lightning","semi-supervised"],"latest_commit_sha":null,"homepage":"https://doi.org/10.1186/s13636-022-00255-6","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Labbeti.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-10-05T07:45:04.000Z","updated_at":"2024-12-14T09:40:34.000Z","dependencies_parsed_at":"2025-01-21T00:36:39.708Z","dependency_job_id":"48bfb213-a627-4bbb-a57d-c5b0a038a812","html_url":"https://github.com/Labbeti/SSLH","commit_stats":null,"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Labbeti%2FSSLH","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Labbeti%2FSSLH/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Labbeti%2FSSLH/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/Labbeti%2FSSLH/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Labbeti","download_url":"https://codeload.github.com/Labbeti/SSLH/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252978686,"owners_count":21834913,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-classification","deep-learning","machine-learning","pytorch","pytorch-lightning","semi-supervised"],"created_at":"2024-10-03T18:19:45.027Z","updated_at":"2025-05-08T00:51:04.359Z","avatar_url":"https://github.com/Labbeti.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- # -*- coding: utf-8 -*- --\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n# Deep Semi-Supervised Learning with Holistic methods (SSLH)\n\n\u003ca href=\"https://www.python.org/\"\u003e\u003cimg alt=\"Python\" src=\"https://img.shields.io/badge/-Python 3.9+-blue?style=for-the-badge\u0026logo=python\u0026logoColor=white\"\u003e\u003c/a\u003e\n\u003ca href=\"https://pytorch.org/get-started/locally/\"\u003e\u003cimg alt=\"PyTorch\" src=\"https://img.shields.io/badge/-PyTorch 1.7.1-ee4c2c?style=for-the-badge\u0026logo=pytorch\u0026logoColor=white\"\u003e\u003c/a\u003e\n\u003ca href=\"https://black.readthedocs.io/en/stable/\"\u003e\u003cimg alt=\"Code style: black\" src=\"https://img.shields.io/badge/code%20style-black-black.svg?style=for-the-badge\u0026labelColor=gray\"\u003e\u003c/a\u003e\n\nUnofficial PyTorch and PyTorch-Lightning implementations of Deep Semi-Supervised Learning 
methods for audio tagging.\n\n\u003c/div\u003e\n\nThere are 4 SSL methods:\n- [FixMatch (FM)](https://arxiv.org/pdf/2001.07685.pdf) [1]\n- [MixMatch (MM)](https://arxiv.org/pdf/1905.02249.pdf) [2]\n- [ReMixMatch (RMM)](https://arxiv.org/pdf/1911.09785.pdf) [3]\n- [Unsupervised Data Augmentation (UDA)](https://arxiv.org/pdf/1904.12848.pdf) [4]\n\nFor the following datasets:\n- [CIFAR-10 (CIFAR10)](https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf)\n- [ESC-10 (ESC10)](https://www.karolpiczak.com/papers/Piczak2015-ESC-Dataset.pdf)\n- [Google Speech Commands (GSC)](https://arxiv.org/pdf/1804.03209.pdf)\n- [Primate Vocalization Corpus (PVC)](https://arxiv.org/pdf/2101.10390.pdf)\n- [UrbanSound8K (UBS8K)](http://www.justinsalamon.com/uploads/4/3/9/4/4394963/salamon_urbansound_acmmm14.pdf)\n\nWith 3 models:\n- [WideResNet28 (WRN28)](https://arxiv.org/pdf/1605.07146.pdf)\n- [MobileNetV1 (MNV1)](https://arxiv.org/pdf/1704.04861.pdf)\n- [MobileNetV2 (MNV2)](https://arxiv.org/pdf/1801.04381.pdf)\n\n**IMPORTANT NOTE: The implementations of Mean Teacher (MT), Deep Co-Training (DCT) and Pseudo-Labeling (PL) are present in this repository but are not fully tested.**\n\nYou can find a more stable version of MT and DCT at https://github.com/Labbeti/semi-supervised.\nThe datasets AudioSet and FSD50K are not officially supported.\n\nIf you have problems running experiments, you can contact me at `labbeti.pub@gmail.com`.\n\n\n## Installation\n#### Download \u0026 setup\n```bash\ngit clone https://github.com/Labbeti/SSLH\nconda env create -n env_sslh -f SSLH/environment.yaml\nconda activate env_sslh\npip install -e SSLH --no-dependencies\n```\n\n#### Alternatives\n- As a Python package:\n```bash\npip install git+https://github.com/Labbeti/SSLH\n```\nThe dependencies will be installed automatically with pip instead of conda, which means the build versions can be slightly different.\n\nThe project also contains an ```environment.yaml``` and a ```requirements.txt``` for installing the 
packages with conda or pip, respectively.\n- With the **conda** environment file:\n```bash\nconda env create -n env_sslh -f environment.yaml\nconda activate env_sslh\npip install -e . --no-dependencies\n```\n\n- With the **pip** requirements file:\n```bash\npip install -r requirements.txt\npip install -e . --no-dependencies\n```\n\n## Datasets\nCIFAR10, ESC10, GoogleSpeechCommands and FSD50K can be downloaded and installed automatically.\nFor UrbanSound8K, please read the [README of leocances](https://github.com/leocances/UrbanSound8K/blob/master/README.md#prepare-the-dataset), in the \"Prepare the dataset\" section.\nAudioSet (ADS) and Primate Vocalization Corpus (PVC) cannot currently be installed automatically.\n\nTo download a dataset, you can use the `data.dm.download=true` option.\n\n[comment]: \u003c\u003e (TODO : For Audioset install !)\n[comment]: \u003c\u003e (TODO : For PVC install !)\n\n## Usage\nThis code uses Hydra to parse arguments. The syntax for setting an argument is \"name=value\" instead of \"--name value\".\n\nExample 1: MixMatch on ESC10\n```bash\npython -m sslh.mixmatch data=ssl_esc10 data.dm.download=true\n```\n\nExample 2: Supervised+Weak on GSC\n```bash\npython -m sslh.supervised data=sup_gsc aug@train_aug=weak data.dm.bsize=256 epochs=300 data.dm.download=true\n```\n\nExample 3: FixMatch+MixUp on UBS8K\n```bash\npython -m sslh.fixmatch data=ssl_ubs8k pl=fixmatch_mixup data.dm.bsize_s=128 data.dm.bsize_u=128 epochs=300 data.dm.download=true\n```\n\nExample 4: ReMixMatch on CIFAR-10\n```bash\npython -m sslh.remixmatch data=ssl_cifar10 model.n_input_channels=3 aug@weak_aug=img_weak aug@strong_aug=img_strong data.dm.download=true\n```\n\n## List of main arguments\n\n| Name | Description | Values | Default |\n| --- | --- | --- | --- |\n| data | Dataset used | (sup\\|ssl)_(ads\\|cifar10\\|esc10\\|fsd50k\\|gsc\\|pvc\\|ubs8k) | (sup\\|ssl)_esc10 |\n| pl | PyTorch Lightning training method (experiment) used | *(depends on the Python script; see the filenames in the config/pl/ folder)* | *(depends 
on the Python script)* |\n| model | PyTorch model to use | mobilenetv1, mobilenetv2, vgg, wideresnet28 | wideresnet28 |\n| optim | Optimizer used | adam, sgd | adam |\n| sched | Learning rate scheduler | cosine, softcosine, none | softcosine |\n| epochs | Number of training epochs | int | 1 |\n| bsize | Batch size in SUP methods | int | 60 |\n| ratio | Ratio of the training data used in SUP methods | float in [0, 1] | 1.0 |\n| bsize_s | Batch size of the supervised part in SSL methods | int | 30 |\n| bsize_u | Batch size of the unsupervised part in SSL methods | int | 30 |\n| ratio_s | Ratio of the supervised training data used in SSL methods | float in [0, 1] | 0.1 |\n| ratio_u | Ratio of the unsupervised training data used in SSL methods | float in [0, 1] | 0.9 |\n\n\n## SSLH Package overview\n```\nsslh\n├── callbacks\n├── datamodules\n│     ├── supervised\n│     └── semi_supervised\n├── datasets\n├── pl_modules\n│     ├── deep_co_training\n│     ├── fixmatch\n│     ├── mean_teacher\n│     ├── mixmatch\n│     ├── mixup\n│     ├── pseudo_labeling\n│     ├── remixmatch\n│     ├── supervised\n│     └── uda\n├── metrics\n├── models\n├── transforms\n│     ├── get\n│     ├── image\n│     ├── other\n│     ├── pools\n│     ├── self_transforms\n│     ├── spectrogram\n│     └── waveform\n└── utils\n```\n\n## Authors\nThis repository was created by Etienne Labbé (Labbeti on GitHub).\n\nIt also contains some code from the following authors:\n- Léo Cancès (leocances on GitHub)\n  - For the AudioSet, ESC10, GSC, PVC and UBS8K dataset base code.\n- Qiuqiang Kong (qiuqiangkong on GitHub)\n  - For the MobileNetV1 \u0026 V2 model implementations from [PANN](https://github.com/qiuqiangkong/audioset_tagging_cnn).\n\n## Additional notes\n- This project was developed with Ubuntu 20.04 and Python 3.8.5.\n\n## Glossary\n| Acronym | Description |\n| --- | --- |\n| activation | Activation function |\n| ADS | AudioSet |\n| aug, augm, augment | Augmentation |\n| ce | Cross-Entropy |\n| expt | 
Experiment |\n| fm | FixMatch |\n| fn, func | Function |\n| GSC | Google Speech Commands dataset (with 35 classes) |\n| GSC12 | Google Speech Commands dataset (with 10 classes from GSC, 1 unknown class and 1 silence class) |\n| hparams | Hyperparameters |\n| js | Jensen-Shannon |\n| kl | Kullback-Leibler |\n| loc | Localisation |\n| lr | Learning Rate |\n| mm | MixMatch |\n| mse | Mean Squared Error |\n| pred | Prediction |\n| PVC | Primate Vocalization Corpus dataset |\n| rmm | ReMixMatch |\n| _s | Supervised |\n| sched | Scheduler |\n| SSL | Semi-Supervised Learning |\n| SUP | Supervised Learning |\n| _u | Unsupervised |\n| UBS8K | UrbanSound8K dataset |\n\n## References\n\n[1] K. Sohn, D. Berthelot, C.-L. Li, Z. Zhang, N. Carlini, E. D. Cubuk, A. Kurakin, H. Zhang, and C. Raffel, “FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence,” arXiv:2001.07685. [Online]. Available: http://arxiv.org/abs/2001.07685\n\n[2] D. Berthelot, N. Carlini, I. Goodfellow, N. Papernot, A. Oliver, and C. Raffel, “MixMatch: A Holistic Approach to Semi-Supervised Learning,” Oct. 2019, arXiv:1905.02249 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1905.02249\n\n[3] D. Berthelot, N. Carlini, E. D. Cubuk, A. Kurakin, K. Sohn, H. Zhang, and C. Raffel, “ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring,” Feb. 2020, arXiv:1911.09785 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1911.09785\n\n[4] Q. Xie, Z. Dai, E. Hovy, M.-T. Luong, and Q. V. Le, “Unsupervised Data Augmentation for Consistency Training,” Nov. 2020, arXiv:1904.12848 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1904.12848\n\n\u003c!-- Cances, L., Labbé, E. \u0026 Pellegrini, T. Comparison of semi-supervised deep learning algorithms for audio classification. J AUDIO SPEECH MUSIC PROC. 2022, 23 (2022). 
https://doi.org/10.1186/s13636-022-00255-6 --\u003e\n\n## Cite this repository\nIf you use this code, you can cite the associated paper:\n```\n@article{cances_comparison_2022,\n\ttitle        = {Comparison of semi-supervised deep learning algorithms for audio classification},\n\tauthor       = {Cances, Léo and Labbé, Etienne and Pellegrini, Thomas},\n\tyear         = 2022,\n\tmonth        = sep,\n\tjournal      = {EURASIP Journal on Audio, Speech, and Music Processing},\n\tvolume       = 2022,\n\tnumber       = 1,\n\tpages        = 23,\n\tdoi          = {10.1186/s13636-022-00255-6},\n\tissn         = {1687-4722},\n\turl          = {https://doi.org/10.1186/s13636-022-00255-6},\n\tabstract     = {In this article, we adapted five recent SSL methods to the task of audio classification. The first two methods, namely Deep Co-Training (DCT) and Mean Teacher (MT), involve two collaborative neural networks. The three other algorithms, called MixMatch (MM), ReMixMatch (RMM), and FixMatch (FM), are single-model methods that rely primarily on data augmentation strategies. Using the Wide-ResNet-28-2 architecture in all our experiments, 10\\% of labeled data and the remaining 90\\% as unlabeled data for training, we first compare the error rates of the five methods on three standard benchmark audio datasets: Environmental Sound Classification (ESC-10), UrbanSound8K (UBS8K), and Google Speech Commands (GSC). In all but one cases, MM, RMM, and FM outperformed MT and DCT significantly, MM and RMM being the best methods in most experiments. On UBS8K and GSC, MM achieved 18.02\\% and 3.25\\% error rate (ER), respectively, outperforming models trained with 100\\% of the available labeled data, which reached 23.29\\% and 4.94\\%, respectively. RMM achieved the best results on ESC-10 (12.00\\% ER), followed by FM which reached 13.33\\%. Second, we explored adding the mixup augmentation, used in MM and RMM, to DCT, MT, and FM. 
In almost all cases, mixup brought consistent gains. For instance, on GSC, FM reached 4.44\\% and 3.31\\% ER without and with mixup. Our PyTorch code will be made available upon paper acceptance at https://github.com/Labbeti/SSLH.}\n}\n```\n\n## Contact\nMaintainer:\n- Etienne Labbé \"Labbeti\": labbeti.pub@gmail.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flabbeti%2Fsslh","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flabbeti%2Fsslh","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flabbeti%2Fsslh/lists"}