# Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?

[[Paper]](https://www.isca-speech.org/archive/interspeech_2023/sarkar23_interspeech.html)
[[Video]](https://youtu.be/fU_Pt_OuW1U)
[[Slides]](https://eklavyafcb.github.io/docs/Sarkar_Interspeech_2023_Presentation.pdf)

<p align="center">
    <a href="https://paperswithcode.com/sota/caller-detection-on-infantmarmosetsvox?p=can-self-supervised-neural-networks-pre">
        <img alt="Papers with Code" src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/can-self-supervised-neural-networks-pre/caller-detection-on-infantmarmosetsvox">
    </a>
    <a href="https://github.com/idiap/ssl-caller-detection/blob/main/LICENSE">
        <img alt="License" src="https://img.shields.io/badge/License-GPLv3-blue.svg">
    </a>
    <a href="https://github.com/idiap/ssl-caller-detection">
        <img alt="GitHub" src="https://img.shields.io/badge/GitHub-Open%20source-green">
    </a>
    <a href="https://github.com/psf/black">
        <img alt="Black" src="https://img.shields.io/badge/code%20style-black-000000.svg">
    </a>
</p>

<img src="img/figure.jpg" alt="header" width="1000"/>

## Cite

This repository contains the source code for the Interspeech 2023 paper [Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?](https://www.isca-speech.org/archive/interspeech_2023/sarkar23_interspeech.html) by E. Sarkar and M. Magimai Doss (2023).

Please cite the original authors in any publication that uses this work:

```bib
@inproceedings{sarkar23_interspeech,
  author={Eklavya Sarkar and Mathew Magimai.-Doss},
  title={{Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={1189--1193},
  doi={10.21437/Interspeech.2023-1968}
}
```

## Dataset

InfantMarmosetsVox is a dataset for multi-class call-type and caller identification. It contains audio recordings of individual marmosets and their call-types: 350 files of precisely labelled 10-minute audio recordings across all caller classes. The audio was recorded from five pairs of infant marmoset twins, each recorded individually in two separate sound-proofed recording rooms at a sampling rate of 44.1 kHz. The start and end time, call-type, and marmoset identity of each vocalization are provided, labelled by an experienced researcher. The dataset contains 169,318 labelled audio segments in total, of which 72,921 are vocalization segments once the "Silence" and "Noise" classes are removed. There are 11 call-types (excluding "Silence" and "Noise") and 10 caller identities.

The dataset is publicly available [here](https://www.idiap.ch/en/dataset/infantmarmosetsvox/index_html) and includes a ready-to-use PyTorch `Dataset` and `DataLoader`. Any publication (e.g. conference paper, journal article, technical report, book chapter) resulting from the use of InfantMarmosetsVox **must cite** this [paper](https://www.isca-speech.org/archive/interspeech_2023/sarkar23_interspeech.html).

More information on usage is provided in the `README.txt` file included in the dataset.

## Installation

This package has very few requirements.
To create a new conda/mamba environment, install [conda](https://conda.io), then [mamba](https://mamba.readthedocs.io/en/latest/installation.html#existing-conda-install), and run the following commands:

```bash
mamba env create -f environment.yml           # Create env
mamba activate marmosets                      # Activate env
```

## Experiments

The following scripts perform the computations listed below:

Preprocessing:
- `extract_features.py` extracts SSL embeddings.
- `extract_baselines.py` extracts handcrafted features.
- `embeddings2cid_pickles.py` converts the variable-length features into fixed-length functionals.

Study I - Caller Discrimination Analysis:
- `functionals2distributions.py` computes the KL divergence and Bhattacharyya distance between the extracted embeddings.

Study II - Caller Detection Study:
- `classifier_caller_groups.py` classifies the functionals using an ML classifier (SVM, RF, AB).
- `compile_results.py` compiles all the results produced by `classifier_caller_groups.py`.

Misc:
- `utils.py` contains utility functions, such as loading the SSL embeddings or SSL functionals.

Note that the experimental protocols above are defined in `marmoset_lists`, which contains the set splits and other mappings as `.pkl` files.

## Usage

The scripts above are independent, and each requires its own set of parameters.
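Because the scripts are independent, the computations behind the preprocessing and Study I steps can also be tried in isolation. The sketch below is illustrative and is not the repository's implementation: it assumes that the fixed-length functionals are the per-dimension mean and standard deviation of an embedding over time, and that each caller's functionals are modelled as a multivariate Gaussian when computing the KL divergence and Bhattacharyya distance. The function names (`functionals`, `fit_gaussian`, `kl_gaussian`, `bhattacharyya_gaussian`) are hypothetical, and the data is synthetic.

```python
import numpy as np


def functionals(frames):
    """Pool a variable-length (T, D) embedding into a fixed-length (2 * D,) vector.

    Assumption (hypothetical choice): mean and standard deviation over time.
    """
    frames = np.asarray(frames, dtype=float)
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


def fit_gaussian(samples, eps=1e-6):
    """Estimate the mean and covariance of an (N, D) matrix of functionals.

    A small ridge `eps` on the diagonal keeps the covariance invertible.
    """
    samples = np.asarray(samples, dtype=float)
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False) + eps * np.eye(samples.shape[1])
    return mu, cov


def kl_gaussian(mu0, cov0, mu1, cov1):
    """KL(N0 || N1) between two multivariate Gaussians (asymmetric)."""
    d = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                  - d + logdet1 - logdet0)


def bhattacharyya_gaussian(mu0, cov0, mu1, cov1):
    """Bhattacharyya distance between two multivariate Gaussians (symmetric)."""
    cov = 0.5 * (cov0 + cov1)
    diff = mu1 - mu0
    _, logdet = np.linalg.slogdet(cov)
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return 0.125 * mahalanobis + 0.5 * (logdet - 0.5 * (logdet0 + logdet1))


# Synthetic example: two hypothetical callers with 50 segments each,
# every segment a (T, 4) embedding with a random number of frames T.
rng = np.random.default_rng(0)
caller_a = np.stack([
    functionals(rng.normal(0.0, 1.0, size=(rng.integers(20, 60), 4)))
    for _ in range(50)
])
caller_b = np.stack([
    functionals(rng.normal(0.5, 1.0, size=(rng.integers(20, 60), 4)))
    for _ in range(50)
])
gauss_a, gauss_b = fit_gaussian(caller_a), fit_gaussian(caller_b)
print("KL(a || b):", kl_gaussian(*gauss_a, *gauss_b))
print("Bhattacharyya:", bhattacharyya_gaussian(*gauss_a, *gauss_b))
```

The KL divergence is asymmetric, so symmetrised variants (for example, averaging both directions) are often used when comparing pairs of callers; the Bhattacharyya distance is symmetric by construction. SSL embeddings typically have hundreds of dimensions, so with few segments per caller the covariance estimate may need stronger regularisation than the small ridge used here.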
To see the parameters required by any of the scripts above, run:

```bash
python src/file.py -h
```

Each invocation runs only the single parameter combination it is given. Reproducing all the experiments requires a grid search over all possible parameter values. Note that the experiments in the paper were run only with the task parameter `-t marmosetID`.

## Directory Structure

The directory is organized as follows:

```
.
├── dataset                 # Dataset config files
├── environment.yml         # Environment file
├── img                     # Images
├── LICENSE                 # License
├── MANIFEST.in             # Setup
├── marmoset_lists          # Protocol lists and pickles
├── pkl                     # Pickles
├── pyproject.toml          # Setup
├── README.md               # This file
├── src                     # Python source code
└── version.txt             # Version
```

## Contact

For questions or to report issues with this software package, please contact the first [author](mailto:eklavya.sarkar@idiap.ch).