{"id":18439462,"url":"https://github.com/idiap/zff_vad","last_synced_at":"2025-10-05T15:29:06.647Z","repository":{"id":144963969,"uuid":"569298028","full_name":"idiap/zff_vad","owner":"idiap","description":"Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering","archived":false,"fork":false,"pushed_at":"2023-10-19T15:07:27.000Z","size":646,"stargazers_count":19,"open_issues_count":0,"forks_count":1,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-03-23T01:01:51.804Z","etag":null,"topics":["audio-processing","machine-learning","noise-robust","signal-processing","speech-activity-detection","voice-activity-detection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/idiap.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-22T14:12:07.000Z","updated_at":"2025-03-13T22:14:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"5e414e5a-52d7-45c6-9c6f-9fd28bc53aab","html_url":"https://github.com/idiap/zff_vad","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fzff_vad","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fzff_vad/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fzff_vad/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fzff_vad/ma
nifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/idiap","download_url":"https://codeload.github.com/idiap/zff_vad/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247732687,"owners_count":20986902,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-processing","machine-learning","noise-robust","signal-processing","speech-activity-detection","voice-activity-detection"],"created_at":"2024-11-06T06:24:52.015Z","updated_at":"2025-10-05T15:29:06.562Z","avatar_url":"https://github.com/idiap.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"================================================================================================================\nZFF VAD\n================================================================================================================\n\n\n[Paper_]\n[Poster_]\n[Video_]\n[Slides_]\n\n|License| |OpenSource| |BlackFormat| |BanditSecurity| |iSortImports|\n\n\n.. image:: img/figure.jpg\n  :alt: Pipeline\n\nUnsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering\n---------------------------------------------------------------------------------------------------------------\n\nThis repository contains the code developed for the Interspeech accepted paper: `Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering`__ by E. Sarkar, R. Prasad, and M. 
Magimai Doss (2022).\n\nPlease cite the original authors in any publication that uses this work:\n\n.. code:: bib\n\n    @inproceedings{sarkar22_interspeech,\n    author    = {Eklavya Sarkar and RaviShankar Prasad and Mathew Magimai Doss},\n    title     = {{Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering}},\n    year      = {2022},\n    booktitle = {Proc. Interspeech 2022},\n    pages     = {4626--4630},\n    doi       = {10.21437/Interspeech.2022-10535}\n    }\n\nApproach\n---------\n\nWe jointly model voice source and vocal tract system information using the zero-frequency filtering (ZFF) technique for voice activity detection. The ZFF filter outputs are combined into a composite signal carrying salient source and system information, such as the fundamental frequency :math:`f_0` and the formants :math:`F_1` and :math:`F_2`; a dynamic threshold is then applied after spectral entropy-based weighting. Our approach operates purely in the time domain, is robust across a range of SNRs, and is much more computationally efficient than neural methods.\n\nInstallation\n------------\n\nThis package has very few requirements.\nTo create a new conda/mamba environment, install conda_, then mamba_, and follow these steps:\n\n.. code:: bash\n\n    mamba env create -f environment.yml   # Create environment\n    conda activate zff                    # Activate environment\n    make install clean                    # Install packages\n\nCommand-line Usage\n-------------------\n\nTo segment a single audio file into a .csv file:\n\n.. code:: bash\n\n    segment -w path/to/audio.wav -o path/to/save/segments\n\nTo segment a folder of audio files:\n\n.. code:: bash\n\n    segment -f path/to/folder/of/audio/files -o path/to/save/segments\n\nFor more options, check:\n\n.. 
code:: bash\n\n    segment -h\n\n*Note*: depending on the conditions of the given data, it may be necessary to tune the smoothing and theta parameters.\n\nPython Usage\n-------------\n\nTo compute VAD on a given audio file:\n\n.. code:: python\n\n    from zff import utils\n    from zff.zff import zff_vad\n\n    # Read audio at native sampling rate\n    sr, audio = utils.load_audio(\"audio.wav\")\n\n    # Get segments\n    boundary = zff_vad(audio, sr)\n\n    # Smooth the decisions\n    boundary = utils.smooth_decision(boundary, sr)\n\n    # Convert from sample to time domain\n    segments = utils.sample2time(audio, sr, boundary)\n\n    # Save as .csv file\n    utils.save_segments(\"segments\", \"audio\", segments)\n\nTo extract the composite signal from a given audio file:\n\n.. code:: python\n\n    from zff.zff import zff_cs\n    from zff import utils\n\n    # Read audio at native sampling rate\n    sr, audio = utils.load_audio(\"audio.mp3\")\n\n    # Get composite signal\n    composite = zff_cs(audio, sr)\n\n    # Get all signals\n    composite, y0, y1, y2, gcis = zff_cs(audio, sr, verbose=True)\n\n\nRepository Structure\n-----------------------------\n\n.. 
code:: bash\n\n    .\n    ├── environment.yml          # Environment\n    ├── img                      # Images\n    ├── LICENSE                  # License\n    ├── Makefile                 # Setup\n    ├── MANIFEST.in              # Setup\n    ├── pyproject.toml           # Setup\n    ├── README.rst               # README\n    ├── requirements.txt         # Setup\n    ├── setup.py                 # Setup\n    ├── version.txt              # Version\n    └── zff                      # Source code folder\n        ├── arguments.py            # Arguments parser\n        ├── segment.py              # Main method\n        ├── utils.py                # Utility methods\n        └── zff.py                  # ZFF methods\n\n\nContact\n-------\nFor questions about this software package or to report issues with it, please contact the first author_.\n\n.. _author: eklavya.sarkar@idiap.ch\n.. _Paper: https://www.isca-speech.org/archive/interspeech_2022/sarkar22_interspeech.html\n.. _Poster: https://eklavyafcb.github.io/docs/Sarkar_Interspeech_2022_Poster_Landscape.pdf\n.. _Video: https://youtu.be/hIHLu_7ESfM\n.. _Slides: https://eklavyafcb.github.io/docs/Sarkar_Interspeech_2022_Presentation.pdf\n.. _conda: https://conda.io\n.. _mamba: https://mamba.readthedocs.io/en/latest/installation.html#existing-conda-install\n__ https://www.isca-speech.org/archive/interspeech_2022/sarkar22_interspeech.html\n.. |License| image:: https://img.shields.io/badge/License-GPLv3-blue.svg\n    :target: https://github.com/idiap/ZFF_VAD/blob/master/LICENSE\n    :alt: License\n\n.. |OpenSource| image:: https://img.shields.io/badge/GitHub-Open%20source-green\n    :target: https://github.com/idiap/ZFF_VAD/\n    :alt: Open-Source\n\n.. |BlackFormat| image:: https://img.shields.io/badge/code%20style-black-000000.svg\n    :target: https://github.com/psf/black\n    :alt: Style\n\n.. 
|BanditSecurity| image:: https://img.shields.io/badge/security-bandit-yellow.svg\n    :target: https://github.com/PyCQA/bandit\n    :alt: Security\n\n.. |iSortImports| image:: https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat\u0026labelColor=ef8336\n    :target: https://pycqa.github.io/isort\n    :alt: Imports\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidiap%2Fzff_vad","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fidiap%2Fzff_vad","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidiap%2Fzff_vad/lists"}