{"id":13449863,"url":"https://github.com/amsehili/auditok","last_synced_at":"2025-03-22T23:31:49.543Z","repository":{"id":46266615,"uuid":"42661996","full_name":"amsehili/auditok","owner":"amsehili","description":"An audio/acoustic activity detection and audio segmentation tool","archived":false,"fork":false,"pushed_at":"2024-10-29T18:32:38.000Z","size":4359,"stargazers_count":740,"open_issues_count":7,"forks_count":95,"subscribers_count":27,"default_branch":"master","last_synced_at":"2024-10-29T19:05:02.609Z","etag":null,"topics":["audio-activities","audio-data","audio-segmentation","vad","voice-activity-detection","voice-detection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amsehili.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":"auditok/__init__.py","citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-09-17T14:46:20.000Z","updated_at":"2024-10-29T13:12:46.000Z","dependencies_parsed_at":"2024-12-03T09:39:30.896Z","dependency_job_id":null,"html_url":"https://github.com/amsehili/auditok","commit_stats":{"total_commits":375,"total_committers":11,"mean_commits":34.09090909090909,"dds":0.09066666666666667,"last_synced_commit":"35489d0824dc1eff8485e08238bdb3449e028ae5"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amsehili%2Fauditok","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amsehili%2Fauditok/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amseh
ili%2Fauditok/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amsehili%2Fauditok/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amsehili","download_url":"https://codeload.github.com/amsehili/auditok/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245036111,"owners_count":20550661,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-activities","audio-data","audio-segmentation","vad","voice-activity-detection","voice-detection"],"created_at":"2024-07-31T06:01:00.159Z","updated_at":"2025-03-22T23:31:49.249Z","avatar_url":"https://github.com/amsehili.png","language":"Python","funding_links":[],"categories":["Uncategorized","Python","Audio ##"],"sub_categories":["Uncategorized"],"readme":".. image:: doc/figures/auditok-logo.png\n    :align: center\n    :alt: Build status\n\n.. image:: https://travis-ci.org/amsehili/auditok.svg?branch=master\n    :target: https://travis-ci.org/amsehili/auditok\n\n.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest\n    :target: http://auditok.readthedocs.org/en/latest/?badge=latest\n    :alt: Documentation status\n\n``auditok`` is an **Audio Activity Detection** tool that can process online data\n(read from an audio device or from standard input) as well as audio files.\nIt can be used as a command-line program or by calling its API.\n\nThe latest version of the documentation can be found on\n`readthedocs. 
\u003chttps://auditok.readthedocs.io/en/latest/\u003e`_\n\n\nInstallation\n------------\n\nA basic version of ``auditok`` will run with standard Python (\u003e=3.4). However,\nwithout installing additional dependencies, ``auditok`` can only deal with audio\nfiles in *wav* or *raw* formats. If you want more features, the following\npackages are needed:\n\n- `pydub \u003chttps://github.com/jiaaro/pydub\u003e`_ : read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file.\n- `pyaudio \u003chttps://people.csail.mit.edu/hubert/pyaudio\u003e`_ : read audio data from the microphone and play audio back.\n- `tqdm \u003chttps://github.com/tqdm/tqdm\u003e`_ : show progress bar while playing audio clips.\n- `matplotlib \u003chttps://matplotlib.org/stable/index.html\u003e`_ : plot audio signal and detections.\n- `numpy \u003chttps://numpy.org/\u003e`_ : required by matplotlib. Also used for some math operations instead of standard Python if available.\n\nInstall the latest stable version with pip:\n\n\n.. code:: bash\n\n    sudo pip install auditok\n\n\nInstall the latest development version from GitHub:\n\n.. code:: bash\n\n    pip install git+https://github.com/amsehili/auditok\n\nor\n\n.. code:: bash\n\n    git clone https://github.com/amsehili/auditok.git\n    cd auditok\n    python setup.py install\n\n\nBasic example\n-------------\n\n.. 
code:: python\n\n    import auditok\n\n    # split returns a generator of AudioRegion objects\n    audio_regions = auditok.split(\n        \"audio.wav\",\n        min_dur=0.2,     # minimum duration of a valid audio event in seconds\n        max_dur=4,       # maximum duration of an event\n        max_silence=0.3, # maximum duration of tolerated continuous silence within an event\n        energy_threshold=55 # threshold of detection\n    )\n\n    for i, r in enumerate(audio_regions):\n\n        # Regions returned by `split` have 'start' and 'end' metadata fields\n        print(\"Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s\".format(i=i, r=r))\n\n        # play detection\n        # r.play(progress_bar=True)\n\n        # region's metadata can also be used with the `save` method\n        # (no need to explicitly specify region's object and `format` arguments)\n        filename = r.save(\"region_{meta.start:.3f}-{meta.end:.3f}.wav\")\n        print(\"region saved as: {}\".format(filename))\n\noutput example:\n\n.. code:: bash\n\n    Region 0: 0.700s -- 1.400s\n    region saved as: region_0.700-1.400.wav\n    Region 1: 3.800s -- 4.500s\n    region saved as: region_3.800-4.500.wav\n    Region 2: 8.750s -- 9.950s\n    region saved as: region_8.750-9.950.wav\n    Region 3: 11.700s -- 12.400s\n    region saved as: region_11.700-12.400.wav\n    Region 4: 15.050s -- 15.850s\n    region saved as: region_15.050-15.850.wav\n\n\nSplit and plot\n--------------\n\nVisualize audio signal and detections:\n\n.. code:: python\n\n    import auditok\n    region = auditok.load(\"audio.wav\") # returns an AudioRegion object\n    regions = region.split_and_plot(...) # or just region.splitp()\n\noutput figure:\n\n.. 
image:: doc/figures/example_1.png\n\n\nLimitations\n-----------\n\nCurrently, the core detection algorithm is based on the energy of the audio signal.\nWhile this is fast and works very well for audio streams with low background\nnoise (e.g., podcasts with few people talking, language lessons, audio recorded\nin a rather quiet environment, etc.), performance can drop as the level of\nnoise increases. Furthermore, the algorithm makes no distinction between speech\nand other kinds of sounds, so you shouldn't use it for Voice Activity Detection\nif your audio data also contain non-speech events.\n\nLicense\n-------\nMIT.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famsehili%2Fauditok","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famsehili%2Fauditok","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famsehili%2Fauditok/lists"}