{"id":13910505,"url":"https://github.com/MaxStrange/AudioSegment","last_synced_at":"2025-07-18T09:32:06.849Z","repository":{"id":24287672,"uuid":"101108109","full_name":"MaxStrange/AudioSegment","owner":"MaxStrange","description":"Wrapper for pydub AudioSegment objects","archived":false,"fork":false,"pushed_at":"2022-12-27T15:34:01.000Z","size":70590,"stargazers_count":95,"open_issues_count":4,"forks_count":13,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-11-25T06:17:04.969Z","etag":null,"topics":["audio","pydub","python","sound"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MaxStrange.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-08-22T21:17:07.000Z","updated_at":"2024-11-09T00:17:52.000Z","dependencies_parsed_at":"2023-01-14T07:45:58.135Z","dependency_job_id":null,"html_url":"https://github.com/MaxStrange/AudioSegment","commit_stats":null,"previous_names":[],"tags_count":33,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxStrange%2FAudioSegment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxStrange%2FAudioSegment/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxStrange%2FAudioSegment/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxStrange%2FAudioSegment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MaxStrange","download_url":"https://codeload.github.com/MaxStrange/AudioSegment/tar.gz/refs/heads/master","host":{"name":
"GitHub","url":"https://github.com","kind":"github","repositories_count":226388640,"owners_count":17617310,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","pydub","python","sound"],"created_at":"2024-08-07T00:01:30.007Z","updated_at":"2024-11-25T19:31:14.545Z","avatar_url":"https://github.com/MaxStrange.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# AudioSegment\n\n[![Build Status](https://travis-ci.org/MaxStrange/AudioSegment.svg?branch=master)](https://travis-ci.org/MaxStrange/AudioSegment)\n\nWrapper for [pydub](https://github.com/jiaaro/pydub) AudioSegment objects. An audiosegment.AudioSegment object wraps\na pydub.AudioSegment object, and any method or property of the wrapped object is also available on the wrapper.\n\n[Docs](https://maxstrange.github.io/AudioSegment/) are hosted by GitHub Pages, but are currently hideous. I've got to do\nsomething about them as soon as I find some time. You can also try [Read The Docs](https://audiosegment.readthedocs.io/en/latest/),\nthough the docs there don't seem to be building for some reason; that is also something I need to look into. Up-to-date docs\nare also built and pushed to the docs folder of this repository.\n\n## Notes\n\nThere is a hidden dependency on the command-line program `sox`. 
Pip will not install it for you.\nYou will have to install sox yourself:\n\n- Debian/Ubuntu: `sudo apt-get install sox`\n- Mac OS X: `brew install sox`\n- Windows: `choco install sox`\n\nAlso, I use [librosa](https://github.com/librosa/librosa) and [scipy](https://www.scipy.org/) for some of the functionality.\nThese dependencies are hefty, so I have made them optional. If you do not install them, you may get warnings\nwhen using audiosegment.\n\nSo, a full installation on Debian/Ubuntu would look like this:\n\n```bash\nsudo apt-get install sox\npip3 install --user audiosegment\n\n# To get scipy, you will need some LAPACK/BLAS resources:\nsudo apt-get install libatlas-base-dev gfortran\npip3 install --user scipy\n\n# To get librosa, you will need numba, which requires llvmlite, which requires LLVM.\nsudo apt-get install llvm\npip3 install --user librosa\n```\n\nAdjust these commands to fit your own OS's package manager.\n\n## TODO\n\nHere are the items I plan on implementing:\n\n- Finish implementing auditory scene analysis (a.k.a. blind source separation)\n- Add voice-pass filtering and improve voice activity detection\n- Add language classification for English and Chinese (and show how to do it for other languages)\n- Add more examples to README (especially filterbank)\n- Finish removing the sox dependency\n\nI am open to other suggestions. 
Open an issue if you have requests. Better yet, if you can do it yourself, open\na pull request and I'll take a look and merge it in if I think it makes sense.\n\n## Example Usage\n\n### Basic information\n\n```python\nimport audiosegment\n\nprint(\"Reading in the wave file...\")\nseg = audiosegment.from_file(\"whatever.wav\")\n\nprint(\"Information:\")\nprint(\"Channels:\", seg.channels)\nprint(\"Bits per sample:\", seg.sample_width * 8)\nprint(\"Sampling frequency:\", seg.frame_rate)\nprint(\"Length:\", seg.duration_seconds, \"seconds\")\n```\n\n### Voice Detection\n\n```python\n# ...\nprint(\"Detecting voice...\")\n# Resample to 32 kHz, 16-bit (2-byte) samples, mono\nseg = seg.resample(sample_rate_Hz=32000, sample_width=2, channels=1)\n# Each result is a (label, segment) tuple: 'v' for voiced, 'u' for unvoiced\nresults = seg.detect_voice()\nvoiced = [tup[1] for tup in results if tup[0] == 'v']\nunvoiced = [tup[1] for tup in results if tup[0] == 'u']\n\nprint(\"Reducing voiced segments to a single wav file 'voiced.wav'\")\nvoiced_segment = voiced[0].reduce(voiced[1:])\nvoiced_segment.export(\"voiced.wav\", format=\"wav\")\n\nprint(\"Reducing unvoiced segments to a single wav file 'unvoiced.wav'\")\nunvoiced_segment = unvoiced[0].reduce(unvoiced[1:])\nunvoiced_segment.export(\"unvoiced.wav\", format=\"wav\")\n```\n\n### Silence Removal\n\n```python\nimport matplotlib.pyplot as plt\n\n# ...\nprint(\"Plotting before silence removal...\")\nplt.subplot(211)\nplt.title(\"Before Silence Removal\")\nplt.plot(seg.get_array_of_samples())\n\n# duration_s and threshold_percentage control what counts as silence\nseg = seg.filter_silence(duration_s=0.2, threshold_percentage=5.0)\noutname_silence = \"nosilence.wav\"\nseg.export(outname_silence, format=\"wav\")\n\nprint(\"Plotting after silence removal...\")\nplt.subplot(212)\nplt.title(\"After Silence Removal\")\nplt.plot(seg.get_array_of_samples())\n\n# Lay out the subplots once both have been drawn\nplt.tight_layout()\nplt.show()\n```\n\n![Before and after silence removal](docs/images/silencecompare.png \"Silence Removal\")\n\n### FFT\n\n```python\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n# ...\n# Do it just for the first 3 seconds of audio (pydub slice indices are in milliseconds)\nhist_bins, hist_vals = 
seg[:3000].fft()\n# Normalize the magnitude of each frequency bin by the number of bins\nhist_vals_real_normed = np.abs(hist_vals) / len(hist_vals)\nplt.plot(hist_bins / 1000, hist_vals_real_normed)\nplt.xlabel(\"kHz\")\nplt.ylabel(\"Normalized magnitude\")\nplt.show()\n```\n\n![FFT of Fur Elise](docs/images/fft.png \"FFT of Fur Elise\")\n\n### Spectrogram\n\n```python\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n# ...\nfreqs, times, amplitudes = seg.spectrogram(window_length_s=0.03, overlap=0.5)\n# Put the amplitudes on a log scale; the small offset avoids taking log10(0)\namplitudes = 10 * np.log10(amplitudes + 1e-9)\n\n# Plot\nplt.pcolormesh(times, freqs, amplitudes)\nplt.xlabel(\"Time in Seconds\")\nplt.ylabel(\"Frequency in Hz\")\nplt.show()\n```\n\n![Spectrogram of voice](docs/images/spectrogram.png \"Spectrogram of voice\")\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMaxStrange%2FAudioSegment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMaxStrange%2FAudioSegment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMaxStrange%2FAudioSegment/lists"}