{"id":20040788,"url":"https://github.com/vadimkantorov/readaudio","last_synced_at":"2025-05-05T08:32:05.222Z","repository":{"id":146466565,"uuid":"246450104","full_name":"vadimkantorov/readaudio","owner":"vadimkantorov","description":"Read audio with FFmpeg into NumPy/PyTorch via ctypes (standard library module)","archived":false,"fork":false,"pushed_at":"2020-08-12T21:09:44.000Z","size":67,"stargazers_count":11,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-08T19:47:06.245Z","etag":null,"topics":["audio","ctypes","dlpack","ffmpeg","numpy","python","pytorch"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vadimkantorov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-11T01:53:30.000Z","updated_at":"2024-07-19T08:26:00.000Z","dependencies_parsed_at":"2023-08-15T02:20:35.404Z","dependency_job_id":null,"html_url":"https://github.com/vadimkantorov/readaudio","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vadimkantorov%2Freadaudio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vadimkantorov%2Freadaudio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vadimkantorov%2Freadaudio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vadimkantorov%2Freadaudio/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/vadimkantorov","download_url":"https://codeload.github.com/vadimkantorov/readaudio/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252466727,"owners_count":21752423,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","ctypes","dlpack","ffmpeg","numpy","python","pytorch"],"created_at":"2024-11-13T10:43:54.346Z","updated_at":"2025-05-05T08:32:05.214Z","avatar_url":"https://github.com/vadimkantorov.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Work In Progress\n\nThis repo is a primer on reading audio (via FFmpeg) into NumPy/PyTorch arrays without copying data or launching a process. Interfacing with FFmpeg is done in pure C code in [decode_audio.c](./decode_audio.c). The Python wrapper is implemented in [decode_audio.py](./decode_audio.py) using the standard library module ctypes. The C code returns a plain C structure [Audio](./decode_audio.c#L12-L20). This structure is then interpreted and wrapped by NumPy or PyTorch without a copy.\n\nAt the bottom is an example of an alternative solution that launches a process. 
The first solution is preferable if you must load huge amounts of audio in various formats (for reading `*.wav` files, there exists a standard Python [`wave`](https://docs.python.org/3/library/wave.html) module and [`scipy.io.wavfile.read`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.read.html)).\n\nIt is also a simple primer on the FFmpeg audio decoding loop and basic ctypes usage for interfacing C code and NumPy/PyTorch (without creating a full-blown PyTorch C++ extension).\n\n### Usage\n```shell\n# install dependencies: ffmpeg executables and shared libraries on ubuntu\napt-get install -y ffmpeg libavcodec-dev libavformat-dev libavfilter-dev\n```\n\n```shell\n# create sample audio test.wav\nffmpeg -f lavfi -i \"sine=frequency=1000:duration=5\" -c:a pcm_s16le -ar 8000 test.wav\n\n# convert audio to raw format\nffmpeg -i test.wav -f s16le -acodec pcm_s16le golden.raw\n\n# play a raw file\nffplay -f s16le -ac 1 -ar 8000 golden.raw\n\n# compile executable for testing\nmake decode_audio_ffmpeg\n\n# convert audio to raw format and compare to golden\n./decode_audio_ffmpeg test.wav bin.raw\ndiff golden.raw bin.raw\n\n# compile a shared library for interfacing with NumPy and PyTorch\nmake decode_audio_ffmpeg.so\n\n# convert audio to raw format (NumPy) and compare to golden\npython3 decode_audio.py -i test.wav -o numpy.raw\ndiff golden.raw numpy.raw\n\n# convert audio to raw format (PyTorch) and compare to golden\npython3 decode_audio.py -i test.wav -o torch.raw\ndiff golden.raw torch.raw\n\n# convert audio to raw format (PyTorch / DLPack) and compare to golden\npython3 decode_audio.py -i test.wav -o dlpack.raw\ndiff golden.raw dlpack.raw\n```\n\n```python\n# read audio using subprocess\n# python3 decode_audio_subprocess.py test.wav\n\nimport sys\nimport subprocess\nimport struct\n\nformat_ffmpeg, format_struct = [('s16le', 'h'), ('f32le', 'f'), ('u8', 'B'), ('s8', 'b')][0]\nsample_rate = 8_000 # resample\nnum_channels = 1 # force mono\n\naudio = 
memoryview(subprocess.check_output(['ffmpeg', '-nostdin', '-hide_banner', '-nostats', '-loglevel', 'quiet', '-i', sys.argv[1], '-f', format_ffmpeg, '-ar', str(sample_rate), '-ac', str(num_channels), '-']))\naudio = audio.cast(format_struct, shape = [len(audio) // num_channels // struct.calcsize(format_struct), num_channels])\n\nprint('shape', audio.shape, 'itemsize', audio.itemsize, 'format', audio.format)\n# shape (40000, 1) itemsize 2 format h\n```\n\n### TODO\n- SOX backend ( https://github.com/pytorch/audio/blob/master/torchaudio/torch_sox.cpp)\n- ffmpeg audio filter graph\n- decode from a buffer\n- non-allocating version that keeps allocations in Python for simpler memory management\n- probe function\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvadimkantorov%2Freadaudio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvadimkantorov%2Freadaudio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvadimkantorov%2Freadaudio/lists"}