https://github.com/transcriptaze/wav2png
Renders an audio WAV file as a PNG image.
https://github.com/transcriptaze/wav2png
waveform webgpu
Last synced: 5 months ago
JSON representation
Renders an audio WAV file as a PNG image.
- Host: GitHub
- URL: https://github.com/transcriptaze/wav2png
- Owner: transcriptaze
- License: mit
- Created: 2020-01-19T23:37:51.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2024-06-26T19:56:19.000Z (almost 2 years ago)
- Last Synced: 2024-06-27T00:17:48.971Z (almost 2 years ago)
- Topics: waveform, webgpu
- Language: Go
- Homepage:
- Size: 78.7 MB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README

# wav2png
Renders a WAV file as a PNG image, with options to draw a grid, custom colouring and anti-aliasing.
There are three implementations:
- A command line version (in the _go_ subdirectory), mostly intended for scripting and batch processing
- An online version implemented by compiling this library to WASM, which can be found
[here](https://transcriptaze.github.io/W2P.html).
- A WebGPU version in the _webgpu_ directory) for a faster interactive experience,
hosted on [CloudFlare Pages](https://wav2png.pages.dev).
## Raison d'être
_wav2png_ was originally created as a Go utility library to render an audio file as an anti-aliased waveform for
a WASM project - it just sseemed like a good idea to add a standalone command line version, and now a WebGPU
implementation for better interactivity.
[
](gallery/cli/acoustic.png)
[
](gallery/webgpu/line-bw.png)
[
](gallery/webgpu/gradient-yellow.png)
[
](gallery/webgpu/gradient3E.png)
## Releases
| *Version* | *Description* |
| --------- | -------------------------------------------------------------------------------------------------------- |
| v1.1.0 | Added `wav2mp4` command line utility |
| v1.0.0 | Initial release |
## Installation
Platform specific executables can be downloaded from the [releases](https://github.com/transcriptaze/wav2png/releases)
page. Installation is straightforward - download and extract the archive for your platform and place the executables in
a directory in your PATH.
### Building from source
#### Command line
Assuming you have `Go` and `make` installed:
```
git clone https://github.com/transcriptaze/wav2png.git
cd wav2png/go
make build
```
If you prefer not to use `make`:
```
git clone https://github.com/transcriptaze/wav2png.git
cd wav2png/go
mkdir bin
go build -o bin ./...
```
#### WebGPU
## Usage
### wav2png
Command line:
```
wav2png [--debug] [options] [--out ]
WAV file to render.
--out File path for PNG file - if is a directory, the WAV file name is
used. Defaults to the WAV file base path.
--debug Displays occasionally useful diagnostic information.
Options:
--settings JSON file with the default settings for the height, width, etc. Defaults to
.settings.json if not specified, falling back to internal default values if
.settings.json does not exist.
--width Width (in pixels) of the PNG image. Valid values are in the range 32 to 8192,
defaults to 645px.
--height Height (in pixels) of the PNG image. Valid values are in the range 32 to 8192,
defaults to 395px.
--padding Padding (in pixels) between the border of the PNG and the extent of the rendered
waveform. Valid values are -16 to +32, defaults to 2px.
--palette Palette used to colour the waveform. May be the name of one of the internal colour
palettes or a user provided PNG file. Defaults to 'ice'
--fill Fill specification for the background colour, in the form type:colour
e.g. solid:#0000ffff. Currently the only fill type supported is 'solid', defaults
to solid:#000000ff.
--grid Grid specification for an optional rectilinear grid, in the form
type:colour:size:overlay, e.g.
- none
- square:#008000ff:~64
- rectangle:#008000ff:~64x48:overlay
The size may preceded by a 'fit':
- ~ approximate
- = exact
- ≥ at least
- ≤ at most
- > greater than
- < less than
If gridspec includes :overlay, the grid is rendered 'in front' of the waveform.
The default gridspec is 'square:#008000ff:~64'
--antialias The antialising kernel applied to soften the rendered PNG. Valid values are:
- none no antialiasing
- horizontal blurs horizontal edges
- vertical blurs vertical edges
- soft blurs both horizontal and vertical edges
The default kernel is 'vertical'.
--scale A vertical scaling factor to size the height of the rendered waveform. The valid
range is 0.2 to 5.0, defaults to 1.0.
--mix Specifies how to combine channels from a stereo WAV file. Valid values are:
- 'L' Renders the left channel only
- 'R' Renders the right channel only
- 'L+R' Combines the left and right channels
Defaults to 'L+R'.
--start
--end
Example:
wav2png --debug \
--settings 'settings.json' \
--height 390 \
--width 641 \
--padding 0 \
--palette 'amber.png' \
--fill 'solid:#0000ffff' \
--grid 'rectangular:#800000ff:~32x128:overlay' \
--antialias 'soft' \
--scale 0.5 \
--start 0.5s \
--end 1.5s \
--mix 'L+R' \
--out example.png \
example.wav
```
## wav2mp4
Command line:
```
wav2mp4 [--debug] [options] [--out ] --window --fps --cursor
WAV file to render.
--out File path for MP4 file - if is a directory, the WAV file name is
used and defaults to the WAV file base path. wav2mp4 generates a set of ffmpeg
frames files in the 'frames' subdirectory of the out file directory.
--window The interval allotted to a single frame, in Go time format e.g. --window 1s.
The window interval must be less than the duration of the MP4.
--fps Frame rate for the MP4 in frames per second e.g. --fps 30
--cursor Cursor to indicate for the current play position. A cursor is specified by the
image source and dynamic:
--cursor :
where image may be:
- none
- red (internal 'red' cursor)
- blue (internal 'blue' cursor)
- a PNG file with a custom cursor image
The cursor 'dynamic' defaults to 'sweep' if not specified, but may be one of
the following:
- sweep Moves linearly from left to right over the duration of the MP4
- left Fixed on left side
- right Fixed on right side
- center Fixed in center of frame
- ease Migrates from the left to center of the frame, before moving to the
right side to finish
- erf Moves 'sigmoidally' from left to right over the duration of the MP4,
with the sigmoid defined by the inverse error function
--debug Displays occasionally useful diagnostic information.
Options:
--settings JSON file with the default settings for the height, width, etc. Defaults to
.settings.json if not specified, falling back to internal default values if
.settings.json does not exist.
--width Width (in pixels) of the PNG image. Valid values are in the range 32 to 8192,
defaults to 645px.
--height Height (in pixels) of the PNG image. Valid values are in the range 32 to 8192,
defaults to 395px.
--padding Padding (in pixels) between the border of the PNG and the extent of the
rendered waveform. Valid values are -16 to +32, defaults to 2px.
--palette Palette used to colour the waveform. May be the name of one of the internal
colour palettes or a user provided PNG file. Defaults to 'ice'
--fill Fill specification for the background colour, in the form type:colour
e.g. solid:#0000ffff. Currently the only fill type supported is 'solid',
defaults to solid:#000000ff.
--grid Grid specification for an optional rectilinear grid, in the form
type:colour:size:overlay, e.g.
- none
- square:#008000ff:~64
- rectangle:#008000ff:~64x48:overlay
The size may preceded by a 'fit':
- ~ approximate
- = exact
- ≥ at least
- ≤ at most
- > greater than
- < less than
If gridspec includes :overlay, the grid is rendered 'in front' of the
waveform.
The default gridspec is 'square:#008000ff:~64'
--antialias The antialising kernel applied to soften the rendered PNG. Valid values are:
- none no antialiasing
- horizontal blurs horizontal edges
- vertical blurs vertical edges
- soft blurs both horizontal and vertical edges
The default kernel is 'vertical'.
--scale A vertical scaling factor to size the height of the rendered waveform. The
valid range is 0.2 to 5.0, defaults to 1.0.
--mix Specifies how to combine channels from a stereo WAV file. Valid values are:
- 'L' Renders the left channel only
- 'R' Renders the right channel only
- 'L+R' Combines the left and right channels
Defaults to 'L+R'.
--start
--end
Example:
wav2mp4 --debug --window 1s --fps 30 --cursor red:sweep --out ./example/chirp.mp4 ./samples/chirp.wav
cd ./example/frames
ffmpeg -framerate 30 -i frame-%05d.png -c:v libx264 -pix_fmt yuv420p -crf 23 -y out.mp4
ffmpeg -i out.mp4 -i ./samples/chirp.wav -c:v copy -c:a aac -y ../chirp.mp4
```
## References & Related Projects
1. [Audio File Format Specifications](http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html)
2. [SoX](http://sox.sourceforge.net)
3. [WaveFile Gem](https://wavefilegem.com/how_wave_files_work.html)
4. [DirectMusic: wav2png](https://directmusic.me/wav2png)
5. [Shulz Audio: wav2png](https://schulz.audio/products/wav2png)
6. [iconmonstr](https://iconmonstr.com/sound-wave-2-png)
7. [BBC](https://github.com/bbc/audiowaveform)
8. [BBC CLI](https://github.com/marc7806/bbc-audiowaveform-cli-wrapper)
9. [stackoverflow](https://stackoverflow.com/questions/4468157/how-can-i-create-a-waveform-image-of-an-mp3-in-linux)
10. [github:waveform](https://github.com/andrewrk/waveform)