{"id":18752991,"url":"https://github.com/spotify/realbook","last_synced_at":"2025-04-04T12:06:29.628Z","repository":{"id":64843575,"uuid":"578213651","full_name":"spotify/realbook","owner":"spotify","description":"Easier audio-based machine learning with TensorFlow.","archived":false,"fork":false,"pushed_at":"2025-02-06T21:42:52.000Z","size":85,"stargazers_count":120,"open_issues_count":2,"forks_count":7,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-03-28T11:09:32.449Z","etag":null,"topics":["audio","cqt","librosa","machine-learning","mel-spectrogram","spectrograms","stft","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spotify.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-14T14:21:58.000Z","updated_at":"2025-03-23T10:07:48.000Z","dependencies_parsed_at":"2025-02-25T07:00:31.951Z","dependency_job_id":"4aba1b4a-f4bd-4a8b-93c8-a8b66d137274","html_url":"https://github.com/spotify/realbook","commit_stats":{"total_commits":21,"total_committers":3,"mean_commits":7.0,"dds":0.2857142857142857,"last_synced_commit":"9a1a54b7aba3b3af06bdb1fb02a4cd25ebacdb1e"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spotify%2Frealbook","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spotify%2Frealbook/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spotify%
2Frealbook/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spotify%2Frealbook/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spotify","download_url":"https://codeload.github.com/spotify/realbook/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247174415,"owners_count":20896078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","cqt","librosa","machine-learning","mel-spectrogram","spectrograms","stft","tensorflow"],"created_at":"2024-11-07T17:23:38.907Z","updated_at":"2025-04-04T12:06:29.609Z","avatar_url":"https://github.com/spotify.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/realbook)\n![Supported Platforms](https://img.shields.io/badge/platforms-macOS%20%7C%20Windows%20%7C%20Linux-green)\n![Lifecycle](https://img.shields.io/badge/lifecycle-production-1ed760.svg)\n\n\n# realbook 📒\n\nRealbook is a Python library for easier training of audio deep learning models with [TensorFlow](https://tensorflow.org), made by [Spotify's Audio Intelligence Lab](https://research.atspotify.com/audio-intelligence/). Realbook provides callbacks (e.g., spectrogram visualization) and well-tested [Keras layers](https://keras.io/api/layers/) (e.g., STFT, ISTFT, magnitude spectrogram) that we often use when training. 
These functions have helped us maintain consistency across all of our models, and we hope realbook will do the same for the open source community.\n\n# Notable Features\n\nBelow are a few highlights of what we have written so far.\n\n## Keras Layers\n\n- `FrozenGraphLayer` - Allows you to use a TF V1 graph as a Keras layer.\n- `CQT` - Constant-Q transform layers ported from [nnAudio](https://kinwaicheuk.github.io/nnAudio/index.html).\n- `Stft`, `Istft`, `MelSpectrogram`, `Spectrogram`, `Magnitude`, `Phase` and `MagnitudeToDecibel` - Layers that perform common audio feature preprocessing. All are checked for correctness against [librosa](https://librosa.org/).\n\n## Callbacks\n\n- `Spectrogram visualization` - Allows you to write spectrogram output layers to TensorBoard.\n- `Training Speed` - Allows you to visualize on TensorBoard how long each epoch of training takes.\n- `Utilization` - Allows you to plot CPU, CPU Memory, GPU and GPU Memory utilization on TensorBoard as you train.\n\n## Installation\n\n```shell\npip install realbook\n\n# Or, if using any TensorBoard-related callbacks, install additional dependencies:\npip install realbook[tensorboard]\n```\n\nThen, in your code:\n\n```python\nimport realbook.callbacks.spectrogram_visualization # a nifty TensorBoard callback\n```\n\n# Example\n\n## A Binary Classifier With Audio Input\n\nLet's use realbook to train a binary classifier that takes in audio, converts the audio to a spectrogram and then\nruns the spectrogram output through two trainable Dense layers.\n\n```python3\nimport tensorflow as tf\nfrom realbook.layers import signal\n\ntrain_ds = tf.data.TFRecordDataset(training_filenames)\nval_ds = tf.data.TFRecordDataset(validation_filenames)\n\n# Create a sequential model\nmodel = tf.keras.Sequential([\n    tf.keras.layers.InputLayer((22050,)),\n    signal.Stft(fft_length=1024, hop_length=512),\n    tf.keras.layers.Dense(1024, activation=\"relu\"),\n    tf.keras.layers.Dense(2),\n])\n\n# 
Compile the model\nmodel.compile(optimizer='adam',\n              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),\n              metrics=['accuracy'])\n\n# Now train!\nmodel.fit(\n  train_ds,\n  validation_data=val_ds,\n  epochs=epochs\n)\n```\n\n## A Binary Classifier With Audio Input and CPU Memory Utilization Measurement\n\nBelow is the previous binary classifier example, now with a realbook\ncallback added to the model's callback list.\n\n```python3\nimport tensorflow as tf\nfrom realbook.layers import signal\nfrom realbook.callbacks.utilization import MemoryUtilizationCallback\n\ntrain_ds = tf.data.TFRecordDataset(training_filenames)\nval_ds = tf.data.TFRecordDataset(validation_filenames)\n\n# Create a sequential model\nmodel = tf.keras.Sequential([\n    tf.keras.layers.InputLayer((22050,)),\n    signal.Stft(fft_length=1024, hop_length=512),\n    tf.keras.layers.Dense(1024, activation=\"relu\"),\n    tf.keras.layers.Dense(2),\n])\n\n# Compile the model\nmodel.compile(optimizer='adam',\n              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),\n              metrics=['accuracy'])\n\n# Summary writer that the callback logs memory utilization to\nwriter = tf.summary.create_file_writer(tensorboard_output_location)\n\n# Now train!\nmodel.fit(\n  train_ds,\n  validation_data=val_ds,\n  epochs=epochs,\n  callbacks=[MemoryUtilizationCallback(writer)]\n)\n```\n\n## Metrics\n\nRealbook contains a number of layers that convert audio data (i.e., waveforms)\ninto various spectral representations (i.e., spectrograms). 
For convenience, the amount of memory\nrequired for the most commonly used layers is provided below.\n\nUsing an FFT length of 1024 and a hop length of 512, processing one second of audio at 22050 Hz requires:\n\n| Layer                                                   | Memory High Watermark |\n| ------------------------------------------------------- | --------------------- |\n| `realbook.layers.signal.STFT`                           | 1,266,384 bytes       |\n| `realbook.layers.signal.Spectrogram`                    | 1,264,324 bytes       |\n| `realbook.layers.signal.MelSpectrogram`                 | 1,262,784 bytes       |\n| `realbook.layers.nnaudio.CQT`                           | 1,047,216 bytes       |\n\n### GPU Utilization Callbacks\n\nGPU resource utilization callbacks are included as part of the `tensorboard` installable extra.\nThese callbacks expect the program `nvidia-smi` to be installed, a program which is only\navailable on Linux. For example, on Ubuntu, you can install this program with:\n\n```shell\napt-get update\napt-get install -y nvidia-utils-\u003cCUDA version number\u003e\n```\n\nHere, CUDA version number is the version of CUDA installed on your machine, e.g. 450.\n\n## Setup Development (of `realbook`)\n\nCreate a new virtual environment for your supported Python version and clone this repo. Within that virtualenv:\n\n```shell\n$ pip install -e .[dev]\n```\n\nThis will install development dependencies, followed by installing this package itself as [\"editable\"](https://pip.pypa.io/en/stable/reference/pip_install/#editable-installs).\n\n## Run Tests\n\nTests can be invoked in two ways: `pytest` and `tox`.\n\n### Run tests via `pytest`\n\nThis must be done within the virtualenv. Note that `pytest` will automatically pick up the config set in `tox.ini`. 
Comment it out if you want to skip coverage and/or ignore verbosity while iterating.\n\n```sh\n# for all tests\n(env) $ pytest tests/\n\n# for one module of tests\n(env) $ pytest tests/layers/signal.py\n\n# for one specific test\n(env) $ pytest tests/layers/signal.py::test_stft\n```\n\nMore info about pytest can be found [here](https://docs.pytest.org/en/latest/).\n\n### Run tests via `tox`\n\n`tox` should be run **outside** of a virtualenv. This is because `tox` will create separate virtual environments for each test environment. A test environment could be based on Python versions, or could be specific to documentation, or whatever else. See `tox.ini` for an example of multiple test environments, including running tests for Python, linting, and checking `MANIFEST.in` to assert a proper setup.\n\n```sh\n# run all environments\n$ tox\n\n# run a specific environment\n$ tox -e check-formatting\n$ tox -e py38\n```\n\n### Formatting files\n\nBefore committing PRs, please format your files using tox, as some of the formatting options realbook uses are different from the defaults of the [Black](https://black.readthedocs.io/en/stable/) formatter:\n\n```sh\ntox -e format\n```\n\nSee [tox's documentation](https://tox.readthedocs.io/en/latest/) for more information.\n\n## Copyright and License\nrealbook is Copyright 2022 Spotify AB.\n\nThis software is licensed under the Apache License, Version 2.0 (the \"Apache License\"). You may choose either license to govern your use of this software only upon the condition that you accept all of the terms of either the Apache License.\n\nYou may obtain a copy of the Apache License at:\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software distributed under the Apache License or the GPL License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the Apache License for the specific language governing permissions and limitations under the Apache License\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspotify%2Frealbook","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspotify%2Frealbook","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspotify%2Frealbook/lists"}