{"id":13861476,"url":"https://github.com/drscotthawley/panotti","last_synced_at":"2025-04-24T17:31:06.382Z","repository":{"id":55308100,"uuid":"85496265","full_name":"drscotthawley/panotti","owner":"drscotthawley","description":"A multi-channel neural network audio classifier using Keras","archived":false,"fork":false,"pushed_at":"2021-07-29T04:58:20.000Z","size":1457,"stargazers_count":269,"open_issues_count":17,"forks_count":69,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-04-13T20:44:09.323Z","etag":null,"topics":["audio-classification","convolutional-neural-networks","keras","music-tagging","neural-network","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/drscotthawley.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-03-19T17:43:58.000Z","updated_at":"2025-04-04T11:25:42.000Z","dependencies_parsed_at":"2022-08-14T20:22:20.090Z","dependency_job_id":null,"html_url":"https://github.com/drscotthawley/panotti","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drscotthawley%2Fpanotti","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drscotthawley%2Fpanotti/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drscotthawley%2Fpanotti/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drscotthawley%2Fpanott
i/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/drscotthawley","download_url":"https://codeload.github.com/drscotthawley/panotti/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250674268,"owners_count":21469188,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-classification","convolutional-neural-networks","keras","music-tagging","neural-network","tensorflow"],"created_at":"2024-08-05T06:01:23.260Z","updated_at":"2025-04-24T17:31:05.833Z","avatar_url":"https://github.com/drscotthawley.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1275605.svg)](https://doi.org/10.5281/zenodo.1275605)\n\n# Panotti: A Convolutional Neural Network classifier for multichannel audio waveforms\n\n\u003cimg src=\"https://upload.wikimedia.org/wikipedia/commons/a/af/Panoteanen.jpg\" alt=\"Panotti image\" height=\"200\"\u003e\n\u003ci\u003e(Image of large-eared Panotti people, Wikipedia)\u003c/i\u003e\u003cbr\u003e\n\n\nThis is a version of the [audio-classifier-keras-cnn](https://github.com/drscotthawley/audio-classifier-keras-cnn) repo (which is a hack of **@keunwoochoi**'s compact_cnn code).  Difference with Panotti is, it has been generalized beyond mono audio, to include stereo or even more \"channels.\"  And it's undergone many refinements.\n\n*NOTE: The  majority of issues people seem to have in using this utility, stem from inconsistencies in their audio datasets. 
This is to the point where I hesitate to delve into such reports. I suggest trying the binaural audio example and seeing whether the same problems arise.* -SH\n\n\n## Installation \n\n*UPDATE June 9, 2020: There is an updated version of Panotti that works with TensorFlow 2, currently in the panotti branch called 'tf2'.  I'm not ready to merge that branch with master until [Vibrary](https://github.com/artandlogic/vibrary) is also updated for TF2.*\n\n### Preface: Requirements\nProbably Mac OS X or Linux. (Windows users: I have no experience to offer you.)\nNot everything is required; here's an overview:\n\n* Required: \n\t* Python 3.5\n\t* numpy\n\t* keras\n\t* tensorflow \n\t* librosa\n\t* matplotlib\n\t* h5py\n* Optional: \n\t* sox (\"Sound eXchange\": command-line utility for examples/binaural. Install via \"apt-get install sox\")\n\t* pygame (for examples/headgames.py)\n\t* For sorting-hat: flask, kivy, kivy-garden\n\t\n...the `requirements.txt` file method is going to try to install both required and optional packages.\n\n### Installation:\n`git clone https://github.com/drscotthawley/panotti.git`\n\n`cd panotti`\n\n`pip install -r requirements.txt`\n\n\n## Demo\nI'm not shipping this with any audio but you can generate some for the 'fake binaural' example (requires sox):\n\n    cd examples\n    ./binaural_setup.sh\n    cd binaural\n    ../../preprocess_data.py --dur=2 --clean\n    ../../train_network.py\n\n\n## Quick Start\n* Make a folder called `Samples/` and inside it create sub-folders with the names of each category you want to train on. Place your audio files in these sub-folders accordingly. \n* run `python preprocess_data.py`\n* run `python train_network.py`\n* run `python eval_network.py`  - This applies the trained network to the testing dataset and gives you accuracy reports.\n\n\n\n## Data Preparation\n### Data organization:\nSound files should go into a directory called `Samples/`, relative to wherever the scripts are being run.  
Within `Samples`, you should have subdirectories which divide up the various classes.\n\nExample: for the [IDMT-SMT-Audio-Effects database](https://www.idmt.fraunhofer.de/en/business_units/m2d/smt/audio_effects.html), using their monophonic guitar audio clips...\n\n    $ ls -F Samples/\n    Chorus/  Distortion/  EQ/  FeedbackDelay/  Flanger/   NoFX/  Overdrive/  Phaser/  Reverb/  SlapbackDelay/\n    Tremolo/  Vibrato/\n    $\n(Within each subdirectory of `Samples`, there are loads of .wav or .mp3 files that correspond to each of those classes.)\n\n*\"Is there any sample data that comes with this repo?\"*  Not the data itself, but check out the `examples/` directory. ;-)\n\n\n### Data augmentation \u0026 preprocessing:\n\n#### (Optional) Augmentation:\n\nThe \"augmentation\" will [vary the speed, pitch, dynamics, etc.](https://bmcfee.github.io/papers/ismir2015_augmentation.pdf) of the sound files (\"data\") to try to \"bootstrap\" some extra data with which to train.  If you want to augment, then you'll run it as\n\n`$ python augment_data.py \u003cN\u003e  Samples/*/*`\n\nwhere *N* is how many augmented copies of each file you want it to create.  It will place all of these in the Samples/ directory with some kind of \"_augX\" appended to the filename (where X just counts the number of the augmented data files).\nFor augmentation it's assumed that all data files have the same length \u0026 sample rate.\n\n#### (Required) Preprocessing:\nWhen you preprocess, the data-loading will go *much* faster (e.g., 100 times faster) the next time you try to train the network. So, preprocess.\n\nPreprocessing will pad the files with silence to fit the length to the length of the longest file and the number of channels to the file with the most channels. 
It will then generate mel-spectrograms of all data files, and create a \"new version\" of `Samples/` called `Preproc/`.\n\nIt will do an 80-20 split of the dataset, so within `Preproc/` will be the subdirectories `Train/` and `Test/`. These will have the same subdirectory names as `Samples/`, but all the .wav and .mp3 files will have \".npy\" on the end now.  Datafiles will be randomly assigned to `Train/` or `Test/`, and there they shall remain.\n\nTo do the preprocessing you just run\n\n`$ python preprocess_data.py`\n\n\n## Training \u0026 Evaluating the Network\n`$ python train_network.py`\nThat's all you need.  (I should add command-line arguments to adjust the layer size and number of layers...later.)\n\nIt will perform an 80-20 split of training vs. testing data, and give you some validation scores along the way.  \n\nIt's set to run for 2000 epochs; feel free to shorten that or just ^C out at some point.  It automatically does checkpointing by saving(/loading) the network weights via a new file `weights.hdf5`, so you can interrupt \u0026 resume the training if you need to.\n\nAfter training, more diagnostics -- ROC curves, AUC -- can be obtained by running\n\n`$ python eval_network.py`\n\n*(Changing the `batch_size` variable between training and evaluation may not be a good idea.  It will probably screw up the Batch Normalization...but maybe you'll get lucky.)*\n\n\n\n## Results\nOn the [IDMT Audio Effects Database](https://www.idmt.fraunhofer.de/en/business_units/m2d/smt/audio_effects.html) using the 20,000 monophonic guitar samples across 12 effects classes, this code achieved 99.7% accuracy and an AUC of 0.9999. Specifically, 11 mistakes were made out of about 4000 testing examples; 6 of those were for the 'Phaser' effect, 3 were for EQ, a couple elsewhere, and most of the classes had zero mistakes. 
(No augmentation was used.)\n\n\u003ca href=\"url\"\u003e\u003cimg src=\"http://i.imgur.com/nWHqAWy.png\" width=\"400\"\u003e\u003c/a\u003e\n\nThis accuracy is comparable to the [original 2010 study by Stein et al.](http://www.ece.rochester.edu/courses/ECE472/resources/Papers/Stein_2010.pdf), who used a Support Vector Machine.\n\nThis was achieved by running for 10 hours on [our workstation with an NVIDIA GTX1080 GPU](https://pcpartpicker.com/b/4xLD4D). \n\n## Extra Tricks\n- We have multi-GPU training.  The saving \u0026 loading means we get warning messages from Keras. Ignore those. It's because if we compile both the parallel model and its\nserial counterpart, it breaks things. So we leave the serial one uncompiled and that's the one we have to save. I regard this problem as a 'bug' in the Keras multi-gpu protocols.\n- Speaking of saving \u0026 loading, we encode the names of the output classes in the weights.hdf5 file using an HDF5 attribute 'class_names'.\n\n\n\n\u003chr\u003e\n-- [@drscotthawley](https://drscotthawley.github.io)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrscotthawley%2Fpanotti","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdrscotthawley%2Fpanotti","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrscotthawley%2Fpanotti/lists"}