{"id":19426416,"url":"https://github.com/drscotthawley/ml-audio-start","last_synced_at":"2025-10-13T21:02:37.179Z","repository":{"id":50710756,"uuid":"225984712","full_name":"drscotthawley/ml-audio-start","owner":"drscotthawley","description":"Suggestions for those interested in developing audio applications of machine learning ","archived":false,"fork":false,"pushed_at":"2023-05-28T03:00:40.000Z","size":198,"stargazers_count":193,"open_issues_count":1,"forks_count":15,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-09-21T23:37:30.934Z","etag":null,"topics":["audio","career-development","machine-learning","students","tutorials"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/drscotthawley.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-05T00:47:13.000Z","updated_at":"2025-09-02T12:09:36.000Z","dependencies_parsed_at":"2025-01-07T19:35:57.598Z","dependency_job_id":"29e19ff7-65e9-4c32-a5df-204ee2ab0063","html_url":"https://github.com/drscotthawley/ml-audio-start","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/drscotthawley/ml-audio-start","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drscotthawley%2Fml-audio-start","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drscotthawley%2Fml-audio-start/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drscotthawley%2Fml-audio-start/releases","manifests_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drscotthawley%2Fml-audio-start/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/drscotthawley","download_url":"https://codeload.github.com/drscotthawley/ml-audio-start/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drscotthawley%2Fml-audio-start/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279016885,"owners_count":26085912,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","career-development","machine-learning","students","tutorials"],"created_at":"2024-11-10T14:07:34.582Z","updated_at":"2025-10-13T21:02:37.158Z","avatar_url":"https://github.com/drscotthawley.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Getting Started in 'ML-Audio'\nSuggestions for students.\n\n## About\nAudio and acoustics students sometimes ask \"How do I get started learning  machine learning?\"  Not everyone gets their start in a major research environment, so this page is intended to serve as a series of suggestions for those who may find themselves \"on their own\" in their interest in this area.  
It was started by @drscotthawley and [Ryan Miller](https://www.linkedin.com/in/ryan-miller-1aa8355a/), but is intended to serve and evolve with the community. \n\n- ***This is a collaborative page. Please suggest additions, re-organizations, edits, updates, etc., either via Issues or Pull Requests.***  *(In addition, @drscotthawley may gladly cede control of this content to whichever student or group wants to Wiki-fy it!)*\n\n## Active Practitioners to Follow\nMany of us learn about and contribute to news of new developments, papers, conferences, grants, and networking opportunities via [Twitter](http://www.twitter.com).\n\n* [Audio ML Twitter list](https://twitter.com/i/lists/1260142677884121088) by Fabian-Robert Stöter (@faroit).  **\u003c-- Follow these people!**\n\n## Quick Quotes\n* [Justin Salamon](https://twitter.com/justin_salamon/status/1202016519720300545): \"Anyone working in ML, *anyone*, should be *obliged* to curate a dataset before they're allowed to train a single model. 
The lessons learnt in the process are invaluable, and the dangers of skipping said lessons are manifold (see what I did there?)\"\n\n## Best Practices\n[\"Tips for Publishing Research Code\"](https://github.com/paperswithcode/releasing-research-code) courtesy of [Papers with Code](https://paperswithcode.com/)\n\n## General Reference Information\n* [Machine Learning Glossary](https://ml-cheatsheet.readthedocs.io/en/latest/index.html) - A reference resource for common ML math topics, definitions, concepts, etc.\n* [Notes on Music Information Retrieval](https://musicinformationretrieval.com/)\n\n## Online Training (ML+audio Specific)\n* [Valerio Velardo's \"Deep Learning for Audio\"](https://www.youtube.com/watch?v=fMqL5vckiU0\u0026list=PL-wATfeyAMNrtbkCNsLcpoAyBBRJZVlnf)  \n* [Jordi Pons' \"Deep neural networks for music\"](http://www.jordipons.me/apps/teaching-materials/) teaching materials\n\n## Online Training (More General, Courses)\n* Rebecca Fiebrink's [Machine Learning for Musicians and Artists](https://www.kadenze.com/courses/machine-learning-for-musicians-and-artists/info) on Kadenze -- No actual audio DSP, but great for concepts, interactive and fun (no math!)\n* [Advanced Digital Signal Processing](https://github.com/GuitarsAI/ADSP_Tutorials) series taught by Dr.-Ing Gerald Schuller of Fraunhofer IDMT, with [videos](https://www.youtube.com/playlist?list=PL6QnpHKwdPYjbDezYkAE-sAQ5MOpYeqM6) and accompanying Jupyter notebooks by [Renato Profeta](https://twitter.com/guitars_ai) \n* [Andrew Ng's ML Course](https://www.coursera.org/learn/machine-learning) on Coursera (Good all-around ML course)\n* [Fast.ai](https://www.fast.ai) (Can get you up and running fast)\n* [Neural Network Programming - Deep Learning with PyTorch](https://deeplizard.com/learn/video/v5cngxo4mIg). Learn how to code an image predictor neural network in PyTorch. 
Provides practical NN fundamentals\n* [Foundations of Machine Learning](https://bloomberg.github.io/foml/#home) taught by [David Rosenberg](https://bloomberg.github.io/foml/#people)\n\n## Tutorials\n* Andrew Trask's [\"Anyone Can Learn To Code an LSTM-RNN in Python\"](https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/) \n* [Machine Learning \u0026 Deep Learning Fundamentals](https://deeplizard.com/learn/video/gZmobeGL0Yg) (Good high level intro to ML concepts and how neural networks operate)\n\n## Talks (at conferences)\nTalks we found helpful/inspiring (and are hopefully still relevant). TODO: add more recent talks! \n* Paris Smaragdis at SANE 2015: [\"NMF? Neural Nets? It’s all the same...\"](https://www.youtube.com/watch?v=wfmpViJIjWw)\n* Ron Weiss at SANE 2015: [\"Training neural network acoustic models on waveforms\"](https://www.youtube.com/watch?v=sI_8EA0_ha8)\n* Jordi Pons at DLBCN 2018: [\"Training neural audio classifiers with few data\"](https://www.youtube.com/watch?v=AJ-XM07wSjg)\n* Sander Dieleman at ISMIR 2019: [\"Generating Music in the Waveform Domain\"](https://benanne.github.io/2020/03/24/audio-generation.html?fbclid=IwAR3nw8WLrT3ZVFrz4AsxNeoufUllqHQO4wFvH3bafqfjJDJArnm5s2VOxLM)\n\n## Key Papers / Codes\n(Let's try to list \"representative\" or \"landmark\" papers, not just our latest tweak, unless it includes a really good intro/review section. ;-) )\n* Keunwoo Choi et al, [\"Automatic tagging using deep convolutional neural networks\"](https://arxiv.org/abs/1606.00298) (ISMIR 2016 Best Paper)\n* [SampleRNN](https://arxiv.org/abs/1612.07837)\n* [WaveNet](https://arxiv.org/pdf/1609.03499.pdf)\n* [WaveRNN, i.e. 
\"Efficient Neural Audio Synthesis\"](https://arxiv.org/abs/1802.08435)\n* [GANSynth](https://magenta.tensorflow.org/gansynth)\n* [Wave-U-Net](https://arxiv.org/pdf/1806.03185.pdf)\n\n## Demos\n(Not sure if this only means \"deployed models you can play with in your browser,\" or if other things should count as demos)\n* Chris Donahue's [WaveGAN Demo](https://chrisdonahue.com/wavegan/)\n* Scott Hawley's [SignalTrain Demo](http://signaltrain.herokuapp.com/)\n* Neil Zeghidour and David Grangier's [Wavesplit](https://soundcloud.com/wavesplitdemo)\n* David Samuel, Aditya Ganeshan, and Jason Naradowsky's [Meta-TasNet](https://paperswithcode.com/paper/meta-learning-extractors-for-music-source)\n\n\n## Packages \u0026 Libraries\n* [awesome-python-scientific-audio](https://github.com/faroit/awesome-python-scientific-audio) Curated list of python software and packages related to scientific research in audio\n* [Librosa](https://librosa.github.io/librosa/) Great package for various kinds of audio analysis and manipulation\n* [Audiomentations, data augmentation for audio](https://github.com/iver56/audiomentations)\n* [tf.signal: signal processing for TensorFlow](https://www.tensorflow.org/api_docs/python/tf/signal)\n* [fastai_audio](https://github.com/mogwai/fastai_audio) (and [fastai2_audio](https://github.com/rbracco/fastai2_audio)), audio libraries for [Fast.ai](https://www.fast.ai) library/[MOOC](https://course.fast.ai). Primarily for image, text \u0026 tabular data processing, there are efforts to add audio. (Work in progress.)\n\n## Tools / GUIs / Gists\n* Jesse Engel's [gist to plot \"rainbowgrams\"](https://gist.github.com/jesseengel/e223622e255bd5b8c9130407397a0494)\n\n## Books\n\n* [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/) online book.  
How drscotthawley first started reading.\n* [Open-Source Tools \u0026 Data for Music Source Separation](https://source-separation.github.io/tutorial/landing.html) by Ethan Manilow, Prem Seetharaman, and Justin Salamon (2020). An online, interactive book with Python examples! \n* [List of Books Recommended by ML expert Juergen Schmidhuber](https://www.reddit.com/r/MachineLearning/comments/2xcyrl/i_am_jürgen_schmidhuber_ama/cp5c0py/) for students entering his lab. (Probably pretty demanding material.)\n\n\n## Computer-Related Topics\nPython: \n* [learnpython.org](https://www.learnpython.org/)\n* Python notebooks for [fundamentals of music processing](https://www.audiolabs-erlangen.de/resources/MIR/FMP/C0/C0.html)\n\n\n## Signal Processing Topics\n* [Advanced Digital Signal Processing](https://github.com/GuitarsAI/ADSP_Tutorials) series taught by Dr.-Ing Gerald Schuller of Fraunhofer IDMT, with [videos](https://www.youtube.com/playlist?list=PL6QnpHKwdPYjbDezYkAE-sAQ5MOpYeqM6) and accompanying Jupyter notebooks by [Renato Profeta](https://twitter.com/guitars_ai) \n* [An Interactive Introduction to Fourier Transforms](http://www.jezzamon.com/fourier/) by Jez Swanson. 
(so good!)\n* Yuge Shi's [\"Gaussian Processes, Not Quite for Dummies\"](https://thegradient.pub/gaussian-process-not-quite-for-dummies/) (GPs get used for *much* more than signal processing, but are also promising there; feel free to suggest a different category for this content)\n\n\n## Statistics / Math Topics\n* Gradient Descent \n  * https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html ,\n  * https://en.wikipedia.org/wiki/Gradient_descent ,\n  * https://www.kdnuggets.com/2017/04/simple-understand-gradient-descent-algorithm.html,\n  * [\"Following Gravity\"](https://drscotthawley.github.io/Following-Gravity/) by @drscotthawley\n* Principal Component Analysis:  [\"PCA From Scratch\"](https://drscotthawley.github.io/PCA-From-Scratch/) by @drscotthawley\n\n\n## Datasets (raw audio)\nOne finds that many supposed \"audio datasets\" are really only features or even just metadata!  Here are some \"raw audio\" datasets:\n* [NSynth](https://magenta.tensorflow.org/datasets/nsynth) Musical Instruments\n* [GTZAN Genre Collection](http://marsyas.info/downloads/datasets.html)  (Note [critique by Bob Sturm](https://arxiv.org/abs/1306.1461))\n* [Fraunhofer IDMT Guitar/Bass Effects](https://www.idmt.fraunhofer.de/en/business_units/m2d/smt/audio_effects.html)\n* [Urban Sound Dataset](serv.cusp.nyu.edu/projects/urbansounddataset)\n* [FreeSound Annotator](https://annotator.freesound.org/) (formerly FreeSound Datasets)\n* [FSD50K dataset (from FreeSound)](https://annotator.freesound.org/fsd/release/FSD50K/)\n* [AudioSet](https://research.google.com/audioset)\n* [Birdvox-Full-Night](https://wp.nyu.edu/birdvox/birdvox-full-night/)\n* [SignalTrain LA2A](https://zenodo.org/record/3348083)\n* [Kaggle Heartbeat Sounds](https://www.kaggle.com/kinguistics/heartbeat-sounds)\n* [Electric Guitars](http://www.rptecnologias.com/research/guitar-dataset) by Renato Profeta (\"Guitars AI\") of Fraunhofer IDMT\n* Search for other [audio datasets at 
Kaggle](https://www.kaggle.com/datasets?tags=16072-audio+data) (list)\n* [A collated list of MIR datasets can be found here](https://github.com/ismir/mir-datasets), which is the source for [audiocontentanalysis.org](https://www.audiocontentanalysis.org/data-sets/), but only some are raw audio\n* Another [list of \"audio datasets\" by Christopher Dossman](https://towardsdatascience.com/a-data-lakes-worth-of-audio-datasets-b45b88cd4ad)\n* ...your dataset here...\n\n## DIY Audio Dataset-Making:\n\n(Inspired by [Nathan Sepulveda](https://twitter.com/nateysepy))\n\n### Searchable resources:\n* FreeSound: https://freesound.org/\n\n* Internet Archive audio: https://archive.org/details/audio\n\n* https://search.audioburst.com/ - speech only; you're searching transcripts.\n\n* https://www.mp3juices.cc/ - searches YouTube, lets you download MP3 by pressing a button for each one.\n\n* https://sounds.com/ from Native Instruments, but it won't be free!\n\n* https://www.findsounds.com/ meh.\n\n### Scrapers\n\n* https://github.com/carlthome/audio-scraper: \"Scrape audio from YouTube and SoundCloud with a simple command-line interface\", e.g. `audio-scraper \"acoustic guitar\"`. It's 5 years old, but it still works in 2021!\n\n### Other DIY Audio Dataset Tricks\n\n* Depending on your application, you might be able to get away with using *samples* produced by virtual instruments (i.e. MIDI). \n* If you don't have a lot of labels or targets, you can still pretrain your representations \u0026 weights using autoregressive predictions (even for different audio domains) -- this amounts to doing your own Transfer Learning even without a pretrained model. (This strategy was used by FastAI's text language model system \"ULMFiT\")\n\n### Cleaning Audio Datasets?\nWith images, you can quickly look at many of them almost at once. \nWith audio, you have to listen to *each one*.  
But take a cue from fast.ai's Jeremy Howard: \n\n\u003e \"It's easier to clean a dataset once you've trained a model.\"  \n\nSo we can train the model, and then look for high-loss / low-confidence ratings for certain samples: those should be the ones we should check *first*. \n\nCould even start with someone else's pretrained model and look for anomalies when running inference on your data, i.e. similar inputs *should* yield similar outputs, so if they don't...?\n\n#### Length of audio?\nYou might be able to find short samples of exactly what you need, but it's also common to have the desired audio be just a part of a much longer clip.  How to segment it and keep just what you want? You could use other people's models, e.g. for detecting speech or guitars:\n* Delete what you don't want: Audio you might get off YouTube needs to be segmented in order to make it useful -- the stuff you don't want needs to be cut out.  If you're looking for musical audio, you could use a *speech detector* (there are lots of them available) and then *delete or ignore all the speech*. \n* What if all you want is the guitar solo, not the whole song?  Someone else's pretrained model for detecting guitars could help you.\n\n#### Are we classifying or regressing?\n\nStandards are a lot higher for regression systems, e.g. phase errors / time alignment issues probably won't matter to a classifier, but might for a regression model, depending on the goal. What about clipping, distortion,...?  This will depend on what you're trying to do. 
\n\n\n## \"Major\" ML-Audio Research/Development Groups\n#### Universities:\n(or, \"Where should I apply for grad school?\")\n* QMUL (London)\n* UPF (Barcelona)\n* CCRMA (Stanford, California)\n* IRCAM (Paris)\n* NYU (New York)\n\n#### Industry:\n(\"Where can I get an internship/job?\")\n* [Google Magenta](https://magenta.tensorflow.org)\n* [Google Perception](https://research.google/teams/perception/) ([speech publications](https://research.google/pubs/?team=perception\u0026area=speech-processing))\n* Adobe\n* Spotify\n* Increasingly, everywhere. ;-)\n\n## Conferences\n(\"Which conference(s) should I go to?\" -- asked by a student on the day this doc began)\n#### Audio-Specific\n* Long list of Music Technology specific conferences: [https://conferences.smcnetwork.org/](https://conferences.smcnetwork.org/), which is referenced from [https://github.com/MTG/conferences](https://github.com/MTG/conferences)\n* Audio Engineering Society (AES)\n* ASA\n* Digital Audio Effects (DAFx)\n* ICASSP\n* ISMIR\n* SANE\n* Web Audio Conference (WAC)\n* SMC\n* LVA/ICA\n* Audio Mostly\n* WIMP\n* DCASE\n* CSMC\n* MuMe\n* ICMC\n* CMMR\n* IBAC\n* MLSP\n* Interspeech\n* FMA\n\n#### General ML\n* ICLR\n* ICML\n* NeurIPS\n* IJCNN\n\n## Journals\n(\"Where can I get published?\")\n\n* [IEEE TASLP](https://signalprocessingsociety.org/publications-resources/ieeeacm-transactions-audio-speech-and-language-processing)\n* [JAES](http://www.aes.org/journal/)\n* [CMJ](http://computermusicjournal.org/)\n* [JNMR](https://www.tandfonline.com/toc/nnmr20/current)\n* [TISMIR](https://transactions.ismir.net/)\n* [JASA](https://asa.scitation.org/journal/jas)\n* EURASIP Journal on Audio, Speech, and Music Processing \n\nIn addition, in machine learning specifically, the tendency is for conference papers to be peer-reviewed and to \"count\" as journal publications.\n\n## Competitions / Benchmarks\nSome are yearly, some may be defunct but still interesting.\n* 
[MIREX](https://www.music-ir.org/mirex/wiki/MIREX_HOME)\n* [SiSEC](https://sisec18.unmix.app/#/) (Signal Separation Evaluation Campaign)\n* [Kaggle Heartbeat Sounds](https://www.kaggle.com/kinguistics/heartbeat-sounds)\n\n\n## Contributors\n[Ryan Miller](https://www.linkedin.com/in/ryan-miller-1aa8355a/), RJ Skerry-Ryan, Dave Moffat, Jesse Engel, Iver Jordal\n\n*If you want your name listed here, you may. ;-)* \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrscotthawley%2Fml-audio-start","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdrscotthawley%2Fml-audio-start","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrscotthawley%2Fml-audio-start/lists"}