{"id":13628143,"url":"https://github.com/chen0040/mxnet-audio","last_synced_at":"2025-08-15T22:31:07.333Z","repository":{"id":31835181,"uuid":"128719234","full_name":"chen0040/mxnet-audio","owner":"chen0040","description":"Implementation of music genre classification, audio-to-vec, song recommender, and music search in mxnet","archived":false,"fork":false,"pushed_at":"2022-12-08T01:00:34.000Z","size":16449,"stargazers_count":54,"open_issues_count":4,"forks_count":15,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-11-08T18:46:21.475Z","etag":null,"topics":["audio-classification","music-recommendation","music-search","mxnet","song-recommender"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chen0040.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-04-09T05:32:15.000Z","updated_at":"2024-11-07T11:08:24.000Z","dependencies_parsed_at":"2023-01-14T20:01:01.687Z","dependency_job_id":null,"html_url":"https://github.com/chen0040/mxnet-audio","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chen0040%2Fmxnet-audio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chen0040%2Fmxnet-audio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chen0040%2Fmxnet-audio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chen0040%2Fmxnet-audio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chen0040","download_url":"https:/
/codeload.github.com/chen0040/mxnet-audio/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229964387,"owners_count":18152034,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-classification","music-recommendation","music-search","mxnet","song-recommender"],"created_at":"2024-08-01T22:00:46.806Z","updated_at":"2024-12-16T13:15:44.515Z","avatar_url":"https://github.com/chen0040.png","language":"Python","funding_links":[],"categories":["\u003ca name=\"Speech\"\u003e\u003c/a\u003e4. Speech"],"sub_categories":["2.14 Misc"],"readme":"# mxnet-audio\n\nImplementation of music genre classification, audio-to-vec, song recommender, and music search in mxnet\n\n\n# Principles\n\n* The classifier [ResNetV2AudioClassifier](mxnet_audio/library/resnet_v2.py) converts audio into a mel-spectrogram and uses a simplified\n ResNet DCNN architecture to classify audio files based on their associated labels. \n* The classifier [Cifar10AudioClassifier](mxnet_audio/library/cifar10.py) converts audio into a mel-spectrogram and uses the CIFAR-10\nDCNN architecture to classify audio files based on their associated labels. \n\nThe classifiers differ from those used in image classification in that:\n* they use softrelu instead of relu. 
\n* they use an elongated max pooling shape (as the mel-spectrogram is an elongated \"image\")\n* they add dropout \n\n\n# Usage\n\n### Dependencies\n\nMake sure you have the right dependencies in your Python environment by running:\n\n```bash\npip install -r requirements.txt\n```\n\n### Train a deep learning model\n\nThe audio training uses the [Gtzan](http://opihi.cs.uvic.ca/sound/genres.tar.gz) data set to train the\nmusic classifier to recognize the genre of songs. \n\nThe training works by converting an audio or song file into a mel-spectrogram, which can be thought of as\na 3-dimensional tensor in a similar manner to an image. With the trained model, it is possible to build other interesting\napplications such as music recommendation, music search, audio2vec, etc.\n\nTo train on the Gtzan data set, run the following command:\n\n```bash\ncd demo\npython cifar10_train.py\n```\n\nThe [sample code](demo/cifar10_train.py) below shows how to train Cifar10AudioClassifier to classify songs\nbased on their genre labels:\n\n```python\nfrom mxnet_audio.library.cifar10 import Cifar10AudioClassifier\nfrom mxnet_audio.library.utility.gtzan_loader import download_gtzan_genres_if_not_found\nimport mxnet\n\n\ndef load_audio_path_label_pairs(max_allowed_pairs=None):\n    download_gtzan_genres_if_not_found('./very_large_data/gtzan')\n    audio_paths = []\n    with open('./data/lists/test_songs_gtzan_list.txt', 'rt') as file:\n        for line in file:\n            audio_path = './very_large_data/' + line.strip()\n            audio_paths.append(audio_path)\n    pairs = []\n    with open('./data/lists/test_gt_gtzan_list.txt', 'rt') as file:\n        for line in file:\n            label = int(line)\n            if max_allowed_pairs is None or len(pairs) \u003c max_allowed_pairs:\n                pairs.append((audio_paths[len(pairs)], label))\n            else:\n                break\n    return pairs\n\n\ndef main():\n    audio_path_label_pairs = load_audio_path_label_pairs()\n    print('loaded: ', 
len(audio_path_label_pairs))\n\n    classifier = Cifar10AudioClassifier(model_ctx=mxnet.gpu(0), data_ctx=mxnet.gpu(0))\n    batch_size = 8\n    epochs = 100\n    history = classifier.fit(audio_path_label_pairs, model_dir_path='./models',\n                             batch_size=batch_size, epochs=epochs,\n                             checkpoint_interval=2)\n\n\nif __name__ == '__main__':\n    main()\n```\n\nAfter training, the trained models are saved to [demo/models](demo/models). \n\nTo test the trained Cifar10AudioClassifier model, run the following command:\n\n```bash\ncd demo\npython cifar10_predict.py\n```\n\n\n### Model Comparison\n\nThe figure below compares the training quality of \n[ResNetV2AudioClassifier](mxnet_audio/library/resnet_v2.py) and [Cifar10AudioClassifier](mxnet_audio/library/cifar10.py):\n\n![training-compare](demo/models/training-history-comparison.png)\n\n\n### Predict Music Genres\n\nThe [sample code](demo/cifar10_predict.py) shows how to use the trained Cifar10AudioClassifier model to predict\nmusic genres:\n\n```python\nfrom random import shuffle\n\nfrom mxnet_audio.library.cifar10 import Cifar10AudioClassifier\nfrom mxnet_audio.library.utility.gtzan_loader import download_gtzan_genres_if_not_found, gtzan_labels\n\n\ndef load_audio_path_label_pairs(max_allowed_pairs=None):\n    download_gtzan_genres_if_not_found('./very_large_data/gtzan')\n    audio_paths = []\n    with open('./data/lists/test_songs_gtzan_list.txt', 'rt') as file:\n        for line in file:\n            audio_path = './very_large_data/' + line.strip()\n            audio_paths.append(audio_path)\n    pairs = []\n    with open('./data/lists/test_gt_gtzan_list.txt', 'rt') as file:\n        for line in file:\n            label = int(line)\n            if max_allowed_pairs is None or len(pairs) \u003c max_allowed_pairs:\n                pairs.append((audio_paths[len(pairs)], label))\n            else:\n                break\n    return pairs\n\n\ndef main():\n    
audio_path_label_pairs = load_audio_path_label_pairs()\n    shuffle(audio_path_label_pairs)\n    print('loaded: ', len(audio_path_label_pairs))\n\n    classifier = Cifar10AudioClassifier()\n    classifier.load_model(model_dir_path='./models')\n\n    for i in range(0, 20):\n        audio_path, actual_label_id = audio_path_label_pairs[i]\n        predicted_label_id = classifier.predict_class(audio_path)\n        print(audio_path)\n        predicted_label = gtzan_labels[predicted_label_id]\n        actual_label = gtzan_labels[actual_label_id]\n\n        print('predicted: ', predicted_label, 'actual: ', actual_label)\n\n\nif __name__ == '__main__':\n    main()\n\n```\n\n### Audio to Vector\n\nThe [sample code](demo/cifar10_encode_audio.py) shows how to use the trained Cifar10AudioClassifier model to encode an\naudio file into a fixed-length numerical vector:\n\n```python\nfrom random import shuffle\n\nfrom mxnet_audio.library.cifar10 import Cifar10AudioClassifier\nfrom mxnet_audio.library.utility.gtzan_loader import download_gtzan_genres_if_not_found\n\n\ndef load_audio_path_label_pairs(max_allowed_pairs=None):\n    download_gtzan_genres_if_not_found('./very_large_data/gtzan')\n    audio_paths = []\n    with open('./data/lists/test_songs_gtzan_list.txt', 'rt') as file:\n        for line in file:\n            audio_path = './very_large_data/' + line.strip()\n            audio_paths.append(audio_path)\n    pairs = []\n    with open('./data/lists/test_gt_gtzan_list.txt', 'rt') as file:\n        for line in file:\n            label = int(line)\n            if max_allowed_pairs is None or len(pairs) \u003c max_allowed_pairs:\n                pairs.append((audio_paths[len(pairs)], label))\n            else:\n                break\n    return pairs\n\n\ndef main():\n    audio_path_label_pairs = load_audio_path_label_pairs()\n    shuffle(audio_path_label_pairs)\n    print('loaded: ', len(audio_path_label_pairs))\n\n    classifier = Cifar10AudioClassifier()\n    
classifier.load_model(model_dir_path='./models')\n\n    for i in range(0, 20):\n        audio_path, actual_label_id = audio_path_label_pairs[i]\n        audio2vec = classifier.encode_audio(audio_path)\n        print(audio_path)\n\n        print('audio-to-vec: ', audio2vec)\n\n\nif __name__ == '__main__':\n    main()\n\n```\n\n### Music Search Engine\n\nThe [sample code](demo/cifar10_search_music.py) shows how to use Cifar10AudioSearch with the trained model to search for\nsimilar music given a music file:\n\n```python\nfrom mxnet_audio.library.cifar10 import Cifar10AudioSearch\nfrom mxnet_audio.library.utility.gtzan_loader import download_gtzan_genres_if_not_found\n\n\ndef load_audio_path_label_pairs(max_allowed_pairs=None):\n    download_gtzan_genres_if_not_found('./very_large_data/gtzan')\n    audio_paths = []\n    with open('./data/lists/test_songs_gtzan_list.txt', 'rt') as file:\n        for line in file:\n            audio_path = './very_large_data/' + line.strip()\n            audio_paths.append(audio_path)\n    pairs = []\n    with open('./data/lists/test_gt_gtzan_list.txt', 'rt') as file:\n        for line in file:\n            label = int(line)\n            if max_allowed_pairs is None or len(pairs) \u003c max_allowed_pairs:\n                pairs.append((audio_paths[len(pairs)], label))\n            else:\n                break\n    return pairs\n\n\ndef main():\n    search_engine = Cifar10AudioSearch()\n    search_engine.load_model(model_dir_path='./models')\n    for path, _ in load_audio_path_label_pairs():\n        search_engine.index_audio(path)\n\n    query_audio = './data/audio_samples/example.mp3'\n    search_result = search_engine.query(query_audio, top_k=10)\n\n    for idx, similar_audio in enumerate(search_result):\n        print('result #%s: %s' % (idx+1, similar_audio))\n\n\nif __name__ == '__main__':\n    main()\n\n```\n\n### Recommend Songs\n\nThe [sample code](demo/cifar10_recommend_music.py) shows how to use Cifar10AudioRecommender with 
the trained model to \nrecommend songs based on a user's listening history:\n\n```python\nfrom random import shuffle\n\nfrom mxnet_audio.library.cifar10 import Cifar10AudioRecommender\nfrom mxnet_audio.library.utility.gtzan_loader import download_gtzan_genres_if_not_found\n\n\ndef load_audio_path_label_pairs(max_allowed_pairs=None):\n    download_gtzan_genres_if_not_found('./very_large_data/gtzan')\n    audio_paths = []\n    with open('./data/lists/test_songs_gtzan_list.txt', 'rt') as file:\n        for line in file:\n            audio_path = './very_large_data/' + line.strip()\n            audio_paths.append(audio_path)\n    pairs = []\n    with open('./data/lists/test_gt_gtzan_list.txt', 'rt') as file:\n        for line in file:\n            label = int(line)\n            if max_allowed_pairs is None or len(pairs) \u003c max_allowed_pairs:\n                pairs.append((audio_paths[len(pairs)], label))\n            else:\n                break\n    return pairs\n\n\ndef main():\n    music_recommender = Cifar10AudioRecommender()\n    music_recommender.load_model(model_dir_path='./models')\n    music_archive = load_audio_path_label_pairs()\n    for path, _ in music_archive:\n        music_recommender.index_audio(path)\n\n    # create a fake listening history for the user\n    shuffle(music_archive)\n    for i in range(30):\n        # assumes track() expects the audio path, as index_audio() does\n        song_i_am_listening, _ = music_archive[i]\n        music_recommender.track(song_i_am_listening)\n\n    for idx, similar_audio in enumerate(music_recommender.recommend(limits=10)):\n        print('result #%s: %s' % (idx+1, similar_audio))\n\n\nif __name__ == '__main__':\n    main()\n\n```\n\n\n\n# Note\n\n### On pre-processing\n\nTo pre-generate the mel-spectrograms from the audio files for classification, you can first run the following script\nbefore starting training, which will make the training faster:\n\n```bash\ncd demo/utility\npython gtzan_loader.py\n```\n\n### audioread.NoBackend\n\nThe audio processing depends on 
librosa version 0.6, which depends on audioread.  \n\nIf you are on Windows and see the error \"audioread.NoBackend\", go to [ffmpeg](https://ffmpeg.zeranoe.com/builds/)\nand download the shared linking build, unzip it to a local directory, and then add the bin folder of \nffmpeg to the Windows PATH environment variable. Restart your cmd or PowerShell; Python should now be\nable to locate the backend for audioread in librosa.\n\n### Training with GPU\n\nNote that the default training scripts in the [demo](demo) folder use the GPU for training, so you must configure your\ngraphics card for this (or remove the \"model_ctx=mxnet.gpu(0)\" and \"data_ctx=mxnet.gpu(0)\" arguments in the training scripts). \n\n\n* Step 1: Download and install the [CUDA® Toolkit 9.0](https://developer.nvidia.com/cuda-90-download-archive)\n* Step 2: Download and unzip the [cuDNN 7.0.4 for CUDA® Toolkit 9.0](https://developer.nvidia.com/cudnn) and add the\nbin folder of the unzipped directory to the PATH of your Windows environment \n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchen0040%2Fmxnet-audio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchen0040%2Fmxnet-audio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchen0040%2Fmxnet-audio/lists"}