{"id":21332502,"url":"https://github.com/mdeff/dlaudio","last_synced_at":"2025-07-12T10:31:36.469Z","repository":{"id":31921318,"uuid":"35490623","full_name":"mdeff/dlaudio","owner":"mdeff","description":"Master thesis: Structured Auto-Encoder with application to Music Genre Recognition (code)","archived":false,"fork":false,"pushed_at":"2020-04-18T14:25:34.000Z","size":89,"stargazers_count":15,"open_issues_count":0,"forks_count":6,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-05-01T15:04:03.180Z","etag":null,"topics":["auto-encoders","deep-learning","graphs","manifold-learning","music-information-retrieval","sparse"],"latest_commit_sha":null,"homepage":"https://infoscience.epfl.ch/record/218019","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mdeff.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-05-12T13:47:05.000Z","updated_at":"2023-07-25T13:56:12.000Z","dependencies_parsed_at":"2022-08-24T14:22:26.472Z","dependency_job_id":null,"html_url":"https://github.com/mdeff/dlaudio","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdeff%2Fdlaudio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdeff%2Fdlaudio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdeff%2Fdlaudio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdeff%2Fdlaudio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mdeff","download_url":"https://codeload.
github.com/mdeff/dlaudio/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225814861,"owners_count":17528295,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["auto-encoders","deep-learning","graphs","manifold-learning","music-information-retrieval","sparse"],"created_at":"2024-11-21T22:51:42.875Z","updated_at":"2024-11-21T22:51:43.576Z","avatar_url":"https://github.com/mdeff.png","language":"Jupyter Notebook","readme":"# Master thesis: Structured Auto-Encoder with application to Music Genre Recognition\n\n[Michaël Defferrard](https://deff.ch).\nSupervised by [Xavier Bresson](https://www.ntu.edu.sg/home/xbresson),\n[Johan Paratte](https://www.linkedin.com/in/johan-paratte-a2070039),\n[Pierre Vandergheynst](https://people.epfl.ch/pierre.vandergheynst).\n\n\u003e In this work, we present a technique that learns discriminative audio\n\u003e features for Music Information Retrieval (MIR). The novelty of the proposed\n\u003e technique is to design auto-encoders that make use of data structures to\n\u003e learn enhanced sparse data representations. The data structure is borrowed\n\u003e from the Manifold Learning field, that is, data are assumed to be sampled\n\u003e from smooth manifolds, which are here represented by graphs of proximities of\n\u003e the input data. As a consequence, the proposed auto-encoders find sparse\n\u003e data representations that are quite robust w.r.t. perturbations. The model is\n\u003e formulated as a non-convex optimization problem. However, it can be\n\u003e decomposed into iterative sub-optimization problems that are convex and for\n\u003e which well-posed iterative schemes are provided in the context of the Fast\n\u003e Iterative Shrinkage-Thresholding Algorithm (FISTA) framework. Our numerical\n\u003e experiments show two main results. Firstly, our graph-based auto-encoders\n\u003e improve the classification accuracy by 2% over the auto-encoders without\n\u003e graph structure for the popular GTZAN music dataset. Secondly, our model is\n\u003e significantly more robust, as it is 8% more accurate than the standard model\n\u003e in the presence of 10% perturbations.\n\n## Content\n\nThis repository contains the code developed during my master thesis.\n\nRelated resources:\n* Report: \u003chttps://infoscience.epfl.ch/record/218019\u003e\n* Slides: \u003chttps://deff.ch/dlaudio_slides.pdf\u003e\n* Code: \u003chttps://github.com/mdeff/dlaudio\u003e\n* Experimental results: \u003chttp://nbviewer.jupyter.org/github/mdeff/dlaudio_results\u003e\n* LaTeX sources of the report: \u003chttps://github.com/mdeff/dlaudio_report\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdeff%2Fdlaudio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmdeff%2Fdlaudio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdeff%2Fdlaudio/lists"}