{"id":18512012,"url":"https://github.com/eric-canas/drums-app","last_synced_at":"2025-04-09T05:33:03.834Z","repository":{"id":65421249,"uuid":"374773445","full_name":"Eric-Canas/Drums-app","owner":"Eric-Canas","description":"Play drums in your browser with your webcam","archived":false,"fork":false,"pushed_at":"2023-04-10T20:13:24.000Z","size":87212,"stargazers_count":10,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-09T04:29:56.273Z","etag":null,"topics":["browser-game","computer-vision","deep-learning","keras","music-generation","neural-network","tensorflow-js"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Eric-Canas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-06-07T19:07:19.000Z","updated_at":"2025-02-06T19:56:20.000Z","dependencies_parsed_at":"2023-02-12T18:01:04.747Z","dependency_job_id":null,"html_url":"https://github.com/Eric-Canas/Drums-app","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eric-Canas%2FDrums-app","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eric-Canas%2FDrums-app/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eric-Canas%2FDrums-app/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eric-Canas%2FDrums-app/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Eric-Canas","download_url":"https://codeload.github.com/Eric-Ca
nas/Drums-app/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247987057,"owners_count":21028891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["browser-game","computer-vision","deep-learning","keras","music-generation","neural-network","tensorflow-js"],"created_at":"2024-11-06T15:31:45.119Z","updated_at":"2025-04-09T05:32:58.758Z","avatar_url":"https://github.com/Eric-Canas.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# \u003ca href=\"https://drums-app.com/\" target=\"_blank\"\u003eDrums-app\u003c/a\u003e\nPlay Drums in your Browser.\n\n\u003ca href=\"www.drums-app.com\" target=\"_blank\"\u003eDrums-app\u003c/a\u003e allows you to simulate in your browser any percussion instrument, by using only your Webcam. All machine learning models run locally, so no user information is sent to the server.  \n\nCheck the demo at \u003ca href=\"https://drums-app.com/\" target=\"_blank\"\u003edrums-app.com\u003c/a\u003e\n\n### Quick Start\n\nSimply run the \u003ca href=\"./src/index.html\" target=\"_blank\"\u003e src/index.html \u003c/a\u003e in server mode, or enter at \u003ca href=\"https://drums-app.com/\"\u003edrums-app.com\u003c/a\u003e.  
\n\nSelect **Set Template** for building your own drums template by uploading some images and attaching your sounds to them.\n\n\u003cimg alt=\"Set Template\" title=\"Set Template\" src=\"./documentation/SetTemplate.gif\" height=350\u003e\n\nTurn on your **webcam** and enjoy it!\n\n\u003cimg alt=\"Play!\" title=\"Play!\" src=\"./documentation/DrumsPlay.gif\" height=350\u003e\n*\u003ci\u003eNo cats were harmed during this recording\u003c/i\u003e\n\n# Implementation Details\n\nThis web application is built with \u003ca href=\"https://google.github.io/mediapipe/getting_started/javascript.html\" target=\"_blank\"\u003e\u003cimg alt=\"MediaPipe\" title=\"MediaPipe\" src=\"https://mediapipe.dev/images/mediapipe_small.png\" height=16\u003e\u003c/a\u003e and \u003ca href=https://www.tensorflow.org/js target=\"_blank\"\u003e\u003cimg alt=\"TensorFlow.js\" title=\"TensorFlow.js\" src=\"https://img.shields.io/static/v1?label=\u0026message=Tensorflow.js\u0026color=FF6000\u0026logo=TensorFlow\u0026logoColor=FFFFFF\" height=18\u003e\u003c/a\u003e.  
\nThe pipeline uses two Machine Learning models.\n\u003cul\u003e\n  \u003cli\u003e \u003ca href=\"https://google.github.io/mediapipe/solutions/hands#javascript-solution-api\" target=\"_blank\"\u003e\u003cb\u003eHands Model\u003c/b\u003e\u003c/a\u003e: A Computer Vision model offered by \u003ca href=\"https://google.github.io/mediapipe/getting_started/javascript.html\" target=\"_blank\"\u003e\u003cimg alt=\"MediaPipe\" title=\"MediaPipe\" src=\"https://mediapipe.dev/images/mediapipe_small.png\" height=16\u003e\u003c/a\u003e for detecting 21 landmarks for each hand (x, y, z).\u003c/li\u003e\n  \u003cli\u003e \u003cb\u003e\u003ca href=\"./src/DrumsApp/js/Resources/HitNetJS\" target=\"_blank\"\u003eHitNet\u003c/a\u003e\u003c/b\u003e: An LSTM model that has been developed in \u003ca href=\"https://keras.io/\" target=\"_blank\"\u003e\u003cimg alt=\"Keras\" title=\"Keras\" src=\"https://img.shields.io/badge/Keras-%23D00000.svg?style=flat\u0026logo=Keras\u0026logoColor=white\" height=20\u003e\u003c/a\u003e for this application and then converted to \u003ca href=https://www.tensorflow.org/js target=\"_blank\"\u003e\u003cimg alt=\"TensorFlow.js\" title=\"TensorFlow.js\" src=\"https://img.shields.io/static/v1?label=\u0026message=Tensorflow.js\u0026color=FF6000\u0026logo=TensorFlow\u0026logoColor=FFFFFF\" height=18\u003e\u003c/a\u003e. 
It takes the last N positions of a hand and predicts the probability that this sequence corresponds to a hit.\u003c/li\u003e\n\u003c/ul\u003e\n\n\n## HitNet Details\n\n### Building the Dataset\n\nThe dataset used for training was built as follows:\n\u003col\u003e\n  \u003cli\u003e A representative landmark (\u003ci\u003eIndex Finger Dip [\u003cb\u003eY\u003c/b\u003e]\u003c/i\u003e) of each detected hand is plotted in an interactive chart, using \u003ca href=https://www.chartjs.org/ target=\"_blank\"\u003e\u003cimg alt=\"Chart.js\" title=\"Chart.js\" src=https://img.shields.io/static/v1?label=\u0026message=Chart.js\u0026color=FF6384\u0026logo=chart.js\u0026logoColor=FFFFFF\u003e\u003c/a\u003e.\u003c/li\u003e\n  \u003cli\u003e Every time a key is pressed, a grey mark is plotted on the same chart. \u003c/li\u003e\n  \u003cli\u003e I play drums with one hand while pressing a key on the keyboard (with the other hand) every time I hit an imaginary drum. [\u003cb\u003eGif Left\u003c/b\u003e]\u003c/li\u003e\n  \u003cli\u003e I use the mouse to select the points in the chart that should be considered hits. 
[\u003cb\u003eGif Right\u003c/b\u003e]\u003c/li\u003e\n  \u003cli\u003e When I click the \"\u003cbutton\u003eSave Dataset\u003c/button\u003e\" button, all hand positions together with their corresponding tags (\u003cb\u003e1\u003c/b\u003e if the frame was considered a hit or \u003cb\u003e0\u003c/b\u003e otherwise) are downloaded as a \u003ca href=\"./src/DrumsApp/python/dataset\"\u003e \u003ci\u003eJSON\u003c/i\u003e file \u003c/a\u003e.\u003c/li\u003e\n\u003c/ol\u003e\n  \u003cimg alt=\"DatasetGeneration\" title=\"DatasetGeneration\" src=\"/documentation/DatasetGeneration.gif\" height=350 align=left\u003e\n  \u003cimg alt=\"DataTag\" title=\"DataTag\" src=\"/documentation/DataTag.gif\" height=350\u003e\n  \n### Defining the Architecture\n\nHitNet has been built in \u003ca href=https://www.python.org/ target=\"_blank\"\u003e\u003cimg alt=\"Python\" title=\"Python\" src=\"https://img.shields.io/static/v1?label=\u0026message=Python\u0026color=3C78A9\u0026logo=python\u0026logoColor=FFFFFF\" height=18\u003e\u003c/a\u003e, using \u003ca href=\"https://keras.io/\" target=\"_blank\"\u003e\u003cimg alt=\"Keras\" title=\"Keras\" src=\"https://img.shields.io/badge/Keras-%23D00000.svg?style=flat\u0026logo=Keras\u0026logoColor=white\" height=20\u003e\u003c/a\u003e, and then exported to \u003ca href=https://www.tensorflow.org/js target=\"_blank\"\u003e\u003cimg alt=\"TensorFlow.js\" title=\"TensorFlow.js\" src=\"https://img.shields.io/static/v1?label=\u0026message=Tensorflow.js\u0026color=FF6000\u0026logo=TensorFlow\u0026logoColor=FFFFFF\" height=18\u003e\u003c/a\u003e. 
To avoid any noticeable lag between the hit on the drum and the produced sound, **HitNet** must run as fast as possible. For this reason, it uses an extremely simple architecture.\n\n\u003cimg alt=\"HitNet Architecture\" title=\"HitNet Architecture\" src=\"/documentation/HitNetArchitecture.png\"\u003e\n\nIt takes as input the last 4 detections of a hand [\u003ci\u003eFlattened version of its 21 landmarks (x,y,z)\u003c/i\u003e] and outputs the probability that this sequence corresponds to a hit. It consists only of an \u003ci\u003eLSTM\u003c/i\u003e layer followed by a \u003ci\u003eReLU\u003c/i\u003e activation (using dropout with \u003ci\u003ep = 0.25\u003c/i\u003e) and a \u003ci\u003eDense\u003c/i\u003e output layer with a single unit, followed by a \u003ci\u003esigmoid\u003c/i\u003e activation.\n\n### Training the model\n\nHitNet has been trained in \u003ca href=\"https://keras.io/\" target=\"_blank\"\u003e\u003cimg alt=\"Keras\" title=\"Keras\" src=\"https://img.shields.io/badge/Keras-%23D00000.svg?style=flat\u0026logo=Keras\u0026logoColor=white\" height=20\u003e\u003c/a\u003e, using the following parameterization:\n\u003cul\u003e\n  \u003cli\u003e \u003cb\u003eEpochs\u003c/b\u003e: 3000.\u003c/li\u003e\n  \u003cli\u003e \u003cb\u003eOptimizer\u003c/b\u003e: \u003ca href=\"https://deepai.org/machine-learning-glossary-and-terms/adam-machine-learning\" target=\"_blank\"\u003eAdam\u003c/a\u003e.\u003c/li\u003e\n  \u003cli\u003e \u003cb\u003eLoss\u003c/b\u003e: Weighted \u003ca href=\"https://sparrow.dev/binary-cross-entropy/\" target=\"_blank\"\u003eBinary Cross Entropy\u003c/a\u003e*.\u003c/li\u003e\n  \u003cli\u003e \u003cb\u003eTraining/Val Split\u003c/b\u003e: 0.85-0.15.\u003c/li\u003e\n  \u003cli\u003e \u003cb\u003eData Augmentation\u003c/b\u003e: \u003c/li\u003e\n    \u003cul\u003e\n      \u003cli\u003e \u003cb\u003eMirroring\u003c/b\u003e: X axis.\u003c/li\u003e\n      \u003cli\u003e \u003ci\u003e\u003cb\u003eShift\u003c/b\u003e: Shift 
applied as a block to the whole sequence.\u003c/i\u003e \u003c/li\u003e\n      \u003cul\u003e\n      \u003cli\u003e \u003cb\u003eX Shift\u003c/b\u003e: ±0.3.\u003c/li\u003e\n      \u003cli\u003e \u003cb\u003eY Shift\u003c/b\u003e: ±0.3.\u003c/li\u003e\n      \u003cli\u003e \u003cb\u003eZ Shift\u003c/b\u003e: ±0.5.\u003c/li\u003e\n      \u003c/ul\u003e\n      \u003cli\u003e \u003ci\u003e\u003cb\u003eInterframe Noise\u003c/b\u003e: Small shift applied independently to each frame of the sequence.\u003c/i\u003e \u003c/li\u003e\n      \u003cul\u003e\n      \u003cli\u003e \u003cb\u003eInterframe Noise X\u003c/b\u003e: ±0.01. \u003c/li\u003e\n      \u003cli\u003e \u003cb\u003eInterframe Noise Y\u003c/b\u003e: ±0.01. \u003c/li\u003e\n      \u003cli\u003e \u003cb\u003eInterframe Noise Z\u003c/b\u003e: ±0.0025. \u003c/li\u003e\n      \u003c/ul\u003e\n      \u003cli\u003e \u003ci\u003e\u003cb\u003eIntraframe Noise\u003c/b\u003e: Extremely small shift applied independently to each single part of a hand.\u003c/i\u003e \u003c/li\u003e\n      \u003cul\u003e\n      \u003cli\u003e \u003cb\u003eIntraframe Noise X\u003c/b\u003e: ±0.0025. \u003c/li\u003e\n      \u003cli\u003e \u003cb\u003eIntraframe Noise Y\u003c/b\u003e: ±0.0025. \u003c/li\u003e\n      \u003cli\u003e \u003cb\u003eIntraframe Noise Z\u003c/b\u003e: ±0.0001. \u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/ul\u003e\n   \u003c/li\u003e\n\u003c/ul\u003e\n        \nThe weights exported to \u003ca href=https://www.tensorflow.org/js target=\"_blank\"\u003e\u003cimg alt=\"TensorFlow.js\" title=\"TensorFlow.js\" src=\"https://img.shields.io/static/v1?label=\u0026message=Tensorflow.js\u0026color=FF6000\u0026logo=TensorFlow\u0026logoColor=FFFFFF\" height=18\u003e\u003c/a\u003e are not those of the last epoch, but those that minimized the Validation Loss across all intermediate epochs.  
\n\n*\u003ci\u003eLoss is weighted since the positive class is extremely underrepresented in the training set.\u003c/i\u003e\n        \n### Analyzing Results\n\nThe confusion matrices show that performance is high for both classes with the confidence threshold set at 0.5.\n\n\u003cimg alt=\"Train Confusion Matrix\" title=\"Train Confusion Matrix\" src=\"./src/DrumsApp/python/results/Train Confusion Matrix.png\" width=40% align=left\u003e\n\u003cimg alt=\"Validation Confusion Matrix\" title=\"Validation Confusion Matrix\" src=\"./src/DrumsApp/python/results/Val Confusion Matrix.png\" width=40%\u003e\n\nAlthough these \u003ci\u003eFalse Positives\u003c/i\u003e and \u003ci\u003eFalse Negatives\u003c/i\u003e could worsen the user experience in a network that is executed several times per second, they do not really affect playtime in a real situation. This is due to three factors:\n\u003col\u003e\n  \u003cli\u003e Most \u003ci\u003eFalse Positives\u003c/i\u003e come from the frames immediately before or after the hit. In practice, this is solved by emptying the sequence buffers every time a hit is detected.\u003c/li\u003e\n  \u003cli\u003e The small number of \u003ci\u003eFalse Negatives\u003c/i\u003e in the train set come from \u003ci\u003eData Augmentation\u003c/i\u003e or from the hit being detected on the previous or the following frame. In real cases, these displacements do not affect the experience.\u003c/li\u003e\n  \u003cli\u003e The remaining \u003ci\u003eFalse Positives\u003c/i\u003e do not usually appear in real cases since, during playtime, only the sequences that include detections entering the predefined drums are analyzed. 
In practice, this works as a \u003ci\u003edouble check\u003c/i\u003e for the positive cases.\u003c/li\u003e\n\u003c/ol\u003e\n\nThe evolution of the \u003ci\u003eTrain/Validation Loss\u003c/i\u003e during training confirms that there has been no overfitting.\n\n\u003cimg alt=\"Loss\" title=\"Loss\" src=\"./src/DrumsApp/python/results/Loss.png\" width=40%\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feric-canas%2Fdrums-app","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feric-canas%2Fdrums-app","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feric-canas%2Fdrums-app/lists"}