Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/eric-canas/drums-app
Play drums in your browser with your webcam
browser-game computer-vision deep-learning keras music-generation neural-network tensorflow-js
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/eric-canas/drums-app
- Owner: Eric-Canas
- License: gpl-3.0
- Created: 2021-06-07T19:07:19.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-01-22T19:55:12.000Z (almost 2 years ago)
- Last Synced: 2023-03-05T08:56:17.704Z (almost 2 years ago)
- Topics: browser-game, computer-vision, deep-learning, keras, music-generation, neural-network, tensorflow-js
- Language: JavaScript
- Homepage:
- Size: 83.2 MB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Drums-app
Play Drums in your Browser.

Drums-app lets you simulate any percussion instrument in your browser using only your webcam. All machine learning models run locally, so no user information is sent to the server.
Check the demo at drums-app.com
### Quick Start
Simply serve `src/index.html` from a local web server, or visit drums-app.com.
Select **Set Template** to build your own drum template by uploading some images and attaching your sounds to them.
Turn on your **webcam** and enjoy it!
*No cats were harmed during this recording

# Implementation Details
This web application is written in JavaScript and runs its models in the browser with TensorFlow.js.
The pipeline uses two Machine Learning models.
- Hands Model: An off-the-shelf Computer Vision model that detects 21 landmarks (x, y, z) for each hand.
- HitNet: An LSTM model developed in Keras for this application and then converted to TensorFlow.js. It takes the last N positions of a hand and predicts the probability that this sequence corresponds to a hit.
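
As a rough illustration of how the two models could be chained, the sketch below keeps the last N frames of one hand and only queries the hit model once the buffer is full. `SequenceBuffer`, `processFrame` and the `predictHit` callback are hypothetical names standing in for the real code, and emptying the buffer after a hit follows the behaviour described in the results section.

```javascript
// Hypothetical helper: keeps the last `size` frames of a single hand.
class SequenceBuffer {
  constructor(size) {
    this.size = size;
    this.frames = [];
  }

  // Append a frame, dropping the oldest one when the buffer is full.
  push(frame) {
    this.frames.push(frame);
    if (this.frames.length > this.size) this.frames.shift();
  }

  isFull() {
    return this.frames.length === this.size;
  }

  clear() {
    this.frames = [];
  }
}

// `predictHit` is an assumed callback wrapping the HitNet call. It receives
// the buffered frames and returns a probability in [0, 1].
function processFrame(buffer, frame, predictHit, threshold = 0.5) {
  buffer.push(frame);
  if (!buffer.isFull()) return false;
  const isHit = predictHit(buffer.frames) >= threshold;
  if (isHit) buffer.clear(); // avoid re-triggering on the neighbouring frames
  return isHit;
}
```

One buffer per tracked hand would be needed, since each hand produces its own sequence.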
## HitNet Details
### Building the Dataset
The dataset used for training has been built in the following way:
- A representative landmark (Index Finger Dip [Y]) of each detected hand is plotted in an interactive chart.
- Any time that a key is pressed, a grey mark is plotted on the same chart.
- I play drums with one hand while pressing a key on the keyboard (with the other hand) every time that I beat an imaginary drum. [Gif Left]
- I use the mouse to select in the chart the points that should be considered a hit. [Gif Right]
- When the "Save Dataset" button is clicked, all hand positions, together with their corresponding tags (1 if the frame was considered a hit, 0 otherwise), are downloaded as a JSON file.
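
The labelling step above can be sketched as follows. The record layout (`landmarks`, `tag`) and the function names are assumptions made for illustration; the actual structure of the downloaded JSON file is not specified here.

```javascript
// Build one labeled record per frame: tag is 1 for frames selected as hits,
// 0 otherwise. `hitFrameIndices` holds the indices picked in the chart.
function buildDataset(frames, hitFrameIndices) {
  const hits = new Set(hitFrameIndices);
  return frames.map((landmarks, index) => ({
    landmarks,
    tag: hits.has(index) ? 1 : 0,
  }));
}

// Serialize the dataset to the JSON text that would be offered for download.
function serializeDataset(dataset) {
  return JSON.stringify(dataset);
}
```

In a browser, the resulting string could then be wrapped in a `Blob` and offered through a download link.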
### Defining the Architecture
HitNet has been built with Keras and then exported to TensorFlow.js. To avoid any noticeable lag between the hit on the drum and the produced sound, **HitNet** must run as fast as possible, so it implements an extremely simple architecture.
It takes as input the last 4 detections of a hand [flattened version of its 21 landmarks (x, y, z)] and outputs the probability that this sequence corresponds to a hit. It is composed only of an LSTM layer followed by a ReLU activation (using dropout with p = 0.25) and a Dense output layer with a single unit, followed by a sigmoid activation.
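
To make the input shape concrete, here is a minimal sketch of how 4 detections of 21 landmarks could be turned into a 4 × 63 input. The `{x, y, z}` object layout and the interleaved ordering (x0, y0, z0, x1, ...) are assumptions, as are the function names.

```javascript
const LANDMARKS_PER_HAND = 21;
const SEQUENCE_LENGTH = 4;

// Flatten [{x, y, z}, ...] (21 items) into [x0, y0, z0, x1, ...] (63 numbers).
function flattenHand(landmarks) {
  if (landmarks.length !== LANDMARKS_PER_HAND) {
    throw new Error(`expected ${LANDMARKS_PER_HAND} landmarks, got ${landmarks.length}`);
  }
  return landmarks.flatMap(({ x, y, z }) => [x, y, z]);
}

// Turn the last 4 detections of a hand into a [4, 63] nested array.
function buildInput(lastDetections) {
  if (lastDetections.length !== SEQUENCE_LENGTH) {
    throw new Error(`expected ${SEQUENCE_LENGTH} detections, got ${lastDetections.length}`);
  }
  return lastDetections.map(flattenHand);
}
```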
### Training the model
HitNet has been trained using the following parameterization:
- Epochs: 3000.
- Optimizer: Adam.
- Loss: Weighted Binary Cross Entropy*.
- Training/Val Split: 0.85-0.15.
- Data Augmentation:
  - Mirroring: X axis.
  - Shift: Shift applied in block for the whole sequence.
    - X Shift: ±0.3.
    - Y Shift: ±0.3.
    - Z Shift: ±0.5.
  - Interframe Noise: Small shift applied independently to each frame of the sequence.
    - Interframe Noise X: ±0.01.
    - Interframe Noise Y: ±0.01.
    - Interframe Noise Z: ±0.0025.
  - Intraframe Noise: Extremely small shift applied independently to each single part of a hand.
    - Intraframe Noise X: ±0.0025.
    - Intraframe Noise Y: ±0.0025.
    - Intraframe Noise Z: ±0.0001.
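
The three levels of augmentation listed above can be sketched as below. This is an illustration under stated assumptions, not the training code: it assumes landmark coordinates normalized to [0, 1] (so mirroring is x → 1 − x), uniform sampling within each ± range, and additive noise. All names are hypothetical.

```javascript
const SHIFT = { x: 0.3, y: 0.3, z: 0.5 };
const INTERFRAME = { x: 0.01, y: 0.01, z: 0.0025 };
const INTRAFRAME = { x: 0.0025, y: 0.0025, z: 0.0001 };

// Uniform sample in [-range, range]; `rng` returns a number in [0, 1).
const jitter = (range, rng) => (rng() * 2 - 1) * range;

const sampleOffset = (ranges, rng) => ({
  x: jitter(ranges.x, rng),
  y: jitter(ranges.y, rng),
  z: jitter(ranges.z, rng),
});

// `sequence` is an array of frames; each frame is an array of {x, y, z}.
function augmentSequence(sequence, { rng = Math.random, mirror = false } = {}) {
  const block = sampleOffset(SHIFT, rng); // one shift for the whole sequence
  return sequence.map((frame) => {
    const frameNoise = sampleOffset(INTERFRAME, rng); // one per frame
    return frame.map((point) => {
      const pointNoise = sampleOffset(INTRAFRAME, rng); // one per landmark
      const x = mirror ? 1 - point.x : point.x; // assumes x normalized to [0, 1]
      return {
        x: x + block.x + frameNoise.x + pointNoise.x,
        y: point.y + block.y + frameNoise.y + pointNoise.y,
        z: point.z + block.z + frameNoise.z + pointNoise.z,
      };
    });
  });
}
```

Passing a custom `rng` makes the augmentation reproducible, which is convenient when comparing training runs.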
The exported weights are not the ones of the last epoch, but the ones that minimized the Validation Loss at any intermediate epoch.
*Loss is weighted since the positive class is extremely underrepresented in the training set.
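
For reference, a weighted binary cross entropy can be written as in the sketch below. The exact weighting scheme used for HitNet is not stated, so weighting positives by the negatives/positives ratio is only one common choice, assumed here for illustration. The function names are hypothetical.

```javascript
// Mean binary cross entropy where errors on positive samples are multiplied
// by `positiveWeight`. Probabilities are clipped to avoid log(0).
function weightedBinaryCrossEntropy(labels, probs, positiveWeight, eps = 1e-7) {
  let total = 0;
  for (let i = 0; i < labels.length; i++) {
    const p = Math.min(Math.max(probs[i], eps), 1 - eps);
    total += labels[i] === 1 ? -positiveWeight * Math.log(p) : -Math.log(1 - p);
  }
  return total / labels.length;
}

// Assumed weighting: number of negatives divided by number of positives.
function inverseFrequencyWeight(labels) {
  const positives = labels.filter((y) => y === 1).length;
  return (labels.length - positives) / positives;
}
```

With this weighting, a dataset with one hit for every three non-hit frames gives a positive weight of 3, so both classes contribute equally to the loss on average.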
### Analyzing Results
Confusion matrices show high accuracy for both classes with the confidence threshold set at 0.5.
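
As a reminder of how such a matrix is obtained, the sketch below counts the four outcomes from labels and predicted probabilities. Treating a probability equal to the threshold as a positive is an assumption, and `confusionMatrix` is a hypothetical name.

```javascript
// Count true/false positives and negatives at a given confidence threshold.
function confusionMatrix(labels, probs, threshold = 0.5) {
  const counts = { tp: 0, fp: 0, tn: 0, fn: 0 };
  labels.forEach((label, i) => {
    const predicted = probs[i] >= threshold ? 1 : 0;
    if (label === 1 && predicted === 1) counts.tp += 1;
    else if (label === 1) counts.fn += 1;
    else if (predicted === 1) counts.fp += 1;
    else counts.tn += 1;
  });
  return counts;
}
```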
Although these False Positives and False Negatives could worsen the user experience in a network that is executed several times each second, they do not really affect playtime in a real situation. This is due to three factors:
- Most False Positives come from the frames immediately before or after the hit. In practice, this is solved by emptying the sequence buffers every time that a hit is detected.
- The small number of False Negatives detected in the train set comes from Data Augmentation or from the hit being detected on the previous or the following frame. In real cases, these displacements do not affect the experience.
- The rest of the False Positives do not usually appear in real cases since, during playtime, only the sequences including detections that enter the predefined drums are analyzed. In practice, this works as a double check for the positive cases.
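
The double check described in the last point can be sketched as follows. Representing each drum as an axis-aligned rectangle (`x`, `y`, `width`, `height`) and testing a single representative point per frame are assumptions made for illustration; `predictHit` is a hypothetical callback wrapping the HitNet call.

```javascript
// True when the point lies inside the drum's rectangle (assumed shape).
const isInside = (point, drum) =>
  point.x >= drum.x &&
  point.x <= drum.x + drum.width &&
  point.y >= drum.y &&
  point.y <= drum.y + drum.height;

// Return the first drum entered by any point of the sequence, or null.
function findTouchedDrum(points, drums) {
  for (const drum of drums) {
    if (points.some((point) => isInside(point, drum))) return drum;
  }
  return null;
}

// Only sequences that enter a drum are passed to the hit model.
function detectHit(points, drums, predictHit, threshold = 0.5) {
  const drum = findTouchedDrum(points, drums);
  if (drum === null) return null; // the model is never called
  return predictHit(points) >= threshold ? drum : null;
}
```

Skipping the model call for sequences outside every drum also saves inference time on most frames.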
The evolution of the Train/Validation Loss during training shows no sign of overfitting.