cs229.stanford.edu/proj2019spr/poster/5.pdf
GRAND DIGITAL PIANO: MULTI-MODAL TRANSFER OF LEARNING
SANCHES SHAYDA ERNESTO EVGENIY (ESANCHES), ILKYU LEE (LQLEE)
INTRODUCTION
Reproduction of the touch and sound of real instruments has been a challenging problem throughout the years. For example, while current digital pianos attempt to achieve this task via simplified, finite sample-based methods, there is still much room for improvement with regard to mimicking an acoustic piano.
In this study, we aim to improve upon such methods by using machine learning to train a physically based model of an acoustic piano, allowing for the generation of novel, more realistic sounds.
OUR APPROACH
We use a multi-modal transfer-of-learning method, wherein touch data from a digital piano is ultimately used to generate the realistic sound of an acoustic piano:
1. A laser sensor on the digital piano gathers touch data in the form of key velocity, which is used to train a model that predicts intermediary modalities such as MIDI.
2. The model is then applied to the acoustic piano, where it takes touch data as input and predicts the intermediary modalities that are not naturally available on acoustic pianos.
3. With these intermediary modalities as input, another model is trained to predict the sound of the acoustic piano.
4. Finally, the latter model is run on the digital piano to generate realistic sound from sensor data available only on the digital piano.
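The four steps above can be sketched as two chained models: one mapping touch data to an intermediary modality (MIDI velocity) and one mapping that modality to sound. The function names and the simple linear/scaling stand-ins below are hypothetical placeholders, not the project's actual trained models; only the composition structure reflects the poster.

```python
# Hypothetical sketch of the two-stage transfer pipeline (names illustrative).
# Stage 1: touch (laser key velocity) -> intermediary modality (MIDI velocity).
# Stage 2: intermediary modality -> audio samples.

def touch_to_midi(key_velocities):
    """Stand-in for the trained Laser -> MIDI model.

    Maps raw key velocities (here assumed in 0-5 m/s) to MIDI
    velocities (0-127) by linear scaling with clipping."""
    return [max(0, min(127, round(v * 127 / 5.0))) for v in key_velocities]

def midi_to_sound(midi_velocities, n_samples=4):
    """Stand-in for the MIDI -> acoustic-sound model (e.g. WaveNet-like).

    Returns a toy 'waveform' per note whose amplitude scales with velocity."""
    return [[mv / 127.0] * n_samples for mv in midi_velocities]

def digital_to_acoustic(key_velocities):
    """Full transfer: digital-piano sensor data -> acoustic-style sound."""
    return midi_to_sound(touch_to_midi(key_velocities))
```

The key design point is that the intermediary modality decouples the two stages: each model can be trained on the instrument where its target signal is actually observable.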
IMPLEMENTATION
Figure 1: Full model architecture
EXPERIMENTAL RESULTS
Figure 2: RNN architecture for Laser → MIDI training (left). RNN training result (right).
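A minimal sketch of the kind of binary-output RNN described for the Laser → MIDI task: at each time step it reads a scalar laser reading and emits a probability of a key-press event. The hidden size and weights below are arbitrary and untrained; only the recurrence and the sigmoid output head are the substantive content.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 8  # hidden-state size (assumed, for illustration)
W_xh = rng.normal(0, 0.1, (HIDDEN, 1))       # input -> hidden
W_hh = rng.normal(0, 0.1, (HIDDEN, HIDDEN))  # hidden -> hidden
W_hy = rng.normal(0, 0.1, (1, HIDDEN))       # hidden -> binary output
b_h = np.zeros((HIDDEN, 1))
b_y = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(laser_sequence):
    """Run the RNN over scalar laser readings.

    Returns the per-step probability of a key-press (note-on) event."""
    h = np.zeros((HIDDEN, 1))
    probs = []
    for x in laser_sequence:
        h = np.tanh(W_xh * x + W_hh @ h + b_h)   # recurrent state update
        probs.append(sigmoid(W_hy @ h + b_y).item())
    return probs

probs = forward([0.0, 0.2, 1.5, 3.0])
```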
Figure 3: Using a Kalman filter for the Laser → MIDI task. Right: a zoomed-in view.
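In the spirit of the Kalman-filter result above, here is a minimal scalar Kalman filter for smoothing noisy laser readings before extracting key velocity. The random-walk state model and the noise variances are assumptions for illustration, not the project's tuned values.

```python
# Scalar (random-walk) Kalman filter sketch for denoising laser readings.
# q and r are assumed process/measurement noise variances.

def kalman_smooth(measurements, q=1e-3, r=1e-1):
    """Return filtered estimates of a noisy 1-D signal."""
    x, p = measurements[0], 1.0       # initial estimate and its variance
    out = []
    for z in measurements:
        p += q                        # predict: variance grows by process noise
        k = p / (p + r)               # Kalman gain
        x += k * (z - x)              # update with measurement residual
        p *= (1.0 - k)                # posterior variance shrinks
        out.append(x)
    return out
```

Because the gain k weights each new measurement against the running estimate, isolated sensor spikes are attenuated, which plausibly explains the cleaner velocity traces in Figure 3.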
CONCLUSIONS
We have applied a multi-modal approach to the problem of predicting intermediary modalities, such as MIDI velocity, from touch data using a binary-output RNN, with greater success when a Kalman filter is used.
While the touch → MIDI training showed signs of success, more work must be done on the sound-generation side, in particular conditioning on pitch and MIDI velocity. WaveNet provides a good starting point; however, more research and modification are needed to complete the full model.
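The conditioning the conclusion calls for can be illustrated with WaveNet's gated activation unit, z = tanh(W_f·x + V_f·h) ⊙ σ(W_g·x + V_g·h), where h is a conditioning vector (e.g. pitch and MIDI velocity). The channel counts and weights below are made up; only the gating structure is the point.

```python
import numpy as np

rng = np.random.default_rng(1)
CH = 4    # residual channels (assumed)
COND = 2  # conditioning dims: e.g. (pitch, MIDI velocity), normalized

W_f, W_g = rng.normal(size=(CH, CH)), rng.normal(size=(CH, CH))
V_f, V_g = rng.normal(size=(CH, COND)), rng.normal(size=(CH, COND))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_unit(x, cond):
    """WaveNet-style gated activation with global conditioning.

    x: (CH,) layer input; cond: (COND,) conditioning vector."""
    f = np.tanh(W_f @ x + V_f @ cond)   # filter path
    g = sigmoid(W_g @ x + V_g @ cond)   # gate path
    return f * g                        # element-wise gating

# Example: condition on MIDI note 60 at velocity 100 (scaled to [0, 1]).
out = gated_unit(rng.normal(size=CH), np.array([60 / 127, 100 / 127]))
```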
FUTURE WORK
• Use a larger dataset for sound-generation training, and explore parallelization and other tools for time efficiency.
• Work on real-time conversion of the full process, from key press to sound generation.
• Consider other advanced models, such as conditional Generative Adversarial Networks, for sound generation.
ACKNOWLEDGMENTS
We would like to thank the CS229 Course Staff and Anand Avati for helpful suggestions.