
GRAND DIGITAL PIANO: MULTI-MODAL TRANSFER OF LEARNING

SANCHES SHAYDA ERNESTO EVGENIY (ESANCHES), ILKYU LEE (LQLEE)

INTRODUCTION

Reproduction of the touch and sound of real instruments has been a challenging problem throughout the years. For example, while current digital pianos attempt to achieve this via simplified, finite sample-based methods, there is still much room for improvement with regard to mimicking an acoustic piano.

In this study, we aim to improve upon such methods by utilizing machine learning to train a physically-based model of an acoustic piano, allowing for the generation of novel, more realistic sounds.

OUR APPROACH

We utilize a multi-modal transfer-of-learning method, wherein touch data captured on a digital piano can ultimately generate the realistic sound of an acoustic piano:

1. A laser sensor on the digital piano gathers touch data in the form of key velocity, which is used to train a model that predicts intermediary modalities such as MIDI.

2. The model is then applied to the acoustic piano, taking touch data as input to predict the intermediary modalities that are not naturally available on acoustic pianos.

3. With these intermediary modalities as input, another model is trained to predict the sound of the acoustic piano.

4. Finally, this model is executed on the digital piano to generate realistic sound from sensor data available only on the digital piano (see the code sketch below).
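
Conceptually, the pipeline chains two learned mappings. The following is a minimal sketch of that data flow, assuming placeholder data shapes and generic scikit-learn regressors; the names touch_to_midi and midi_to_sound are hypothetical, and our actual models are sequence models rather than feed-forward regressors.

```python
# Minimal sketch of the four-step transfer pipeline described above.
# Model names, array shapes, and the use of scikit-learn regressors are
# illustrative assumptions, not the project's actual implementation.
import numpy as np
from sklearn.neural_network import MLPRegressor

# --- Step 1: train touch -> MIDI on the digital piano --------------------
laser_digital = np.random.rand(1000, 8)    # placeholder laser key-velocity features
midi_digital = np.random.rand(1000)        # placeholder MIDI velocities from the digital piano
touch_to_midi = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500)
touch_to_midi.fit(laser_digital, midi_digital)

# --- Step 2: predict the missing intermediary modality on the acoustic piano
laser_acoustic = np.random.rand(1000, 8)   # touch data measured on the acoustic piano
pseudo_midi_acoustic = touch_to_midi.predict(laser_acoustic)

# --- Step 3: train MIDI -> sound on the acoustic piano -------------------
audio_acoustic = np.random.rand(1000, 32)  # placeholder audio features of the acoustic piano
midi_to_sound = MLPRegressor(hidden_layer_sizes=(128,), max_iter=500)
midi_to_sound.fit(pseudo_midi_acoustic.reshape(-1, 1), audio_acoustic)

# --- Step 4: run the full chain on the digital piano ---------------------
new_laser = np.random.rand(10, 8)          # fresh sensor readings, digital piano only
generated_audio = midi_to_sound.predict(
    touch_to_midi.predict(new_laser).reshape(-1, 1))
print(generated_audio.shape)               # (10, 32)
```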

IMPLEMENTATION

Figure 1: Full model architecture

EXPERIMENTAL RESULTS

Figure 2: RNN architecture for Laser → MIDI training (Left). RNN training result (Right).
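
The poster does not spell out the exact recurrent architecture, but a binary-output RNN of the kind referenced in the conclusions could look like the sketch below; the GRU size, sequence length, and the class name LaserToMidiRNN are assumptions.

```python
# Hedged sketch of a binary-output RNN for the Laser -> MIDI task (Figure 2).
# Layer sizes and the choice of a GRU are assumptions, not the exact architecture.
import torch
import torch.nn as nn

class LaserToMidiRNN(nn.Module):
    def __init__(self, n_sensor_features=8, hidden_size=64):
        super().__init__()
        self.rnn = nn.GRU(n_sensor_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)     # one binary output per time step

    def forward(self, x):
        # x: (batch, time, n_sensor_features) laser key-velocity readings
        h, _ = self.rnn(x)
        return torch.sigmoid(self.head(h))        # (batch, time, 1) event probabilities

model = LaserToMidiRNN()
laser_seq = torch.randn(4, 200, 8)                    # placeholder sensor sequences
midi_events = (torch.rand(4, 200, 1) > 0.9).float()   # placeholder note-event labels
loss = nn.BCELoss()(model(laser_seq), midi_events)
loss.backward()
print(loss.item())
```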

Figure 3: Using a Kalman filter for the Laser → MIDI task. Right: a zoomed-in view.
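
As an illustration of the Kalman-filter alternative, the sketch below runs a standard constant-velocity Kalman filter over noisy laser position samples to recover key velocity, which could then be mapped to MIDI velocity. The state model and the noise parameters (dt, q, r) are illustrative assumptions, not values from the project.

```python
# Constant-velocity Kalman filter over noisy laser position readings.
# All parameters are illustrative assumptions.
import numpy as np

def kalman_key_velocity(positions, dt=0.001, q=1e-3, r=1e-2):
    """Estimate key velocity from noisy laser position samples."""
    x = np.zeros(2)                        # state: [position, velocity]
    P = np.eye(2)                          # state covariance
    F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity transition
    H = np.array([[1.0, 0.0]])             # only position is observed
    Q = q * np.eye(2)                      # process noise
    R = np.array([[r]])                    # measurement noise
    velocities = []
    for z in positions:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update
        y = z - H @ x                      # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
        x = x + (K @ y).ravel()
        P = (np.eye(2) - K @ H) @ P
        velocities.append(x[1])
    return np.array(velocities)

# Usage: MIDI velocity is typically derived from the key speed near hammer release.
noisy_positions = np.cumsum(np.full(200, 0.01)) + 0.001 * np.random.randn(200)
print(kalman_key_velocity(noisy_positions)[-1])   # estimated final key velocity
```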

CONCLUSIONS

We have utilized a multi-modal approach to the problem of predicting intermediary modalities such as MIDI velocity from touch data, using a binary-output RNN, with greater success achieved by using a Kalman filter.

While the touch → MIDI training showed signs of success, more work must be done on the sound-generation aspect, with conditioning on pitch and MIDI velocity. WaveNet provides a good starting point; however, more research and modifications are necessary to complete the entire model.
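
As a rough illustration of what conditioning on pitch and MIDI velocity could look like in a WaveNet-style model, the sketch below adds a conditioning projection to a gated dilated convolution block. The channel counts, the name ConditionedWaveNetBlock, and the conditioning scheme are assumptions rather than our implementation.

```python
# Gated dilated convolution conditioned on (pitch, velocity), WaveNet-style.
# Sizes and conditioning scheme are illustrative assumptions.
import torch
import torch.nn as nn

class ConditionedWaveNetBlock(nn.Module):
    def __init__(self, channels=32, cond_dim=2, dilation=2):
        super().__init__()
        # dilated convolution over the audio signal
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size=2,
                              dilation=dilation, padding=dilation)
        # project (pitch, velocity) conditioning into the filter and gate
        self.cond = nn.Linear(cond_dim, 2 * channels)
        self.res = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, audio, condition):
        # audio: (batch, channels, time); condition: (batch, cond_dim)
        z = self.conv(audio)[:, :, :audio.size(-1)]      # trim padding to keep causal length
        z = z + self.cond(condition).unsqueeze(-1)       # broadcast conditioning over time
        filt, gate = z.chunk(2, dim=1)
        out = torch.tanh(filt) * torch.sigmoid(gate)     # gated activation
        return audio + self.res(out)                     # residual connection

block = ConditionedWaveNetBlock()
audio = torch.randn(4, 32, 1000)
cond = torch.tensor([[60.0, 0.5]] * 4)    # (MIDI pitch, normalized velocity)
print(block(audio, cond).shape)           # torch.Size([4, 32, 1000])
```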

FUTURE WORK

• Utilize a larger dataset for sound-generation training and explore parallelization and other tools for time efficiency.

• Work on real-time conversion of the process from key press to sound generation.

• Consider other advanced models, such as conditional generative adversarial networks, for sound generation.

ACKNOWLEDGMENTS

We would like to thank the CS229 Course Staff and Anand Avati for helpful suggestions.
