Deep learning and feature learning for MIR

Sander Dieleman, July 23rd, 2014

labrosa.ee.columbia.edu/cuneuralnet/dieleman072314.pdf


PhD student at Ghent University

Graduating in December 2014

Currently interning in NYC

http://reslab.elis.ugent.be

http://github.com/benanne

http://benanne.github.io

Working on audio-based music classification, recommendation, …


I. Multiscale music audio feature learning

II. Deep content-based music recommendation

III. End-to-end learning for music audio

IV. Transfer learning by supervised pre-training

V. More music recommendation + demo

I. Multiscale music audio feature learning

Feature learning is receiving more attention from the MIR community

Inspired by good results in:

- speech recognition

- computer vision, image classification

- NLP, machine translation

- …

Music exhibits structure on many different timescales

- Musical form

- Themes

- Motifs

- Periodic waveforms

K-means for feature learning: cluster centers are features

Spherical K-means: means lie on the unit sphere, have a unit L2 norm

+ conceptually very simple

+ only one parameter to tune: number of means

+ orders of magnitude faster than RBMs, autoencoders, sparse coding

(Coates and Ng, 2012)
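The procedure can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the talk: the function name `spherical_kmeans`, the initialisation from random data points, the iteration count and the damped update are assumptions based on one reading of Coates and Ng (2012).

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=10, seed=0):
    """Learn k unit-norm means (rows of D) from the rows of X."""
    rng = np.random.default_rng(seed)
    # Assumption: inputs are scaled to unit length before clustering.
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    # Assumption: initialise the means with randomly chosen data points.
    D = X[rng.choice(len(X), size=k, replace=False)].copy()
    rows = np.arange(len(X))
    for _ in range(n_iter):
        sims = X @ D.T                       # dot products, shape (n, k)
        best = np.abs(sims).argmax(axis=1)   # one-of-K assignment per point
        S = np.zeros_like(sims)
        S[rows, best] = sims[rows, best]
        D = S.T @ X + D                      # accumulate the assigned points
        # Project every mean back onto the unit sphere.
        D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-8
    return D
```

The only hyperparameter that really matters here is `k`, the number of means, which matches the "one parameter to tune" point above.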

Spherical K-means features work well with linear feature encoding

During training: one-of-K encoding, e.g. 0 0 1.7 0

During feature extraction: linear encoding, e.g. -0.2 2.3 1.7 0.7

Feature extraction is a convolution operation (a filter applied across the input data)

(Coates and Ng, 2012)
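The two encodings differ only in what is kept of the dot products between the input and the means. The sketch below is illustrative: the function names are hypothetical, and selecting the strongest response by absolute value is an assumption taken from Coates and Ng (2012). Applying the same means to every window of the input is what makes feature extraction a convolution along time.

```python
import numpy as np

def linear_encoding(frames, D):
    """Dot product of every frame with every mean: shape (n_frames, k)."""
    return frames @ D.T

def one_of_k_encoding(frames, D):
    """Keep only the strongest response per frame and zero the others."""
    z = linear_encoding(frames, D)
    out = np.zeros_like(z)
    rows = np.arange(len(z))
    best = np.abs(z).argmax(axis=1)   # assumption: strongest = largest magnitude
    out[rows, best] = z[rows, best]
    return out
```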

Multiresolution spectrograms: different window sizes

Window sizes, from fine to coarse: 1024, 2048, 4096, 8192 samples

(Hamel et al., 2012)
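As a rough sketch, multiresolution spectrograms can be computed by running the same short-time Fourier transform with each window size. The Hann window and the 50% overlap are assumptions made for this example, as are the function names.

```python
import numpy as np

def magnitude_spectrogram(signal, window_size):
    """Magnitude STFT with shape (n_frames, window_size // 2 + 1)."""
    hop = window_size // 2                    # assumption: 50% overlap
    window = np.hanning(window_size)          # assumption: Hann window
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + window_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

def multiresolution_spectrograms(signal, window_sizes=(1024, 2048, 4096, 8192)):
    """One spectrogram per window size, keyed by window size."""
    return {w: magnitude_spectrogram(signal, w) for w in window_sizes}
```

Larger windows give fewer frames with more frequency bins, so each spectrogram trades time resolution for frequency resolution.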

Gaussian pyramid: repeated smoothing and subsampling

Fine → smooth and subsample /2 → smooth and subsample /2 → smooth and subsample /2 → coarse

(Burt and Adelson, 1983)

Laplacian pyramid: difference between levels of the Gaussian pyramid

(Burt and Adelson, 1983)
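Both pyramids can be sketched for a spectrogram stored as a (time, frequency) array, smoothing and subsampling along the time axis only. The 5-tap binomial kernel, the upsampling by repetition and the function names are assumptions chosen for brevity; they are not taken from the slides.

```python
import numpy as np

# Assumption: a 5-tap binomial kernel as a cheap approximation of a Gaussian.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def smooth(x):
    """Smooth along the time axis (axis 0); needs at least 5 time steps."""
    return np.apply_along_axis(
        lambda col: np.convolve(col, KERNEL, mode="same"), 0, x)

def gaussian_pyramid(x, n_levels=4):
    """Level 0 is the input; each further level is smoothed and halved."""
    levels = [x]
    for _ in range(n_levels - 1):
        levels.append(smooth(levels[-1])[::2])
    return levels

def laplacian_pyramid(x, n_levels=4):
    """Differences between consecutive Gaussian levels, plus the coarsest level."""
    g = gaussian_pyramid(x, n_levels)
    lap = []
    for fine, coarse in zip(g[:-1], g[1:]):
        # Bring the coarse level back to the fine resolution before subtracting.
        upsampled = np.repeat(coarse, 2, axis=0)[: len(fine)]
        lap.append(fine - smooth(upsampled))
    lap.append(g[-1])
    return lap
```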

Our approach: feature learning on multiple timescales

Task: tag prediction on the Magnatagatune dataset

25863 clips of 29 seconds, annotated with 188 tags

Tags are versatile: genre, tempo, instrumentation, dynamics, …

We trained a multilayer perceptron (MLP):

• 1000 rectified linear hidden units

• cross-entropy objective

• predict 50 most common tags

(Law and von Ahn, 2009)
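A forward pass for such a network could look like the sketch below. The hidden layer size and number of tags follow the slide; the sigmoid outputs, the per-tag binary cross-entropy, the weight initialisation and all names are assumptions made for this illustration (tag prediction is multi-label, so independent sigmoids are a natural choice).

```python
import numpy as np

def init_mlp(n_in, n_hidden=1000, n_out=50, seed=0):
    """Small random weights; the scale 0.01 is an arbitrary assumption."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.01, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.01, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Rectified linear hidden layer followed by one sigmoid output per tag."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])
    logits = h @ params["W2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits))

def cross_entropy(p, t, eps=1e-7):
    """Mean binary cross-entropy between predictions p and 0/1 targets t."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
```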

Results: tag prediction on the Magnatagatune dataset

Results: importance of each timescale for different types of tags

Learning features at multiple timescales improves performance over single-timescale approaches

Spherical K-means features consistently improve performance

II. Deep content-based music recommendation

Music recommendation is becoming an increasingly relevant problem

Shift to digital distribution → long tail

The long tail is particularly long for music

Collaborative filtering: use listening patterns for recommendation

+ good performance

- cold start problem

many niche items that only appeal to a small audience

Content-based: use audio content and/or metadata for recommendation

- worse performance

+ no usage data required

allows for all items to be recommended regardless of popularity

There is a large semantic gap between audio signals and listener preference

[Figure: audio signals, with the factors genre, popularity, time, lyrical themes, mood, instrumentation, location]

Matrix Factorization: model listening data as a product of latent factors

R = X Y^T

R (users × songs): listening data, play counts

X (users × factors): user profiles, latent factors

Y (songs × factors): song profiles, latent factors

Weighted Matrix Factorization: latent factor model for implicit feedback data

Play count > 0 is a strong positive signal

Play count = 0 is a weak negative signal

WMF uses a confidence matrix to emphasize positive signals

$$\min_{x_\star, y_\star} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^T y_i \right)^2$$

Hu et al., ICDM 2008
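The WMF objective is a confidence-weighted squared error between binary preferences and the dot products of user and song factors. The sketch below evaluates it with NumPy. The linear confidence c_ui = 1 + alpha * r_ui is one of the choices described by Hu et al. (2008) and is an assumption here, as are the value of `alpha`, the function name and the omission of any regularisation term.

```python
import numpy as np

def wmf_loss(R, X, Y, alpha=2.0):
    """Confidence-weighted squared error for play counts R (users x songs).

    X: user factors (users x factors), Y: song factors (songs x factors).
    """
    P = (R > 0).astype(float)   # preference p_ui: 1 if the song was played
    C = 1.0 + alpha * R         # confidence c_ui (assumed linear in play count)
    E = P - X @ Y.T             # residual p_ui - x_u^T y_i
    return float(np.sum(C * E ** 2))
```

Unplayed songs still contribute to the loss, but with the minimum confidence of 1, which is how the "weak negative signal" enters the model.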

We predict latent factors from music audio signals

[Figure: R (users × songs) = X Y^T; a regression model maps audio signals to latent factors]

Deep learning approach: convolutional neural network

[Figure: convolutional neural network on spectrograms (excerpts of ~3 s, 128 frequency bins) with alternating convolution and max-pooling layers along the time axis]

The Million Song Dataset provides metadata for 1,000,000 songs

+ Echo Nest Taste profile subset: listening data from 1.1m users for 380k songs

+ 7digital: raw audio clips (over 99% of dataset)

Bertin-Mahieux et al., ISMIR 2011

Quantitative evaluation: music recommendation performance

Subset (9330 songs, 20000 users)

Model                    mAP@500  AUC
Metric learning to rank  0.01801  0.60608
Linear regression        0.02389  0.63518
Multilayer perceptron    0.02536  0.64611
CNN with MSE             0.05016  0.70987
CNN with WPE             0.04323  0.70101

Quantitative evaluation: music recommendation performance

Full dataset (382,410 songs, 1m users)

Model              mAP@500  AUC
Random             0.00015  0.49935
Linear regression  0.00101  0.64522
CNN with MSE       0.00672  0.77192
Upper bound        0.23278  0.96070

Qualitative evaluation: some queries and their closest matches

Query: Jonas Brothers - Hold On

Most similar tracks (WMF):
Jonas Brothers - Games
Miley Cyrus - G.N.O. (Girl’s Night Out)
Miley Cyrus - Girls Just Wanna Have Fun
Jonas Brothers - Year 3000
Jonas Brothers - BB Good

Most similar tracks (predicted):
Jonas Brothers - Video Girl
Jonas Brothers - Games
New Found Glory - My Friends Over You
My Chemical Romance - Thank You For The Venom
My Chemical Romance - Teenagers

Qualitative evaluation: some queries and their closest matches

Query: Coldplay - I Ran Away

Most similar tracks (WMF):
Coldplay - Careful Where You Stand
Coldplay - The Goldrush
Coldplay - X & Y
Coldplay - Square One
Jonas Brothers - BB Good

Most similar tracks (predicted):
Arcade Fire - Keep The Car Running
M83 - You Appearing
Angus & Julia Stone - Hollywood
Bon Iver - Creature Fear
Coldplay - The Goldrush

Qualitative evaluation: some queries and their closest matches

Query: Beyonce - Speechless

Most similar tracks (WMF):
Beyonce - Gift From Virgo
Beyonce - Daddy
Rihanna / J-Status - Crazy Little Thing Called ...
Beyonce - Dangerously In Love
Rihanna - Haunted

Most similar tracks (predicted):
Daniel Bedingfield - If You’re Not The One
Rihanna - Haunted
Alejandro Sanz - Siempre Es De Noche
Madonna - Miles Away
Lil Wayne / Shanell - American Star

Qualitative evaluation: some queries and their closest matches

Query: Daft Punk - Rock’n Roll

Most similar tracks (WMF):
Daft Punk - Short Circuit
Daft Punk - Nightvision
Daft Punk - Too Long
Daft Punk - Aerodynamite
Daft Punk - One More Time

Most similar tracks (predicted):
Boys Noize - Shine Shine
Boys Noize - Lava Lava
Flying Lotus - Pet Monster Shotglass
LCD Soundsystem - One Touch
Justice - One Minute To Midnight

Qualitative evaluation: visualisation of predicted usage patterns (t-SNE)

McFee et al., TASLP 2012


Predicting latent factors is a viable method for music recommendation

III. End-to-end learning for music audio

The traditional two-stage approach: feature extraction + shallow classifier

Extract features (SIFT, HOG, …) → shallow classifier (SVM, RF, …)

Extract features (MFCCs, chroma, …) → shallow classifier (SVM, RF, …)

Integrated approach: learn both the features and the classifier

Learn features + classifier

Extract mid-level representation (spectrograms, constant-Q) → learn features + classifier

End-to-end learning: try to remove the mid-level representation

Convnets can learn the features and the classifier simultaneously

[Figure: network layers labelled features, features, features, predictions]

We use log-scaled mel spectrograms as a mid-level representation

hop size = window size / 2

X(f) = |STFT[x(t)]|^2

mel binning: X'(f) = M X(f)

logarithmic loudness (DRC): X''(f) = log(1 + C X'(f))
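The three steps (power spectrogram, mel binning, logarithmic compression) can be sketched as follows, with a hop size of half the window size as on the slide. Everything else is an assumption made for this example: the sample rate, window size, number of mel bins, the constant `C`, the Hann window, the common 2595 log10(1 + f/700) mel formula and the triangular filter shape.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters M with shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    M = np.zeros((n_mels, len(freqs)))
    for i in range(n_mels):
        lo, mid, hi = edges[i : i + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        M[i] = np.maximum(0.0, np.minimum(rising, falling))
    return M

def log_mel_spectrogram(x, sr=16000, window_size=1024, n_mels=128, C=10000.0):
    hop = window_size // 2                                   # hop size = window size / 2
    window = np.hanning(window_size)
    n_frames = 1 + (len(x) - window_size) // hop
    frames = np.stack([
        x[i * hop : i * hop + window_size] * window for i in range(n_frames)
    ])
    X = np.abs(np.fft.rfft(frames, axis=1)) ** 2             # X(f) = |STFT[x(t)]|^2
    X_mel = X @ mel_filterbank(n_mels, window_size, sr).T    # X'(f) = M X(f)
    return np.log(1.0 + C * X_mel)                           # X''(f) = log(1 + C X'(f))
```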

Evaluation: tag prediction on Magnatagatune

25863 clips of 29 seconds, annotated with 188 tags

Tags are versatile: genre, tempo, instrumentation, dynamics, …


Spectrograms vs. raw audio signals

Length  Stride  AUC (spectrograms)  AUC (raw audio)
1024    1024    0.8690              0.8366
1024    512     0.8726              0.8365
512     512     0.8793              0.8386
512     256     0.8793              0.8408
256     256     0.8815              0.8487

The learned filters are mostly frequency-selective (and noisy)

Their dominant frequencies resemble the mel scale

Changing the nonlinearity to introduce compression does not help

Nonlinearity                 AUC (raw audio)
Rectified linear, max(0, x)  0.8366
Logarithmic, log(1 + C x^2)  0.7508
Logarithmic, log(1 + C |x|)  0.7487

Adding a feature pooling layer lets the network learn invariances

Pooling method  Pool size  AUC (raw audio)
No pooling      1          0.8366
L2 pooling      2          0.8387
L2 pooling      4          0.8387
Max pooling     2          0.8183
Max pooling     4          0.8280
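Feature pooling combines the outputs of several filters into one value, instead of pooling over time. A minimal sketch is given below, assuming that activations are stored as a (frames, filters) array and that pools are formed from adjacent filters; the function name and this layout are hypothetical.

```python
import numpy as np

def feature_pooling(activations, pool_size, method="l2"):
    """Pool groups of `pool_size` adjacent filters in a (frames, filters) array."""
    n_frames, n_filters = activations.shape
    if n_filters % pool_size != 0:
        raise ValueError("number of filters must be divisible by pool_size")
    groups = activations.reshape(n_frames, n_filters // pool_size, pool_size)
    if method == "l2":
        return np.sqrt(np.sum(groups ** 2, axis=2))   # L2 norm within each pool
    if method == "max":
        return groups.max(axis=2)                     # largest response in each pool
    raise ValueError("method must be 'l2' or 'max'")
```

With a pool size of 1, L2 pooling reduces to the absolute value of each activation, so the "no pooling" row corresponds to skipping this layer altogether.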

The pools consist of filters that are shifted versions of each other

Learning features from raw audio is possible, but this doesn’t work as well as using spectrograms (yet).

IV. Transfer learning by supervised pre-training

Supervised feature learning

[Figure: network mapping an input to output class labels (dog, cat, rabbit, penguin, car, table); the layers in between provide the features]

Supervised feature learning for MIR tasks

lots of training data for:

- automatic tagging

- user listening preference prediction (i.e. recommendation)

Dataset        Task                  Classes
GTZAN          genre classification  10 genres
Unique         genre classification  14 genres
1517-artists   genre classification  19 genres
Magnatagatune  automatic tagging     188 tags

Tag and listening prediction differ from typical classification tasks

- multi-label classification

- large number of classes (tags, users)

- weak labeling

- redundancy

- sparsity

use WMF for label space dimensionality reduction

Schematic overview

Source task results: user listening preference prediction

Model                  NMSE   AUC    mAP
Linear regression      0.986  0.75   0.0076
MLP (1 hidden layer)   0.971  0.76   0.0149
MLP (2 hidden layers)  0.961  0.746  0.0186

Source task results: tag prediction

Model                  NMSE   AUC    mAP
Linear regression      0.965  0.823  0.0099
MLP (1 hidden layer)   0.939  0.841  0.0179
MLP (2 hidden layers)  0.924  0.837  0.0179

Target task results: GTZAN genre classification

Target task results: Unique genre classification

Target task results: 1517-artists genre classification

Target task results: Magnatagatune auto-tagging (50)

Target task results: Magnatagatune auto-tagging (188)

V. More music recommendation

[Figure: convolutional network architecture. Input: spectrograms (30 seconds), labelled 128 and 599. Convolutions with filters of size 4 alternate with max-pooling (MP) by 4x, 2x and 2x; intermediate sizes are labelled 256 × 149, 256 × 73 and 512 × 35. Global temporal pooling (mean, L2, max) yields 1536 values, followed by layers of 2048 and 2048 units and an output of 40 latent factors.]
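The sizes in the figure are consistent with a simple calculation, assuming unpadded convolutions with stride 1 and non-overlapping max-pooling (neither assumption is stated on the slide, and the helper names below are hypothetical).

```python
def conv_out(n, filter_size):
    """Output length of an unpadded, stride-1 convolution."""
    return n - filter_size + 1

def pool_out(n, pool_size):
    """Output length of non-overlapping pooling."""
    return n // pool_size

def temporal_sizes(n_frames=599, filter_size=4, pools=(4, 2, 2)):
    """Number of time steps after each convolution + max-pooling stage."""
    sizes = []
    for pool in pools:
        n_frames = pool_out(conv_out(n_frames, filter_size), pool)
        sizes.append(n_frames)
    return sizes

# 599 -> 596 -> 149, 149 -> 146 -> 73, 73 -> 70 -> 35
print(temporal_sizes())

# Three global pooling functions (mean, L2, max) over 512 feature maps.
print(3 * 512)
```

Under these assumptions the three stages give 149, 73 and 35 time steps, and concatenating the three pooled vectors gives 1536 values, which matches the labels in the figure.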


DEMO


Papers

Multiscale approaches to music audio feature learning. Sander Dieleman, Benjamin Schrauwen, ISMIR 2013

Deep content-based music recommendation. Aäron van den Oord, Sander Dieleman, Benjamin Schrauwen, NIPS 2013

End-to-end learning for music audio. Sander Dieleman, Benjamin Schrauwen, ICASSP 2014

Transfer learning by supervised pre-training for audio-based music classification. Aäron van den Oord, Sander Dieleman, Benjamin Schrauwen, ISMIR 2014