Deep Learning and Feature Learning for MIR
TRANSCRIPT
-
8/10/2019 Deep Learning and Feature learning for MIR
1/71
Deep learning and feature learning for MIR
Sander Dieleman, July 23rd, 2014
-
PhD student at Ghent University (Gent, Belgium)
Graduating in December 2014
Currently interning in NYC
Working on audio-based music classification and recommendation
http://reslab.elis.ugent.be
http://github.com/benanne
http://benanne.github.io
-
I. Multiscale music audio feature learning
II. Deep content-based music recommendation
III. End-to-end learning for music audio
IV. Transfer learning by supervised pre-training
V. More music recommendation + demo
-
I. Multiscale music audio feature learning
-
Feature learning is receiving more attention from the MIR community
Inspired by good results in:
speech recognition
computer vision, image classification
NLP, machine translation
-
Music exhibits structure on many different timescales:
Musical form
Themes
Motifs
Periodic waveforms
-
K-means for feature learning: cluster centers are features
Spherical K-means: the means lie on the unit sphere, i.e. they have unit L2 norm
+ conceptually very simple
+ only one parameter to tune: the number of means
+ orders of magnitude faster than RBMs, autoencoders, sparse coding
(Coates and Ng, 2012)
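As a rough sketch of how spherical K-means could be implemented (a hypothetical minimal version, not the authors' code; the alternating assignment/update scheme follows Coates and Ng, 2012):

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=20, seed=0):
    """Spherical K-means sketch: centroids are constrained to unit L2
    norm, and points are assigned to the centroid with the largest dot
    product. X is assumed to hold L2-normalized (whitened) patches."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as random unit vectors.
    D = rng.standard_normal((k, X.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: maximize cosine similarity (dot product).
        labels = (X @ D.T).argmax(axis=1)
        # Update step: sum the assigned points, renormalize to the sphere.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                v = members.sum(axis=0)
                norm = np.linalg.norm(v)
                if norm > 0:
                    D[j] = v / norm
    return D, labels
```

The only tunable parameter is k, the number of means, which is what makes the method attractive compared to RBMs or sparse coding.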
-
Spherical K-means features work well with a linear feature encoding
Feature extraction is a convolution of the input data with the learned filters
During training: one-of-K (hard) assignment, e.g. [0, 0, 1.7, 0]
During feature extraction: linear encoding, e.g. [-0.2, 2.3, 1.7, 0.7]
(Coates and Ng, 2012)
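A minimal sketch of the linear encoding step, assuming a 1-D input and a matrix of learned filters (names illustrative): feature extraction is just a "valid" correlation of the input with each filter.

```python
import numpy as np

def encode_linear(x, filters):
    """Linear feature encoding: correlate the 1-D input with each
    learned filter (a 'valid'-mode convolution), producing one
    activation sequence per filter."""
    return np.stack([np.correlate(x, f, mode="valid") for f in filters])
```

Unlike the one-of-K assignment used during training, this keeps the full (soft) filter responses, which is what gets fed to the classifier.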
-
Multiresolution spectrograms: different window sizes
Window sizes range from fine to coarse: 1024, 2048, 4096 and 8192 samples
(Hamel et al., 2012)
-
Gaussian pyramid: repeated smoothing and subsampling
Each coarser level is obtained from the finer one below it by smoothing and subsampling by a factor of 2
(Burt and Adelson, 1983)
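The construction can be sketched in a few lines (a simplified 1-D version; the 3-tap binomial kernel here stands in for the 5-tap generating kernel of Burt and Adelson):

```python
import numpy as np

def gaussian_pyramid(x, levels=3, kernel=(0.25, 0.5, 0.25)):
    """Gaussian pyramid for a 1-D signal: repeatedly smooth with a
    small binomial kernel, then subsample by a factor of 2."""
    pyramid = [np.asarray(x, dtype=float)]
    k = np.asarray(kernel)
    for _ in range(levels):
        smoothed = np.convolve(pyramid[-1], k, mode="same")
        pyramid.append(smoothed[::2])   # subsample /2
    return pyramid
```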
-
Laplacian pyramid: the difference between successive levels of the Gaussian pyramid (subtract each coarser level, upsampled, from the finer one)
(Burt and Adelson, 1983)
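A sketch under simplifying assumptions (1-D signal, nearest-neighbour upsampling rather than the kernel interpolation used by Burt and Adelson); keeping the coarsest Gaussian level makes the decomposition invertible:

```python
import numpy as np

def laplacian_pyramid(x, levels=3, kernel=(0.25, 0.5, 0.25)):
    """Laplacian pyramid sketch: each level is a Gaussian-pyramid level
    minus the next (coarser) level upsampled back to its size. The
    coarsest Gaussian level is appended so x can be reconstructed."""
    k = np.asarray(kernel)
    gauss = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        gauss.append(np.convolve(gauss[-1], k, mode="same")[::2])
    lap = []
    for fine, coarse in zip(gauss[:-1], gauss[1:]):
        up = np.repeat(coarse, 2)[:len(fine)]   # crude upsample to match
        lap.append(fine - up)
    lap.append(gauss[-1])
    return lap
```

Because each level stores only the residual against the coarser one, summing the levels back up (upsampling as you go) recovers the original signal exactly.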
-
Our approach: feature learning on multiple timescales
-
Task: tag prediction on the Magnatagatune dataset
25863 clips of 29 seconds, annotated with 188 tags (Law and von Ahn, 2009)
Tags are versatile: genre, tempo, instrumentation, dynamics, ...
We trained a multilayer perceptron (MLP):
1000 rectified linear hidden units
cross-entropy objective
predict the 50 most common tags
-
Results: tag prediction on the Magnatagatune dataset
-
Results: importance of each timescale for different types of tags
-
Learning features at multiple timescales improves performance over single-timescale approaches
Spherical K-means features consistently improve performance
-
II. Deep content-based music recommendation
-
Music recommendation is becoming an increasingly relevant problem
The shift to digital distribution has created a long tail of items
The long tail is particularly long for music
-
Collaborative filtering: use listening patterns for recommendation
+ good performance
- cold start problem: many niche items only appeal to a small audience
-
Content-based: use audio content and/or metadata for recommendation
+ no usage data required
- worse performance
Allows all items to be recommended, regardless of popularity
-
There is a large semantic gap between audio signals and listener preference
Preference depends on factors such as genre, popularity, time, lyrical themes, mood, instrumentation and location
-
Matrix factorization: model listening data as a product of latent factors
R ≈ XY^T, where R is the users × songs matrix of listening data (play counts), the rows of X are user profiles (latent factor vectors) and the rows of Y are song profiles (latent factor vectors)
-
Weighted matrix factorization (WMF): a latent factor model for implicit feedback data
Play count > 0 is a strong positive signal
Play count = 0 is a weak negative signal
WMF uses a confidence matrix to emphasize positive signals:

\min_{x_*, y_*} \sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)

(Hu et al., ICDM 2008)
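The objective can be written down directly in code; a minimal sketch (the confidence function c = 1 + alpha * plays and the values of alpha and lam are illustrative choices in the spirit of Hu et al., not taken from the talk):

```python
import numpy as np

def wmf_loss(X, Y, plays, alpha=40.0, lam=0.01):
    """Weighted matrix factorization objective (Hu et al., 2008).
    plays: (users, songs) play-count matrix.
    p = binarized preferences, c = confidence = 1 + alpha * plays."""
    P = (plays > 0).astype(float)       # preference p_ui
    C = 1.0 + alpha * plays             # confidence c_ui
    E = P - X @ Y.T                     # prediction error
    data_term = np.sum(C * E**2)
    reg_term = lam * (np.sum(X**2) + np.sum(Y**2))
    return data_term + reg_term
```

In practice this objective is minimized with alternating least squares, which stays tractable because all the zero-count entries share the same low confidence.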
-
We predict latent factors from music audio signals
A regression model maps audio signals to the song latent factor vectors Y of the factorization R ≈ XY^T
-
Deep learning approach: convolutional neural network
[Architecture diagram: spectrogram input of ~3 seconds with 128 frequency bins, followed by alternating convolution and max-pooling layers over the time axis]
-
The Million Song Dataset provides metadata for 1,000,000 songs
+ Echo Nest Taste Profile subset: listening data from 1.1M users for 380K songs
+ 7digital: raw audio clips (for over 99% of the dataset)
(Bertin-Mahieux et al., ISMIR 2011)
-
Quantitative evaluation: music recommendation performance
Subset (9330 songs, 20000 users)

Model                      mAP@500    AUC
Metric learning to rank    0.01801    0.60608
Linear regression          0.02389    0.63518
Multilayer perceptron      0.02536    0.64611
CNN with MSE               0.05016    0.70987
CNN with WPE               0.04323    0.70101
-
Quantitative evaluation: music recommendation performance
Full dataset (382,410 songs, 1M users)

Model               mAP@500    AUC
Random              0.00015    0.49935
Linear regression   0.00101    0.64522
CNN with MSE        0.00672    0.77192
Upper bound         0.23278    0.96070
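For reference, the AUC column can be computed via the rank interpretation of the ROC area: the probability that a randomly chosen positive item is scored above a randomly chosen negative one. A small illustrative implementation (quadratic in the number of items, for clarity only):

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the pairwise rank statistic:
    fraction of (positive, negative) pairs where the positive item
    gets the higher score (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```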
-
Qualitative evaluation: some queries and their closest matches

Query: Coldplay - I Ran Away
Most similar tracks (WMF): Coldplay - Careful Where You Stand; Coldplay - The Goldrush; Coldplay - X & Y; Coldplay - Square One; Jonas Brothers - BB Good
Most similar tracks (predicted): Arcade Fire - Keep The Car Running; M83 - You Appearing; Angus & Julia Stone - Hollywood; Bon Iver - Creature Fear; Coldplay - The Goldrush
-
Qualitative evaluation: some queries and their closest matches

Query: Beyonce - Speechless
Most similar tracks (WMF): Beyonce - Gift From Virgo; Beyonce - Daddy; Rihanna / J-Status - Crazy Little Thing Called ...; Beyonce - Dangerously In Love; Rihanna - Haunted
Most similar tracks (predicted): Daniel Bedingfield - If You're Not The One; Rihanna - Haunted; Alejandro Sanz - Siempre Es De Noche; Madonna - Miles Away; Lil Wayne / Shanell - American Star
-
Qualitative evaluation: some queries and their closest matches

Query: Daft Punk - Rock'n Roll
Most similar tracks (WMF): Daft Punk - Short Circuit; Daft Punk - Nightvision; Daft Punk - Too Long; Daft Punk - Aerodynamite; Daft Punk - One More Time
Most similar tracks (predicted): Boys Noize - Shine Shine; Boys Noize - Lava Lava; Flying Lotus - Pet Monster Shotglass; LCD Soundsystem - One Touch; Justice - One Minute To Midnight
-
Qualitative evaluation: visualisation of predicted usage patterns (t-SNE)
(McFee et al., TASLP 2012)
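A sketch of how such a map could be produced with scikit-learn's TSNE; the factor matrix below is random stand-in data (the real input would be the CNN's predicted latent factor vectors), and the dimensions are illustrative:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for predicted latent factors: 200 songs x 40 factors.
rng = np.random.default_rng(0)
factors = rng.standard_normal((200, 40))

# t-SNE embeds the factors in 2-D for plotting; perplexity is the
# main knob and must be smaller than the number of points.
coords = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=0).fit_transform(factors)
print(coords.shape)
```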
-
Predicting latent factors is a viable method for music recommendation
-
III. End-to-end learning for music audio
-
The traditional two-stage approach: feature extraction + shallow classifier
Vision: extract features (SIFT, HOG, ...), then train a shallow classifier (SVM, RF, ...)
Audio: extract features (MFCCs, chroma, ...), then train a shallow classifier (SVM, RF, ...)
-
Integrated approach: learn both the features and the classifier
Extract a mid-level representation (spectrograms, constant-Q), then learn the features and the classifier together
End-to-end learning: try to remove the mid-level representation as well
-
Convnets can learn the features and the classifier simultaneously
[Diagram: stacked feature-learning layers followed by a prediction layer]
-
We use log-scaled mel spectrograms as a mid-level representation
hop size = window size / 2
power spectrogram: X(f) = |STFT[x(t)]|²
logarithmic loudness compression (DRC): X(f) = log(1 + C·X(f))
mel binning: X(f) = M·X(f)
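The recipe above can be sketched in plain numpy. This hypothetical minimal version computes the windowed power spectrogram and the log compression; the mel binning step (multiplication by a mel filterbank matrix M) is omitted for brevity, and C is an illustrative constant:

```python
import numpy as np

def log_spectrogram(x, window_size=1024, C=10.0):
    """Log-scaled power spectrogram: hop = window/2, X = |STFT|^2,
    then log(1 + C*X) as loudness compression (DRC)."""
    hop = window_size // 2
    win = np.hanning(window_size)
    n_frames = 1 + (len(x) - window_size) // hop
    frames = np.stack([x[i * hop : i * hop + window_size] * win
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # |STFT|^2
    return np.log1p(C * power)                          # log(1 + C*X)
```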
-
Evaluation: tag prediction on Magnatagatune
25863 clips of 29 seconds, annotated with 188 tags
Tags are versatile: genre, tempo, instrumentation, dynamics, ...
-
Spectrograms vs. raw audio signals
Length  Stride  AUC (spectrograms)  AUC (raw audio)
1024    1024    0.8690              0.8366
1024    512     0.8726              0.8365
512     512     0.8793              0.8386
512     256     0.8793              0.8408
256     256     0.8815              0.8487
-
The learned filters are mostly frequency-selective (and noisy)
-
Their dominant frequencies resemble the mel scale
-
Changing the nonlinearity to introduce compression does not help

Nonlinearity                    AUC (raw audio)
Rectified linear, max(0, x)     0.8366
Logarithmic, log(1 + C·x²)      0.7508
Logarithmic, log(1 + C·|x|)     0.7487
-
-
Adding a feature pooling layer lets the network learn invariances

Pooling method  Pool size  AUC (raw audio)
No pooling      1          0.8366
L2 pooling      2          0.8387
L2 pooling      4          0.8387
Max pooling     2          0.8183
Max pooling     4          0.8280
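The two pooling variants compared above can be sketched as follows (a minimal illustrative version; it assumes the number of filters is divisible by the pool size):

```python
import numpy as np

def pool_features(acts, size, method="l2"):
    """Pool groups of `size` adjacent filter activations: L2 pooling
    takes the L2 norm of each pool, max pooling takes the maximum.
    A pooled unit then responds whenever any filter in its pool does,
    e.g. to shifted copies of the same filter."""
    acts = np.asarray(acts, dtype=float)
    groups = acts.reshape(-1, size)   # (n_filters // size, size)
    if method == "l2":
        return np.sqrt((groups ** 2).sum(axis=1))
    return groups.max(axis=1)
```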
-
The pools consist of filters that are shifted versions of each other
-
Learning features from raw audio is possible, but this doesn't work as well as using spectrograms (yet).
-
IV. Transfer learning by supervised pre-training
-
Supervised feature learning
[Diagram: a network trained to map inputs to labels (dog, cat, rabbit, penguin, car, table); the hidden representation between input and output can be reused as features]
-
Supervised feature learning for MIR tasks
Lots of training data is available for:
- automatic tagging
- user listening preference prediction (i.e. recommendation)

Target tasks:
GTZAN: genre classification, 10 genres
Unique: genre classification, 14 genres
1517-artists: genre classification, 19 genres
Magnatagatune: automatic tagging, 188 tags
-
Tag and listening prediction differ from typical classification tasks:
- multi-label classification
- large number of classes (tags, users)
- weak labeling
- redundancy
- sparsity
Solution: use WMF for label space dimensionality reduction
-
Schematic overview
-
Source task results: user listening preference prediction

Model                  NMSE   AUC    mAP
Linear regression      0.986  0.75   0.0076
MLP (1 hidden layer)   0.971  0.76   0.0149
MLP (2 hidden layers)  0.961  0.746  0.0186
-
Source task results: tag prediction

Model                  NMSE   AUC    mAP
Linear regression      0.965  0.823  0.0099
MLP (1 hidden layer)   0.939  0.841  0.0179
MLP (2 hidden layers)  0.924  0.837  0.0179
-
Target task results: GTZAN genre classification
-
Target task results: Unique genre classification
-
Target task results: 1517-artists genre classification
-
Target task results: Magnatagatune auto-tagging (50 tags)
-
Target task results: Magnatagatune auto-tagging (188 tags)
-
V. More music recommendation
-
[Architecture diagram: 30-second spectrograms (599 frames × 128 bins) pass through three convolutional layers (filter length 4; 256, 256 and 512 feature maps) interleaved with 4x, 2x and 2x max-pooling, then a global temporal pooling layer (mean, L2 norm and max, concatenated to 1536 values), two dense layers of 2048 units, and a 40-dimensional output of latent factors]
-
-
DEMO
-
Papers
Multiscale approaches to music audio feature learning. Sander Dieleman, Benjamin Schrauwen. ISMIR 2013.
Deep content-based music recommendation. Aäron van den Oord, Sander Dieleman, Benjamin Schrauwen. NIPS 2013.
End-to-end learning for music audio. Sander Dieleman, Benjamin Schrauwen. ICASSP 2014.
Transfer learning by supervised pre-training for audio-based music classification. Aäron van den Oord, Sander Dieleman, Benjamin Schrauwen. ISMIR 2014.