Deep Learning and Feature Learning for MIR


TRANSCRIPT

  • Deep learning and feature learning for MIR

    Sander Dieleman, July 23rd, 2014

  • PhD student at Ghent University (Gent), graduating in December 2014

    Currently interning in NYC
    Working on audio-based music classification and recommendation

    http://reslab.elis.ugent.be
    http://github.com/benanne
    http://benanne.github.io
    [email protected]

  • I. Multiscale music audio feature learning

    II. Deep content-based music recommendation

    III. End-to-end learning for music audio

    IV. Transfer learning by supervised pre-training

    V. More music recommendation + demo

  • I. Multiscale music audio feature learning

  • Feature learning is receiving more attention from the MIR community

    Inspired by good results in: speech recognition; computer vision and image classification; NLP and machine translation

  • Music exhibits structure on many different timescales

    Musical form
    Themes
    Motifs
    Periodic waveforms

  • K-means for feature learning: cluster centers are features

    Spherical K-means: means lie on the unit sphere, i.e. have unit L2 norm
    + conceptually very simple
    + only one parameter to tune: the number of means
    + orders of magnitude faster than RBMs, autoencoders, sparse coding

    (Coates and Ng, 2012)
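
    A minimal NumPy sketch of spherical K-means as described above (my own illustration under an assumed data layout, not the author's code): cluster means are re-projected onto the unit sphere after every update.

      import numpy as np

      def spherical_kmeans(X, k, n_iter=20, seed=0):
          """Spherical K-means: the cluster centers (the learned features) are
          constrained to unit L2 norm. X has shape (n_samples, n_dims) and is
          assumed to be whitened/normalized beforehand."""
          rng = np.random.default_rng(seed)
          D = rng.standard_normal((k, X.shape[1]))
          D /= np.linalg.norm(D, axis=1, keepdims=True)            # unit-norm initialization
          for _ in range(n_iter):
              assign = np.argmax(X @ D.T, axis=1)                  # closest mean by dot product
              for j in range(k):
                  members = X[assign == j]
                  if len(members):
                      D[j] = members.sum(axis=0)
              D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-8 # project back onto the unit sphere
          return D                                                 # rows are the learned filters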

  • Spherical K-means features work well with linear feature encoding

    Feature extraction is a convolution of the learned filters with the input data

    During training: one-of-K encoding, e.g. [0, 0, 1.7, 0]
    During feature extraction: linear encoding, e.g. [-0.2, 2.3, 1.7, 0.7]

    (Coates and Ng, 2012)
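
    A sketch of the linear encoding step under the assumptions above (frame-based windows, filters from spherical K-means): the features are dot products between the unit-norm filters and sliding windows of the input, i.e. a valid convolution over time.

      import numpy as np

      def linear_encode(spectrogram, D, frame_len):
          """Linear feature encoding as a convolution over time: slide a window of
          `frame_len` spectrogram frames over the input and take dot products with
          the learned filters D (rows = unit-norm K-means centroids)."""
          n_frames, n_bins = spectrogram.shape
          windows = np.stack([spectrogram[t:t + frame_len].ravel()
                              for t in range(n_frames - frame_len + 1)])
          return windows @ D.T                     # shape: (n_positions, n_filters)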

  • Multiresolution spectrograms: different window sizes

    From fine to coarse: window sizes of 1024, 2048, 4096 and 8192 samples

    (Hamel et al., 2012)
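
    A short sketch of the multiresolution idea, assuming librosa is available; the window sizes are those on the slide, the hop size is my assumption.

      import numpy as np
      import librosa

      def multires_spectrograms(y, window_sizes=(1024, 2048, 4096, 8192), hop=512):
          """Magnitude spectrograms of the same signal at several time/frequency
          resolutions, obtained by varying the STFT window size."""
          return {n_fft: np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
                  for n_fft in window_sizes}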

  • Gaussian pyramid: repeated smoothing and subsampling

    From fine to coarse: repeatedly smooth and subsample by a factor of 2

    (Burt and Adelson, 1983)

  • Laplacian pyramid: difference between levels of the Gaussian pyramid

    Each level is the corresponding Gaussian pyramid level minus the (upsampled) next, coarser level; a sketch of both pyramids follows below.

    (Burt and Adelson, 1983)
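
    A NumPy/SciPy sketch of both pyramids along the time axis of a feature matrix (my own illustration; the smoothing width and the nearest-neighbour upsampling are assumptions).

      import numpy as np
      from scipy.ndimage import gaussian_filter1d

      def gaussian_pyramid(x, n_levels=4, sigma=1.0):
          """Gaussian pyramid along the time axis (axis 0): repeatedly smooth
          and subsample by a factor of 2."""
          levels = [x]
          for _ in range(n_levels - 1):
              smoothed = gaussian_filter1d(levels[-1], sigma=sigma, axis=0)
              levels.append(smoothed[::2])                         # subsample /2
          return levels

      def laplacian_pyramid(x, n_levels=4, sigma=1.0):
          """Laplacian pyramid: each level is the difference between a Gaussian
          pyramid level and the upsampled next, coarser level."""
          gauss = gaussian_pyramid(x, n_levels, sigma)
          laps = []
          for fine, coarse in zip(gauss[:-1], gauss[1:]):
              upsampled = np.repeat(coarse, 2, axis=0)[:fine.shape[0]]
              laps.append(fine - upsampled)                        # band-pass detail
          laps.append(gauss[-1])                                   # keep the coarsest level
          return laps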

  • Our approach: feature learning on multiple timescales

  • Task: tag prediction on the Magnatagatune dataset

    25863 clips of 29 seconds, annotated with 188 tags
    Tags are versatile: genre, tempo, instrumentation, dynamics, ...

    We trained a multilayer perceptron (MLP):
    - 1000 rectified linear hidden units
    - cross-entropy objective
    - predict the 50 most common tags

    (Law and von Ahn, 2009)
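
    A minimal sketch of such a tag-prediction MLP in a modern framework (PyTorch); this is not the original implementation, and the input dimensionality is an assumption.

      import torch
      import torch.nn as nn

      n_features, n_tags = 2048, 50          # clip-level feature size (assumed), 50 tags as on the slide

      mlp = nn.Sequential(
          nn.Linear(n_features, 1000),       # 1000 hidden units
          nn.ReLU(),                         # rectified linear units
          nn.Linear(1000, n_tags),           # one logit per tag
      )
      loss_fn = nn.BCEWithLogitsLoss()       # multi-label cross-entropy objective

      x = torch.randn(32, n_features)                   # a batch of clip-level feature vectors
      y = torch.randint(0, 2, (32, n_tags)).float()     # binary tag annotations
      loss = loss_fn(mlp(x), y)
      loss.backward()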

  • Results: tag prediction on the Magnatagatune dataset

  • Results: importance of each timescale for different types of tags

  • Learning features at multiple timescales improves performance over single-timescale approaches

    Spherical K-means features consistently improve performance

  • II. Deep content-based music recommendation

  • Music recommendation is becoming an increasingly relevant problem

    Shift to digital distribution: the long tail
    The long tail is particularly long for music

  • Collaborative filtering: use listening patterns for recommendation

    + good performance
    - cold start problem: many niche items that only appeal to a small audience

  • Content-based: use audio content and/or metadata for recommendation

    + no usage data required: allows all items to be recommended, regardless of popularity
    - worse performance

  • There is a large semantic gap between audio signals and listener preference

    Factors along the gap include: genre, popularity, time, lyrical themes, mood, instrumentation, location

  • Matrix factorization: model listening data as a product of latent factors

    R ≈ X Y^T, where R (users x songs) contains the listening data (play counts),
    X contains the user profiles (latent factors) and Y the song profiles (latent factors)

  • Weighted matrix factorization (WMF): latent factor model for implicit feedback data

    Play count > 0 is a strong positive signal
    Play count = 0 is a weak negative signal
    WMF uses a confidence matrix to emphasize positive signals:

    min_{x*, y*} sum_{u,i} c_ui (p_ui - x_u^T y_i)^2 + lambda (sum_u ||x_u||^2 + sum_i ||y_i||^2)

    (Hu et al., ICDM 2008)
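
    A compact NumPy sketch of WMF trained with alternating least squares, following the Hu et al. formulation above (dense matrices for clarity; real implementations exploit sparsity, and the alpha/lambda values are assumptions).

      import numpy as np

      def wmf_als(R, n_factors=40, alpha=2.0, lam=1e-3, n_iter=10, seed=0):
          """Weighted matrix factorization by alternating least squares.
          R: (n_users, n_songs) play-count matrix."""
          P = (R > 0).astype(float)                    # binary preference p_ui
          C = 1.0 + alpha * R                          # confidence c_ui: emphasize positive signals
          rng = np.random.default_rng(seed)
          X = 0.1 * rng.standard_normal((R.shape[0], n_factors))
          Y = 0.1 * rng.standard_normal((R.shape[1], n_factors))
          I = lam * np.eye(n_factors)

          def solve(fixed, Cmat, Pmat):
              # For each row u: minimize sum_i c_ui (p_ui - x_u^T y_i)^2 + lam ||x_u||^2
              out = np.empty((Cmat.shape[0], n_factors))
              for u in range(Cmat.shape[0]):
                  Cu = Cmat[u]
                  A = (fixed * Cu[:, None]).T @ fixed + I
                  b = (fixed * (Cu * Pmat[u])[:, None]).sum(axis=0)
                  out[u] = np.linalg.solve(A, b)
              return out

          for _ in range(n_iter):
              X = solve(Y, C, P)                       # update user factors, song factors fixed
              Y = solve(X, C.T, P.T)                   # update song factors, user factors fixed
          return X, Y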

  • We predict latent factors from music audio signals

    As before, R ≈ X Y^T; a regression model maps the audio signal of a song to its
    latent factor vector (a row of Y), so factors can be obtained for songs without usage data

  • Deep learning approach: convolutional neural network

    Input: spectrograms (128 bins, windows of ~3 seconds), processed by alternating
    convolution and max-pooling layers along the time axis
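
    A minimal sketch of such a convnet in PyTorch (my own illustration, not the exact architecture from the talk): the 128 spectrogram bins are treated as input channels and convolved over time, ending in a regression onto the latent song factors.

      import torch
      import torch.nn as nn

      n_bins, n_factors = 128, 40            # 40 latent factors is an assumption

      cnn = nn.Sequential(
          nn.Conv1d(n_bins, 128, kernel_size=4), nn.ReLU(), nn.MaxPool1d(4),
          nn.Conv1d(128, 128, kernel_size=4), nn.ReLU(), nn.MaxPool1d(2),
          nn.Flatten(),
          nn.LazyLinear(n_factors),          # regression onto the WMF song factors
      )

      x = torch.randn(8, n_bins, 130)        # a batch of ~3 s spectrogram excerpts
      pred = cnn(x)                          # trained against the WMF factors, e.g. with MSE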

  • The Million Song Dataset provides metadata for 1,000,000 songs

    + Echo Nest Taste Profile subset: listening data from 1.1M users for 380k songs
    + 7digital: raw audio clips (for over 99% of the dataset)

    (Bertin-Mahieux et al., ISMIR 2011)

  • Quantitative evaluation: music recommendation performance

    Subset (9330 songs, 20000 users)

    Model                     mAP@500   AUC
    Metric learning to rank   0.01801   0.60608
    Linear regression         0.02389   0.63518
    Multilayer perceptron     0.02536   0.64611
    CNN with MSE              0.05016   0.70987
    CNN with WPE              0.04323   0.70101

  • Quantitative evaluation: music recommendation performance

    Full dataset (382,410 songs, 1M users)

    Model               mAP@500   AUC
    Random              0.00015   0.49935
    Linear regression   0.00101   0.64522
    CNN with MSE        0.00672   0.77192
    Upper bound         0.23278   0.96070


  • Qualitative evaluation: some queries and their closest matches

    Query: Coldplay - I Ran Away

    Most similar tracks (WMF):
    Coldplay - Careful Where You Stand
    Coldplay - The Goldrush
    Coldplay - X & Y
    Coldplay - Square One
    Jonas Brothers - BB Good

    Most similar tracks (predicted):
    Arcade Fire - Keep The Car Running
    M83 - You Appearing
    Angus & Julia Stone - Hollywood
    Bon Iver - Creature Fear
    Coldplay - The Goldrush

  • Qualitative evaluation: some queries and their closest matches

    Query: Beyoncé - Speechless

    Most similar tracks (WMF):
    Beyoncé - Gift From Virgo
    Beyoncé - Daddy
    Rihanna / J-Status - Crazy Little Thing Called ...
    Beyoncé - Dangerously In Love
    Rihanna - Haunted

    Most similar tracks (predicted):
    Daniel Bedingfield - If You're Not The One
    Rihanna - Haunted
    Alejandro Sanz - Siempre Es De Noche
    Madonna - Miles Away
    Lil Wayne / Shanell - American Star

  • Qualitative evaluation: some queries and their closest matches

    Query: Daft Punk - Rock'n Roll

    Most similar tracks (WMF):
    Daft Punk - Short Circuit
    Daft Punk - Nightvision
    Daft Punk - Too Long
    Daft Punk - Aerodynamite
    Daft Punk - One More Time

    Most similar tracks (predicted):
    Boys Noize - Shine Shine
    Boys Noize - Lava Lava
    Flying Lotus - Pet Monster Shotglass
    LCD Soundsystem - One Touch
    Justice - One Minute To Midnight

  • Qualitative evaluation: visualisation of predicted usage patterns (t-SNE)

    (McFee et al., TASLP 2012)


  • Predicting latent factors is a viable method for music recommendation

  • III. End-to-end learning for music audio

  • The traditional two-stage approach: feature extraction + shallow classifier

    Vision: extract features (SIFT, HOG, ...) -> shallow classifier (SVM, RF, ...)
    Audio: extract features (MFCCs, chroma, ...) -> shallow classifier (SVM, RF, ...)

  • Integrated approach: learn both the features and the classifier

    Extract a mid-level representation (spectrograms, constant-Q) -> learn features + classifier
    End-to-end learning: try to remove the mid-level representation and learn features + classifier directly from the audio

  • Convnets can learn the features and the classifier simultaneously

    features -> features -> features -> predictions

  • We use log-scaled mel spectrograms as a mid-level representation

    hop size = window size / 2
    power spectrogram: X(f) = |STFT[x(t)]|^2
    logarithmic loudness compression (DRC): X'(f) = log(1 + C X(f))
    mel binning: X''(f) = M X'(f)
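
    A short sketch of this representation using librosa (a minimal illustration, not the original code: here mel binning is applied to the power spectrogram before the logarithmic compression, and the default values are assumptions).

      import numpy as np
      import librosa

      def log_mel_spectrogram(y, sr=22050, n_fft=1024, n_mels=128, C=10000.0):
          """Log-scaled mel spectrogram: power STFT with hop size = window size / 2,
          mel binning, then logarithmic loudness compression log(1 + C*X)."""
          S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                             hop_length=n_fft // 2,
                                             n_mels=n_mels, power=2.0)
          return np.log(1.0 + C * S)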

  • Evaluation: tag prediction on Magnatagatune

    25863 clips of 29 seconds, annotated with 188 tags
    Tags are versatile: genre, tempo, instrumentation, dynamics, ...


  • Spectrograms vs. raw audio signals

    Length   Stride   AUC (spectrograms)   AUC (raw audio)
    1024     1024     0.8690               0.8366
    1024     512      0.8726               0.8365
    512      512      0.8793               0.8386
    512      256      0.8793               0.8408
    256      256      0.8815               0.8487

  • The learned filters are mostly frequency-selective (and noisy)

  • Their dominant frequencies resemble the mel scale

  • Changing the nonlinearity to introduce compression does not help

    Nonlinearity                   AUC (raw audio)
    Rectified linear, max(0, x)    0.8366
    Logarithmic, log(1 + C x^2)    0.7508
    Logarithmic, log(1 + C |x|)    0.7487


  • Adding a feature pooling layer lets the network learn invariances

    Pooling method   Pool size   AUC (raw audio)
    No pooling       1           0.8366
    L2 pooling       2           0.8387
    L2 pooling       4           0.8387
    Max pooling      2           0.8183
    Max pooling      4           0.8280
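
    A minimal sketch of L2 pooling over groups of adjacent feature maps (my own illustration of the idea on the slide, not the original code).

      import torch

      def l2_feature_pooling(h, pool_size):
          """L2 pooling across groups of `pool_size` adjacent filters: each pool is
          collapsed into a single feature map, so the filters within a pool are free
          to become e.g. shifted versions of each other."""
          b, c, t = h.shape                        # (batch, channels, time); c must be divisible by pool_size
          h = h.view(b, c // pool_size, pool_size, t)
          return h.pow(2).sum(dim=2).sqrt()        # (batch, channels // pool_size, time)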

  • The pools consist of filters that are shifted versions of each other

  • Learning features from raw audio is possible, but this doesn't work as well as using spectrograms (yet)

  • IV. Transfer learning by supervised pre-training

  • Supervised feature learning

    Train a network on a supervised task (input -> output labels such as dog, cat, rabbit,
    penguin, car, table); the hidden-layer activations are features that can be reused

  • Supervised feature learning for MIR tasks

    Lots of training data for:
    - automatic tagging
    - user listening preference prediction (i.e. recommendation)

    Target tasks:
    GTZAN           genre classification   10 genres
    Unique          genre classification   14 genres
    1517-artists    genre classification   19 genres
    Magnatagatune   automatic tagging      188 tags

  • Tag and listening prediction differ from typical classification tasks

    - multi-label classification
    - large number of classes (tags, users)
    - weak labeling
    - redundancy
    - sparsity

    Use WMF for label space dimensionality reduction

  • Schematic overview

  • Source task results: user listening preference prediction

    Model                  NMSE    AUC     mAP
    Linear regression      0.986   0.75    0.0076
    MLP (1 hidden layer)   0.971   0.76    0.0149
    MLP (2 hidden layers)  0.961   0.746   0.0186

  • Source task results: tag prediction

    Model                  NMSE    AUC     mAP
    Linear regression      0.965   0.823   0.0099
    MLP (1 hidden layer)   0.939   0.841   0.0179
    MLP (2 hidden layers)  0.924   0.837   0.0179

  • Target task results: GTZAN genre classification

  • Target task results: Unique genre classification

  • Target task results: 1517-artists genre classification

  • Target task results: Magnatagatune auto-tagging (50 tags)

  • Target task results: Magnatagatune auto-tagging (188 tags)

  • V. More music recommendation

  • Convnet architecture: 30-second spectrograms -> latent factors

    Input: spectrograms (30 seconds), 128 bins x 599 frames
    Three convolutional layers (filter length 4) with 256, 256 and 512 filters,
    interleaved with 4x, 2x and 2x max-pooling over time (599 -> 149 -> 73 -> 35 frames)
    Global temporal pooling (mean, L2, max) -> 1536 features
    Dense layers: 2048 -> 2048 -> 40 latent factors
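
    A PyTorch sketch of this architecture under the dimensions recovered from the slide (the global temporal pooling concatenates mean, L2 and max statistics over time; activation choices are assumptions).

      import torch
      import torch.nn as nn

      class GlobalTemporalPooling(nn.Module):
          """Summarize the whole time axis with mean, L2 and max statistics."""
          def forward(self, h):                                  # h: (batch, channels, time)
              return torch.cat([h.mean(dim=2),
                                h.pow(2).mean(dim=2).sqrt(),
                                h.amax(dim=2)], dim=1)           # 3 * 512 = 1536 features

      features = nn.Sequential(
          nn.Conv1d(128, 256, kernel_size=4), nn.ReLU(), nn.MaxPool1d(4),   # 599 -> 149 frames
          nn.Conv1d(256, 256, kernel_size=4), nn.ReLU(), nn.MaxPool1d(2),   # 149 -> 73 frames
          nn.Conv1d(256, 512, kernel_size=4), nn.ReLU(), nn.MaxPool1d(2),   # 73 -> 35 frames
      )
      head = nn.Sequential(nn.Linear(3 * 512, 2048), nn.ReLU(),
                           nn.Linear(2048, 2048), nn.ReLU(),
                           nn.Linear(2048, 40))                  # 40 latent factors

      x = torch.randn(4, 128, 599)                               # batch of 30 s spectrograms
      z = head(GlobalTemporalPooling()(features(x)))             # predicted factors, shape (4, 40)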


  • DEMO

  • Papers

    Multiscale approaches to music audio feature learning
    Sander Dieleman, Benjamin Schrauwen, ISMIR 2013

    Deep content-based music recommendation
    Aäron van den Oord, Sander Dieleman, Benjamin Schrauwen, NIPS 2013

    End-to-end learning for music audio
    Sander Dieleman, Benjamin Schrauwen, ICASSP 2014

    Transfer learning by supervised pre-training for audio-based music classification
    Aäron van den Oord, Sander Dieleman, Benjamin Schrauwen, ISMIR 2014