cold-start music recommendation using multimodal deep architectures · cold-start music...

69
Cold-Start Music Recommendation Using Multimodal Deep Architectures ORIOL NIETO [email protected] SYSTEMATIC APPROACHES TO DEEP LEARNING METHODS FOR AUDIO ESI WORKSHOP VIENNA, AUSTRIA SEP 15, 2017

Upload: buithuy

Post on 29-Apr-2018

218 views

Category:

Documents


2 download

TRANSCRIPT

Cold-StartMusicRecommendationUsingMultimodalDeepArchitectures

[email protected]

SYSTEMATICAPPROACHESTODEEPLEARNINGMETHODSFORAUDIOESIWORKSHOPVIENNA,AUSTRIA

SEP15,2017

• Motivation:TheCold-StartProblem

• Background:CollaborativeFiltering

• Cold-StartMusicRecommendation:

• EstimateCollaborativeFactorsfromAudio

• TheMusicGenomeProject™

• MultimodalDeepArchitectures

Outline

• Motivation:TheCold-StartProblem

• Background:CollaborativeFiltering

• Cold-StartMusicRecommendation:

• EstimateCollaborativeFactorsfromAudio

• TheMusicGenomeProject™

• MultimodalDeepArchitectures

Outline

Cold-Start ProblemTHELONGTAIL

MostPopularTracks

0.01%

Cold-Start ProblemTHELONGTAIL

MostPopularTracks

1%

Cold-Start ProblemTHELONGTAIL

MostPopularTracks

100%

35%oftracks0spinslastweek

• Motivation:TheCold-StartProblem

• Background:CollaborativeFiltering

• Cold-StartMusicRecommendation:

• EstimateCollaborativeFactorsfromAudio

• TheMusicGenomeProject™

• MultimodalDeepArchitectures

Outline

PROBLEMOVERVIEWCollaborative Filtering

Users

? ?

? ? ?

?

? ?

? ?[Items (Tracks) [

Explicit

Thumbs(upanddown)

StationCreation

Implicit

TrackCompletion

TrackSkips

LATENTFACTORSCollaborative Filtering

AggressiveCalm

SimpleHarmony

ComplexHarmony

MATRIXFACTORIZATIONCollaborative Filtering

Koren,Y.,Bell,R.,&Volinsky,C.(2009).MatrixFactorizationTechniquesforRecommenderSystems.Computer,42(8),42–49.

? ?

? ? ?

?

? ?

? ?[ [

Users

Item

s (Tr

acks

)

⇡[ [

k

Users

Item

s (Tr

acks

)

k[ [

PROBLEMFORMULATIONCollaborative Filtering

GivenItemiandUseru: ? ?

? ? ?

?

? ?

? ?[ [

UsersIte

ms

(Trac

ks)

⇡[ [

k

Users

Item

s (Tr

acks

)

k

[ [ItemLatentFactor:qi 2 Rk

UserLatentFactor: pu 2 Rk

Rating: riuriu

RatingApproximation: r̂iu = qTi pu

qipu

argminq⇤,p⇤X

u,i2S(rui � qTi pu)

2 +�(||qi||2 + ||pu||2)Koren,Y.,Bell,R.,&Volinsky,C.(2009).MatrixFactorizationTechniquesforRecommenderSystems.Computer,42(8),42–49.

Collaborative Filtering

Ar[st Title

QueryTrack TheBeatles WhileMyGuitarGentlyWeeps

Ranked1 TheBeatles ADayInTheLife

Ranked2 TheBeatles ADayInTheLife(LoveVersion)

Ranked3 TheBeatles AcrossTheUniverse

EXAMPLE

Collaborative Filtering

Ar[st Title

QueryTrack TheBeatles WhileMyGuitarGentlyWeeps

Ranked35 GeorgeHarrisonWhileMyGuitarGentlyWeeps

(Live)

Ranked82 GeorgeHarrison MySweetLord(Live)

Ranked91PaulMcCartney&Eric

Clapton Something(Live)

EXAMPLE

Ranked158 LedZeppelin Tangerine

THEGOODANDTHEBADCollaborative Filtering

Richpreference-drivensimilarityspace Latentspaceisgenerallynotinterpretable

Powerfulatmatchingtherightsongwiththerightlistener

Canonlyrecommenditemsthathavealreadybeenrated

• Motivation:TheCold-StartProblem

• Background:CollaborativeFiltering

• Cold-StartMusicRecommendation:

• EstimateCollaborativeFactorsfromAudio

• TheMusicGenomeProject™

• MultimodalDeepArchitectures

Outline

Estimate Collaborative Factors

? ?

? ? ?

?

? ?

? ?[ [

Users

Item

s (Tr

acks

)

⇡[ [

k

Users

Item

s (Tr

acks

)

k[ [

http://blogs-images.forbes.com/kevinmurnane/files/2016/03/google-deepmind-artificial-intelligence-2-970x0-970x646.jpg

k

Approximate Item Factors using Audio

Oord,A.VanDen,Dieleman,S.,&Schrauwen,B.(2013).DeepContent-basedMusicRecommendation.AdvancesinNeuralInformationProcessingSystems,2643–2651.

WITHDEEPLEARNINGApproximating Factors using Audio

4096

4096

DenseLayers1024

1DConvolutionalLayers

2048 2048 4096 4096

108

N

4096x4

TimeGlobalPooling

mean

max

L2var

1DConvolutionalLayer

k

TRAININGDATAApproximating Item Factors using Audio

• (Small)Dataset:

• 83ktracks

• 3patchesof35secondspertrack(251kpatches=)

• (Patchesonlyfortraining!)

• Splits:

• Train:80%

• Validation:10%

• Test:10%

M

{X,Y}

TRAININGApproximating Item Factors

• Lossfunction:

• CosineDistance

• Optimization:

• Adam(defaultparams)

• 50%DropoutonDenseLayers

• EarlyStopping

• Mini-batchesof64examples

L(✓) = 1� 1

M

X

X2X,y2Y

f(X; ✓)Ty

||f(X; ✓)||2||y||2

Cosin

eDista

nce

MiniBatches

RESULTSApproximating Item Factors using Audio

Input CosDistance #Epochs Time/Epoch

Audio(35sPatches) 0.25 22 ~2h

RESULTSApproximating Item Factors using Audio

Input CosDistance #Epochs Time/Epoch

Audio(35sPatches) 0.25 22 ~2h

Audio(FullTracks) 0.21 - -

• Motivation:TheCold-StartProblem

• Background:CollaborativeFiltering

• Cold-StartMusicRecommendation:

• EstimateCollaborativeFactorsfromAudio

• TheMusicGenomeProject™

• MultimodalDeepArchitectures

Outline

The Music Genome Project™

>1.5Milliontracksmanuallyanalyzed

~400attributespertrack

AttributeExamples

BreathyVoiceNasalVoiceOddMeterHasBanjoJoyfulLyrics

Ar[st Title

QueryTrack TheBeatles WhileMyGuitarGentlyWeeps

Ranked1 IVThieves TheSoundAndTheFury

Ranked2 Journey TooLate

Ranked3 AlbertLee LookOutCleveland

Recommending Music using the MGP™EXAMPLE

Ar[st Title

QueryTrack TheBeatles WhileMyGuitarGentlyWeeps

Ranked1 IVThieves TheSoundAndTheFury

Ranked2 Journey TooLate

Ranked3 AlbertLee LookOutCleveland

Recommending Music using the MGP™EXAMPLE

Ar[st Title

QueryTrack TheBeatles WhileMyGuitarGentlyWeeps

Ranked1 IVThieves TheSoundAndTheFury

Ranked2 Journey TooLate

Ranked3 AlbertLee LookOutCleveland

Recommending Music using the MGP™EXAMPLE

Ar[st Title

QueryTrack TheBeatles WhileMyGuitarGentlyWeeps

Ranked1 IVThieves TheSoundAndTheFury

Ranked2 Journey TooLate

Ranked3 AlbertLee LookOutCleveland

Recommending Music using the MGP™EXAMPLE

http://blogs-images.forbes.com/kevinmurnane/files/2016/03/google-deepmind-artificial-intelligence-2-970x0-970x646.jpg

k

Approximate Factors using the MGP™

DEEPARCHITECTUREApproximate Factors using the MGP

N

4096

4096

DenseLayers

k

TRAININGDATAApproximating Item Factors using the MGP™

• (Small)Dataset:

• 83ktracks()

• Splits:

• Train:80%

• Validation:10%

• Test:10%

M

{X,Y}

TRAININGApproximating Item Factors using the MGP™

• Lossfunction:

• CosineDistance

• Optimization:

• Adam(defaultparams)

• 50%DropoutonDenseLayers

• EarlyStopping

• Mini-batchesof256examplesCosin

eDista

nce

MiniBatches

L(✓) = 1� 1

M

X

x2X,y2Y

f(x; ✓)Ty

||f(x; ✓)||2||y||2

RESULTSApproximating Item Factors

Input CosDistance #Epochs Time/Epoch

Audio(35sPatches) 0.25 22 ~2h

Audio(FullTracks) 0.21 - -

MGP 0.15 37 7s

Beyond the MGP™

APPROXIMATEMGPWITHMACHINELISTENING

MACHINELISTENINGGENES

(Coming soon: MGP™ Estimation with Waveforms!)

APPROXIMATEMGPWITHMACHINELISTENING

DEEPARCHITECTUREApproximate Factors using MLG

N

4096

4096

DenseLayers

k

RESULTSApproximating Item Factors

Input CosDistance #Epochs Time/Epoch

Audio(35sPatches) 0.25 22 ~2h

Audio(FullTracks)

0.21 - -

MGP 0.15 37 7s

MLG 0.22 37 7s

• Motivation:TheCold-StartProblem

• Background:CollaborativeFiltering

• Cold-StartMusicRecommendation:

• EstimateCollaborativeFactorsfromAudio

• TheMusicGenomeProject™

• MultimodalDeepArchitectures

Outline

http://blogs-images.forbes.com/kevinmurnane/files/2016/03/google-deepmind-artificial-intelligence-2-970x0-970x646.jpg

k

Combine Methods to Approximate Factors

k

Combine Methods to Approximate Factors

N 4 4 k4 4

Dens(vanden

11D

2 2 4 41

N 40m

mL

v

1D

k

http://blogs-images.forbes.com/kevinmurnane/files/2016/03/google-deepmind-

LATE-FUSIONDEEPARCHITECTURECombine Methods to Approximate Factors

4096x2

4096

4096

DenseLayers

k

RESULTSApproximating Item Factors

Input CosDistance #Epochs Time/Epoch

Audio(35sPatches)

0.25 22 ~2h

Audio(FullTracks)

0.21 - -

MGP 0.15 37 7s

MLG 0.22 37 7s

Audio+MLG 0.19 37 7s

http://blogs-images.forbes.com/kevinmurnane/files/2016/03/google-deepmind-artificial-intelligence-2-970x0-970x646.jpg

k

Further Multimodality to Approximate Factors

LATE-FUSIONDEEPARCHITECTUREFurther Multimodality to Approximate Factors

4096x2+31

4096

4096

DenseLayers

k

one-hotvectorencoding

RESULTSApproximating Item Factors

Input CosDistance #Epochs Time/Epoch

Audio(35sPatches)

0.25 22 ~2h

Audio(FullTracks)

0.21 - -

MGP 0.15 37 7s

MLG 0.22 37 7s

Audio+MLG 0.19 37 7s

Audio+MLG+genres 0.16 37 7s

ISALRIGHTMore data

• LARGEDataset:

• ~900kmostpopulartracks

• 3patchesof35secondspertrack(~2.7Mpatches=)M

{X,Y}

Input Trainedon TestSet CosDistance

Audio SMALL SMALL 0.21

ISALRIGHTMore data

Input Trainedon TestSet CosDistance

Audio SMALL SMALL 0.21

Audio LARGE LARGE 0.37

• LARGEDataset:

• ~900kmostpopulartracks

• 3patchesof35secondspertrack(~2.7Mpatches=)M

{X,Y}

ISALRIGHTMore data

Input Trainedon TestSet CosDistance

Audio SMALL SMALL 0.21

Audio LARGE LARGE 0.37

Audio SMALL LARGE 0.64

• LARGEDataset:

• ~900kmostpopulartracks

• 3patchesof35secondspertrack(~2.7Mpatches=)M

{X,Y}

ISALRIGHTMore data

Input Trainedon TestSet CosDistance

Audio SMALL SMALL 0.21

Audio LARGE LARGE 0.37

Audio SMALL LARGE 0.64

Audio LARGE SMALL 0.21

• LARGEDataset:

• ~900kmostpopulartracks

• 3patchesof35secondspertrack(~2.7Mpatches=)M

{X,Y}

Recommendation Examples

Ar[st Title

QueryTrack TheBeatles WhileMyGuitarGentlyWeeps

Ranked1 BobDylan Knockin’OnHeavensDoor

Ranked2 NeilYoung HeartOfGold

Ranked3 TheRollingStones Angie

Recommendation Examples

Ar[st Title

QueryTrack Sargon Con[nuarà

Ranked1 Mudvayne Happy?

Ranked2 Mudvayne ForgetToRemember

Ranked3 StoneSour Hell&Consequences

Long Tail Context

MostPopularTracks

100%

35%oftracks0spinslastweek

20spinslastweek

Recommendation Examples

Ar[st Title

QueryTrack Sargon Con[nuarà

Ranked1 Mudvayne Happy?

Ranked2 Mudvayne ForgetToRemember

Ranked3 StoneSour Hell&Consequences

Recommendation Examples

Ar[st Title

QueryTrack Sargon Conqnuarà

Ranked1 Mudvayne Happy?

Ranked2 Mudvayne ForgetToRemember

Ranked3 StoneSour Hell&Consequences

Recommendation Examples

Ar[st Title

QueryTrack Sargon Conqnuarà

Ranked1 Mudvayne Happy?

Ranked2 Mudvayne ForgetToRemember

Ranked3 StoneSour Hell&Consequences

Recommendation Examples

Ar[st Title

QueryTrack Sargon Conqnuarà

Ranked1 Mudvayne Happy?

Ranked2 Mudvayne ForgetToRemember

Ranked3 StoneSour Hell&Consequences

Recommendation Examples

Ar[st Title

QueryTrack LaBossad’Urina ElTiempo

Ranked1 IlDivo Hallelujah

Ranked2SarahBrightman&TheLondonSymphony

OrchestraTimeToSayGoodbye

Ranked3 AndreaBocelli Amapola

Long Tail Context

MostPopularTracks

100%

35%oftracks0spinslastweek

0spinslastweek

Recommendation Examples

Ar[st Title

QueryTrack LaBossad’Urina ElTiempo

Ranked1 IlDivo Hallelujah

Ranked2SarahBrightman&TheLondonSymphony

OrchestraTimeToSayGoodbye

Ranked3 AndreaBocelli Amapola

Recommendation Examples

Ar[st Title

QueryTrack LaBossad’Urina ElTiempo

Ranked1 IlDivo Hallelujah

Ranked2SarahBrightman&TheLondonSymphony

OrchestraTimeToSayGoodbye

Ranked3 AndreaBocelli Amapola

Recommendation Examples

Ar[st Title

QueryTrack LaBossad’Urina ElTiempo

Ranked1 IlDivo Hallelujah

Ranked2SarahBrightman&TheLondonSymphony

OrchestraTimeToSayGoodbye

Ranked3 AndreaBocelli Amapola

Recommendation Examples

Ar[st Title

QueryTrack LaBossad’Urina ElTiempo

Ranked1 IlDivo Hallelujah

Ranked2SarahBrightman&TheLondonSymphony

OrchestraTimeToSayGoodbye

Ranked3 AndreaBocelli Amapola

ENSEMBLE OF RECOMMENDERS MAY PRODUCE OPTIMAL RECOMMENDATIONS

MAN vs MACHINE?

MAN + MACHINE

MAN + MACHINE“Mix,of,Art,and,Science”

MAN + MACHINE“Mix,of,Art,and,Science”

Oramas, S., Nieto, O., Sordo, M., Serra, X., A Deep Multimodal Approach for Cold-start Music Recommendation. Deep Learning for Recommender Systems Workshop, RecSys, Como, Italy 2017

Oramas, S., Nieto, O., Barbieri, F., Serra, X., Multi-label Music Genre Classification From Audio, Text, and Images Using Deep Features. Proc. of the 18th International Society for Music Information Retrieval Conference (ISMIR). Suzhou, China, 2017

MAN + MACHINE“Mix,of,Art,and,Science”

THANKS! [email protected]

Oramas, S., Nieto, O., Sordo, M., Serra, X., A Deep Multimodal Approach for Cold-start Music Recommendation. Deep Learning for Recommender Systems Workshop, RecSys, Como, Italy 2017

Oramas, S., Nieto, O., Barbieri, F., Serra, X., Multi-label Music Genre Classification From Audio, Text, and Images Using Deep Features. Proc. of the 18th International Society for Music Information Retrieval Conference (ISMIR). Suzhou, China, 2017