iPlayr User Study Group
2008.01.23
Daniel Wu
Gordon Chang
Task assigned
• Design a user study to compare the effect of
  – Melody (M)
  – Lyrics (L)
  – Melody + Lyrics (M+L)
• Design the user study interface
  – Example: how to efficiently select 5 out of 70
User Study
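• Purpose
  – Verify that, [from the user's perspective], adding lyrics information on top of traditional signal processing brings the judged emotion of a song closer to the ground truth.
• Implication for iPlayr
  – Form the basis for adding lyrics information (semantics) into the music recommendation system.
  – ???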
User Study
• Possible methods
  – Select a ground truth for each piece of music.
  – Compare M, L, and M+L performance against that ground truth.
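One way to make the comparison concrete is to measure how far each condition's ratings fall from the ground truth. The sketch below uses mean absolute error on a 1-to-5 scale; the metric, the function name `mean_abs_error`, and the example ratings are all assumptions for illustration, not something the study has fixed.

```python
from statistics import mean

def mean_abs_error(ratings, truth):
    """Mean absolute difference between a rating vector and the ground truth."""
    if len(ratings) != len(truth):
        raise ValueError("rating vectors must have the same length")
    return mean(abs(r - t) for r, t in zip(ratings, truth))

# Hypothetical ratings of one song on three emotion features (1-5 scale).
truth = [4, 2, 5]
conditions = {
    "M": [3, 2, 3],
    "L": [5, 4, 4],
    "M+L": [4, 2, 4],
}

errors = {name: mean_abs_error(r, truth) for name, r in conditions.items()}
closest = min(errors, key=errors.get)  # condition with the smallest error
print(errors, closest)
```

With real data, the same comparison would be repeated per song and per subject before drawing any conclusion.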
User Study
• Details
  – Select users (TBD)
  – Select songs to be tested (TBD)
  – Select features to be rated (TBD, probably only emotional features)
  – Select a framework for rating features (TBD, PA/PAD/Gordon-walking-pad/6-emotion/comparative*)
  – Select the ground truth to compare against (TBD, see the On Ground Truth slide)
  – Each user study consists of three sessions and a pre-session
    • Pre-session: introduce iPlayr and the experiment
    • M session: melody-only session, probably consisting of 3 songs
    • L session: lyric-only session, consisting of the same songs as the M session, only presented in a different/random order
    • M+L session: melody-and-lyric session, presented to the subject in a different order
  – In each session, the user listens to the music or reads the lyrics, and rates the selected features
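The session layout can be sketched as a small plan generator: the same songs appear in every session, and each session gets its own shuffle. The function name `build_sessions`, the song identifiers, and the fixed seed are hypothetical; the actual number of songs and the ordering policy are still TBD.

```python
import random

def build_sessions(songs, seed=None):
    """Return the song order for the M, L, and M+L sessions.

    Every session contains the same songs; each one is shuffled independently.
    """
    rng = random.Random(seed)
    plan = {}
    for session in ("M", "L", "M+L"):
        order = list(songs)
        rng.shuffle(order)
        plan[session] = order
    return plan

# Hypothetical song identifiers; a fixed seed makes the plan reproducible.
songs = ["song_1", "song_2", "song_3"]
plan = build_sessions(songs, seed=42)
for session, order in plan.items():
    print(session, order)
```

Independent shuffles can occasionally produce the same order in two sessions. If the orders must differ, the shuffle can be repeated until they do.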
On Ground Truth
• Possible sources of ground truth
  – CAL500
  – User-dependent (use each user's own M+L ratings as his/her ground truth)
  – Comparative* (use the Hotter or Notter method)
• Why use the CAL500 ground truth?
  – An established framework
  – A good benchmark to see the effect of our work
Hotter or Notter
• http://hotter.csie.org/about/
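• Removes the bias of comparing absolute scores when each user applies a different rating standard (no experts are needed to assign absolute scores, as Pandora requires)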
• Large-scale ranking by Sparse Paired Comparisons (avg. 3 votes for 1-object-1-feature)
• Comparison pairs selected by computer
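To illustrate how paired votes can be turned into a ranking, here is a minimal sketch that orders items by a smoothed win rate. This is a stand-in technique chosen for simplicity, not the algorithm Hotter or Notter actually uses; the function name `rank_by_win_rate` and the votes are hypothetical.

```python
from collections import defaultdict

def rank_by_win_rate(votes):
    """Rank items from (winner, loser) votes using a smoothed win rate."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    # Add-one smoothing so items with few votes are not pushed to an extreme.
    score = {item: (wins[item] + 1) / (games[item] + 2) for item in games}
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(score, key=lambda item: (-score[item], item))

# Hypothetical votes on one feature: each pair is (preferred song, other song).
votes = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")]
ranking = rank_by_win_rate(votes)
print(ranking)
```

A win rate ignores the strength of the opponents each item faced, so a model-based method would likely be a better fit once votes become sparse.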
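Possible Challenges / Questions
• User Study Purpose / Impact
  – The user study aims to verify the role lyrics play in a song [from the user's perspective], whereas iPlayr works [from the machine's perspective]. Can the purpose of the user study be confirmed again?
• User Study Details
  – How should the user study subjects be defined and recruited?
  – How many songs should the user study use, and how should they be chosen? The results may depend on the songs themselves.
    • All CAL500
    • Clustered-pick
  – Should each song be played in full, or is a short clip enough?
    • Depends on David's results on whether a 30-second clip is representative. Example: 進退兩難
  – Is M+L always placed last? (By then the subject is familiar with the song, so it will naturally be closest to the ground truth.)
    • Control group (listens to M+L only)
• Ground Truth
  – How was the CAL500 ground truth determined?
  – With absolute scoring, each person's rating standard differs, which may introduce bias
    • Normalize
    • Hotter or Notter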
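The "Normalize" option could be sketched as a per-user z-score, which removes differences in how strictly each person uses the scale. Z-scoring is one assumed choice of normalization, and the function name `zscore` and the two raters below are hypothetical.

```python
from statistics import mean, pstdev

def zscore(ratings):
    """Rescale one user's ratings to zero mean and unit standard deviation."""
    mu = mean(ratings)
    sigma = pstdev(ratings)
    if sigma == 0:
        # A user who gives the same score everywhere carries no ranking signal.
        return [0.0 for _ in ratings]
    return [(r - mu) / sigma for r in ratings]

# Two hypothetical raters who agree on the ordering but not on the scale.
lenient = [3, 4, 5]
strict = [1, 2, 3]
print(zscore(lenient))
print(zscore(strict))
```

After normalization the two raters produce the same values, so the offset between their raw scores no longer counts as disagreement.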
Experiment
• Testers
  – 2 people, Daniel and Gordon
  – Scored 18 emotions for each song, each rated from 1 to 5
• Music pieces
  – Selected from the CAL500 database by the testers
  – 6 songs played in random order
  – Playback stopped when all testers finished tagging
• Constraints
  – Not able to skim through previous answers
  – Not able to fill in answers during the first 15 seconds
• Small difference
• Affected by the previous song?
• Became more conservative
                          Gordon              Daniel
Emotion                   A  B  C  D  E  F    A  B  C  D  E  F
Happy                     4  2  2  4  1  4    2  2  3  3  1  3
Sad                       2  1  3  2  5  2    4  1  2  1  4  1
Calming / Soothing        4  1  4  1  4  2    3  1  1  1  3  1
Arousing / Awakening      2  5  3  5  2  3    2  4  4  5  1  3
Pleasant / Comfortable    5  4  3  3  3  2    2  2  3  3  3  2
Cheerful / Festive        1  3  3  5  1  4    1  2  3  3  2  4
Tender / Soft             5  1  2  1  4  1    4  1  1  1  2  1
Powerful / Strong         1  5  3  4  4  3    2  5  4  5  1  2
Loving / Romantic         5  1  2  1  3  1    4  1  1  1  3  2
Carefree / Lighthearted   2  1  2  3  2  3    2  2  4  3  2  3
Exciting / Thrilling      1  5  3  5  1  1    1  4  4  4  1  2
Emotional / Passionate    4  5  4  5  3  3    3  3  4  3  3  2
Positive / Optimistic     3  4  3  4  2  4    2  2  3  3  2  3
Touching / Loving         5  1  2  2  2  1    4  1  1  1  3  1
Light / Playful           2  2  2  4  1  3    1  3  2  3  1  4
Angry / Aggressive        1  5  2  3  2  1    1  3  3  3  1  1
Laid-back / Mellow        4  1  3  1  2  1    4  1  1  1  3  2
Bizarre / Weird           1  2  2  1  2  2    1  2  1  3  1  1
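To quantify how closely the two testers agree, the ratings above can be compared song by song. The sketch below computes the mean absolute difference between Gordon's and Daniel's scores over the 18 emotions; the metric and the function name `per_song_disagreement` are assumptions, while the digits are copied from the table (first string Gordon, second Daniel, songs A to F).

```python
SONGS = "ABCDEF"

# emotion -> (Gordon's ratings, Daniel's ratings), one digit per song A-F.
RATINGS = {
    "Happy": ("422414", "223313"),
    "Sad": ("213252", "412141"),
    "Calming / Soothing": ("414142", "311131"),
    "Arousing / Awakening": ("253523", "244513"),
    "Pleasant / Comfortable": ("543332", "223332"),
    "Cheerful / Festive": ("133514", "123324"),
    "Tender / Soft": ("512141", "411121"),
    "Powerful / Strong": ("153443", "254512"),
    "Loving / Romantic": ("512131", "411132"),
    "Carefree / Lighthearted": ("212323", "224323"),
    "Exciting / Thrilling": ("153511", "144412"),
    "Emotional / Passionate": ("454533", "334332"),
    "Positive / Optimistic": ("343424", "223323"),
    "Touching / Loving": ("512221", "411131"),
    "Light / Playful": ("222413", "132314"),
    "Angry / Aggressive": ("152321", "133311"),
    "Laid-back / Mellow": ("413121", "411132"),
    "Bizarre / Weird": ("122122", "121311"),
}

def per_song_disagreement(ratings):
    """Mean absolute rating difference between the two testers for each song."""
    totals = [0] * len(SONGS)
    for gordon, daniel in ratings.values():
        for i, (g, d) in enumerate(zip(gordon, daniel)):
            totals[i] += abs(int(g) - int(d))
    return {song: total / len(ratings) for song, total in zip(SONGS, totals)}

disagreement = per_song_disagreement(RATINGS)
for song, value in disagreement.items():
    print(f"{song}: {value:.2f}")
```

On this data every song averages less than one scale point of difference per emotion, with song C the furthest apart and song F the closest.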