from power chord to the power of models - oredev

From Power Chords to the Power of Models @aliostad Ali Kheyrollahi

Upload: ali-kheyrollahi

Post on 07-Apr-2017


Page 1: From Power Chord to the Power of Models - Oredev

From Power Chords to the Power of Models

@aliostad Ali Kheyrollahi

Page 2: From Power Chord to the Power of Models - Oredev

> stackoverflow

> £1.5 bln

global fashion destination

> 35% every year

Page 3: From Power Chord to the Power of Models - Oredev

8

Local pop music

9

Local pop music “Cheelee pom!”

10

Boney M “Rasputin”

11

Blondie “Heart of Glass”

Page 4: From Power Chord to the Power of Models - Oredev
Page 5: From Power Chord to the Power of Models - Oredev

Infobox

Free text

Links

Page 6: From Power Chord to the Power of Models - Oredev

Data Acquisition

Page 7: From Power Chord to the Power of Models - Oredev

Data Source - Wiki

4,990,279 English Articles

37,583,879 Articles

Page 8: From Power Chord to the Power of Models - Oredev

Data Source - Wiki vs Britannica

Feng Zhu (assistant professor at Harvard):

“There has been lots of research on the accuracy of Wikipedia, and the results are mixed—some studies show it is just as good as the experts, others show [that] Wikipedia is not accurate at all.”

“… the editors [of Britannica] are still not found to be more objective than the crowd in articles that are sufficiently revised.”

Page 9: From Power Chord to the Power of Models - Oredev

Data Source - Wikipedia in scholar papers

[Chart: Wikipedia in scholarly papers by year, 2005 to 2014; vertical axis from 0 to 180,000. Source: Google Scholar]

Page 10: From Power Chord to the Power of Models - Oredev

Data Acquisition - Wiki

List of Rock Genres -> Rock Genres -> Rock Artists

Store

Store HTML

Capture Links

Store HTML

Python scripts

Postgres
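
The flow on this slide (store each page's HTML, capture its links, follow them to the next level) can be sketched roughly as below. This is only an illustration, not the talk's actual scripts: the `LinkCapture` class, the `store_page` helper, the `pages` table and the sample HTML are all hypothetical, and an in-memory SQLite database stands in for Postgres so that the sketch runs without a server.

```python
import sqlite3
from html.parser import HTMLParser


class LinkCapture(HTMLParser):
    """Collects href values of internal wiki article links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep article links only; special pages such as "File:" contain a colon
        if href and href.startswith("/wiki/") and ":" not in href:
            self.links.append(href)


def store_page(conn, title, html):
    """Store the raw HTML and return the links captured from it."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (title, html) VALUES (?, ?)", (title, html)
    )
    parser = LinkCapture()
    parser.feed(html)
    return parser.links


# SQLite stands in for Postgres here (assumption, to keep the sketch self-contained)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (title TEXT PRIMARY KEY, html TEXT)")

# Made-up sample standing in for a downloaded "list of rock genres" page
sample_html = (
    '<ul><li><a href="/wiki/Krautrock">Krautrock</a></li>'
    '<li><a href="/wiki/Shoegazing">Shoegazing</a></li>'
    '<li><a href="/wiki/File:Logo.png">logo</a></li></ul>'
)
genre_links = store_page(conn, "List of rock genres", sample_html)
stored = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
print(genre_links, stored)
```

Keeping the raw HTML means the parsing step can be rerun later without downloading the pages again.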

Page 11: From Power Chord to the Power of Models - Oredev

Data Source - Content vs. Data

Hyphen U+002D

figure dash U+2012

minus sign U+2212

horizontal bar U+2015

em dash U+2014

en dash U+2013
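
Because wiki content mixes these look-alike characters, a field such as a year range will not parse until the dashes are normalized. The snippet below is a minimal sketch of one way to do that; mapping every variant to the ASCII hyphen is an assumption that suits data extraction, and the sample string is made up.

```python
import unicodedata

# Dash-like code points; mapping them all to the ASCII hyphen is an assumption
# that suits parsing values such as year ranges.
DASHES = {
    "\u2012": "-",  # figure dash
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2015": "-",  # horizontal bar
    "\u2212": "-",  # minus sign
}
DASH_TABLE = str.maketrans(DASHES)


def normalize_dashes(text):
    """Replace every dash variant with the plain hyphen-minus (U+002D)."""
    return text.translate(DASH_TABLE)


raw = "1965\u20131995"  # made-up example: two years joined with an en dash
clean = normalize_dashes(raw)
start, end = (int(part) for part in clean.split("-"))
print(unicodedata.name("\u2013"), "->", clean, start, end)
```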

Page 12: From Power Chord to the Power of Models - Oredev

Data Exploration

Page 13: From Power Chord to the Power of Models - Oredev

Data Exploration

“I personally … literally just look at the screen, just like the matrix”

Claudia Perlich, multi-award-winning data scientist

Page 14: From Power Chord to the Power of Models - Oredev

Data Exploration

“… the dirty little secret that I have won all of them because I have found something wrong with the data… I would like to play around with dataset and get intimately familiar with dataset and its properties.”

Claudia Perlich

Page 15: From Power Chord to the Power of Models - Oredev

Album Genre

Page 16: From Power Chord to the Power of Models - Oredev

Album Genre

http://wiki-rock.azurewebsites.net/top10-album-genres.html

Page 17: From Power Chord to the Power of Models - Oredev

Data Models

Page 18: From Power Chord to the Power of Models - Oredev

Data Models Model?!

Page 19: From Power Chord to the Power of Models - Oredev

Data Models Model

Mathematical representation of a concept based on parameters that impact that concept

• Rating of a native app
• Stackoverflow score
• Credit score
• Fraud check

Page 20: From Power Chord to the Power of Models - Oredev

“All models are wrong… but some are useful.”

George Box

Data Models Model

Page 21: From Power Chord to the Power of Models - Oredev

Data Models Graph 101

Social Network Analysis and Graph Theory

• Nodes/vertices and edges/lines
• Directedness:
  • Directed
  • Undirected
• Degree, InDegree/OutDegree
• Weight

A B
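
The terms on this slide map directly onto networkx calls. The following is a small sketch with made-up nodes and arbitrary weights, shown only to make directedness, in-degree, out-degree and weight concrete.

```python
import networkx as nx

# Directed graph: an edge A -> B is different from B -> A
d = nx.DiGraph()
d.add_edge("A", "B", weight=3)  # weights are arbitrary illustrative values
d.add_edge("C", "B", weight=1)
d.add_edge("B", "C", weight=2)

in_b = d.in_degree("B")    # edges arriving at B
out_b = d.out_degree("B")  # edges leaving B
weight_ab = d["A"]["B"]["weight"]

# Undirected view: direction is dropped, so B -> C and C -> B collapse into one edge
u = d.to_undirected()
degree_b = u.degree("B")

print(in_b, out_b, weight_ab, degree_b)
```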

Page 22: From Power Chord to the Power of Models - Oredev

Data Models Centrality

[Figure: example graph with nodes labelled by degree, highlighting nodes with the same degree but different betweenness]
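
To make "same degree, different betweenness" concrete, here is a small sketch on a made-up graph (not the one drawn on the slide): a triangle with a two-node tail. Nodes `a` and `d` both have two neighbours, but only `d` lies on shortest paths between other nodes.

```python
import networkx as nx

g = nx.Graph()
# A triangle (a, b, c) with a tail c - d - e hanging off it
g.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")])

degree = dict(g.degree())
# normalized=False returns raw counts of shortest paths passing through each node
betweenness = nx.betweenness_centrality(g, normalized=False)

# a and d have the same degree, yet every path to e has to go through d
print(degree["a"], degree["d"])
print(betweenness["a"], betweenness["d"])
```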

Page 23: From Power Chord to the Power of Models - Oredev

Graph Codez

import networkx as nx

g = nx.Graph()  # use nx.DiGraph() for a directed graph
g.add_edge('a', 'b')
g.add_edge('b', 'c')
# …
print(len(g['b']))  # degree
c = nx.betweenness_centrality(g, normalized=True)
# c -> dictionary of node names and their score

Page 24: From Power Chord to the Power of Models - Oredev

Modelling Influence using Wiki

Page 25: From Power Chord to the Power of Models - Oredev

Data Models Cited Influence

Howlin’ Wolf

Captain Beefheart

1940 1964

Page 26: From Power Chord to the Power of Models - Oredev

Data Models Cited Influence
Most influential Rock Artists Based on out-degree

The Beatles => 188
Black Sabbath => 127
Led Zeppelin => 118
Jimi Hendrix => 114
Bob Dylan => 94
Pink Floyd => 86
Iron Maiden => 77
Metallica => 77
The Rolling Stones => 66
The Beach Boys => 65
Neil Young => 63
Nirvana => 62
Slayer => 60
Queen => 59
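
A ranking like this can be produced from a directed graph of cited influences. The sketch below assumes edges point from the influencer to the influenced artist, so that out-degree counts how many artists cite someone. Only the Howlin' Wolf to Captain Beefheart pair comes from the slides; the "Band ..." names are made-up placeholders.

```python
import networkx as nx

# Edge direction (assumed): influencer -> influenced.
# Only the first pair comes from the slides; the rest are placeholders.
cited = [
    ("Howlin' Wolf", "Captain Beefheart"),
    ("Howlin' Wolf", "Band A"),
    ("Band A", "Band B"),
    ("Band A", "Band C"),
    ("Band A", "Band D"),
]
g = nx.DiGraph()
g.add_edges_from(cited)

# Out-degree = number of artists that cite this one as an influence
ranking = sorted(g.out_degree(), key=lambda pair: pair[1], reverse=True)
for artist, score in ranking[:2]:
    print(artist, "=>", score)
```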

Page 27: From Power Chord to the Power of Models - Oredev

Data Models Cited Influence
Most influential Rock Artists Based on Betweenness Centrality

Jimi Hendrix => 53476.2014921
The Beatles => 47511.7957531
Bob Dylan => 38107.0298185
Led Zeppelin => 32701.7223273
Nirvana => 29733.9066836
Metallica => 29356.6009213
Queen => 28989.2844223
Robert Smith => 28880.670718
Elvis Presley => 28463.2891497
Slade => 27656.487307
Iron Maiden => 22449.6697023
Ramones => 22437.6112965
Rush => 21125.9481602
Neil Young => 19913.887522

Page 28: From Power Chord to the Power of Models - Oredev

Data Models Cited Influence
Most influential Artists Based on Betweenness Centrality

Heavy Metal
Metallica => 566.06
Iron Maiden => 419.21
Corey Taylor => 146.0
Led Zeppelin => 122.73
Slipknot => 116.58
King Diamond => 94.7
Machine Head => 85.12
Rush => 70.41
Black Sabbath => 68.0
Van Halen => 54.56
Deep Purple => 53.5
Megadeth => 42.63
Guns N' Roses => 24.25

Alternative Rock
Nirvana => 490.08
Muse => 114.5
Weezer => 97.33
Pixies => 94.17
Sonic Youth => 78.5
Rivers Cuomo => 69.5
Siouxsie and the Banshees => 51.67
The Smiths => 51.5
Jeff Buckley => 46.17
The Offspring => 43.0
Placebo => 42.0
My Chemical Romance => 34.0
The Smashing Pumpkins => 32.33

Progressive Rock
Rush => 54.0
Marillion => 34.0
Pink Floyd => 33.0
Yes => 20.0
Porcupine Tree => 19.5
Dream Theater => 19.0
Chris Squire => 16.5
Primus => 15.0
Tool => 12.0
Mahavishnu Orchestra => 8.0
Geddy Lee => 7.0
Neil Peart => 5.0
Keith Emerson => 5.0

Page 29: From Power Chord to the Power of Models - Oredev

Data Models PageRank

Page 30: From Power Chord to the Power of Models - Oredev

Data Models PageRank

The Beatles => 0.00837723421839
Blind Lemon Jefferson => 0.00837369035189
Josh White => 0.00824945015047
Bessie Smith => 0.00717743996144
Louis Armstrong => 0.00692897940193
James P. Johnson => 0.00628676810257
Little Richard => 0.00584677302727
Muddy Waters => 0.005773172933
Tampa Red => 0.00572032424174
Robert Johnson => 0.00523579252974
Big Bill Broonzy => 0.00516075834679
Moon Mullican => 0.0050657751593
Black Sabbath => 0.00498789229732
Elvis Presley => 0.00497932058047
Duke Ellington => 0.00465800760107
Bo Diddley => 0.0044496675634
Jimmy Page => 0.00437658472459
Frank Zappa => 0.00431978608953
Miles Davis => 0.00396303890974
Jimi Hendrix => 0.00391117233916
Sister Rosetta Tharpe => 0.00390833570401
Bing Crosby => 0.00385435213525
Bob Dylan => 0.00358608821536
James Brown => 0.00349870931123
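
PageRank differs from out-degree in that a score is passed along chains of links, so being cited by a highly ranked artist counts for more. The sketch below is a plain power-iteration version written for illustration, not the code used in the talk. The node names are made up, and the edge direction (from the influenced artist to the influencer) is an assumption chosen so that rank accumulates on early influencers.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {
            n: (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
            for n in nodes
        }
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Edges point from the influenced artist to the influencer (assumed direction),
# so rank accumulates on artists at the root of long influence chains.
edges = [("C", "B"), ("D", "B"), ("B", "A"), ("E", "A")]
scores = pagerank(edges)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(name, "=>", round(score, 4))
```

In this toy graph `A` ranks first even though `A` and `B` are each cited by two nodes, because one of `A`'s citations comes from the well-cited `B`.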

Page 31: From Power Chord to the Power of Models - Oredev

Other Models

Page 32: From Power Chord to the Power of Models - Oredev

Weighted graph Album Genres

[Figure: Krautrock, Psychedelic Rock and Experimental Rock connected to each other by edges of weight 1]
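
One plausible way to build such a weighted graph (an assumption, since the slides do not show the construction) is to add 1 to the edge between every pair of genres listed on the same album, so that genres which often appear together end up with a high affinity. The album names below are made up; only the first genre triple appears on the slide.

```python
from collections import Counter
from itertools import combinations

# Made-up albums; only the first genre triple appears on the slide
albums = {
    "Album 1": ["Krautrock", "Psychedelic Rock", "Experimental Rock"],
    "Album 2": ["Krautrock", "Experimental Rock"],
    "Album 3": ["Psychedelic Rock"],
}

affinity = Counter()
for genres in albums.values():
    # Sort so that (a, b) and (b, a) count towards the same undirected edge
    for pair in combinations(sorted(set(genres)), 2):
        affinity[pair] += 1

for (a, b), weight in affinity.most_common():
    print(a, "-", b, "=>", weight)
```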

Page 33: From Power Chord to the Power of Models - Oredev

Genre Affinity

Indie Rock

Shoegazing

Alternative Rock

Dream Pop

22

25

24

12

Post-rock

Page 34: From Power Chord to the Power of Models - Oredev

Genre Affinity

Gothic Metal

Doom Metal

Black Metal

Heavy Metal

13

34

27

12

Stoner Metal

Page 35: From Power Chord to the Power of Models - Oredev

Clustering in Networks

Page 36: From Power Chord to the Power of Models - Oredev

Clustering in Networks

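Adjacency Matrix (Similarity Matrix)

     u1  u2  u3  u4  u5
u1    0   1   0   0   1
u2    1   0   1   1   0
u3    0   1   0   0   1
u4    0   1   0   0   1
u5    1   0   1   1   0

Degree Matrix

     u1  u2  u3  u4  u5
u1    2   0   0   0   0
u2    0   3   0   0   0
u3    0   0   2   0   0
u4    0   0   0   2   0
u5    0   0   0   0   3

[Figure: the corresponding graph with nodes 1 to 5]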

Page 37: From Power Chord to the Power of Models - Oredev

Clustering in Networks

Spectral Clustering: Using Eigenvectors of the Laplacian Matrix

Degree Matrix − Adjacency Matrix (Similarity Matrix) = Laplacian Matrix

Degree Matrix

     u1  u2  u3  u4  u5
u1    2   0   0   0   0
u2    0   3   0   0   0
u3    0   0   2   0   0
u4    0   0   0   2   0
u5    0   0   0   0   3

−

Adjacency Matrix (Similarity Matrix)

     u1  u2  u3  u4  u5
u1    0   1   0   0   1
u2    1   0   1   1   0
u3    0   1   0   0   1
u4    0   1   0   0   1
u5    1   0   1   1   0

=

Laplacian Matrix

     u1  u2  u3  u4  u5
u1    2  -1   0   0  -1
u2   -1   3  -1  -1   0
u3    0  -1   2   0  -1
u4    0  -1   0   2  -1
u5   -1   0  -1  -1   3
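
The subtraction on this slide can be checked in a few lines of numpy. The adjacency values below are the ones from the slide, with zeros on the diagonal on the assumption that nodes do not link to themselves.

```python
import numpy as np

# Adjacency matrix for u1..u5 from the slide (no self-links, so a zero diagonal)
A = np.array([
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
])

D = np.diag(A.sum(axis=1))  # degree of each node on the diagonal
L = D - A                   # Laplacian matrix

print(np.diag(D))
print(L)
```

Each row of the Laplacian sums to zero, which is a quick sanity check when building it from real data.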

Page 38: From Power Chord to the Power of Models - Oredev

Clustering in Networks

Eigenvector: a non-zero vector v whose direction does not change when it is multiplied by matrix A; the result is the same as multiplying v by a scalar λ (Av = λv)

  u1    u2    u3    u4    u5
-0.7   0.3  -0.2  -0.1   0.7
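
As a sketch of how an eigenvector of the Laplacian leads to clusters, the example below uses a made-up graph of two triangles joined by one edge (not the graph from the slides). It first checks the defining property Lv = λv, then splits the nodes by the sign of the eigenvector that belongs to the second-smallest eigenvalue. This is the simplest two-cluster form of spectral clustering; library implementations use several eigenvectors followed by a k-means step.

```python
import numpy as np

# Two triangles (nodes 0-2 and 3-5) joined by a single edge 2-3; made-up example
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# eigh handles symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(L)

# Defining property: multiplying by L only rescales the vector
v, lam = eigenvectors[:, 1], eigenvalues[1]
assert np.allclose(L @ v, lam * v)

# The eigenvector of the second-smallest eigenvalue splits the graph by sign
clusters = (v > 0).astype(int)
print(clusters)
```

Which triangle is labelled 0 and which is labelled 1 depends on the sign the solver happens to return, so only the grouping is meaningful.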

Page 39: From Power Chord to the Power of Models - Oredev

Spectral Clustering Codez

from sklearn.cluster import spectral_clustering
import numpy as np

# n is the collection of nodes; start from an all-zero len(n) x len(n) matrix
A = np.zeros((len(n), len(n)))
# … build adjacency matrix
res = spectral_clustering(A, n_clusters=n_clusters)
# res -> array of cluster indices, one per node, e.g. [1,1,0,5,…]

Page 40: From Power Chord to the Power of Models - Oredev

Spectral Clustering Results

Folk Rock Country Rock

Blues Folk

Country Americana Roots Rock Blues Rock

Southern Rock

Power Metal Progressive Metal Symphonic Metal Black Metal Melodic Death Metal Groove Metal Nu Metal Thrash Metal

Death Metal Metalcore Industrial Metal Gothic Metal Christian Metal Doom Metal Speed Metal

Alternative Rock Indie Rock

New Wave Synthpop

Electronica

Rock R&B Pop

Pop Rock Funk Soul

Heavy Metal Hard Rock

Alternative Metal

Page 41: From Power Chord to the Power of Models - Oredev

Intelligent Models

Page 42: From Power Chord to the Power of Models - Oredev

word2vec Model

Skip-gram: a proximity-based probability model trained using Neural Networks (Deep Learning)

Pink Floyd were an English rock band formed in London
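
"Proximity-based" means the model is trained on pairs of words that occur close to each other. The sketch below generates such (target, context) pairs for the example sentence. The window size of 2 and the tokenization (treating "Pink Floyd" as one token) are assumptions made for illustration; in practice a library such as Gensim handles this internally.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for words at most `window` positions apart."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


# "Pink Floyd" is kept as a single token (assumption)
tokens = ["Pink Floyd", "were", "an", "English", "rock", "band", "formed", "in", "London"]
pairs = skipgram_pairs(tokens, window=2)
rock_contexts = [context for target, context in pairs if target == "rock"]
print(rock_contexts)
```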

Page 43: From Power Chord to the Power of Models - Oredev

word2vec Representation

[Figure: words (Pink Floyd, rock, band, formed, London, pop) shown as one-hot vectors such as 0000000100000 and 0010000000000, alongside dense vectors such as (0.9, 0.1, 0.2, 0.4, 0.1, 0.1) and (0.8, 0.1, 0.1, 0.4, 0.1, 0.2)]
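
The point of moving from one-hot vectors to dense vectors can be shown with cosine similarity: any two different one-hot vectors are orthogonal, while learned dense vectors can sit close together. In the sketch below the two dense vectors are the numbers shown on the slide, but pairing them with "rock" and "pop" is an assumption, and the six-word vocabulary is only for illustration.

```python
import numpy as np

vocabulary = ["Pink Floyd", "rock", "band", "formed", "London", "pop"]


def one_hot(word):
    """Vector of zeros with a single 1 at the word's position in the vocabulary."""
    vector = np.zeros(len(vocabulary))
    vector[vocabulary.index(word)] = 1.0
    return vector


def cosine(a, b):
    """Cosine similarity: 1 for the same direction, 0 for orthogonal vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Dense vectors copied from the slide; assigning them to "rock" and "pop" is an assumption
rock = np.array([0.9, 0.1, 0.2, 0.4, 0.1, 0.1])
pop = np.array([0.8, 0.1, 0.1, 0.4, 0.1, 0.2])

one_hot_similarity = cosine(one_hot("rock"), one_hot("pop"))
dense_similarity = cosine(rock, pop)
print(one_hot_similarity, round(dense_similarity, 3))
```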

Page 44: From Power Chord to the Power of Models - Oredev

word2vec Demo

Page 45: From Power Chord to the Power of Models - Oredev

Album Genre Model

Fun Happy Saturday We Are Friends Electronic Frozen Blood In My Veins Redneck Dance Chaos and Mayhem Basement Dub

Sentiment Analysis in text

Predicting the genre based on name of the album

Page 46: From Power Chord to the Power of Models - Oredev

Deep Learning Basics

0) A method of supervised learning
1) Traditional Neural Networks with many layers
2) Often uses convolution as the node function
3) Training on Big Data can take weeks even on GPU
4) Huge success attributed to improved training, powerful computation and above all Big Data
5) Pooling, Dropout and local connections important

Page 47: From Power Chord to the Power of Models - Oredev

Deep Learning Topology

Page 48: From Power Chord to the Power of Models - Oredev

Deep Learning TensorFlow

“Wish you were here”

=> [123, 101, 42, 1969]

=> [123, 101, 42, 1969, 0, 0, 0, … 0]

Rock => [0, 0, 0, 1, 0, 0, 0, 0]

=> [[100000000000],[000000010000], … ]
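
The encoding steps on this slide (words to integer ids, zero-padding to a fixed length, and a one-hot genre label) can be sketched without TensorFlow itself. The vocabulary, the genre list and `MAX_LEN` below are hypothetical; the ids on the slide (123, 101, 42, 1969) come from the talk's own vocabulary, which is not shown.

```python
# Hypothetical vocabulary and genre list; the real ids on the slide
# (123, 101, 42, 1969) come from the talk's own vocabulary.
word_ids = {"wish": 1, "you": 2, "were": 3, "here": 4}  # 0 is reserved for padding
genres = ["Blues", "Folk", "Pop", "Rock", "Metal", "Punk", "Jazz", "Soul"]
MAX_LEN = 8  # assumed fixed input length


def encode_title(title, max_len=MAX_LEN):
    """Map words to integer ids, then pad with zeros to a fixed length."""
    ids = [word_ids[word] for word in title.lower().split()]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


def encode_genre(genre):
    """One-hot label: 1 at the genre's index, 0 elsewhere."""
    return [1 if g == genre else 0 for g in genres]


x = encode_title("Wish you were here")
y = encode_genre("Rock")
print(x)
print(y)
```

Padding matters because the network expects every input to have the same shape, whatever the length of the album name.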

Page 49: From Power Chord to the Power of Models - Oredev

Deep Learning Demo

Page 50: From Power Chord to the Power of Models - Oredev

Wrap-up

Page 51: From Power Chord to the Power of Models - Oredev

References

• All pictures from wikipedia.org used under Creative Commons
• Source of all data is wikipedia.org, collected online using a single call and then stored and processed
• Efficient Estimation of Word Representations in Vector Space. Mikolov et al. http://arxiv.org/abs/1301.3781
• Gensim's word2vec
• networkx lib
• word2vec blog post (500K docs): Five crazy abstractions my Deep Learning word2vec model just did
• word2vec on Rock music blog: Daft Punk+Tool=Muse: word2vec model trained on a small Rock music corpus
• code for word2vec on wiki data
• Highcharts: highcharts
• word2vec paper: PDF
• Automatic real-time road marking recognition using a feature-driven approach PDF
• Video of the road marking recognition: here and here and here
• Future of Programming - Rise of the Scientific Programmer (and fall of the craftsman)
• Deep Learning articles
• code for Deep Learning genre analysis
• …