Networks, Maps, Relations (Humanities Hackathon 2012, Day 4)

Upload: anila

Post on 22-Feb-2016

Page 1: Networks, Maps, Relations

Networks, Maps, Relations

(Humanities Hackathon 2012, Day 4)

Page 2: Networks, Maps, Relations
Page 3: Networks, Maps, Relations
Page 4: Networks, Maps, Relations

Objects of study: novels, species, philosophers, philosophies, words, concepts, languages, songs….

The problem at hand: describe relationships between the objects. (similarity, influence, equivalence, co-location….)

Page 5: Networks, Maps, Relations

Graphs

• Simplest case: relations between pairs of objects.

• BINARY: objects are either related or they’re not (no attempt to measure extent or other qualities)

Page 6: Networks, Maps, Relations

(D.P. Hayes, Social Network Theory and the Claim that Shakespeare of Stratford…)

Page 7: Networks, Maps, Relations
Page 8: Networks, Maps, Relations
Page 9: Networks, Maps, Relations

How I made this graph (not recommended)

• adj <- array(c(0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,0,0,0,1,0,1,1,0,1,0,0,1,1,0,0,0,0,1,1,1,0,0,1,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,1,0,1,0,0,0,1,1,0,1,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,1,1,1,1,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,1,0,0,0,0,0,0,1,0,0,1,1,0,0,1,1,1,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,1,0,0,1,0,0,0,1,1,0,1,0,0),c(20,20))

• > PL = graph.adjacency(adj, mode="undirected")

Page 10: Networks, Maps, Relations

How I made this graph

> Names = c("Beaumont", "Chapman", "Chettle", "Dekker", "Drayton", "Fletcher", "Greene", "Heywood", "Jonson", "Kyd", "Lodge", "Lyly", "Marlowe", "Marston", "Middleton", "Munday", "Nashe", "Peele", "Webster", "SHAKESPEARE")

> V(PL)$name = Names
OR
> V(PL)$name <- Names
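Typing a 400-entry adjacency matrix by hand is easy to get wrong. A minimal sketch of an alternative, assuming a current igraph release (where `graph.adjacency` is named `graph_from_adjacency_matrix`): list the edges as pairs of names and let igraph build the graph. The 13 edges here are only the ones that can be read off outputs elsewhere in these slides (one 5-clique and one shortest path), so `PLsub` is a small piece of the playwright graph, not the whole thing.

```r
library(igraph)

# Only adjacencies attested by outputs elsewhere in these slides.
clique <- c("Dekker", "Chettle", "Munday", "Heywood", "SHAKESPEARE")
edges <- rbind(
  t(combn(clique, 2)),            # all 10 pairs inside the 5-clique
  c("SHAKESPEARE", "Greene"),     # the SHAKESPEARE-to-Lyly shortest path
  c("Greene", "Nashe"),
  c("Nashe", "Lyly")
)

# A character edge list gives a named graph; no separate V(g)$name step.
PLsub <- graph_from_edgelist(edges, directed = FALSE)
vcount(PLsub)  # 8
ecount(PLsub)  # 13
```

Degrees and distances computed on `PLsub` will generally differ from those of the full 20-vertex graph.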

Page 11: Networks, Maps, Relations

Graphs

A graph (or network) consists of:

• A set of vertices (or nodes)

• A set of edges of the form (v,w) where v and w are vertices.

• Two vertices are adjacent if they are joined by an edge.
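The definition can be written down directly in base R, with no graph library at all. This is only a sketch; the vertex labels and the helper name `is_adjacent` are made up for illustration.

```r
# A graph is just a vertex set plus a list of edges (v, w).
vertices <- c("a", "b", "c", "d")
edges <- rbind(c("a", "b"),
               c("b", "c"),
               c("c", "d"))

# Undirected adjacency: (v, w) and (w, v) count as the same edge.
is_adjacent <- function(v, w) {
  any((edges[, 1] == v & edges[, 2] == w) |
      (edges[, 1] == w & edges[, 2] == v))
}

is_adjacent("b", "a")  # TRUE
is_adjacent("a", "c")  # FALSE
```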

Page 12: Networks, Maps, Relations

Directed graphs

Undirected graphs model symmetric relations: A is connected to B means B is connected to A.

(similarity, overlap, blood relation…)

Directed graphs (or digraphs) model non-symmetric relations:

(biological descent, Internet links, phone calls…)

Page 13: Networks, Maps, Relations

Weighted graphs

In a weighted graph, edges are assigned numbers – typically measuring the strength of a relation, not just whether it is there or not.

(e.g. edge from v to w records number of e-mails from v to w, not just existence of e-mail from v to w.)
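A small base-R sketch of the e-mail example: the weight of the edge from v to w is the count of messages from v to w. The message log is invented for illustration.

```r
# Hypothetical message log: one row per e-mail sent.
emails <- data.frame(
  from = c("v", "v", "w", "v", "u"),
  to   = c("w", "w", "v", "u", "w")
)
people <- c("u", "v", "w")

# Weighted, directed adjacency matrix: W[v, w] = number of e-mails v -> w.
W <- table(from = factor(emails$from, levels = people),
           to   = factor(emails$to,   levels = people))
W["v", "w"]  # 2

# Forgetting the weights recovers the BINARY graph: related or not.
binary <- (W > 0) * 1
binary["v", "w"]  # 1
```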

Page 14: Networks, Maps, Relations
Page 15: Networks, Maps, Relations
Page 16: Networks, Maps, Relations

Shakespeare graph (undirected):

• Vertices are Elizabethan playwrights

• Edges are collaborations (or friendships, or co-defendancies)

Page 17: Networks, Maps, Relations
Page 18: Networks, Maps, Relations
Page 19: Networks, Maps, Relations

MORAL: A picture of a graph is not a graph. The graph is the list of adjacencies, nothing more.

Page 20: Networks, Maps, Relations

ASIDE: why do this?

Oversimplification, BUT

All statements about books are oversimplifications, e.g. “Raymond Carver wrote Cathedral”

Our goal is “distant reading”

Page 21: Networks, Maps, Relations

Basic notions

• The degree (or valence) of a vertex is the number of edges attached to it. Loose measure of “importance”

> degree(PL)
Beaumont  Chapman  Chettle   Dekker  Drayton Fletcher
       2        5        7       10        5        5
…
Webster SHAKESPEARE
      4           9

Page 22: Networks, Maps, Relations

• For directed graphs, the in-degree of a vertex x is the number of edges pointing to x, and the out-degree is the number of edges emanating from x.

• Web graph: in-degree = number of links pointing to my page, out-degree = number of outbound links on my page
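For a directed graph stored as an adjacency matrix, out-degree is a row sum and in-degree is a column sum. A base-R sketch with three made-up pages:

```r
pages <- c("home", "about", "blog")

# links[i, j] = 1 means page i links to page j (hypothetical site).
links <- matrix(0, 3, 3, dimnames = list(from = pages, to = pages))
links["home", "about"] <- 1
links["home", "blog"]  <- 1
links["blog", "home"]  <- 1

out_degree <- rowSums(links)  # outbound links on each page
in_degree  <- colSums(links)  # links pointing to each page

out_degree["home"]  # 2
in_degree["home"]   # 1
```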

Page 23: Networks, Maps, Relations

Basic notions

• The distance between two vertices is the length of the shortest chain of adjacencies connecting them.

> shortest.paths(PL,"SHAKESPEARE","Lyly")
            Lyly
SHAKESPEARE    3
> lapply(get.shortest.paths(PL,'SHAKESPEARE','Lyly'),function(x) V(PL)$name[x])
[[1]]
[1] "SHAKESPEARE" "Greene" "Nashe" "Lyly"

(sorry for this ugliness)
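In current igraph releases the same two queries read a little more cleanly: `distances()` replaces `shortest.paths()` and `shortest_paths()` replaces `get.shortest.paths()`. A sketch, run on a partial graph built only from adjacencies that appear in these slides (so it is an assumption-laden stand-in for PL, not PL itself):

```r
library(igraph)

# Partial playwright graph: one 5-clique plus the path to Lyly.
clique <- c("Dekker", "Chettle", "Munday", "Heywood", "SHAKESPEARE")
edges <- rbind(t(combn(clique, 2)),
               c("SHAKESPEARE", "Greene"),
               c("Greene", "Nashe"),
               c("Nashe", "Lyly"))
PLsub <- graph_from_edgelist(edges, directed = FALSE)

# Distance = number of edges on a shortest path.
d <- distances(PLsub, v = "SHAKESPEARE", to = "Lyly")[1, 1]
d  # 3

# The path itself, as vertex names.
sp <- shortest_paths(PLsub, from = "SHAKESPEARE", to = "Lyly")
path_names <- as_ids(sp$vpath[[1]])
path_names  # "SHAKESPEARE" "Greene" "Nashe" "Lyly"
```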

Page 24: Networks, Maps, Relations

Basic notions

• The diameter of a graph is the greatest distance between any two vertices.

> diameter(PL)
[1] 5
> farthest.nodes(PL)
[1] 1 12 5
> shortest.paths(PL,1,12)
         Lyly
Beaumont    5
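A toy check of the definition, assuming a current igraph release (where `farthest.nodes` is named `farthest_vertices`). The six-vertex graph is made up: a path a-b-c-d-e with one extra vertex f hanging off c.

```r
library(igraph)

edges <- rbind(c("a", "b"), c("b", "c"), c("c", "d"), c("d", "e"),
               c("c", "f"))
g <- graph_from_edgelist(edges, directed = FALSE)

# Greatest distance between any two vertices: a to e, four edges.
diam <- diameter(g)
diam  # 4

# A pair of vertices realizing the diameter, and the distance between them.
far <- farthest_vertices(g)
far$distance  # 4
```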

Page 25: Networks, Maps, Relations

Complete graphs

• Every vertex adjacent to every other

5 vertices, 10 edges

Page 26: Networks, Maps, Relations

Complete graphs

More generally: n vertices, each vertex connected to n-1 others, for a total of n(n-1).

This counts each edge twice! So (n^2-n)/2 edges.

Number of edges scales as number of vertices squared: studying a graph on 10 times as many vertices can take 100 times as long. (Or more, depending on the question asked…)
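The count is quick to check numerically. The helper name `complete_edges` is ours, not igraph's.

```r
# Edges in a complete graph on n vertices: n(n-1)/2, i.e. "n choose 2".
complete_edges <- function(n) n * (n - 1) / 2

complete_edges(5)                       # 10
complete_edges(5) == choose(5, 2)       # TRUE

# Ten times as many vertices, roughly a hundred times as many edges.
complete_edges(50) / complete_edges(5)  # 122.5
```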

Page 27: Networks, Maps, Relations

Trees

A tree is a graph in which every two vertices are joined by one, but only one, path. Equivalently: connected, with no cycles.
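One way to test for a tree, sketched with igraph: a connected graph on n vertices is a tree exactly when it has n-1 edges. The helper name `looks_like_tree` is made up; `make_tree` and `make_ring` are igraph's generators for a regular tree and a cycle.

```r
library(igraph)

# Connected + exactly (n - 1) edges  <=>  tree.
looks_like_tree <- function(g) {
  is_connected(g) && ecount(g) == vcount(g) - 1
}

binary_tree <- make_tree(7, children = 2, mode = "undirected")
cycle       <- make_ring(7)

looks_like_tree(binary_tree)  # TRUE
looks_like_tree(cycle)        # FALSE: a ring is one big cycle
```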

Page 28: Networks, Maps, Relations
Page 29: Networks, Maps, Relations

Communities

• A clique is a set of vertices which are all mutually adjacent.

(So: any pair of adjacent vertices is a clique of size 2, any “triangle” is a clique of size 3…)

• e.g. Shakespeare, Dekker, Chettle.

> largest.cliques(PL)
[[1]]
[1] 4 3 16 8 20

(Dekker, Chettle, Munday, Heywood, Shakespeare)
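The index-to-name translation can be done in the same call. A sketch assuming a current igraph release (`largest_cliques` is the newer name for `largest.cliques`), run on a partial graph containing only adjacencies attested in these slides:

```r
library(igraph)

clique <- c("Dekker", "Chettle", "Munday", "Heywood", "SHAKESPEARE")
edges <- rbind(t(combn(clique, 2)),
               c("SHAKESPEARE", "Greene"),
               c("Greene", "Nashe"),
               c("Nashe", "Lyly"))
PLsub <- graph_from_edgelist(edges, directed = FALSE)

# Each largest clique comes back as a vertex sequence; as_ids() gives names.
lc <- largest_cliques(PLsub)
lc_names <- lapply(lc, as_ids)
lc_names
```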

Page 30: Networks, Maps, Relations
Page 31: Networks, Maps, Relations

Communities

A graph is connected if any vertex can be reached from any other by a chain of adjacencies. Every graph breaks up into connected pieces called connected components.
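In igraph, `components()` reports the connected pieces. A sketch on a made-up five-vertex graph with two pieces:

```r
library(igraph)

# Two pieces: a-b-c and d-e, with no edge between them.
edges <- rbind(c("a", "b"), c("b", "c"), c("d", "e"))
g <- graph_from_edgelist(edges, directed = FALSE)

is_connected(g)   # FALSE

comp <- components(g)
comp$no           # 2 connected components
comp$membership   # which component each vertex belongs to
```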

Page 32: Networks, Maps, Relations

A geometry of their own

“Really, universally, relations stop nowhere, and the exquisite problem of the artist is eternally but to draw, by a geometry of his own, the circle within which they shall happily appear to do so.” (Henry James, preface to Roderick Hudson)

How to draw this circle?

Page 33: Networks, Maps, Relations

Clustering

Connected component: a set of vertices which has no connection to the remainder of the graph.

Cluster: a set of vertices which has relatively few connections to the rest of the graph.

(Note that this isn’t a definition…) Many ways to cluster, no “right way”

Page 34: Networks, Maps, Relations

Clustering in R

> edge.betweenness.community(PL)
Graph community structure calculated with the edge betweenness algorithm
Number of communities (best split): 2
Modularity (best split): 0.2781065
Membership vector:
   Beaumont     Chapman     Chettle      Dekker     Drayton    Fletcher
          1           1           1           1           1           1
     Greene     Heywood      Jonson         Kyd       Lodge        Lyly
          2           1           1           2           2           2
    Marlowe     Marston   Middleton      Munday       Nashe       Peele
          2           1           1           1           2           2
    Webster SHAKESPEARE
          1           1
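To see what the algorithm is doing, here is a sketch on a toy graph where the answer is obvious: two triangles joined by a single bridge. It assumes a current igraph release, where the function is named `cluster_edge_betweenness`. The bridge lies on every shortest path between the two sides, so it has the highest edge betweenness and is cut first.

```r
library(igraph)

edges <- rbind(c("a", "b"), c("b", "c"), c("a", "c"),   # triangle 1
               c("d", "e"), c("e", "f"), c("d", "f"),   # triangle 2
               c("c", "d"))                             # the bridge
g <- graph_from_edgelist(edges, directed = FALSE)

comm <- cluster_edge_betweenness(g)
length(comm)  # number of communities in the best split

# Community label for each vertex, by name.
m <- setNames(as.vector(membership(comm)), V(g)$name)
m
```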

Page 35: Networks, Maps, Relations

How the clusters look

Page 36: Networks, Maps, Relations

“The University Wits were a group of late 16th century English playwrights who were educated at the universities (Oxford or Cambridge) and who became playwrights and popular secular writers. Prominent members of this group were Christopher Marlowe, Robert Greene, and Thomas Nashe from Cambridge, and John Lyly, Thomas Lodge, George Peele from Oxford.” (Wikipedia)

Page 37: Networks, Maps, Relations

Macbeth

Page 38: Networks, Maps, Relations

Clusters of characters in Macbeth

> edge.betweenness.community(Macbeth)
Graph community structure calculated with the edge betweenness algorithm
Number of communities (best split): 10
Modularity (best split): 0.06733369
Membership vector:
        MACBETH    LADY MACBETH         MACDUFF         MALCOLM
              1               2               1               1
           ROSS          BANQUO     First Witch          LENNOX
              1               3               4               1
 First Murderer          DUNCAN    Second Witch     Third Witch
              2               5               4               4
            ALL          SIWARD       Messenger Second Murderer
              1               6               7               8
        Servant          SEYTON
              9              10

Page 39: Networks, Maps, Relations

Breakpoint

When can networks tell us things we don’t already know?

Page 40: Networks, Maps, Relations

200 names

Vertices: 200 baby names for boys popular in 2011.

For each name, record popularity in WI, TX, PA, CA, MA, GA, OH, MO, FL, CO, NY, IL

Edges: Two names are adjacent if their popularity distributions across states are “very similar”
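The slides do not say how “very similar” was measured. One plausible way to turn such a table into a graph, purely as an assumption, is to correlate the rows and keep pairs above a cutoff. The names, numbers, four-state layout and 0.9 threshold below are all invented for illustration.

```r
# Made-up popularity scores: rows are names, columns are states.
pop <- rbind(NameA = c(10, 8, 3, 1),
             NameB = c( 9, 9, 2, 2),
             NameC = c( 1, 2, 9, 10),
             NameD = c( 2, 1, 8, 9))
colnames(pop) <- c("S1", "S2", "S3", "S4")

# Similarity between names = correlation of their state profiles.
similarity <- cor(t(pop))

# Adjacent if similarity exceeds an (arbitrary) cutoff; no self-loops.
adj <- similarity > 0.9
diag(adj) <- FALSE
adj
```

A matrix like `adj` could then be handed to igraph's adjacency-matrix constructor with `mode = "undirected"`.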

Page 41: Networks, Maps, Relations

200 names

> lapply(largest.cliques(MaleNames), function(x) V(MaleNames)$name[ x ])
[[1]]
[1] "Jacob" "Anthony" "Dylan" "Matthew" "Brian"

(popular in NY, CA, MA, less so in CO, MO, GA)

Page 42: Networks, Maps, Relations

200 names

> V(MaleNames)$name[neighbors(MaleNames,'Malachi')]
[1] "Ashton" "Ashton" "Kaden" "Kaden" "Malachi" "Malachi"
> V(MaleNames)$name[neighbors(MaleNames,'Owen')]
[1] "Maxwell" "Maxwell" "Brady" "Brady" "Cole" "Cole" "Owen" "Owen"
> V(MaleNames)$name[neighbors(MaleNames,'Patrick')]
[1] "Thomas" "Thomas" "Patrick" "Patrick" "John" "John" "Sean" "Sean" "Ryan" "Ryan" "Peter" "Peter"

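In the neighbor lists above every name appears twice, and each list includes the queried name itself. One possible reading (an assumption; the slides do not say) is that the graph carries duplicate edges and self-loops. If so, igraph's `simplify()` removes both. A sketch on a made-up three-edge graph:

```r
library(igraph)

# Hypothetical graph with a doubled edge and a self-loop.
edges <- rbind(c("Owen", "Cole"),
               c("Owen", "Cole"),   # duplicate edge
               c("Owen", "Owen"))   # self-loop
g <- graph_from_edgelist(edges, directed = FALSE)

any(which_multiple(g))  # TRUE: there is a repeated edge
any(which_loop(g))      # TRUE: there is a self-loop

# Drop repeated edges and loops, then ask for neighbors again.
g_simple <- simplify(g)
owen_neighbors <- as_ids(neighbors(g_simple, "Owen"))
owen_neighbors  # "Cole"
```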
Page 43: Networks, Maps, Relations
Page 44: Networks, Maps, Relations
Page 45: Networks, Maps, Relations

edge.betweenness.community finds groups of girls’ names like

• Alaina, Maci, Mackenzie, Lillian, Addison, Alivia

• Piper, Harper, Brooklyn, Brooklynn

• Aubrey, Zoey, Autumn, Ellie

• Lucy, Josephine, Elise, Clara, Eleanor

Page 46: Networks, Maps, Relations

Density

How likely are two things to be related?

The density of a graph is the probability that two random elements are related: i.e.

[total number of edges]/[total number of pairs of vertices]

> graph.density(MaleNames)
[1] 0.1084846
> graph.density(FemaleNames)
[1] 0.09950159
> graph.density(Macbeth)
[1] 0.2810458
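The formula is simple enough to compute by hand. The helper name `density_by_hand` is ours. The Macbeth line is a consistency check rather than a stated fact: the membership vector earlier lists 18 characters, and the reported density is what 43 edges among 18 vertices would give, but the edge count itself is inferred, not given in the slides.

```r
# density = edges / (number of vertex pairs) = edges / choose(n, 2)
density_by_hand <- function(n_edges, n_vertices) {
  n_edges / choose(n_vertices, 2)
}

density_by_hand(10, 5)   # 1: a complete graph on 5 vertices

# Inferred, not stated: 43 edges among 18 characters.
density_by_hand(43, 18)  # 0.2810458
```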

Page 47: Networks, Maps, Relations

Transitivity

• A relation is transitive if “A related to B” and “B related to C” implies “A related to C.”

Transitive: “Is descended from,” “born in same city as”

Non-transitive: “is friends with”, “lived at some point in same city as”
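A yes/no transitivity test is a few lines of base R once the relation is stored as a 0/1 matrix. The people, cities and friendships below are invented, and `is_transitive` is a made-up helper name.

```r
# R[a, c] must be 1 whenever some b has R[a, b] = 1 and R[b, c] = 1.
is_transitive <- function(R) {
  two_step <- (R %*% R) > 0   # pairs linked through an intermediate
  all(R[two_step] > 0)
}

# "Born in same city as": an equivalence relation, hence transitive.
city <- c(ann = "Madison", bob = "Madison", cat = "Austin")
same_city <- outer(city, city, "==") * 1

# "Is friends with": ann-bob and bob-cat, but not ann-cat.
people <- names(city)
friends <- matrix(c(0, 1, 0,
                    1, 0, 1,
                    0, 1, 0), 3, 3, dimnames = list(people, people))

is_transitive(same_city)  # TRUE
is_transitive(friends)    # FALSE
```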

Page 48: Networks, Maps, Relations

How transitive is a graph?

Some relations are transitive, others are not. But we don’t have to stop at “yes” or “no”.

How frequently are two friends of yours friends with each other?

• Always

• Never

• Something in between

Page 49: Networks, Maps, Relations

How transitive is a graph?

Transitivity (or “clustering coefficient”) gives the probability that two random neighbors of the same vertex are neighbors to each other.

> transitivity(MaleNames)
[1] 0.4972335
> transitivity(FemaleNames)
[1] 0.4546713
> transitivity(Macbeth)
[1] 0.4545455
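A worked toy case, assuming igraph's default (global) `transitivity()`: a triangle a-b-c with one extra vertex d attached to a. There are 5 pairs of neighbors sharing a vertex (3 around a, 1 around b, 1 around c), and 3 of those pairs are themselves adjacent, so the transitivity should be 3/5.

```r
library(igraph)

edges <- rbind(c("a", "b"), c("b", "c"), c("a", "c"),  # triangle
               c("a", "d"))                            # pendant vertex
g <- graph_from_edgelist(edges, directed = FALSE)

# Fraction of "two neighbors of the same vertex" pairs that are adjacent.
tr <- transitivity(g)
tr  # 0.6
```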

Page 50: Networks, Maps, Relations

How transitive is a graph?

In both name cases, two random neighbors have about a 50% chance of being connected (while two random vertices have about a 10% chance of being connected.) Quite transitive!

Facebook thinks the same is true for “friends” (and makes this so by thinking so!)

Page 51: Networks, Maps, Relations

Stub: incompletely specified networks

Standard problem: incomplete data. Did X and Y collaborate? Lack of an edge might mean “we know they didn’t” or “we don’t know that they did.”

One idea: use network structure – if graph is highly transitive, and X and Y have many common collaborators, this is evidence that X and Y collaborated.
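Counting common collaborators is one matrix multiplication: if A is the 0/1 adjacency matrix, entry (X, Y) of A %*% A is the number of vertices adjacent to both X and Y. A base-R sketch on an invented collaboration record; how many shared neighbors count as "evidence" is left open.

```r
people <- c("X", "Y", "Z", "P", "Q", "R")
A <- matrix(0, 6, 6, dimnames = list(people, people))

# Known collaborations (hypothetical). No X-Y edge is recorded.
known <- rbind(c("X", "P"), c("X", "Q"), c("X", "R"),
               c("Y", "P"), c("Y", "Q"), c("Y", "R"),
               c("Z", "P"))
A[known] <- 1
A[known[, 2:1]] <- 1   # undirected: fill in the mirror entries

# common[i, j] = number of collaborators shared by i and j.
common <- A %*% A
common["X", "Y"]  # 3 shared collaborators: a candidate missing edge
common["X", "Z"]  # 1
```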

Page 52: Networks, Maps, Relations

Metrics, clustering, trees

Suppose given: a set of objects (e.g. novels) and for each pair of objects a degree of dissimilarity (a number)

(survey data, lexical similarity, voting similarity…)

This data (subject to “triangle inequality”) is called a metric on the set of objects.
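The triangle inequality says a detour through a third object can never be shorter than going direct: d(i,k) ≤ d(i,j) + d(j,k). A brute-force base-R check, fine for small tables; the function name and both example matrices are made up.

```r
# TRUE if D[i, k] <= D[i, j] + D[j, k] for every triple (i, j, k).
satisfies_triangle <- function(D, tol = 1e-12) {
  n <- nrow(D)
  for (i in 1:n) for (j in 1:n) for (k in 1:n) {
    if (D[i, k] > D[i, j] + D[j, k] + tol) return(FALSE)
  }
  TRUE
}

good <- matrix(c(0, 1, 2,
                 1, 0, 1,
                 2, 1, 0), 3, 3)
bad  <- matrix(c(0, 1, 5,
                 1, 0, 1,
                 5, 1, 0), 3, 3)   # 5 > 1 + 1: the detour is shorter

satisfies_triangle(good)  # TRUE
satisfies_triangle(bad)   # FALSE
```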

Page 53: Networks, Maps, Relations

Metrics, clustering, trees

Can we associate each object with a point on the plane so that the distances between points correspond to the dissimilarities between objects?

Page 54: Networks, Maps, Relations

Metrics, clustering, trees

Distance From City     Distance To City     Distance (km)
Newark                 Jersey City          8.02
Paterson               Elizabeth            28.3
Toms River             Edison               65.4
Trenton                Camden               45.55
Clifton                Cherry Hill          126.24
Passaic                East Orange          11.84
Union City             North Bergen         2.92
Irvington              Bayonne              12.38
South Vineland         Wayne                176.47
Union                  Vineland             149.49
New Brunswick          Bloomfield           42.14
Perth Amboy            East Brunswick       15.46
West Orange            Plainfield           23.19
West New York          Hackensack           11.18
Sayreville Junction    Lakewood             41.97
Atlantic City          Sayreville           121.87
Teaneck                Linden               36.19
……

Page 55: Networks, Maps, Relations

Metrics, clustering, trees

Doesn’t always work: 4 objects, each pair at distance 1.

Multidimensional scaling: embeds objects in the plane (or higher-dimensional space) while approximately realizing desired distances.

(e.g. Rosenberg, Nelson, Vivekananthan (1968))
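The four-objects example can be tried with base R's `cmdscale()`, which performs classical (metric) multidimensional scaling. This is one MDS variant among several, chosen here only because it ships with R. In the plane the distances come out distorted; in three dimensions (a regular tetrahedron) they are realized exactly.

```r
# Four objects, every pair at distance 1.
d <- as.dist(matrix(1, 4, 4) - diag(4))

plane <- cmdscale(d, k = 2)   # best effort in 2 dimensions
space <- cmdscale(d, k = 3)   # 3 dimensions

# Worst-case gap between embedded and desired distances.
err_plane <- max(abs(as.vector(dist(plane)) - as.vector(d)))
err_space <- max(abs(as.vector(dist(space)) - as.vector(d)))

err_plane  # clearly nonzero: no exact picture in the plane
err_space  # essentially zero
```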

Page 56: Networks, Maps, Relations
Page 57: Networks, Maps, Relations

Hierarchical clustering

A clustering of a set is a partition into categories.

A hierarchical clustering is when we partition the categories into subcategories, subcategories into subsubcategories….

Page 58: Networks, Maps, Relations
Page 59: Networks, Maps, Relations

A hierarchical clustering on a set of objects is the same as a tree whose leaves are the objects!

Agglomerative clustering, etc. – find hierarchical clustering that best respects measured dissimilarities (analogue of MDS)
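Base R's `hclust()` does agglomerative clustering. A sketch on five made-up objects placed on a line, using average linkage (an arbitrary choice among several linkage rules): cutting the resulting tree at different heights gives the categories and subcategories.

```r
# Five hypothetical objects; dissimilarity = distance along the line.
x <- c(a = 0, b = 1, c = 10, d = 11, e = 50)
hc <- hclust(dist(x), method = "average")

# Coarse partition: two categories.
two <- cutree(hc, k = 2)
two    # a, b, c, d together; e alone

# Finer partition: the big category splits into subcategories.
three <- cutree(hc, k = 3)
three  # {a, b}, {c, d}, {e}
```

`plot(hc)` draws the tree, with the objects as leaves.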

Page 60: Networks, Maps, Relations

• Desideratum: objects that are very dissimilar should not be in the same subsubsubsubcategory (or: their distance in the tree should be large)
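One common way to make "distance in the tree" concrete is the cophenetic distance: the height at which two leaves first land in the same cluster. Base R computes it with `cophenetic()`. A sketch on made-up data with average linkage, comparing tree distances with the measured dissimilarities:

```r
x <- c(a = 0, b = 1, c = 10, d = 11, e = 50)
d <- dist(x)
hc <- hclust(d, method = "average")

# Tree distance between every pair of leaves.
tree_d <- as.matrix(cophenetic(hc))
tree_d["a", "b"]  # 1: merged immediately
tree_d["a", "c"]  # 10: merged one level up
tree_d["a", "e"]  # 44.5: only joined at the root

# How well the tree respects the measured dissimilarities.
fit <- cor(as.vector(d), as.vector(cophenetic(hc)))
fit
```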

Page 61: Networks, Maps, Relations

LET US HACK!