
Upload: nara

Post on 16-Jan-2016


Page 1: Web Intelligence Complex Networks I

Web Intelligence

Complex Networks I

This is a lecture for week 6 of `Web Intelligence’

Example networks in this lecture come from a fabulous site of Mark Newman, U of Michigan: http://www-personal.umich.edu/~mejn/

Page 2: Web Intelligence Complex Networks I

This part of the course: WI

When | What | Why | What else
Sep 27 | Complex Networks I | The WWW is a huge complex network. Many other networks are overlaid upon it. Networks have important and interesting properties, re: speed of information spread, robustness, speed and quality of search, etc. | Assignment 1, worth 70% of CW
Oct 4 | Complex Networks II | | Assignment 2, worth 15% of CW
Oct 11 | Web Search: How Google works | Search, obviously, is central to web intelligence | Assignment 3, worth 15% of CW
Nov 8 | Text mining and knowledge discovery from the WWW | Towards inferring useful new knowledge automatically; also for better search, for non marked-up web | Reading
Nov 15 | Web communities and cultural models | Understanding how the web influences the formation and behaviour of groups, and the spread of information | Reading

Page 3: Web Intelligence Complex Networks I

Introductory Points

Graphs and networks are of central importance to us, because:

• The web is a large and complex network
• Major phenomena that underpin our existence, such as how information spreads, how diseases develop, how economies evolve, are best viewed mathematically as networks.
• Networks have structural properties and behaviour. When we analyse the structure of a network, we can reveal important clues about its behaviour. E.g.
    • Predict how fast a virus or rumour will spread on the web
    • Assess which are the most authoritative web sites
    • Predict how long it will take to search sections of the web
    • Predict how robust to damage an area of the www is, or a cellular process is, etc.

Page 4: Web Intelligence Complex Networks I

This Week’s Material

Basic Intro to graphs and networks, terminology, and so on.

The interesting properties of real-world networks.

Metrics and other structural properties that are currently used to analyse both the www and other networks. To support the understanding of metrics and properties, this week we cover basics of graphs and networks.

Page 5: Web Intelligence Complex Networks I

The very basics

A graph is a set of two things: G = {V, E}

V = a set of vertices (also called nodes), e.g. V = {A, B, C, D}

E = a set of edges (also called arcs, or links), e.g. E = { {A,C}, {A,D}, {B,C}, {B,D} }, in which each edge is a set of two vertices from V

This graph is:

[Figure: nodes A and B drawn above nodes C and D, with edges A-C, A-D, B-C and B-D]
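The definition above maps directly onto Python sets. The following is a minimal sketch, not part of the original slides: the helper name `neighbours` and the `adjacency` dictionary are illustrative assumptions, and `frozenset` is used only so that an edge {A, C} compares equal to {C, A}.

```python
# Vertices and edges exactly as given on the slide.
V = {"A", "B", "C", "D"}

# Each undirected edge is a set of two vertices; frozenset makes it
# hashable so that edges can themselves be stored in a set.
E = {frozenset(pair) for pair in [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]}


def neighbours(vertex, edges):
    """Return the vertices that share an edge with `vertex` (hypothetical helper)."""
    return {other for edge in edges if vertex in edge for other in edge if other != vertex}


# Adjacency view of the same graph, derived from V and E.
adjacency = {v: neighbours(v, E) for v in sorted(V)}

for v, nbrs in adjacency.items():
    print(v, "->", sorted(nbrs))
```

Storing E as a set of two-element sets mirrors the slide's notation; an adjacency dictionary is usually more convenient for the traversals used later.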

Page 6: Web Intelligence Complex Networks I

The very basics II

An undirected edge between A and B: {A, B} (or {B, A})

A directed edge from A to B: (A, B)

A loop at A: {A, A} or (A, A)

[Figure: a loop at A; an undirected edge between A and B; a directed edge from A to B]

In an undirected graph, all edges are undirected. In a directed graph, all edges are directed.

Page 7: Web Intelligence Complex Networks I

The very basics III

The degree of a node, in an undirected graph, is the number of edges attached to it. In this one, the degrees are:

A: 2  B: 3  C: 3  D: 3  E: 0  F: 1  G: 2

What is the mean degree of this graph?

[Figure: undirected graph on nodes A to G]
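As a worked check of the mean-degree question, here is a small sketch that uses only the degrees listed on the slide (the edges of the figure itself are not available in this transcript). The variable names are illustrative. It also uses the standard fact that every edge of a loop-free undirected graph contributes 2 to the total degree.

```python
# Degrees as listed on the slide.
degrees = {"A": 2, "B": 3, "C": 3, "D": 3, "E": 0, "F": 1, "G": 2}

# Every edge touches two endpoints, so the degrees sum to twice the edge count.
total_degree = sum(degrees.values())
edge_count = total_degree // 2

# Mean degree = total degree / number of nodes.
mean_degree = total_degree / len(degrees)

print("total degree:", total_degree)
print("edges:", edge_count)
print("mean degree:", mean_degree)
```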

Page 8: Web Intelligence Complex Networks I

The very basics IV

Nodes in directed graphs have in-degrees and out-degrees. Here, listed as node: in, out:

A: 1, 2  B: 1, 2  C: 2, 1  D: 2, 2  E: 1, 1  F: 1, 2  G: 0, 2

A directed graph without cycles is called a DAG (directed acyclic graph). Is this a DAG?

[Figure: directed graph on nodes A to G]
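One common way to answer "is this a DAG?" mechanically is Kahn's algorithm: repeatedly remove nodes whose in-degree is zero, and if every node can be removed the graph has no cycle. The sketch below is an illustration only. The edge lists are hypothetical examples, not the graph from the slide (whose edges are not recoverable from this transcript), and the function names are assumptions.

```python
from collections import deque


def in_out_degrees(nodes, edges):
    """Count in-degree and out-degree for every node of a directed graph."""
    in_deg = {n: 0 for n in nodes}
    out_deg = {n: 0 for n in nodes}
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    return in_deg, out_deg


def is_dag(nodes, edges):
    """Kahn's algorithm: True if the directed graph contains no cycle."""
    in_deg, _ = in_out_degrees(nodes, edges)
    successors = {n: [] for n in nodes}
    for source, target in edges:
        successors[source].append(target)

    # Start from nodes that nothing points to.
    queue = deque(n for n in nodes if in_deg[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            in_deg[nxt] -= 1
            if in_deg[nxt] == 0:
                queue.append(nxt)

    # Nodes on a cycle never reach in-degree zero, so they are never removed.
    return removed == len(nodes)


# Hypothetical example graphs (NOT the slide's graph).
nodes = ["A", "B", "C", "D"]
acyclic_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
cyclic_edges = acyclic_edges + [("D", "A")]

print(is_dag(nodes, acyclic_edges))
print(is_dag(nodes, cyclic_edges))
```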

Page 9: Web Intelligence Complex Networks I

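The very basics V

This is an unlabelled graph.

[Figure: unlabelled graph]

It is exactly the same as (isomorphic to) this one:

[Figure: the same unlabelled graph, drawn differently]

This is a labelled graph.

[Figure: labelled graph with nodes homepage, teaching, research, graphs]

Since labels and links have meaning, this one is different:

[Figure: a second labelled graph with nodes homepage, teaching, graphs, research]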

Page 10: Web Intelligence Complex Networks I

Diversity of graphs: considering only loop-free graphs

How many different 2-node, labelled undirected graphs are there?

How many different 2-node, labelled directed graphs are there?

How many different 3-node, labelled undirected graphs are there?

Suppose there are G(k) possible undirected labelled graphs on k nodes. Whenever we add one extra node to an undirected labelled graph on k nodes, any subset of the k existing nodes could link to it, and there are 2^k such subsets. So the number of possible undirected labelled graphs on k+1 nodes is 2^k times what it is on k nodes: G(k+1) = 2^k G(k).
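The counting argument can be checked numerically. The sketch below applies the recurrence starting from G(1) = 1 and compares it with the closed form 2^(k(k-1)/2), which follows because each of the k(k-1)/2 possible edges is either present or absent. The function names are illustrative, and the directed count 2^(k(k-1)) is an addition not stated on the slide (each ordered pair of distinct nodes may or may not carry an edge).

```python
def count_undirected(k):
    """Number of loop-free labelled undirected graphs on k nodes, via the recurrence."""
    count = 1  # exactly one graph on a single node (no edges possible)
    for existing in range(1, k):
        # Adding a node to a graph on `existing` nodes: 2**existing link choices.
        count *= 2 ** existing
    return count


def count_undirected_closed_form(k):
    """Closed form: each of the k(k-1)/2 possible edges is present or absent."""
    return 2 ** (k * (k - 1) // 2)


def count_directed(k):
    """Loop-free labelled directed graphs: k(k-1) ordered pairs, each present or absent."""
    return 2 ** (k * (k - 1))


for k in (2, 3, 5, 10):
    print(k, count_undirected(k), count_undirected_closed_form(k), count_directed(k))
```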

Page 11: Web Intelligence Complex Networks I

Example numbers for undirected labelled graphs

Size of graph Number of possible graphs

5 nodes 1024

10 nodes 35,184,372,088,832

20 nodes 1.6 × 10^57

100 nodes 1.3 × 10^1490

1000 nodes a lot.
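The larger entries in this table are too big to print comfortably, but their size can be estimated with logarithms. The sketch below assumes the count for k nodes is 2^(k(k-1)/2) and returns a mantissa and a power of ten; the function name `graph_count_scientific` is hypothetical.

```python
import math


def graph_count_scientific(k):
    """Return (mantissa, power) such that the count is about mantissa * 10**power."""
    exponent_base2 = k * (k - 1) // 2          # number of possible edges
    log10_count = exponent_base2 * math.log10(2)
    power = math.floor(log10_count)
    mantissa = 10 ** (log10_count - power)     # value between 1 and 10
    return mantissa, power


for k in (5, 10, 20, 100, 1000):
    mantissa, power = graph_count_scientific(k)
    print(f"{k} nodes: about {mantissa:.1f} x 10^{power}")
```

Working in log space avoids converting a very large integer to a float, which would overflow for 100 nodes and beyond.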

Page 12: Web Intelligence Complex Networks I

More basics

If there is a path in the graph from each node to every other, the graph is connected, else it is unconnected. This one?

[Figure: undirected graph on nodes A to G]
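Connectivity can be tested by starting at any node and checking whether a breadth-first search reaches all the others. This is a minimal sketch under that approach; the function name and the example graphs are assumptions (the first example reuses the four-node graph from "The very basics", the second adds a hypothetical isolated node E).

```python
from collections import deque


def is_connected(nodes, edges):
    """True if every node of the undirected graph is reachable from the first one."""
    nodes = list(nodes)
    if not nodes:
        return True  # convention chosen here: the empty graph counts as connected

    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(nodes)


square_edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]
print(is_connected(["A", "B", "C", "D"], square_edges))
print(is_connected(["A", "B", "C", "D", "E"], square_edges))
```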

Page 13: Web Intelligence Complex Networks I

More basics II

The complete (undirected) graph on n nodes is the graph that contains all n(n-1)/2 possible edges. Is this one complete?

[Figure: graph on nodes A, B, C, D]

Most graphs of interest and importance are far from complete; they tend to be called sparse.

Think about the following graphs:

1. Nodes = students in this university; edge {A,B} exists if A and B have the same birthday.
2. Nodes = web pages; edge (A,B) exists if A links to B.
3. Nodes = types of molecules in our bloodstream; edge (A,B) exists if A interacts with B.
4. Nodes = all living humans; edge {A,B} exists if A and B have ever shaken hands.
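One way to put a number on "far from complete" is the density of a graph: the fraction of the n(n-1)/2 possible edges that are actually present. The term "density" and the function names below are additions for illustration, not taken from the slides.

```python
def max_edges(n):
    """Number of edges in the complete undirected graph on n nodes."""
    return n * (n - 1) // 2


def density(n, edge_count):
    """Fraction of possible edges present; 1.0 means complete, near 0 means sparse."""
    if n < 2:
        return 0.0  # no edges are possible, so density is taken as 0 here
    return edge_count / max_edges(n)


def is_complete(n, edge_count):
    """A loop-free simple graph is complete exactly when it has all possible edges."""
    return edge_count == max_edges(n)


# The four-node example from "The very basics" has 4 of 6 possible edges.
print(max_edges(4), density(4, 4), is_complete(4, 4))
print(density(4, 6), is_complete(4, 6))
```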

Page 14: Web Intelligence Complex Networks I

More Structural Properties

Diameter: the length of the longest shortest path between any two nodes, i.e. the largest distance between any pair of nodes.

Number of components: in undirected graphs, the number of separate connected pieces.

Degree distribution: An interesting and important fingerprint of a graph that we will see more of.

Modularity: A graph is highly modular if it has several clusters of nodes with many links within the clusters, but few links between the clusters.

Hierarchical modularity: A graph seems to be hierarchically modular if it is modular, as above, but the modules are themselves modular.
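The first three properties are easy to compute on small graphs with breadth-first search. The sketch below is illustrative only: the function names and the six-node example graph are assumptions, and for a graph with several components the diameter is taken here as the largest distance found within any single component.

```python
from collections import Counter, deque


def build_adjacency(nodes, edges):
    """Adjacency sets for an undirected graph."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def bfs_distances(start, adjacency):
    """Shortest-path length (in edges) from `start` to every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in adjacency[current]:
            if nxt not in distances:
                distances[nxt] = distances[current] + 1
                queue.append(nxt)
    return distances


def count_components(adjacency):
    """Number of connected pieces of the graph."""
    seen = set()
    count = 0
    for node in adjacency:
        if node not in seen:
            seen.update(bfs_distances(node, adjacency))
            count += 1
    return count


def diameter(adjacency):
    """Largest shortest-path length between any two mutually reachable nodes."""
    return max(max(bfs_distances(n, adjacency).values()) for n in adjacency)


def degree_distribution(adjacency):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(nbrs) for nbrs in adjacency.values())


# Hypothetical example: a path A-B-C-D plus a separate edge E-F.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
adjacency = build_adjacency(nodes, edges)

print("components:", count_components(adjacency))
print("diameter:", diameter(adjacency))
print("degree distribution:", dict(degree_distribution(adjacency)))
```

Running a search from every node is fine for small graphs; for web-scale networks the diameter is usually estimated by sampling.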

Page 15: Web Intelligence Complex Networks I

Some Networks

One of these is a network of protein interactions in yeast. The other is a visualisation of an outbreak of TB.

What do the nodes and edges represent? And … which is which?

Page 16: Web Intelligence Complex Networks I

Is this: spread of HIV infection (node = person / link = HIV transfer)

or is it: books about politics (node = book / link = one mentions the other)

Page 17: Web Intelligence Complex Networks I

Notice how the book network is polarised

Page 18: Web Intelligence Complex Networks I

The internet

Page 19: Web Intelligence Complex Networks I

Assignment 1

Read: Exploring Complex Networks, by Steven Strogatz, Nature 410, 268-276

Write: A 500-word `executive summary’ of most of this article. Leave out Box 1 and the section “Regular networks of coupled dynamical systems”; restart at “Complex network architectures”.

AND

Write: A 100-word account of what you assess to be the three main points conveyed by this article.

Write: A 200-word essay about the relevance of those points to the topic of your BSc or MSc (e.g. relevance to AI, relevance to IT (Business), etc.)

Word limits in this assignment are important; over the limits means losing marks

Page 20: Web Intelligence Complex Networks I

Marking

30% of the marks: completeness and readability

30% of the marks: evidence of understanding the article, and generally making sense

30% of the marks: clarity of your arguments

10% of the marks: for making me say “Wow”

Page 21: Web Intelligence Complex Networks I

Next week

Much more advanced, about:

• Degree distributions
• Clustering coefficients
• Modularity and hierarchy
• Random networks vs real networks
• Some basic graph algorithms

• Another article, much smaller, to read.