(big) data science

Post on 12-May-2015

DESCRIPTION

slides from my talk at WebExpo Prague 2013

TRANSCRIPT

GraphAware™

(Big) Data Science

and a bit of Graph Theory

by Michal Bachman

Data Science

“the sexiest job in the 21st century”

HARVARD BUSINESS REVIEW

Data Science

by 2018 the United States could be short up to 190,000 people with the analytical skills ... to make wise use of virtual mountain ranges of data for critical decisions in business, energy, intelligence, health care, finance, and other fields.

McKinsey Global Institute (2011)

Data Scientist

“hybrid computer scientist/software engineer/statistician”

The Times

Big Data: a collection of data sets that are large and complex.

Data Complexity is a function of size, connectedness, and uniformity.

Network: a pattern of interconnections among a set of things.

Networks

Social ties

Information we consume

Technological and economic systems

...

Structure: who is linked to whom

Behaviour: implicit consequences of one’s actions for the outcomes of everyone in the system

Graph Theory is the study of network structure.

Leonhard Euler

Seven Bridges of Königsberg
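Euler's argument comes down to counting: a walk that crosses every bridge exactly once can exist only if the (connected) graph has zero or two nodes with an odd number of bridges. A minimal Python sketch of that check follows. The labelling is an assumption made here for illustration (A as the central island, B and C as the river banks, D as the eastern land mass); the bridge counts are the historical ones.

```python
from collections import Counter

# The seven bridges as an edge list of a multigraph.
# Assumed labelling: A = central island, B and C = river banks, D = eastern land mass.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]

def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def has_euler_walk(edges):
    """Degree test for a walk using every edge exactly once.

    Assumes the graph is connected; only the parity condition is checked.
    """
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return odd in (0, 2)

print(dict(degrees(BRIDGES)))   # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_walk(BRIDGES))  # False: all four land masses have odd degree
```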

Graph Theory

[Figure: graph on nodes A, B, C, D]

Connected Graph

[Figure: graph on nodes A, B, C, D]

Connected Components

[Figure: graph on nodes A, B, C, D, E, F]
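One way to find connected components is a breadth-first search started from every node that has not been visited yet. The sketch below uses the six node labels from the slide; the edges are hypothetical, chosen only so that the graph falls into two components.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return the components of an undirected graph as sorted lists of nodes."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components

NODES = ["A", "B", "C", "D", "E", "F"]
# Hypothetical edges: A-D hang together, E-F form a separate pair.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("E", "F")]

components = connected_components(NODES, EDGES)
print(components)            # [['A', 'B', 'C', 'D'], ['E', 'F']]
print(len(components) == 1)  # False: this graph is not connected
```

A graph is connected exactly when this function returns a single component.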

Question: is the social network of the entire world connected?

No. (probably :-))

Giant Components

Question: how many giant components are there in a large, complex network?

1

why?

Six Degrees of Separation

“I read somewhere that everybody on this planet is separated only by six other people. Six degrees of separation. Between us and everyone else on this planet.”

Six Degrees of Separation: A Play. (John Guare)

Average Bacon number for all performers in the IMDb: 2.9
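A Bacon number is a shortest-path distance in the graph where two performers are linked if they appeared in the same film, so it can be computed with breadth-first search. The sketch below assumes a tiny, entirely hypothetical co-starring graph (the names p1 to p4 are placeholders, not IMDb data) and averages the distances.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest number of hops from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

# Hypothetical co-starring graph; p1..p4 are placeholder performers.
CO_STARS = {
    "Kevin Bacon": ["p1", "p2"],
    "p1": ["Kevin Bacon", "p3"],
    "p2": ["Kevin Bacon"],
    "p3": ["p1", "p4"],
    "p4": ["p3"],
}

bacon = bfs_distances(CO_STARS, "Kevin Bacon")
others = [d for name, d in bacon.items() if name != "Kevin Bacon"]
print(bacon["p4"])                # 3
print(sum(others) / len(others))  # 1.75
```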

Graphs Are Everywhere

Collaboration networks

Who-talks-to-whom graphs

Information linkage graphs

Technological networks

Natural world networks

Transport networks

...

Motivations for Study

Domain interest

Proxy for a related network

Look for domain-agnostic properties

Granovetter’s Experiment

People learned about new jobs through acquaintances rather than close friends.

Triadic Closure

[Figure: graph on nodes A, B, C]

If two people in a social network have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future.
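Triadic closure suggests a simple way to look for likely future friendships: list the pairs who are not yet friends but share at least one friend. A minimal sketch, assuming the three nodes from the figure with A befriending both B and C (the edges are an assumption for illustration):

```python
from itertools import combinations

def closure_candidates(friends):
    """Map each pair that is not yet connected to the number of friends they share."""
    shared = {}
    for person, circle in friends.items():
        # Any two friends of `person` have `person` as a common friend.
        for u, v in combinations(sorted(circle), 2):
            if v not in friends[u]:
                shared[(u, v)] = shared.get((u, v), 0) + 1
    return shared

# Assumed friendships: A knows B and C, but B and C do not know each other yet.
FRIENDS = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A"},
}

print(closure_candidates(FRIENDS))  # {('B', 'C'): 1}
```

Pairs with more shared friends would be the stronger candidates for closure.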

Bridge

[Figure: graph on nodes A, B, C, D, E]

Local Bridge

[Figures: graph on nodes A, B, C, D, E; graph on nodes A, B, C, D, E, F, G, H, J, K]
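The slides show bridges only as figures, so the sketch below assumes the usual textbook definitions: an edge is a bridge if removing it disconnects its two endpoints, and a local bridge if its endpoints have no friends in common. The graphs are hypothetical and are not the ones drawn on the slides.

```python
def build(edges):
    """Turn an edge list into an undirected adjacency map of sets."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def reachable(adjacency, source, skip_edge):
    """Nodes reachable from `source` when `skip_edge` is ignored."""
    skipped = set(skip_edge)
    seen = {source}
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if {node, neighbour} == skipped:
                continue
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen

def is_bridge(adjacency, u, v):
    """True if removing the edge u-v leaves u and v in different components."""
    return v not in reachable(adjacency, u, (u, v))

def is_local_bridge(adjacency, u, v):
    """True if the endpoints of the edge u-v share no neighbours."""
    return not (adjacency[u] & adjacency[v])

# Hypothetical graph: two triangles joined only by the edge A-B.
two_triangles = [("A", "C"), ("A", "D"), ("C", "D"),
                 ("A", "B"),
                 ("B", "E"), ("B", "F"), ("E", "F")]
g1 = build(two_triangles)
print(is_bridge(g1, "A", "B"))        # True

# Adding a long detour D-G-E keeps A-B a local bridge but not a bridge.
g2 = build(two_triangles + [("D", "G"), ("G", "E")])
print(is_bridge(g2, "A", "B"))        # False
print(is_local_bridge(g2, "A", "B"))  # True
print(is_local_bridge(g2, "A", "C"))  # False: A and C share the friend D
```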

Strong Triadic Closure

[Figure: graph on nodes A, B, C]

Local Bridge = Weak Tie

[Figure: graph on nodes A, B, C, D, E, F, G, H, J, K]

Structural Balance

[Figures: graphs on nodes A, B, C and on nodes A, B, C, D]

The Balance Theorem

If a labelled complete graph is balanced, then either all pairs of nodes are friends, or else the nodes can be divided into two groups, X and Y, such that each pair of people in X likes each other, each pair of people in Y likes each other, and everyone in X is the enemy of everyone in Y.

[Figure: graphs on nodes A, B, C, D]
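The theorem can be checked on a small example. The sketch below assumes the usual definition of balance (every triangle has either one or three friendly edges, which is the same as a positive product of edge signs) and a hypothetical labelling of the four nodes in which A and B are friends, C and D are friends, and all other pairs are enemies.

```python
from itertools import combinations

def is_balanced(nodes, sign):
    """Every triangle must have one or three friendly edges (sign product > 0)."""
    return all(
        sign[frozenset((a, b))] * sign[frozenset((b, c))] * sign[frozenset((a, c))] > 0
        for a, b, c in combinations(nodes, 3)
    )

def split_into_groups(nodes, sign):
    """Pick any node: its friends join it in X, its enemies form Y.

    Only meaningful when the graph is balanced.
    """
    first = nodes[0]
    x = [first] + [n for n in nodes[1:] if sign[frozenset((first, n))] > 0]
    y = [n for n in nodes[1:] if sign[frozenset((first, n))] < 0]
    return x, y

NODES = ["A", "B", "C", "D"]
# Hypothetical labels: +1 = friends, -1 = enemies.
SIGNS = {
    frozenset("AB"): +1, frozenset("CD"): +1,
    frozenset("AC"): -1, frozenset("AD"): -1,
    frozenset("BC"): -1, frozenset("BD"): -1,
}

print(is_balanced(NODES, SIGNS))        # True
print(split_into_groups(NODES, SIGNS))  # (['A', 'B'], ['C', 'D'])

# Making B and C friends creates the triangle A-B-C with two friendly edges.
unbalanced = dict(SIGNS)
unbalanced[frozenset("BC")] = +1
print(is_balanced(NODES, unbalanced))   # False
```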

Graph Partitioning

Neo4j is an open-source, fully transactional graph database. It manipulates data in the form of a directed property graph with labelled vertices and edges.

[Figure: example property graph]

Nodes:

name: "Drama", type: "genre"
name: "Thriller", type: "genre"
name: "Pulp Fiction", year: 1994, type: "movie"
name: "Quentin Tarantino", type: "person"
name: "Samuel L. Jackson", type: "person"
name: "Director", type: "occupation"
name: "Actor", type: "occupation"

Relationships:

DIRECTED
ACTED_IN (role: "Jules Winnfield")
ACTED_IN (role: "Jimmie Dimmick")
IS_OF_GENRE (twice)
IS_A (three times)

Cypher Query Language

START a=node(*)
MATCH (a)-[:ACTED_IN]->(m)
RETURN a.name, count(m)
ORDER BY count(m) DESC
LIMIT 5;

==> +-----------------------------+
==> | a.name           | count(m) |
==> +-----------------------------+
==> | "Tom Hanks"      | 12       |
==> | "Keanu Reeves"   | 7        |
==> | "Hugo Weaving"   | 5        |
==> | "Meg Ryan"       | 5        |
==> | "Jack Nicholson" | 5        |
==> +-----------------------------+
==> 5 rows
==> 
==> 47 ms
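Read clause by clause, the query matches every (actor, movie) pair linked by ACTED_IN, counts movies per actor, sorts by that count in descending order, and keeps the top five. A rough Python analogue over an in-memory list is sketched below. The pairs are hypothetical placeholders, not the data set behind the output on the slide, and the alphabetical tie-break is an assumption added here so the result is deterministic.

```python
from collections import Counter

# Hypothetical (actor, movie) pairs standing in for ACTED_IN relationships.
ACTED_IN = [
    ("actor_a", "movie_1"), ("actor_a", "movie_2"), ("actor_a", "movie_3"),
    ("actor_b", "movie_1"), ("actor_b", "movie_4"),
    ("actor_c", "movie_2"),
]

def top_actors(acted_in, limit=5):
    """Count movies per actor, order by count descending, keep `limit` rows."""
    counts = Counter(actor for actor, _movie in acted_in)   # RETURN a.name, count(m)
    ranked = sorted(counts.items(),
                    key=lambda item: (-item[1], item[0]))   # ORDER BY count(m) DESC
    return ranked[:limit]                                   # LIMIT

print(top_actors(ACTED_IN, limit=2))  # [('actor_a', 3), ('actor_b', 2)]
```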

Thank You

www.graphaware.com
@graph_aware