
CS 336 March 19, 2012

Tandy Warnow

Basic Graph Terminology

• Nodes, vertices, edges, degrees, paths, cycles, connected components, adjacency, isolated vertices, trees, forests

• Directed graphs: indegree, outdegree, trees

Advanced terminology

• Cliques

• Independent sets

• Chromatic number and vertex colorings

• Eulerian cycles and Eulerian paths

• Hamiltonian paths

• Matchings

• Dominating Set

• Vertex Cover

Paths, Connected Components, etc.

• A path is a sequence of vertices v_1, v_2, …, v_n such that v_i is adjacent to v_{i+1} for i=1,2,…,n-1. A simple path is one that does not have repeated vertices.

• A graph is connected if every pair of vertices in the graph is connected by some path.

• A connected component is a maximal subset of the vertices that is connected.
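As an illustration (not part of the original slides), the definitions above can be turned into a short Python sketch. It assumes the graph is stored as an adjacency list, a dict mapping each vertex to its neighbors; the function names `connected_components` and `is_connected` are hypothetical. Breadth-first search from a vertex not yet seen collects exactly one connected component.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as sets.

    adj maps each vertex to an iterable of its neighbors (adjacency list).
    """
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every vertex reachable from start.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components


def is_connected(adj):
    # Connected means every pair of vertices is joined by a path,
    # i.e. there is at most one component.
    return len(connected_components(adj)) <= 1


# A path a-b-c plus an isolated vertex d: two components.
g = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
print(sorted(sorted(c) for c in connected_components(g)))  # [['a', 'b', 'c'], ['d']]
print(is_connected(g))  # False
```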

Cycles

• A cycle in a graph is a path that starts and ends at the same vertex.

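• A simple cycle is a cycle that does not have any repeated vertices (other than the start and end vertex). In an undirected graph it must also use at least three distinct vertices, so that going along an edge and straight back does not count as a cycle.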

• A graph is acyclic if it has no simple cycles.
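A minimal sketch of testing acyclicity, assuming an undirected simple graph given as a vertex collection plus a list of edge pairs. It uses union-find, a technique swapped in here rather than taken from the slides: an edge whose two endpoints are already joined by a path would close a simple cycle. `is_acyclic` is a hypothetical name.

```python
def is_acyclic(vertices, edges):
    """Return True if an undirected simple graph has no simple cycle.

    Union-find: each set holds vertices already joined by some path, so an
    edge whose endpoints are in the same set would close a cycle.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the set's representative, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, w in edges:
        root_u, root_w = find(u), find(w)
        if root_u == root_w:
            return False  # u and w were already connected: cycle found
        parent[root_u] = root_w  # merge the two sets
    return True


print(is_acyclic("abc", [("a", "b"), ("b", "c")]))              # True
print(is_acyclic("abc", [("a", "b"), ("b", "c"), ("c", "a")]))  # False
```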

Trees

• Two types: rooted and unrooted

• Unrooted (simplest): acyclic connected graph

• Rooted: take an unrooted tree, pick one node to be the root, and direct all edges away from the root. Voila!
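The rooting step can be sketched in Python as a breadth-first search from the chosen root, assuming the unrooted tree is an adjacency-list dict; each edge is directed from the endpoint reached first to the other one. `root_tree` is a hypothetical helper.

```python
from collections import deque


def root_tree(adj, root):
    """Direct every edge of an unrooted tree away from the chosen root.

    adj: adjacency lists of an undirected tree. Returns a dict mapping each
    node to the list of its children in the rooted tree.
    """
    children = {v: [] for v in adj}
    visited = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in visited:
                # Edge {u, w} is first reached from u, so it points u -> w.
                visited.add(w)
                children[u].append(w)
                queue.append(w)
    return children


tree = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
print(root_tree(tree, 1))  # {1: [2, 3], 2: [4], 3: [], 4: []}
```

Picking a different root gives a different rooted tree from the same unrooted one; for example, rooting at 4 makes 2 the only child of 4.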

Theorems about trees

Let T be a connected acyclic graph (i.e., a tree) with n vertices (n>0). Then:

• T has at least one leaf (node with degree 0 or 1).

• T has n-1 edges.

• Every edge in T is a cut-edge (an edge whose removal disconnects the graph).

• Every tree can be 2-colored.

Theorem: Every tree has at least one leaf (node of degree 0 or 1)

Theorem: For any tree T with at least one vertex, T has at least one leaf (node with degree 0 or 1).

Proof:

• If n=1, then T is a single vertex, which is a leaf (it has degree 0).

• Else, n>1. Let P be a longest simple path in T, so P = v_1, v_2, …, v_k. Since T is connected and has at least two vertices, it has at least one edge, so k ≥ 2 and v_k has degree at least 1.

• If v_k has degree 1, we are done. Otherwise, v_k has at least two neighbors, and so some neighbor w other than v_{k-1}. If w is in P, then w = v_i for some i < k-1, and v_i, …, v_k, v_i is a simple cycle in T, contradicting that T is a tree. If w is not in P, then we can extend P to v_1, …, v_k, w and get a longer simple path, contradicting that P is a longest simple path in T.

• Hence, v_k has degree 1, and we are done.
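The longest-path argument also suggests a constructive sketch: keep extending a walk without stepping back until it gets stuck. The Python below assumes the input really is a tree given as an adjacency-list dict (on a graph with cycles the walk could run forever); `find_leaf` is a hypothetical name.

```python
def find_leaf(adj, start):
    """Find a leaf of a tree by walking until the walk cannot continue.

    adj: adjacency lists of a tree. The walk never steps straight back along
    the edge it just used; in a tree it also cannot revisit a vertex (that
    would close a cycle), so it must stop, and it can only stop at a node of
    degree 0 or 1.
    """
    prev, cur = None, start
    while True:
        onward = [w for w in adj[cur] if w != prev]
        if not onward:
            return cur  # every neighbor (at most one) is where we came from
        prev, cur = cur, onward[0]


tree = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
print(find_leaf(tree, 1))  # 4
```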

Theorem: Any tree with n>0 nodes has n-1 edges

• Proof: by induction on n.

• Base case: n=1 (trivial: a single node has 0 = n-1 edges).

• Inductive hypothesis: for some positive n, any tree on n nodes has exactly n-1 edges.

• Let T be a tree on n+1 nodes. We want to show T has exactly n edges.

Proof (cont’d)

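• Let v be a node in T with degree 1. (By the previous theorem T has a leaf, and since T is connected with n+1 ≥ 2 nodes, that leaf cannot have degree 0.)

• Remove v and its incident edge from T. The result is a tree T’ with n nodes (deleting a leaf cannot create a cycle or disconnect the remaining nodes), and hence T’ has n-1 edges by the inductive hypothesis.

• T’ contains exactly one fewer edge and one fewer vertex (node) than T, and so T has (n-1)+1 = n edges.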
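The induction can be replayed mechanically. This sketch (hypothetical helper name; the input is assumed to be a tree given as a dict of neighbor sets) deletes one degree-1 node and its edge at a time and counts the deleted edges, which for a tree on n nodes comes to n-1.

```python
def count_edges_by_leaf_removal(adj):
    """Mirror the inductive proof: repeatedly delete a degree-1 node.

    adj: dict of neighbor sets for a tree with at least one node. Each
    deletion removes exactly one node and one edge, so a tree on n nodes
    loses n-1 edges before a single node remains.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    removed_edges = 0
    while len(adj) > 1:
        # A tree with >= 2 nodes has a leaf of degree 1 (previous theorem).
        leaf = next(v for v, ns in adj.items() if len(ns) == 1)
        (neighbor,) = adj.pop(leaf)
        adj[neighbor].discard(leaf)
        removed_edges += 1
    return removed_edges


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(count_edges_by_leaf_removal(star))  # 3
```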

Theorem: every edge in a tree is a cut-edge

Proof (by contradiction).

• Suppose T is a tree and e=(v,w) is an edge in T that is not a cut-edge.

• Then G=T-{e} (deleting e but keeping v and w) is connected. Hence there is a simple path P from v to w in G. Since e is not in G, P does not include edge e.

• Therefore, we can form a simple cycle C by adding edge e to P. (Since e is the only edge joining v and w, P passes through at least one other vertex, so C has at least three vertices.)

• Since every edge in C is in T, this means that T is not acyclic, contradicting the assumption that T is a tree (connected acyclic graph).
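To experiment with the definition, a cut-edge test can be sketched as: delete the edge (u, w) and check whether w is still reachable from u. Removing one edge can only split the component that contains it, so for a connected graph this matches the definition. The adjacency-list input and the name `is_cut_edge` are assumptions for illustration.

```python
from collections import deque


def is_cut_edge(adj, u, w):
    """Return True if deleting edge {u, w} disconnects u from w.

    adj: adjacency lists of an undirected simple graph containing the edge.
    """
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if {x, y} == {u, w}:
                continue  # pretend the edge {u, w} has been deleted
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return w not in seen


path = {1: [2], 2: [1, 3], 3: [2]}                 # a tree
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}       # not a tree
print(is_cut_edge(path, 1, 2), is_cut_edge(triangle, 1, 2))  # True False
```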

Vertex Coloring

• A (proper) vertex coloring of a graph is a function c: V -> {1,2,…,k}, s.t. no two adjacent vertices are mapped to the same color.

• The chromatic number of a graph is the minimum number of colors needed to properly color the graph.

• How many colors does a tree need?
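The definitions of a proper coloring and of the chromatic number translate directly into a brute-force sketch, assuming vertices are given as a collection and edges as a list of pairs. Trying all k^n colorings takes exponential time, so this is only meant for tiny examples; the function names are hypothetical.

```python
from itertools import product


def is_proper_coloring(edges, c):
    """c maps vertices to colors; proper means no edge has equal end colors."""
    return all(c[u] != c[w] for u, w in edges)


def chromatic_number(vertices, edges):
    """Smallest k admitting a proper coloring c: V -> {1, ..., k}.

    Brute force over all k^n assignments, so only suitable for tiny graphs.
    """
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for colors in product(range(1, k + 1), repeat=len(vertices)):
            if is_proper_coloring(edges, dict(zip(vertices, colors))):
                return k
    return 0  # a graph with no vertices needs no colors


star = [("c", "x"), ("c", "y"), ("c", "z")]  # a tree
print(chromatic_number("cxyz", star))  # 2
```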

2-coloring a tree

• Theorem: every connected acyclic graph (i.e., tree) can be 2-colored.

• Proof: by induction on the number of vertices.

Proof that every tree can be 2-colored

• Let G be a tree on n vertices. The base case is n=1. Clearly every tree on 1 vertex can be 2-colored.

• The Inductive Hypothesis is that for some positive integer n, any tree on n vertices can be 2-colored.

• Let G be a tree with n+1 vertices. We want to show that G can be 2-colored.

Proof (cont’d)

• Let v be a node in G that has degree 1, and let w be its unique neighbor in G.

• Consider the graph G’ formed by deleting v (and its incident edge but not w) from G.

• G’ is also a tree, i.e., connected and acyclic (why?), and it has n vertices.

• Therefore, by the inductive hypothesis, G’ can be 2-colored.

• We extend the coloring c from G’ to G by letting c(v) = 1 if c(w) = 2, and c(v) = 2 if c(w) = 1.

• Note that this coloring is proper for G: the only edge of G not in G’ is (v,w), and its endpoints get different colors.

• Hence G can be 2-colored.
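The inductive proof is effectively a recursive algorithm. The sketch below follows it step by step: remove a leaf v, 2-color the smaller tree G', then give v the color opposite to its neighbor w. It assumes the tree is a dict mapping each vertex to a set of neighbors; `two_color_tree` is a hypothetical name. The recursion depth equals the number of vertices, so for large trees a breadth-first layering would be the more practical choice.

```python
def two_color_tree(adj):
    """2-color a tree, given as a dict of neighbor sets with >= 1 vertex,
    following the inductive proof. Returns a dict vertex -> 1 or 2."""
    if len(adj) == 1:
        (only,) = adj
        return {only: 1}  # base case: one vertex, any color works
    v = next(x for x, ns in adj.items() if len(ns) == 1)  # a leaf
    (w,) = adj[v]  # its unique neighbor
    # G' = G with v and its edge deleted; w stays.
    smaller = {x: ns - {v} for x, ns in adj.items() if x != v}
    c = two_color_tree(smaller)
    c[v] = 1 if c[w] == 2 else 2  # opposite color to w
    return c


tree = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2}}
c = two_color_tree(tree)
print(all(c[u] != c[w] for u in tree for w in tree[u]))  # True
```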

Structural Induction

• This was a proof by structural induction.

• Proofs by structural induction can be applied more generally!

Theorem about rooted trees

• A rooted tree in which every node has 0 or 2 children is called a “binary tree”

• Theorem: every binary tree with n nodes has (n-1)/2 internal nodes (defined to be nodes with more than 0 children).

• Proof: by strong induction on n.

• Base case: n=1. Such a tree has no internal nodes, and (1-1)/2 = 0, so the claim holds.

Proof, cont’d.

• Strong Inductive hypothesis: for some n>0, and for all positive integers k up to n, all rooted binary trees with k nodes have (k-1)/2 internal nodes.

• Let T have n+1 nodes, and let the children of the root be A and B. (We know the root has two children: if it had no children, T would have only 1 node, contradicting n+1 ≥ 2.)

We want to show Int(T) = n/2

We want to show Int(T) = n/2

• TA, the subtree of T rooted at A, is a binary tree; let nA be the number of nodes in TA

• TB, the subtree of T rooted at B, is a binary tree; let nB be the number of nodes in TB

• Let Int(T) be the number of internal nodes of T, and Int(TA) and Int(TB) be similarly defined.

We want to show Int(T) = n/2

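• Since nA + nB + 1 = n+1 and nA, nB ≥ 1, both nA and nB are between 1 and n-1, so by the strong inductive hypothesis

Int(TA) = (nA-1)/2

Int(TB) = (nB-1)/2

• Every internal node of T is either the root or an internal node of TA or TB. Therefore

Int(T) = (nA-1)/2 + (nB-1)/2 + 1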

We want to show Int(T) = n/2

We have established that

Int(T) = (nA-1)/2 + (nB-1)/2 + 1

Simplifying this, we get

Int(T) = (nA-1 + nB -1 + 2)/2 = (nA + nB)/2

Note that nT, the number of nodes in T, satisfies nT = nA + nB + 1

Therefore,

Int(T) = (nT - 1)/2

Recall that nT = n+1. Therefore,

Int(T) = n/2

Q.E.D.
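The identity Int(T) = (n-1)/2 can be checked exhaustively on small cases with a sketch that mirrors the recursion in the proof. The representation is an assumption for illustration: a leaf is `None` and a node with two children is a pair `(left, right)`; `sizes` and `all_binary_trees` are hypothetical names. A finite check like this is evidence, not a substitute for the proof.

```python
def sizes(t):
    """Return (n, internal) for a binary tree in which every node has 0 or 2
    children. A leaf is None; a node with two children is (left, right)."""
    if t is None:
        return 1, 0  # base case: one node, no internal nodes
    a, b = t
    n_a, int_a = sizes(a)
    n_b, int_b = sizes(b)
    # nT = nA + nB + 1 and Int(T) = Int(TA) + Int(TB) + 1 (the root).
    return n_a + n_b + 1, int_a + int_b + 1


def all_binary_trees(internal):
    """Generate every binary tree with the given number of internal nodes."""
    if internal == 0:
        yield None
        return
    for left in range(internal):
        for a in all_binary_trees(left):
            for b in all_binary_trees(internal - 1 - left):
                yield (a, b)


for m in range(6):
    for t in all_binary_trees(m):
        n, internal = sizes(t)
        assert 2 * internal == n - 1  # Int(T) = (n-1)/2
print("checked all binary trees with up to 5 internal nodes")
```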

Genome Assembly

• Given a DNA sequence, sequencing technology lets you obtain a collection of k-mers (substrings of length k) drawn from that sequence.

• From these k-mers, your objective is to come up with the sequence.

Genome Assembly

• Let X be a very long DNA sequence

• Consider all k-mers in X, with k big enough so that no k-mer appears two or more times

• Goal: reconstruct X from its set of k-mers
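A small sketch of the setup, assuming X is a Python string: `kmers` lists the substrings of length k in left-to-right order, and `smallest_unique_k` (a hypothetical helper) finds the smallest k for which no k-mer appears two or more times.

```python
def kmers(x, k):
    """All substrings of length k of x, in left-to-right order."""
    return [x[i:i + k] for i in range(len(x) - k + 1)]


def smallest_unique_k(x):
    """Smallest k such that no k-mer occurs two or more times in x."""
    for k in range(1, len(x) + 1):
        ks = kmers(x, k)
        if len(ks) == len(set(ks)):
            return k
    return None  # only reached for the empty string


x = "ACATAGGATTCAC"
print(smallest_unique_k(x))  # 3
print(len(kmers(x, 5)))      # 9
```

Once k is large enough for the k-mers to be distinct, any larger k works too, since a repeated (k+1)-mer would contain a repeated k-mer.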

Genome Assembly, attempt #1

Approach 1:

• Make a node for each k-mer, and put a directed edge from v to w if the (k-1)-suffix of v is the (k-1)-prefix of w.

• Create the graph for the following string, using k=5: ACATAGGATTCAC
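A minimal sketch of Approach 1, assuming k-mers are Python strings: for a k-mer s, `s[1:]` is its (k-1)-suffix and `s[:-1]` its (k-1)-prefix. The function name is hypothetical; self-loops are dropped as a simplifying assumption, and the pairwise comparison takes quadratic time.

```python
def overlap_graph(kmer_set):
    """Approach 1: one node per k-mer and an edge v -> w whenever the
    (k-1)-suffix of v equals the (k-1)-prefix of w.

    Compares all pairs, so it takes quadratic time in the number of k-mers.
    Self-loops (possible for k-mers like "AAAAA") are skipped, since a path
    that visits each node once never uses them.
    """
    return {
        v: sorted(w for w in kmer_set if w != v and v[1:] == w[:-1])
        for v in sorted(kmer_set)
    }


x, k = "ACATAGGATTCAC", 5
kmer_set = {x[i:i + k] for i in range(len(x) - k + 1)}
graph = overlap_graph(kmer_set)
print(graph["ACATA"])  # ['CATAG']
```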

Genome Assembly, attempt #1

Approach 1:

• Make a node for each k-mer, and put a directed edge from v to w if the (k-1)-suffix of v is the (k-1)-prefix of w.

• Every such graph has a Hamiltonian Path, as long as no k-mer appears more than once!

Hamiltonian Path

• A Hamiltonian Path in a graph visits every node exactly once

Genome Assembly, attempt #1

• Create the graph for the following string, using k=5: ACATAGGATTCAC

• Does the graph have a Hamiltonian Path?

• Is it unique?

• Can you reconstruct the sequence from the path?

Hamiltonian Path

• A Hamiltonian Path in a graph visits every node exactly once

• Determining if a graph has a Hamiltonian Path is NP-complete

• So this approach to Genome Assembly is computationally intensive: no polynomial-time algorithm is known for it, and it is infeasible for large inputs
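To make the cost concrete, here is a brute-force backtracking sketch that enumerates the Hamiltonian paths of a directed graph given as a dict of successor lists, followed by `spell`, which turns a path of overlapping k-mers back into a string. Both names are hypothetical, and the search is exponential in the worst case, which is what makes this approach impractical at genome scale.

```python
def hamiltonian_paths(graph):
    """Yield every Hamiltonian path (each node exactly once) of a directed
    graph given as {node: [successors]}, by brute-force backtracking."""
    n = len(graph)

    def extend(path, visited):
        if len(path) == n:
            yield list(path)
            return
        for w in graph[path[-1]]:
            if w not in visited:
                path.append(w)
                visited.add(w)
                yield from extend(path, visited)
                path.pop()  # undo the choice and try the next successor
                visited.remove(w)

    for start in graph:
        yield from extend([start], {start})


def spell(path):
    """Consecutive k-mers overlap in k-1 letters: keep the first k-mer,
    then append the last letter of each following k-mer."""
    return path[0] + "".join(v[-1] for v in path[1:])


x, k = "ACATAGGATTCAC", 5
kmers = {x[i:i + k] for i in range(len(x) - k + 1)}
graph = {v: [w for w in kmers if w != v and v[1:] == w[:-1]] for v in kmers}
paths = list(hamiltonian_paths(graph))
print(len(paths), spell(paths[0]) == x)  # 1 True
```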

Eulerian Cycles

• An Eulerian cycle is one that goes through every edge exactly once

• It is easy to see that if a graph has an Eulerian cycle, then every node has even degree. The converse is also true for connected graphs (more precisely, whenever all edges lie in a single connected component), but a bit harder to prove.

• For directed graphs, the cycle must follow the direction of the edges (also called “arcs”). In this case, a graph whose arcs all lie in a single connected component (ignoring directions) has an Eulerian cycle if and only if the indegree is equal to the outdegree for every node.
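A sketch of the directed criterion, assuming the graph is a list of arcs (u, w). Besides comparing indegree with outdegree, it checks connectivity with directions ignored, since the degree condition alone is not enough for a graph split into separate pieces. `has_eulerian_cycle` is a hypothetical name.

```python
from collections import Counter, defaultdict


def has_eulerian_cycle(arcs):
    """Decide whether a directed graph, given as a list of arcs (u, w),
    has an Eulerian cycle.

    Checks indegree == outdegree at every node, and that all arcs lie in
    one connected piece when directions are ignored.
    """
    if not arcs:
        return True  # no arcs: the empty cycle trivially uses all of them
    indeg, outdeg = Counter(), Counter()
    neighbors = defaultdict(set)  # underlying undirected graph
    for u, w in arcs:
        outdeg[u] += 1
        indeg[w] += 1
        neighbors[u].add(w)
        neighbors[w].add(u)
    if any(indeg[v] != outdeg[v] for v in neighbors):
        return False
    # Depth-first search, ignoring directions, from an endpoint of an arc.
    start = arcs[0][0]
    seen, stack = {start}, [start]
    while stack:
        for y in neighbors[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(neighbors)


print(has_eulerian_cycle([("a", "b"), ("b", "c"), ("c", "a")]))  # True
```

Two disjoint directed 2-cycles, for example, satisfy the degree condition at every node but fail the connectivity check.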

Eulerian Paths

• An Eulerian path is one that goes through every edge exactly once

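• It is easy to see that if a graph has an Eulerian path that is not a cycle, then its two endpoints have odd degree and every other node has even degree. Conversely, a connected graph in which exactly 2 nodes have odd degree has an Eulerian path between those two nodes; this direction is a bit harder to prove.

• For directed graphs, the path must follow the direction of the edges (also called “arcs”). In this case, a graph whose arcs all lie in a single connected component (ignoring directions) has an Eulerian path that is not a cycle if and only if indegree(v)=outdegree(v) for all but 2 nodes x and y, where indegree(x)=outdegree(x)+1 and indegree(y)=outdegree(y)-1. The path then starts at y and ends at x.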
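The directed degree condition also tells us where an Eulerian path has to start and end. This sketch (hypothetical name, arcs given as pairs) computes outdegree minus indegree at each node and reports the forced endpoints; it deliberately leaves the connectivity check out.

```python
from collections import Counter


def eulerian_path_endpoints(arcs):
    """Apply the directed degree condition to a list of arcs (u, w).

    Returns (y, x) when exactly one node y has outdegree = indegree + 1,
    exactly one node x has indegree = outdegree + 1, and all other nodes
    are balanced; an Eulerian path must then start at y and end at x.
    Returns (None, None) if every node is balanced (only an Eulerian cycle
    is possible) and None if the condition fails. Connectivity is not
    checked here.
    """
    surplus = Counter()  # outdegree minus indegree
    for u, w in arcs:
        surplus[u] += 1
        surplus[w] -= 1
    if any(abs(d) > 1 for d in surplus.values()):
        return None
    starts = [v for v, d in surplus.items() if d == 1]
    ends = [v for v, d in surplus.items() if d == -1]
    if not starts and not ends:
        return (None, None)
    if len(starts) == 1 and len(ends) == 1:
        return (starts[0], ends[0])
    return None


print(eulerian_path_endpoints([("a", "b"), ("b", "c")]))  # ('a', 'c')
```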

de Bruijn Graph

Input: the set of k-mers for the DNA sequence

Output: the de Bruijn Graph

• Vertices: the (k-1)-mers of the input k-mers (their (k-1)-prefixes and (k-1)-suffixes)

• Directed edges: from v->w if the (k-2)-suffix of v is the (k-2)-prefix of w, and the k-mer formed by starting with v and ending with w is one of the k-mers in the input
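A minimal construction sketch, assuming k-mers are Python strings. Each input k-mer s contributes the arc from its (k-1)-prefix `s[:-1]` to its (k-1)-suffix `s[1:]`; these two (k-1)-mers overlap in k-2 letters and together spell s, so this yields exactly the edges in the definition. The name `de_bruijn` is hypothetical.

```python
def de_bruijn(kmer_set):
    """de Bruijn graph as {(k-1)-mer: [successor (k-1)-mers]}.

    Each k-mer s adds the arc s[:-1] -> s[1:].
    """
    graph = {}
    for s in sorted(kmer_set):
        graph.setdefault(s[:-1], []).append(s[1:])
        graph.setdefault(s[1:], [])  # make sure sink nodes appear as keys
    return graph


print(de_bruijn({"ACG", "CGT"}))  # {'AC': ['CG'], 'CG': ['GT'], 'GT': []}
```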

de Bruijn Graph

• If the k-mer set comes from a sequence and no k-mer appears more than once in the sequence, then the de Bruijn graph has an Eulerian path!

Using de Bruijn Graphs

Given: set of k-mers from a DNA sequence

Algorithm:

• Construct the de Bruijn graph

• Find an Eulerian path in the graph

• The path defines a sequence with the same set of k-mers as the original
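The three steps can be sketched end to end in Python. To find the Eulerian path, the sketch uses Hierholzer's algorithm, a standard choice that the slides do not prescribe. It assumes the k-mer set is non-empty and that an Eulerian path exists. When the Eulerian path is not unique, the output has the same k-mer set as the original but need not be the same sequence. `assemble` is a hypothetical name.

```python
from collections import Counter


def assemble(kmer_set):
    """Rebuild a sequence from a non-empty set of k-mers."""
    # Step 1: de Bruijn graph, one arc per k-mer from prefix to suffix.
    graph = {}
    surplus = Counter()  # outdegree minus indegree of each (k-1)-mer
    for s in sorted(kmer_set):
        u, w = s[:-1], s[1:]
        graph.setdefault(u, []).append(w)
        graph.setdefault(w, [])
        surplus[u] += 1
        surplus[w] -= 1
    # Step 2: Eulerian path via Hierholzer. Start at the node with one more
    # outgoing than incoming arc; if every node is balanced, start anywhere.
    start = next((v for v, d in surplus.items() if d == 1), next(iter(graph)))
    unused = {v: list(ws) for v, ws in graph.items()}
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if unused[v]:
            stack.append(unused[v].pop())  # follow an arc not used yet
        else:
            path.append(stack.pop())  # no arcs left here: node is finished
    path.reverse()
    # Step 3: spell the sequence; each later node adds its last letter.
    return path[0] + "".join(v[-1] for v in path[1:])


x = "ACATAGGATTCAC"
print(assemble({x[i:i + 5] for i in range(len(x) - 5 + 1)}) == x)  # True
```

The (k-1)-mer "AT" occurs twice in ATGATC, so its de Bruijn graph (k=3) contains a cycle; Hierholzer's algorithm still finds the single Eulerian path there.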

de Bruijn Graph

• Create the de Bruijn graph for the following string, using k=5: ACATAGGATTCAC

• Find the Eulerian path

• Is the Eulerian path unique?

• Reconstruct the sequence from this path