trees notes


  • CSE211 Lecture Notes - E. Ozcan, S. Baydere O. Demir 2009 Spring

    CSE 211: Data Structures Lecture Notes VII

    DIRECTED GRAPHS

    A graph is a mathematical structure heavily used in computing problems. A directed graph consists of a set of points (vertices) and a set of directed edges (arcs) that represent connections between the points. A traversal goes through a graph in a systematic way such that each vertex is visited exactly once. In this lecture we will see some properties of directed graphs, their representation using an adjacency matrix and an adjacency list, and two important traversal algorithms: depth-first search and breadth-first search.

    Definitions

    A graph G is an ordered pair of sets (V, E), where V is a set of vertices and E is a set of edges.

    V = {v1, v2, ..., vn} is a set of vertices.
    E = {e1, e2, ..., em} is a set of edges.
    An edge is given as ei = (vk, vl), where vk and vl are in the vertex set V, 1 ≤ k, l ≤ n and 1 ≤ i ≤ m.

    Undirected graph: ei = (vk, vl) = (vl, vk); both pairs represent the same edge.

    Examples (figures omitted):

    G1: V1 = {1, 2}, E1 = {(1,1), (1,2)}
    G2: V2 = {1, 2, 3, 4, 5}, E2 = {(2,1), (2,3), (2,4), (2,5)}


    A directed graph (digraph) is a graph whose edges are ordered pairs. This imposes a direction on each edge. We will write sGd to mean that the edge (s,d) is in the edge set of G. We say that d is adjacent to s. The set of all vertices adjacent to s is called the adjacency set of s.

    Properties of Digraphs

    Reflexivity: G is reflexive iff xGx for all vertices x in V. That is, G is reflexive iff every vertex in G has a self loop.

    Irreflexivity: G is irreflexive iff none of its vertices has a self loop.

    Symmetry: G is symmetric iff whenever xGy then yGx.


    Antisymmetry: G is antisymmetric iff whenever xGy and yGx then x = y. No two distinct vertices have edges in both directions.

    Transitivity: G is transitive iff, for each triple of vertices x, y, z, whenever xGy and yGz then xGz.

    Paths

    A path is a sequence of edges such that the destination of one edge is the source of the next. The path is simple if all its vertices, except possibly the first and last, are distinct. The length of a path is the number of edges in it. If there is a path from x to y, then y is reachable from x.

    A single edge is a path of length 1. A self loop is a path of length 1.

    Q: Is there a path from 1 to 5?
    A: Yes, p = {(1,2), (2,4), (4,5)}, visiting the vertices 1, 2, 4, 5, with length = 3.

    Connectivity: A digraph is connected iff there is some vertex from which every other vertex can be reached by a path. A digraph is strongly connected iff from each vertex there is at least one path to every other vertex.

    [Figures omitted: small example digraphs on vertices a, b, c, d, and two drawings of a digraph on vertices 1 to 5.]


    Cycles: A cycle is a path such that the vertex at the destination of the last edge is the source of the first edge. A digraph is acyclic iff it has no cycles in it.

    In-degree of a vertex is the number of edges arriving at that vertex.
    Out-degree of a vertex is the number of edges leaving that vertex.

    Implementation of Directed Graphs:

    1) Adjacency Matrix

    One way to represent a digraph G with K vertices is by a K x K boolean matrix G', called the adjacency matrix, where

    G'(x,y) = 1 iff xGy
    G'(x,y) = 0 iff not xGy

    Row x indicates the adjacency set of vertex x.

    [Figures omitted: a digraph G on vertices v, a, b, c with a cycle p, In-degree(v) = 3 and Out-degree(v) = 2; and two six-vertex digraphs, one labeled Connected and one labeled Strongly connected.]

    Example:


    Adjacency Matrix is:

          1 2 3 4 5 6
        1 0 1 1 1 0 0
        2 1 0 0 0 1 0
        3 0 0 0 0 1 0
        4 0 0 0 1 0 0
        5 0 0 1 0 0 1
        6 0 0 0 1 0 1

    Advantage: Determining whether y is adjacent to x takes O(1) time.
    Disadvantage: Even if the graph has few edges, K x K storage units are needed, and any algorithm that examines the whole graph will have performance O(K^2).
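    As a rough illustration, the example matrix could be stored in C as a two-dimensional array. This is only a sketch: the names K, adj, is_adjacent, out_degree and in_degree are chosen for this example and do not come from the notes. Vertices are numbered 1 to 6 as in the example, so row and column 0 are left unused.

```c
#include <stdbool.h>

#define K 6  /* number of vertices in the example digraph */

/* adj[x][y] is 1 iff there is an edge from x to y (row/column 0 unused). */
static const int adj[K + 1][K + 1] = {
    {0, 0, 0, 0, 0, 0, 0},
    {0, 0, 1, 1, 1, 0, 0},  /* 1 -> 2, 3, 4 */
    {0, 1, 0, 0, 0, 1, 0},  /* 2 -> 1, 5    */
    {0, 0, 0, 0, 0, 1, 0},  /* 3 -> 5       */
    {0, 0, 0, 0, 1, 0, 0},  /* 4 -> 4       */
    {0, 0, 0, 1, 0, 0, 1},  /* 5 -> 3, 6    */
    {0, 0, 0, 0, 1, 0, 1},  /* 6 -> 4, 6    */
};

/* O(1) adjacency test: is y adjacent to x? */
bool is_adjacent(int x, int y)
{
    return adj[x][y] != 0;
}

/* Out-degree of x: number of 1s in row x. */
int out_degree(int x)
{
    int d = 0;
    for (int y = 1; y <= K; y++)
        d += adj[x][y];
    return d;
}

/* In-degree of y: number of 1s in column y. */
int in_degree(int y)
{
    int d = 0;
    for (int x = 1; x <= K; x++)
        d += adj[x][y];
    return d;
}
```

    The adjacency test is a single array access, while each degree computation scans K entries regardless of how many edges exist.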

    2) Adjacency list

    In many graphs the adjacency matrix is sparse, so a linear list representation is more economical. Each vertex x is a header for a linear list. Each node of this list represents a vertex adjacent to x. The headers can be stored in an array.

    Assume that pointers and the integers identifying vertices all occupy the same number of bytes. When is the adjacency matrix more economical than the adjacency list?

    Calculate the performance of the two implementation strategies. (Will be explained during the lecture.)

    Adjacency list for the example digraph (header: adjacent vertices):

        1: 2 3 4
        2: 1 5
        3: 5
        4: 4
        5: 3 6
        6: 4 6
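    The adjacency list could be sketched in C as an array of list headers. The type and function names below (edge_node, heads, add_edge, build_example and so on) are assumptions made for this illustration, not part of the notes.

```c
#include <stdlib.h>

#define K 6  /* number of vertices in the example digraph */

struct edge_node {
    int vertex;               /* a vertex adjacent to the header vertex */
    struct edge_node *next;
};

/* heads[x] is the header of vertex x's list (index 0 unused). */
struct edge_node *heads[K + 1];

/* Append y to the end of x's list. Returns 0 on success, -1 if out of memory. */
int add_edge(int x, int y)
{
    struct edge_node *n = malloc(sizeof *n);
    struct edge_node **link = &heads[x];

    if (n == NULL)
        return -1;
    n->vertex = y;
    n->next = NULL;
    while (*link != NULL)
        link = &(*link)->next;
    *link = n;
    return 0;
}

/* Adjacency test: walks x's list, so the cost grows with out-degree(x). */
int is_adjacent(int x, int y)
{
    for (const struct edge_node *n = heads[x]; n != NULL; n = n->next)
        if (n->vertex == y)
            return 1;
    return 0;
}

/* Out-degree of x is the length of its list. */
int out_degree(int x)
{
    int d = 0;
    for (const struct edge_node *n = heads[x]; n != NULL; n = n->next)
        d++;
    return d;
}

/* Build the six-vertex example digraph. Returns 0 on success. */
int build_example(void)
{
    static const int edges[][2] = {
        {1, 2}, {1, 3}, {1, 4}, {2, 1}, {2, 5}, {3, 5},
        {4, 4}, {5, 3}, {5, 6}, {6, 4}, {6, 6},
    };
    for (size_t i = 0; i < sizeof edges / sizeof edges[0]; i++)
        if (add_edge(edges[i][0], edges[i][1]) != 0)
            return -1;
    return 0;
}
```

    Storage here is proportional to the number of vertices plus the number of edges, at the price of a list walk for each adjacency test.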


    Graph Traversals

    Some applications require the graph to be traversed. This means that, starting from some designated vertex, the graph is walked in a systematic way such that each vertex is visited only once. Two algorithms are commonly used:

    1. Depth-First Search

    2. Breadth-First Search

    DEPTH-FIRST SEARCH: Finds all graph vertices reachable from a starting vertex, exploring a given path from the starting vertex as far as possible before starting another path.

    Strategy: Go deeper along a path.

    Algorithm:
    1. Place the starting vertex (node) x in the set VISITED.
    2. Do whatever application-dependent work is necessary when visiting a vertex (node).
    3. For each vertex y adjacent to x, if y has not been visited, call the depth-first search algorithm recursively with y as the starting vertex (node).

    [Figure omitted: digraph G with vertices 1 to 7.]

    Assuming that the application-dependent work is to display the vertex index, applying depth-first search to G gives:

    Starting at 1: 1 2 5 7 3 6 4
    Starting at 2: 2 5 7
    Starting at 3: 3 1 2 5 7 4 6
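    The three steps of depth-first search can be sketched in C as follows. Because the edges of the seven-vertex digraph G are not listed in these notes, the sketch runs on the six-vertex adjacency-matrix example from the previous section instead. The names adj, visited, order, dfs and depth_first are assumptions for this illustration, and "visiting" a vertex is modeled by recording it in an array.

```c
#include <stdbool.h>

#define K 6  /* number of vertices */

/* Adjacency matrix of the six-vertex example (row/column 0 unused). */
static const int adj[K + 1][K + 1] = {
    {0, 0, 0, 0, 0, 0, 0},
    {0, 0, 1, 1, 1, 0, 0},
    {0, 1, 0, 0, 0, 1, 0},
    {0, 0, 0, 0, 0, 1, 0},
    {0, 0, 0, 0, 1, 0, 0},
    {0, 0, 0, 1, 0, 0, 1},
    {0, 0, 0, 0, 1, 0, 1},
};

bool visited[K + 1];  /* the set VISITED */
int order[K];         /* vertices in the order they were visited */
int order_len;

void dfs(int x)
{
    visited[x] = true;               /* step 1: place x in VISITED */
    order[order_len++] = x;          /* step 2: application work   */
    for (int y = 1; y <= K; y++)     /* step 3: recurse on unvisited neighbors */
        if (adj[x][y] && !visited[y])
            dfs(y);
}

/* Reset the bookkeeping and run a depth-first search from start. */
void depth_first(int start)
{
    for (int v = 0; v <= K; v++)
        visited[v] = false;
    order_len = 0;
    dfs(start);
}
```

    With neighbors examined in increasing vertex order, depth_first(1) on this graph records the order 1 2 5 3 6 4.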

    BREADTH-FIRST SEARCH: Visits all vertices adjacent to the starting vertex, then all vertices adjacent to those vertices, and so on. The search is broad rather than deep.

    Algorithm:
    1. Clear the queue Q.
    2. Place the starting vertex x in the set VISITED.
    3. Enqueue x in Q.
    4. Do whatever application-dependent work is necessary when visiting a vertex (node).
    5. Repeat the remaining steps as long as Q is not empty:
       a. Dequeue a value y from Q.
       b. For each vertex z adjacent to y, if z has not been visited before:
       c. Place z in VISITED.
       d. Do whatever application-dependent work is necessary when visiting a vertex (node).
       e. Enqueue z in Q.

    Assuming that the application-dependent work is to display the vertex index, applying breadth-first search to G gives:

    Starting at 1: 1 2 3 4 5 6 7
    Starting at 3: 3 1 5 6 2 4 7
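    A possible C rendering of the breadth-first steps is sketched below. As the edge set of G is not given in these notes, it again uses the six-vertex adjacency-matrix example. The queue is a plain array, which suffices because each vertex is enqueued at most once. All names (adj, order, breadth_first) are assumptions for this illustration.

```c
#include <stdbool.h>

#define K 6  /* number of vertices */

/* Adjacency matrix of the six-vertex example (row/column 0 unused). */
static const int adj[K + 1][K + 1] = {
    {0, 0, 0, 0, 0, 0, 0},
    {0, 0, 1, 1, 1, 0, 0},
    {0, 1, 0, 0, 0, 1, 0},
    {0, 0, 0, 0, 0, 1, 0},
    {0, 0, 0, 0, 1, 0, 0},
    {0, 0, 0, 1, 0, 0, 1},
    {0, 0, 0, 0, 1, 0, 1},
};

int order[K];   /* vertices in the order they were visited */
int order_len;

void breadth_first(int start)
{
    bool visited[K + 1] = {false};
    int queue[K];                 /* each vertex enters the queue at most once */
    int head = 0, tail = 0;       /* step 1: empty queue */

    order_len = 0;
    visited[start] = true;        /* step 2 */
    queue[tail++] = start;        /* step 3 */
    order[order_len++] = start;   /* step 4: application work */

    while (head < tail) {         /* step 5 */
        int y = queue[head++];    /* 5a: dequeue */
        for (int z = 1; z <= K; z++) {
            if (adj[y][z] && !visited[z]) {   /* 5b */
                visited[z] = true;            /* 5c */
                order[order_len++] = z;       /* 5d */
                queue[tail++] = z;            /* 5e */
            }
        }
    }
}
```

    On this graph breadth_first(1) records 1 2 3 4 5 6: first the start vertex, then its three neighbors, then the vertices two edges away.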


    TREES

    The notes only cover the basic definitions.

    A tree is a special case of a digraph used to express hierarchical relationships or multiway decisions in computing. Examples include game trees, company structures, family trees, language trees, expression trees, and sorting and searching trees.

    Definitions

    The common characteristic of all these trees is that there is a single vertex (the root) that can be identified as the top of the tree, and from the root to any other vertex in the tree there is exactly one path.

    Formally, we can say that a tree is a connected digraph such that
    1. There is exactly one vertex, called the root, with in-degree = 0.
    2. All other vertices have in-degree = 1.

    In a general tree there is no restriction on the out-degree, and the vertex set may be infinite.

    In most applications of trees the vertex set is finite and the vertices at the bottom have out-degree = 0. These vertices are called leaves or terminals.

    A vertex which is neither root nor leaf is called a nonterminal vertex (node).

    Every vertex at the destination end of an edge for which the root is the source is itself the root of another smaller tree called a sub-tree.

    The depth of a tree is the length of the longest path from the root.

    The vertices at the destination ends of the edges leaving a vertex are called its children, and that vertex is called their parent. Children of the same parent are called siblings.

    Vertex (node) n1 is an ancestor of n2 (and n2 is a descendant of n1) if n1 is either the parent of n2 or the parent of some ancestor of n2.


    A node n2 is a left descendant of node n1 if n2 is either the left child of n1 or a descendant of the left child of n1. A right descendant may be similarly defined.

    Binary Trees

    A binary tree is a tree all of whose vertices have out-degree ≤ 2. The subtrees of a binary tree are ordered in the sense that there is a left child and a right child. If a vertex has only one child, it should be clearly identified as right or left.

    Properties of Binary Trees

    Strictly Binary Tree: T is a strictly binary tree iff each of its vertices has out-degree = 0 or 2.

    Complete Binary Tree: T is a complete binary tree of depth K iff each vertex of level K is a leaf and each vertex of level less than K has non-empty left and right children. A complete binary tree of depth K always has 2^(K+1) - 1 vertices.

    Depth 0 -- single vertex
    Depth 1 -- 3 vertices
    Depth 2 -- 7 vertices

    A complete binary tree of N vertices has depth equal to log2(N+1) - 1.


    Almost complete: All levels up to K-1 are completely filled, but some level K vertices are missing (useful in some sorting algorithms such as heap sort).

    Balanced: T is balanced iff for each vertex t in T, the depths of t's left and right subtrees differ by at most one.
    1. A binary tree consisting of a single vertex is balanced.
    2. A vertex with a single subtree is balanced iff that subtree is a leaf.
    3. A binary tree is balanced iff its left and right subtrees are balanced and their depths differ by at most one.

    Implementation of Binary Trees

    Array representation

    An almost complete binary tree can be represented as an array. The 1st element of the array is the root, the 2nd and 3rd elements are the children of the root, and so on; the nodes are stored level by level.

    Index: 1 2 3 4 5 6 7 8 9 10 11 12
    Value: W R T M E G S A F D  C  E

    [Figure omitted: the corresponding tree, with levels W / R T / M E G S / A F D C E.]

    Lemma: If a complete binary tree with n nodes (⌊log2 n⌋ + 1 levels) is represented sequentially, then for any node with index i, 1 ≤ i ≤ n, we have

    1. parent(i) is at ⌊i/2⌋, if i ≠ 1 (when i = 1, the root has no parent).
    2. leftChild(i) is at 2i, if 2i ≤ n (no left child, if 2i > n).
    3. rightChild(i) is at 2i+1, if 2i+1 ≤ n (no right child, if 2i+1 > n).
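    The index arithmetic of the lemma can be written down directly in C. In this sketch the function names and the convention of returning 0 for "no such node" are assumptions made for illustration; positions are 1-based as in the lemma, so array slot 0 holds a placeholder.

```c
/* Positions are 1-based; a return value of 0 means "no such node". */

int parent(int i)
{
    return (i > 1) ? i / 2 : 0;        /* integer division gives floor(i/2) */
}

int left_child(int i, int n)
{
    return (2 * i <= n) ? 2 * i : 0;
}

int right_child(int i, int n)
{
    return (2 * i + 1 <= n) ? 2 * i + 1 : 0;
}

/* The 12-node example tree stored level by level; slot 0 is a placeholder. */
static const char example_tree[] = " WRTMEGSAFDCE";
```

    For example, with n = 12 the node G at position 6 has a left child at position 12 (E) and no right child, because 13 > n.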

    Linked List Representation

    Since a binary tree is a digraph, we could implement it using one of the graph representations. However, since a binary tree has left and right subtrees, we can create a specialized structure in which each node has three fields:

    LCHILD | DATA | RCHILD

    Operations on Binary Trees:

    There are a number of primitive operations that can be applied to a binary tree. For example, assume that p is a pointer to the node n1 of a binary tree; the following functions may be defined:

    info(p): returns the contents (data field) of n1.
    leftChild(p), rightChild(p): return pointers to the left child and right child of n1, respectively.
    hasLeft(p), hasRight(p): return true if the node has a left child or a right child, respectively.

    Several other operations can be defined on a tree depending on the application requirements.

    Example Applications of Binary Trees:

    Problem: Find all duplicates in a list of n numbers.

    Solution:
    1) Compare each number with all those that precede it. This takes O(n^2) comparisons.
    2) The comparisons can be reduced by using a binary tree. The first number is placed in the root, with empty left and right subtrees. Each successive number is then compared to the number in the root. If it matches, we have a duplicate. If it is smaller, we examine the left subtree; if it is larger, we examine the right subtree. If the subtree is empty, the number is not a duplicate and is placed into a new node at that position. If the subtree is not empty, we repeat the above process with the root of that subtree.

    Let's say tp is a pointer to the root node of a tree (the pointer type is the tree node type) and we are searching this tree for a duplicate of the number n; if there is no duplicate, n is placed in the tree.

        /* p is the last node examined; q runs one step ahead and becomes
           null when the search falls off the tree */
        p = q = tp;
        while (n != info(p) && q != null) {
            p = q;
            if (n < info(p))
                q = left(p);
            else
                q = right(p);
        }
        if (n == info(p))
            printf("the number is duplicate");
        else if (n < info(p))
            setleft(p, n);
        else
            setright(p, n);
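    A self-contained C sketch of the duplicate search is given below. The struct layout and the helper names maketree, insert_unique, free_tree and count_duplicates are assumptions made for this example; instead of printing, it counts the duplicates so the result can be checked.

```c
#include <stdlib.h>

struct node {
    int info;
    struct node *left;
    struct node *right;
};

/* Allocate a leaf holding x; returns NULL if out of memory. */
struct node *maketree(int x)
{
    struct node *p = malloc(sizeof *p);
    if (p != NULL) {
        p->info = x;
        p->left = p->right = NULL;
    }
    return p;
}

/* Search a non-empty tree for n. Returns 1 if n is already present,
 * 0 if it was inserted as a new leaf, -1 if allocation failed. */
int insert_unique(struct node *root, int n)
{
    struct node *p = root;   /* last node examined */
    struct node *q = root;   /* next node to examine; NULL past a leaf */

    while (q != NULL && n != p->info) {
        p = q;
        q = (n < p->info) ? p->left : p->right;
    }
    if (n == p->info)
        return 1;
    q = maketree(n);
    if (q == NULL)
        return -1;
    if (n < p->info)
        p->left = q;
    else
        p->right = q;
    return 0;
}

void free_tree(struct node *t)
{
    if (t != NULL) {
        free_tree(t->left);
        free_tree(t->right);
        free(t);
    }
}

/* Count how many of a[1..n-1] repeat an earlier value (-1 on allocation failure). */
int count_duplicates(const int a[], int n)
{
    struct node *root;
    int dups = 0;

    if (n <= 0)
        return 0;
    root = maketree(a[0]);
    if (root == NULL)
        return -1;
    for (int i = 1; i < n; i++) {
        int r = insert_unique(root, a[i]);
        if (r < 0) {
            free_tree(root);
            return -1;
        }
        dups += r;
    }
    free_tree(root);
    return dups;
}
```

    The number of comparisons per element is bounded by the depth of the tree, so the saving over the O(n^2) method depends on how balanced the tree turns out to be.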

    HEAPSORT

    A binary heap is a completely balanced binary tree satisfying the heap property.
    Heap property: Given a binary heap A, for every node i other than the root, A[PARENT(i)] ≥ A[i].

    Maintaining Heap Property

    HEAPIFY(A, i)
    1. l ← LEFT(i)
    2. r ← RIGHT(i)
    3. if l ≤ heap-size[A] and A[l] > A[i]
    4.    then largest ← l
    5.    else largest ← i
    6. if r ≤ heap-size[A] and A[r] > A[largest]
    7.    then largest ← r
    8. if largest ≠ i
    9.    then exchange A[i] ↔ A[largest]
    10.        HEAPIFY(A, largest)

    Running time of HEAPIFY is O(lg n).

    Example heap:

    Index: 1  2  3  4 5 6 7 8 9 10
    A:     26 24 11 7 6 8 4 1 3 5

    [Figure omitted: the same heap drawn as a tree, with levels 26 / 24 11 / 7 6 8 4 / 1 3 5 and node indices 1 to 10.]


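    Example Run: HEAPIFY(A, 2)

    [Figures omitted; the three trees shown correspond to the following arrays.]

    26 3 11 24 6 8 4 1 7 5      (i = 2: A[2] = 3 is smaller than its child A[4] = 24)
    26 24 11 3 6 8 4 1 7 5      (i = 4: A[4] = 3 is smaller than its child A[9] = 7)
    26 24 11 7 6 8 4 1 3 5      (heap property restored)

    BUILD-HEAP(A)
    1. heap-size[A] ← length[A]
    2. for i ← ⌊length[A]/2⌋ downto 1
    3.    do HEAPIFY(A, i)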


    Example Run: BUILD-HEAP(A)

    [Figures omitted; the trees shown correspond to the following arrays.]

    3 5 4 1 26 8 11 24 7 6      (initial array; i = 5 causes no change)
    3 5 4 24 26 8 11 1 7 6      (after i = 4)
    3 5 11 24 26 8 4 1 7 6      (after i = 3)
    3 26 11 24 6 8 4 1 7 5      (after i = 2)
    26 24 11 7 6 8 4 1 3 5      (after i = 1: the array is a heap)

    HEAPSORT(A)
    1. BUILD-HEAP(A)
    2. for i ← length[A] downto 2
    3.    do exchange A[1] ↔ A[i]
    4.       heap-size[A] ← heap-size[A] - 1
    5.       HEAPIFY(A, 1)


    Running time of HEAPSORT = running time of BUILD-HEAP + n (loop at step 2) times the running time of HEAPIFY = O(n) + O(n lg n) = O(n lg n)

    Example Run:

    [Figures omitted; the trees shown correspond to the following array states, where | separates the remaining heap from the sorted part.]

    26 24 11 7 6 8 4 1 3 5      (after BUILD-HEAP)
    24 7 11 5 6 8 4 1 3 | 26
    11 7 8 5 6 3 4 1 | 24 26
    8 7 4 5 6 3 1 | 11 24 26
    7 6 4 5 1 3 | 8 11 24 26
    6 5 4 3 1 | 7 8 11 24 26
    ...
    1 3 4 5 6 7 8 11 24 26      (sorted)
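    The three procedures HEAPIFY, BUILD-HEAP and HEAPSORT translate to C roughly as follows. This is a sketch under stated assumptions: the array is used 1-based as in the pseudocode (a[0] is an unused slot), heap-size is passed as a parameter rather than stored with the array, and the function names are chosen for this example.

```c
/* 1-based indexing as in the pseudocode: the data lives in a[1..n]. */

static void exchange(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Push a[i] down until the subtree rooted at i satisfies the heap property. */
void heapify(int a[], int heap_size, int i)
{
    int l = 2 * i;          /* LEFT(i)  */
    int r = 2 * i + 1;      /* RIGHT(i) */
    int largest = i;

    if (l <= heap_size && a[l] > a[i])
        largest = l;
    if (r <= heap_size && a[r] > a[largest])
        largest = r;
    if (largest != i) {
        exchange(&a[i], &a[largest]);
        heapify(a, heap_size, largest);
    }
}

/* Turn a[1..n] into a heap, working upward from the last non-leaf. */
void build_heap(int a[], int n)
{
    for (int i = n / 2; i >= 1; i--)
        heapify(a, n, i);
}

/* Sort a[1..n] into ascending order. */
void heap_sort(int a[], int n)
{
    int heap_size = n;

    build_heap(a, n);
    for (int i = n; i >= 2; i--) {
        exchange(&a[1], &a[i]);     /* move the current maximum to its final place */
        heap_size--;
        heapify(a, heap_size, 1);
    }
}
```

    Calling build_heap on {3, 5, 4, 1, 26, 8, 11, 24, 7, 6} (stored in a[1..10]) produces the heap 26 24 11 7 6 8 4 1 3 5 used in the example runs, and heap_sort then leaves the array in ascending order.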


    BINARY TREE TRAVERSAL: Many applications require traversing a tree so that all the nodes are visited in a certain order. Traversal algorithms are useful in dealing with binary trees. Three different forms of traversal are possible: preorder, inorder and postorder traversal.

    Preorder Traversal (node-left-right) (depth-first search):

        void traverse_nlr(struct node *t_node)
        {
            if (t_node == NULL)    /* empty subtree: nothing to visit */
                return;
            visittree(t_node);
            traverse_nlr(t_node->lchild);
            traverse_nlr(t_node->rchild);
        }

    Inorder Traversal (left-node-right):

        void traverse_lnr(struct node *t_node)
        {
            if (t_node == NULL)
                return;
            traverse_lnr(t_node->lchild);
            visittree(t_node);
            traverse_lnr(t_node->rchild);
        }

    Postorder Traversal (left-right-node):

        void traverse_lrn(struct node *t_node)
        {
            if (t_node == NULL)
                return;
            traverse_lrn(t_node->lchild);
            traverse_lrn(t_node->rchild);
            visittree(t_node);
        }

    In traverse_nlr, a node is visited, then its left subtree is traversed, then its right subtree. In traverse_lnr, the left subtree is traversed, then the node is visited, and then the right subtree is traversed. In traverse_lrn, both subtrees are traversed before the node is visited.

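    [Figure omitted: binary tree with root 1; 1 has children 2 and 3; 2 has children 4 and 5; 4 has the left child 7; 3 has the right child 6; 6 has children 8 and 9.]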

    PREORDER TRAVERSAL: 1 2 4 7 5 3 6 8 9

    INORDER TRAVERSAL: 7 4 2 5 1 3 8 6 9

    POSTORDER TRAVERSAL: 7 4 5 2 8 9 6 3 1
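    The three traversal orders listed above can be reproduced with a short C sketch. The node type tnode, the static example nodes and the convention of appending visited keys to an output array are assumptions made for this illustration; the tree shape is the one implied by the three listings.

```c
#include <stddef.h>

struct tnode {
    int key;
    struct tnode *lchild;
    struct tnode *rchild;
};

/* Each traversal appends the visited keys to out[] and advances *len. */
void preorder(const struct tnode *t, int out[], int *len)
{
    if (t == NULL)
        return;
    out[(*len)++] = t->key;            /* node  */
    preorder(t->lchild, out, len);     /* left  */
    preorder(t->rchild, out, len);     /* right */
}

void inorder(const struct tnode *t, int out[], int *len)
{
    if (t == NULL)
        return;
    inorder(t->lchild, out, len);      /* left  */
    out[(*len)++] = t->key;            /* node  */
    inorder(t->rchild, out, len);      /* right */
}

void postorder(const struct tnode *t, int out[], int *len)
{
    if (t == NULL)
        return;
    postorder(t->lchild, out, len);    /* left  */
    postorder(t->rchild, out, len);    /* right */
    out[(*len)++] = t->key;            /* node  */
}

/* The nine-node example tree, built bottom-up. */
static struct tnode n7 = {7, NULL, NULL};
static struct tnode n8 = {8, NULL, NULL};
static struct tnode n9 = {9, NULL, NULL};
static struct tnode n5 = {5, NULL, NULL};
static struct tnode n4 = {4, &n7, NULL};
static struct tnode n6 = {6, &n8, &n9};
static struct tnode n2 = {2, &n4, &n5};
static struct tnode n3 = {3, NULL, &n6};
static struct tnode n1 = {1, &n2, &n3};
```

    Starting each traversal at &n1 with *len set to 0 fills out[] with the preorder, inorder and postorder sequences given above.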


    BINARY SEARCH TREES (BST)

    BSTs are binary trees with a special property. Let x be a node in a BST:
    if y ∈ LEFT_SUBTREE(x), then key[y] < key[x], and
    if y ∈ RIGHT_SUBTREE(x), then key[y] ≥ key[x].

    Recursively speaking:

    1. A leaf node is a BST.
    2. A node is the root of a BST if its key value is greater than that of its left child and less than or equal to that of its right child, and if both of its children are either null or the root of a BST.

    Many algorithms that use binary trees proceed in two phases. The first phase builds a binary tree, and the second phase traverses the tree.

    Consider a sorting algorithm: given a list of numbers in an input file, we wish to print them in ascending order. As we read the numbers they can be inserted into a binary tree; however, unlike the algorithm used to find duplicates, duplicate values are also placed in the tree. When a number is compared with the contents of a node in the tree, a left branch is taken if the number is smaller than the contents of the node, and a right branch if it is greater than or equal to the contents of the node.

    Binary search trees can be used in the implementation of efficient insertions and deletions in tables.

    Example input line: 14, 15, 4, 9, 7, 18, 3, 5, 16, 4, 20, 17, 9, 14, 5

    Operations on a BST:

    1. Traversing a BST
    2. Inserting a record in a BST
    3. Finding a record in a BST
    4. Deleting a record from a BST

    1. If a binary search tree is traversed in inorder and the contents of each node are printed as the node is visited, the numbers are printed in ascending order.

    2. A Recursive Algorithm for insertion:


    Assuming that the tree is not empty, we first test the new key against the root key: if it is less, we insert it in the left subtree; if it is equal or greater, we insert it in the right subtree.

    After several recursive calls we will reach a point where the subtree into which the key is to be inserted is empty. At this point we create a new node and link it to the appropriate pointer in the parent node.

    Examples will be given in class.
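    A minimal C sketch of recursive insertion, combined with an inorder traversal to obtain the sorted output, might look as follows. The names bst_node, bst_insert, bst_inorder, bst_free and tree_sort are assumptions made for this example. Unlike the description above, this variant also handles an empty tree, by returning the new subtree root to the caller instead of linking through a parent pointer.

```c
#include <stdlib.h>

struct bst_node {
    int key;
    struct bst_node *left;
    struct bst_node *right;
};

/* Insert key into the subtree rooted at t and return the subtree's root.
 * Equal keys go to the right. If allocation fails the key is not inserted. */
struct bst_node *bst_insert(struct bst_node *t, int key)
{
    if (t == NULL) {                      /* empty subtree: create the node here */
        struct bst_node *n = malloc(sizeof *n);
        if (n != NULL) {
            n->key = key;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (key < t->key)
        t->left = bst_insert(t->left, key);
    else
        t->right = bst_insert(t->right, key);
    return t;
}

/* Append the keys in left-node-right order to out[], starting at position
 * len; returns the new length. */
int bst_inorder(const struct bst_node *t, int out[], int len)
{
    if (t == NULL)
        return len;
    len = bst_inorder(t->left, out, len);
    out[len++] = t->key;
    return bst_inorder(t->right, out, len);
}

void bst_free(struct bst_node *t)
{
    if (t != NULL) {
        bst_free(t->left);
        bst_free(t->right);
        free(t);
    }
}

/* Phase 1 builds the tree, phase 2 traverses it. Returns the count written. */
int tree_sort(const int in[], int n, int out[])
{
    struct bst_node *root = NULL;
    int len;

    for (int i = 0; i < n; i++)
        root = bst_insert(root, in[i]);
    len = bst_inorder(root, out, 0);
    bst_free(root);
    return len;
}
```

    Applied to the example input line, tree_sort yields 3 4 4 5 5 7 9 9 14 14 15 16 17 18 20.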

    3. Finding a node requires traversing the tree until you reach the searched node.

    4. Deletion algorithm:

    Locate the desired node by a search and call it t.
    If t is a leaf, disconnect it from its parent (set the pointer in the parent node to null).
    If t has a left child but no right child, remove t from the tree by making t's parent point to t's left child.
    If t has a right child but no left child, remove t from the tree by making t's parent point to t's right child.
    Otherwise, find t's LNR successor (the node in t's right subtree with the smallest key). Copy this node's information into t, and then delete that node.
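    The deletion cases can be sketched in C as a recursive function that returns the new root of the subtree it was given, which takes care of "making t's parent point to" the right place. The names bst_node, bst_insert and bst_delete are assumptions made for this example, and the insertion helper is repeated only so the block is self-contained.

```c
#include <stdlib.h>

struct bst_node {
    int key;
    struct bst_node *left;
    struct bst_node *right;
};

/* Insertion helper (equal keys go right), used to build a test tree. */
struct bst_node *bst_insert(struct bst_node *t, int key)
{
    if (t == NULL) {
        struct bst_node *n = malloc(sizeof *n);
        if (n != NULL) {
            n->key = key;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (key < t->key)
        t->left = bst_insert(t->left, key);
    else
        t->right = bst_insert(t->right, key);
    return t;
}

/* Delete one node holding key from the subtree rooted at t.
 * Returns the root of the resulting subtree. */
struct bst_node *bst_delete(struct bst_node *t, int key)
{
    struct bst_node *s;

    if (t == NULL)
        return NULL;                         /* key not found */
    if (key < t->key) {
        t->left = bst_delete(t->left, key);
        return t;
    }
    if (key > t->key) {
        t->right = bst_delete(t->right, key);
        return t;
    }
    /* t is the node to delete. */
    if (t->left == NULL || t->right == NULL) {
        /* Leaf or single child: the parent takes over the child (or NULL). */
        struct bst_node *child = (t->left != NULL) ? t->left : t->right;
        free(t);
        return child;
    }
    /* Two children: find the LNR successor, the smallest key on the right. */
    s = t->right;
    while (s->left != NULL)
        s = s->left;
    t->key = s->key;                         /* copy its information into t */
    t->right = bst_delete(t->right, s->key); /* then delete the successor   */
    return t;
}
```

    The successor has no left child by construction, so the second deletion always falls into the leaf or single-child case.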

    THREADED BINARY TREES

    How can we write a non-recursive algorithm for tree traversal?

    A technique called threading can be used to traverse a tree nonrecursively.

    Threading: As we build a BST, we fill empty pointer fields with pointers that help us move up the tree as well as down. Moving up helps us to find the successor of a node during a traverse left-node-right operation.

    [Figure omitted: example tree illustrating inorder neighbors. Predecessor of 6: 5. Successor of 2: 4.]


    If the threads facilitate inorder traversal then the tree is called right-in threaded.

    Where are the threads generated? If a node has a right child, then its LNR successor is below it, somewhere in the right subtree. Otherwise its LNR successor is above it in the tree. Since a node needs a thread only if it has no right child, common practice is to store the thread in the right child field of the node. A flag is also needed to indicate whether the field holds a thread or not.

    [Figure omitted: a threaded tree in which the threads point to the successor nodes.]

    EXPRESSION TREES

    Another application of binary trees is in interpreters and compilers, where a statement is converted into a tree. For simplicity we will look only at expression trees. An expression tree is a tree that has an operator as its root and identifiers or constants as its leaves. A node representing an operator is a nonleaf, whereas a node representing an operand is a leaf. Traversing an expression tree in preorder gives its prefix form, traversing it in inorder gives its infix form, and traversing it in postorder produces its postfix form.
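    To illustrate how the three traversals of an expression tree yield the three notations, here is a small C sketch for the expression (a + b) * c. The example expression, the node type expr_node and the function names are assumptions made for this illustration. Tokens are appended to a caller-supplied string that must be large enough, and the infix output carries no parentheses, so the grouping of the original expression is not visible in it.

```c
#include <stddef.h>
#include <string.h>

struct expr_node {
    const char *token;                /* operator or operand text */
    const struct expr_node *left;
    const struct expr_node *right;
};

/* Append one token followed by a space to out. */
static void emit(char *out, const char *token)
{
    strcat(out, token);
    strcat(out, " ");
}

void to_prefix(const struct expr_node *t, char *out)
{
    if (t == NULL)
        return;
    emit(out, t->token);              /* node, left, right */
    to_prefix(t->left, out);
    to_prefix(t->right, out);
}

void to_infix(const struct expr_node *t, char *out)
{
    if (t == NULL)
        return;
    to_infix(t->left, out);           /* left, node, right */
    emit(out, t->token);
    to_infix(t->right, out);
}

void to_postfix(const struct expr_node *t, char *out)
{
    if (t == NULL)
        return;
    to_postfix(t->left, out);         /* left, right, node */
    to_postfix(t->right, out);
    emit(out, t->token);
}

/* Expression tree for (a + b) * c: operators are nonleaves, operands leaves. */
static const struct expr_node ex_a = {"a", NULL, NULL};
static const struct expr_node ex_b = {"b", NULL, NULL};
static const struct expr_node ex_c = {"c", NULL, NULL};
static const struct expr_node ex_plus = {"+", &ex_a, &ex_b};
static const struct expr_node ex_times = {"*", &ex_plus, &ex_c};
```

    Starting from an empty string, the three functions applied to &ex_times produce "* + a b c ", "a + b * c " and "a b + c * " respectively.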


    AVL TREES

    BSTs carry a high risk of becoming unbalanced, resulting in expensive search and update operations.

    Idea: Maintain a binary search tree that is almost completely balanced.

    AVL trees are BSTs satisfying the AVL balance condition: for every node, the heights of its left and right subtrees differ by at most 1. The depth of an AVL tree with N nodes is O(lg N).

    Insertion

    After an insertion, an AVL tree might become unbalanced, with a difference of at most 2 between the heights of the subtrees of some node. For the bottommost unbalanced node, S, there are 4 cases:

    1. The extra node is in the left subtree of the left child of S.
    2. The extra node is in the right subtree of the left child of S.
    3. The extra node is in the left subtree of the right child of S.
    4. The extra node is in the right subtree of the right child of S.

    Balancing an AVL tree

    The operation should take O(lg N) time. It can be done using a series of operations referred to as rotations.

    Single rotation: Cases 1 and 4
    Double rotation: Cases 2 and 3
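    One common way to code the rotations is sketched below in C. This is an illustration under stated assumptions rather than the method used in the lecture: each node caches its height (an empty subtree counts as height -1), and all names (avl_node, rotate_right, rebalance and so on) are invented for this example. The function rebalance is meant to be applied to the bottommost unbalanced node S and returns the new root of that subtree.

```c
#include <stddef.h>

struct avl_node {
    int key;
    int height;                    /* height of the subtree rooted here */
    struct avl_node *left;
    struct avl_node *right;
};

int height(const struct avl_node *t)
{
    return (t != NULL) ? t->height : -1;
}

static void update_height(struct avl_node *t)
{
    int hl = height(t->left), hr = height(t->right);
    t->height = 1 + (hl > hr ? hl : hr);
}

/* Single rotation for case 1: the left child of s becomes the subtree root. */
struct avl_node *rotate_right(struct avl_node *s)
{
    struct avl_node *p = s->left;
    s->left = p->right;
    p->right = s;
    update_height(s);
    update_height(p);
    return p;
}

/* Single rotation for case 4: mirror image of rotate_right. */
struct avl_node *rotate_left(struct avl_node *s)
{
    struct avl_node *p = s->right;
    s->right = p->left;
    p->left = s;
    update_height(s);
    update_height(p);
    return p;
}

/* Restore the balance condition at s; returns the new subtree root. */
struct avl_node *rebalance(struct avl_node *s)
{
    int diff;

    update_height(s);
    diff = height(s->left) - height(s->right);
    if (diff > 1) {
        if (height(s->left->left) < height(s->left->right))
            s->left = rotate_left(s->left);      /* case 2: first half of the double rotation */
        return rotate_right(s);                  /* case 1, or second half of case 2 */
    }
    if (diff < -1) {
        if (height(s->right->right) < height(s->right->left))
            s->right = rotate_right(s->right);   /* case 3: first half of the double rotation */
        return rotate_left(s);                   /* case 4, or second half of case 3 */
    }
    return s;                                    /* already balanced */
}
```

    Each rotation changes a constant number of pointers, so the O(lg N) bound comes from walking the insertion path, not from the rotations themselves.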

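    [Figures omitted: an example AVL tree shown before and after the key 5 is inserted, with the unbalanced nodes marked; and, belonging to the Expression Trees section, an expression tree with root +, operators /, *, + and sqrt, and operands 8, 4, x, z and 3.]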


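    Deletion will require the same operations.

    Single Rotation

    [Figure omitted: single rotation involving nodes S and P and subtrees A, B, C.]

    Double Rotation

    [Figure omitted: double rotation involving nodes S, P, G and subtrees A, B, C, D.]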