heaps and heapsort

Post on 17-Jan-2016

85 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Heaps and Heapsort. Lecture 20. Prof. Sin-Min Lee Department of Computer Science San Jose State University. Heaps. Heap Definition. Adding a Node. Removing a Node. Array Implementation. Analysis Source Code. What is Heap?. - PowerPoint PPT Presentation

TRANSCRIPT

Heaps and HeapsortHeaps and Heapsort

Prof. Sin-Min Lee Department of Computer

Science San Jose State University

• Heap Definition.

• Adding a Node.

• Removing a Node.

• Array Implementation.

• Analysis

• Source Code.

HeapsHeaps

What is Heap?

A heap is like a binary search tree. It is a binary tree where the six comparison operators form a total order semantics that can be used to compare the node’s entries. But the arrangement of the elements in a heap follows some new rules that are different from a binary search tree.

HeapsHeaps

A heap is a certain kind of complete binary tree. It is defined by the following properties.

Heaps (Priority Queues)

A heap is a data structure which supports efficient implementation of three operations: insert, findMin, and deleteMin. (Such a heap is a min-heap, to be more precise.)

Two obvious implementations are based on an array (assuming the size n is known) and on a linked list. The comparisons of their relative time complexity are as follows:

insert findMin deleteMin Sorted array O(n) O(1) O(1) (from large (to move (find at end) (delete the end) to small) elements)

Linked list O(1) O(1) O(n) (append) (a pointer to Min) (reset Min pointer)

HeapsHeaps

1. All leaf nodes occur at adjacent levels.

When a completebinary tree is built,

its first node must bethe root.

When a completebinary tree is built,

its first node must bethe root.

Root

HeapsHeaps

2. All levels of the tree, except for the bottom level are completely filled. All nodes on the bottom level occur as far left as possible.

Left childof theroot

The second node isalways the left child

of the root.

The second node isalways the left child

of the root.

HeapsHeaps

3. The key of the root node is greater than or equal to the key of each child. Each subtree is also a heap.

Right childof the

root

The third node isalways the right child

of the root.

The third node isalways the right child

of the root.

HeapsHeaps

The next nodesalways fill the next

level from left-to-right..

The next nodesalways fill the next

level from left-to-right..

HeapsHeaps

The next nodesalways fill the next

level from left-to-right.

The next nodesalways fill the next

level from left-to-right.

HeapsHeaps

Complete binary tree.

Heap Storage Rules• A heap is a binary tree where the entries of the

nodes can be compared with a total order semantics. In addition, these two rules are followed:

• The entry contained by a node is greater than or equal to the entries of the node’s children.

• The tree is a complete binary tree, so that every level except the deepest must contain as many nodes as possible; and at the deepest level, all the nodes are as far left as possible.

HeapsHeaps

A heap is a certain kind of complete binary tree. 19

4222127

23

45

35

The "heap property"requires that each

node's key is >= thekeys of its children

The "heap property"requires that each

node's key is >= thekeys of its children

HeapsHeaps

This tree is not a heap. It breaks rule number 1.

19

27

23

45

35

HeapsHeaps

This tree is not a heap. It breaks rule number 2.

21

23

45

35

HeapsHeaps

This tree is not a heap. It breaks rule number 3.

2135

23

45

27

Heap implementation of priority queue

• Each node of the heap contains one entry along with the entry’s priority, and the tree is maintained so that it follows the heap storage rules using the entry’s priorities to compare nodes. Therefore:

• The entry contained by a node has a priority that is greater than or equal to the priority of the entries of the nodes children.

• The tree is a complete binary tree.

Adding a Node to a HeapAdding a Node to a Heap

Put the new node in the next available spot.

19

4222127

23

45

35

42

Adding a Node to a HeapAdding a Node to a Heap

Push the new node upward, swapping with its parent until the new node reaches an acceptable location.

19

4222142

23

45

35

27

Adding a Node to a HeapAdding a Node to a Heap

Push the new node upward, swapping with its parent until the new node reaches an acceptable location.

19

4222135

23

45

42

27

Adding a Node to a HeapAdding a Node to a Heap

The parent has a key that is >= new node, or

The node reaches the root. The process of pushing the

new node upward is called reheapification upward.

19

4222135

23

45

42

27

Removing the Top of a HeapRemoving the Top of a Heap

Move the last node onto the root.

19

4222135

23

45

42

27

Removing the Top of a HeapRemoving the Top of a Heap

Move the last node onto the root.

19

4222135

23

27

42

Removing the Top of a HeapRemoving the Top of a HeapPush the out-of-

place node downward, swapping with its larger child until the new node reaches an acceptable location.

19

4222135

23

27

42

Removing the Top of a HeapRemoving the Top of a HeapPush the out-of-

place node downward, swapping with its larger child until the new node reaches an acceptable location.

19

4222135

23

42

27

Removing the Top of a HeapRemoving the Top of a HeapPush the out-of-

place node downward, swapping with its larger child until the new node reaches an acceptable location.

19

4222127

23

42

35

Removing the Top of a HeapRemoving the Top of a Heap The children all have keys <= the out-of-place node, or

The node reaches the leaf.

The process of pushing the new node downward is called reheapification downward.

19

4222127

23

42

35

Adding an Entry to a Heap

519

27 21 422

2335

45

As an example, suppose that we already have nine entries that are arranged in a heap with the above priorities.

Suppose that we are adding a new entry with a priority 42.The first stepis to add this entry in a way that keeps the binary tree complete. In thiscase the new entry will be the left child of the entry with priority 21.

519

27 21 422

2335

45

4242

The structure is no longer a heap, since the node with priority 21 has a child with a higher priority. The new entry (with priority 42) rises upwarduntil it reaches an acceptable location. This is accomplished by swappingthe new entry with its parent.

519

27 4242 422

2335

45

21

The new entry is still bigger than its parent, so a second swap is done.

519

27 35 422

234242

45

21

Adding an entry to a priority queue

Pseudocode for Adding an entry

The priority queue has been implemented as a heap.

1.Place the new entry in the heap in the first available location.This keeps the structure as a complete binary tree, but it might no longer be a heap since the new entry might have a higher priority than its parent.

2. while (the new entry has a priority that is higher than its parent)

swap the new entry with its parent.

Reheapification upward

Now the new entry stops rising, because its parent (with priority 45) has a higher prioritythan the new entry. In general, the “new entry rising” stops when the new entry reaches the root. The rising process is called reheapification upward.

Implementing a HeapImplementing a Heap

• We will store the data from the nodes in a partially-filled array.

An array of dataAn array of data

2127

23

42

35

Implementing a HeapImplementing a Heap

• Data from the root goes in the first location of the array.

An array of dataAn array of data

2127

23

42

35

42

Implementing a HeapImplementing a Heap

• Data from the next row goes in the next two array locations.

An array of dataAn array of data

2127

23

42

35

42 35 23

Implementing a HeapImplementing a Heap

• Data from the next row goes in the next two array locations.

• Any node’s two children reside at indexes (2n) and (2n + 1) in the array. An array of An array of datadata

2127

23

42

35

42 35 23 27 21

Implementing a HeapImplementing a Heap

• Data from the next row goes in the next two array locations.

An array of dataAn array of data

2127

23

42

35

42 35 23 27 21

We don't care what's inWe don't care what's inthis part of the array.this part of the array.

Important Points about the Implementation

Important Points about the Implementation

• The links between the tree's nodes are not actually stored as pointers, or in any other way.

• The only way we "know" that "the array is a tree" is from the way we manipulate the data.

An array of dataAn array of data

2127

23

42

35

42 35 23 27 21

Important Points about the Implementation

Important Points about the Implementation

• If you know the index of a node, then it is easy to figure out the indexes of that node's parent and children.

[1][1] [2] [3] [4] [5]

2127

23

42

35

42 35 23 27 21

Removing an entry from a Heap

• When an entry is removed from a priority, we must always remove the

• entry with the highest priority - the entry that stands “on top of heap”.

• For example, the entry at the root, with priority 45 will be removed.

The highest priority item, at the root, will be removed from the heap

519

27 35 422

2342

45

21

4545

Removing an entry from a heap

• During the removal, we must ensure that the resulting structure remains a heap. If the root entry is the only entry in the heap, then there is really no more work to do except to decrement the member variable that is keeping track of the size of the heap. But if there are other entries in the tree, then the tree must be rearranged because a heap is not allowed to run around without a root. The rearrangement begins by moving the last entry in the last level upto the root, like this:

519

35 422

2342

45

2121

27

2121

422

23

35

42

27

519

Before the moveAfter the move

The last entry in the last levelhas been moved to the root.

Removing an entry from a Heap

Heapsort

• Heapsort combines the time efficiency of mergesort and the storage efficiency of quicksort. Like mergesort, heapsort has a worst case running time that is O(nlogn), and like quicksort it does not require an additional array. Heapsort works by first transforming the array to be sorted into a heap.

Heap representation

A heap is a complete binary tree, and an efficient method of representing a complete binary tree in an array follows these rules:

Data from the root of the complete binary tree goes in the [0] component of the array.

The data from depth one nodes is placed in the next two components of the array.

We continue in this way, placing the four nodes of depth two in the next four components of the array, and so forth. For example, a heap with ten nodes can be stored in a ten element array as shown below:

27

45

42

23 352221

5419

45 27 42 21 23 22 35 19 4 5

45

Rules for location of data in the array

• The data from the root always appears in the [0] component of the array.

• Suppose that the data for a root node appears in the component [i] of the array. Then the data for its parent is always at location [(i - 1) / 2] (using integer division).

• Suppose that the data for a node appears in component [i] of the array. Then its children (if they exist)always have their data at these locations:

Left child at component [2i + 1]

Right child at component [2i + 2]

• The largest value in a heap is stored in the root and that in our array representation of a complete binary tree, the root node is stored in the first array position. Thus, since the array represents a heap, we know that the largest array element is in the first array position. To get the largest value in the correct array position, we simply interchange the first and the final array elements. After interchanging these two array elements, we know that the largest array element is in the final array position, as shown below:

• The dark vertical line in the array separates the sorted side of the array from the unsorted side (on the left) . Moreover the left side still represents a complete binary tree which is almost a heap.

5 27 42 21 23 22 35 19 4 45

When we consider the unsorted side as a complete binary tree, it is only the root that violates the heapstorage rule. This root is smaller than its children. The next step of the heapsort is to reposition this out of place value in order to restore the heap. The process called reheapification downward begins by comparing the value in the root with the value of each of its children. If one or both children are larger than the root , then the root’s value is exchanged with the larger of the two children. This moves the troublesome value down one level in the tree. For example:

27

22

419

21 23 35

5

42

42 27 5 21 23 22 35 19 4 45

• In the example 5 is still out of place, so we will once again exchange it with its largest child resulting in the array and heap shown below:

27

22

419

21 23

35

5

42

42 27 35 21 23 22 5 19 4 45

The unsorted side of the array must be maintained a heap. When the unsorted side of the array is once again a heap, the heapsort continues by exchanging the largest element in the unsorted side with the rightmost element of the unsorted side. For example 42 is exchanged with 4 as shown below:

4 27 35 21 23 22 5 19 42 45

• The sorted side now has two largest elements, and the unsorted side is once again almost a heap, as shown below:

4 27 35 21 23 22 5 19 42 45

522

35

4

27

2321

19

Only the root (4) is out of place, and that may be fixed by another reheapification downward. After the reheapification downward, the largest value of the unsorted side will once again reside at location [0]and we can continue to pull out the next largest value.

Pseudocode for the heapsort algorithm//Heapsort for the array called data with n elements1. Convert the array of n elements into a heap.2. unsorted = n; // The number of elements in the unsorted

side3. while ( unsorted > 1){ // Reduce the unsorted side by one unsorted - -; Swap data[0] with data [unsorted]. The unsorted side of the array is now a heap with the

root out of place. Do a reheapification downward to turn the unsorted side back into a heap. }

• While the Heapsort is a constant factor slower than the Quicksort for average data sets, it still has a complexity of O(n log n) for best, worst and average data.

• Reheapification is the process by which an out of place node is exchanged with its parent (upward) or its children (downward) until the node’s key is <= to it’s parent and >= to it’s children. Then reheapification is complete.

Analysis Analysis

A simpler but related problem: Find the smallest and the second smallest elements in an array (of size n):

• A straightforward method:

Find the smallest using n –1 comparisons; swap it to the end of the array, find the next smallest in n –2 comparisons, for a total of 2n – 3 comparisons.• A method that “saves” results of some earlier comparisons:

(1) Divide the array into two halves, find the smallest elements of each half, call them min1 and min2, respectively; (2) Compare min1 and min2, to find the smallest;

(3) If min1 is the smallest, compare min2 with the remaining elements of the first half to determine the second smallest; similarly for the case if min2 is the smallest.

The number of comparisons is (n –1) + (n/2 –1) = (3n/2 –2.

The second method can be depicted in the following figure:

min1 min2

min

First half of array Second half of array

In the figure, a link connects a smaller element to a larger one going downward; for example, min min1 and min min2, and min1 each element in the first half of the array, similarly for min2. This organization facilitates finding the smallest and second smallest elements.

H1 H2

A binary tree data structure for Min-heaps:A binary tree is called a left-complete binary tree if

(1) the tree is full at each level except possibly at the maximum level (depth), where the tree is full at level i means there are 2i nodes at that level (recall the root is at level 0); and

(2) at the tree’s maximum level h, if there are fewer than 2h nodes then the missing nodes are on the right side of the tree.

Missing on the right side

If there are n nodes in a left-complete binary tree of depth h, then 2h n 2h+1 –1. Thus, h lg n < (h +1). In particular, h = O(lg n).

A binary tree satisfies the (min-)heap-order property if the value at each is less than or equal to the value at each of the child nodes (if exist).

A binary min-heap is a left-complete binary tree that also satisfies the heap-order property.

13

21

24 31

16

19 68

65 26 32

Note that there is no definite order between values in sibling nodes; also, it is obvious that the minimum value is at the root.

A binary min-heap

Two basic operations on binary min-heaps:

(1)Suppose the value at a node is decreased which causes violation to the heap-order property (because the value is smaller than that of its parent’s). The operation to restore the heap property is named percolateUp (siftUp).

(2)Suppose the value at a node is increased which becomes larger than either (or both) of the value of the children’s. The operation to restore the heap order is named percolateDown (siftDown).

Both operations can be completed in time proportional to the tree height, thus, of time complexity O(lg n) where n is the total number of nodes. These operations are crucial in supporting the heap operations insert and deleteMin.

The percolateUp (siftUp) operation:

13

21 35

25 37 43 12

The heap order violated at node valued 12

21

25 37 43

13

12

35

percolateUp

21

25 37 43 35

13

12

percolateUpNote that the time of each step (each level) is constant, i.e., compare with parent and swap

The percolateDown (siftDown) operation:

43

21 35

37 25 45 63

The heap order violated at node valued 43

Each step of percolateDown finds the smaller of the two children (if exist) then swap it with the violating node

35

37 25 45 63

43

21

Who is smaller ?

Who is smaller ?

35

37 45 63

21

25

43

percolateDown

percolate-Down

Implementation of the heap operations:

insert(): insert a node next to the last node, call percolateUp from it, treating the new node as a violation.

deleteMin(): delete the root node, move the last node to the root, then call percolateDown from the root.

findMin(): return the root’s value.

Finally, a binary heap can be implemented using an array storing the heap elements from the root towards the leaves, level by level and from left to right within each level. The left-completeness property guarantees that there are no “holes” in the array. Also, if we use array indexes 1..n to store n elements, the parent’s index (if exists) of node i is i/2, the left child’s index is 2i, the right child 2i + 1 (if exist). The time complexity of deleteMin and insert both are O(lg n); the time complexity of findMin is O(1).

An array implementation of a binary heap:

1

2 3

4 5 6

Number the nodes (starting at 1) by levels, from top to bottom and left to right within level

1 2 3 4 5 6 . . . . .

Level 2 nodesLevel 1 nodesRoot

Parent to children (index i to 2i, 2i+1)

Convert an array T[1..n] into a heap (known as heapify):

A top-down method: repeatedly call insert() by inserting T[i] into a heap T[1..(i –1)], for i = 2 to n.

21

12 9

13 7

Initially, T[1] by itself ia a heap

12

21 9

13 7

Insert 12 into T[1..1], making T[1..2] a heap

9

21 12

13 7

9

13 12

21 7

7

9 12

21 13

Insert 9 into T[1..2], making T[1..3] a heap

Insert 13 into T[1..3], making T[1..4} a heap

Insert 7 into T[1..4], making T[1..5] a heap

The total time complexity of top-down heapify is O(n lg n).

A bottom-up heapify method:

12 9

13 7

21For i = n/2 down to 1

/* percolate down T[i], making the subtree rooted at i a heap */

call percolateDown(i)n = 5, start at node n/2, the node of the highest index that has any children

Complexity analysis: Node1 travels down h levels during percolateDown, nodes 2 and 3 each (h –1) levels, nodes 4 through 7 each (h –2) levels, etc. The total number of levels traveled is h + 2(h –1) + 4 (h –2) + …+ 2h –1(h – (h – 1)) = 2h –1(1 + 2(1/2) + 3(1/2)2 + 4(1/2)3 + …) 2n, because 2h –1 < n/2 since h lg n, and the infinite series 1 + 2x + 3x2 + … = 1/(1 –x)2 when |x| < 1. Thus, the time complexity of heapify is O(n).

top related