
CP2530 - Algorithms and Data Structures

ANALYSIS OF ALGORITHMS

Analysis is more reliable than experimentation. Testing reveals behaviour for some inputs, while analysis tells us about an algorithm's behaviour for all inputs.

There are many different solutions to the same problem. Analysis can help us choose among different solutions.

Performance of a program can be predicted before actual implementation. Analysis also gives us a better understanding of where the "slow" and "fast" parts are.

MOTIVATION

Recall the Fibonacci sequence. It can be defined recursively as :

f ( x ) = 1                          if x = 1 or x = 2
f ( x ) = f ( x - 1 ) + f ( x - 2 )  otherwise

Here we have two base cases, when argument x is either 1 or 2, i.e. f( 1 ) = 1 and f( 2 ) = 1, reflecting the fact that first two integers in the sequence are 1, 1.

FIBONACCI

The original formula gives rise to a natural recursive implementation :

0 int f(int n) {
1     if(n == 1 || n == 2)
2         return 1;
3     else
4         return f(n - 1) + f(n - 2);
5 }

EFFICIENCY

Basic Question : How much time would the recursive algorithm take to compute the nth member of the sequence?

But how to measure time? In seconds? But then the answer changes every time Intel comes out with a faster processor.

To get a rough approximation, we measure time in terms of lines of ( pseudo ) code.

EFFICIENCY

Line 1 is always executed in the code on slide 4. Depending on the evaluation of line 1 either line 2 or line 4 is executed.

Therefore, the time required to compute the nth Fibonacci number in terms of lines of code is :

time(n) = 2 + time(n - 1) + time(n - 2)

The equation above is called a recurrence relation, and we'll see how to solve such relations shortly.
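As a sanity check ( a sketch, not from the notes ), the recurrence can be evaluated bottom-up; the base cases are assumed here to cost 2 lines each ( lines 1 and 2 ):

```java
// Evaluate time(n) = 2 + time(n - 1) + time(n - 2) bottom-up.
// Assumption (not stated on the slide): each base case costs 2 lines.
class RecurrenceDemo {
    static long time(int n) {
        long[] t = new long[Math.max(n + 1, 3)];
        t[1] = t[2] = 2;                     // base cases: lines 1 and 2
        for (int i = 3; i <= n; i++)
            t[i] = 2 + t[i - 1] + t[i - 2];  // the recurrence itself
        return t[n];
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++)
            System.out.println("time(" + n + ") = " + time(n));
    }
}
```

The values grow like the Fibonacci numbers themselves, which is the first hint that the recursive implementation is exponential.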

EFFICIENCY

Recursive implementation of Fibonacci numbers is natural, but very inefficient. Here is why, ( intuitively ) :

f(5)
├── f(4)
│   ├── f(3)
│   │   ├── f(2)
│   │   └── f(1)
│   └── f(2)
└── f(3)
    ├── f(2)
    └── f(1)

The same numbers are recomputed ! Try running with n = 45 and wait ...

FIBONACCI : SECOND APPROACH

The recursive algorithm is slow because it recomputes the same numbers over and over again.

Our second algorithm stores computed numbers in an array :

0 int f(int n) {        // assumes n >= 2, so that a[1] exists
1     int[] a = new int[n];
2     a[0] = a[1] = 1;
3     for(int i = 2; i < n; i++)
4         a[i] = a[i - 1] + a[i - 2];
5     return a[n - 1];
6 }

Lines 1, 2 and 5 are executed unconditionally ( 3 times ). Line 3 is executed n - 1 times and line 4 is executed n - 2 times. So

time(n) = 3 + n - 1 + n - 2 = 2n

For n = 45, it takes 90 steps, roughly 25 million times faster than recursive implementation!


SPACE COMPLEXITY

Efficiency ( running time ) is not our only concern or the only thing that we can analyze mathematically.

If a program takes a lot of time ( reasonably ), we can still run it and just wait longer for a result.

However, if a program takes a lot of memory ( space ), we may not be able to run it at all.

Each call of the recursive algorithm takes a constant amount of space : some for local variables as well as the return address.

SPACE ANALYSIS

[Figure: recursion tree of f(n) - at any moment the active calls form a single root-to-leaf path, e.g. f(n) → f(n - 1) → f(n - 2) → ... → f(2)]

The length of any such path is at most n, so the space complexity is again some constant factor times n.
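This can be checked directly by instrumenting the recursive implementation with a depth counter ( a sketch; the depth parameter is an addition, not part of the original code ):

```java
// Track the deepest call stack reached by the recursive Fibonacci.
// The deepest path follows f(n) -> f(n-1) -> ... -> f(2), so the
// maximum depth should be n - 1 for n >= 2.
class DepthDemo {
    static int maxDepth;

    static int f(int n, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        if (n == 1 || n == 2) return 1;
        return f(n - 1, depth + 1) + f(n - 2, depth + 1);
    }

    public static void main(String[] args) {
        maxDepth = 0;
        int r = f(10, 1);
        System.out.println("f(10) = " + r + ", max depth = " + maxDepth);
    }
}
```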

The iterative algorithm uses roughly the same amount of space - the size of the array ( n ). Since each step through the loop uses only the two previous values, we can improve the space complexity :

SPACE ANALYSIS

0 int f(int n) {
1     int p = 0;
2     int t = 1;
3     for(int i = 1; i < n; i++) {
4         int c = p + t;
5         p = t;
6         t = c;
7     }
8     return t;
9 }

SPACE ANALYSIS

Lines 1, 2 and 8 are executed unconditionally ( 3 times ). Line 3 is executed n times and lines 4, 5 and 6 are executed n - 1 times. So

time(n) = 3 + n + 3(n - 1) = 4n

Because of the swapping, this algorithm is slightly slower than the array-based algorithm, but it uses only a constant amount of space.

GROWTH OF FUNCTIONS

Quite often we cannot predict the running times of algorithms exactly. Consider the following one, which computes the maximum value in an array :

0 int max(int[] a) {
1     int m = a[0];
2     for(int i = 1; i < a.length; i++) {
3         if(a[i] > m)
4             m = a[i];
5     }
6     return m;
7 }

GROWTH OF FUNCTIONS

Lines 1 and 6 are executed unconditionally ( 2 times ). Line 2 is executed n ( a.length ) times, line 3 is executed n - 1 times. But we can’t tell how many times line 4 will be executed. So

time(n) = 2 + n + n - 1 + A = 2n + 1 + A

In the expression above we know everything except the quantity A, which is the number of times we must change the value for the current maximum.

GROWTH OF FUNCTIONS

Minimum value of A ( best case ) : in function max, the best case is when the array is sorted in descending order, i.e. step 4 is never executed and the running time is 2n + 1.

Maximum value of A ( worst case ) : in function max, the worst case is when the array is sorted in ascending order, i.e. step 4 is executed n - 1 times and the running time is 3n.

The analysis usually consists of finding these best and worst cases.
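The two cases can be confirmed empirically by counting how often line 4 fires ( a sketch mirroring the max function above; countA is a hypothetical helper, not part of the notes ):

```java
// Count A, the number of times the current maximum is replaced,
// for ascending (worst case) and descending (best case) inputs.
class MaxCount {
    static int countA(int[] a) {
        int m = a[0], count = 0;
        for (int i = 1; i < a.length; i++)
            if (a[i] > m) { m = a[i]; count++; }  // "line 4" fires here
        return count;
    }

    public static void main(String[] args) {
        System.out.println("ascending:  A = " + countA(new int[]{1, 2, 3, 4, 5}));
        System.out.println("descending: A = " + countA(new int[]{5, 4, 3, 2, 1}));
    }
}
```

For n = 5 the ascending input gives A = n - 1 = 4 and the descending input gives A = 0, matching the analysis.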

GROWTH OF FUNCTIONS

It is the rate of growth that really interests us : how the running time of an algorithm increases with the size of an input in the limit.

Since we are interested in asymptotic efficiency of algorithms ( when input size is large ), when we define running times we simply ignore constants and lower degree variables.

For example, if T(n) = an² + bn + c, where a, b and c are nonnegative constants, we'll write Θ(n²) or O(n²).

Θ-NOTATION

What follows is fancy mathematical lingo to express the fact that if my code takes 3n⁴ + 2n² - 5 steps, I don't care about lower terms and constants - it takes on the order of n⁴ steps. That's it!

For a given function g(n) we denote by Θ(g(n)) the set of functions :

Θ(g(n)) = { f(n) : ∃c₁, c₂, n₀ > 0 such that 0 ≤ c₁g(n) ≤ f(n) ≤ c₂g(n) ∀n ≥ n₀ }

Any function f(n) in Θ(g(n)) is bounded by c₁g(n) and c₂g(n) for all n ≥ n₀. So we write f(n) ∈ Θ(g(n)) or f(n) = Θ(g(n)).
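For instance, f(n) = 2n² + 3n - 2 is in Θ(n²); one possible choice of witnesses is c₁ = 1, c₂ = 5 and n₀ = 1, which the following sketch verifies numerically over a finite range ( a spot check, not a proof ):

```java
// Check the Theta definition numerically for f(n) = 2n^2 + 3n - 2
// with witnesses c1 = 1, c2 = 5, n0 = 1 (one possible choice).
class ThetaCheck {
    static boolean inTheta(int upTo) {
        for (long n = 1; n <= upTo; n++) {
            long f = 2 * n * n + 3 * n - 2;
            long lo = n * n;        // c1 * g(n)
            long hi = 5 * n * n;    // c2 * g(n)
            if (f < lo || f > hi) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("bounds hold up to 1000000: " + inTheta(1_000_000));
    }
}
```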

Θ-NOTATION

Graphically, Θ-notation can be represented as a plot of running time against input size n : for all n ≥ n₀, the curve f(n) stays between c₁g(n) below and c₂g(n) above.

O-NOTATION

For a given function g(n) we denote by O(g(n)) the set of functions :

O(g(n)) = { f(n) : ∃c, n₀ > 0 such that 0 ≤ f(n) ≤ cg(n) ∀n ≥ n₀ }

Θ-notation asymptotically bounds a function from above and below. O-notation stands for asymptotic upper bound ( worst case running time ).

Again we write, for example, 2n² + 3n - 2 = O(n²), but the real meaning is 2n² + 3n - 2 ∈ O(n²).

O-NOTATION

Graphically, O-notation can be represented as a plot of running time against input size n : for all n ≥ n₀, the curve f(n) stays below cg(n).

The most common values in the analysis of algorithms are :

O-NOTATION

constant       O(1)
logarithmic    O(log n)
linear         O(n)
quadratic      O(n²)
polynomial     O(nᵏ) ( k ≥ 1 )
exponential    O(aⁿ) ( a > 1 )

O-notation should be used to characterize a function as closely as possible. While it is true that f(n) = 4n³ + 3n^(4/3) is in O(n⁵), it is more accurate to say that it is in O(n³).

Ω-NOTATION

For a given function g(n) we denote by Ω(g(n)) the set of functions :

Ω(g(n)) = { f(n) : ∃c, n₀ > 0 such that 0 ≤ cg(n) ≤ f(n) ∀n ≥ n₀ }

Ω-notation stands for asymptotic lower bound ( best case running time ).

Graphically, Ω-notation can be represented as a plot of running time against input size n : for all n ≥ n₀, the curve f(n) stays above cg(n).

RECURRENCES

When an algorithm contains a recursive call, its running time can often be described by a recurrence : an equation that describes a function in terms of its values on smaller inputs.

For example, the recurrence that describes the recursive Fibonacci function is :

T(n) = 1                          if n = 1 or n = 2
T(n) = T(n - 1) + T(n - 2)        otherwise

The importance of solving recurrences is in obtaining asymptotic Θ or O bounds.

HOMOGENEOUS RECURRENCES

Recurrences of the form : a₀T(n) + a₁T(n - 1) + ... + aₖT(n - k) = 0

are called homogeneous recurrences.

For example, the Fibonacci sequence, written as T(n) - T(n - 1) - T(n - 2) = 0, is a homogeneous recurrence.

With each recurrence we associate a characteristic polynomial : p(x) = a₀xᵏ + a₁xᵏ⁻¹ + ... + aₖ

For example, x² - x - 1 is p(x) for the Fibonacci sequence.

Let rᵢ denote the ith root of the characteristic polynomial. Then the homogeneous recurrence has a solution of the form :

∑ cᵢrᵢⁿ = c₁r₁ⁿ + c₂r₂ⁿ + ... + cₖrₖⁿ

provided that all roots are distinct.

Coefficients c₁, c₂, ..., cₖ can be determined from k initial conditions ( trivial recursive cases ) by solving a system of k linear equations in k unknowns.


EXAMPLE

The Fibonacci recurrence is defined as T(n) - T(n - 1) - T(n - 2) = 0, with characteristic polynomial x² - x - 1. The roots of this polynomial are :

r₁ = ( 1 + √5 ) / 2,  r₂ = ( 1 - √5 ) / 2

So the solution is of the form :

T(n) = c₁ ( ( 1 + √5 ) / 2 )ⁿ + c₂ ( ( 1 - √5 ) / 2 )ⁿ

EXAMPLE

We find coefficients c₁ and c₂ from the initial conditions T(1) = 1 and T(2) = 1. Their values are c₁ = 1 / √5 and c₂ = -1 / √5. Thus

T(n) = ( 1 / √5 ) ( ( ( 1 + √5 ) / 2 )ⁿ - ( ( 1 - √5 ) / 2 )ⁿ )

Since ( 1 + √5 ) / 2 > 1 and | ( 1 - √5 ) / 2 | < 1, we have

T(n) = O(1.618ⁿ)

That's bad. Very bad! It will take ~2,500,000,000 steps to compute the 45th member of the Fibonacci sequence!
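The closed form can be checked against the iterative computation ( a sketch, not from the notes; doubles keep the closed form exact only for moderate n, roughly n ≤ 70 ):

```java
// Compare the closed form (Binet's formula) with the iterative Fibonacci.
class BinetDemo {
    static long binet(int n) {
        double sqrt5 = Math.sqrt(5.0);
        double phi = (1 + sqrt5) / 2;   // r1
        double psi = (1 - sqrt5) / 2;   // r2
        return Math.round((Math.pow(phi, n) - Math.pow(psi, n)) / sqrt5);
    }

    static long iter(int n) {
        long p = 0, t = 1;
        for (int i = 1; i < n; i++) { long c = p + t; p = t; t = c; }
        return t;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 20; n++)
            if (binet(n) != iter(n)) throw new AssertionError("mismatch at " + n);
        System.out.println("closed form agrees with iteration; f(20) = " + iter(20));
    }
}
```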

THE MASTER METHOD

The master method is a method for solving recurrences of the form :

T(n) = aT(n / b) + f(n)

where a ≥ 1, b > 1 and f(n) is an asymptotically positive function. Then T(n) can be bounded asymptotically as follows :

1. If f(n) = O(n^(log_b a - ε)) for some constant ε > 0, then T(n) = Θ(n^(log_b a)).

2. If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) log n).

3. If f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and if a f(n / b) ≤ c f(n) for some constant c < 1, then T(n) = Θ(f(n)).

THE MASTER METHOD

T(n) = aT(n / b) + f(n)

EXAMPLES (1)

Consider T(n) = 9T(n / 3) + n.

For this recurrence we have a = 9, b = 3, f(n) = n.

Thus n^(log₃ 9) = n². Let ε = 1.

Since f(n) = n = O(n^(log₃ 9 - ε)) = O(n^(log₃ 3)) = O(n), we apply case (1) and conclude T(n) = Θ(n^(log₃ 9)) = Θ(n²).

EXAMPLES (2)

Consider T(n) = T(2n / 3) + 1.

For this recurrence we have a = 1, b = 3/2, f(n) = 1

Thus n^(log_(3/2) 1) = n⁰ = 1.

Since f(n) = 1 = Θ(n^(log_(3/2) 1)) = Θ(1), we apply case (2) and conclude T(n) = Θ(n^(log_(3/2) 1) log n) = Θ(log n).

EXAMPLES (3)

Consider T(n) = 3T(n / 4) + n log n.

For this recurrence we have a = 3, b = 4, f(n) = n log n, and thus n^(log₄ 3).

Since log₄ 3 < 1, choose ε such that ε + log₄ 3 = 1.

Then n^(log₄ 3 + ε) = n¹ = n, and obviously f(n) = n log n = Ω(n) ( case 3 is a candidate ).

EXAMPLES (3)

Next we have to show that regularity condition applies :

a f(n / b) ≤ c f(n) for some constant c < 1

3 (n / 4) log (n / 4) ≤ c n log n

Let c = 3 / 4. Then we have :

(3n / 4) log (n / 4) < (3n / 4) log n = c n log n

Case 3 applies and we conclude T(n) = Θ(n log n).
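The regularity condition itself can be spot-checked numerically ( a sketch over a finite range, not a proof ):

```java
// Check a*f(n/b) <= c*f(n) with a = 3, b = 4, f(n) = n log n, c = 3/4.
class RegularityCheck {
    static boolean holds(int upTo) {
        for (int n = 4; n <= upTo; n++) {
            double left = 3 * (n / 4.0) * Math.log(n / 4.0);   // a f(n/b)
            double right = 0.75 * n * Math.log(n);             // c f(n)
            if (left > right) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("regularity holds up to 100000: " + holds(100_000));
    }
}
```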

DATA STRUCTURES

So far, we studied static, fixed-size data structures such as arrays.

Quite often it is not possible to predict, in advance, how much memory is needed to carry out a computation ( can the compiler predict how many variables you will declare? )

For this reason, we’ll study dynamic data structures - structures that can grow, as well as shrink, at run time.

SELF-REFERENTIAL CLASSES

All dynamic data structures are based on the concept of self-referential nodes.

Self-referential nodes are implemented through self-referential classes which contain data field(s) as well as references ( links ) to objects of the same type.

Pictorially, a self-referential node can be represented as a box holding a data field and a link to the next node.

Node.java

class Node {
    private int data;
    private Node link;

    public Node() {
        data = 0;
        link = null;
    }

    public void setData(int data) {
        this.data = data;
    }

    public int getData() {
        return data;
    }

    public void setLink(Node link) {
        this.link = link;
    }

    public Node getLink() {
        return link;
    }
}

DYNAMIC MEMORY ALLOCATION

Creating and maintaining data structures requires dynamic memory allocation - the ability to request and allocate more memory at run time.

Recall that in Java, the new operator is essential in dynamic memory allocation. For example : Node n = new Node(); would request enough memory to store an object of type Node and assign a reference to variable n.

Unlike in C/C++, we don’t have to worry about deleting nodes we don’t need. Java garbage collector does the job!

TestNode.java

class TestNode {
    public static void main(String[] args) {
        Node n1, n2, n3, temp;
        n1 = new Node();
        n2 = new Node();
        n3 = new Node();
        n1.setData(1);
        n1.setLink(n2);
        n2.setData(2);
        n2.setLink(n3);
        n3.setData(3);

        temp = n1;
        while(temp != null) {
            System.out.print(temp.getData() + " ");
            temp = temp.getLink();
        }
        System.out.println();
    }
}

LINKED NODES

This is what we have created here, pictorially : n1, n2 and n3 reference nodes holding 1, 2 and 3, each node links to the next, the last link is null, and temp walks along the chain.

The major disadvantage with this approach is that we have to declare nodes in advance. Instead of doing so we’ll develop methods to create and link nodes dynamically.

STACKS

The simplest dynamic data structure is the stack - new nodes can be added to and removed from the top only.

For this reason, stack is referred to as LIFO ( Last-In-First-Out ) data structure.

In order to implement stack, we need two private variables :Node top and int size, initially set to null and 0, respectively.

Primary methods are void push(int data) which adds a new node on top of the stack, and void pop() which removes node from the top.

STACKS : push

This is the sequence of steps taken every time the push(data) method is called :

1. Dynamically create new node : Node n = new Node();


STACKS : push

2. Set data ( assume 1 ) : n.setData(data);


3. Link newly created node to the previous one : n.setLink(top);

STACKS : push

4. Set top reference to newly created node : top = n;


5. Increment size : size++;


STACKS : push

[Figure: after push(1) the stack has size 1 - top references the node holding 1, whose link is null; after push(2) the stack has size 2 - top references the node holding 2, which links to the node holding 1]

STACKS : push

public void push(int data) {
    Node n = new Node();
    n.setData(data);
    n.setLink(top);
    top = n;
    size++;
}

The pop method removes the node on top. This is the sequence of steps taken every time the pop() method is called :

STACKS : pop

1. If the stack is empty ( size is zero or top is null ), report it and exit. Otherwise, set top to the node below : top = top.getLink();

2. Decrement size : size--;

STACKS : pop

[Figure: pop on a stack of size 2 - top moves from the node holding 2 to the node holding 1, and size drops to 1]

STACKS : pop

public void pop() {
    if(isEmpty())
        System.out.println("Stack is empty.");
    else {
        top = top.getLink();
        size--;
    }
}

Stack.java

class Stack {
    Node top;
    int size;

    public Stack() {
        top = null;
        size = 0;
    }

    public void push(int data) {
        Node n = new Node();
        n.setData(data);
        n.setLink(top);
        top = n;
        size++;
    }

    public boolean isEmpty() {
        return size == 0;
    }

    public void pop() {
        if(isEmpty())
            System.out.println("Stack is empty.");
        else {
            top = top.getLink();
            size--;
        }
    }

    public void print() {
        if(isEmpty())
            System.out.print("Stack is empty");
        else {
            Node temp = top;
            while(temp != null) {
                System.out.print(temp.getData() + " ");
                temp = temp.getLink();
            }
        }
        System.out.println();
    }
}

TestStack.java

class TestStack {
    public static void main(String[] args) {
        Stack s = new Stack();
        s.push(1);
        s.push(2);
        s.push(3);
        s.print();
        s.pop();
        s.print();
        s.pop();
        s.pop();
        s.print();
        s.pop();
    }
}
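For comparison ( not part of the notes ), Java's standard library offers the same LIFO behaviour through java.util.ArrayDeque :

```java
import java.util.ArrayDeque;

// ArrayDeque used as a stack: push adds at the head, pop removes it,
// so elements come back out in reverse insertion order.
class DequeStackDemo {
    static String pushPop() {
        ArrayDeque<Integer> s = new ArrayDeque<>();
        s.push(1);
        s.push(2);
        s.push(3);                         // top is now 3
        StringBuilder sb = new StringBuilder();
        while (!s.isEmpty())
            sb.append(s.pop()).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(pushPop());     // last in, first out
    }
}
```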

LINKED LISTS

Linked lists are data structures similar to stacks. Insertions are done at the end of the list, called the tail.

Here we use another reference - head, which points to the first node in the list.

With two references it is possible to search, sort, and remove entries anywhere, not only at the tail.

Hence, in order to implement linked lists, we need two references, Node head and tail, and an int size. The references are initialized to null and size to 0.

LINKED LISTS : insert

This is the sequence of steps taken every time the insert(data) method is called :

1. Dynamically create new node : Node n = new Node();


LINKED LISTS : insert

2. Set data ( assume 1 ) : n.setData(data);


LINKED LISTS : insert

3. If list is empty set head reference to newly created node n ( head = n ). Otherwise, link this node to the last one : tail.setLink(n);


LINKED LISTS : insert

4. Set tail reference to newly created node : tail = n;

5. Increment size : size++;


LINKED LISTS : insert

[Figure: after insert(1) the list has size 1 - head and tail both reference the node holding 1; after insert(2) the list has size 2 - head references the node holding 1, which links to the node holding 2; tail references the node holding 2, whose link is null]

LINKED LISTS : insert

public void insert(int data) {
    Node n = new Node();
    n.setData(data);
    if(isEmpty())
        head = n;
    else
        tail.setLink(n);
    tail = n;
    size++;
}

LINKED LISTS : remove

The remove method removes the node at the tail. This is the sequence of steps taken every time the remove() method is called :

1. If the list is empty ( size is zero or head/tail are null ), report it and exit.

2. If size is 1, reset head and tail to null and size to zero.

3. Otherwise, traverse the list until we find the node just before tail :

while(temp.getLink() != tail) temp = temp.getLink();

LINKED LISTS : remove

[Figure: list 1 → 2 → null with size = 2; temp stops at the node holding 1, the node just before tail]

4. Set tail to temp : tail = temp;

5. Set tail’s link to null : tail.setLink(null);

6. Decrement size : size--;

LINKED LISTS : remove


LINKED LISTS : remove

public void remove() {
    if(isEmpty())
        System.out.println("List is empty.");
    else if(size == 1) {
        head = tail = null;
        size = 0;
    }
    else {
        Node temp = head;
        while(temp.getLink() != tail)
            temp = temp.getLink();
        tail = temp;
        tail.setLink(null);
        size--;
    }
}

LinkedList.java

class LinkedList {
    private Node head;
    private Node tail;
    int size;

    public LinkedList() {
        head = tail = null;
        size = 0;
    }

    public boolean isEmpty() {
        return size == 0;
    }

    public void insert(int data) {
        Node n = new Node();
        n.setData(data);
        if(isEmpty())
            head = n;
        else
            tail.setLink(n);
        tail = n;
        size++;
    }

    public void remove() {
        if(isEmpty())
            System.out.println("Can't remove. List is empty.");
        else if(size == 1) {
            head = tail = null;
            size = 0;
        }
        else {
            Node temp = head;
            while(temp.getLink() != tail)
                temp = temp.getLink();
            tail = temp;
            tail.setLink(null);
            size--;
        }
    }

    public void print() {
        if(isEmpty())
            System.out.println("List is empty.");
        else {
            Node temp = head;
            while(temp != null) {
                System.out.print(temp.getData() + " ");
                temp = temp.getLink();
            }
            System.out.println();
        }
    }
}

TestLinkedList.java

class TestLinkedList {
    public static void main(String[] args) {
        LinkedList l = new LinkedList();
        l.insert(1);
        l.insert(2);
        l.insert(3);
        l.print();
        l.remove();
        l.print();
        l.remove();
        l.print();
        l.remove();
        l.print();
        l.remove();
        l.print();
    }
}

QUEUES

A queue is a dynamic data structure similar to a linked list : it is described by two references, head and tail, and a size. Fundamental methods are :

void enqueue(int data) : identical to insert(int data) in the linked list. A new node is linked at the tail of the queue.

void dequeue() : removes the node at the head. For this reason queues are called FIFO ( First-In-First-Out ) data structures.

QUEUES : dequeue

[Figure: dequeue on the queue 1 → 2 → null - head moves from the node holding 1 to the node holding 2, and size drops from 2 to 1]

Queue.java

class Queue {
    private Node head;
    private Node tail;
    private int size;

    public Queue() {
        head = tail = null;
        size = 0;
    }

    public boolean isEmpty() {
        return size == 0;
    }

    public void enqueue(int data) {
        Node n = new Node();
        n.setData(data);
        if(isEmpty())
            head = n;
        else
            tail.setLink(n);
        tail = n;
        size++;
    }

    public void dequeue() {
        if(isEmpty())
            System.out.println("Queue is empty");
        else if(size == 1) {
            head = tail = null;
            size = 0;
        }
        else {
            head = head.getLink();
            size--;
        }
    }

    public void print() {
        if(isEmpty())
            System.out.println("Queue is empty.");
        else {
            Node temp = head;
            while(temp != null) {
                System.out.print(temp.getData() + " ");
                temp = temp.getLink();
            }
            System.out.println();
        }
    }
}

TestQueue.java

class TestQueue {
    public static void main(String[] args) {
        Queue q = new Queue();
        q.enqueue(1);
        q.enqueue(2);
        q.print();
        q.dequeue();
        q.print();
        q.dequeue();
        q.print();
        q.dequeue();
        q.print();
    }
}
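For comparison ( not part of the notes ), java.util.ArrayDeque also works as a FIFO queue via addLast and pollFirst :

```java
import java.util.ArrayDeque;

// ArrayDeque used as a queue: addLast enqueues at the tail,
// pollFirst dequeues from the head, preserving insertion order.
class DequeQueueDemo {
    static String inOut() {
        ArrayDeque<Integer> q = new ArrayDeque<>();
        q.addLast(1);
        q.addLast(2);
        q.addLast(3);
        StringBuilder sb = new StringBuilder();
        while (!q.isEmpty())
            sb.append(q.pollFirst()).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(inOut());   // first in, first out
    }
}
```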

BINARY TREES

A graph G = ( V, E ) consists of a set V of vertices and a set E of edges.

A cycle C is a path in a graph ( a sequence of otherwise distinct vertices and edges ) that begins and ends at the same vertex.

BINARY TREES

A tree T is a connected acyclic graph. It is rooted at a designated root vertex.

A binary tree is a tree in which any node can have at most two children.

BINARY SEARCH TREE

A binary search tree is a binary tree which is either empty or in which each vertex contains a key that satisfies these conditions :

All keys ( if any ) in the left subtree with respect to the root vertex precede the key in the root.

The key in the root vertex precedes all keys ( if any ) in its right subtree.

The left and right subtrees of the root are again binary search trees.

BINARY SEARCH TREE

[Figure: left - a binary search tree with root 2 and children 1 and 3; right - the same tree as a data structure, where each vertex is a node with left and right links and every missing child is null]

BNode.java

class BNode {
    private int data;
    private BNode leftLink;
    private BNode rightLink;

    public BNode() {
        data = 0;
        leftLink = rightLink = null;
    }

    public void setData(int data) {
        this.data = data;
    }

    public int getData() {
        return data;
    }

    public void setLeftLink(BNode leftLink) {
        this.leftLink = leftLink;
    }

    public BNode getLeftLink() {
        return leftLink;
    }

    public void setRightLink(BNode rightLink) {
        this.rightLink = rightLink;
    }

    public BNode getRightLink() {
        return rightLink;
    }
}

BINARY TREE: INSERTION

[Figure: the insertion sequences 1 - 2 - 3, 1 - 3 - 2, 3 - 2 - 1 and 3 - 1 - 2 each produce a different tree shape]

Nodes cannot be inserted into a binary tree in a predefined way. Every insertion must preserve the binary tree. Different insertions might result in a different tree.

BINARY TREE: INSERTION

Observe that both sequences, 2 - 1 - 3 and 2 - 3 - 1, will yield the same tree : root 2 with children 1 and 3.

Hence, before we can insert node into a binary tree, we have to find the proper position for that node. Initially, binary tree consists of a single reference root which is null.

BINARY TREE: INSERTION

[Figure: the calls b.insert(2); b.insert(1); b.insert(4); b.insert(3); b.insert(5); build a tree with root 2, left child 1 and right child 4, where 4 has children 3 and 5; temp walks down from the root to locate each insertion point]

BINARY TREE: INSERTION

public void insert(int data) {
    BNode b = new BNode();
    b.setData(data);
    if(root == null)
        root = b;
    else {
        BNode temp = root;
        while(temp != null) {
            if(data < temp.getData()) {
                if(temp.getLeftLink() != null)
                    temp = temp.getLeftLink();
                else {
                    temp.setLeftLink(b);
                    break;
                }
            }
            else if(data > temp.getData()) {
                if(temp.getRightLink() != null)
                    temp = temp.getRightLink();
                else {
                    temp.setRightLink(b);
                    break;
                }
            }
            else {
                System.out.println("Error. Duplicate.");
                break;
            }
        }
    }
}

TREE TRAVERSAL

In many applications it is necessary to visit nodes in a regular way. Typically we have to visit the node ( V ), traverse its left subtree ( L ) and finally traverse its right subtree ( R ) :

VLR, VRL, LVR, RVL, LRV, RLV

By standard convention, these six traversals are reduced to three, by traversing the left subtree before the right.

Their names are : Preorder ( VLR ), Inorder ( LVR ) and Postorder ( LRV ).

PREORDER (VLR)

[Figure: preorder traversal of the example tree visits 2 1 4 3 5]

INORDER (LVR)

[Figure: inorder traversal of the example tree visits 1 2 3 4 5]

POSTORDER (LRV)

[Figure: postorder traversal of the example tree visits 1 3 5 4 2]

BTree.java

class BTree {
    private BNode root;

    public BTree() {
        root = null;
    }

    public void insert(int data) {
        BNode b = new BNode();
        b.setData(data);
        if(root == null)
            root = b;
        else {
            BNode temp = root;
            while(temp != null) {
                if(data < temp.getData()) {
                    if(temp.getLeftLink() != null)
                        temp = temp.getLeftLink();
                    else {
                        temp.setLeftLink(b);
                        break;
                    }
                }
                else if(data > temp.getData()) {
                    if(temp.getRightLink() != null)
                        temp = temp.getRightLink();
                    else {
                        temp.setRightLink(b);
                        break;
                    }
                }
                else {
                    System.out.println("Error. Duplicate.");
                    break;
                }
            }
        }
    }

    public void showPreorder() {
        System.out.print("Preorder traversal: ");
        preorder(root);
        System.out.println();
    }

    public void preorder(BNode b) {
        if(b != null) {
            System.out.print(b.getData() + " ");
            preorder(b.getLeftLink());
            preorder(b.getRightLink());
        }
    }

    public void showInorder() {
        System.out.print("Inorder traversal: ");
        inorder(root);
        System.out.println();
    }

    public void inorder(BNode b) {
        if(b != null) {
            inorder(b.getLeftLink());
            System.out.print(b.getData() + " ");
            inorder(b.getRightLink());
        }
    }

    public void showPostorder() {
        System.out.print("Postorder traversal: ");
        postorder(root);
        System.out.println();
    }

    public void postorder(BNode b) {
        if(b != null) {
            postorder(b.getLeftLink());
            postorder(b.getRightLink());
            System.out.print(b.getData() + " ");
        }
    }
}

TestBTree.java

class TestBTree {
    public static void main(String[] args) {
        BTree b = new BTree();
        b.insert(2);
        b.insert(1);
        b.insert(4);
        b.insert(4);    // duplicate - reports an error
        b.insert(3);
        b.insert(5);
        b.showPreorder();
        b.showInorder();
        b.showPostorder();
    }
}

AVL TREES

If we inserted into a binary tree the values 1, 2, 3 the resulting tree would look like :

[Figure: degenerate tree - root 1, whose right child is 2, whose right child is 3; every left link is null]

The worst-case performance of search on a binary search tree occurs when values are inserted in ascending / descending order : the tree degenerates into a chain, and search is no better than linear, O(n).

Height-Balance Property : for every internal ( non-leaf ) node v of a binary search tree T, the heights of the children of v can differ by at most 1.

Any tree that satisfies the height-balance property is said to be an AVL tree ( Adelson-Velskii and Landis ). AVL trees maintain logarithmic height.
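The motivation can be seen by measuring the height of a plain ( unbalanced ) binary search tree under two insertion orders; this sketch uses a minimal local node class, not the BNode above :

```java
// Compare BST heights: sorted insertion degenerates to a chain,
// while a "middle first" order keeps the tree shallow.
class HeightDemo {
    static final class N { int key; N l, r; N(int k) { key = k; } }

    static N insert(N root, int k) {
        if (root == null) return new N(k);
        if (k < root.key) root.l = insert(root.l, k);
        else if (k > root.key) root.r = insert(root.r, k);
        return root;
    }

    static int height(N n) {
        return n == null ? 0 : 1 + Math.max(height(n.l), height(n.r));
    }

    public static void main(String[] args) {
        N sorted = null;
        for (int k = 1; k <= 15; k++) sorted = insert(sorted, k);

        int[] balancedOrder = {8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15};
        N balanced = null;
        for (int k : balancedOrder) balanced = insert(balanced, k);

        System.out.println("sorted order height:   " + height(sorted));   // chain
        System.out.println("balanced order height: " + height(balanced)); // shallow
    }
}
```

With 15 keys the sorted order yields height 15, the balanced order height 4, which is what AVL rebalancing guarantees automatically.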


INSERTION

Nodes are inserted into AVL tree in the same way as they were inserted into Binary Search Tree.

This action might violate height-balance property. If this is the case, tree must be restructured. There are four cases :

[Figure: the four unbalanced three-node configurations, resolved by the L-, LR-, R- and RL-rotations respectively]

INSERTION

Once the new node x is inserted into a tree ( and linked to its parent! ), we’ll call :

public AVLNode unbalanced(AVLNode x)

Method unbalanced(x) climbs up the tree from x until it either finds an unbalanced node by checking the height-balance property or reaches the null reference ( the root's parent ).

If the tree is unbalanced, the method will return reference to unbalanced node ( z ), null otherwise.

INSERTION

Observe that it is not sufficient to check the height-balance property at the root alone. The tree built from the sequence 4 3 5 1 2 6 7 is balanced at the root, but unbalanced at node 5.

[Figure: root 4 with left subtree 3 ( 1 ( -, 2 ) ) and right subtree 5 ( -, 6 ( -, 7 ) ) - both subtrees of the root have height 3, yet node 5 violates the property]

L-ROTATION

b.insert(1); b.insert(2); b.insert(3);

[Figure: the insertions produce the right-leaning chain 1 → 2 → 3, which an L-rotation turns into root 2 with children 1 and 3]

L-ROTATION

[Figure: before the rotation z is unbalanced, y is z's right child and x is y's right child, with subtrees t0, t1, t2, t3; after the rotation y is the subtree root, with left child z ( taking t0 and t1 ) and right child x ( keeping t2 and t3 )]

This is L-Rotation in terms of pointers ( mess! ). Most recently inserted node is x, node z is unbalanced :

Newly inserted node x violates the height-balance property ( H(L) - H(R) = -2 ) in the right subtree with respect to unbalanced node z.

Newly inserted node x is the right child of its parent y.

L-ROTATION

Criteria for L-Rotation :

void l(AVLNode x, AVLNode y, AVLNode z) {
    AVLNode t0, t1, t2, t3;
    t0 = z.getLeftLink();
    t1 = y.getLeftLink();
    t2 = x.getLeftLink();
    t3 = x.getRightLink();
    y.setLeftLink(z);
    y.setRightLink(x);
    y.setParent(z.getParent());
    z.setLeftLink(t0);
    z.setRightLink(t1);
    z.setParent(y);
    x.setLeftLink(t2);
    x.setRightLink(t3);
    x.setParent(y);
    if(z == root)
        root = y;
}

LR-ROTATION

b.insert(1); b.insert(3); b.insert(2);

[Figure: the insertions produce 1 with right child 3 and left grandchild 2, which an LR-rotation turns into root 2 with children 1 and 3]

LR-ROTATION

This is LR-Rotation in terms of pointers ( mess! ). Most recently inserted node is x, node z is unbalanced :

[Figure: before the rotation z is unbalanced, y is z's right child and x is y's left child, with subtrees t0, t1, t2, t3; after the rotation x is the subtree root, with left child z ( taking t0 and t1 ) and right child y ( taking t2 and t3 )]

Newly inserted node x violates the height-balance property ( H(L) - H(R) = -2 ) in the right subtree with respect to unbalanced node z.

Newly inserted node x is the left child of its parent y.

LR-ROTATION

Criteria for LR-Rotation :

void lr(AVLNode x, AVLNode y, AVLNode z) {
    AVLNode t0, t1, t2, t3;
    t0 = z.getLeftLink();
    t1 = x.getLeftLink();
    t2 = x.getRightLink();
    t3 = y.getRightLink();
    x.setLeftLink(z);
    x.setRightLink(y);
    x.setParent(z.getParent());
    z.setLeftLink(t0);
    z.setRightLink(t1);
    z.setParent(x);
    y.setLeftLink(t2);
    y.setRightLink(t3);
    y.setParent(x);
    if(z == root)
        root = x;
}

R-ROTATION

b.insert(3); b.insert(2); b.insert(1);

[Figure: the insertions produce the left-leaning chain 3 → 2 → 1, which an R-rotation turns into root 2 with children 1 and 3]

R-ROTATION

This is R-Rotation in terms of pointers ( mess! ). Most recently inserted node is x, node z is unbalanced :

[Figure: before the rotation z is unbalanced, y is z's left child and x is y's left child, with subtrees t0, t1, t2, t3; after the rotation y is the subtree root, with left child x ( keeping t0 and t1 ) and right child z ( taking t2 and t3 )]

R-ROTATION

Newly inserted node x violates the height-balance property ( H(L) - H(R) = 2 ) in the left subtree with respect to unbalanced node z.

Newly inserted node x is the left child of its parent y.

Criteria for R-Rotation :

void r(AVLNode x, AVLNode y, AVLNode z) {
    AVLNode t0, t1, t2, t3;
    t0 = x.getLeftLink();
    t1 = x.getRightLink();
    t2 = y.getRightLink();
    t3 = z.getRightLink();
    y.setLeftLink(x);
    y.setRightLink(z);
    y.setParent(z.getParent());
    x.setLeftLink(t0);
    x.setRightLink(t1);
    x.setParent(y);
    z.setLeftLink(t2);
    z.setRightLink(t3);
    z.setParent(y);
    if(z == root)
        root = y;
}

RL-ROTATION

b.insert(3); b.insert(1); b.insert(2);

[Figure: the insertions produce 3 with left child 1 and right grandchild 2, which an RL-rotation turns into root 2 with children 1 and 3]

RL-ROTATION

This is RL-Rotation in terms of pointers ( mess! ). Most recently inserted node is x, node z is unbalanced :

[Figure: before the rotation z is unbalanced, y is z's left child and x is y's right child, with subtrees t0, t1, t2, t3; after the rotation x is the subtree root, with left child y ( taking t0 and t1 ) and right child z ( taking t2 and t3 )]

RL-ROTATION

Newly inserted node x violates the height-balance property ( H(L) - H(R) = 2 ) in the left subtree with respect to unbalanced node z.

Newly inserted node x is the right child of its parent y.

Criteria for RL-Rotation :

void rl(AVLNode x, AVLNode y, AVLNode z) {
    AVLNode t0, t1, t2, t3;
    t0 = y.getLeftLink();
    t1 = x.getLeftLink();
    t2 = x.getRightLink();
    t3 = z.getRightLink();
    x.setLeftLink(y);
    x.setRightLink(z);
    x.setParent(z.getParent());
    y.setLeftLink(t0);
    y.setRightLink(t1);
    y.setParent(x);
    z.setLeftLink(t2);
    z.setRightLink(t3);
    z.setParent(x);
    if(z == root)
        root = x;
}

REMOVAL OF NODES IN AVL TREES

CASE 1 : Removal of external nodes ( nodes that do not have children ) ALGORITHM:

1. find node ( x ) with specified key

2. find x’s parent ( y )

3. set y’s left ( right ) link to null

4. possibly handle root

[Figure: example tree - root 4 with children 2 and 6; 2 has children 1 and 3; 6 has left child 5]

CASE 1

[Figure: removing the external node 1 - x references the node holding 1 and y references its parent 2; y's left link is set to null]

a.remove(1);
find(1);
y = x.getParent();
if(x == y.getLeft())
    y.setLeft(null);
else
    y.setRight(null);

REMOVAL OF NODES IN AVL TREES

CASE 2 : Removal of internal nodes ( nodes that have at least one child ) Subcase ( a ) : 1 child

[Figure: the same example tree - root 4 with children 2 ( holding 1 and 3 ) and 6 ( holding 5 )]

ALGORITHM:

1. find node ( w ) with specified key

2. if w has single child, set reference x to that child and reference y to w’s parent

3. properly link y to x and possibly handle root, i.e. w is root.

CASE 2A

[Figure: removing node 6, which has the single child 5 - w references 6, y references its parent 4 and x references the child 5; after linking y to x, 5 becomes the right child of 4]

CASE 2B

CASE 2 : Removal of internal nodes, subcase ( b ) : 2 children

4

62

1 3

root

5

ALGORITHM :

1. find node ( w ) with specified key

2. find node z that follows w in inorder traversal

3. copy value at z to w

4. if z has no children, unlink it; otherwise properly link z’s right subtree to z’s parent.

CASE 2B

[Figure: removing node 4, which has two children - z is 5, the node that follows w = 4 in inorder traversal; the value 5 is copied into w and the old node 5 is unlinked, giving root 5 with children 2 ( holding 1 and 3 ) and 6]

REMOVAL OF NODES IN AVL TREES

Deleting a node from AVL tree may result in an unbalanced tree :

[Figure: deleting 6 from the tree rooted at 5 leaves root 5 with only its left subtree 2 ( holding 1 and 3 ) - the tree is now unbalanced at the root]

REMOVAL OF NODES IN AVL TREES

Resulting tree needs to be balanced using the same rotations as in the case of insertion :

[Figure: the unbalanced tree can be rebalanced with the same rotations used for insertion - for example an R-rotation or an RL-rotation, depending on the configuration]

REMOVAL OF NODES IN AVL TREES

Situation is however more complex than in the case of insertion.

More than one rotation might be necessary to balance the tree.

For this reason, it is necessary to climb back to the root, checking the balance property at each node.

Play with this at :http://www.qmatica.com/DataStructures/Trees/AVL/AVLTree.html

HASH TABLES

A dictionary is a collection of pairs ( k, e ) where k is the key and e is an element. Keys uniquely identify each pair. The most common methods are :

find(k) : returns the element with key k

insert(k, e) : inserts element e with key k into the dictionary

remove(k) : removes the element with key k from the dictionary

The primary purpose of dictionaries is to store elements so that they can be located efficiently using keys.

HASH TABLES

The simplest implementation of a dictionary is an array, where the key is the array's index and the element is the data stored at that index.

The problem with arrays is that an array's indices are dense while many key sets are not. Consider CNA students' numbers : they are 8 digits long and there are roughly 8,500 students.

The solution to this problem is to use a hash table which consists of two components : bucket array and hash function.

The hash function is used to map sparse keys to dense locations of the bucket array.

The task of a hash function is to map a key k into an integer in the range [ 0, N - 1 ], where N is the size of the bucket array.

One of the simplest hash functions is : h(k) = k mod N
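As a quick sketch, h(k) = k mod N can be applied to the prime numbers used in the example on the following slide; the class name and N = 4 are assumptions matching that example :

```java
// Minimal illustration of the hash function h(k) = k mod N with N = 4 buckets.
class HashDemo {
    static int h(int k, int N) { return k % N; }

    public static void main(String[] args) {
        int[] keys = { 2, 5, 11, 23, 37 };
        for (int k : keys)
            System.out.println(k + " -> bucket " + h(k, 4)); // e.g. 23 -> bucket 3
    }
}
```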

HASH TABLES

Here is an example of a hash table with four buckets storing prime numbers { 2, 5, 11, 23, 37 } :

( diagram: bucket array of size 4; bucket 0 is empty, bucket 1 holds 5 and 37, bucket 2 holds 2, bucket 3 holds 11 and 23 )

2 % 4 = 2
5 % 4 = 1
11 % 4 = 3
23 % 4 = 3
37 % 4 = 1

If two different keys are mapped into the same bucket, we say that a collision occurred.

Obviously, if buckets could store only a single element, we would not be able to handle collisions.

Therefore, instead of storing a single element in a bucket, we rather store a list of elements. Such a collision resolution is called chaining.

HASH TABLES

HashTable.java

class HashTable {
    private LinkedList[] table;
    private int size;

    public HashTable(int size) {
        this.size = size;
        table = new LinkedList[size];
        for(int i = 0; i < size; i++)
            table[i] = new LinkedList();
    }

    public void insert(int i) {
        int hashCode = i % size;
        table[hashCode].insert(i);
    }

    public void remove(int i) {
        int hashCode = i % size;
        table[hashCode].remove(i);
    }

    public void get(int i) {
        int hashCode = i % size;
        int n = table[hashCode].search(i);
        if(n == -1)
            System.out.println(i + " is not in the table.");
        else
            System.out.println(i + " is in the bucket " + hashCode + " at position " + n);
    }

    public void print() {
        for(int i = 0; i < size; i++) {
            System.out.print("Bucket " + i + ": ");
            table[i].print();
        }
    }
}

GRAPHS

Graph G = ( V, E ) consists of a set V of vertices and a set E of edges.

If e = ( u, v ) is in E, then e is an edge between vertices u and v. Circles represent vertices and arcs ( lines ) represent edges.

GRAPHS

( diagram: undirected graph with vertices A, B, C, D, E, F )

If pairs of vertices are unordered, graph G is called undirected graph.

If pairs of vertices are ordered, graph G is called directed graph. Represented as line segments / arcs with arrowheads indicating direction.

GRAPH DEFINITIONS

A graph is called connected if there is a path from any vertex to any other vertex.

A path in a graph is a sequence of distinct vertices, each adjacent to the next.

A cycle in a graph is a path, where the starting and ending vertex are the same.

GRAPH REPRESENTATIONS

There are ( at least ) three ways to represent graphs :

1. Adjacency Matrix representation

2. Mixed ( array - linked lists ) representation

3. Linked lists representation

1. ADJACENCY MATRIX REPRESENTATION

Let n denote the number of vertices in a directed graph. We define the adjacency matrix :

boolean[][] am = new boolean[n][n];

am[i][j] is true if and only if vertex i is adjacent to vertex j, i.e., there exists a directed edge from i to j. If the graph is undirected, adjacency matrix will be symmetric.
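The definition can be sketched by building the matrix from an explicit edge list; the edge list literal below is an assumption matching the four-vertex example graph used on the following slides :

```java
// Build and print the adjacency matrix for a directed graph with
// edges 0->1, 0->2, 1->2, 1->3, 3->0, 3->1, 3->2.
class AdjMatrix {
    public static void main(String[] args) {
        int n = 4;
        boolean[][] am = new boolean[n][n];
        int[][] edges = { {0,1}, {0,2}, {1,2}, {1,3}, {3,0}, {3,1}, {3,2} };
        for (int[] e : edges)
            am[e[0]][e[1]] = true;       // edge from e[0] to e[1]
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++)
                System.out.print(am[i][j] ? "1 " : "0 ");
            System.out.println();
        }
    }
}
```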

1. ADJACENCY MATRIX

( diagram: directed graph with vertices 0 - 3 )

Adjacency matrix :

    0 1 2 3
0 [ 0 1 1 0 ]
1 [ 0 0 1 1 ]
2 [ 0 0 0 0 ]
3 [ 1 1 1 0 ]

2. MIXED REPRESENTATION

In this representation, a one-dimensional array is used to represent vertices.

Each entry in that array is a reference to a linked list of vertices adjacent to the vertex represented by the array's index.

We assume that vertices are labeled with nonnegative integers starting from zero.

The same graph as in the previous example would be represented as :

( diagram: array indices 0 - 3, each referencing a linked list of adjacent vertices : 0 -> 1 -> 2, 1 -> 2 -> 3, 2 -> ( empty ), 3 -> 0 -> 1 -> 2 )

3. LINKED LIST REPRESENTATION

Similar to the previous one. Instead of an array, a linked list is used to represent vertices.

( diagram: linked list of vertices 0 - 3, each referencing a list of adjacent vertices : 0 -> 1 -> 2, 1 -> 2 -> 3, 2 -> ( empty ), 3 -> 0 -> 1 -> 2 )

IMPLEMENTATION

In order to model graphs we need two types of nodes, representing vertices and edges ( adjacent vertices ).

For the sake of simplicity we will label graph nodes with nonnegative integers.

( diagram: an EdgeNode holds a label and a link to the next EdgeNode; a VertexNode holds a label, a link to the next VertexNode and a link to its first EdgeNode )

Edge.java

class Edge {
    private int label;
    private Edge next;

    public Edge() {
        label = 0;
        next = null;
    }

    public void setLabel(int label) {
        this.label = label;
    }

    public int getLabel() {
        return label;
    }

    public void setNextEdge(Edge next) {
        this.next = next;
    }

    public Edge getNextEdge() {
        return next;
    }
}

Vertex.java

class Vertex {
    private int label;
    private Vertex nextVertex;
    private Edge nextEdge;

    public Vertex() {
        label = 0;
        nextVertex = null;
        nextEdge = null;
    }

    public void setLabel(int label) {
        this.label = label;
    }

    public int getLabel() {
        return label;
    }

    public void setNextVertex(Vertex nextVertex) {
        this.nextVertex = nextVertex;
    }

    public Vertex getNextVertex() {
        return nextVertex;
    }

    public void setNextEdge(Edge nextEdge) {
        this.nextEdge = nextEdge;
    }

    public Edge getNextEdge() {
        return nextEdge;
    }
}

TEST

Our driver class, TestGraph.java will model directed graph:

( diagram: directed graph with vertices 0 - 3 and labeled edges e0 - e6 )

TestGraph.java

class TestGraph {
    public static void main(String args[]) {
        Vertex v0 = new Vertex();
        Vertex v1 = new Vertex();
        Vertex v2 = new Vertex();
        Vertex v3 = new Vertex();
        Edge e0 = new Edge();
        Edge e1 = new Edge();
        Edge e2 = new Edge();
        Edge e3 = new Edge();
        Edge e4 = new Edge();
        Edge e5 = new Edge();
        Edge e6 = new Edge();

        v0.setLabel(0);
        v1.setLabel(1);
        v2.setLabel(2);
        v3.setLabel(3);

        e0.setLabel(1);
        e1.setLabel(2);
        e2.setLabel(2);
        e3.setLabel(3);
        e4.setLabel(0);
        e5.setLabel(1);
        e6.setLabel(2);

        v0.setNextVertex(v1);
        v1.setNextVertex(v2);
        v2.setNextVertex(v3);
        v0.setNextEdge(e0);
        e0.setNextEdge(e1);
        v1.setNextEdge(e2);
        e2.setNextEdge(e3);
        v3.setNextEdge(e4);
        e4.setNextEdge(e5);
        e5.setNextEdge(e6);

        int vertex, edge;
        Vertex baseVertex = v0;
        while(baseVertex != null) {
            vertex = baseVertex.getLabel();
            System.out.print("Vertex: " + vertex + " Edges: ");
            Edge baseEdge = baseVertex.getNextEdge();
            while(baseEdge != null) {
                edge = baseEdge.getLabel();
                System.out.print("(" + vertex + "," + edge + ") ");
                baseEdge = baseEdge.getNextEdge();
            }
            System.out.println();
            baseVertex = baseVertex.getNextVertex();
        }
    }
}

GENERAL IMPLEMENTATION

The general implementation requires dynamic creation of vertices and edges, i.e. methods insertVertex() and insertEdge(int, int).

We assume that vertices are labeled with integers starting with zero at the time of their creation.

insertVertex() method is almost identical to the regular insert method in linked lists.

insertVertex() METHOD

public void insertVertex() {
    Vertex v = new Vertex();
    v.setLabel(vertexLabel);
    if(vertexLabel == 0)
        head = v;
    else
        tail.setNextVertex(v);
    tail = v;
    vertexLabel++;
}

insertEdge(int,int) METHOD

The second method, insertEdge(int u, int v), takes two integers as arguments representing labels of vertices which we want to join.

First of all, we have to check if both vertices exist. If not, edge cannot be inserted. Otherwise, new edge node is created and labeled with v.

Finally, the newly created edge node should be linked to the linked list of adjacent vertices with respect to the vertex node with label u.

insertEdge(int,int) METHOD

public void insertEdge(int u, int v) {
    Vertex base_u = find(u);
    Vertex base_v = find(v);
    if(base_u == null || base_v == null)
        System.out.println("Cannot insert edge between vertices " + u + " and " + v);
    else {
        Edge e = new Edge();
        e.setLabel(v);
        if(base_u.getNextEdge() == null)
            base_u.setNextEdge(e);
        else {
            Edge temp = base_u.getNextEdge();
            while(temp.getNextEdge() != null)
                temp = temp.getNextEdge();
            temp.setNextEdge(e);
        }
    }
}

Graph.java

class Graph {
    private int vertexLabel;
    private Vertex head;
    private Vertex tail;

    public Graph() {
        vertexLabel = 0;
        head = tail = null;
    }

    public int getSize() {
        return vertexLabel;
    }

    public void insertVertex() {
        Vertex v = new Vertex();
        v.setLabel(vertexLabel);
        if(vertexLabel == 0)
            head = v;
        else
            tail.setNextVertex(v);
        tail = v;
        vertexLabel++;
    }

    public void insertEdge(int u, int v) {
        Vertex base_u = find(u);
        Vertex base_v = find(v);
        if(base_u == null || base_v == null)
            System.out.println("Cannot insert edge between vertices " + u + " and " + v);
        else {
            Edge e = new Edge();
            e.setLabel(v);
            if(base_u.getNextEdge() == null)
                base_u.setNextEdge(e);
            else {
                Edge temp = base_u.getNextEdge();
                while(temp.getNextEdge() != null)
                    temp = temp.getNextEdge();
                temp.setNextEdge(e);
            }
        }
    }

    public Vertex find(int x) {
        Vertex base = head;
        while(base.getLabel() != x) {
            base = base.getNextVertex();
            if(base == null)
                break;
        }
        return base;
    }

    public void print() {
        int vertex, edge;
        Vertex baseVertex = head;
        while(baseVertex != null) {
            vertex = baseVertex.getLabel();
            System.out.print("Vertex: " + vertex + " Edges: ");
            Edge baseEdge = baseVertex.getNextEdge();
            while(baseEdge != null) {
                edge = baseEdge.getLabel();
                System.out.print("(" + vertex + "," + edge + ") ");
                baseEdge = baseEdge.getNextEdge();
            }
            System.out.println();
            baseVertex = baseVertex.getNextVertex();
        }
    }
}

TestGraph.java

class TestGraph {
    public static void main(String args[]) {
        Graph g = new Graph();
        for(int i = 0; i < 4; i++)
            g.insertVertex();
        g.insertEdge(0, 1);
        g.insertEdge(0, 2);
        g.insertEdge(1, 2);
        g.insertEdge(1, 3);
        g.insertEdge(3, 0);
        g.insertEdge(3, 1);
        g.insertEdge(3, 2);
        g.insertEdge(4, 4);

        g.print();
    }
}

GRAPH TRAVERSALS

To traverse a graph means to visit all the vertices in some systematic order. Here we will consider Depth First Search ( DFS ) and Breadth First Search ( BFS ).

DFS is closely related to preorder traversal of a tree. Recall that preorder traversal visits each node before its children.

Preorder(vertex v)
1 visit(v);
2 for(each child w of v)
3     Preorder(w);

DFS

To turn this into a graph traversal algorithm, we basically replace "child" by "neighbor" (adjacent vertex).

To prevent infinite recursion, we want to visit each vertex once. Here is the algorithm in pseudocode:

DFS(G, source)
1 mark all vertices as not visited;
2 traverse(vertex v)
3     mark v as visited;
4     for each neighbor of v
5         if neighbor is not marked as visited
6             traverse(neighbor);

DFS DEMO

( diagram: graph traversed by DFS starting at 0; visit order : 0 1 3 4 5 2 )

IMPLEMENTATION

public void dfs() {
    System.out.print("dfs: ");
    boolean visited[] = new boolean[vertexLabel];
    for(int i = 0; i < vertexLabel; i++)
        if(!visited[i])
            traverse(i, visited);
    System.out.println();
}

IMPLEMENTATION

private void traverse(int i, boolean visited[]) {
    if(!visited[i]) {
        Vertex v = find(i);
        visited[i] = true;
        Edge adjacent = v.getNextEdge();
        while(adjacent != null) {
            int index = adjacent.getLabel();
            System.out.print("(" + i + " " + index + ") ");
            adjacent = adjacent.getNextEdge();
            traverse(index, visited);
        }
    }
}

IMPLEMENTATION

Graph t = new Graph();
for(int i = 0; i < 5; i++)
    t.insertVertex();
t.insertEdge(1, 0);
t.insertEdge(1, 3);
t.insertEdge(3, 2);
t.insertEdge(3, 4);
t.dfs();

BFS

BFS is a traversal through a graph that finds all the vertices that are reachable from the source vertex.

The order of traversal is such that the algorithm explores all of the neighbors of a vertex before proceeding on the neighbors of its neighbors.

A vertex is discovered the first time it is encountered by the algorithm. A vertex is finished after all of its neighbors are explored.

BFS

The algorithm in pseudocode is :

BFS(G, source)
1 create an empty queue Q;
2 label all vertices as not-visited;
3 mark the source vertex as visited and place it in Q;
4 while(Q is not empty)
5     remove the head from Q;
6     mark it as visited;
7     place its neighbors in the queue;

Class Queue requires this method :

public int getDataAtHead() {
    return head.getData();
}

BFS DEMO

( diagram: graph traversed by BFS starting at 0; visit order : 0 1 2 3 4 5 )

Q = {}
Q = {0}
Q = {1, 2}
Q = {2, 3, 4, 5}
Q = {3, 4, 5}
Q = {4, 5}
Q = {5}
Q = {}

IMPLEMENTATION

public void bfs() {
    System.out.print("bfs: ");
    boolean visited[] = new boolean[vertexLabel];
    Queue q = new Queue();
    for(int i = 0; i < vertexLabel; i++)
        if(!visited[i]) {
            q.enqueue(i);
            do {
                int entry = q.getDataAtHead();
                if(visited[entry])
                    q.dequeue();
                else {
                    visited[entry] = true;
                    q.dequeue();
                    Vertex v = find(entry);
                    Edge adjacent = v.getNextEdge();
                    while(adjacent != null) {
                        int index = adjacent.getLabel();
                        System.out.print("(" + entry + " " + index + ") ");
                        if(visited[index] == false)
                            q.enqueue(index);
                        adjacent = adjacent.getNextEdge();
                    }
                }
            } while(!q.isEmpty());
        }
    System.out.println();
}

SHORTEST PATHS

A weighted graph is a graph that has a nonnegative integer ( or real number ) weight associated with each edge.

Let G = ( V, E ) be a weighted directed graph and P = { v0, v1, ..., vk } be a directed path in G as in :

( diagram: directed path v0 -> v1 -> v2 -> ... -> vk with edge weights w0, w1, w2, ..., wk-1 )

We define the weight of path P as a sum of weights of its constituent edges : w(P) = w0 + w1 + ... + wk-1

SHORTEST PATHS

The shortest path between two vertices u and v is then defined as a path between vertices u and v with minimum weight.

The single-source shortest-path problem is the problem of finding shortest paths between the specified vertex (source) and any other vertex in a weighted directed graph.

Dijkstra's algorithm solves the single-source shortest-path problem for the case in which all edges have nonnegative weights.

DIJKSTRA’S ALGORITHM

Edsger Wybe Dijkstra ( 1930 - 2002 )

Dutch computer scientist, Turing Award ( 1972 ), ACM ( 2002 )

Graph algorithms, programming languages, operating systems

"Referring to computing as computer science is like calling surgery knife science."

DIJKSTRA’S ALGORITHM

Solves the single-source shortest-paths problem for the case in which all edges have nonnegative weights. The algorithm uses the following procedures:

Initialization of a single source
Distance of the source vertex is set to zero. All other distances are set to infinity.

d[v] denotes the distance from the source vertex and

p[v] denotes the predecessor of vertex v along the shortest path from source.

INITIALIZATION

In pseudo code initialization looks like :

Init(G, s)
1 for each vertex v in G
2     set d[v] to INFINITY
3     set p[v] to NIL
4 set d[s] to 0

DIJKSTRA’S ALGORITHM

Relaxation.
The process of relaxing an edge ( u, v ) consists of testing whether we can improve the shortest path to v found so far by going through u, and if so, updating d[v] and p[v].

( diagram: source s, d[u] = 5, d[v] = 9, w(u, v) = 2; since d[u] + w(u, v) < d[v], d[v] becomes 7 )

RELAXATION

In pseudo code relaxation looks like :

Relax(u, v, w)
1 if(d[v] > d[u] + w(u, v))
2     d[v] = d[u] + w(u, v)
3     p[v] = u
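The pseudocode translates directly to Java; the arrays d and p and the relax signature below are illustrative assumptions :

```java
// Relaxation of edge (u, v): improve d[v] via u if possible.
class Relax {
    static void relax(int u, int v, int w, int[] d, int[] p) {
        if (d[v] > d[u] + w) {   // a shorter path to v through u exists
            d[v] = d[u] + w;
            p[v] = u;
        }
    }

    public static void main(String[] args) {
        // The slide's example: d[u] = 5, d[v] = 9, w(u, v) = 2
        int[] d = { 5, 9 };
        int[] p = { -1, -1 };
        relax(0, 1, 2, d, p);
        System.out.println(d[1]); // prints 7, as in the diagram
    }
}
```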

DIJKSTRA’S ALGORITHM

Algorithm maintains a set S of vertices whose final shortest path weights from the source have already been determined.

The algorithm repeatedly selects the vertex u from V - S with the minimum shortest-path estimate, inserts u into S and relaxes all edges leaving u.

DIJKSTRA’S ALGORITHM

In pseudo code algorithm looks like :

DIJKSTRA(G, w, s)
1 Init(G, s)
2 S is empty
3 Q = V
4 while(Q is not empty)
5     u = extractMIN(Q)
6     add u to S
7     for each vertex v adjacent to u
8         relax(u, v, w)
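The whole algorithm can be sketched in a compact O(n^2) form, assuming an adjacency matrix w with 0 meaning "no edge"; the example graph here is made up for illustration and is not the one on the following slides :

```java
import java.util.Arrays;

// O(n^2) Dijkstra sketch: repeatedly pick the closest unfinished vertex
// and relax all edges leaving it.
class Dijkstra {
    static int[] shortestPaths(int[][] w, int s) {
        int n = w.length;
        int[] d = new int[n];
        boolean[] inS = new boolean[n];           // the set S of finished vertices
        Arrays.fill(d, Integer.MAX_VALUE);
        d[s] = 0;                                 // Init(G, s)
        for (int k = 0; k < n; k++) {
            int u = -1;
            for (int v = 0; v < n; v++)           // extractMIN over V - S
                if (!inS[v] && (u == -1 || d[v] < d[u])) u = v;
            if (d[u] == Integer.MAX_VALUE) break; // remaining vertices unreachable
            inS[u] = true;
            for (int v = 0; v < n; v++)           // relax all edges leaving u
                if (w[u][v] != 0 && d[u] + w[u][v] < d[v]) d[v] = d[u] + w[u][v];
        }
        return d;
    }

    public static void main(String[] args) {
        int[][] w = {
            { 0, 6, 2, 0 },
            { 0, 0, 0, 1 },
            { 0, 3, 0, 5 },
            { 0, 0, 0, 0 }
        };
        System.out.println(Arrays.toString(shortestPaths(w, 0))); // [0, 5, 2, 6]
    }
}
```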

EXAMPLE

( diagram: weighted directed graph with vertices 0 - 5; the following trace shows the set S and the queue Q, with current distance estimates in parentheses )

S = {}
Q = {0(0), 1(INF), 2(INF), 3(INF), 4(INF), 5(INF)}

S = {0}
Q = {1(6), 2(2), 3(INF), 4(INF), 5(INF)}

S = {0, 2}
Q = {1(6), 3(INF), 4(3), 5(INF)}

S = {0, 2, 4}
Q = {1(4), 3(INF), 5(5)}

S = {0, 2, 4, 1}
Q = {3(7), 5(5)}

S = {0, 2, 4, 1, 5}
Q = {3(7)}

S = {0, 2, 4, 1, 5, 3}
Q = {}

JAVA DATA STRUCTURES

Java provides several interfaces ( collections ) that handle dynamic structures : List <E>, Map <K, V>, Set <E> and Queue <E>

Here we will take a look at List ( LinkedList and ArrayList ).

For more information visit :http://docs.oracle.com/javase/tutorial/collections/interfaces/index.html

interface List <E>

Defined in the java.util package, the List interface represents a dynamic collection of objects.

It also provides a ListIterator that allows element insertion and replacement.

Some of the known implementing classes are ArrayList, LinkedList, Stack and Vector.

JavaList.java

import java.util.List;
import java.util.LinkedList;
import java.util.Collections;
import java.util.Arrays;

class JavaList {
    public static void main(String args[]) {
        LinkedList<Integer> l = new LinkedList<Integer>();

        l.add(3); l.add(1); l.add(5); l.add(4); l.add(2);
        System.out.println(l);

        l.remove();
        l.removeLast();
        System.out.println(l);

        l.remove(1);
        System.out.println(l);

        l.add(12); l.add(6); l.add(-2);
        System.out.println(l);

        Collections.sort(l);
        System.out.println(l);

        Collections.reverse(l);
        System.out.println(l);

        Integer[] a = l.toArray(new Integer[l.size()]);
        System.out.println(Arrays.toString(a)); // printing the array itself would only show its reference

        int[] n = new int[a.length];
        for(int i = 0; i < a.length; i++)
            n[i] = (int)a[i];

        System.out.println(n[0]);
    }
}

ARRAYLIST

Another implementation of interface List.

LinkedList implements List with a doubly linked list.

ArrayList implements List with a dynamically resizing array.

When the original capacity is exceeded, a new array ( double the size ) is created and the original elements are copied to it.
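The doubling strategy can be sketched with a toy class; the name GrowableIntArray and its layout are assumptions for illustration, not ArrayList's actual internals :

```java
// Toy dynamically resizing array: capacity doubles whenever it is exceeded.
class GrowableIntArray {
    private int[] data = new int[2];
    private int size = 0;

    void add(int x) {
        if (size == data.length) {                    // capacity exceeded:
            int[] bigger = new int[data.length * 2];  // new array, double the size
            System.arraycopy(data, 0, bigger, 0, size); // copy originals over
            data = bigger;
        }
        data[size++] = x;
    }

    int capacity() { return data.length; }

    public static void main(String[] args) {
        GrowableIntArray a = new GrowableIntArray();
        for (int i = 0; i < 5; i++) a.add(i);
        System.out.println(a.capacity()); // capacity grew 2 -> 4 -> 8
    }
}
```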

Deal.java

import java.util.*;

class Deal {
    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("Usage: Deal hands cards");
            return;
        }
        int numHands = Integer.parseInt(args[0]);
        int cardsPerHand = Integer.parseInt(args[1]);

        // Make a normal 52-card deck.
        String[] suit = new String[] { "spades", "hearts", "diamonds", "clubs" };
        String[] rank = new String[] { "ace", "2", "3", "4", "5", "6", "7",
                                       "8", "9", "10", "jack", "queen", "king" };

        List<String> deck = new ArrayList<String>();
        for (int i = 0; i < suit.length; i++) {
            for (int j = 0; j < rank.length; j++) {
                deck.add(rank[j] + " of " + suit[i]);
            }
        }

        // Shuffle the deck.
        Collections.shuffle(deck);
        if (numHands * cardsPerHand > deck.size()) {
            System.out.println("Not enough cards.");
            return;
        }
        for (int i = 0; i < numHands; i++)
            System.out.println(dealHand(deck, cardsPerHand));
    }

    public static <E> List<E> dealHand(List<E> deck, int n) {
        int deckSize = deck.size();
        List<E> handView = deck.subList(deckSize - n, deckSize);
        List<E> hand = new ArrayList<E>(handView);
        handView.clear();
        return hand;
    }
}

ALGORITHM DESIGN

Fundamental algorithm design techniques are :

Divide and Conquer algorithm design technique
Merge Sort, Towers of Hanoi

Dynamic Programming algorithm design technique
The Longest Common Subsequence

Greedy algorithm design technique
Activity Selection Problem, Minimum Spanning Trees

DIVIDE AND CONQUER

Divide and Conquer is a paradigm for designing algorithms which involves three steps at each level of recursion :

Divide the problem into a number of subproblems.

Conquer the subproblems by solving them recursively.

Combine solutions to subproblems into the solution of the original problem.

MERGE SORT

Merge Sort is a divide-and-conquer sorting algorithm which operates intuitively as follows :

Divide step : divide the n-element sequence to be sorted into two subsequences of n / 2 elements.

Conquer step : sort the two subsequences recursively.

Combine step : merge the two subsequences to produce the sorted answer.

MERGE SORT

( diagram: the sequence 6 3 1 7 8 2 5 is recursively split into halves, the halves are sorted, and then merged back into 1 2 3 5 6 7 8 )

MergeSort.java

class MergeSort {
    private int[] a;

    public MergeSort(int[] a) {
        this.a = a;
    }

    public void sort() {
        divide(0, a.length - 1);
    }

    public void divide(int p, int r) {
        int q = (p + r) / 2;
        if(p < r) {
            divide(p, q);
            divide(q + 1, r);
            merge(p, q, r);
        }
    }

    public void merge(int p, int q, int r) {
        int merged[] = new int[r - p + 1];
        int i = p;
        int j = q + 1;
        int k = 0;
        while(i <= q && j <= r) {
            if(a[i] <= a[j]) {
                merged[k] = a[i];
                i++;
            } else {
                merged[k] = a[j];
                j++;
            }
            k++;
        }
        while(i <= q) {
            merged[k] = a[i];
            i++;
            k++;
        }
        while(j <= r) {
            merged[k] = a[j];
            j++;
            k++;
        }
        for(i = p; i <= r; i++)
            a[i] = merged[i - p];
    }

    public void print() {
        for(int i = 0; i < a.length; i++)
            System.out.print(a[i] + " ");
        System.out.println();
    }
}

TestMergeSort.java

class TestMergeSort {
    public static void main(String[] args) {
        int[] a = { 6, 3, 1, 7, 8, 2, 5 };
        MergeSort ms = new MergeSort(a);
        ms.print();
        ms.sort();
        ms.print();
    }
}

ANALYSIS

The running time of the recursive function divide() is :

T(n) = 1                      if n = 1
T(n) = 2T(n / 2) + Θ(merge)   if n > 1

Since Θ(merge) is Θ(n), we have :

T(n) = 1                  if n = 1
T(n) = 2T(n / 2) + Θ(n)   if n > 1

By application of the Master Theorem we find that the running time of Merge Sort is : T(n) = Θ(n log n)

TOWERS OF HANOI

In the puzzle Towers of Hanoi the task is to move n disks from peg source to peg dest using auxiliary peg aux.

Only one disk can be moved at a time and bigger disk is not allowed to be placed on top of the smaller disk.

TOWERS OF HANOI

Our task is to write method move() which solves the puzzle. The task can be summarized as :

move(DISKS, 1, 3, 2);

where 1 denotes the source peg, 3 destination peg, 2 auxiliary peg and DISKS is the number of disks to be moved.

Solution is recursive and focuses on the hard step ( how to move the bottom disk from source to destination? ), not an easy one ( where to move the disk on top? )

TOWERS OF HANOI

The only way to move the bottom disk from peg 1 to peg 3 is to have the remaining disks on peg 2 :

So the original task of moving 3 disks from 1 to 3 is divided into two smaller tasks : moving two disks from 1 to 2 using 3 and moving those two disks from 2 to 3 using 1.

TOWERS OF HANOI

This can be summarized in code as:

move(2, 1, 2, 3);
System.out.println("Move disk from 1 to 3");
move(2, 2, 3, 1);

Assuming that the original call was move(3, 1, 3, 2).

The rest of the problem is done essentially in the same way - the beauty of recursion!

Since the number of disks is smaller by one in each recursive call, recursion stops when number of disks becomes zero.

Towers.java

class Towers {
    private int DISKS;

    public Towers(int n) {
        DISKS = n;
    }

    public void solve() {
        move(DISKS, 1, 3, 2);
    }

    public void move(int n, int source, int dest, int aux) {
        if(n > 0) {
            move(n - 1, source, aux, dest);
            System.out.println("Move disk from " + source + " to " + dest);
            move(n - 1, aux, dest, source);
        }
    }
}

TestTowers.java

class TestTowers {
    public static void main(String[] args) {
        Towers t = new Towers(3);
        t.solve();
    }
}

ANALYSIS

It can be shown that recursive solution to Towers of Hanoi puzzle produces the best possible solution.

The running time is T(n) = 2T(n - 1) + 1. Solving this recurrence results in O(2^n).
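Expanding the recurrence ( with T(1) = 1 ) shows where the exponential bound comes from :

```latex
\begin{aligned}
T(n) &= 2T(n-1) + 1 \\
     &= 2\bigl(2T(n-2) + 1\bigr) + 1 = 2^2 T(n-2) + 2 + 1 \\
     &= \dots \\
     &= 2^{n-1} T(1) + \bigl(2^{n-2} + \dots + 2 + 1\bigr) \\
     &= 2^{n-1} + 2^{n-1} - 1 = 2^n - 1
\end{aligned}
```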

For 64 disks, assuming that a disk can be moved in one second, it would take 5 × 10^11 years to solve the puzzle. The age of the universe is estimated to be 10 billion ( 10^10 ) years.

DYNAMIC PROGRAMMING

Dynamic programming paradigm is similar to divide and conquer - it solves problems by combining solutions to subproblems.

The major difference is that subproblems in dynamic programming are dependent, while they are not in the case of divide and conquer.

This technique is typically applied to optimization problems - finding the optimal solution among many solutions.

DYNAMIC PROGRAMMING

Typical steps in dynamic programming involve :

Characterize the structure of an optimal solution.

Recursively define the value of an optimal solution.

Compute the value of an optimal solution.

Construct an optimal solution from computed information.

THE LONGEST COMMON SUBSEQUENCE

Before we formally define the LCS problem, we introduce substring and subsequence problems since they all belong to string matching problems.

In its simplest form, the string matching algorithm takes as input two strings, pattern and text, and reports if pattern is contained in text.

Although there are many solutions to string matching problem, here we consider the solution based on finite state automata.

FINITE STATE AUTOMATA

Finite state automaton M is a quintuple M = (Q, q0, A, Σ, δ) where :

Q is a finite set of states

q0 is the initial or start state

A ⊆ Q is a distinguished set of accepting states

Σ is a finite alphabet

δ is a function from Q × Σ → Q, called the transition function of M.

FINITE STATE AUTOMATA

Finite state automata can be depicted by their transition state diagrams. The one that follows accepts strings that contain the "abc" pattern :

( diagram: states 0 - 3 with transitions 0 -a-> 1 -b-> 2 -c-> 3; non-matching characters lead back to state 0, and the accepting state 3 loops on any character )

SUBSTRING PROBLEM

( diagram: pattern "abc" aligned against two texts; in the first the pattern occurs as a contiguous substring, in the second it does not )

Substring.java

class Substring {
    public static void main(String[] args) {
        if(isSubstring(args[0], args[1]))
            System.out.println(args[1] + " contains pattern " + args[0]);
        else
            System.out.println(args[1] + " does not contain pattern " + args[0]);
    }

    public static boolean isSubstring(String pattern, String text) {
        int i = 0;
        int j = 0;
        while(i < text.length()) {
            if(j == pattern.length())
                break;
            if(text.charAt(i) == pattern.charAt(j))
                j++;
            else {
                i -= j; // back up: restart matching just after the previous start
                j = 0;
            }
            i++;
        }
        if(j == pattern.length())
            return true;
        else
            return false;
    }
}

SUBSEQUENCE TEST

Subsequence testing is somewhat similar to substring testing. We say that pattern is a subsequence of text, if letters of pattern appear in order, possibly separated.

( diagram: pattern "abc" against text "abxcy" )

Here, pattern "abc" is not a substring of "abxcy", but a subsequence.

SUBSEQUENCE TEST

As in the case of substrings, we use the finite automata approach. The one that follows accepts strings that contain the "abc" subsequence :

( diagram: states 0 - 3 with transitions a, b, c; each state loops on every other character )

Subsequence.java

class Subsequence {
    public static void main(String[] args) {
        if(isSubsequence(args[0], args[1]))
            System.out.println(args[1] + " contains sequence " + args[0]);
        else
            System.out.println(args[1] + " does not contain sequence " + args[0]);
    }

    public static boolean isSubsequence(String pattern, String text) {
        int i = 0;
        int j = 0;
        while(i < text.length()) {
            if(j == pattern.length())
                break;
            if(text.charAt(i) == pattern.charAt(j))
                j++;
            i++;
        }
        if(j == pattern.length())
            return true;
        else
            return false;
    }
}

LONGEST COMMON SUBSEQUENCE

Consider pattern “abc” and text “bxycd”. Pattern “abc” is neither a substring nor a subsequence of “bxycd”.

The longest common subsequence of pattern and text in this case is “bc”.

( diagram: pattern "abc" against text "bxycd"; the longest common subsequence is "bc" )

In the LCS problem, pattern and text have symmetric roles, and will therefore be referred to simply as string A and string B.

LONGEST COMMON SUBSEQUENCE

The major importance of the LCS algorithm is in :

DNA sequencing
Genes are typically represented as sequences of the four letters ACGT corresponding to the four submolecules forming DNA. Similarity between two genes is then determined by computing the length of their LCS.

File Comparison
The UNIX diff utility compares two files by finding the LCS between the lines of the two files.

SOLUTION TO LCS

As a first step, we will try to develop a recursive solution. Here are some simple facts :

If the two strings start with the same letter, it is safe to choose that letter as the first character of the subsequence. For example if A = "abc" and B = "acd" we have :

( diagram: the leading "a" of both strings matches; the remaining subproblem is "bc" versus "cd" )

SOLUTION TO LCS

Suppose the first two characters differ. It is not possible for both of them to be part of the subsequence - one or the other or both will have to be removed :

( diagram: "bc" versus "cd"; either the leading "b", the leading "c", or both are dropped )

SOLUTION TO LCS

Here is the code based on the recursive algorithm :

public int lcs(String A, String B, int i, int j) {
    if(i == A.length() || j == B.length())
        return 0;
    else if(A.charAt(i) == B.charAt(j))
        return 1 + lcs(A, B, i + 1, j + 1);
    else
        return max(lcs(A, B, i + 1, j), lcs(A, B, i, j + 1));
}

SOLUTION TO LCS

The worst case scenario is when the strings are disjoint. Thus the expensive line is :

return max(lcs(A, B, i + 1, j), lcs(A, B, i, j + 1));

Yielding the recurrence T(n) = 2T(n - 1), whose solution is T(n) = O(2^n). Not good.

In order to improve efficiency, dynamic programming uses memoization, i.e. intermediate values are stored in an array.

We define L[i][j] to be the length of the LCS of Ai and Bj as :

L[i][j] = 0                               if i = 0 or j = 0
L[i][j] = L[i - 1][j - 1] + 1             if i, j > 0 and A[i] = B[j]
L[i][j] = max(L[i][j - 1], L[i - 1][j])   if i, j > 0 and A[i] ≠ B[j]

SOLUTION TO LCS

Once the algorithm terminates, L[m][n] stores the LCS value, where m and n are A’s and B’s lengths respectively.

SOLUTION TO LCS

Based on the dynamic programming algorithm, this is how the matrix would be filled if A = "abc" and B = "bebdc" :

        b   e   b   d   c
    0   0   0   0   0   0
a   0   0   0   0   0   0
b   0   1   1   1   1   1
c   0   1   1   1   1   2

LCS.java

class LCS {
    public static void main(String[] args) {
        int n = lcs(args[0], args[1]);
        System.out.println("LCS: " + n);
    }

    public static int lcs(String A, String B) {
        int m = A.length();
        int n = B.length();
        int[][] L = new int[m + 1][n + 1];
        for(int i = 0; i <= m; i++)
            L[i][0] = 0;
        for(int j = 0; j <= n; j++)
            L[0][j] = 0;
        for(int i = 0; i < m; i++) {
            for(int j = 0; j < n; j++) {
                if(A.charAt(i) == B.charAt(j))
                    L[i + 1][j + 1] = L[i][j] + 1;
                else
                    L[i + 1][j + 1] = max(L[i + 1][j], L[i][j + 1]);
            }
        }
        return L[m][n];
    }

    public static int max(int x, int y) {
        if(x > y)
            return x;
        else
            return y;
    }
}

GREEDY ALGORITHMS

Important class of techniques. Roughly, a globally optimal solution can be obtained by making locally optimal ( greedy ) choices.

Activity Selection Problem

Each activity has an associated start time si and finish time fi, such that si < fi.

Two activities i and j are compatible if si ≥ fj or sj ≥ fi.

Given a set of activities S = { 1, 2, ..., n }, find a maximum size subset A of compatible activities.

ACTIVITY SELECTION PROBLEM

( diagram: activities 1 - 5 with start times si and finish times fi on a timeline from 0 to 11; activities 1, 3 and 5 form the selected compatible subset )

ActivitySelector.java

class ActivitySelector {
    public static void main(String[] args) {
        int[] s = {1, 3, 5, 3, 8};
        int[] f = {4, 5, 7, 8, 11}; // sorted by finish time
        LinkedList l = selectActivity(s, f);
        l.print();
    }

    public static LinkedList selectActivity(int[] s, int[] f) {
        LinkedList l = new LinkedList();
        l.insert(1);
        int j = 0; // index of the most recently selected activity
        for(int i = 1; i < s.length; i++) {
            if(s[i] >= f[j]) {
                l.insert(i + 1);
                j = i;
            }
        }
        return l;
    }
}

ELEMENTS OF GREEDY STRATEGY

It can be proved that Activity Selector always produces optimal solution ( maximum size ).

In many instances, greedy algorithms do not generate optimal solution.

If we applied greedy strategy to the problem of computing the shortest path between A and C, it wouldn’t work. Why?

( diagram: graph with vertices A, B, C, D and edge weights 10, 150, 100 and 10; the locally cheapest first edge out of A does not lie on the overall shortest path from A to C )

ELEMENTS OF GREEDY STRATEGY

In order to design a greedy algorithm, the problem must exhibit :

The Greedy Choice Property
Proof is required that a greedy choice at each step yields an optimal solution.

Optimal Substructure
A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems.

MINIMUM SPANNING TREES

Spanning Tree T of graph G is a tree that connects all of its vertices.

( diagram: weighted graph G with edge weights 2, 3, 4, 5, 6, 8, 9, 10, 14, 15 and two of its spanning trees, T1 with weight 55 and T2 with weight 59 )

Minimum Spanning Tree is a spanning tree whose weight is minimal.

PRIM’S ALGORITHM

Input : G = 〈 V, E 〉 weighted connected undirected graph with n vertices.
Output : T - minimum spanning tree of G.

Choose an arbitrary vertex u. Find vertex v adjacent to u such that w(u, v) is minimal. Place that edge into tree T.

Continue adding edges of minimum weight into tree T that are incident to vertices already in the tree and not forming a circuit.

Stop when the number of edges in T is n - 1.
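The steps above can be sketched with a simple O(n^2) implementation over an adjacency matrix; the matrix g, the 0 = "no edge" convention and the example graph are assumptions for illustration, not the course's code :

```java
import java.util.Arrays;

// Prim's algorithm: grow the tree from vertex 0, always adding the
// cheapest edge that connects a new vertex to the tree.
class Prim {
    static int mstWeight(int[][] g) {
        int n = g.length;
        int[] dist = new int[n];           // cheapest edge connecting each vertex to the tree
        boolean[] inTree = new boolean[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[0] = 0;                       // start from an arbitrary vertex
        int total = 0;
        for (int step = 0; step < n; step++) {
            int u = -1;
            for (int v = 0; v < n; v++)    // pick the closest vertex not yet in the tree
                if (!inTree[v] && (u == -1 || dist[v] < dist[u])) u = v;
            inTree[u] = true;
            total += dist[u];
            for (int v = 0; v < n; v++)    // update cheapest connecting edges via u
                if (!inTree[v] && g[u][v] != 0 && g[u][v] < dist[v]) dist[v] = g[u][v];
        }
        return total;
    }

    public static void main(String[] args) {
        // Small weighted undirected graph (0 means no edge)
        int[][] g = {
            { 0, 2, 0, 6 },
            { 2, 0, 3, 8 },
            { 0, 3, 0, 5 },
            { 6, 8, 5, 0 }
        };
        System.out.println(mstWeight(g)); // prints 10 (edges 0-1, 1-2, 2-3)
    }
}
```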

PRIM’S ALGORITHM

( diagram: Prim's algorithm traced on the weighted graph from the previous slides, growing the tree one minimum weight edge at a time )

KRUSKAL’S ALGORITHM

Input : G = 〈 V, E 〉 weighted connected undirected graph with n vertices.
Output : T - minimum spanning tree of G.

Choose an edge with the minimum weight.

Continue adding edges of minimum weight into tree T that do not form a circuit.

Stop when the number of edges in T is n - 1.
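These steps can be sketched as follows; the edge-triple format {u, v, weight} and the small union-find helper used to detect circuits are illustrative assumptions ( the slides do not introduce union-find ) :

```java
import java.util.Arrays;

// Kruskal's algorithm: sort edges by weight and add each edge whose
// endpoints are still in different components (no circuit).
class Kruskal {
    static int[] parent;

    static int find(int x) {                 // follow links to the set representative
        while (parent[x] != x) x = parent[x];
        return x;
    }

    static int mstWeight(int n, int[][] edges) {
        Arrays.sort(edges, (a, b) -> a[2] - b[2]); // edges by increasing weight
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        int total = 0, used = 0;
        for (int[] e : edges) {
            int ru = find(e[0]), rv = find(e[1]);
            if (ru != rv) {                  // no circuit: safe to add this edge
                parent[ru] = rv;
                total += e[2];
                if (++used == n - 1) break;  // stop at n - 1 edges
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Small weighted undirected graph as edge triples {u, v, weight}
        int[][] edges = { {0,1,2}, {1,2,3}, {2,3,5}, {0,3,6}, {1,3,8} };
        System.out.println(mstWeight(4, edges)); // prints 10
    }
}
```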

KRUSKAL’S ALGORITHM

( diagram: Kruskal's algorithm traced on the same weighted graph, repeatedly adding the cheapest edge that does not form a circuit )