
CSC 172 DATA STRUCTURES

THEORETICAL BOUND

Many good sorting algorithms run in O(n log n) time.
Can we do better?
Can we reason about algorithms not yet invented?

THEORETICAL BOUND

We can think of any comparison-based sorting algorithm as a decision tree.

THEORETICAL BOUND

A binary tree of depth d has at most 2^d leaves.
A binary tree with L leaves must have depth at least log(L).
There are n! arrangements of n items.
A tree with n! leaves must have depth at least log(n!).
log(n!) is Ω(n log n).

THEORETICAL BOUND

$$
\begin{aligned}
\log(n!) &= \log\bigl(n(n-1)(n-2)\cdots(2)(1)\bigr) \\
&= \log(n) + \log(n-1) + \cdots + \log(2) + \log(1) \\
&\ge \log(n) + \log(n-1) + \cdots + \log(n/2) \\
&\ge (n/2)\log(n/2) \\
&\ge (n/2)\log(n) - (n/2) \\
&= \Omega(n \log n)
\end{aligned}
$$
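The chain of inequalities can be spot-checked numerically. The sketch below is only an illustration, not part of the lecture: the class and method names are hypothetical, and base-2 logarithms are an assumption. It computes log(n!) as a sum of logarithms and prints it next to the (n/2) log(n/2) lower bound.

```java
final class SortingLowerBound {
    // log2(n!) computed as log2(2) + ... + log2(n), so n! itself never overflows.
    static double log2Factorial(int n) {
        double total = 0.0;
        for (int i = 2; i <= n; i++) {
            total += Math.log(i) / Math.log(2);
        }
        return total;
    }

    // The lower bound from the derivation: (n/2) * log2(n/2).
    static double halfBound(int n) {
        return (n / 2.0) * (Math.log(n / 2.0) / Math.log(2));
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 16, 64, 1024}) {
            System.out.printf("n=%d  log2(n!)=%.1f  (n/2)log2(n/2)=%.1f%n",
                    n, log2Factorial(n), halfBound(n));
        }
    }
}
```

Both columns grow like n log n, which is what the Ω(n log n) claim says about the depth of the decision tree.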

EXAMPLE

A post office routes mail: letters are sorted into separate bags for different geographical areas, each of these bags is itself sorted into batches for smaller sub-regions, and so on until they are delivered.

Radix Sort

RADIX SORT

Radix sort considers the structure of the keys.
Assume keys are represented in base M.
Sorting is done by comparing bits in the same position.

RADIX EXCHANGE SORT

Examine bits from left to right.
Sort the array with respect to the leftmost bit.

[Figure: an array of keys, shown as 0XXX and 1XXX, before and after sorting on the leftmost bit.]

RADIX EXCHANGE SORT

Scanning partition:

REPEAT
    scan top down to find a "1"
    scan bottom up to find a "0"
    exchange
UNTIL scans cross

[Figure: the array of keys, shown as 0XXX and 1XXX, at successive steps of the scanning partition, ending with all 0XXX keys above all 1XXX keys.]

TIME: O(bN), for N keys of b bits each
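One way the scanning partition could be turned into a full sort is sketched below. This is an illustration under stated assumptions, not the lecture's own code: keys are taken to be non-negative ints that fit in a caller-supplied number of bits, and the class name, method names, and the bits parameter are hypothetical.

```java
import java.util.Arrays;

final class RadixExchangeSort {
    // Sorts non-negative keys that fit in `bits` bits (assumed, 1 <= bits <= 31).
    static void sort(int[] a, int bits) {
        sort(a, 0, a.length - 1, bits - 1);
    }

    // Partitions a[lo..hi] on the given bit, then recurses on the next bit to the right.
    private static void sort(int[] a, int lo, int hi, int bit) {
        if (lo >= hi || bit < 0) return;
        int i = lo, j = hi;
        while (i <= j) {
            // scan top down to find a key with a 1 in this position
            while (i <= j && ((a[i] >>> bit) & 1) == 0) i++;
            // scan bottom up to find a key with a 0 in this position
            while (i <= j && ((a[j] >>> bit) & 1) == 1) j--;
            if (i < j) {
                int tmp = a[i];
                a[i] = a[j];
                a[j] = tmp;
                i++;
                j--;
            }
        }
        // The scans have crossed: a[lo..j] hold a 0 in this bit, a[i..hi] hold a 1.
        sort(a, lo, j, bit - 1);
        sort(a, i, hi, bit - 1);
    }

    public static void main(String[] args) {
        int[] keys = {13, 2, 7, 0, 15, 8, 2, 5}; // 4-bit example keys
        sort(keys, 4);
        System.out.println(Arrays.toString(keys));
    }
}
```

Each key is examined at most once per bit position, which is where the O(bN) bound comes from.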

DIVIDE AND CONQUER

Mergesort
Quicksort
Maximum Subsequence
Fast Fourier Transform

DIVIDE AND CONQUER

Divide: Solve each subproblem by calling the algorithm recursively on a subset of the data.

Conquer: The solution to the larger problem is formed from the solutions to the subproblems.

Example

One-dimensional pattern recognition
Input: a vector x of n floating point numbers
Output: the maximum sum found in any contiguous subvector of the input.

x = 31 -41 59 26 -53 58 97 -93 -23 84

Answer: the sum of x[2..6], or 187

Obvious solution

    // check all pairs (i, j), summing x[i..j] from scratch each time
    int sum;
    int maxsofar = 0;
    for (int i = 0; i < x.length; i++)
        for (int j = i; j < x.length; j++) {
            sum = 0;
            for (int k = i; k <= j; k++)
                sum += x[k];
            maxsofar = Math.max(sum, maxsofar);
        }

A better solution

    // check all pairs (i, j), reusing the sum of x[i..j-1]
    int sum;
    int maxsofar = 0;
    for (int i = 0; i < x.length; i++) {
        sum = 0;
        for (int j = i; j < x.length; j++) {
            sum += x[j]; // the sum of x[i..j]
            maxsofar = Math.max(sum, maxsofar);
        }
    }
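To compare the two loops side by side, they can be wrapped as methods and run on the example vector. This is a sketch, not the lecture's code: the class and method names are hypothetical, and the vector is read as 31, -41, 59, 26, -53, 58, 97, -93, -23, 84, the ordering under which x[2..6] sums to 187.

```java
final class MaxSubvectorBruteForce {
    // Cubic version: re-sum x[i..j] from scratch for every pair (i, j).
    static int cubic(int[] x) {
        int maxsofar = 0;
        for (int i = 0; i < x.length; i++) {
            for (int j = i; j < x.length; j++) {
                int sum = 0;
                for (int k = i; k <= j; k++) {
                    sum += x[k];
                }
                maxsofar = Math.max(sum, maxsofar);
            }
        }
        return maxsofar;
    }

    // Quadratic version: extend the running sum of x[i..j-1] by x[j].
    static int quadratic(int[] x) {
        int maxsofar = 0;
        for (int i = 0; i < x.length; i++) {
            int sum = 0;
            for (int j = i; j < x.length; j++) {
                sum += x[j]; // sum now holds x[i] + ... + x[j]
                maxsofar = Math.max(sum, maxsofar);
            }
        }
        return maxsofar;
    }

    public static void main(String[] args) {
        int[] x = {31, -41, 59, 26, -53, 58, 97, -93, -23, 84};
        System.out.println(cubic(x));     // prints 187
        System.out.println(quadratic(x)); // prints 187
    }
}
```

Because maxsofar starts at 0, both versions treat the empty subvector as a valid answer and return 0 when every element is negative.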

Divide & Conquer

To solve a problem of size n, recursively solve two sub-problems of size n/2, and combine their solutions to yield a solution to the complete problem.

D&C for LCS

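Split the vector x into two halves, a and b.
Let ma be the maximum sum found in a and mb the maximum sum found in b.
The answer is ma, mb, or mc, where mc is the maximum sum of a subvector that crosses the boundary between a and b.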

Recursive D&C LCS

    public int LCS(int[] x) {
        return LCS(x, 0, x.length - 1);
    }

    public int LCS(int[] x, int low, int high) {
        // the hard part
    }

Recursive D&C LCS

    public int LCS(int[] x, int low, int high) {
        if (low > high) return 0;
        if (low == high) return Math.max(0, x[low]);
        int mid = (low + high) / 2;
        return Math.max(LCS(x, low, mid),
                        LCS(x, mid + 1, high));
    } // still need to do "mc"

How to find mc?

Note that mc consists of two parts:
    The part starting at the boundary and reaching up (mcup)
    The part ending at the boundary and reaching down (mclower)

The sum of these is mc: mc = mclower + mcup

    public int LCS(int[] x, int low, int high) {
        if (low > high) return 0;
        if (low == high) return Math.max(0, x[low]);
        int mid = (low + high) / 2;
        int umax = findUmax(x, mid + 1, high); // best sum starting at mid+1 and reaching up
        int lmax = findLmax(x, low, mid);      // best sum ending at mid and reaching down
        return Math.max(Math.max(LCS(x, low, mid),
                                 LCS(x, mid + 1, high)),
                        lmax + umax);
    }

findLmax

    int findLmax(int[] x, int low, int mid) {
        int lmax = 0, sum = 0;
        for (int j = mid; j >= low; j--) {
            sum += x[j];
            lmax = Math.max(lmax, sum);
        }
        return lmax;
    } // Run time? In terms of mid - low?

findUmax

    int findUmax(int[] x, int mid1, int high) {
        int umax = 0, sum = 0;
        for (int j = mid1; j <= high; j++) {
            sum += x[j];
            umax = Math.max(umax, sum);
        }
        return umax;
    } // Run time? In terms of high - mid1?
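Putting the pieces together, the recursive routine and its two boundary scans might be assembled as follows. This is a sketch rather than the lecture's definitive code: the class name and the method name maxSum are hypothetical stand-ins for LCS, and the example vector is read as 31, -41, 59, 26, -53, 58, 97, -93, -23, 84.

```java
final class MaxSubvectorDivideAndConquer {
    static int maxSum(int[] x) {
        return maxSum(x, 0, x.length - 1);
    }

    private static int maxSum(int[] x, int low, int high) {
        if (low > high) return 0;                    // empty range
        if (low == high) return Math.max(0, x[low]); // single element
        int mid = (low + high) / 2;
        int lmax = findLmax(x, low, mid);            // best sum ending at mid
        int umax = findUmax(x, mid + 1, high);       // best sum starting at mid + 1
        int best = Math.max(maxSum(x, low, mid), maxSum(x, mid + 1, high));
        return Math.max(best, lmax + umax);          // ma, mb, or mc
    }

    // Scans down from the boundary; linear in mid - low.
    private static int findLmax(int[] x, int low, int mid) {
        int lmax = 0, sum = 0;
        for (int j = mid; j >= low; j--) {
            sum += x[j];
            lmax = Math.max(lmax, sum);
        }
        return lmax;
    }

    // Scans up from the boundary; linear in high - mid1.
    private static int findUmax(int[] x, int mid1, int high) {
        int umax = 0, sum = 0;
        for (int j = mid1; j <= high; j++) {
            sum += x[j];
            umax = Math.max(umax, sum);
        }
        return umax;
    }

    public static void main(String[] args) {
        int[] x = {31, -41, 59, 26, -53, 58, 97, -93, -23, 84};
        System.out.println(maxSum(x)); // prints 187
    }
}
```

Each call does two linear boundary scans over its range and then recurses on both halves, which gives two half-size subproblems plus linear work per level.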

Runtime of Div&Conq

Please read pp. 449-450 of Weiss.

RUNTIME

T(N) = 2T(N/2) + O(N)

is O(N log N)

In General

$$
T(N) = a\,T(N/b) + O(N^k), \qquad a \ge 1,\ b > 1
$$

$$
T(N) =
\begin{cases}
O(N^{\log_b a}) & \text{if } a > b^k \\
O(N^k \log N) & \text{if } a = b^k \\
O(N^k) & \text{if } a < b^k
\end{cases}
$$

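By telescoping

$$
T(N) =
\begin{cases}
O(N^{\log_b a}) & \text{if } a > b^k \\
O(N^k \log_b N) & \text{if } a = b^k \\
O(N^k) & \text{if } a < b^k
\end{cases}
$$

because, for N = b^m,

$$
T(N) = T(b^m) = a^m \sum_{i=0}^{m} \left( \frac{b^k}{a} \right)^i
$$

which follows from

$$
T(N) = a\,T(N/b) + O(N^k)
$$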
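The telescoped sum can be checked against the recurrence for small cases. The sketch below rests on two assumptions that are not stated on the slide: T(1) = 1, and the O(N^k) term is taken to be exactly N^k. The class and method names are hypothetical. To stay in integer arithmetic, the closed form is evaluated as the sum of a^(m-i) * b^(k*i), which equals a^m times the sum of (b^k/a)^i.

```java
final class TelescopingCheck {
    // T(1) = 1 and T(N) = a*T(N/b) + N^k, evaluated directly for N = b^m.
    static long recurrence(long a, long b, int k, int m) {
        if (m == 0) return 1;
        return a * recurrence(a, b, k, m - 1) + pow(pow(b, m), k);
    }

    // a^m * sum_{i=0..m} (b^k / a)^i, rewritten as sum_{i=0..m} a^(m-i) * b^(k*i).
    static long closedForm(long a, long b, int k, int m) {
        long total = 0;
        for (int i = 0; i <= m; i++) {
            total += pow(a, m - i) * pow(b, k * i);
        }
        return total;
    }

    // Integer power by repeated multiplication; inputs here are small enough not to overflow.
    static long pow(long base, int exp) {
        long result = 1;
        for (int i = 0; i < exp; i++) {
            result *= base;
        }
        return result;
    }

    public static void main(String[] args) {
        // One example per case: a = b^k, a > b^k, a < b^k.
        long[][] cases = {{2, 2, 1}, {4, 2, 1}, {1, 2, 2}};
        for (long[] c : cases) {
            int k = (int) c[2];
            System.out.printf("a=%d b=%d k=%d  recurrence=%d  closed form=%d%n",
                    c[0], c[1], k, recurrence(c[0], c[1], k, 8), closedForm(c[0], c[1], k, 8));
        }
    }
}
```

With a = 2, b = 2, k = 1 every term of the sum equals 1, so T(b^m) = 2^m (m + 1), which is the N log N behavior of the divide and conquer recurrence T(N) = 2T(N/2) + O(N).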
