cis435 week04

Medians & Order Statistics Data Structures & Algorithms

Upload: bansal-ashish

Post on 27-Jun-2015


Page 1: Cis435 week04

Medians & Order Statistics

Data Structures & Algorithms

Page 2: Cis435 week04

Medians & Order Statistics

What are they? The ith order statistic of a set of n elements is defined as the ith smallest element in the set.
E.g., the minimum is the first order statistic (i = 1) and the maximum is the last order statistic (i = n).
The median is informally the “halfway” point of the set: there is one median if n is odd, or two (a lower and an upper median) if n is even.

Page 3: Cis435 week04

Medians & Order Statistics

This chapter deals with finding a particular order statistic in a set.
We know we can use a sorting algorithm to find an order statistic in O(n log n) time, by sorting the data first and then indexing into the result.
There are faster algorithms, however, that don’t require a sort.
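
The sort-first baseline can be sketched as follows. This is a minimal illustration, not code from the slides: the name SelectBySorting is hypothetical, and it assumes the order statistic i is 1-based, as in the definition above. The vector is taken by value so the caller's data is left untouched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the i-th smallest element (1-based)
// by sorting a copy of the input, which costs O(n log n).
template <typename T>
T SelectBySorting(std::vector<T> data, std::size_t i)
{
    assert(i >= 1 && i <= data.size());   // i must name a valid order statistic
    std::sort(data.begin(), data.end());  // sorts the local copy only
    return data[i - 1];                   // i-th smallest sits at index i-1
}
```

For example, SelectBySorting(values, 1) gives the minimum and SelectBySorting(values, values.size()) gives the maximum.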

Page 4: Cis435 week04

Minimum and Maximum

A basic algorithm is O(n): just scan the set and keep track of the smallest element seen so far, using n - 1 comparisons.
This is the best we can do. Every element other than the minimum must lose at least one comparison, so no algorithm can find the min (or max) with fewer than n - 1 comparisons.

// Type stands for any element type that supports operator<.
// Precondition: length >= 1.
Type FindMin(Type data[], int length)
{
    Type min = data[0];
    for ( int i = 1 ; i < length ; ++i )
        if ( data[i] < min )
            min = data[i];
    return min;
}


Page 5: Cis435 week04

Simultaneous min and max

Sometimes it’s useful to come up with both at the same time.
We can run separate algorithms, or modify the original to keep track of both in a single pass.
What about finding the second smallest element?

Write an algorithm to compute the second smallest element. How many comparisons are required?
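
One way to track both the min and the max in a single pass is sketched below. The slides do not show this code; the name FindMinMax and the pairwise strategy are assumptions for illustration. Elements are processed two at a time: each pair is compared once, then the smaller is compared with the running min and the larger with the running max, so the scan needs roughly 3n/2 comparisons instead of the 2n - 2 needed by two separate scans.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns {min, max} of a non-empty vector.
template <typename T>
std::pair<T, T> FindMinMax(const std::vector<T>& data)
{
    assert(!data.empty());
    const std::size_t n = data.size();
    T lo = data[0];
    T hi = data[0];
    std::size_t i = 1;
    if (n % 2 == 0) {
        // Even length: seed min and max from the first pair (one comparison).
        if (data[1] < data[0]) lo = data[1];
        else                   hi = data[1];
        i = 2;
    }
    // The remaining elements form whole pairs: three comparisons per pair.
    for (; i + 1 < n; i += 2) {
        const T& a = data[i];
        const T& b = data[i + 1];
        if (a < b) {
            if (a < lo) lo = a;
            if (hi < b) hi = b;
        } else {
            if (b < lo) lo = b;
            if (hi < a) hi = a;
        }
    }
    return std::make_pair(lo, hi);
}
```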


Page 6: Cis435 week04

Selection in Expected Linear Time

A general solution, one that finds any order statistic, would be more useful.
It would seem like the type of problem that would be hard to solve, needing at least O(n log n) time.
In reality it can be accomplished in expected O(n) time, using a divide-and-conquer algorithm.

Page 7: Cis435 week04

Randomized Select

Recall the Randomized QuickSort algorithm.
As in QuickSort, the idea is to partition the input recursively.
Unlike QuickSort, Randomized Select only cares about one of the partitions: the partition containing the order statistic you’re looking for.
The expected running time for this algorithm is O(n).
Randomized Select requires the Randomized Partition algorithm previously discussed.

Page 8: Cis435 week04

The Randomized Select Algorithm

RandomizedSelect(A, begin, end, i)
{
    if ( begin == end )
        return A[begin];

    q = RandomizedPartition(A, begin, end)
    k = q - begin + 1
    if ( i <= k )
        return RandomizedSelect(A, begin, q, i)
    else
        return RandomizedSelect(A, q+1, end, i-k)
}

Page 9: Cis435 week04

The Randomized Select Algorithm

First, partition the array.
This guarantees that every element in A[begin..q] is less than or equal to every element in A[q+1..end].
Then compute k, the number of elements in the array A[begin..q].
This is just q - begin + 1 (since begin may be non-zero).
It also means the elements of A[begin..q] are the k smallest elements of A[begin..end].

Page 10: Cis435 week04

The Randomized Select Algorithm

Because of the partitioning, we know which partition the order statistic must be in.
If i is less than or equal to k, then recursively search the left partition for order statistic i.
If i is greater than k, then recursively search the right partition for order statistic i - k.
We already know that k values are no larger than the smallest element in this partition.
We’re looking for the (i-k)th smallest element in that partition.

Page 11: Cis435 week04

Analysis of Randomized Select

Worst case, O(n²).
We could get unlucky and repeatedly partition around the largest or smallest remaining element.
This is unlikely, since the pivot is chosen at random.
The average case is somewhat more complicated (see the formula on page 189) but amounts to O(n).
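
A runnable version of the Randomized Select pseudocode might look like the sketch below. The slides do not show RandomizedPartition, so the version here is an assumption: a Hoare-style partition around a randomly chosen pivot, which matches the pseudocode's split into A[begin..q] and A[q+1..end]. The std::mt19937 parameter and the use of std::vector are also choices made for this illustration. The order statistic i is 1-based.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Assumed Hoare-style partition around a random pivot. Returns q with
// begin <= q < end such that every element of a[begin..q] is <= every
// element of a[q+1..end]. Requires begin < end.
template <typename T>
int RandomizedPartition(std::vector<T>& a, int begin, int end, std::mt19937& rng)
{
    std::uniform_int_distribution<int> pick(begin, end);
    std::swap(a[begin], a[pick(rng)]);   // move a random pivot to the front
    const T pivot = a[begin];
    int i = begin - 1;
    int j = end + 1;
    while (true) {
        do { --j; } while (pivot < a[j]);   // skip elements larger than pivot
        do { ++i; } while (a[i] < pivot);   // skip elements smaller than pivot
        if (i < j) std::swap(a[i], a[j]);
        else       return j;
    }
}

// Returns the i-th smallest element (1-based) of a[begin..end].
// The contents of a are rearranged as a side effect.
template <typename T>
T RandomizedSelect(std::vector<T>& a, int begin, int end, int i, std::mt19937& rng)
{
    assert(begin <= end && i >= 1 && i <= end - begin + 1);
    if (begin == end)
        return a[begin];
    const int q = RandomizedPartition(a, begin, end, rng);
    const int k = q - begin + 1;          // size of the left partition
    if (i <= k)
        return RandomizedSelect(a, begin, q, i, rng);
    return RandomizedSelect(a, q + 1, end, i - k, rng);
}
```

Because the partition always leaves at least one element on each side, every recursive call works on a strictly smaller range, so the recursion terminates even in the unlucky worst case.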

Page 12: Cis435 week04

Generic Programming

Generic programming is “programming using types as parameters”.
The idea of generic programming is to write code that is data-type independent.
Many algorithms and data structures that we discuss will operate independently of data type.
Generic programming provides a way of writing the code once, then specifying the data type to operate on later.

Reference: The C++ Programming Language, by Bjarne Stroustrup


Page 13: Cis435 week04

Generic Programming in C++

The principle of generic programming in C++ is implemented via templates.
Templates provide a way to represent a wide range of general concepts and simple ways to combine them.

Page 14: Cis435 week04

Template Functions

The C++ compiler deduces the template arguments from the function arguments.
Calling this function is the same as calling any other function:
    int some_min = FindMin(some_array, SIZE);

template <typename Type>
Type FindMin(Type data[], int length)
{
    Type min = data[0];
    for ( int i = 1 ; i < length ; ++i )
        if ( data[i] < min )
            min = data[i];
    return min;
}


Page 15: Cis435 week04

Template Functions

You are not limited to one template parameter.
Multiple parameters are listed as a comma-separated list:
    template <typename T, typename U> …
Template parameters aren’t even limited to typenames:
    template <typename T, int i> …
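
As a small illustration of mixing type and non-type parameters, consider the sketch below. ConvertFirst is a hypothetical function, not something from the slides. The non-type parameter Count is listed first so that it can be given explicitly while the compiler still deduces To and From from the function arguments.

```cpp
#include <cassert>

// Hypothetical example: copies the first Count elements of src into dst,
// converting each one from type From to type To.
template <int Count, typename To, typename From>
void ConvertFirst(const From* src, To* dst, int length)
{
    assert(Count >= 0 && Count <= length);  // both arrays hold `length` items
    for (int i = 0; i < Count; ++i)
        dst[i] = static_cast<To>(src[i]);
}
```

A call such as ConvertFirst<2>(ints, doubles, 3) supplies Count = 2 explicitly and lets the compiler deduce From = int and To = double.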

Page 16: Cis435 week04

Template Functions

There are rare occasions when the compiler cannot deduce the type of the template argument.
E.g., when the template parameter is only used as a return type.
In these cases, explicit specification can be used.

template <typename T>
T* create()
{
    return new T;
}
…
SomeClass *p = create<SomeClass>();


Page 17: Cis435 week04

Template Functions

Template functions can also be overloaded, both with other template functions and with non-template functions.
This may be required in a number of situations:
For some types, you can use a different (more efficient) algorithm.
Multiple type deductions can be made, such that the compiler can’t decide which version to use.

template <typename T> T sqrt(T);
template <typename T> complex<T> sqrt(complex<T>);
double sqrt(double);
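
To see how the compiler chooses among overloads like these, here is a small self-contained sketch. The Describe functions are hypothetical and exist only to report which overload was selected; they mirror the shape of the sqrt declarations (a general template, a more specialized template, and a non-template function).

```cpp
#include <cassert>
#include <string>

// General template: matches any argument type.
template <typename T>
std::string Describe(T) { return "generic template"; }

// More specialized template: matches only pointer arguments.
template <typename T>
std::string Describe(T*) { return "pointer template"; }

// Non-template overload: preferred over a template when both match exactly.
std::string Describe(double) { return "non-template double"; }

// Usage: each assert documents which overload the compiler selects.
inline void DescribeDemo()
{
    int x = 0;
    assert(Describe(1) == "generic template");       // T deduced as int
    assert(Describe(&x) == "pointer template");      // more specialized wins
    assert(Describe(2.0) == "non-template double");  // exact non-template match
    assert(Describe(1.0f) == "generic template");    // exact T = float beats
                                                     // promotion to double
}
```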


Page 18: Cis435 week04

Template Classes

C++ also provides for template classes

template <typename T>
class SomeArray {
public:
    SomeArray();
    T& ItemAt(int index);
    void SetItemAt(int index, const T& value);
    …
private:
    T m_data[SIZE];
};


Page 19: Cis435 week04

Template Classes

Instantiating an object from a template class takes a little more work.
You must specify the type.
The resulting object can be used like any other object.

SomeArray<double> array_of_doubles;
array_of_doubles.SetItemAt(0, 2.0);
double d = array_of_doubles.ItemAt(0);


Page 20: Cis435 week04

Template Classes

Like function templates, template parameters are not limited to only generic types.
Other data can also be provided:

template <typename Type, int Storage>
class SomeArray {
public:
    SomeArray();
    Type& ItemAt(int index);
    …
private:
    Type m_data[Storage];
};
…
SomeArray<double, 1000> array_of_doubles;


Page 21: Cis435 week04

Some Cautions About Templates

Templates provide a convenient way of writing an algorithm or data structure only once.
For each instantiated template, the compiler creates a separate piece of compiled code.
E.g., SomeArray<double>, SomeArray<int>, and SomeArray<SomeClass> create three different implementations of SomeArray in memory.
Templates are considered a primary contributor to code bloat because of this property.
Care should be taken in template classes to only include methods that depend on the templated type.
Other functionality can be moved to a standalone function, or to a non-template base class.
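
The non-template base class idea can be sketched as follows. The names ArrayBase and SlimArray are hypothetical and not from the slides. The element count and capacity check do not depend on the element type, so they live in an ordinary class that is compiled once; only the members that touch T are in the template.

```cpp
#include <cassert>

// Type-independent bookkeeping: shared by every instantiation below.
class ArrayBase {
public:
    int Count() const { return m_count; }
protected:
    explicit ArrayBase(int capacity) : m_capacity(capacity), m_count(0) {}
    // Reserves the next free slot and returns its index.
    int NextSlot()
    {
        assert(m_count < m_capacity);
        return m_count++;
    }
private:
    int m_capacity;
    int m_count;
};

// Only the members that depend on T (or Storage) are in the template.
template <typename T, int Storage>
class SlimArray : public ArrayBase {
public:
    SlimArray() : ArrayBase(Storage) {}
    void Append(const T& value) { m_data[NextSlot()] = value; }
    T& ItemAt(int index)
    {
        assert(index >= 0 && index < Count());
        return m_data[index];
    }
private:
    T m_data[Storage];
};
```

With this layout, SlimArray<double, 4> and SlimArray<int, 4> each get their own Append and ItemAt, but both reuse the single compiled copy of the ArrayBase logic. In a class this small the saving is negligible; the pattern pays off when the shared part is substantial.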