cse 373: data structures and algorithms lecture 6: searching 1

46
CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

Post on 19-Dec-2015

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

CSE 373: Data Structures and Algorithms

Lecture 6: Searching

1

Page 2: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

Searching and recursion• Problem: Given a sorted array a of integers and

an integer i, find the index of any occurrence of i if it appears in the array. If not, return -1.– We could solve this problem using a standard

iterative search; starting at the beginning, and looking at each element until we find i

– What is the runtime of an iterative search?

• However, in this case, the array is sorted, so does that help us solve this problem more intelligently? Can recursion also help us?

2

Page 3: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

Binary search algorithm

• Algorithm idea: Start in the middle, and only search the portions of the array that might contain the element i. Eliminate half of the array from consideration at each step.– can be written iteratively, but is harder to get right

• called binary search because it chops the area to examine in half each time– implemented in Java as method Arrays.binarySearch in java.util package

3

Page 4: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

4

7

16

20

37

38

43

0

1

2

3

4

5

6

min

mid (too big!)

max

i = 16

Binary search example

4

Page 5: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

4

7

16

20

37

38

43

0

1

2

3

4

5

6

min

mid (too small!)

max

i = 16

Binary search example

5

Page 6: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

4

7

16

20

37

38

43

0

1

2

3

4

5

6

min, mid, max (found it!)

i = 16

Binary search example

6

Page 7: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

Binary search pseudocodebinary search array a for value i: if all elements have been searched, result is -1. examine middle element a[mid]. if a[mid] equals i, result is mid. if a[mid] is greater than i, binary search left half of a for i. if a[mid] is less than i, binary search right half of a for i.

7

Page 8: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

Runtime of binary search• How do we analyze the runtime of binary search and

recursive functions in general?

• binary search either exits immediately, when input size <= 1 or value found (base case),or executes itself on 1/2 as large an input (rec. case)– T(1) = c– T(2) = T(1) + c– T(4) = T(2) + c– T(8) = T(4) + c– ...– T(n) = T(n/2) + c

• How many times does this division in half take place?

8

Page 9: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

Divide-and-conquer• divide-and-conquer algorithm: a means for

solving a problem that first separates the main problem into 2 or more smaller problems, then solves each of the smaller problems, then uses those sub-solutions to solve the original problem– 1: "divide" the problem up into pieces– 2: "conquer" each smaller piece– 3: (if necessary) combine the pieces at the end to

produce the overall solution

– binary search is one such algorithm

9

Page 10: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

Recurrences, in brief• How can we prove the runtime of binary search?

• Let's call the runtime for a given input size n, T(n).At each step of the binary search, we do a constant number c of operations, and then we run the same algorithm on 1/2 the original amount of input. Therefore:

– T(n) = T(n/2) + c– T(1) = c

• Since T is used to define itself, this is called a recurrence relation.

10

Page 11: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

Solving recurrences• Master Theorem:

A recurrence written in the formT(n) = a * T(n / b) + f(n)

(where f(n) is a function that is O(nk) for some power k)has a solution such that

• This form of recurrence is very common for divide-and-conquer algorithms

k

k

k

k

k

a

ba

ba

ba

nO

nnO

nO

nT

b

,

),(

)log(

),(

)(

log

11

Page 12: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

Runtime of binary search• Binary search is of the correct format:

T(n) = a * T(n / b) + f(n)

– T(n) = T(n/2) + c– T(1) = c

– f(n) = c = O(1) = O(n 0) ... therefore k = 0– a = 1, b = 2

• 1 = 20, therefore:T(n) = O(n0 log n) = O(log n)

• (recurrences not needed for our exams)

12

Page 13: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

Why Sorting?

13

• Practical application– People by last name– Countries by population– Search engine results by relevance

• Fundamental to other algorithms

• Different algorithms have different asymptotic and constant-factor trade-offs– No single ‘best’ sort for all scenarios– Knowing one way to sort just isn’t enough

• Many to approaches to sorting which can be used for other problems

Page 14: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

Problem statement

14

There are n comparable elements in an array and we want to rearrange them to be in increasing order

Pre:– An array A of data records– A value in each data record– A comparison function

• <, =, >, compareTo

Post:– For each distinct position i and j of A, if i < j then A[i] A[j]

– A has all the same data it started with

Page 15: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

Sorting Classification

In memory sorting External sorting

Comparison sorting(N log N)

Specialized Sorting

O(N2) O(N log N) O(N)# of tape accesses

• Bubble Sort• Selection Sort• Insertion Sort• Shellsort Sort

• Merge Sort• Quick Sort• Heap Sort

• Bucket Sort• Radix Sort

• Simple External Merge Sort• Variations

in place? stable?

Page 16: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

16

Comparison Sorting

comparison-based sorting: determine order through comparison operations on the input data: <, >, compareTo, …

Page 17: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

17

Bogo sort

• bogo sort: orders a list of values by repetitively shuffling them and checking if they are sorted

• more specifically:– scan the list, seeing if it is sorted– if not, shuffle the values in the list and repeat

• This sorting algorithm has terrible performance!– Can we deduce its runtime?

Page 18: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

18

Bogo sort codepublic static void bogoSort(int[] a) { while (!isSorted(a)) { shuffle(a); }}

// Returns true if array a's elements// are in sorted order.public static boolean isSorted(int[] a) { for (int i = 0; i < a.length - 1; i++) { if (a[i] > a[i+1]) { return false; } } return true;}

Page 19: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

19

Bogo sort code, helpers// Shuffles an array of ints by randomly swapping each// element with an element ahead of it in the array.public static void shuffle(int[] a) { for (int i = 0; i < a.length - 1; i++) { // pick random number in [i+1, a.length-1] inclusive int range = a.length-1 - (i + 1) + 1; int j = (int)(Math.random() * range + (i + 1)); swap(a, i, j); }}

// Swaps a[i] with a[j].private static void swap(int[] a, int i, int j) { if (i == j) return;

int temp = a[i]; a[i] = a[j]; a[j] = temp;}

Page 20: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

20

Bogo sort runtime

• How long should we expect bogo sort to take?– related to probability of shuffling into sorted order– assuming shuffling code is fair, probability equals

1 / (number of permutations of n elements)

– bogo sort takes roughly factorial time to run• note that if array is initially sorted, bogo finishes

quickly!– it should be clear that this is not satisfactory...

!nPnn

Page 21: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

21

O(n2) Comparison Sorting

Page 22: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

22

Bubble sort

• bubble sort: orders a list of values by repetitively comparing neighboring elements and swapping their positions if necessary

• more specifically:– scan the list, exchanging adjacent elements if they are not

in relative order; this bubbles the highest value to the top– scan the list again, bubbling up the second highest value– repeat until all elements have been placed in their proper

order

Page 23: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

23

• Traverse a collection of elements– Move from the front to the end– "Bubble" the largest value to the end using pair-

wise comparisons and swapping

512354277 101

1 2 3 4 5 6

Swap42 77

"Bubbling" largest element

Page 24: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

24

"Bubbling" largest element

• Traverse a collection of elements– Move from the front to the end– "Bubble" the largest value to the end using pair-

wise comparisons and swapping

512357742 101

1 2 3 4 5 6

Swap35 77

Page 25: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

25

"Bubbling" largest element

• Traverse a collection of elements– Move from the front to the end– "Bubble" the largest value to the end using pair-

wise comparisons and swapping

512773542 101

1 2 3 4 5 6

Swap12 77

Page 26: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

26

"Bubbling" largest element

• Traverse a collection of elements– Move from the front to the end– "Bubble" the largest value to the end using pair-

wise comparisons and swapping

577123542 101

1 2 3 4 5 6

No need to swap

Page 27: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

27

"Bubbling" largest element

• Traverse a collection of elements– Move from the front to the end– "Bubble" the largest value to the end using pair-

wise comparisons and swapping

577123542 101

1 2 3 4 5 6

Swap5 101

Page 28: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

28

"Bubbling" largest element

• Traverse a collection of elements– Move from the front to the end– "Bubble" the largest value to the end using pair-

wise comparisons and swapping

77123542 5

1 2 3 4 5 6

101

Largest value correctly placed

Page 29: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

29

Bubble sort codepublic static void bubbleSort(int[] a) { for (int i = 0; i < a.length; i++) { for (int j = 1; j < a.length - i; j++) { // swap adjacent out-of-order elements if (a[j-1] > a[j]) { swap(a, j-1, j); } } }}

Page 30: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

30

Bubble sort runtime• Running time (# comparisons) for input size n:

– number of actual swaps performed depends on the data; out-of-order data performs many swaps

)(O

2

)1(

)(1

2

1

0

1

0

1

0 1

n

nn

i

in

n

i

n

i

n

i

in

j

Page 31: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

31

Selection sort• selection sort: orders a list of values by

repetitively putting a particular value into its final position

• more specifically:– find the smallest value in the list– switch it with the value in the first position– find the next smallest value in the list– switch it with the value in the second position– repeat until all values are in their proper places

Page 32: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

32

Selection sort example

Page 33: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

33

Index0 1 2 3 4 5 6 7

Value27 63 1 72 64 58 14 9

1st pass1 63 27 72 64 58 14 9

2nd pass1 9 27 72 64 58 14 63

3rd pass1 9 14 72 64 58 27 63

Selection sort example 2

Page 34: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

34

Selection sort codepublic static void selectionSort(int[] a) { for (int i = 0; i < a.length; i++) { // find index of smallest element int min = i; for (int j = i + 1; j < a.length; j++) { if (a[j] < a[min]) { min = j; } }

// swap smallest element with a[i] swap(a, i, min); }}

Page 35: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

35

Selection sort runtime• Running time for input size n:– in practice, a bit faster than bubble sort. Why?

)(O

2

)1(

)(

)1)1((1

2

1

0

1

0

1

0

1

0 1

n

nn

i

in

in

n

i

n

i

n

i

n

i

n

ij

Page 36: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

36

Insertion sort

• insertion sort: orders a list of values by repetitively inserting a particular value into a sorted subset of the list

• more specifically:– consider the first item to be a sorted sublist of length 1– insert the second item into the sorted sublist, shifting the

first item if needed– insert the third item into the sorted sublist, shifting the

other items as needed– repeat until all values have been inserted into their proper

positions

Page 37: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

37

Insertion sort• Simple sorting algorithm.– n-1 passes over the array– At the end of pass i, the elements that occupied

A[0]…A[i] originally are still in those spots and in sorted order.

2 8 15 1 17 10 12 5

0 1 2 3 4 5 6 7

1 2 8 15 17 10 12 5

0 1 2 3 4 5 6 7

afterpass 2

afterpass 3

2 15 8 1 17 10 12 5

0 1 2 3 4 5 6 7

Page 38: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

38

Insertion sort example

Page 39: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

39

Insertion sort codepublic static void insertionSort(int[] a) { for (int i = 1; i < a.length; i++) { int temp = a[i];

// slide elements down to make room for a[i] int j = i; while (j > 0 && a[j - 1] > temp) { a[j] = a[j - 1]; j--; }

a[j] = temp; }}

Page 40: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

40

Insertion sort runtime• worst case: reverse-ordered elements in

array.

• best case: array is in sorted ascending order.

• average case: each element is about halfway in order.

)(

2

)1()1(...321

2

1

1

nO

nnni

n

i

)(111

1

nOnn

i

)(

4

)1())1(...321(

2

1

22

1

1

nO

nnn

in

i

Page 41: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

41

Comparing sorts

• We've seen "simple" sorting algos. so far, such as:– selection sort– insertion sort

• They all use nested loops and perform approximately n2 comparisons

• They are relatively inefficient

comparisons swaps

selection n2/2 n

insertionworst: n2/2best: n

worst: n2/2best: n

Page 42: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

42

Average case analysis• Given an array A of elements, an inversion is an

ordered pair (i, j) such that i < j, but A[i] > A[j]. (out of order elements)

• Assume no duplicate elements.

• Theorem: The average number of inversions in an array of n distinct elements is n (n - 1) / 4.

• Corollary: Any algorithm that sorts by exchanging adjacent elements requires O(n2) time on average.

Page 43: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

43

Shell sort description• shell sort: orders a list of values by comparing

elements that are separated by a gap of >1 indexes– a generalization of insertion sort– invented by computer scientist Donald Shell in 1959

• based on some observations about insertion sort:– insertion sort runs fast if the input is almost sorted– insertion sort's weakness is that it swaps each

element just one step at a time, taking many swaps to get the element into its correct position

Page 44: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

44

Shell sort example

• Idea: Sort all elements that are 5 indexes apart, then sort all elements that are 3 indexes apart, ...

Page 45: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

45

Shell sort codepublic static void shellSort(int[] a) { for (int gap = a.length / 2; gap > 0; gap /= 2) { for (int i = gap; i < a.length; i++) { // slide element i back by gap indexes // until it's "in order" int temp = a[i]; int j = i; while (j >= gap && temp < a[j - gap]) { a[j] = a[j – gap]; j -= gap; } a[j] = temp; } }}

Page 46: CSE 373: Data Structures and Algorithms Lecture 6: Searching 1

46

Sorting practice problem• Consider the following array of int values.

[22, 11, 34, -5, 3, 40, 9, 16, 6]

(a) Write the contents of the array after 3 passes of the outermost loop of bubble sort.

(b) Write the contents of the array after 5 passes of the outermost loop of insertion sort.

(c) Write the contents of the array after 4 passes of the outermost loop of selection sort.

(d) Write the contents of the array after a pass of bogo sort. (Just kidding.)