Introduction: Data Structures and Algorithms (60-254)


Upload: stephany-stewart

Post on 12-Jan-2016



Who am I?

About Me
• A Professor in the School of Computer Science.
• Former Acting Director of the school (2004).
• Co-founder/co-owner of ThotLab Games and Mindcrafters Corporation.
• Past President of the Canadian Artificial Intelligence Association (formerly known as the Canadian Society for the Computational Studies of Intelligence).
• Area of research is Search Algorithms (especially pathfinding algorithms for video games).


More about me, me, me.

• 2014: Published my first novel: Dream Box: Sounds of the Forest
• 2005: Promoted to full professor
• 2001: Joined the School of Computer Science in Windsor
• 1996: Promoted to Associate Professor and granted tenure
• 1991: Joined U. Regina at rank of Assistant Professor
• 1991: PhD in Computing Science, University of Alberta
• 1990: Part-time Lecturer and Research Associate at Waterloo
• 1987: M.Math in Computer Science, University of Waterloo
• 1984: B.Math Joint Honours Computer Science / Economics (Co-op)
• 1972: First job – peeling potatoes for a penny per pound.


Administrative Information

• Instructor: Dr. Scott Goodwin
• Office: Lambton Tower 5110, ext. 3774
• Email: [email protected]
• Web: www.gamebrains.ca/254
• Office Hours: TR 4:00pm-5:00pm
• Lecture: 01 TR 2:30pm-3:50pm (ER 3123)
• Labs: 51 M 5:30pm – 6:50pm (ER 3119)
        52 R 1:00pm – 2:20pm (ER 3119)
        53 T 4:00pm – 5:20pm (ER 3119)
        54 R 4:00pm – 5:20pm (ER 3119)

• Labs start during the week of September 14-17.


Graduate Assistants and Teaching Assistants
• Roohollah Etemadi, Graduate Assistant
• Sanjay Renukamurthy, Graduate Assistant
• Jonathan Binder, Teaching Assistant
• William Roeder, Teaching Assistant
• Eric Smith, Teaching Assistant


Prerequisites

• Prerequisites
  • You are required to have passed a C programming course (that is, either 60-141 or 60-206). No exceptions will be made.
• No Textbook Required. See References on handout.
• Supplementary Material
  • A set of course notes, prepared by Dr. Mukhopadhay, is available for purchase from Document and Imaging Services. My lectures will be based on these course notes.


Course Goals

• Teach
  • The principles of good algorithm design
  • The role of data structures
• You will also learn
  • How to analyze algorithms, a skill that helps you to compare and evaluate competing algorithms that solve the same problem.
• Labs are an important part of this course
  • Implement (in C, C++, C#, or Java) the algorithms and data structures that you learn about in the classroom lectures.
  • These will help you consolidate the concepts discussed in the lectures, and also check how practice matches theory.


Tentative Class Schedule

Chapter   Topic                # Lectures
1         Introduction         1
2         Algorithm Analysis   2
3         Linear Lists         2
4         Recursion            4
5         Sorting              3
6         Non-Linear Lists     3
7         Hashing              2
8         Applications         5
          Review               1


Evaluation Scheme

Component                  Weight             Date
Lab Assignments (10)       15% (1.5% each)
Lab Attendance (10 labs)   5% (0.5% each)
Midterm 1                  20%                Thursday, October 8 (in class)
Midterm 2                  20%                Thursday, November 12 (in class)
Final Exam                 40%                Monday, December 14 (TBA)


Rules and Regulations

• No student is allowed to take a course more than two times without permission from the Dean.

• Midterm tests, which are missed for any reason whatsoever, cannot be made up.

• In the exceptional case that a student misses a midterm test for a valid reason, i.e., one supported by appropriate documentation, the mark for that test will be carried over to the final. In the case of a doctor's note, the student must submit a Student Medical Certificate signed by a medical doctor, and the note must specifically state that the student was incapable of writing the exam on the day of the test.

• The final exam must be written in order to obtain a grade for the course.

• If the final exam is missed (for a valid reason), the student will write his/her final exam on the alternate exam day, December 22, 2015.

• If a student is sick, s/he must inform the instructor about his/her illness as soon as possible (within 7 days) and provide a supporting doctor's note that clearly states s/he is not able to attend the exam/test/assignment.

• If a student has a medical condition that may create problems during the term, s/he must inform the instructor in writing, with supporting documents, beforehand. No consideration will be given after the term is over (last day of classes).

• No extensions to the labs will be allowed, and no make-ups will be considered. If a student misses a lab assignment, the corresponding mark will be carried over to the final exam.

• If a student is caught adopting unfair means (e.g., plagiarism), that student will face serious consequences including official disciplinary procedures (see Course Information Handout).


Policy on cheating

• The instructor will put a great deal of effort into helping students to understand and learn the material in the course. However, the instructor will not tolerate any form of cheating.

• The instructor will report any suspicion of cheating to the Director of the School of Computer Science. If sufficient evidence is available, the Director will begin a formal process according to the University Senate Bylaws. The instructor will not negotiate with students who are accused of cheating but will pass all information to the Director of the School of Computer Science.

• Refer to the Course Information Handout and Senate Bylaw 31.


Data Structures and Algorithms: Introduction
• What are algorithms?
• Why is the study of algorithms worthwhile?
• What is the role of algorithms relative to other technologies used in computers?
• Are all problems solvable by computer?
• What is a well-specified computational problem?
• What are hard problems?
• How can we assess or compare different algorithms for the same problem?


The Science of Computation

• Problem Solving
• Algorithms
• Data Structures
• Analysis
• Experimentation


Definitions

• Computational Problem:
  • Specifies what the output should be for each valid input.
  • Some problems are associated with a search space of potential solutions.
• Algorithm:
  • An algorithm is
    1. a finite set of instructions,
    2. clearly specified and unambiguous,
    3. effective (can be carried out),
    for performing a computation or solving a problem. It is a mechanism that generates correct output from valid input.


Goal and more definitions

• Our goal: Choose or design an algorithm that
  • Explores the search space
  • Achieves the input/output relationship
  • Is efficient

• Correctness: An algorithm is correct (sound) if every output generated for all (valid) inputs is a solution. (In other words, every output is a solution.)

• Completeness: An algorithm is complete if it generates outputs corresponding to every solution for all (valid) inputs. (In other words, every solution is an output.)

• Termination: finite number of steps for all valid inputs?
• Time and Space Complexity: runtime, memory usage?
• Optimality: returns the best solution?


A Problem

Problem: Given two positive integers m and n, where n ≤ m, find the greatest common divisor, gcd(m,n).

Search space: All numbers from 1 to n.

Naive algorithm (exhaustive search): Go through the search space (from 1 to n), keeping track of the largest number that divides both m and n.

Is there a more clever way of doing this?


Euclidean Algorithm

Yes, the Euclidean Algorithm (Euclid – c.350 B.C.E.)

Pseudocode: Algorithm GreatestCommonDivisor
Input: Two positive integers, m and n
Output: The gcd of m and n

while ((r ← m mod n) ≠ 0)
    m ← n
    n ← r
Output n and STOP.


Example: m = 24, n = 9

r ← 24 mod 9 = 6;  m ← 9;  n ← 6
r ← 9 mod 6 = 3;   m ← 6;  n ← 3
r ← 6 mod 3 = 0;   STOP (output n = 3)

m = 9, n = 24

r ← 9 mod 24 = 9;  m ← 24;  n ← 9
The rest is the same as the previous example.


Performance

• Quantification of performance of the algorithm.
• Crucial parameters: time and space.
• These are called the time and space complexity of the algorithm.
• Will be discussed later in the course.

• For example:
  • Time complexity of GCD:
    • Takes at most 2 log n steps, where n ≤ m.
    • Thus, worst-case time complexity: O(log n).


Performance (iterations)

How does the algorithm explore the search space? Given m and n, how many times does the “while” loop execute?

First, we need the following result:

Theorem 1: If m > n, then m mod n < m/2.

Proof:
• If n ≤ m/2, then the claim follows since r = m mod n < n ≤ m/2.
  Ex. [m=11, n=5]: 5 ≤ 11/2, and r = 11 mod 5 = 1 < 11/2.
• If n > m/2, then (since m/2 < n < m) n goes into m exactly once, so r = m mod n = m - n < m - m/2 = m/2.
  Ex. [m=11, n=7]: 7 > 11/2, and r = 11 mod 7 = 4 < 11/2.
q.e.d.


Intuition

Consider the sequence of remainders:
r0 = m mod n < m/2
r1 = n mod r0 < n/2
r2 = r0 mod r1 < r0/2
r3 = r1 mod r2 < r1/2
r4 = r2 mod r3 < r2/2
r5 = r3 mod r4 < r3/2
…

First iteration: r0 < m/2
Second iteration: r1 < n/2
Third iteration: r2 < r0/2 < m/4
Fourth iteration: r3 < r1/2 < n/4
Fifth iteration: r4 < r2/2 < r0/4 < m/8
Sixth iteration: r5 < r3/2 < r1/4 < n/8

So every two iterations the remainder is at least halved. Thus, ri will become 0 in at most 2 log n iterations (log is base 2).


Example

Suppose m = 1989 and n = 1590.

Remainder sequence:
399 = 1989 mod 1590     399 < 1989/2
393 = 1590 mod 399      393 < 1590/2
6 = 399 mod 393         6 < 399/2 < 1989/4
3 = 393 mod 6           3 < 393/2 < 1590/4
0 = 6 mod 3             0 < 6/2 < 399/4 < 1989/8

We stop here, but if we continued, the bounds would be < 1590/8, < 1989/16, < 1590/16, …


Number of Steps

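In 2 * 3 steps, ri is reduced by a factor of 8.
In 2 * 4 steps, ri is reduced by a factor of 16.
In 2 * 5 steps, ri is reduced by a factor of 32.
In 2 * 6 steps, ri is reduced by a factor of 64.
……
In 2 * log n steps, ri is reduced by a factor of n.

Thus, ri becomes 0 in at most 2 log n iterations.
For m = 1989, n = 1590: 2 * log 1590 ≈ 21.3, so at most 22 iterations.
The actual sequence is 399, 393, 6, 3, 0, i.e., 5 iterations.
Compare with the naïve algorithm's 1590 iterations!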


Correctness

• We have to prove that the algorithm is correct.
  • Does the input/output relationship hold?
  • This must be true for all valid inputs.
  • Analogous to proving a theorem in math.

How do we prove this for Algorithm GCD?

We can prove it by induction on the number of iterations of the algorithm (see the course notes for the complete proof). The proof hinges on the claim that

gcd(m,n) = gcd(n,r), where r = m mod n.


Proof

• Claim 1: gcd(m,n) = gcd(n,r), where r = m mod n.
• Proof:
  • m = q * n + r, where 0 ≤ r < n.
  • Any common divisor of n and r must divide m = q * n + r. So gcd(n,r) | m, and gcd(n,r) is a common divisor of m and n.
  • r = m - q * n
  • Any common divisor of m and n must divide r = m - q * n. So gcd(m,n) | r, and gcd(m,n) is a common divisor of n and r.
  • Suppose gcd(m,n) ≠ gcd(n,r). Since gcd(m,n) is a common divisor of n and r, gcd(m,n) ≤ gcd(n,r), so gcd(n,r) must be > gcd(m,n). This leads to a contradiction, since gcd(n,r) is a common divisor of m and n but is greater than gcd(m,n).
• Hence gcd(m,n) must equal gcd(n,r).


Termination

• Show that the algorithm terminates in a finite number of steps.
• This must be true for every valid input.
• Can we show this for Algorithm GCD?
• We must show that ri goes to 0 in a finite number of steps.

Observe:
• The sequence of remainders strictly decreases.
• They are all non-negative integers.
• Thus ri must reach 0 after finitely many steps; by Theorem 1, in at most 2 log n steps.


Data Structures

• The study of different ways of organizing (storing) data.
• Why?
  • The efficiency of an algorithm depends on how its data is organized.
  • This is the reason for studying data structures and algorithms together.


Example: Find the median

For example: The median of a list of n numbers is a number M such that at least n/2 numbers in the list are ≤ M, and at least n/2 … are ≥ M.

There are many definitions of the median; we take:
• If n is even: there are two medians, the lower median and the upper median, and the median is the average of the lower and upper medians.
• If n is odd: both medians (lower and upper) are the same.


Consider

Consider this problem: Given a sorted list of n numbers, find the median.

A crucial question: How should we store the list?

If we store it in an array, A, then the median is found in constant time, O(1)!

Index:  1   2   3   4   5   6   7   8   9   10
A:      2   5   9   16  19  22  26  27  30  31

Median = (A[5] + A[6])/2 = (19 + 22)/2 = 20.5


Another approach

Whereas in a linked list, we must traverse half of the list, which takes n/2 steps, i.e., O(n)!

Quite simply stated: The way in which data is organized is crucial in complexity analysis. We will study the properties of many data structures in this course.

[Figure: linked list] first → 2 → 5 → 9 → 16 → 19 → 22 → …


Additional Problems

• Design an efficient algorithm to determine if a list has repeated elements.

• Given a list of n elements find their minimum (or maximum).

• Given n points in the plane, find the pair(s) of points which are closest to each other.

• Given n points in the plane, determine whether any three lie on the same straight line.