
Topic 1011: Topics in Computer Science

Dr J Frost ([email protected])

Last modified: 2nd November 2013

A note on these Computer Science slides

These slides are intended to give just an introduction to two key topics in Computer Science: algorithms and data structures. Unlike the other topics in the Riemann Zeta Club, they're not intended to give the deeper knowledge required for solving difficult problems. The main intention is to provide an initial base of Computer Science knowledge, which may help you in your university interviews.

In addition to these slides, it's highly recommended that you study the following Riemann Zeta slides to deal with more specific Computer-Science-ey questions:

Logic

Combinatorics

Pigeonhole Principle

Slide Guidance

Any box with a ? can be clicked to reveal the answer (this works particularly well with interactive whiteboards!). Make sure you're viewing the slides in slideshow mode.

For multiple choice questions (e.g. SMC), click your choice to reveal the answer (try below!)

Question: The capital of Spain is:

A: London    B: Paris    C: Madrid

Contents

Time and Space Complexity

Big O Notation

Sets and Lists

Binary Search

Sorted vs Unsorted Lists

Hash Tables

Recursive Algorithms

Sorting Algorithms

Bubble Sort

Merge Sort

Bogosort

Time and Space Complexity

Suppose we had a list of unordered numbers, which the computer can only view one at a time.

The list: 1, 4, 2, 9, 3, 7

Suppose we want to check if the number 8 is in the list.

If the size of the problem is n (i.e. there are n cards in the list), then in the worst case, how much time will it take to check whether some number is in there?

And given that the list is stored on a disc (rather than in memory), how much memory (i.e. space) do we need for our algorithm?

(Worst Case) Time Complexity

If there are n items to check, and each takes some constant amount of time to check, then the total time will be at most some constant times n.

Space Complexity

We only need one slot of memory for the number we're checking against the list, and one slot of memory for the current item in the list we're looking at. So the space needed will be constant, and importantly, is not dependent on the size n of our list.
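To make this concrete, here is a minimal sketch of this linear scan in Python (the function name and list are just for illustration):

    def contains(items, target):
        # Check each item in turn: O(n) time in the worst case.
        for item in items:           # one memory slot for the current item
            if item == target:       # one memory slot for the target
                return True
        return False

    contains([1, 4, 2, 9, 3, 7], 8)  # returns False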

Big O notation

So the time and space complexity of an algorithm gives us a measure of how complex the algorithm is, in terms of the time it'll take and the space required to do its handiwork.

In mathematics, Big O notation is used to measure how some expression grows.

Suppose for example we have the function:

y = 2x³ + 10x² + 3

We can see that as x becomes larger, the 10x² and 3 terms become inconsequential, because the 2x³ term dominates.

Since 10x² ≤ 10x³ and 3 ≤ 3x³ for all x ≥ 1, we have y ≤ 15x³ for all such x. We're not interested in the scaling factor of 15, since this doesn't tell us anything about the growth of the function.

We say that:

y = O(x³)

i.e. y grows cubically.

Big O notation

Formally, if f(x) = O(g(x)), then there is some constant k such that f(x) ≤ k·g(x) for all sufficiently large x. So technically we could say that y = O(x⁴), because the big-O just provides an upper bound to the growth. But we would want to keep this upper bound as low as possible, so it would be more useful to say that y = O(x³).

The unordered list again: 1, 4, 2, 9, 3, 7

While big-O notation has been around since the late 19th century (particularly in number theory), in the 1950s it started to be used to describe the complexity of algorithms.

Returning to our problem of finding a number in an unordered list, we can now express our time and space complexity using big-O notation (in terms of the list size n):

Time Complexity: O(n)

Space Complexity: O(1)

Remember that the constant scaling doesn't matter in big-O notation, so O(1) is used to mean constant time/space.

Big O notation

We'll see some examples of more algorithms and their complexity in a second, but let's first see how we might describe algorithms based on their complexity. We say the time complexity of the algorithm is:

O(1): constant time
O(log n): logarithmic time
O(n): linear time
O(n²): quadratic time
O(nᵏ): polynomial time
O(2ⁿ): exponential time

Sets and lists

A data structure is, unsurprisingly, some way of structuring data, whether as a tree, a set, a list, a table, etc.

There are two main ways of representing a collection of items: lists and sets.

Does ordering of items matter?
Lists: Yes.
Sets: No: {1, 2, 3} and {3, 2, 1} are the same set.

Duplicates allowed?
Lists: Yes.
Sets: No.

Example:
Lists: (1, 2, 2, 3)
Sets: {1, 2, 3}
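In Python, for instance, both structures are built in; a tiny illustration:

    numbers_list = [1, 2, 2, 3]          # a list: order matters, duplicates kept
    numbers_set = {1, 2, 2, 3}           # a set: the duplicate 2 collapses away
    print(numbers_list == [3, 2, 2, 1])  # False: a different ordering is a different list
    print(numbers_set == {3, 2, 1})      # True: {1, 2, 3} and {3, 2, 1} are the same set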

Binary Search

The sorted collection: 1, 3, 4, 7, 9, 12, 15, 20

Suppose we have either a set or list where the items are in ascending order. We want to determine if the number 14 is in the list.

Previously, when the items were unordered, we had to scan through the whole list (a simple algorithm whose time complexity was O(n)).

But can we do better?

More specifically, seeing if an item is within an unsorted list is known as a linear search (because we have to check every item, taking time linear in n).

Binary Search


We're looking to see if 14 is in our list/set. At the start of a binary search, the number we're looking for could be anywhere in the collection.

A sensible thing to do is to look at the number just after the centre. That way, we can narrow down our search by half in one step.

In this case that number is 9, and 9 < 14, so we know that if 14 is in the collection, it must be in the second half of it.

Binary Search


Now we look halfway across what we have left to check. The number just after the halfway point is 15. Since 15 > 14, if 14 is in our collection, it must be to the left of this point.

Binary Search


Now we'd compare our number 14 against the 12. Since 12 < 14, and everything from 15 onwards has already been ruled out, we now know that 14 is not in the collection of items.

Binary Search


We can see that on each step, we halve the number of items that need to be searched. The number of steps (i.e. the time complexity) in terms of the number of items n must therefore be:

Time Complexity: O(log n)

This makes sense when you think about it. If n = 16, then log₂ 16 = 4, i.e. we can halve 16 four times until we get to 1, so only 4 steps are needed.

You might be wondering why we wrote O(log n) instead of O(log₂ n). This is because changing the base of a log only scales it by a constant, and as we saw, big-O notation doesn't care about constant scaling. So the base is irrelevant.

Space Complexity: O(1)

We only ever look at one number at a time, so we only need a constant amount of memory.
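Here is a minimal Python sketch of binary search (illustrative names; one of several ways to write it):

    def binary_search(sorted_items, target):
        lo, hi = 0, len(sorted_items)    # the window where the target could still be
        while lo < hi:
            mid = (lo + hi) // 2         # look near the middle of the window
            if sorted_items[mid] == target:
                return True
            elif sorted_items[mid] < target:
                lo = mid + 1             # target can only be in the right half
            else:
                hi = mid                 # target can only be in the left half
        return False                     # window is empty: not in the collection

    binary_search([1, 3, 4, 7, 9, 12, 15, 20], 14)  # returns False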

Sorted vs unsorted lists

Keeping our list sorted, or leaving it unsorted, has advantages either way. We've already seen that keeping the list sorted makes it much quicker to see if the list contains an item or not.

What is the time complexity of the best algorithm to do each of these tasks?

Seeing if the list contains a particular value:
Sorted: O(log n), using a binary search.
Unsorted: O(n), using a linear search.

Adding an item to the list:
Sorted: O(n). We find the correct position to insert in O(log n) time using a binary search. If we have some easy way to splice in the new item somewhere in the middle of the list, without having to move the items after it to make space, then we're done. However, if we do have to move up the items after it (e.g. the values are stored in an array), then it takes O(n) time to shift the items up, hence it's O(n) time overall.
Unsorted: O(1). We can just stick the item on the end!

Merging two lists (of size n and m respectively, where m ≤ n):
Sorted: O(mn). Start with the largest list, with its n items. Then insert each of the m items from the second list into it. Each insert operation costs O(n) time (from above), and there's m items to add.
Unsorted: O(1). Easy again. Just have the end of the first list somehow link to the start of the second list so that they're joined together.

Sorted vs unsorted lists

We can see that the advantage of keeping the list unsorted is that it's much quicker to insert new items into the list. However, it's much slower to find/retrieve an item in the list, because we can't exploit binary search.

So it's a trade-off.

                                          Sorted      Unsorted
Seeing if the list contains a value       O(log n)    O(n)
Adding an item to the list                O(n)        O(1)
Merging two lists (of size n and m, m ≤ n)  O(mn)     O(1)
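As an aside, Python's standard bisect module does sorted insertion in exactly this spirit: an O(log n) binary search for the position, followed by an O(n) shift of the array items after it.

    import bisect

    sorted_list = [1, 3, 4, 7, 9, 12, 15, 20]
    bisect.insort(sorted_list, 14)     # binary search + shift: O(n) overall
    print(sorted_list)                 # [1, 3, 4, 7, 9, 12, 14, 15, 20]

    unsorted_list = [1, 4, 2, 9, 3, 7]
    unsorted_list.append(14)           # unsorted insert: just stick it on the end, O(1)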

Hash Table

Hash tables are a structure which allows us to do certain operations on collections much more quickly: e.g. inserting a value into the collection, and retrieving one!

Imagine we had 10 buckets to put new values into. Suppose we had a rule which decided what bucket to put a value into:

Find the remainder when the value v is divided by 10 (i.e. v mod 10).

[Figure: ten empty buckets, labelled 0 to 9.]

Hash Table

[Figure: the values 2, 31, 67, 42, 19, 112, 55, 57, 29, 33, 69 placed into buckets by their last digit: e.g. 2, 42 and 112 in bucket 2; 67 and 57 in bucket 7; 19, 29 and 69 in bucket 9.]

We can use our mod 10 hash function to insert new values into our hash table.

Hash Table


The great thing about a hash table is that if we want to check if some value is contained within it, we only need to check within the bucket it corresponds to.

e.g. Is 65 in our hash table?

Using the same hash function, we'd just check Bucket 5. At this point, we might just do a linear search of the items in the bucket to see if any matches 65. In this case, we'd conclude that 65 isn't part of our collection of numbers.

Hash Table


Suppose we've put n items in a hash table with k buckets:

Seeing if some number is contained in our collection: O(n/k). But this holds only if our chosen hash function distributes items fairly evenly across buckets. If our data tended to have 1 as the last digit, mod 10 would be a bad hash function, because all the items would end up in the same bucket. The result would be that if we then wanted to check if 71 was in our collection, we'd end up having to check every item still! Using mod p, where p is a prime, reduces this problem.

Inserting a new item into the hash table structure: O(1). Presuming the hash function takes a constant amount of time to evaluate, we just stick the new item at the top of the correct bucket. We could always keep the buckets sorted, in which case insertion would take O(log(n/k)) time.
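A minimal Python sketch of this bucket scheme (the class and method names are just for illustration):

    class HashTable:
        def __init__(self, k=10):
            self.k = k
            self.buckets = [[] for _ in range(k)]   # k empty buckets

        def insert(self, value):
            # Hash to a bucket and stick the value in it: O(1).
            self.buckets[value % self.k].append(value)

        def contains(self, value):
            # Only the one matching bucket needs a linear search: O(n/k) on average.
            return value in self.buckets[value % self.k]

    table = HashTable()
    for v in [2, 31, 67, 42, 19, 112, 55, 57, 29, 33, 69]:
        table.insert(v)
    print(table.contains(65))   # False: only Bucket 5 is checked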

Recursive Algorithms

The Towers of Hanoi is a classic game in which the aim is to get the tower (composed of varying sized discs) from the first peg to the last peg. There's one spare peg available.

The only rule is that a larger disc can never be on top of a smaller disc, i.e. on any peg the discs must be in decreasing size order from bottom to top.

There are two questions we might ask:

1. For n discs, what is the minimum number of moves required to win?

2. Is there an algorithm which generates the sequence of moves required?

Recursive Algorithms

We can answer both questions at the same time.

Suppose HANOI(START,SPARE,GOAL,n) is a function which generates a sequence of moves for n discs, where START is the start peg, SPARE is the spare peg and GOAL is the goal peg.

Then we can define an algorithm as such:

Recursive Algorithms

Recursively solve the problem of moving n-1 discs from the start peg to the spare peg.

i.e. HANOI(START,GOAL,SPARE, n-1)

(notice that we've made the original goal peg the new spare peg and vice versa)

It's quite common to define a function in terms of itself but with smaller arguments. It's recommended you first look at some of the examples in the Recurrence Relations section of the RZC Combinatorics slides to get your head around this.

Recursive Algorithms

Next move the 1 remaining disc (or whatever disc is at the top of the peg) from the start to goal peg.

i.e. MOVE(START,GOAL)

Recursive Algorithms

Finally, recursively solve the problem moving n-1 discs from the spare peg to the target peg.

i.e. HANOI(SPARE,START,GOAL, n-1)

Notice here that the original start peg is now the spare peg, and the spare peg the start peg.

Recursive Algorithms

Putting this together, we have the algorithm:

FUNCTION HANOI(START, SPARE, GOAL, n) =

HANOI(START, GOAL, SPARE, n-1),

MOVE(START, GOAL),

HANOI(SPARE, START, GOAL, n-1)

But just like recurrences in maths, we need a base case, to say what happens when we only have to solve the problem when n=1 (i.e. we have one disc):

FUNCTION HANOI(START, SPARE, GOAL, 1) =

MOVE(START, GOAL)
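This pseudocode translates directly into a runnable Python version (with MOVE as a print):

    def hanoi(start, spare, goal, n):
        if n == 1:                            # base case: one disc
            print("MOVE", start, "->", goal)
            return
        hanoi(start, goal, spare, n - 1)      # move n-1 discs: start -> spare
        print("MOVE", start, "->", goal)      # move the biggest disc: start -> goal
        hanoi(spare, start, goal, n - 1)      # move n-1 discs: spare -> goal

    hanoi("A", "B", "C", 3)   # prints the 7 moves for 3 discs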

Recursive Algorithms

We can see this algorithm in action. If the 3 pegs are A, B and C, and we have 3 discs, then we want to execute HANOI(A, B, C, 3) to get our moves:

HANOI(A, B, C, 3)

= HANOI(A, C, B, 2), MOVE(A, C), HANOI(B, A, C, 2)

= HANOI(A, B, C, 1), MOVE(A, B), HANOI(C, A, B, 1), MOVE(A, C),

HANOI(B, C, A, 1), MOVE(B, C), HANOI(A, B, C, 1)

= MOVE(A, C), MOVE(A, B), MOVE(C, B), MOVE(A, C), MOVE(B, A),

MOVE(B, C), MOVE(A, C)


Recursive Algorithms

The same approach applies when counting the minimum number of moves.

Let F(n) be the number of moves required to move n discs to the target peg.

We require F(n-1) moves to move n-1 discs from the start to spare peg.

We require 1 move to move the remaining disc to the goal peg.

We require F(n-1) moves to move n-1 discs from the spare to goal peg.

This gives us the recurrence relation F(n) = 2F(n-1) + 1

And our base case is F(1) = 1, since it only requires 1 move to move 1 disc.

But just writing out the first few terms of this sequence (1, 3, 7, 15, 31, ...), it's easy to spot that the position-to-term formula is F(n) = 2ⁿ - 1. (Indeed, adding 1 to both sides gives F(n) + 1 = 2(F(n-1) + 1), so F(n) + 1 doubles each time, giving F(n) + 1 = 2ⁿ.)
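A quick Python check that the recurrence and the closed form agree for the first few n:

    def f(n):
        # F(n) = 2F(n-1) + 1, with base case F(1) = 1
        return 1 if n == 1 else 2 * f(n - 1) + 1

    print([f(n) for n in range(1, 7)])         # [1, 3, 7, 15, 31, 63]
    print([2**n - 1 for n in range(1, 7)])     # matches 2^n - 1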

Sorting Algorithms

One very fundamental algorithm in Computer Science is sorting a collection of items so that they are in order (whether in numerical order, or some order we've defined).

We'll look at the main well-known algorithms, and look at their time complexity.

The list to sort: 2, 31, 67, 42, 19, 112, 55

Bubble Sort


Bubble sort looks at each pair of adjacent numbers in turn, starting with the 1st and 2nd, then the 2nd and 3rd, and swaps them if they're in the wrong order.

At the end of the first pass*, we can guarantee that the largest number will be at the end of the list. We then repeat the process, but we can now ignore the last number (because it's in the correct position). This continues until, on the last pass, we only need to compare the first two items.

* A "pass" in an algorithm means that we've looked through all the values (or some subset of them) within this stage. You can think of a pass as someone checking your university personal statement and making corrections, before you give this updated draft to another person for an additional pass.
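A minimal Python sketch of bubble sort (sorting the list in place):

    def bubble_sort(items):
        n = len(items)
        for last in range(n - 1, 0, -1):      # each pass fixes one more position at the end
            for i in range(last):             # compare each adjacent pair
                if items[i] > items[i + 1]:   # wrong order: swap them
                    items[i], items[i + 1] = items[i + 1], items[i]
        return items

    bubble_sort([2, 31, 67, 42, 19, 112, 55])   # [2, 19, 31, 42, 55, 67, 112]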

Bubble Sort


Time Complexity: O(n²)

The first pass requires n-1 comparisons, the next pass requires n-2 comparisons, and so on, giving us the sum of an arithmetic sequence. So the exact number of comparisons is (n-1) + (n-2) + ... + 1 = n(n-1)/2.

This is growth quadratic in n, i.e. O(n²).


Merge Sort

First treat each individual value as an individual list (with 1 item in it!). Then we repeatedly merge each pair of lists, until we only have 1 big fat list.

[Figure: the list 2, 31, 67, 42, 19, 112, 55, 4 split into eight 1-item lists, which are merged in pairs, then merged again, until one sorted list remains.]

We'll go into more detail on this merge operation on the next slide.

Merge Sort

[Figure: two sorted lists being merged into a new list.]

At each point in the algorithm, we know each smaller list will be in order. Merging two sorted lists can be done quite quickly.

General gist: start with a marker at the beginning of each list. Compare the two elements at the markers. The lower value gets put in the new merged list, and the marker of the list that item came from moves up one. Then repeat!
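A minimal Python sketch of the merge operation, and the bottom-up merge sort built on it:

    def merge(left, right):
        # A marker starts at the beginning of each sorted list.
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:          # the lower value goes into the new list...
                merged.append(left[i])
                i += 1                       # ...and that list's marker moves up one
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:] # one list ran out: copy the rest over

    def merge_sort(items):
        lists = [[x] for x in items]         # treat each value as a 1-item list
        while len(lists) > 1:                # each phase halves the number of lists
            nxt = [merge(a, b) for a, b in zip(lists[0::2], lists[1::2])]
            if len(lists) % 2 == 1:
                nxt.append(lists[-1])        # an odd list out carries over unchanged
            lists = nxt
        return lists[0] if lists else []

    merge_sort([2, 31, 67, 42, 19, 112, 55, 4])   # [2, 4, 19, 31, 42, 55, 67, 112]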

Merge Sort

Time Complexity: O(n log n)

Each merging phase requires exactly n steps, because when merging each pair of lists, every step places one element in the new list.

There are log₂ n phases because, similarly to the binary search, each phase halves the number of mini-lists.

Bogosort

The Bogosort, also known as Stupid Sort, is intentionally a joke sorting algorithm, but provides some educational value. It simply goes like this:

1. Put all the elements of the list in a completely random order.

2. Check if the elements are in order. If so, you're done. If not, then go back to Step 1.
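A minimal Python sketch (for amusement only):

    import random

    def bogosort(items):
        def in_order(xs):
            # Step 2's check: O(n) comparisons.
            return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))
        while not in_order(items):
            random.shuffle(items)    # Step 1: a completely random order
        return items

    bogosort([3, 1, 2])   # eventually returns [1, 2, 3]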

We can describe time complexity in different ways: the worst-case behaviour (i.e. the longest amount of time the algorithm can possibly take) and the average-case behaviour (i.e. how long we expect the algorithm to take on average).

Worst Case Time Complexity: unbounded.

The algorithm theoretically may never terminate, because the order may be wrong every time.

Average Case Time Complexity: O(n · n!)

There are n! possible ways the items can be ordered. Presuming no duplicates in the list, there's a 1 in n! chance that the list is in the correct order, so we expect to have to repeat Step 1 n! times. Each check in Step 2 requires checking all the elements, which is O(n) time.

(It might be worth checking out the Geometric Distribution in the RZC Probability slides.)