
Post on 22-Dec-2015


Algorithms

November 27, 2001

Administrivia

• Homework Assignment 6
– If you forgot to put your name on it, let me know

• Homework Assignment 7
– Due next Tuesday

• Lab 6 (Visual Basic Part 2)
– This week; due Friday

The big picture

• We built a computer

• We built an operating system to control the computer

• We attached the computer to a network

• We wrote a compiler to make programming the computer easier

• We share CPU and disk across the network

• Need to talk about algorithms

Algorithms

• Recipes for doing computations

• The underpinnings of programming
– Think out your algorithm
– Show that it works
– Determine its efficiency
– Write it as a program

What is an algorithm

• An algorithm is a recipe

• It has
– Inputs
– Rules
– Evaluation criteria
– Output

When do we use algorithms?

• Always!

• Assignment 5
– Step 1 -- Create a message of between 150 and 200 characters that you wish to transmit.
– Step 2 -- Give an encoding of the alphabet.
– Step 3 -- Use the compression ideas we discussed to compress your message.
– Step 4 -- Write your compressed message as a sequence of hexadecimal digits in this encoding.
– Step 5 -- Now you are ready to create the message to be hidden. Your message will …
– Step 6 -- We now consider a picture that could be displayed on your web page.

Examples of problems

• Baking cookies

• Putting things in alphabetical order

• Being a web search engine

Chocolate chip cookies


• Input
– flour (2 ¼ c)
– baking soda (1 t)
– salt (1 t)
– butter (1 c)
– granulated sugar (3/4 c)
– brown sugar (3/4 c)
– vanilla (1 t)
– eggs (2)
– chocolate chip morsels (2 c)
– chopped nuts (1 c)

• Output
– 5 dozen cookies

Chocolate chip cookies

• Steps in the algorithm
– Combine flour, baking soda, and salt in small bowl
– Beat butter, granulated sugar, brown sugar and vanilla in large bowl
– Add eggs one at a time, beating after adding each egg
– Gradually beat in flour mixture
– Stir in morsels and nuts
– Drop by rounded tablespoons onto ungreased baking sheets
– Bake 9-11 minutes
– Let stand for 2 minutes

Chocolate chip cookie algorithm

• Primitives
– Inputs
  • Flour, baking soda, salt, butter, brown sugar, granulated sugar, vanilla, egg, morsels, nuts
  • Alternatively, chocolate chip cookie mix
  • Alternatively, wheat, sugar cane, hen, …
– Operators
  • Combine, Beat, Gradually beat, Stir, Drop, Bake, Let stand

Chocolate chip cookie algorithm

• Execution
– First 2 steps can be done in parallel?
  • Parbegin (Combine(), Beat()) Parend
– Machine dependencies
  • Ovens vary (Bake 9-11 minutes)
  • Ingredients vary and so need to be handled differently

Chocolate chip cookie algorithm

• Algorithm testing
– Proof of the pudding is in the eating
– How do we mechanize this?

Chocolate chip cookie algorithm

• Comparing different algorithms
– Quality of input/output map
– User time
– Machine (oven) time

Putting things in alphabetical order

• Data set sizes
– Course list for COS 111: 40 students
– PU directory assistance: 10,000 people
– Manhattan phone book: 1 million people
– Social Security database: 1 billion records
– Long distance call billing records: 100 billion/year

• Different methods for different tasks
– Fast for large
– Simple for small

A simple method for sorting

• Find smallest value -- put it first in list

• Find second smallest value -- put it second

• …

• Find next smallest value – put it next

• …

• When no more values, you’re done

How it works

Start: 57 190 219 34

Find smallest value -- put it first in list: 34 190 219 57

Find second smallest value -- put it second: 34 57 219 190

Finish the sorting: 34 57 190 219

A simple method for sorting

• To sort array x = {x[1], x[2], …, x[n]}

For I = 1 to n
  For J = I+1 to n
    If (x[I] > x[J]) Then swap their values
  Next
Next
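The nested loop above can be sketched in Python. This is a minimal sketch, assuming a 0-based Python list instead of the slide's 1-based array; the function name `simple_sort` is a hypothetical name chosen for illustration.

```python
def simple_sort(x):
    """Sort the list x in place using the nested-loop method from the slide."""
    n = len(x)
    for i in range(n):
        # When the inner loop finishes, x[i] holds the smallest remaining value.
        for j in range(i + 1, n):
            if x[i] > x[j]:
                x[i], x[j] = x[j], x[i]  # swap their values
    return x


print(simple_sort([57, 190, 219, 34]))  # prints [34, 57, 190, 219]
```

The inner loop runs about n²/2 times in total, which is why this method is fine for a 40-student course list but slow for a million records.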

Another sorting algorithm

• Sorting by Merging

• Key idea: it’s easy to merge 2 sorted lists

• Sort larger lists by
– Sort smaller lists
– Merge the results

• How do we sort smaller lists?

Merging 2 sorted lists

First list: 190 219 463

Second list: 155 255 355

– Start at the top of each list
– 190 is bigger than 155
– Record 155 and move the arrow
– 190 is less than 255
– Finished when at the end of each list

Result: 155 190 219 255 355 463
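The merging steps can be sketched in Python as follows. This is an illustrative sketch, not the course's own code; the function name `merge` and the index variables `i` and `j` (standing in for the two arrows) are assumptions.

```python
def merge(a, b):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0  # the "arrows", starting at the top of each list
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            result.append(a[i])  # record the smaller value
            i += 1               # and move that list's arrow
        else:
            result.append(b[j])
            j += 1
    # One list is used up; whatever remains in the other is already sorted.
    result.extend(a[i:])
    result.extend(b[j:])
    return result


print(merge([190, 219, 463], [155, 255, 355]))
# prints [155, 190, 219, 255, 355, 463]
```

Each comparison records one value, so merging two lists with n values in total takes at most n - 1 comparisons.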

Sort then merge

Start: 157 227 345 134

Subdivide: 157 227 | 345 134

Sort pieces by merging: 157 227 | 134 345

Merge: 134 157 227 345

SortMerge algorithm

Function SortMerge(x, Lo, Hi)
  ' First call: SortMerge(x, 1, n)
  If Lo >= Hi Then
    Return
  End If
  Mid = (Lo + Hi) \ 2   ' integer division
  SortMerge(x, Lo, Mid)
  SortMerge(x, Mid + 1, Hi)
  Merge(x, Lo, Mid, Mid + 1, Hi)
End Function
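A runnable Python version of the same idea is sketched below. It differs from the pseudocode in two ways, both assumptions made for brevity: it returns a new sorted list instead of sorting in place, and it uses the standard-library `heapq.merge` as a stand-in for the Merge step. The name `sort_merge` is hypothetical.

```python
from heapq import merge  # standard-library merge of already-sorted iterables


def sort_merge(x):
    """Return a sorted copy of x using divide and conquer."""
    if len(x) <= 1:  # stopping condition: nothing left to subdivide
        return list(x)
    mid = len(x) // 2
    left = sort_merge(x[:mid])    # sort the first half
    right = sort_merge(x[mid:])   # sort the second half
    return list(merge(left, right))  # merge the two sorted halves


print(sort_merge([157, 227, 345, 134]))  # prints [134, 157, 227, 345]
```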

Does it work?

• Have to be careful about stopping

• There are always a lot of things going on

Sort(n)

→ Sort(n/2), Sort(n/2), Merge

→ Sort(n/4), Sort(n/4), Merge, Sort(n/2), Merge

→ Sort(n/8), Sort(n/8), Merge, Sort(n/4), Merge, Sort(n/2), Merge

Divide and conquer

• Use recursion
– reduce solving for problem of size n to solving two problems of size n/2
– then combine the solutions

• S(n) = 2 S(n/2) + M(n/2,n/2)

• Solving a sorting problem of size n requires solving 2 sorting problems of size n/2 and doing a merge of 2 sets of size n/2
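To see what this recurrence implies, the sketch below evaluates it numerically. It assumes that merging two lists of size n/2 costs about n steps, that S(1) = 0, and that n is a power of 2; none of these are stated on the slide, and the function name `sort_cost` is hypothetical.

```python
import math


def sort_cost(n):
    """Evaluate S(n) = 2*S(n/2) + n with S(1) = 0, for n a power of 2."""
    if n == 1:
        return 0
    # Both halves have the same size, so one recursive evaluation is enough.
    return 2 * sort_cost(n // 2) + n


for n in (8, 1024, 1_048_576):
    print(n, sort_cost(n), n * int(math.log2(n)))
```

Under these assumptions the two printed costs agree for every n, so S(n) = n log2 n, compared with roughly n²/2 comparisons for the simple nested-loop sort.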

Comparing running times

N            Insertion (ms)    SortMerge (ms)
100          1                 0
200          2                 0
1,000        58                1
10,000       5,841             11
100,000      626,943           162
1,000,000    70,626,916        3,421

Reducing 20 hours to 3 seconds

Searching

• Once a list is in alphabetical order, how do you find things in it?

• For example, is COS 111 on the list of courses that satisfy the (EC) Epistemology and Cognition requirement?

EC courses

AAS 391, ANT 201, COS 302, FRS 135, FRS 137, GER 306,
HUM 365, LIN 213, LIN 302, LIN 306, LIN 315, PHI 200,
PHI 201, PHI 204, PHI 301, PHI 304, PHI 312, PHI 321,
PHI 333, PHI 338, PSY 255, PSY 306, PSY 307, PSY 316

Searching for COS 111

Compare to the middle

If smaller, search first half

If larger, search second half

Repeat

[Figure: COS 111 shown beside the sorted EC course list, AAS 391 through PSY 316; after the first comparison only the first half, AAS 391 through PHI 200, remains.]
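The compare-to-the-middle procedure is binary search. A minimal Python sketch follows, assuming the course codes compare correctly as plain strings (true here because every code has the form of three letters, a space, and three digits). The names `binary_search` and `EC_COURSES` are hypothetical.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # compare to the middle
        if items[mid] == target:
            return mid
        if target < items[mid]:
            hi = mid - 1              # smaller: search first half
        else:
            lo = mid + 1              # larger: search second half
    return -1


EC_COURSES = [
    "AAS 391", "ANT 201", "COS 302", "FRS 135", "FRS 137", "GER 306",
    "HUM 365", "LIN 213", "LIN 302", "LIN 306", "LIN 315", "PHI 200",
    "PHI 201", "PHI 204", "PHI 301", "PHI 304", "PHI 312", "PHI 321",
    "PHI 333", "PHI 338", "PSY 255", "PSY 306", "PSY 307", "PSY 316",
]

print(binary_search(EC_COURSES, "COS 111"))  # prints -1: not on the list
print(binary_search(EC_COURSES, "COS 302"))  # prints 2
```

Each comparison halves the remaining list, so 24 courses need at most 5 comparisons.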

Building indices

Indices: AAS, ANT, COS, FRS, GER, HUM, LIN, PHI, PSY

[Figure: the index of department names shown beside the sorted EC course list.]

Search indices then data

[Figure: COS 111 shown beside the index of department names and the sorted EC course list.]

How do we describe algorithms?

• Pseudocode
– Combines English and Visual Basic constructs
– Works with various types of primitives
  • Could be + - / *
  • Could be more complex things
– Describes how data is organized
– Describes operations on the data
– Is meant to be higher level than programming

Searching with indices (pseudocode)

• Build the indices
– Do this by going through the list and determining where department names change
– Store the results in an array called Indices

• Search the indices
– Do a binary search on the array Indices
  • Do this by comparing to the middle element
  • Then use binary search on the upper half
  • Or use binary search on the lower half
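The two steps can be sketched in Python as below. This is one possible reading of the pseudocode, not the course's implementation: the index is stored as two parallel lists (department names and the position where each department starts), and the standard-library `bisect_left` performs the binary searches. The function names are hypothetical.

```python
from bisect import bisect_left


def build_indices(courses):
    """Walk the sorted list and record where each department name starts."""
    departments, starts = [], []
    for position, course in enumerate(courses):
        dept = course.split()[0]
        if not departments or departments[-1] != dept:  # department changed
            departments.append(dept)
            starts.append(position)
    return departments, starts


def search_with_indices(courses, departments, starts, target):
    """Search the indices first, then only that department's courses."""
    dept = target.split()[0]
    d = bisect_left(departments, dept)  # binary search on the indices
    if d == len(departments) or departments[d] != dept:
        return -1
    begin = starts[d]
    end = starts[d + 1] if d + 1 < len(starts) else len(courses)
    k = bisect_left(courses, target, begin, end)  # binary search on the data
    return k if k < end and courses[k] == target else -1


courses = ["AAS 391", "ANT 201", "COS 302", "FRS 135", "FRS 137", "GER 306"]
departments, starts = build_indices(courses)
print(departments, starts)
print(search_with_indices(courses, departments, starts, "FRS 137"))
print(search_with_indices(courses, departments, starts, "COS 111"))
```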

Building a web search engine

• Crawl the web

• Organize the results for fast query processing

• Process queries

Crawl the web

• Every month use TCP/IP to go to all reachable web pages
– 1.5B pages, 10 Kbytes/page, so 15 terabytes
– Can compress an average page to 3 Kbytes

• Numeracy
– Crawl 1.5B pages in 14 days, so roughly
  • Crawl 100M pages per day
  • Crawl 4M pages per hour
  • Crawl 1,000 pages per second
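The arithmetic behind these estimates can be checked with a few lines of Python. The sketch assumes 1 Kbyte = 1,000 bytes and 1 terabyte = 10^12 bytes; the function name `crawl_rates` is hypothetical.

```python
PAGES = 1_500_000_000          # 1.5B pages
BYTES_PER_PAGE = 10_000        # 10 Kbytes, taking 1 Kbyte = 1,000 bytes (assumption)
COMPRESSED_BYTES = 3_000       # 3 Kbytes after compression
DAYS = 14


def crawl_rates(pages, days):
    """Return the required crawl rate per day, per hour, and per second."""
    per_day = pages / days
    per_hour = per_day / 24
    per_second = per_hour / 3600
    return per_day, per_hour, per_second


print(PAGES * BYTES_PER_PAGE / 1e12, "terabytes uncompressed")
print(PAGES * COMPRESSED_BYTES / 1e12, "terabytes compressed")

per_day, per_hour, per_second = crawl_rates(PAGES, DAYS)
print(round(per_day), "pages per day")
print(round(per_hour), "pages per hour")
print(round(per_second), "pages per second")
```

The exact figures come out slightly above the slide's round numbers (about 107M per day, 4.5M per hour, and 1,240 per second), which is the level of precision this kind of estimate needs.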

Organize the results

• Put into alphabetical order

• Build indices

• Make multiple copies so that searching can proceed in parallel

• When you update, you rebuild the indices

Process queries

• Look up indices

• Look up words/phrases
– Advertiser can buy a word or phrase

• This search gives you internal addresses of web pages
– Look them up to build results page

Searching time

• Want to answer a query in less than ½ second

• Use PageRank to get good results

PageRank

• The web is a collection of links
– A document’s importance is determined by
  • How many pages point to it
  • How important those pages are
– This is its PageRank

• Used for determining
– How often to crawl a page
– How to order pages presented
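The idea that a page is important when important pages point to it can be sketched as a repeated averaging process. The code below is a simplified illustration, not the search engine's actual method: the damping factor of 0.85, the fixed 50 iterations, and the four-page example web are all assumptions made for the sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it points to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links shares its rank with every page.
            targets = outgoing or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share  # pass importance along each link
        rank = new_rank
    return rank


# Hypothetical four-page web: A, B and D all point to C; C points to A.
web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this example C ranks highest because three pages point to it, and A ranks above B and D because its single inbound link comes from the important page C.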

Remaining subtask

• Matching strings
– Is this the word computer?

• Comparing strings
– Did the word computer occur before or after?

How does string matching work?

• State machines
– Move along states as long as you keep matching
– Back off when you miss a match

State machine – looking for abcd

[Diagram: states Sa, Sb, Sc, Sd and OK in a row. Read a moves from Sa to Sb, Read b from Sb to Sc, Read c from Sc to Sd, and Read d from Sd to OK; three transitions labeled Other lead back.]

What happens if input is abccadbacabcd?

Sa Sb Sc Sd Sa Sb Sa Sa Sb Sa Sb Sc Sd OK

What happens if input is abcabcd?

Sa Sb Sc Sd Sa Sa Sa Sa

State machine – looking for abcd

[Diagram: the same machine with three added transitions labeled Read a.]
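The machine with the added Read a transitions can be sketched in Python. The sketch assumes that each added Read a transition leads to Sb (the state reached after reading an a), which is the natural reading of the diagram but is not spelled out in the transcript. It is specific to the pattern abcd, whose letters are all different; the function name `find_abcd` is hypothetical.

```python
def find_abcd(text):
    """Run the state machine on text; return (states visited, found?)."""
    pattern = "abcd"
    names = ["Sa", "Sb", "Sc", "Sd", "OK"]
    state = 0
    trace = [names[state]]
    for ch in text:
        if ch == pattern[state]:
            state += 1   # keep matching: move along to the next state
        elif ch == "a":
            state = 1    # added "Read a" transition: this a may start a match
        else:
            state = 0    # "Other": back off to the start
        trace.append(names[state])
        if state == len(pattern):
            return trace, True
    return trace, False


trace, found = find_abcd("abcabcd")
print(" ".join(trace), found)  # prints Sa Sb Sc Sd Sb Sc Sd OK True
```

With the extra transitions the input abcabcd is matched, whereas the first machine falls back to Sa on the second a and misses it.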

Larger search challenges

• Allow strings to have don’t cares
– Starts with a and ends with e
– Has some number of copies of the substring ab

• Finding strings close to your string
– For spelling correction

Algorithms -- summary

• Methods of modeling processes

• Understand at a high level

• Make sure your reasoning is correct

• Worry about efficiency in situations where that matters

• Write as pseudocode

What’s next

• Problems for which there are no algorithms

• Problems for which all algorithms run slowly

• Applications of problems where algorithms run slowly