Algorithms, November 27, 2001
Post on 22-Dec-2015
Administrivia
• Homework Assignment 6
  – If you forgot to put your name on it, let me know
• Homework Assignment 7
  – Due next Tuesday
• Lab 6 (Visual Basic Part 2)
  – This week; due Friday
The big picture
• We built a computer
• We built an operating system to control the computer
• We attached the computer to a network
• We wrote a compiler to make programming the computer easier
• We share CPU and disk across the network
• Need to talk about algorithms
Algorithms
• Recipes for doing computations
• The underpinnings of programming
  – Think out your algorithm
  – Show that it works
  – Determine its efficiency
  – Write it as a program
When do we use algorithms?
• Always!
• Assignment 5
  – Step 1 -- Create a message of between 150 and 200 characters that you wish to transmit.
  – Step 2 -- Give an encoding of the alphabet.
  – Step 3 -- Use the compression ideas we discussed to compress your message.
  – Step 4 -- Write your compressed message as a sequence of hexadecimal digits in this encoding.
  – Step 5 -- Now you are ready to create the message to be hidden. Your message will …
  – Step 6 -- We now consider a picture that could be displayed on your web page.
Examples of problems
• Baking cookies
• Putting things in alphabetical order
• Being a web search engine
Chocolate chip cookies
• Input
  – flour (2 ¼ c)
  – baking soda (1 t)
  – salt (1 t)
  – butter (1 c)
  – granulated sugar (3/4 c)
  – brown sugar (3/4 c)
  – vanilla (1 t)
  – eggs (2)
  – chocolate chip morsels (2 c)
  – chopped nuts (1 c)
• Output
  – 5 dozen cookies
Chocolate chip cookies
• Steps in the algorithm
  – Combine flour, baking soda, and salt in small bowl
  – Beat butter, granulated sugar, brown sugar, and vanilla in large bowl
  – Add eggs one at a time, beating after adding each egg
  – Gradually beat in flour mixture
  – Stir in morsels and nuts
  – Drop by rounded tablespoons onto ungreased baking sheets
  – Bake 9-11 minutes
  – Let stand for 2 minutes
Chocolate chip cookie algorithm
• Primitives
  – Inputs
    • Flour, baking soda, salt, butter, brown sugar, granulated sugar, vanilla, egg, morsels, nuts
    • Alternatively, chocolate chip cookie mix
    • Alternatively, wheat, sugar cane, hen, …
  – Operators
    • Combine, Beat, Gradually beat, Stir, Drop, Bake, Let stand
Chocolate chip cookie algorithm
• Execution
  – First 2 steps can be done in parallel?
    • Parbegin (Combine(), Beat()) Parend
  – Machine dependencies
    • Ovens vary (Bake 9-11 minutes)
    • Ingredients vary and so need to be handled differently
Chocolate chip cookie algorithm
• Algorithm testing
  – Proof of the pudding is in the eating
  – How do we mechanize this?
Chocolate chip cookie algorithm
• Comparing different algorithms
  – Quality of input/output map
  – User time
  – Machine (oven) time
Putting things in alphabetical order
• Data set sizes
  – Course list for COS 111: 40 students
  – PU directory assistance: 10,000 people
  – Manhattan phone book: 1 million people
  – Social Security database: 1 billion records
  – Long distance call billing records: 100 billion/year
• Different methods for different tasks
  – Fast for large
  – Simple for small
A simple method for sorting
• Find smallest value -- put it first in list
• Find second smallest value -- put it second
• …
• Find next smallest value -- put it next
• …
• When no more values, you’re done
A simple method for sorting
• To sort array x = {x[1], x[2], … , x[n]}
For I = 1 to n
    For J = I+1 to n
        If (x[I] > x[J]) Then swap their values
    Next
Next
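The loop above can be sketched in Python as follows. This is a minimal illustration rather than the course's own code: the function name `simple_sort` is made up for this example, and Python lists are indexed from 0 where the slide counts from 1.

```python
def simple_sort(x):
    """Sort the list x in place using the double loop from the slide.

    After the pass for position i, x[i] holds the smallest value
    among x[i], x[i+1], ..., x[n-1].
    """
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] > x[j]:
                # Swap so the smaller value moves to position i.
                x[i], x[j] = x[j], x[i]
    return x


print(simple_sort([157, 227, 345, 134]))
```

The inner loop runs n-1 times, then n-2 times, and so on, which adds up to n(n-1)/2 comparisons.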
Another sorting algorithm
• Sorting by Merging
• Key idea: it’s easy to merge 2 sorted lists
• Sort larger lists by
  – Sorting smaller lists
  – Merging the results
• How do we sort smaller lists?
Merging 2 sorted lists
List 1: 190 219 463
List 2: 155 255 355
Merged: 155 190 219 255 355 463
Finished when at the end of each list
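One way to write the merge step is sketched below in Python. The function name `merge` and the choice to return a new list are assumptions made for this example; the slide only shows the two inputs and the merged result.

```python
def merge(a, b):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller of the two front elements.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # One list is used up; copy whatever is left of the other.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


print(merge([190, 219, 463], [155, 255, 355]))
```

Each element is looked at once, so merging two lists with n elements in total takes about n steps.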
Sort then merge
Start: 157 227 345 134
Subdivide: 157 227 | 345 134
Sort pieces by merging: 157 227 | 134 345
Merge: 134 157 227 345
SortMerge algorithm
' Sorts x[Lo..Hi]; call SortMerge(x, 1, n) to sort the whole array
Function SortMerge(x, Lo, Hi)
    If Lo >= Hi Then
        Return
    End If
    Mid = (Lo + Hi) \ 2
    SortMerge(x, Lo, Mid)
    SortMerge(x, Mid + 1, Hi)
    Merge(x, Lo, Mid, Mid + 1, Hi)
End Function
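A runnable Python sketch of the same idea follows. It differs from the pseudocode in one labeled way: instead of sorting a range of x in place, it returns a new sorted list, which keeps the example short. The names `sort_merge` and `merge` are chosen for this example.

```python
def merge(a, b):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


def sort_merge(x):
    """Return a sorted copy of x: subdivide, sort the pieces, merge."""
    if len(x) <= 1:
        # Stopping case: a list of 0 or 1 items is already sorted.
        return list(x)
    mid = len(x) // 2
    return merge(sort_merge(x[:mid]), sort_merge(x[mid:]))


print(sort_merge([157, 227, 345, 134]))
```

The stopping case matters: without it the function would keep calling itself on empty or one-element lists forever.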
Does it work?
• Have to be careful about stopping
• There are always a lot of things going on
[Figure: call tree. Sort(n) makes two Sort(n/2) calls followed by a Merge; each Sort(n/2) makes two Sort(n/4) calls followed by a Merge; each Sort(n/4) makes two Sort(n/8) calls followed by a Merge; and so on.]
Divide and conquer
• Use recursion
  – Reduce solving a problem of size n to solving two problems of size n/2
  – Then combine the solutions
• S(n) = 2 S(n/2) + M(n/2, n/2)
• Solving a sorting problem of size n requires solving 2 sorting problems of size n/2 and doing a merge of 2 sets of size n/2
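The recurrence can be unrolled to estimate the total work. This is a sketch under two assumptions not stated on the slide: merging two lists of size n/2 takes at most n steps, so M(n/2, n/2) ≤ n, and n is a power of 2 with S(1) = 0.

```latex
\begin{align*}
S(n) &\le 2\,S(n/2) + n \\
     &\le 4\,S(n/4) + 2n \\
     &\le 2^{k}\,S(n/2^{k}) + kn \\
     &= n\,S(1) + n\log_2 n = n\log_2 n \qquad (\text{taking } k = \log_2 n)
\end{align*}
```

By comparison, the double loop of the simple method makes n(n-1)/2 comparisons. For n = 1,000,000 that is about 5 × 10^11, against about 2 × 10^7 for n log2 n.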
Comparing running times
N            Insertion (ms)   SortMerge (ms)
100                       1                0
200                       2                0
1,000                    58                1
10,000                5,841               11
100,000             626,943              162
1,000,000        70,626,916            3,421
Reducing 20 hours to 3 seconds
Searching
• Once a list is in alphabetical order, how do you find things in it?
• For example, is COS 111 on the list of courses that satisfy the (EC) Epistemology and Cognition requirement?
EC courses
AAS 391  ANT 201  COS 302  FRS 135  FRS 137  GER 306
HUM 365  LIN 213  LIN 302  LIN 306  LIN 315  PHI 200
PHI 201  PHI 204  PHI 301  PHI 304  PHI 312  PHI 321
PHI 333  PHI 338  PSY 255  PSY 306  PSY 307  PSY 316
Searching for COS 111
Compare to the middle
AAS 391  ANT 201  COS 302  FRS 135  FRS 137  GER 306
HUM 365  LIN 213  LIN 302  LIN 306  LIN 315  PHI 200
PHI 201  PHI 204  PHI 301  PHI 304  PHI 312  PHI 321
PHI 333  PHI 338  PSY 255  PSY 306  PSY 307  PSY 316
Looking for: COS 111
Searching
Compare to the middle
If smaller search first half
If larger search second half
AAS 391  ANT 201  COS 302  FRS 135  FRS 137  GER 306
HUM 365  LIN 213  LIN 302  LIN 306  LIN 315  PHI 200
PHI 201  PHI 204  PHI 301  PHI 304  PHI 312  PHI 321
PHI 333  PHI 338  PSY 255  PSY 306  PSY 307  PSY 316
Looking for: COS 111
Repeat
Compare to the middle
If smaller search first half
If larger search second half
AAS 391  ANT 201  COS 302  FRS 135  FRS 137  GER 306
HUM 365  LIN 213  LIN 302  LIN 306  LIN 315  PHI 200
Looking for: COS 111
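The "compare to the middle, then repeat on one half" procedure can be sketched in Python as below. The function name `binary_search` and the convention of returning -1 when the item is absent are choices made for this example; the course list is the one shown on the slides.

```python
EC_COURSES = [
    "AAS 391", "ANT 201", "COS 302", "FRS 135", "FRS 137", "GER 306",
    "HUM 365", "LIN 213", "LIN 302", "LIN 306", "LIN 315", "PHI 200",
    "PHI 201", "PHI 204", "PHI 301", "PHI 304", "PHI 312", "PHI 321",
    "PHI 333", "PHI 338", "PSY 255", "PSY 306", "PSY 307", "PSY 316",
]


def binary_search(items, target):
    """Return the position of target in the sorted list items, or -1."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if target < items[mid]:
            hi = mid - 1   # smaller: search the first half
        else:
            lo = mid + 1   # larger: search the second half
    return -1              # the range is empty, so target is not in the list


print(binary_search(EC_COURSES, "COS 111"))
print(binary_search(EC_COURSES, "COS 302"))
```

Each comparison halves the remaining range, so a list of 24 courses needs at most 5 comparisons.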
Building indices
Data:
AAS 391  ANT 201  COS 302  FRS 135  FRS 137  GER 306
HUM 365  LIN 213  LIN 302  LIN 306  LIN 315  PHI 200
PHI 201  PHI 204  PHI 301  PHI 304  PHI 312  PHI 321
PHI 333  PHI 338  PSY 255  PSY 306  PSY 307  PSY 316
Indices:
AAS  ANT  COS  FRS  GER  HUM  LIN  PHI  PSY
Search indices then data
Data:
AAS 391  ANT 201  COS 302  FRS 135  FRS 137  GER 306
HUM 365  LIN 213  LIN 302  LIN 306  LIN 315  PHI 200
PHI 201  PHI 204  PHI 301  PHI 304  PHI 312  PHI 321
PHI 333  PHI 338  PSY 255  PSY 306  PSY 307  PSY 316
Indices:
AAS  ANT  COS  FRS  GER  HUM  LIN  PHI  PSY
Looking for: COS 111
How do we describe algorithms?
• Pseudocode
  – Combines English and Visual Basic constructs
  – Works with various types of primitives
    • Could be + - / *
    • Could be more complex things
  – Describes how data is organized
  – Describes operations on the data
  – Is meant to be higher level than programming
Searching with indices (pseudocode)
• Build the indices
  – Do this by going through the list and determining where department names change
  – Store the results in an array called Indices
• Search the indices
  – Do a binary search on the array Indices
    • Do this by comparing to the middle element
    • Then binary search the upper half
    • Or binary search the lower half
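The pseudocode above might look like this in Python. This is a sketch under assumptions: the department is taken to be the text before the first space, the function names are invented for this example, and Python's standard `bisect_left` stands in for a hand-written binary search.

```python
from bisect import bisect_left

EC_COURSES = [
    "AAS 391", "ANT 201", "COS 302", "FRS 135", "FRS 137", "GER 306",
    "HUM 365", "LIN 213", "LIN 302", "LIN 306", "LIN 315", "PHI 200",
    "PHI 201", "PHI 204", "PHI 301", "PHI 304", "PHI 312", "PHI 321",
    "PHI 333", "PHI 338", "PSY 255", "PSY 306", "PSY 307", "PSY 316",
]


def build_indices(courses):
    """Walk the sorted list and record where each department starts."""
    departments, starts = [], []
    for pos, course in enumerate(courses):
        dept = course.split()[0]
        if not departments or departments[-1] != dept:
            departments.append(dept)   # department name changed here
            starts.append(pos)
    return departments, starts


def search_with_indices(courses, departments, starts, target):
    """Binary search the indices first, then only that department's block."""
    dept = target.split()[0]
    k = bisect_left(departments, dept)
    if k == len(departments) or departments[k] != dept:
        return False                   # no such department at all
    begin = starts[k]
    end = starts[k + 1] if k + 1 < len(starts) else len(courses)
    j = bisect_left(courses, target, begin, end)
    return j < end and courses[j] == target


departments, starts = build_indices(EC_COURSES)
print(departments)
print(search_with_indices(EC_COURSES, departments, starts, "COS 111"))
```

When the list is updated, `build_indices` has to be run again so that the start positions stay correct.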
Building a web search engine
• Crawl the web
• Organize the results for fast query processing
• Process queries
Crawl the web
• Every month use TCP/IP to go to all reachable web pages
  – 1.5B pages, 10 Kbytes/page, so 15 terabytes
• Can compress an average page to 3 Kbytes
• Numeracy
  – Crawl 1.5B pages in 14 days, so
    • Crawl about 100M pages per day
    • Crawl about 4M pages per hour
    • Crawl about 1,000 pages per second
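The arithmetic behind these round numbers can be checked with a few lines of Python. The only assumption added here is that 1 Kbyte is counted as 1,000 bytes and 1 terabyte as 10^12 bytes; the variable names are made up for this example.

```python
PAGES = 1_500_000_000          # 1.5B pages
BYTES_PER_PAGE = 10_000        # 10 Kbytes, taking 1 Kbyte = 1,000 bytes
COMPRESSED_BYTES = 3_000       # 3 Kbytes per page after compression
CRAWL_DAYS = 14

raw_terabytes = PAGES * BYTES_PER_PAGE / 1e12
compressed_terabytes = PAGES * COMPRESSED_BYTES / 1e12

pages_per_day = PAGES / CRAWL_DAYS
pages_per_hour = pages_per_day / 24
pages_per_second = pages_per_hour / 3600

print(f"raw size:        {raw_terabytes:.1f} TB")
print(f"compressed size: {compressed_terabytes:.1f} TB")
print(f"per day:    {pages_per_day:,.0f}")
print(f"per hour:   {pages_per_hour:,.0f}")
print(f"per second: {pages_per_second:,.0f}")
```

The exact rates come out somewhat above the slide's figures (roughly 107M per day, 4.5M per hour, 1,240 per second), which the slide rounds down for easy mental arithmetic.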
Organize the results
• Put into alphabetical order
• Build indices
• Make multiple copies so that searching can proceed in parallel
• When you update, you rebuild the indices
Process queries
• Look up indices
• Look up words/phrases
  – Advertiser can buy a word or phrase
• This search gives you internal addresses of web pages
  – Look them up to build results page
PageRank
• The web is a collection of links
  – A document’s importance is determined by
    • How many pages point to it
    • How important those pages are
  – This is its PageRank
• Used for determining
  – How often to crawl a page
  – How to order pages presented
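The circular definition (a page is important if important pages point to it) can be resolved by repeated updating. The sketch below is one simplified way to do that, not a description of any search engine's actual implementation. The damping value 0.85, the iteration count, the function name `pagerank`, and the four-page toy web are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it points to.

    Every page that is pointed to must also appear as a key.
    Returns a dict of scores that sum to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}        # start with equal importance
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its importance on, split among its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its importance evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: A, B, and D all point to C; C points to A.
toy_web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(toy_web).items()):
    print(page, round(score, 3))
```

In this toy web C ranks highest because three pages point to it, and A ranks above B and D because the one page pointing to A is itself important.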
Remaining subtask
• Matching strings
  – Is this the word computer?
• Comparing strings
  – Did the word computer occur before or after?
How does string matching work?
• State machines
  – Move along states as long as you keep matching
  – Back off when you miss a match
State machine – looking for abcd
[Figure: states Sa, Sb, Sc, Sd, OK. Read a: Sa to Sb. Read b: Sb to Sc. Read c: Sc to Sd. Read d: Sd to OK. Other: back to Sa.]
What happens if input is abccadbacabcd?
Sa Sb Sc Sd Sa Sb Sa Sa Sb Sa Sb Sc Sd OK
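The trace above can be reproduced with a short Python sketch of this state machine. The function name `trace_naive` is invented for this example, and the state numbering (0 for Sa up to 4 for OK) is an implementation detail not shown on the slide.

```python
def trace_naive(text, pattern="abcd"):
    """Run the simple state machine and return the states it visits.

    State k means "the last k characters matched the start of pattern".
    On a match move one state forward; on anything else go back to Sa.
    """
    names = ["S" + ch for ch in pattern] + ["OK"]
    state = 0
    visited = [names[state]]
    for ch in text:
        if state == len(pattern):
            break                       # already in OK, stop reading
        state = state + 1 if ch == pattern[state] else 0
        visited.append(names[state])
    return " ".join(visited)


print(trace_naive("abccadbacabcd"))
```

The machine reads each input character exactly once and never backs up in the input.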
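State machine – looking for abcd
[Figure: states Sa, Sb, Sc, Sd, OK. Read a: Sa to Sb. Read b: Sb to Sc. Read c: Sc to Sd. Read d: Sd to OK. Other: back to Sa.]
What happens if input is abcabcd?
Sa Sb Sc Sd Sa Sa Sa Sa
The machine never reaches OK, although the input contains abcd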
State machine – looking for abcd
[Figure: states Sa, Sb, Sc, Sd, OK. Read a: Sa to Sb. Read b: Sb to Sc. Read c: Sc to Sd. Read d: Sd to OK. Read a from Sb, Sc, or Sd: go to Sb. Other: back to Sa.]
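The extra "Read a" arrows can be sketched in Python as follows. The function name `trace_with_restart` is invented for this example. Jumping to Sb whenever an a is read is enough for abcd because its first letter does not occur again inside the pattern; patterns with repeated letters would need more back-off arrows than this sketch provides.

```python
def trace_with_restart(text, pattern="abcd"):
    """State machine with extra arrows: reading the pattern's first
    letter after a mismatch goes to the second state, not the start."""
    names = ["S" + ch for ch in pattern] + ["OK"]
    state = 0
    visited = [names[state]]
    for ch in text:
        if state == len(pattern):
            break                       # already in OK, stop reading
        if ch == pattern[state]:
            state += 1                  # keep matching
        elif ch == pattern[0]:
            state = 1                   # "Read a": a new match may start here
        else:
            state = 0                   # "Other": back to the start
        visited.append(names[state])
    return " ".join(visited)


print(trace_with_restart("abcabcd"))
```

With the extra arrows the input abcabcd now ends in OK, because the second a is treated as the possible start of a new match.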
Larger search challenges
• Allow strings to have don’t cares
  – Starts with a and ends with e
  – Has some number of copies of the substring ab
• Finding strings close to your string
  – For spelling correction
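Both challenges can be illustrated with a short Python sketch. The patterns below are one reading of the two "don't care" examples, written with Python's standard `re` module. Edit distance (the number of single-character insertions, deletions, and substitutions) is a common measure of closeness, swapped in here as an assumption because the slide does not name a specific measure.

```python
import re

# One reading of the slide's examples, as regular expressions.
STARTS_A_ENDS_E = re.compile(r"a.*e")     # starts with a, ends with e
COPIES_OF_AB = re.compile(r"(ab)+")       # ab repeated one or more times


def edit_distance(s, t):
    """Fewest single-character inserts, deletes, or substitutions
    needed to turn s into t."""
    prev = list(range(len(t) + 1))        # distances from "" to prefixes of t
    for i, cs in enumerate(s, start=1):
        cur = [i]                         # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            cur.append(min(prev[j] + 1,          # delete cs
                           cur[j - 1] + 1,       # insert ct
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]


print(bool(STARTS_A_ENDS_E.fullmatch("apple")))
print(bool(COPIES_OF_AB.fullmatch("ababab")))
print(edit_distance("corection", "correction"))
```

A spelling corrector could suggest the dictionary words with the smallest edit distance to what was typed.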
Algorithms -- summary
• Methods of modeling processes
• Understand at a high level
• Make sure your reasoning is correct
• Worry about efficiency in situations where that matters
• Write as pseudocode