CS 240: Data Structures, Tuesday, July 24th: Searching, Hashing, Graphs


Upload: maud-french

Post on 18-Dec-2015


CS 240: Data Structures

Tuesday, July 24th

Searching, Hashing

Graphs

Assignments

Self-Revisions for Lab 5 and Lab 6 are due next Tuesday, July 31st, at 4pm (the writeup is pivotal in terms of a grade).

Project 1 revisions need to be submitted by next Thursday, August 2nd, at 4pm.
New defenses can be scheduled.

The Test

The code for Lab 7 will be released early (before the lab is due).
Make sure you understand the questions.
Your Lab 7 grade will be based on your answer to the Lab 7 questions on the test.

Know the first three sorts and either quicksort or mergesort.
You need to understand how to do a radix sort.

Know two of the lists from the presentation today; one should be what you actually did today.

Puppets.

Searches

Linear Search
Binary Search

Not too good with a linked list…
This is why we add more links (trees!)

Hashing, an insertion/searching combo
See Chapter 9.3, Hash Tables (p. 530)

Hashing

What is hashing?
Corned beef hash:

Hash(Corn(Beef)): [image not preserved]

Hashing

Hash Browns:

Hash( ): [image not preserved]

Hashing

So, what does this have to do with anything?

Well….

Maybe we should look at real hash browns…

Much better!

Hashing

The point is:

We have no idea what is in corned beef hash.

We have no idea what is in hash browns.

However,

Hash(Corn(Beef)): [image not preserved]

No matter what Beef is. Beef is generic!

The same with hash browns too…

Hashing

Hashing lets us represent some “data” with some smaller “data”.

The representation is not perfect! Look at the corned beef hash!

But it is consistent. That makes it useful!

Hashing

Ok, back to seriousness for a moment:

Remember the algorithmic complexity of our various searches?
Linear Search =
Binary Search =
Balanced Binary Search =
Why do we care if it is balanced?

Because this tree is as bad as a linear search!

We’ll leave fixing this for another time.
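For reference, here is a minimal sketch of the first two searches. The function names and the use of a sorted vector of names are assumptions made for illustration, not code from the course. Linear search may look at every element, so it is O(n); binary search halves the range each step, so it is O(log n), but only works on sorted data with fast random access (which is why it is not too good with a linked list).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Linear search: check every element in order. O(n) in the worst case.
// Returns the index of target, or -1 if it is absent.
int linearSearch(const std::vector<std::string>& items, const std::string& target)
{
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (items[i] == target) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Binary search: items must be sorted in ascending order. O(log n).
// Returns the index of target, or -1 if it is absent.
int binarySearch(const std::vector<std::string>& items, const std::string& target)
{
    std::size_t low = 0;
    std::size_t high = items.size(); // one past the last candidate
    while (low < high) {
        std::size_t mid = low + (high - low) / 2;
        if (items[mid] == target) {
            return static_cast<int>(mid);
        }
        if (items[mid] < target) {
            low = mid + 1;  // target can only be in the upper half
        } else {
            high = mid;     // target can only be in the lower half
        }
    }
    return -1;
}
```

A badly unbalanced tree loses the halving step, which is why it degrades to the linear case.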

Hashing

Other than making corned beef, there are other, more useful, hashing schemes:

Consider this:

Instead of putting all the records on computers, Binghamton University decides to keep only paper records of grades due to malicious CS students changing their grades each morning.

Now, you need some money. You get this cushy work-study job pulling up folders to answer grade requests. Sounds good, right?

Hashing

So, if I ask you for the grades for “El Zilcho” (first name “El”, last name “Zilcho”) how do you find them?

Linear search right? We start from Alan Aardvark!

You are a born bureaucrat!

You start by going to “Z”. But, how did you know to do that (if nobody suggested this, stop lesson, go home and cry)?

Hashing

Hashing by first letter is a common hash.

With a small enough list we can search pretty quickly!

// firstletterhash represents h(x)
// tohash represents x
// Requires #include <string> and using namespace std.
// Note: this is the character code mod 26, so 'A' (65) maps to 13, not 0.
int firstletterhash(string tohash)
{
    return (int(tohash.at(0)) % 26);
}
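The function on the slide takes the character code modulo 26, so with ASCII 'A' (65) lands in slot 13 and 'a' (97) lands in slot 19. That is still a consistent hash, but it does not match the filing-cabinet picture of "go to Z". Below is a sketch of one possible variant that maps letters to their alphabetical position. The name alphabeticalFirstLetterHash and the choice to return -1 for bad input are assumptions for illustration only.

```cpp
#include <cctype>
#include <string>

// Hypothetical variant of the first-letter hash:
// 'A' or 'a' -> 0, 'B' or 'b' -> 1, ..., 'Z' or 'z' -> 25.
// Assumes an ASCII-compatible character set and the default "C" locale.
// Returns -1 for an empty string or a key that does not start with a letter.
int alphabeticalFirstLetterHash(const std::string& tohash)
{
    if (tohash.empty()) {
        return -1;
    }
    unsigned char first = static_cast<unsigned char>(tohash[0]);
    if (!std::isalpha(first)) {
        return -1;
    }
    return std::toupper(first) - 'A';
}
```

With this variant "Zilcho" goes to slot 25 and both "Apple" and "apple" go to slot 0.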

Hashing

The first letter implementation requires that we have 26 entries.

If we only have a few entries we are wasting space!

A tradeoff decision must be made!

What are the tradeoffs?

Hashing

Ok, we are done.

•You know all there is to know about hashing.
•Cool.
•A winner is you.

Alright, quick quiz. Let us make a first-letter hash table.

Add the following: Apple

Alabama

Uh oh. Now what?

Hashing

We have a collision!

One solution is linear probing:

Finding stuff isn’t too much harder.
What about deleting stuff?
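A minimal sketch of linear probing over a 26-slot table, using the same first-letter hash as the slide. The class name ProbingTable and the "tombstone" approach to deletion are assumptions made for this example; the course may handle deletion differently. The idea: on a collision, walk forward one slot at a time until a free slot turns up. A deleted slot is marked rather than emptied, so that a later search does not stop early and miss keys stored past it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical open-addressing table: first-letter hash plus linear probing.
class ProbingTable
{
public:
    ProbingTable() : slots(26), states(26, EMPTY) {}

    // Stores key in the first free slot at or after its home slot.
    // Returns false if the key is empty or the table is full.
    bool insert(const std::string& key)
    {
        if (key.empty()) return false;
        if (find(key) >= 0) return true; // already stored
        std::size_t start = hash(key);
        for (std::size_t step = 0; step < slots.size(); ++step) {
            std::size_t i = (start + step) % slots.size();
            if (states[i] != FULL) {     // never used, or a tombstone
                slots[i] = key;
                states[i] = FULL;
                return true;
            }
        }
        return false;
    }

    // Returns the slot index holding key, or -1 if it is absent.
    int find(const std::string& key) const
    {
        if (key.empty()) return -1;
        std::size_t start = hash(key);
        for (std::size_t step = 0; step < slots.size(); ++step) {
            std::size_t i = (start + step) % slots.size();
            if (states[i] == EMPTY) return -1; // a never-used slot ends the probe
            if (states[i] == FULL && slots[i] == key) return static_cast<int>(i);
        }
        return -1;
    }

    // Marks the slot as a tombstone so later probes keep walking past it.
    bool remove(const std::string& key)
    {
        int i = find(key);
        if (i < 0) return false;
        states[i] = DELETED;
        return true;
    }

private:
    enum State { EMPTY, FULL, DELETED };

    // Same idea as firstletterhash on the slide: character code mod 26.
    static std::size_t hash(const std::string& key)
    {
        return static_cast<std::size_t>(static_cast<unsigned char>(key.at(0))) % 26;
    }

    std::vector<std::string> slots;
    std::vector<State> states;
};
```

With this hash "Apple" and "Alabama" both have home slot 13, so "Alabama" ends up in slot 14. Removing "Apple" leaves a tombstone in slot 13, and "Alabama" can still be found.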

Hashing

Some options:

Larger table
Different collision scheme
Better hash function (MD5?)

Protip: Hash tables should be about 1.5-2 times as large as the number of items to store to keep collisions low.

But… I really like linear probing

The point is: “too bad”
You have more to learn!

There is always more. Look how long I’ve been here…. No, don’t. It makes me feel old.

But… I really like linear probing

Linear probing can cluster data!

You can probe quadratically:

i – 1, i + 1, i – 4, i + 4, i – 9, i + 9, …, i – n², i + n²
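The probe order above can be sketched as a small function that lists the slots visited, wrapping around the ends of the table. The name quadraticProbeOrder and its parameters are hypothetical, and the sketch assumes home is a valid index (home < tableSize).

```cpp
#include <cstddef>
#include <vector>

// Lists the slots visited by the alternating quadratic probe:
// home - 1, home + 1, home - 4, home + 4, ..., home - n*n, home + n*n,
// for n = 1..maxN, wrapping around a table of tableSize slots.
std::vector<std::size_t> quadraticProbeOrder(std::size_t home,
                                             std::size_t tableSize,
                                             std::size_t maxN)
{
    std::vector<std::size_t> order;
    if (tableSize == 0) {
        return order;
    }
    for (std::size_t n = 1; n <= maxN; ++n) {
        std::size_t offset = (n * n) % tableSize;
        // Adding tableSize before subtracting keeps the unsigned value from underflowing.
        order.push_back((home + tableSize - offset) % tableSize); // home - n^2
        order.push_back((home + offset) % tableSize);             // home + n^2
    }
    return order;
}
```

For a home slot of 13 in a 26-slot table, the first six probes are slots 12, 14, 9, 17, 4, 22. The jumps grow quickly, which is what breaks up the clusters that linear probing builds.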

Better… but…

How about a secondary hash? These can be really useful!

Casting, Mapping, Folding, Shifting
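One way to read "folding" and "shifting" is sketched below; the exact schemes meant in lecture are not given on the slide, so treat these as assumptions. foldingHash adds up every character instead of looking only at the first letter. stepHash is a hypothetical secondary hash built from shifts, used as the jump size when the home slot is taken (double hashing). The sketch assumes tableSize is at least 2; a prime table size is the usual choice so that the probe can reach every slot.

```cpp
#include <cstddef>
#include <string>

// Folding: combine every character of the key, not just the first one.
std::size_t foldingHash(const std::string& key, std::size_t tableSize)
{
    std::size_t sum = 0;
    for (unsigned char c : key) {
        sum += c;
    }
    return sum % tableSize;
}

// Hypothetical secondary hash (shift and XOR), used as a probe step.
// The result is in 1..tableSize-1, so the step is never zero.
std::size_t stepHash(const std::string& key, std::size_t tableSize)
{
    std::size_t h = 0;
    for (unsigned char c : key) {
        h = (h << 5) ^ c;
    }
    return 1 + h % (tableSize - 1);
}

// Slot examined on the given attempt; attempt 0 is the home slot.
std::size_t doubleHashProbe(const std::string& key,
                            std::size_t attempt,
                            std::size_t tableSize)
{
    return (foldingHash(key, tableSize) + attempt * stepHash(key, tableSize)) % tableSize;
}
```

With a table of 31 slots, "Apple" folds to slot 2 and "Alabama" folds to slot 20, so the pair that collided under the first-letter hash no longer collides.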

More hashes? Are you sold?

Well, some of you may have thought of this:

Isn’t this similar to the example we started with?

Hashes

How long should the hash function take?
Moreover, why does it matter?

No matter what the data is (as long as it is the correct type) the hash function needs to be able to evaluate it!

Hashes

Some theory:
Load Factor = Num Elements in Table / Table Size

When we don’t use a linked list (we use probing) our load factor should be < 0.5

But, if we do use a linked list then we want the load factor to be closer to 1.

Why?

Open addressing/Closed Hashing: Use Probing

Closed addressing/Open Hashing: Use chaining (linked list)
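A minimal sketch of the chaining side, with the load factor computed as defined above. The class name ChainedTable and its character-sum hash are assumptions for illustration, and the sketch assumes tableSize is greater than 0. Each slot holds a linked list, so a collision just makes that slot's list longer.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical chaining table: each slot holds a linked list of keys.
class ChainedTable
{
public:
    explicit ChainedTable(std::size_t tableSize) : buckets(tableSize), count(0) {}

    // Appends key to its slot's chain unless it is already there.
    void insert(const std::string& key)
    {
        std::list<std::string>& chain = buckets[hash(key)];
        for (const std::string& existing : chain) {
            if (existing == key) {
                return;
            }
        }
        chain.push_back(key);
        ++count;
    }

    // Searches only the one chain the key hashes to.
    bool contains(const std::string& key) const
    {
        for (const std::string& existing : buckets[hash(key)]) {
            if (existing == key) {
                return true;
            }
        }
        return false;
    }

    // Load factor = number of elements in the table / table size.
    double loadFactor() const
    {
        return static_cast<double>(count) / static_cast<double>(buckets.size());
    }

private:
    // Simple character-sum hash, chosen only to keep the example short.
    std::size_t hash(const std::string& key) const
    {
        std::size_t sum = 0;
        for (unsigned char c : key) {
            sum += c;
        }
        return sum % buckets.size();
    }

    std::vector<std::list<std::string>> buckets;
    std::size_t count;
};
```

Unlike a probing table, this one keeps working when the load factor goes past 1; the chains simply get longer, and a search costs about one chain's length.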

Uh oh.

New topic.

You will miss the hash.

Maybe not.

Yes, you can sort the chaining hash table.

Graphs

So far, all of our Nodes only point to one other node.
This changed today with the linked list presentations:
Next and previous pointer
Multiple pointers based on pieces of data

But, they can point to multiple nodes!

Trees

First, we generally don’t count a previous pointer as a pointer.

Our linked lists point to 1 other node (not counting special lists), hence a unary list.

However, we can point to two different nodes. A path “next” and “othernext”.
For a tree: “left” and “right”

Graphs…

We will talk more about trees next week.
A graph has an unlimited number of pointers that can point anywhere.

[Figure not preserved: example graph with a node labeled “Start”]

Representation

Now, our Node needs new data:
A list of Node* instead of just “next”
Some way to select a “next”

Graphs will often take distance between Nodes into account (so far, our distances have been irrelevant)
Hence each Node* is associated with a distance

We can store this as a “pair<Node*,int>”
Requires #include <utility>
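A minimal sketch of such a Node. The member names data and edges, and the nearestNeighbor helper, are assumptions for illustration. "Follow the closest neighbor" is just one possible way to select a “next”, not the only one.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical graph node: a list of (neighbor, distance) pairs replaces "next".
struct Node
{
    std::string data;
    std::vector<std::pair<Node*, int>> edges;
};

// One possible way to select a "next": follow the edge with the smallest distance.
// Returns nullptr when the node has no outgoing edges.
Node* nearestNeighbor(const Node& from)
{
    Node* best = nullptr;
    int bestDistance = 0;
    for (const std::pair<Node*, int>& edge : from.edges) {
        if (best == nullptr || edge.second < bestDistance) {
            best = edge.first;
            bestDistance = edge.second;
        }
    }
    return best;
}
```

A node whose edges list holds exactly one pair behaves like a linked-list node.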

Linked List

A Linked List is a subset of Graph.
It has nodes with only 1 Node* (list size == 1)
And the distance between each Node is the same (no value is needed, but we might as well say 0).