Trees (and sets, and hashing) in Java

Upload: bpfanpage

Post on 26-May-2015

DSA/BP/2006 Trees 2 1

Trees (and sets, and hashing) in Java

Brian Palmer

MACS

No duplication

The structures we have looked at so far (stack, queue, and list) have all allowed duplication

For example, a stack of opening brackets that only allowed one instance of each kind of bracket would not have been much use in the stack coursework

On the other hand, we sometimes want a structure not to allow duplication: all elements must be unique

Reasons for uniqueness

There are various theoretical reasons, not to do with computing, for the usefulness of collections of unique elements

In practical terms, it's a matter of searching

In many applications, we want to search for things

Searching a list returns, by default, only the first matching element. We can't be sure that's the element we want.
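
The point about list searching can be seen in a minimal sketch. The class name, helper name, and data are made up for illustration; only List.indexOf and List.lastIndexOf come from the Java library.

```java
import java.util.Arrays;
import java.util.List;

class FirstMatchDemo {
    // Hypothetical helper: position of the first element equal to target, or -1.
    static int firstIndexOf(List<String> items, String target) {
        return items.indexOf(target);
    }

    public static void main(String[] args) {
        // Illustrative data: two different people who share a surname.
        List<String> surnames = Arrays.asList("Smith", "Jones", "Smith");
        System.out.println(firstIndexOf(surnames, "Smith"));   // 0: the first match
        System.out.println(surnames.lastIndexOf("Smith"));     // 2: another match exists
    }
}
```

The search stops at position 0, so the second Smith is never seen unless we go looking for it separately.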

Sets

In Java, a set is (almost inevitably) an interface

A set is a collection that contains no duplicate elements.

More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element.

As implied by its name, this interface models the mathematical set abstraction.

(These three paragraphs are copied straight from the Java documentation)

The Set<E> interface

This contains methods such as add, clear, contains, equals, isEmpty, iterator, remove, and size

If you attempt to add an element e1 to a set that already contains e1, no change will take place

In a set, the entire element provides the uniqueness

This is fine when we're dealing with simple elements, but it can make looking things up awkward: to find an element, you need the whole element
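
A short sketch of what happens when a duplicate is added. HashSet is assumed here as the implementation (the slide only discusses the Set interface), and the class and method names are illustrative.

```java
import java.util.HashSet;
import java.util.Set;

class SetAddDemo {
    // Hypothetical helper: puts every word into a fresh set and returns it.
    static Set<String> uniqueWords(String... words) {
        Set<String> set = new HashSet<>();
        for (String w : words) {
            set.add(w);   // a repeated word leaves the set unchanged
        }
        return set;
    }

    public static void main(String[] args) {
        Set<String> brackets = new HashSet<>();
        System.out.println(brackets.add("("));   // true: the set changed
        System.out.println(brackets.add("("));   // false: already present, no change
        System.out.println(brackets.size());     // 1
    }
}
```

The boolean returned by add is the only sign that the second call did nothing.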

Maps

Map is yet another Java interface

A map is an object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.

For example, in a collection of persons, each might have a unique id, which would be used as the key, with the other relevant data making up the value to be stored

Likewise, a collection of policies (for insurance, say) would use the policy number as the key, and the rest of the data as the value

The Map<K,V> interface

The interface has methods such as these:

clear()

containsKey(Object key)

containsValue(Object value)

get(Object key)

put(K key, V value)

remove(Object key)

It offers no iterator, but it does offer an entrySet(), which is a kind of Set and does offer an iterator
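
The methods listed above might be used as in this sketch. HashMap is assumed as the implementation, and the ids and names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class MapDemo {
    // Hypothetical data: person id (the key) mapped to a name (the value).
    static Map<Integer, String> samplePersons() {
        Map<Integer, String> persons = new HashMap<>();
        persons.put(1001, "Ada");
        persons.put(1002, "Grace");
        persons.put(1001, "Ada L.");   // same key again: the old value is replaced
        return persons;
    }

    public static void main(String[] args) {
        Map<Integer, String> persons = samplePersons();
        System.out.println(persons.get(1001));           // Ada L.
        System.out.println(persons.containsKey(1003));   // false

        // No iterator on the map itself, so we go through entrySet().
        for (Map.Entry<Integer, String> e : persons.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```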

Speed of access

We are discussing collections where the most common activity will be searching; where searching will be more common than, say, adding and removing

We have looked, in theory, at binary search trees, where searching is O(log n) in the expected case, and also in the worst case provided the tree is kept balanced (an unbalanced tree can degrade to O(n))

We should take a quick look at hashing, whose worst case is dreadful, O(n), but whose expected case is very fast, O(1)

Hash tables

We can regard a hash table as in effect an array

The parts of the array are known, for some reason, as buckets

As in maps, things consist of a key and a value

The key is hashed, and the resulting hashcode is used to determine which bucket the value should be stored in

To retrieve a value, the key is hashed again and the relevant bucket is found

This idea is used so often that every object in Java has a hashCode() method
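
One simple way to turn a hashcode into a bucket number is to take the remainder after dividing by the number of buckets. This is only a sketch of the idea: the method name bucketIndex is made up, and the real java.util.HashMap computes its index differently.

```java
class BucketDemo {
    // Hypothetical helper: maps a key to a bucket number in 0..bucketCount-1.
    // Math.floorMod keeps the result non-negative even for negative hashcodes.
    static int bucketIndex(Object key, int bucketCount) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    public static void main(String[] args) {
        int buckets = 16;   // illustrative table size
        // Storing and retrieving both hash the key, so they reach the same bucket.
        System.out.println(bucketIndex("Palmer", buckets) == bucketIndex("Palmer", buckets));   // true
        // An Integer's hashcode is its own value, so 17 lands in bucket 1.
        System.out.println(bucketIndex(17, buckets));   // 1
    }
}
```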

The problem with hashing

If all goes well, we go straight to the right bucket

However, different keys might produce hashcodes that lead to the same bucket

In that case, there's some work to do, which will be discussed in DSA2

In Java's HashMap, the array is by default never more than three-quarters full: the default load factor is 0.75

This gives a workable balance between wasting space and wasting time on rehash operations
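
A small sketch of a collision. The strings "Aa" and "BB" happen to have the same hashcode in Java, so they must lead to the same bucket; the map still keeps them apart, by the extra work the slide defers to DSA2. The class name and stored values are illustrative, and the capacity and load factor passed to the constructor simply spell out HashMap's defaults.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hypothetical helper: a map holding two keys whose hashcodes are equal.
    static Map<String, Integer> build() {
        // 16 buckets, load factor 0.75: the same values HashMap uses by default.
        Map<String, Integer> map = new HashMap<>(16, 0.75f);
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode());   // true: a collision
        Map<String, Integer> map = build();
        System.out.println(map.get("Aa"));   // 1
        System.out.println(map.get("BB"));   // 2
    }
}
```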

Red-black trees

A red-black tree is a kind of binary search tree

In a red-black tree, every element is regarded as having an associated colour, either red or black

The root is black

Red rule: If an element is red, none of its children can be red: they must be black

Path rule: the number of black elements must be the same in all paths from the root to elements with no children or with one child

How the rules work out

If a red element has any children, it must have two (black)

Almost all non-leaves have two children

Red-black trees are therefore good and bushy

For n elements, the height is O(log n), even in the worst case: it never exceeds 2 log2(n + 1)

This makes it good for searching

Keeping the rules can be fiddly when adding or removing an element, but can still be done in O(log n) time

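A red-black tree

[Figure: a red-black tree holding the values 53, 37, 11, 41, 67, 61, 73, 59 and 23; the layout and node colours are not recoverable from the transcript]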

Another version of the red-black rules

1. Every node is either red or black

2. The root is black

3. If a node is red, its children must be black

4. Every path from a node to a null link must contain the same number of black nodes

• This set of rules is drawn from Data Structures... by Weiss, who shows a red-black tree after the insertion sequence 10, 85, 15, 70, 20, 60, 30, 50, 65, 80, 90, 40, 5, 55
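
The four rules listed above can be checked mechanically. What follows is a minimal sketch, not code from Weiss: the Node class and the method names are assumptions made for illustration, and the check ignores the ordinary binary search tree ordering of the keys.

```java
class RedBlackCheck {
    // Hypothetical node type. A boolean colour satisfies rule 1 automatically.
    static final class Node {
        final int key;
        final boolean red;
        final Node left, right;

        Node(int key, boolean red, Node left, Node right) {
            this.key = key;
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isRed(Node n) {
        return n != null && n.red;   // null links count as black
    }

    // Number of black nodes on every path from n down to a null link,
    // or -1 if rule 3 or rule 4 is broken somewhere in this subtree.
    static int blackHeight(Node n) {
        if (n == null) return 0;
        if (n.red && (isRed(n.left) || isRed(n.right))) return -1;   // rule 3
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) return -1;                     // rule 4
        return l + (n.red ? 0 : 1);
    }

    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) >= 0;               // rule 2, then 3 and 4
    }

    public static void main(String[] args) {
        Node good = new Node(20, false,
                new Node(10, true, null, null),
                new Node(30, true, null, null));
        Node doubleRed = new Node(20, false,
                new Node(10, true, new Node(5, true, null, null), null),
                null);
        System.out.println(isValid(good));        // true
        System.out.println(isValid(doubleRed));   // false: a red node has a red child
    }
}
```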

Yet another version

Root property: the root is black

External property: every external node is black (null nodes are black)

Internal property: the children of a red node are black

Depth property: all external nodes must have the same black depth, defined as the number of black ancestors minus one (a node is an ancestor of itself)

(These are from Data Structures etc by Goodrich and Tamassia, who use the sequence 4, 7, 12, 15, 3, 5, 14, 18, 16, 17)

Putting things into a red-black tree

A new node finds its place in ordinary binary search tree fashion

We often have then to shift things around in the tree to fit the rules

All new nodes are red

If a new node's parent is also red, we have a problem

This is sometimes called a double-red problem

(Removing a node is even more awkward, and we won't be looking at it in detail.)

When the uncle is black

• When the sibling of the parent is black (or null) we have these possibilities:

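[Figure: four arrangements of the values 10, 20 and 30 as grandparent, parent and new node: two straight-line groups (10, 20, 30 and 30, 20, 10) and two bent-line groups (30, 10, 20 and 10, 30, 20); labels: node, parent, grandparent, uncle]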

Rotation

• With the straight line groups, we simply rotate and switch the colours of the old and new apex (parent and grandparent). For example

[Figure: the straight-line group 10, 20, 30 before and after a single rotation]
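
A minimal sketch of the single rotation for the straight-line group 10, 20, 30, assuming 10 is the black grandparent, 20 the red parent and 30 the new red node. The Node class and the method names are illustrative, and a full implementation would also have to re-attach the new apex to the rest of the tree.

```java
class RotationDemo {
    // Hypothetical mutable node type, for illustration only.
    static final class Node {
        int key;
        boolean red;
        Node left, right;

        Node(int key, boolean red) {
            this.key = key;
            this.red = red;
        }
    }

    // Left rotation: the right child of 'top' becomes the new apex.
    static Node rotateLeft(Node top) {
        Node newTop = top.right;
        top.right = newTop.left;
        newTop.left = top;
        return newTop;
    }

    // Straight line leaning right: rotate, then switch the colours
    // of the old apex (grandparent) and the new apex (parent).
    static Node fixStraightRight(Node grandparent) {
        Node parent = rotateLeft(grandparent);
        boolean tmp = parent.red;
        parent.red = grandparent.red;
        grandparent.red = tmp;
        return parent;
    }

    // Builds 10 (black) -> 20 (red) -> 30 (red) and repairs it.
    static Node demo() {
        Node grandparent = new Node(10, false);
        grandparent.right = new Node(20, true);
        grandparent.right.right = new Node(30, true);
        return fixStraightRight(grandparent);
    }

    public static void main(String[] args) {
        Node apex = demo();
        System.out.println(apex.key + " red=" + apex.red);               // 20 red=false
        System.out.println(apex.left.key + " red=" + apex.left.red);     // 10 red=true
        System.out.println(apex.right.key + " red=" + apex.right.red);   // 30 red=true
    }
}
```

The mirror-image group 30, 20, 10 would use a right rotation in the same way.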

Double rotation

• With the bent-line groups, we need first to get them into a straight line. For example,

[Figure: the bent-line group 30, 10, 20 before the first rotation, and the straight-line group 30, 20, 10 after it]
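
A sketch of the double rotation for the bent-line group 30, 10, 20, assuming 30 is the black grandparent, 10 the red parent and 20 the new red node. The first rotation straightens the line; the second is the ordinary single rotation with the colour switch. The Node class and method names are made up for illustration.

```java
class DoubleRotationDemo {
    // Hypothetical mutable node type, for illustration only.
    static final class Node {
        int key;
        boolean red;
        Node left, right;

        Node(int key, boolean red) {
            this.key = key;
            this.red = red;
        }
    }

    static Node rotateLeft(Node top) {
        Node newTop = top.right;
        top.right = newTop.left;
        newTop.left = top;
        return newTop;
    }

    static Node rotateRight(Node top) {
        Node newTop = top.left;
        top.left = newTop.right;
        newTop.right = top;
        return newTop;
    }

    // Bent line: parent is a left child, new node is the parent's right child.
    static Node fixBentLeft(Node grandparent) {
        // Step 1: rotate around the parent, giving the straight line 30, 20, 10.
        grandparent.left = rotateLeft(grandparent.left);
        // Step 2: single rotation around the grandparent, then switch colours.
        Node newTop = rotateRight(grandparent);
        boolean tmp = newTop.red;
        newTop.red = grandparent.red;
        grandparent.red = tmp;
        return newTop;
    }

    // Builds 30 (black) with left child 10 (red), whose right child is 20 (red).
    static Node demo() {
        Node grandparent = new Node(30, false);
        grandparent.left = new Node(10, true);
        grandparent.left.right = new Node(20, true);
        return fixBentLeft(grandparent);
    }

    public static void main(String[] args) {
        Node apex = demo();
        System.out.println(apex.key + " red=" + apex.red);               // 20 red=false
        System.out.println(apex.left.key + " red=" + apex.left.red);     // 10 red=true
        System.out.println(apex.right.key + " red=" + apex.right.red);   // 30 red=true
    }
}
```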

When the uncle is red

• We do a recolouring: parent and uncle switch from red to black, grandparent switches from black to red (unless it's the root)

[Figure: a tree holding the values 30, 20, 10 and 40, shown before and after recolouring]
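
The recolouring step can be sketched in a few lines. The Node class, the method names and the grandparentIsRoot flag are assumptions made for illustration; the tree built in demo uses the values 30, 20, 10 and 40 with 30 as grandparent, 20 as parent, 40 as uncle and 10 as the new node.

```java
class RecolourDemo {
    // Hypothetical mutable node type, for illustration only.
    static final class Node {
        int key;
        boolean red;
        Node left, right;

        Node(int key, boolean red) {
            this.key = key;
            this.red = red;
        }
    }

    // Red uncle: parent and uncle turn black; the grandparent turns red
    // unless it is the root, which must stay black.
    static void recolour(Node grandparent, boolean grandparentIsRoot) {
        grandparent.left.red = false;
        grandparent.right.red = false;
        grandparent.red = !grandparentIsRoot;
    }

    // Builds 30 (black) with red children 20 and 40, and new red node 10 under 20.
    static Node demo(boolean grandparentIsRoot) {
        Node grandparent = new Node(30, false);
        grandparent.left = new Node(20, true);
        grandparent.right = new Node(40, true);
        grandparent.left.left = new Node(10, true);
        recolour(grandparent, grandparentIsRoot);
        return grandparent;
    }

    public static void main(String[] args) {
        Node g = demo(true);
        System.out.println(g.red);             // false: the root stays black
        System.out.println(g.left.red);        // false: parent is now black
        System.out.println(g.right.red);       // false: uncle is now black
        System.out.println(g.left.left.red);   // true: the new node stays red
    }
}
```

When the grandparent is not the root and turns red, its own parent may be red too, in which case the same double-red check has to be repeated one level up.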

Animation

An excellent animation of red-black trees can be found at http://www.ececs.uc.edu/~franco/C321/html/RedBlack/redblack.html

The rules they list are:

1. Every node has a value.

2. The value of any node is greater than the value of its left child and less than the value of its right child.

3. Every node is colored either red or black.

4. Every red node that is not a leaf has only black children.

5. Every path from the root to a leaf contains the same number of black nodes.

6. The root node is black.

SortedMap

This is a Java interface that extends the Map interface

It guarantees that elements are in ascending key order

The keys must implement the Comparable interface, unless a Comparator is supplied when the map is created

Strings and Integers implement the Comparable interface and make natural keys
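
A short sketch of the ordering guarantee with String keys, using java.util.TreeMap as the implementation. The class name, surnames and ages are invented for illustration.

```java
import java.util.SortedMap;
import java.util.TreeMap;

class SortedMapDemo {
    // Hypothetical data: surname (key) mapped to age (value).
    static SortedMap<String, Integer> sample() {
        SortedMap<String, Integer> ages = new TreeMap<>();
        ages.put("Smith", 41);   // inserted in no particular order
        ages.put("Brown", 29);
        ages.put("Jones", 35);
        return ages;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> ages = sample();
        System.out.println(ages.keySet());     // [Brown, Jones, Smith]
        System.out.println(ages.firstKey());   // Brown
        System.out.println(ages.lastKey());    // Smith
    }
}
```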

Tree collections

The collections framework in Java has two explicit tree classes: TreeSet and TreeMap

In a TreeSet, the element is the basis of uniqueness

In a TreeMap, the keys are unique (though the values stored under those keys might very well be the same or similar)
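
The difference in what counts as a duplicate can be sketched as follows. The class name, the numbers and the policy descriptions are illustrative only.

```java
import java.util.TreeMap;
import java.util.TreeSet;

class TreeCollectionsDemo {
    // In a TreeSet the element itself decides uniqueness.
    static TreeSet<Integer> numbers() {
        TreeSet<Integer> set = new TreeSet<>();
        set.add(53);
        set.add(37);
        set.add(53);   // duplicate element: ignored
        return set;
    }

    // In a TreeMap only the key decides uniqueness.
    static TreeMap<Integer, String> policies() {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(1, "home");
        map.put(2, "home");    // same value under a different key: allowed
        map.put(1, "motor");   // same key: replaces "home" under key 1
        return map;
    }

    public static void main(String[] args) {
        System.out.println(numbers());    // [37, 53]
        System.out.println(policies());   // {1=motor, 2=home}
    }
}
```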

TreeMap

A TreeMap is an implementation of the SortedMap interface

It is based on a red-black tree

Look-up works as in a phone book: you have a key, and use it to find a value

TreeMap does not supply an iterator, but it does supply what is called a set view of the mappings (entrySet())

This is a set, and a set provides an iterator
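
A sketch of walking a TreeMap through its entrySet() iterator, in the spirit of the phone book. The class name, method name, names and numbers are invented for illustration.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

class EntrySetDemo {
    // Hypothetical helper: one "name -> number" line per entry, in key order.
    static String listing(TreeMap<String, String> book) {
        StringBuilder sb = new StringBuilder();
        Iterator<Map.Entry<String, String>> it = book.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            sb.append(entry.getKey()).append(" -> ").append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TreeMap<String, String> book = new TreeMap<>();
        book.put("Palmer", "1234");
        book.put("Adams", "5678");
        System.out.println(book.get("Palmer"));   // 1234: look-up by key
        System.out.print(listing(book));          // Adams first, then Palmer
    }
}
```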