DSA/BP/2006 Trees 2 1
Trees (and sets, and hashing) in Java
Brian Palmer
MACS
DSA/BP/2006 Trees 2 2
No duplication
The structures we have looked at so far (stack, queue, and list) have all allowed duplication
For example, a stack of opening brackets that only allowed one instance of each kind of bracket would not have been much use in the stack coursework
On the other hand, we sometimes want a structure not to allow duplication: all elements must be unique
DSA/BP/2006 Trees 2 3
Reasons for uniqueness
There are various theoretical reasons, not to do with computing, for the usefulness of collections of unique elements
In practical terms, it's a matter of searching
In many applications, we want to search for things
Searching a list returns, by default, only the first matching element. We can't be sure that's the element we want.
DSA/BP/2006 Trees 2 4
Sets
In Java, a set is (almost inevitably) an interface
A set is a collection that contains no duplicate elements.
More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element.
As implied by its name, this interface models the mathematical set abstraction.
(These three paragraphs are copied straight from the Java documentation)
DSA/BP/2006 Trees 2 5
The Set<E> interface
This contains methods such as add, clear, contains, equals, isEmpty, iterator, remove, and size
If you attempt to add an element e1 to a set that already contains e1, no change will take place
In a set, the entire element provides the uniqueness
This is fine when we're dealing with simple elements, but it can make reference awkward
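A minimal sketch of the "no change will take place" behaviour, using HashSet as one possible Set implementation. The class name SetDemo and the helper method are made up for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

class SetDemo {
    // Adds the same element twice and returns what the second add() reported.
    // Set.add returns false when the element was already present.
    static boolean secondAddChangesSet(Set<String> set, String element) {
        set.add(element);
        return set.add(element);
    }

    public static void main(String[] args) {
        Set<String> brackets = new HashSet<>();
        System.out.println(secondAddChangesSet(brackets, "(")); // false
        System.out.println(brackets.size());                    // 1
        System.out.println(brackets.contains("("));             // true
    }
}
```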
DSA/BP/2006 Trees 2 6
Maps
Map is yet another Java interface
A map is an object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.
For example, in a collection of persons, each might have a unique id, which would be used as the key, with the other relevant data making up the value to be stored
Likewise, a collection of policies (for insurance, say) would use the policy number as the key, and the rest of the data as the value
DSA/BP/2006 Trees 2 7
The Map<K,V> interface
The interface has such methods as these:
clear()
containsKey(Object key)
containsValue(Object value)
get(Object key)
put(K key, V value)
remove(Object key)
It offers no iterator, but it does offer an entrySet(), which is a kind of Set and does offer an iterator
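A short sketch of these methods in use, with HashMap as one possible Map implementation. The ids and names are invented sample data, and MapDemo is a made-up class name.

```java
import java.util.HashMap;
import java.util.Map;

class MapDemo {
    // Builds a small map of person id -> name (invented sample data).
    static Map<Integer, String> samplePersons() {
        Map<Integer, String> persons = new HashMap<>();
        persons.put(1001, "Ann");
        persons.put(1002, "Bob");
        persons.put(1001, "Anne"); // same key again: the old value is replaced
        return persons;
    }

    public static void main(String[] args) {
        Map<Integer, String> persons = samplePersons();
        System.out.println(persons.get(1001));         // Anne
        System.out.println(persons.containsKey(1003)); // false

        // Map has no iterator of its own, so we iterate over entrySet()
        for (Map.Entry<Integer, String> entry : persons.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```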
DSA/BP/2006 Trees 2 8
Speed of access
We are discussing collections where the most common activity will be searching; where searching will be more common than, say, adding and removing
We have looked, in theory, at binary search trees; for a balanced tree, searching is O(log n) in both the worst and the expected cases
We should take a quick look at hashing, whose worst case is dreadful, O(n), but whose expected case is very fast, O(1)
DSA/BP/2006 Trees 2 9
Hash tables
We can regard a hash table as in effect an array
The parts of the array are known, for some reason, as buckets
As in maps, things consist of a key and a value
The key is hashed, and the resulting hashcode is used to determine which bucket the value should be stored in
To retrieve a value, the key is hashed again and the relevant bucket is found
This idea is used so often, that every object in Java has a hashCode() method
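One simple way to turn a hashcode into a bucket number is to take the remainder on division by the number of buckets. The sketch below assumes that scheme purely for illustration; it is not claimed to be the exact calculation used inside java.util.HashMap. BucketDemo and bucketIndex are made-up names.

```java
class BucketDemo {
    // Maps a key to one of bucketCount buckets using its hashCode().
    // Math.floorMod keeps the result in 0..bucketCount-1 even when
    // the hashcode is negative.
    static int bucketIndex(Object key, int bucketCount) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    public static void main(String[] args) {
        // An Integer's hashcode is its own value
        System.out.println(bucketIndex(21, 16)); // 5
        System.out.println(bucketIndex(-1, 16)); // 15
        // The same key always hashes to the same bucket
        System.out.println(bucketIndex("apple", 16) == bucketIndex("apple", 16)); // true
    }
}
```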
DSA/BP/2006 Trees 2 10
The problem with hashing
If all goes well, we go straight to the right bucket
However, different keys might produce hashcodes that lead to the same bucket
In that case, there's some work to do, which will be discussed in DSA2
In Java, by default, the array is never allowed to get more than three-quarters full: the default load factor is 0.75
This gives a workable balance between wasting space and wasting time on rehash operations
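A small sketch of both points. The strings "Aa" and "BB" are a well-known pair with the same hashcode, and the HashMap(int, float) constructor lets us state the capacity and load factor explicitly. The threshold helper is a made-up illustration of the arithmetic, not a method of HashMap.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // True when two different keys produce the same hashcode.
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    // Illustrative arithmetic only: how many entries a table with the
    // given capacity can hold before the load factor is exceeded.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(sameHash("Aa", "BB")); // true

        // Colliding keys are still stored and retrieved correctly
        Map<String, Integer> map = new HashMap<>(16, 0.75f);
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa") + " " + map.get("BB")); // 1 2

        System.out.println(threshold(16, 0.75f)); // 12
    }
}
```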
DSA/BP/2006 Trees 2 11
Red-black trees
A red-black tree is a kind of binary search tree
In a red-black tree, every element is regarded as having an associated colour, either red or black
The root is black
Red rule: If an element is red, none of its children can be red: they must be black
Path rule: the number of black elements must be the same in all paths from the root to elements with no children or with one child
DSA/BP/2006 Trees 2 12
How the rules work out
If a red element has any children, it must have two (black)
Almost all non-leaves have two children
Red-black trees are therefore good and bushy
For n elements, the height is O(log n), even in the worst case
This makes them good for searching
Keeping the rules can be fiddly when adding or removing an element, but it can still be done in O(log n) time
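A sketch of the usual argument for the height bound. The symbols are assumptions introduced here: n is the number of elements, b is the number of black elements on every path from the root down to a null link, and h is the number of elements on the longest such path.

```latex
\begin{align*}
n &\ge 2^{b} - 1 && \text{(every path has at least } b \text{ elements, so the top } b \text{ levels are full)}\\
h &\le 2b && \text{(the root is black and no red has a red child)}\\
h &\le 2\log_2(n+1) && \text{(combining the two lines above)}
\end{align*}
```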
DSA/BP/2006 Trees 2 13
A red-black tree
[Figure: a red-black tree holding the values 53, 37, 11, 41, 67, 61, 73, 59 and 23]
DSA/BP/2006 Trees 2 14
Another version of the red-black rules
1. Every node is either red or black
2. The root is black
3. If a node is red, its children must be black
4. Every path from a node to a null link must contain the same number of black nodes
• This set of rules is drawn from Data Structures... by Weiss, who shows a red-black tree after the insertion sequence 10, 85, 15, 70, 20, 60, 30, 50, 65, 80, 90, 40, 5, 55
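The rules can be turned into a small checker. This is a sketch under assumptions: the Node class, with a boolean colour field, is invented for the illustration, and the code checks only the colour rules, not the binary search tree ordering. Rule 1 holds automatically because a boolean has only two values.

```java
class RedBlackCheck {
    static final boolean RED = true;
    static final boolean BLACK = false;

    // Hypothetical node type for this sketch
    static class Node {
        int value;
        boolean colour;
        Node left, right;

        Node(int value, boolean colour, Node left, Node right) {
            this.value = value;
            this.colour = colour;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isRed(Node n) {
        return n != null && n.colour == RED;
    }

    // Returns the number of black nodes on every path from n down to a
    // null link, or -1 if rule 3 or rule 4 is broken at or below n.
    static int blackHeight(Node n) {
        if (n == null) return 0;
        if (isRed(n) && (isRed(n.left) || isRed(n.right))) return -1; // rule 3
        int left = blackHeight(n.left);
        int right = blackHeight(n.right);
        if (left < 0 || right < 0 || left != right) return -1;        // rule 4
        return left + (n.colour == BLACK ? 1 : 0);
    }

    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) >= 0;                // rule 2, then 3 and 4
    }

    public static void main(String[] args) {
        Node good = new Node(20, BLACK,
                new Node(10, RED, null, null),
                new Node(30, RED, null, null));
        Node doubleRed = new Node(10, BLACK, null,
                new Node(20, RED, null, new Node(30, RED, null, null)));
        System.out.println(isValid(good));      // true
        System.out.println(isValid(doubleRed)); // false
    }
}
```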
DSA/BP/2006 Trees 2 15
Yet another version
Root property: the root is black
External property: every external node is black (null nodes are black)
Internal property: the children of a red node are black
Depth property: all external nodes must have the same black depth, defined as the number of black ancestors minus one (a node is an ancestor of itself)
(These are from Data Structures etc by Goodrich and Tamassia, who use the sequence 4, 7, 12, 15, 3, 5, 14, 18, 16, 17)
DSA/BP/2006 Trees 2 16
Putting things into a red-black tree
A new node finds its place in ordinary binary search tree fashion
We often have then to shift things around in the tree to fit the rules
All new nodes are red
If a new node's parent is also red, we have a problem
This is sometimes called a double-red problem
(Removing a node is even more awkward, and we won't be looking at it in detail.)
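The first half of insertion can be sketched as follows: an ordinary binary search tree insertion that colours the new node red, plus a test for the double-red problem. The class and its Node type (with a parent link) are assumptions made for the illustration, and the sketch only detects the problem; it does not repair it.

```java
class DoubleRedDemo {
    // Hypothetical node type with a link back to its parent
    static class Node {
        int value;
        boolean red = true; // all new nodes are red
        Node left, right, parent;

        Node(int value) {
            this.value = value;
        }
    }

    Node root;

    // Ordinary binary search tree insertion.
    // Returns the new node, or null if the value is already present.
    Node insert(int value) {
        Node node = new Node(value);
        if (root == null) {
            node.red = false; // the root is black
            root = node;
            return node;
        }
        Node current = root;
        while (true) {
            if (value == current.value) return null; // no duplicates
            Node next = value < current.value ? current.left : current.right;
            if (next == null) {
                node.parent = current;
                if (value < current.value) current.left = node;
                else current.right = node;
                return node;
            }
            current = next;
        }
    }

    // True when a red node has a red parent.
    static boolean isDoubleRed(Node node) {
        return node != null && node.red
                && node.parent != null && node.parent.red;
    }

    public static void main(String[] args) {
        DoubleRedDemo tree = new DoubleRedDemo();
        tree.insert(10);                       // becomes the black root
        Node twenty = tree.insert(20);         // red child of a black parent
        Node thirty = tree.insert(30);         // red child of red 20
        System.out.println(isDoubleRed(twenty)); // false
        System.out.println(isDoubleRed(thirty)); // true
    }
}
```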
DSA/BP/2006 Trees 2 17
When the uncle is black
• When the sibling of the parent is black (or null) we have these possibilities:
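[Figure: the four possible arrangements of grandparent, parent and new node, shown with the values 10, 20 and 30: two straight-line groups (10, 20, 30 and 30, 20, 10) and two bent-line groups (30, 10, 20 and 10, 30, 20). Labels: node, parent, grandparent, uncle]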
DSA/BP/2006 Trees 2 18
Rotation
• With the straight line groups, we simply rotate and switch the colours of the old and new apex (parent and grandparent). For example
[Figure: a straight-line group holding 10, 20 and 30, before and after the rotation]
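A sketch of the single rotation for the straight-line case in which the parent is the left child of the grandparent (the mirror-image case swaps left and right). The Node type is invented for the illustration, and the caller is assumed to re-attach the returned apex wherever the grandparent used to hang.

```java
class RotationDemo {
    // Hypothetical node type for this sketch
    static class Node {
        int value;
        boolean red;
        Node left, right;

        Node(int value, boolean red) {
            this.value = value;
            this.red = red;
        }
    }

    // Rotates right about the grandparent: its left child (the parent)
    // becomes the new apex, and the two exchange colours.
    static Node rotateRight(Node grandparent) {
        Node parent = grandparent.left;
        grandparent.left = parent.right; // parent's inner subtree moves across
        parent.right = grandparent;
        boolean colour = parent.red;     // switch colours of old and new apex
        parent.red = grandparent.red;
        grandparent.red = colour;
        return parent;
    }

    public static void main(String[] args) {
        Node grandparent = new Node(30, false); // black
        Node parent = new Node(20, true);       // red
        Node node = new Node(10, true);         // red: the double-red problem
        grandparent.left = parent;
        parent.left = node;

        Node apex = rotateRight(grandparent);
        System.out.println(apex.value + " red=" + apex.red);             // 20 red=false
        System.out.println(apex.left.value + " red=" + apex.left.red);   // 10 red=true
        System.out.println(apex.right.value + " red=" + apex.right.red); // 30 red=true
    }
}
```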
DSA/BP/2006 Trees 2 19
Double rotation
• With the bent-line groups, we need first to get them into a straight line. For example,
[Figure: the bent-line group 30, 10, 20 rearranged into the straight-line group 30, 20, 10]
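A sketch of the double rotation for the bent-line case in which the parent is the left child of the grandparent and the new node is the right child of the parent. The first rotation straightens the line; the second is the ordinary single rotation with the colour switch. The Node type and method names are assumptions for the illustration.

```java
class DoubleRotationDemo {
    // Hypothetical node type for this sketch
    static class Node {
        int value;
        boolean red;
        Node left, right;

        Node(int value, boolean red) {
            this.value = value;
            this.red = red;
        }
    }

    // Plain rotations with no recolouring; each returns the new top node.
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        return y;
    }

    static Node rotateRight(Node x) {
        Node y = x.left;
        x.left = y.right;
        y.right = x;
        return y;
    }

    // Fixes the left-right bent line below the grandparent.
    static Node fixLeftRight(Node grandparent) {
        // Step 1: straighten the line (the new node moves above its parent)
        grandparent.left = rotateLeft(grandparent.left);
        // Step 2: single rotation, then switch colours of old and new apex
        Node apex = rotateRight(grandparent);
        boolean colour = apex.red;
        apex.red = grandparent.red;
        grandparent.red = colour;
        return apex;
    }

    public static void main(String[] args) {
        Node grandparent = new Node(30, false); // black
        Node parent = new Node(10, true);       // red
        Node node = new Node(20, true);         // red
        grandparent.left = parent;
        parent.right = node;

        Node apex = fixLeftRight(grandparent);
        System.out.println(apex.value + " red=" + apex.red);             // 20 red=false
        System.out.println(apex.left.value + " red=" + apex.left.red);   // 10 red=true
        System.out.println(apex.right.value + " red=" + apex.right.red); // 30 red=true
    }
}
```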
DSA/BP/2006 Trees 2 20
When the uncle is red
• We do a recolouring: parent and uncle switch from red to black, grandparent switches from black to red (unless it's the root)
[Figure: a tree holding 30, 20, 10 and 40, before and after the recolouring]
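The recolouring itself is only a few assignments. In this sketch the Node type and the isRoot flag are assumptions. The return value tells the caller whether the grandparent turned red, in which case the grandparent may now clash with its own parent and needs checking in turn.

```java
class RecolourDemo {
    // Hypothetical node type for this sketch
    static class Node {
        int value;
        boolean red;
        Node left, right;

        Node(int value, boolean red) {
            this.value = value;
            this.red = red;
        }
    }

    // Parent and uncle (the two children of the grandparent) turn black;
    // the grandparent turns red unless it is the root.
    // Returns true if the grandparent was turned red.
    static boolean recolour(Node grandparent, boolean isRoot) {
        grandparent.left.red = false;
        grandparent.right.red = false;
        if (isRoot) {
            return false; // the root stays black
        }
        grandparent.red = true;
        return true;
    }

    public static void main(String[] args) {
        Node grandparent = new Node(30, false); // black
        Node parent = new Node(20, true);       // red
        Node uncle = new Node(40, true);        // red
        Node node = new Node(10, true);         // red: the double-red problem
        grandparent.left = parent;
        grandparent.right = uncle;
        parent.left = node;

        System.out.println(recolour(grandparent, false)); // true
        System.out.println(parent.red + " " + uncle.red + " " + grandparent.red); // false false true
    }
}
```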
DSA/BP/2006 Trees 2 21
Animation
An excellent animation of red-black trees can be found at
http://www.ececs.uc.edu/~franco/C321/html/RedBlack/redblack.html
The rules they list are:
1. Every node has a value.
2. The value of any node is greater than the value of its left child and less than the value of its right child.
3. Every node is colored either red or black.
4. Every red node that is not a leaf has only black children.
5. Every path from the root to a leaf contains the same number of black nodes.
6. The root node is black.
DSA/BP/2006 Trees 2 22
SortedMap
This is a Java interface that extends the Map interface
It guarantees that elements are in ascending key order
The keys must implement the Comparable interface (or a Comparator must be supplied when the map is created)
Strings and Integers implement the Comparable interface and make natural keys
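A brief sketch of the ordering guarantee, using String keys and TreeMap as the SortedMap implementation. The names and ages are invented sample data, and SortedMapDemo is a made-up class name.

```java
import java.util.SortedMap;
import java.util.TreeMap;

class SortedMapDemo {
    // Keys are inserted out of order on purpose.
    static SortedMap<String, Integer> sample() {
        SortedMap<String, Integer> ages = new TreeMap<>();
        ages.put("Morag", 31);
        ages.put("Alasdair", 45);
        ages.put("Eilidh", 27);
        return ages;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> ages = sample();
        System.out.println(ages.keySet());              // [Alasdair, Eilidh, Morag]
        System.out.println(ages.firstKey());            // Alasdair
        System.out.println(ages.lastKey());             // Morag
        // headMap gives the part of the map whose keys come before "F"
        System.out.println(ages.headMap("F").keySet()); // [Alasdair, Eilidh]
    }
}
```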
DSA/BP/2006 Trees 2 23
Tree collections
The collections framework in Java has two explicit tree classes: TreeSet and TreeMap
In a TreeSet, the element is the basis of uniqueness
In a TreeMap, the keys are unique (though the values stored under those keys might very well be the same or similar)
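The difference in what counts as a duplicate can be seen in a few lines. The values used here are arbitrary, and TreeCollectionsDemo is a made-up class name.

```java
import java.util.TreeMap;
import java.util.TreeSet;

class TreeCollectionsDemo {
    // The second 53 is rejected: the element itself must be unique.
    static TreeSet<Integer> sampleSet() {
        TreeSet<Integer> set = new TreeSet<>();
        set.add(53);
        set.add(37);
        set.add(53);
        return set;
    }

    // Two different keys may hold the same value.
    static TreeMap<Integer, String> sampleMap() {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(1, "red");
        map.put(2, "red");
        return map;
    }

    public static void main(String[] args) {
        System.out.println(sampleSet()); // [37, 53]
        System.out.println(sampleMap()); // {1=red, 2=red}
    }
}
```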
DSA/BP/2006 Trees 2 24
TreeMap
A TreeMap is an implementation of the SortedMap interface
It is based on a red-black tree
Look-up works as in a phone book: you have a key, and use it to find a value
TreeMap does not supply an iterator, but it does supply what is called a set view of the mappings (entrySet())
This is a set, and a set provides an iterator
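A sketch of the phone-book idea, with an explicit iterator taken from the entry set. The names and numbers are invented sample data, and PhoneBookDemo and its methods are made-up names.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

class PhoneBookDemo {
    // Invented sample data, inserted out of order.
    static TreeMap<String, String> phoneBook() {
        TreeMap<String, String> book = new TreeMap<>();
        book.put("Clark", "5550103");
        book.put("Adams", "5550101");
        book.put("Brown", "5550102");
        return book;
    }

    // Walks the set view with its iterator; entries come out in key order.
    static String listing(TreeMap<String, String> book) {
        StringBuilder out = new StringBuilder();
        Iterator<Map.Entry<String, String>> it = book.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            out.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TreeMap<String, String> book = phoneBook();
        System.out.println(book.get("Brown")); // 5550102
        System.out.println(listing(book));     // Adams=5550101;Brown=5550102;Clark=5550103;
    }
}
```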