TRANSCRIPT
AVL Trees
Amanuel Lemma
CS252 Algorithms
Dec. 14, 2000
Why do operations on ordinary binary search trees
take as much as O(n) time?
– Each operation takes time proportional to the height
of the tree, O(h), which is O(n) in the worst case
(for skewed, unbalanced trees).
So the idea here is to improve running times by
keeping the height within a certain bound.
AVL trees are ‘balanced’ binary search trees with the
following height balance property:
– For every internal node v of an AVL tree, the heights
of the children of v can differ by at most one.
The Basics
Central Theme
Claim: The height balance property gives us the
following result:
• The height of an AVL tree storing n keys is
at most 1.44 log n.
Justification: Instead of finding the max height h
directly, we can first determine the minimum
number of internal nodes that an AVL tree of
height h can have and then deduce the
height.
Min-number of Nodes
Let the minimum number of internal nodes of an AVL tree of height h be n(h). Then
n(h) = n(h-1) + n(h-2) + 1, where n(1) = 1 and n(2) = 2.
This is a Fibonacci-style progression, which is exponential in h:
n(h) + 1 = F(h+2) ≥ [(1+√5)/2]^h, hence h ≤ 1.44 log₂(n(h) + 1).
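The recurrence and the resulting height bound can be checked numerically; here is a small sketch in Python (function name and structure are ours, not from the lecture):

```python
import math

def min_nodes(h):
    """Minimum number of internal nodes in an AVL tree of height h:
    n(h) = n(h-1) + n(h-2) + 1, with n(1) = 1 and n(2) = 2."""
    a, b = 1, 2          # n(1), n(2)
    if h == 1:
        return a
    for _ in range(h - 2):
        a, b = b, a + b + 1
    return b

# n(h) + 1 follows the Fibonacci sequence 2, 3, 5, 8, 13, ...
# so n(h) grows exponentially in h; inverting gives the height bound.
for h in range(1, 12):
    n = min_nodes(h)
    assert h <= 1.44 * math.log2(n + 1)   # h <= ~1.44 log2 n
```

Since n(h) is the *minimum* number of nodes at height h, any AVL tree with n keys must satisfy the asserted inequality, which is the claim from the previous slide.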
Implementation
In addition to implementing the basic binary search tree,
we need to store additional information at each internal
node of the AVL tree: the balance factor of the node,
bal(v) = height(v.rightChild) – height(v.leftChild)
If we are to maintain an AVL tree, the balance factor of each
internal node must be –1, 0, or 1. If this rule is
violated, we need to restructure the tree so as to restore
the height balance property. Obviously, operations such as insert() and
remove() will affect the balance factors of nodes.
Restructuring is done through rotation routines.
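The balance factor can be sketched in a few lines of Python (our own minimal node type, not the LEDA representation; a real implementation would store the balance factor in the node rather than recompute heights):

```python
class Node:
    """Minimal BST node for illustration."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(v):
    # height of the subtree rooted at v (0 for an empty subtree)
    return 0 if v is None else 1 + max(height(v.left), height(v.right))

def bal(v):
    # balance factor as defined above: height(right) - height(left)
    return height(v.right) - height(v.left)

# A left-skewed chain violates the AVL rule: bal(root) = -2
skewed = Node(40, Node(35, Node(32)))
```

A single node has balance factor 0; the skewed chain above has balance factor –2 at its root and would trigger a restructuring.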
Insertion
The first part involves the same method of insertion as in an
ordinary BST. Second, we have to update the
balance factors of the ancestors of the inserted node.
Third, we restructure the tree through rotations.
Fortunately there is only one case that causes a
problem: if we insert into a subtree that is already one
higher, the balance factor will become 2 (right
high) or –2 (left high).
This is fixed by performing one rotation of the nodes,
which restores balance not only locally but also
globally. So in the case of insertion, one rotation
suffices.
Example of an Insert
Example: if we perform insert(32) in the subtree on the
left, the subtree rooted at 40 becomes unbalanced (bal = –2), and a rotation based on the inorder traversal (32 → 35 → 40) fixes the problem.
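This example can be reproduced with a single right rotation (an illustrative Python sketch using our own minimal node type; the keys follow the example above):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(z):
    # single right rotation at z: its left child becomes the subtree root
    y = z.left
    z.left = y.right
    y.right = z
    return y

def inorder(v):
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)

# After insert(32), the subtree rooted at 40 is left-high (bal = -2):
subtree = Node(40, left=Node(35, left=Node(32)))
root = rotate_right(subtree)          # 35 becomes the root: 32 <- 35 -> 40
assert inorder(root) == [32, 35, 40]  # the inorder order is preserved
```

The rotated subtree has height 2 instead of 3, and 32 → 35 → 40 is still the inorder sequence, as the example states.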
Analysis of Insertion
Steps in insertion: find where to insert + insert + one rotation.
Since we have to go down from the root to some
external node when finding the place to insert the key, find() takes O(log n) (the height of an AVL tree is at most 1.44 log n).
Both the insert and the rotation involve a constant number of pointer assignments: O(1).
Therefore: O(log n) + O(1) + O(1), so insert() is O(log n).
Rotations
A rotation involves re-assigning a constant number of
pointers, depending on the inorder traversal of the subtree, so as to restore the height balance property. So each rotation takes constant time, O(1).
There are two types:
– Single rotations: reassign at most 6 pointers
– Double rotations: reassign at most 10 pointers
(assuming the tree is doubly linked)
Rotations have two properties: (1) the inorder visit of the
elements remains the same after the rotation as before, and (2) the overall height of the tree is the same after the rotation.
Algorithm for Rotation
This algorithm combines single and double
rotation into one routine. Another possibility is to have separate routines for each (e.g. LEDA).
Node x, y = x.parent, z = y.parent
Algorithm restructure(x) {
1) Let (a, b, c) be an inorder listing of the nodes x, y, and z,
and let (T0, T1, T2, T3) be an inorder listing of the four subtrees of x, y, and z.
2) Replace the subtree rooted at z with the subtree rooted at b.
3) Let a be the left child of b, and let T0 and T1 be the left
and right subtrees of a, respectively.
4) Let c be the right child of b, and let T2 and T3 be the left
and right subtrees of c, respectively.
}
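The four steps above can be sketched in Python (a hypothetical version without parent pointers: the caller passes x, its parent y, and grandparent z, and re-links z's old parent to the returned node b):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def restructure(x, y, z):
    # 1) (a, b, c): inorder listing of x, y, z;
    #    (T0..T3): inorder listing of their four hanging subtrees
    if y is z.right and x is y.right:        # single left rotation case
        a, b, c = z, y, x
        T0, T1, T2, T3 = z.left, y.left, x.left, x.right
    elif y is z.left and x is y.left:        # single right rotation case
        a, b, c = x, y, z
        T0, T1, T2, T3 = x.left, x.right, y.right, z.right
    elif y is z.right and x is y.left:       # double rotation case
        a, b, c = z, x, y
        T0, T1, T2, T3 = z.left, x.left, x.right, y.right
    else:                                    # y is z.left, x is y.right
        a, b, c = y, x, z
        T0, T1, T2, T3 = y.left, x.left, x.right, z.right
    # 2-4) b replaces z; a and c become its children, with T0..T3 reattached
    a.left, a.right = T0, T1
    c.left, c.right = T2, T3
    b.left, b.right = a, c
    return b
```

In every case b ends up as the subtree root with a and c as children and T0..T3 reattached in inorder order, which is exactly why the inorder sequence is preserved.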
Examples of Single and Double Rotations
Remove
Remove operations also involve the same
rotation techniques but are more complicated than insert, for the following reasons:
– We can remove from anywhere in the tree, possibly
creating a hole in the tree. (As in an ordinary BST, we deal with this by replacing the removed key with the key that comes before it in an inorder traversal of the tree.)
– From the deleted node to the root of the tree there can be at most one unbalanced node at a time.
– Local restructuring of that node doesn’t have a global effect, so we have to go up through all the ancestors (up to the root) of the node, updating balance factors and doing rotations when necessary.
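The hole-repair step can be sketched for a plain BST (our illustrative Python; the AVL version would additionally update balance factors and rotate on the way back up to the root):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(v):
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)

def remove(root, key):
    # standard BST removal; a node with two children is replaced by its
    # inorder predecessor (the maximum key of its left subtree)
    if root is None:
        return None
    if key < root.key:
        root.left = remove(root.left, key)
    elif key > root.key:
        root.right = remove(root.right, key)
    else:
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        pred = root.left
        while pred.right is not None:   # walk to the inorder predecessor
            pred = pred.right
        root.key = pred.key             # fill the hole with its key
        root.left = remove(root.left, pred.key)
    return root
```

Removing the predecessor itself is easy because it has no right child, so the hole never cascades.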
Example of Remove
remove(50) is shown below. In this case the
remove is done at the root, so it does not require rotations.
Example of Remove
Here remove(70) requires a single rotation.
Remove involves the following operations and worst-case times:
find + replace with the inorder predecessor + restructure all the way up to the root.
Analysis of Remove
As before, all operations take time proportional to the height of the tree, which is at most 1.44 log n for an AVL tree.
Find traverses all the way from the root to an external node in the worst case: O(log n).
Replace can be shown to take (1/2)(1.44 log n) on average, but O(log n) in the worst case.
There can be at most O(log n) ancestors of a node, so the heights and balance factors of at most O(log n) nodes are affected. Since rotations (single restructure operations) can be done in O(1) time, the height balance property can be maintained in O(log n) time.
Remove(total) = O(log n) + O(log n) + O(log n)
So remove takes O(log n) time.
LEDA implementation
AVL trees in LEDA are implemented as an instance of the dictionary ADT. They are leaf-oriented (data is stored only in the leaves) and doubly linked. They can also be initialized as an instance of binary trees.
The balance factor is stored at each node. The AVL implementation also includes the functions
rotation(u,w,dir) and double_rotation(u,w,x,dir)
Snapshots from LEDA
Experimenting with LEDA
Size   Tree Type   Insert time   Look-up time   Delete time   Total time
10^4   AVL         0.0500        0.0200         0.0400        0.1100
       BST         0.0400        0.0200         0.0300        0.0900
10^5   AVL         0.8900        0.6400         0.9500        2.4800
       BST         1.0400        0.7900         0.9500        2.7800
10^6   AVL         21.6100       17.7100        20.850        60.170
       BST         17.5800       16.1000        16.770        50.450
unsorted entries (chosen at random)
All values are CPU time in seconds.
Experimenting continued: sorted entries (1, 2, 3, …)
Size   Tree Type   Insert time   Look-up time   Delete time   Total time
10^4   AVL         0.0500        0.0200         0.0300        0.1000
       BST         3.9900        3.9700         0.0200        7.9800
10^5   AVL         0.6100        0.3500         0.4500        1.4100
       BST         -             -              -             -
10^6   AVL         5.7300        4.2400         4.9800        14.950
       BST         -             -              -             -
‘-’ indicates the program stalled.
Conclusion
As can be seen from the tables, when the data is chosen at random, the ordinary BST performs somewhat better because of the overhead of the rotation operations in AVL trees. When the entries are already sorted, which happens often enough in the real world, the BST degenerates into a linear structure and thus takes O(n) time per operation, while AVL trees maintain O(log n) time, as can be inferred from the results.
So, depending on the ‘expected’ data, dictionary implementations should use AVL trees.
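The degenerate behavior on sorted input can be seen with a few lines of Python (an illustrative sketch, written iteratively to avoid deep recursion):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def bst_insert(root, key):
    # plain (unbalanced) BST insertion
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right

def right_spine(root):
    # depth of the right spine; for sorted input the tree IS a right
    # chain, so this equals the height of the tree
    h, cur = 0, root
    while cur is not None:
        h += 1
        cur = cur.right
    return h

root = None
for k in range(1, 101):             # insert 1..100 in sorted order
    root = bst_insert(root, k)
assert right_spine(root) == 100     # height is n, so operations are O(n)
```

An AVL tree on the same input would keep the height near 1.44 log₂ n (about 10 for 100 keys), which is the gap the sorted-entries table shows.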
References
Texts:
– Goodrich, Michael and Tamassia, Roberto: Data Structures and Algorithms in Java.
– Kruse, Robert L.: Data Structures & Program Design.
Applets:
– http://www.seanet.com/users/arsen/avltree.html
– http://chaos.iu.hioslo.no/~kjenslj/java/applets/latest/applet.html
Others:
– http://www.cs.bgu.ac.il/~cgproj/LEDA/dictionary.html