TRANSCRIPT
AVL Trees
Amanuel Lemma
CS252 Algorithms
Dec. 14, 2000
Why do operations on ordinary binary search trees
take as much as O(n) time?
– Each operation takes time proportional to the height
of the tree, O(h), which is O(n) in the worst case
(for skewed, unbalanced trees).
So the idea here is to improve running times by
keeping the height within a certain bound.
AVL trees are ‘balanced’ binary search trees with the
following height balance property:
– For every internal node v of an AVL tree, the heights
of the children of v can differ by at most one.
The Basics
Central Theme
Claim: The height balance property gives us the
following result:
• The height of an AVL tree storing n keys is
at most 1.44 log n.
Justification: Instead of finding the max height h
directly, we can first determine the minimum
number of internal nodes that an AVL tree of
height h can have and then deduce the
height.
Min-number of Nodes
Let the minimum number of internal nodes of an AVL tree of height h be n(h). Then
n(h) = n(h-1) + n(h-2) + 1, where n(1) = 1 and n(2) = 2.
This is a Fibonacci-style progression, which is exponential in h:
n(h) + 1 = F(h+2) ≥ [(1+√5)/2]^h, hence h ≤ 1.44 log₂(n(h) + 1).
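The recurrence and the resulting height bound can be checked numerically; here is a small sketch in Python (function name and structure are ours, not from the lecture):

```python
import math

def min_nodes(h):
    """Minimum number of internal nodes in an AVL tree of height h:
    n(h) = n(h-1) + n(h-2) + 1, with n(1) = 1 and n(2) = 2."""
    a, b = 1, 2          # n(1), n(2)
    if h == 1:
        return a
    for _ in range(h - 2):
        a, b = b, a + b + 1
    return b

# n(h) + 1 follows the Fibonacci sequence 2, 3, 5, 8, 13, ...
# so n(h) grows exponentially in h; inverting gives the height bound.
for h in range(1, 12):
    n = min_nodes(h)
    assert h <= 1.44 * math.log2(n + 1)   # h <= ~1.44 log2 n
```

Since n(h) is the *minimum* number of nodes at height h, any AVL tree with n keys must satisfy the asserted inequality, which is the claim from the previous slide.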
Implementation
In addition to implementing the basic binary search tree,
we need to store additional information at each internal
node of the AVL tree: the balance factor of the node,
bal(v) = height(v.rightChild) – height(v.leftChild)
If we are to maintain an AVL tree, the balance factor of each
internal node must be –1, 0, or 1. If this rule is
violated, we need to restructure the tree so as to restore
the height balance property. Obviously, operations such as insert() and
remove() will affect the balance factors of nodes.
Restructuring is done through rotation routines.
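The balance factor can be sketched in a few lines of Python (our own minimal node type, not the LEDA representation; a real implementation would store the balance factor in the node rather than recompute heights):

```python
class Node:
    """Minimal BST node for illustration."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(v):
    # height of the subtree rooted at v (0 for an empty subtree)
    return 0 if v is None else 1 + max(height(v.left), height(v.right))

def bal(v):
    # balance factor as defined above: height(right) - height(left)
    return height(v.right) - height(v.left)

# A left-skewed chain violates the AVL rule: bal(root) = -2
skewed = Node(40, Node(35, Node(32)))
```

A single node has balance factor 0; the skewed chain above has balance factor –2 at its root and would trigger a restructuring.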
Insertion
The first part involves the same method of insertion as in an
ordinary BST. Second, we have to update the
balance factors of the ancestors of the inserted node.
Third, we restructure the tree through rotations.
Fortunately there is only one case that causes a
problem: if we insert into a subtree that is already one
higher, the balance factor will become 2 (right
high) or –2 (left high).
This is fixed by performing one rotation of the nodes,
which restores balance not only locally but also
globally. So in the case of insertion, one rotation
suffices.
Example of an Insert
Example: if we perform insert(32) in the subtree on the
left, the subtree rooted at 40 becomes unbalanced (bal = –2), and a rotation based on the inorder traversal (32 → 35 → 40) fixes the problem.
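This example can be reproduced with a single right rotation (an illustrative Python sketch using our own minimal node type; the keys follow the example above):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(z):
    # single right rotation at z: its left child becomes the subtree root
    y = z.left
    z.left = y.right
    y.right = z
    return y

def inorder(v):
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)

# After insert(32), the subtree rooted at 40 is left-high (bal = -2):
subtree = Node(40, left=Node(35, left=Node(32)))
root = rotate_right(subtree)          # 35 becomes the root: 32 <- 35 -> 40
assert inorder(root) == [32, 35, 40]  # the inorder order is preserved
```

The rotated subtree has height 2 instead of 3, and 32 → 35 → 40 is still the inorder sequence, as the example states.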
Analysis of Insertion
Steps in insertion: find where to insert + insert + one rotation.
Since we have to go down from the root to some
external node when finding the place to insert the key, find() takes O(log n) (the height of an AVL tree is at most 1.44 log n).
Both the insert and the rotation involve a constant number of pointer assignments: O(1).
Therefore: O(log n) + O(1) + O(1), so insert() is O(log n).
Rotations
A rotation involves re-assigning a constant number of
pointers, depending on the inorder traversal of the subtree, so as to restore the height balance property. So each rotation takes constant time, O(1).
There are two types:
– Single rotations: reassign at most 6 pointers
– Double rotations: reassign at most 10 pointers
(assuming the tree is doubly linked)
Rotations have two properties: (1) the inorder visit of the
elements remains the same after the rotation as before, and (2) the overall height of the tree is the same after the rotation.
Algorithm for Rotation
This algorithm combines single and double
rotation into one routine. Another possibility is to have separate routines for each (e.g. LEDA).
Node x, y = x.parent, z = y.parent
Algorithm restructure(x) {
1) Let (a, b, c) be an inorder listing of the nodes x, y, and z,
and let (T0, T1, T2, T3) be an inorder listing of the four subtrees of x, y, and z.
2) Replace the subtree rooted at z with the subtree rooted at b.
3) Let a be the left child of b, and let T0 and T1 be the left
and right subtrees of a, respectively.
4) Let c be the right child of b, and let T2 and T3 be the left
and right subtrees of c, respectively.
}
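The four steps above can be sketched in Python (a hypothetical version without parent pointers: the caller passes x, its parent y, and grandparent z, and re-links z's old parent to the returned node b):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def restructure(x, y, z):
    # 1) (a, b, c): inorder listing of x, y, z;
    #    (T0..T3): inorder listing of their four hanging subtrees
    if y is z.right and x is y.right:        # single left rotation case
        a, b, c = z, y, x
        T0, T1, T2, T3 = z.left, y.left, x.left, x.right
    elif y is z.left and x is y.left:        # single right rotation case
        a, b, c = x, y, z
        T0, T1, T2, T3 = x.left, x.right, y.right, z.right
    elif y is z.right and x is y.left:       # double rotation case
        a, b, c = z, x, y
        T0, T1, T2, T3 = z.left, x.left, x.right, y.right
    else:                                    # y is z.left, x is y.right
        a, b, c = y, x, z
        T0, T1, T2, T3 = y.left, x.left, x.right, z.right
    # 2-4) b replaces z; a and c become its children, with T0..T3 reattached
    a.left, a.right = T0, T1
    c.left, c.right = T2, T3
    b.left, b.right = a, c
    return b
```

In every case b ends up as the subtree root with a and c as children and T0..T3 reattached in inorder order, which is exactly why the inorder sequence is preserved.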
Examples of Single and Double Rotations
Remove
Remove operations also involve the same
rotation techniques but are more complicated than insert, for the following reasons:
– We can remove from anywhere in the tree, possibly
creating a hole in the tree. (As in an ordinary BST, we deal with this by replacing the removed key with the key that comes before it in an inorder traversal of the tree.)
– From the deleted node to the root of the tree there can be at most one unbalanced node at a time.
– Local restructuring of that node doesn’t have a global effect, so we have to go up through all the ancestors (up to the root) of the node, updating balance factors and doing rotations when necessary.
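The hole-repair step can be sketched for a plain BST (our illustrative Python; the AVL version would additionally update balance factors and rotate on the way back up to the root):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(v):
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)

def remove(root, key):
    # standard BST removal; a node with two children is replaced by its
    # inorder predecessor (the maximum key of its left subtree)
    if root is None:
        return None
    if key < root.key:
        root.left = remove(root.left, key)
    elif key > root.key:
        root.right = remove(root.right, key)
    else:
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        pred = root.left
        while pred.right is not None:   # walk to the inorder predecessor
            pred = pred.right
        root.key = pred.key             # fill the hole with its key
        root.left = remove(root.left, pred.key)
    return root
```

Removing the predecessor itself is easy because it has no right child, so the hole never cascades.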
Example of Remove
remove(50) is shown below. In this case the
remove is done at the root, so it does not require rotations.
Example of Remove
Here remove(70) requires a single rotation.
Remove involves the following operations and worst-case times:
find + replace with the inorder predecessor + restructure all the way up to the root.
Analysis of Remove
As before, all operations take time proportional to the height of the tree, which is at most 1.44 log n for an AVL tree.
Find traverses all the way from the root to an external node in the worst case: O(log n).
Replace can be shown to take (1/2)(1.44 log n) on average, but O(log n) in the worst case.
There can be at most O(log n) ancestors of a node, so the heights and balance factors of at most O(log n) nodes are affected. Since rotations (single restructure operations) can be done in O(1) time, the height balance property can be maintained in O(log n) time.
Remove(total) = O(log n) + O(log n) + O(log n)
So remove takes O(log n) time.
LEDA implementation
AVL trees in LEDA are implemented as an instance of the dictionary ADT. They are leaf-oriented (data is stored only in the leaves) and doubly linked. They can also be initialized as an instance of binary trees.
The balance factor is stored at each node. The AVL implementation also includes the functions
rotation(u,w,dir) and double_rotation(u,w,x,dir)
Snapshots from LEDA
Experimenting with LEDA
Size   Tree Type   Insert time   Look-up time   Delete time   Total time
10^4   AVL         0.0500        0.0200         0.0400        0.1100
       BST         0.0400        0.0200         0.0300        0.0900
10^5   AVL         0.8900        0.6400         0.9500        2.4800
       BST         1.0400        0.7900         0.9500        2.7800
10^6   AVL         21.6100       17.7100        20.850        60.170
       BST         17.5800       16.1000        16.770        50.450
unsorted entries (chosen at random)
All values are CPU time in seconds.
Experimenting continued: sorted entries (1, 2, 3, …)
Size   Tree Type   Insert time   Look-up time   Delete time   Total time
10^4   AVL         0.0500        0.0200         0.0300        0.1000
       BST         3.9900        3.9700         0.0200        7.9800
10^5   AVL         0.6100        0.3500         0.4500        1.4100
       BST         -             -              -             -
10^6   AVL         5.7300        4.2400         4.9800        14.950
       BST         -             -              -             -
‘-’ indicates the program stalled.
Conclusion
As can be seen from the tables, when the data is chosen at random, the ordinary BST performs somewhat better because of the overhead of the rotation operations in AVL trees. When the entries are already sorted, which happens often enough in the real world, the BST degenerates into a linear structure and thus takes O(n) time per operation, while AVL trees maintain O(log n) time, as can be inferred from the results.
So, depending on the ‘expected’ data, dictionary implementations should use AVL trees.
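The degenerate behavior on sorted input can be seen with a few lines of Python (an illustrative sketch, written iteratively to avoid deep recursion):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def bst_insert(root, key):
    # plain (unbalanced) BST insertion
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right

def right_spine(root):
    # depth of the right spine; for sorted input the tree IS a right
    # chain, so this equals the height of the tree
    h, cur = 0, root
    while cur is not None:
        h += 1
        cur = cur.right
    return h

root = None
for k in range(1, 101):             # insert 1..100 in sorted order
    root = bst_insert(root, k)
assert right_spine(root) == 100     # height is n, so operations are O(n)
```

An AVL tree on the same input would keep the height near 1.44 log₂ n (about 10 for 100 keys), which is the gap the sorted-entries table shows.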
References
Texts:
– Goodrich, Michael and Tamassia, Roberto: Data Structures and Algorithms in Java.
– Kruse, Robert L.: Data Structures & Program Design.
Applets:
– http://www.seanet.com/users/arsen/avltree.html
– http://chaos.iu.hioslo.no/~kjenslj/java/applets/latest/applet.html
Others:
– http://www.cs.bgu.ac.il/~cgproj/LEDA/dictionary.html