
Technische Universität München

Algorithms and Data Structures

PD Dr. rer. nat. habil. Ralf‐Peter Mundani
Computation in Engineering / BGU

Scientific Computing in Computer Science / INF

Summer Term 2018

PD Dr. Ralf‐Peter Mundani – Algorithms and Data Structures – Summer Term 2018


Part 4: Searching

source: pinterest.com



overview
  fundamentals
  sequential search
  binary search
  binary tree search
  top‐down 2‐3‐4 trees
  red‐black trees
  AVL trees
  hashing



Fundamentals

idea of searching
  next to sorting, searching is one of the most common operations performed by computers
  therefore, items to be sought are extended with a (unique) key, forming so‐called key‐value pairs (e.g. TUMonline ID and student data)
  let $A = (a_1, a_2, \dots, a_n)$ be a sequence of n key‐value pairs in random / increasing / decreasing order
  for some given search key x, searching looks for $a_i \in A$ where $a_i.\mathrm{key} = x$
  in general, search algorithms are comparison‐based

in general, search algorithms are comparison‐based







Sequential Search

how it works
  idea: sequentially process all records each time a record is sought
  records: 1‐dimensional array of n key‐value pairs
  naïve approach
    start with the first (last) element of array A
    check for the i‐th element whether A(i).key = x; otherwise continue in ascending (descending) order until the corresponding element is found or the end of the array has been reached
  properties: records in random order

[figure: array A of n records (key_0, value), (key_1, value), ..., (key_{n−1}, value)]



Sequential Search

how it works (cont’d)
  implementation of sequential search, where A denotes an array of n+1 records (record A(n) is needed for termination) and x is the search key

  procedure SEQ_SEARCH (A, n, x)
    A(n).key ← x
    A(n).value ← false
    i ← −1
    loop
      i ← i + 1
    until A(i).key = x
    return (A(i).value)
  end

trillion‐dollar question: what is the complexity of this algorithm?
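The sentinel idea of SEQ_SEARCH can be sketched in Python as follows. This is only an illustration: the `Record` class, the function name `seq_search`, and the sample data are assumptions, and the sentinel is appended to (and removed from) a Python list instead of occupying a reserved slot A(n).

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Record:
    key: Any
    value: Any


def seq_search(records: List[Record], x: Any) -> Any:
    """Return the value stored under key x, or False if x does not occur."""
    # The sentinel plays the role of record A(n): the loop needs no bounds check.
    records.append(Record(x, False))
    i = 0
    while records[i].key != x:
        i += 1
    value = records[i].value
    records.pop()  # remove the sentinel again
    return value


# Hypothetical sample data (records in random order)
students = [Record(17, "Ada"), Record(4, "Bob"), Record(42, "Eve")]
found = seq_search(students, 42)
missing = seq_search(students, 5)
```

An unsuccessful search inspects all n records plus the sentinel, which matches the n+1 comparisons counted in the analysis.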



Sequential Search

complexity analysis
  obviously, sequential search has linear complexity
  again: for reasons of simplicity we only count the comparisons
properties
  n+1 comparisons are needed for an unsuccessful search
  about n/2 comparisons are needed for a successful search (on average); assuming that each record is equally likely to be sought, the average number of comparisons is
    $T(n) = (1 + 2 + 3 + \dots + n)/n = (n+1)/2$
  hence, sequential search belongs to class $\mathcal{O}(n)$, independent of the order of the records







Binary Search

how it works
  based on the divide‐and‐conquer (DAC) strategy
  idea: successively divide the records into two halves and further process the one to which the search key is determined to belong
  therefore, records should be sorted in ascending (descending) order
  pick the record in the middle and compare its key k to the search key x
    stop in case k = x (i.e. successful search)
    continue with the first (second) half in case x is smaller than k
    continue with the second (first) half in case x is larger than k

[figure: sorted array split into records with smaller keys, the middle record with key k, and records with larger keys]



Binary Search

how it works (cont’d)
  implementation of binary search, where A denotes an array of n records sorted in ascending order and x is the search key

  procedure BINARY_SEARCH (A, n, x)
    left ← 0; right ← n−1
    loop
      m ← (left+right) div 2
      if x < A(m).key then right ← m−1 else left ← m+1 fi
    until left > right ∨ A(m).key = x
    if A(m).key = x then return (A(m).value) else return (false) fi
  end

$\mathcal{O}(n)$ to be beaten: what is the complexity of binary search?
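A minimal Python sketch of the same idea is given below. It is not a line‐by‐line transcription of BINARY_SEARCH: it uses a `while` loop that tests the interval before accessing the array, so an empty array is handled as well. The function name and the representation of records as `(key, value)` tuples are assumptions.

```python
from typing import Any, List, Tuple


def binary_search(records: List[Tuple[Any, Any]], x: Any) -> Any:
    """Search a list of (key, value) pairs sorted by key; return value or False."""
    left, right = 0, len(records) - 1
    while left <= right:
        m = (left + right) // 2          # middle record
        key, value = records[m]
        if key == x:
            return value                 # successful search
        if x < key:
            right = m - 1                # continue with the first half
        else:
            left = m + 1                 # continue with the second half
    return False                         # interval became empty


# Hypothetical sample data, sorted in ascending order by key
table = [(2, "two"), (3, "three"), (5, "five"), (7, "seven"), (11, "eleven")]
hit = binary_search(table, 7)
miss = binary_search(table, 4)
```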



Binary Search

complexity analysis
  for the body of the loop statement

    m ← (left+right) div 2
    if x < A(m).key then right ← m−1 else left ← m+1 fi

  there is one comparison with constant costs $c_C$ and some computations / assignments to be neglected
  the exact number of iterations depends on search key x

    loop
      # costs $1 \cdot c_C$ for the loop body
    until left > right ∨ A(m).key = x

  but as long as the second condition A(m).key = x is not fulfilled, the number of records is halved in every step (further comparisons as well as the additional comparison at the end to be neglected, as they only contribute a constant factor)



Binary Search

complexity analysis (cont’d)
  hence, $T(n) = 1 \cdot c_C + T(n/2)$ with $T(1) = T(0) = 0$
  assuming $n = 2^k$, i.e. $k = \log_2(n)$, repeated substitution leads to
    $T(n) = 1 \cdot c_C + T(n/2)$
    $T(n/2) = 1 \cdot c_C + T(n/4)$
    ...
    $T(2) = 1 \cdot c_C + T(1)$
    $T(1) = 0$
  the total cost can be computed as follows
    $T(n) = k \cdot c_C = \log_2(n) \cdot c_C$
  hence, binary search belongs to the class $\mathcal{O}(\log_2(n))$







Binary Tree Search

how it works
  procedures such as binary search lend themselves to tree structures
  hence, definition of a binary search tree
    each node stores one record, i.e. key‐value pair
    records with smaller keys are stored in the left subtree
    records with larger or equal keys are stored in the right subtree

[figure: sample binary search tree; root e with children b and m; b has children a and d; m has children g and r; c is a child of d; f and i are children of g; s is a child of r]



Binary Tree Search

how it works (cont’d)
  idea of a binary tree search (BTS) for given search key x
    1) start at root node
    2) compare x to node’s key k
         stop in case k = x
         go to left subtree in case x < k
         go to right subtree in case x > k
    3) continue with step 2)
  this procedure stops when either a record with key x has been found or, in case there is no such record, the current subtree becomes empty
  BTS can be expected to be of $\mathcal{O}(\log_2(n))$ in case of ‘well‐balanced’ binary search trees (problem of (un)balanced trees to be discussed in next section)



Binary Tree Search

how it works (cont’d)
  implementation of BTS where T indicates a binary tree (root node) and x is the search key

  procedure BTS (T, x)
    case
      x = T.key : return (T.value)
      x < T.key : if T.left ≠ NIL then BTS (T.left, x) else return (false) fi
      x > T.key : if T.right ≠ NIL then BTS (T.right, x) else return (false) fi
    esac
  end

  drawback of this implementation?



Binary Tree Search

improved BTS
  recursive implementation of BTS requires too many comparisons
  hence, a small modification to the data structure can solve that problem

[figure: modified binary search tree; the sample tree extended by the dummy nodes HEAD (above the root) and TAIL (below the leaves)]



Binary Tree Search

improved BTS (cont’d)
  as in the case of linked lists, there are two dummy nodes HEAD and TAIL
  hence, instead of linking to NIL for missing left / right children, each leaf node links to TAIL for the termination of unsuccessful searches
further definitions
  HEAD contains a key smaller (e.g. $-\infty$) than all other key values
  TAIL contains the value false
  left link of HEAD is not used (pointing to TAIL) while its right link points to the tree’s root node
  empty trees are represented by pointing directly from HEAD to TAIL
  assuming there is a primitive bintree (consisting at least of a key and links left and right to child nodes), the procedure BTS as well as some other procedures for inserting / deleting nodes can easily be implemented



Binary Tree Search

improved BTS (cont’d)
  implementation of BTS for the modified data structure where T indicates the HEAD element and x the search key

  procedure ADVANCED_BTS (T, x)
    tmp ← T
    T.left.key ← x
    loop
      if x < tmp.key then tmp ← tmp.left else tmp ← tmp.right fi
    until tmp.key = x
    return (tmp.value)
  end

  due to the assignment T.left.key ← x (i.e. TAIL.key ← x) this procedure terminates in any case and can handle empty trees as well



Binary Tree Search

insertion of nodes
  to insert new nodes into a binary search tree, first an unsuccessful search has to be performed
  afterwards, the new node is inserted (as new child) where the unsuccessful search terminated

[figure: the sample tree before and after inserting key o; o becomes the left child of r]



Binary Tree Search

insertion of nodes (cont’d)
  implementation of INSERT where T indicates the HEAD element and key and val the corresponding key‐value pair of the new node to be inserted

  procedure INSERT (T, key, val)
    new.key ← key; new.value ← val; new.left ← T.left; new.right ← T.left
    tmp ← T
    loop                                   # unsuccessful search
      parent ← tmp
      if key < tmp.key then tmp ← tmp.left else tmp ← tmp.right fi
    until tmp = T.left
    if key < parent.key then parent.left ← new else parent.right ← new fi
  end
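The HEAD/TAIL variant of search and insertion might look as follows in Python. This is a sketch under assumptions: keys are non‐empty strings, the empty string serves as the "smaller than everything" key of HEAD, and the names `Node`, `make_tree`, `search`, and `insert` are chosen for illustration only.

```python
from typing import Any


class Node:
    """Tree node; left/right always reference a node (TAIL instead of NIL)."""

    def __init__(self, key: Any, value: Any) -> None:
        self.key = key
        self.value = value
        self.left = self
        self.right = self


def make_tree() -> Node:
    """Create an empty tree: HEAD points directly to TAIL."""
    tail = Node("", False)            # TAIL carries the value False
    head = Node("", None)             # assumption: "" sorts before every real key
    head.left = head.right = tail
    return head


def search(head: Node, x: str) -> Any:
    tail = head.left
    tail.key = x                      # sentinel assignment guarantees termination
    tmp = head
    while True:
        tmp = tmp.left if x < tmp.key else tmp.right
        if tmp.key == x:
            return tmp.value          # False if the search ended in TAIL


def insert(head: Node, key: str, value: Any) -> None:
    tail = head.left
    new = Node(key, value)
    new.left = new.right = tail
    tmp = head
    while True:                       # unsuccessful search, remembering the parent
        parent = tmp
        tmp = tmp.left if key < tmp.key else tmp.right
        if tmp is tail:
            break
    if key < parent.key:
        parent.left = new
    else:
        parent.right = new


# Build the sample tree from the slides and insert key o afterwards
head = make_tree()
for letter in "ebmadgrcfis":
    insert(head, letter, letter.upper())
insert(head, "o", "O")
```

Starting the loop at HEAD works because no real key is smaller than HEAD's key, so the first step always follows the right link to the root.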



Binary Tree Search

insertion of nodes (cont’d)
  problem: order of keys to be inserted influences the tree structure
  hence, some sorting or balancing strategy is inevitable

[figure: two trees built from the same keys; order a, b, d, e, g, m yields a degenerate, list‐like tree; order e, m, b, a, g, d yields a balanced tree with root e]
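The influence of the insertion order can be checked with a few lines of Python. The sketch below uses a plain (NIL‐based) binary search tree; the helper names `insert`, `height`, and `build` are assumptions, and height is counted in edges (a single node has height 0).

```python
from typing import Iterable, Optional


class Node:
    def __init__(self, key: str) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], key: str) -> Node:
    """Insert key into the subtree rooted at root and return the subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)   # larger or equal keys go right
    return root


def height(root: Optional[Node]) -> int:
    """Number of edges on the longest root-to-leaf path (-1 for an empty tree)."""
    if root is None:
        return -1
    return 1 + max(height(root.left), height(root.right))


def build(keys: Iterable[str]) -> Optional[Node]:
    root = None
    for key in keys:
        root = insert(root, key)
    return root


sorted_height = height(build("abdegm"))   # order a, b, d, e, g, m
mixed_height = height(build("embagd"))    # order e, m, b, a, g, d
```

With the sorted order every key ends up as a right child, so the tree degenerates to a list of height 5, while the second order gives a tree of height 2.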



Binary Tree Search

deletion of nodes
  as seen in chapter 2, the following cases have to be distinguished
    a) deleting a leaf node
    b) deleting an internal node with one child
    c) deleting an internal node with two children
  cases a) and b) are easy to process (see also part 2: trees)

[figure: the sample tree with nodes marked as examples for case a), case b), and case c)]



Binary Tree Search

deletion of nodes (cont’d)
  reminder: deleting an internal node with two children ¹
    1) find the in‐order successor (IOS) of the node to be deleted
    2) copy IOS to the node to be deleted
    3) i. if IOS has no children, simply delete it
       ii. if IOS has a right child, link IOS’s parent to IOS’s right child

[figure: node to be deleted and its in‐order successor IOS]

¹ as suggested by T. HIBBARD in 1962, guaranteeing that the heights of the subject subtrees are changed by at most one



Binary Tree Search

deletion of nodes (cont’d)
  example: deleting node e
  even though this looks quite simple, the implementation is cumbersome; for details refer to the literature

[figure: the sample tree before and after deleting e; the in‐order successor f replaces e at the root and is removed from below g]



Binary Tree Search

complexity analysis
  properties
    in the worst case of a skewed tree, BTS requires n comparisons
    on average, for a tree built from n random keys, BTS requires about $2 \ln(n)$ comparisons
  definitions
    the number of comparisons for a successful search for some node conforms to the distance d of that node from the root
    the sum of these distances $d_i$ for all nodes with $i = 1, \dots, n$ is called internal path length (IPL) of a tree
    hence, the average number of comparisons is computed as IPL/n



Binary Tree Search

complexity analysis (cont’d)
  the internal path length for a random binary search tree with n nodes can be expressed as recurrence
    $IPL_n = n - 1 + \frac{1}{n} \sum_{k=1}^{n} \left( IPL_{k-1} + IPL_{n-k} \right)$ with $IPL_1 = 1$    (1)
  properties
    n−1 takes into account that the root contributes 1 to the distance of all other (i.e. n−1) nodes
    for n keys, each one is equally likely to be chosen first (i.e. to become root), thus picking the k‐th key leads to two (random) subtrees with k−1 and n−k nodes
  to be resolved as already seen for quicksort



Binary Tree Search

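complexity analysis (cont’d)
  as $IPL_0 + IPL_1 + \dots + IPL_{n-1}$ is the same as $IPL_{n-1} + IPL_{n-2} + \dots + IPL_0$, the recurrence in (1) can be rewritten as
    $IPL_n = n - 1 + \frac{2}{n} \sum_{k=1}^{n} IPL_{k-1}$    (2)
  next, multiplying both sides of (2) by n and subtracting the same formula for n−1 leads to
    $n \, IPL_n - (n-1) \, IPL_{n-1} = n(n-1) - (n-1)(n-2) + 2 \, IPL_{n-1}$
  finally, the recurrence can be simplified to
    $n \, IPL_n = (n+1) \, IPL_{n-1} + 2(n-1)$    (3)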



Binary Tree Search

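complexity analysis (cont’d)
  dividing both sides of (3) by n(n+1) gives the following recurrence
    $\frac{IPL_n}{n+1} = \frac{IPL_{n-1}}{n} + \frac{2(n-1)}{n(n+1)}$
  that telescopes (harmonic series), thus
    $\frac{IPL_n}{n+1} \approx 2 \sum_{k=1}^{n} \frac{1}{k} \approx 2 \ln(n)$
  which implies the stated result
    $T(n) = IPL_n / n \approx 2 \ln(n)$
  as $2 \ln(n) \approx 1.38 \log_2(n)$, the average number of comparisons of BTS is only about 38% higher than the best case under the assumption of a perfectly balanced tree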







Top‐Down 2‐3‐4 Trees

fundamentals
  problem of bad worst‐case performance (i.e. $\mathcal{O}(n)$) for BTS
  need for some technique to balance trees, so that the worst case never occurs
  question: what does ‘balanced’ mean?
    in case of binary trees, the number of nodes in both subtrees is roughly the same and the heights of the left and right subtree differ by at most 1 (AVL)
  this implies modifications to the location of nodes (so‐called split and rotate operations) that change a tree’s structure
  in general, these operations are quite easy to describe and understand
  nevertheless, balanced‐tree algorithms are often difficult to implement



top‐down 2‐3‐4 trees
  binary search trees are not flexible enough to eliminate the worst case
  hence, structures that allow storing more than one key per node are necessary
to be distinguished
  2‐nodes: the ‘typical’ node with one key k and two links
  3‐nodes: node with two keys k1 < k2 and three links (one for all nodes x < k1, one for all nodes k1 < x < k2, and one for all nodes k2 < x)
  4‐nodes: node with three keys k1 < k2 < k3 and four links (one for all nodes x < k1, one for all nodes k1 < x < k2, one for all nodes k2 < x < k3, and one for all nodes k3 < x)

[figure: a 2‐node with key k, a 3‐node with keys k1 k2, and a 4‐node with keys k1 k2 k3]




top‐down 2‐3‐4 trees (cont’d)
  example of a top‐down 2‐3‐4 (TD234) tree

[figure: TD234 tree with root 3‐node (f n) and children (a c), (g j k), and (r)]

  searching is quite the same as in case of BTS
  when inserting a new key different possibilities can arise
    key to be inserted into 2‐node: turns into 3‐node
    key to be inserted into 3‐node: turns into 4‐node
    key to be inserted into 4‐node: turns into ???




tree operators
  inserting a new key into a 2‐node

[figure: inserting s into the 2‐node (r) yields the 3‐node (r s)]

  inserting a new key into a 3‐node

[figure: inserting b into the 3‐node (a c) yields the 4‐node (a b c)]




tree operators (cont’d)
  inserting a new key into a 4‐node

[figure: inserting h into the 4‐node (g j k); the node is split into (g h) and (k), and the middle key j moves up into the root, which becomes (f j n)]

  approach
    1) split the respective 4‐node into two 2‐nodes
    2) pass the middle key up to the parent (which turns into a 3‐ or 4‐node)
    3) store the new key at the corresponding 2‐node
  questions
    a) what if a 4‐node is to be split whose parent is also a 4‐node?
    b) what if the root node is a 4‐node?




case a) splitting a 4‐node with 4‐node parent
  idea: one could split the parent also, but this could keep going all the way up to the tree root
  hence, assure that any node visited on the way down is not a 4‐node
  therefore
    i. turn a 2‐node linking to a 4‐node into a 3‐node linking to two 2‐nodes
    ii. turn a 3‐node linking to a 4‐node into a 4‐node linking to two 2‐nodes

[figure: case i. and case ii.]




case a) splitting a 4‐node with 4‐node parent (cont’d)
  properties
    two 2‐nodes have the same number of links as one 4‐node, hence, nothing below the split node has to be changed: transformations are purely local
    on the way downward, these transformations ensure that at the bottom (leaf level) either a 2‐node or a 3‐node is reached, thus, the new key can be directly inserted
    as split operations occur on the way from the top of the tree down to the bottom, the trees are called top‐down 2‐3‐4 trees

[figure: a 4‐node with links 1 to 4 and the two 2‐nodes with the same links 1 to 4 after the split]




case b) splitting a 4‐node root
  as the root has no parent, no key can be passed upwards
  hence, the root is split into three 2‐nodes
  only this split operation (!) makes the tree grow one level higher
  important issue: even though balancing was not explicitly discussed so far, the resulting trees are perfectly balanced

[figure: root 4‐node (k1 k2 k3) split into the new root (k2) with children (k1) and (k3)]




complexity analysis
  searches in a TD234 tree with n nodes visit at most $\log_2(n+1)$ nodes
    1) the distance from the root to all leaf nodes is the same
    2) transformations (except splitting the root node) have no effect on the distance of any node from the root
    3) when splitting the root, the distance of all nodes is increased by one
    4) hence, if all nodes are 2‐nodes, the stated result holds since the tree is like a full binary tree (that has a height of $\log_2(n)$)
    5) in all other cases (i.e. there are 3‐nodes and 4‐nodes) the height can only be lower
  nevertheless, TD234 trees are difficult to implement and entail overhead due to the manipulation of more complex node structures
  thus, we are looking for a ‘simple’ implementation that provides the same properties as TD234 trees do







Red‐Black Trees

fundamentals
  first introduced in 1972 by RUDOLF BAYER, named red‐black trees in 1978 by LEONIDAS J. GUIBAS and ROBERT SEDGEWICK
  red‐black (RB) trees are self‐balancing binary search trees with guaranteed $\mathcal{O}(\log_2(n))$ access for typical operations such as insert, delete, or search
  every node of an RB tree has one additional attribute, called colour, with two values red and black
colouring properties to be satisfied
  1) each node is either red or black
  2) the root node is always black (sometimes omitted)
  3) if a node is red, then both its children are black
  4) every path from a given node to any of its descendant leaf nodes contains the same number of black nodes
     (the number of black nodes from the root to a node is called the node’s black depth)
  5) all external nodes (NIL) are black
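The colouring properties can be verified mechanically. The following Python sketch checks properties 2), 3), and 4) for a given tree; property 1) holds by construction (a boolean flag) and property 5) by treating `None` as a black NIL leaf. The class `RBNode` and the function names are assumptions made for illustration.

```python
from typing import Any, Optional


class RBNode:
    def __init__(self, key: Any, red: bool,
                 left: Optional["RBNode"] = None,
                 right: Optional["RBNode"] = None) -> None:
        self.key = key
        self.red = red            # True = red, False = black
        self.left = left
        self.right = right


def black_height(node: Optional[RBNode]) -> int:
    """Black nodes on every path from node down to NIL; raises if 3) or 4) fails."""
    if node is None:
        return 1                  # external NIL nodes are black (property 5)
    if node.red and any(c is not None and c.red for c in (node.left, node.right)):
        raise ValueError("property 3 violated: red node with red child")
    left = black_height(node.left)
    right = black_height(node.right)
    if left != right:
        raise ValueError("property 4 violated: unequal number of black nodes")
    return left + (0 if node.red else 1)


def is_red_black(root: Optional[RBNode]) -> bool:
    if root is not None and root.red:
        return False              # property 2: the root must be black
    try:
        black_height(root)
    except ValueError:
        return False
    return True


# Hypothetical sample trees
valid = RBNode("f", False, RBNode("c", True), RBNode("n", True))
red_red = RBNode("f", False, RBNode("c", True, RBNode("a", True)))
unequal = RBNode("f", False, RBNode("c", False))
```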



Red‐Black Trees

fundamentals (cont’d)
  example

[figure: the sample binary search tree coloured as an RB tree, with external NIL nodes]

  due to 3), no path from the root to any leaf contains more red than black nodes; in the extreme case the shortest path only contains black nodes
  due to 4), the number of black nodes on all paths must be the same, hence the path from the root to the farthest leaf is no more than twice as long as the path from the root to the nearest leaf
  any search in an RB tree with n nodes requires fewer than $2 \log_2(n+2)$ comparisons



Red‐Black Trees

connexion to top‐down 2‐3‐4 trees
  RB trees can easily be transformed into TD234 trees (and vice versa)
  hence, red children are merged into the black parent node (or a node is split, with the black node becoming parent and the red nodes becoming children)

[figure: RB representations of a 4‐node, a left‐oriented 3‐node, and a right‐oriented 3‐node, each with its links numbered]



Red‐Black Trees

connexion to top‐down 2‐3‐4 trees (cont’d)
  previous example (either orientation of 3‐nodes is fine)

[figure: the TD234 tree with root (f n) and children (a c), (g j k), (r) next to a corresponding RB tree]

  observation
    BTS procedure for binary search trees works without modifications as it doesn’t have to examine the colour of nodes: no search overhead due to the balancing mechanism
    colour is only important for inserting keys (keeping track of TD234 rules)



Red‐Black Trees

inserting keys
  similar approach as in case of binary search trees
    first of all perform an unsuccessful search (to find the respective parent)
    insert the new key as corresponding left / right child of that parent with colour red in order to preserve the tree’s black depth
    if RB property 3) is violated, the tree needs to be repaired
  example: inserting key b into the following RB tree

[figure: RB tree before and after inserting key b]

  question: how does this work?



Red‐Black Trees

inserting keys (cont’d)
  five cases to be distinguished
    i. new node is root: recolour it from red to black (RB property 2)
    ii. parent node is black: clearly RB properties 3) and 4) are satisfied
    iii. parent and uncle node are both red: colour flip

[figure: colour flip below grandparent G with parent P, uncle U, new node N, and subtrees 1 to 5 (N: new node, G: grandparent node, P: parent node, U: uncle node)]

  observation: for any path through the grandparent the number of black nodes has not changed, hence RB property 4) is satisfied
  nevertheless, RB properties 2) and / or 3) may be violated, hence a tail‐recursive processing starting at the grandparent node might become necessary



Red‐Black Trees


    iv. parent is red, uncle is black, new node is left or right unaligned with P and G: left / right rotation

[figure: rotation that exchanges the positions of N and P below grandparent G (uncle U, subtrees 1 to 5)]

  observation 1: left rotation switches roles of N and P, but RB property 3) is still violated, so a further (right) rotation is necessary (i.e. case v.)
  observation 2: subtrees labelled 1, 2, and 3 were attached to red nodes before the rotation and are attached to red nodes after the rotation, so RB property 4) is not violated by the rotation



Red‐Black Trees


    v. parent is red, uncle is black, new node is left or right aligned with P and G: right / left rotation

[figure: rotation at G that makes P the root of the subtree, with children N and G; U becomes a child of G (subtrees 1 to 5)]

  observation 1: right rotation switches roles of P and G and performs a colour change, i.e. parent node becomes black and grandparent node becomes red, so RB property 3) is satisfied
  observation 2: RB property 4) also remains satisfied as any path to subtrees labelled 1, 2, 3, 4, or 5 went through G before and now goes through P



Red‐Black Trees

inserting keys (cont’d)
  example: inserting key l

[figure: the RB tree after inserting l and after each repair step; a colour flip, a left rotation, and a right rotation are applied]



Red‐Black Trees

deleting keys
  again, several cases to be distinguished for both red and black nodes
    a) node is a leaf
    b) node has one child
    c) node has two children
  case a) for red node: due to RB property 3) the parent must be black; deleting the (red) node does not change the tree’s black depth
  case b) for red node: due to RB property 3) both parent and child must be black; deleting the (red) node does not change the tree’s black depth

[figure: case a) for red node and case b) for red node (with NIL leaves); both are easy going]



Red‐Black Trees

deleting keys (cont’d)
  case c) for red node: due to RB property 3) the parent and both children must be black; deleting the (red) node might change the tree’s black depth
    i. first, find the node’s in‐order successor (IOS)
    ii. copy IOS’s key k to the red node without changing the red node’s colour
    iii. delete IOS (further activities depending on IOS’s colour)
      if IOS is red and has no children: case a) for red node
      if IOS is red and has a black right child: case b) for red node
      if IOS is black: case a) or b) for black node



Red‐Black Trees

deleting keys (cont’d)
  case a) for black node: deleting leaf D might change the tree’s black depth
  if so, further operations become necessary

[figure: some cases for deleting a black leaf D; in one case a rotation leaves the black depth reduced by one (BD−1) in the left child, continue with case b)]



Red‐Black Trees

deleting keys (cont’d)
  case b) for black node: deleting inner node D
  some cases for deleting a black inner node with one child

[figure: case i) node D with subtrees 1 to 3; a colour flip restores the black depth]

[figure: case ii) node D with subtrees 1 to 4 and BD−1 in the right subtree; after a rotation and a colour flip, BD−1 remains in subtrees 3 and 4, continue for red node with iv)]


Red‐Black Trees

deleting keys (cont’d)
  case b) for black node: deleting inner node D
  some cases for deleting a black inner node with one child

[figure: case iii) node D with subtrees 1 to 4; after a colour flip BD is the same for all subtrees, but still smaller than before, continue on higher level]

[figure: case iv) node D with subtrees 1 to 4 and BD−1 in the right subtree; a colour flip restores the black depth]




Red‐Black Trees

deleting keys (cont’d)
  case c) for black node
    i. first, find the node’s in‐order successor (IOS)
    ii. copy IOS’s key k to the black node without changing the black node’s colour
    iii. delete IOS (further activities depending on IOS’s colour)
      if IOS is red and has no children: case a) for red node
      if IOS is red and has a black right child: case b) for red node
      if IOS is black and has no children: case a) for black node
      if IOS is black and has a right child: case b) for black node
  observation: practically all six cases (a to c for red and black nodes) can be reduced to the above critical cases that need further processing
  for further (bloody) details on deleting nodes in RB trees refer to the literature (e.g. Introduction to Algorithms, T.H. CORMEN et al.)



Red‐Black Trees

deleting keys (cont’d)
  example: deleting keys g, j, and f (in that order)

[figure: the RB tree after each deletion; a colour flip and a right rotation are applied along the way]



Red‐Black Trees

red‐black BSTs in the wild1 (SEDGEWICK (2012) on an episode of ‘Missing’)

1 https://www.coursera.org/learn/introduction‐to‐algorithms/lecture/HIlHd/b‐trees‐optional

INT. FBI HQ – NIGHT: Antonio is at THE COMPUTER as Jess explains herself to Nicole and Pollock. The CONFERENCE TABLE is covered with OPEN REFERENCE BOOKS, TOURIST GUIDES, MAPS and REAMS OF PRINTOUTS

JESS: It was the red door again.

POLLOCK: I thought the red door was the storage container.

JESS: But it wasn’t red anymore. It was black.

ANTONIO: So red turning to black means... what?

POLLOCK: Budget deficits? Red ink, black ink?

NICOLE: Yes, I’m sure that’s what it is. But maybe we should come up with a couple other options, just in case.

Antonio refers to his COMPUTER SCREEN, which is filled with mathematical equations.

ANTONIO: It could be an algorithm from a binary search tree. A red‐black tree tracks every simple path from a node to a descendant leaf with the same number of black nodes.

JESS: Does that help you with girls?







AVL Trees

fundamentals
  introduced in 1962 by GEORGI M. ADELSON‐VELSKY and EVGENII M. LANDIS, named AVL trees after the last names of their inventors
  AVL trees are self‐balancing binary search trees where insert, delete, and search operations all take $\mathcal{O}(\log_2(n))$ time in both average and worst case
  compared to RB trees, AVL trees are more strictly balanced and, thus, are expected to be faster
AVL condition
  balance factor BF(n) of some node n is defined as the height difference
    $BF(n) = \mathrm{height}(n.\mathrm{left}) - \mathrm{height}(n.\mathrm{right})$
  of its two subtrees with roots n.left and n.right
  a binary (search) tree is defined to be an AVL tree if the condition
    $-1 \le BF(n) \le 1$
  holds for every node n in the tree
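The AVL condition translates directly into a small checker. The sketch below assumes the sign convention BF(n) = height(left) − height(right) and defines the height of an empty subtree as −1; the names `Node`, `height`, `balance_factor`, and `is_avl` are illustrative. Heights are recomputed recursively for clarity, which is simple but not efficient.

```python
from typing import Any, Optional


class Node:
    def __init__(self, key: Any,
                 left: Optional["Node"] = None,
                 right: Optional["Node"] = None) -> None:
        self.key = key
        self.left = left
        self.right = right


def height(node: Optional[Node]) -> int:
    """Height in edges; an empty subtree has height -1 (assumption)."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node: Node) -> int:
    return height(node.left) - height(node.right)


def is_avl(node: Optional[Node]) -> bool:
    """True if -1 <= BF(n) <= 1 holds for every node n of the tree."""
    if node is None:
        return True
    return (abs(balance_factor(node)) <= 1
            and is_avl(node.left)
            and is_avl(node.right))


# Hypothetical sample trees
balanced = Node(4, Node(2, Node(1), Node(3)), Node(6))
skewed = Node(1, None, Node(2, None, Node(3)))
```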



AVL Trees

fundamentals (cont’d)
  some examples

[figure: three sample trees annotated with balance factors; the first is an AVL tree, the other two are not, as they contain nodes whose balance factor has absolute value 2]

  a node n with balance factor
    BF(n) > 0 is called left‐heavy
    BF(n) = 0 is called balanced
    BF(n) < 0 is called right‐heavy



AVL Trees

fundamentals (cont’d)

Theorem: An AVL tree with n nodes has at least a height of $\log_2(n+1)$ and at most a height of $1.4405 \log_2(n+1)$.

  let A(h) denote the minimal number of nodes an AVL tree of height h has
  obviously, a tree of height 0 only contains the root node, hence A(0) = 1
  furthermore, A(1) = 2 holds as every tree of height 1 must have at least two nodes and also fulfils the AVL condition
  let an AVL tree of height h ≥ 2 be given
    both subtrees left and right of the root are also AVL trees
    as the tree has a height of h, one of its subtrees must have a height of h−1, the other of at least h−2 in order to fulfil the AVL condition for the root
  hence, for h ≥ 2 we get the following recursion
    $A(h) = 1 + A(h-1) + A(h-2)$

  note: the lower bound holds for all binary trees, having at most $2^{d+1} - 1$ nodes for a tree of depth d



AVL Trees

fundamentals (cont’d)
  recursion $A(h) = 1 + A(h-1) + A(h-2)$ with A(0) = 1, A(1) = 2 leads to

    h:      0   1   2   3   4   5   6   7   8
    A(h):   1   2   4   7   12  20  33  54  88  ...

    n:      0   1   2   3   4   5   6   7   8   9   10  11
    F_n:    0   1   1   2   3   5   8   13  21  34  55  89  ...

  comparing this with the FIBONACCI series results in $A(h) = F_{h+3} - 1$, $h \ge 0$
  via induction follows
    base case for h = 0, 1 directly follows from the values above
    inductive step for h ≥ 2 follows from
      $A(h) = 1 + A(h-1) + A(h-2) = 1 + (F_{h+2} - 1) + (F_{h+1} - 1) = F_{h+3} - 1$
    using the induction hypothesis and $F_n = F_{n-1} + F_{n-2}$
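The relation between A(h) and the FIBONACCI series can also be checked numerically. The short Python sketch below evaluates both recursions iteratively; the function names `min_nodes` and `fib` are assumptions.

```python
def min_nodes(h: int) -> int:
    """Minimal number of nodes A(h) of an AVL tree of height h."""
    a_prev, a_curr = 1, 2             # A(0), A(1)
    if h == 0:
        return a_prev
    for _ in range(h - 1):
        a_prev, a_curr = a_curr, 1 + a_curr + a_prev
    return a_curr


def fib(n: int) -> int:
    """FIBONACCI number F_n with F_0 = 0 and F_1 = 1."""
    f_prev, f_curr = 0, 1
    for _ in range(n):
        f_prev, f_curr = f_curr, f_prev + f_curr
    return f_prev


first_values = [min_nodes(h) for h in range(9)]
identity_holds = all(min_nodes(h) == fib(h + 3) - 1 for h in range(50))
```

Here `first_values` reproduces the row A(0), ..., A(8) of the table, and `identity_holds` confirms the closed relation for the first 50 heights.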



AVL Trees

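fundamentals (cont’d)
  a closed form for the FIBONACCI series with h ≥ 0 is given by
    $F_h = \frac{1}{\sqrt{5}} \left( \left( \frac{1+\sqrt{5}}{2} \right)^h - \left( \frac{1-\sqrt{5}}{2} \right)^h \right)$
  from the definition of A follows that n ≥ A(h) holds for every AVL tree of height h and with n nodes, hence
    $n \ge F_{h+3} - 1$
  with $\varphi = (1+\sqrt{5})/2$ and $F_{h+3} \ge \varphi^{h+1}$ follows
    $n + 1 \ge \varphi^{h+1}$
  and finally (via some simple transformations) $h \le 1.4405 \log_2(n+1)$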



AVL Trees

inserting keys
  every node of an AVL tree stores one additional attribute BF(n), its balance
  hence, after each insert or delete operation these values must be updated
  insertion of new elements happens in the same way as for standard binary search trees (i.e. performing an unsuccessful search first)
  after insertion the AVL condition has to be checked and, if necessary, the tree must be re‐balanced (via single or double rotations)
  therefore, ascend from the new element to the root, as only for nodes along this path the balance factors might have changed; all other nodes are unaffected by the insertion and, thus, still satisfy the AVL condition
  we consider the case of reaching some node v via its right child (the other case of reaching v from its left child is analogous and will not be discussed)
  if the height of v’s right subtree has not grown, the re‐balancing can be stopped as neither the balance of v nor of any node above v has changed; all those nodes still satisfy the AVL condition



AVL Trees

inserting keys (cont’d)
  three cases to be distinguished (denoting the balance before insertion, with BF(v) = height(v.left) − height(v.right))
    i. BF(v) = 1: correct balance of v by setting BF(v) = 0; the height of the subtree with root v hasn’t changed, thus all balance factors above v stay unmodified and the re‐balancing can stop
    ii. BF(v) = 0: correct balance of v by setting BF(v) = −1; the height of the subtree with root v has grown by one, thus the re‐balancing must be continued with the parent of v or can stop in case v is root
    iii. BF(v) = −1: node is unbalanced with current balance factor of −2

[figure: node v with left subtree TL of height h and right subtree TR of height h+2 containing the new element]



AVL Trees

inserting keys (cont’d)
  case iii. has four sub‐cases to be distinguished (x denotes the right child of v)
    a) the height of x’s right subtree has grown by one: via a left rotation the AVL condition in node v becomes restored

[figure: left rotation; before: v with left subtree TL and right child x carrying subtrees 1 and 2; after: x with left child v (subtrees TL and 1) and right subtree 2]

  observation: re‐balancing stops as the height of the subject subtree after the left rotation is the same as before the insertion and, thus, all nodes above this subtree do not need to be changed



AVL Trees


    b) heights of x’s left subtree (with root w) as well as of w’s right subtree have grown by one: via a double rotation (i.e. RL rotation, first right at x–w and then left at v–w) the AVL condition in node v becomes restored

[figure: RL double rotation in three steps; before: v with left subtree TL and right child x, whose left child w carries subtrees 1 and 2 and whose right subtree is 3; after: w with children v (subtrees TL and 1) and x (subtrees 2 and 3)]

  observation: re‐balancing stops as the height of the subject subtree after the double rotation is the same as before the insertion and, thus, all nodes above this subtree do not need to be changed



AVL Trees


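    c) heights of x’s left subtree (with root w) as well as of w’s left subtree have grown by one: via a double rotation (i.e. RL rotation, first right at x–w and then left at v–w) the AVL condition in node v becomes restored

[figure: RL double rotation in three steps, analogous to case b) but with the new element in w’s left subtree 1; after: w with children v (subtrees TL and 1) and x (subtrees 2 and 3)]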



AVL Trees


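    d) special case of the previous situation (case c) with empty subtrees TL, 1, 2, and 3: after a double rotation (i.e. RL rotation, first right at x–w and then left at v–w) BF(w) = BF(v) = BF(x) = 0 holds

[figure: RL double rotation on the three nodes v, x, and w; after the rotation w is the root with children v and x, all with balance factor 0]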



AVL Trees

inserting keys (cont’d) some remarks

  re‐balancing might propagate up to the root node if case ii. (i.e. BF(v) = 0) is reached repeatedly
  whenever case iii. is reached, i.e. a single or double rotation is applied, re‐balancing immediately stops
  in case node v is reached from its left child all rotations are mirrored, i.e. left rotations become right rotations and RL double rotations become LR double rotations
  as all rotations can be implemented with $\mathcal{O}(1)$ and an unsuccessful search for finding the right place to insert an element takes $\mathcal{O}(\log_2(n))$, the cost for inserting a new key into an AVL tree is as follows
    $T(n) = \mathcal{O}(\log_2(n)) + \mathcal{O}(1) = \mathcal{O}(\log_2(n))$
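AVL insertion with single and double rotations can be sketched compactly in Python. Note the deviations from the slides, all of which are assumptions of this sketch: each node stores its subtree height instead of the balance factor, the insertion is recursive (re‐balancing happens while the recursion unwinds instead of stopping early), and BF is taken as height(left) − height(right).

```python
from collections import deque
from typing import List, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.height = 0                       # a leaf has height 0


def height(node: Optional[Node]) -> int:
    return node.height if node is not None else -1


def update(node: Node) -> None:
    node.height = 1 + max(height(node.left), height(node.right))


def balance_factor(node: Node) -> int:
    return height(node.left) - height(node.right)


def rotate_left(v: Node) -> Node:
    x = v.right
    v.right, x.left = x.left, v               # x becomes root of the subtree
    update(v)
    update(x)
    return x


def rotate_right(v: Node) -> Node:
    x = v.left
    v.left, x.right = x.right, v
    update(v)
    update(x)
    return x


def insert(node: Optional[Node], key: int) -> Node:
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    bf = balance_factor(node)
    if bf < -1:                               # right subtree too high
        if balance_factor(node.right) > 0:    # RL double rotation
            node.right = rotate_right(node.right)
        return rotate_left(node)
    if bf > 1:                                # left subtree too high
        if balance_factor(node.left) < 0:     # LR double rotation
            node.left = rotate_left(node.left)
        return rotate_right(node)
    return node


def level_order(root: Optional[Node]) -> List[int]:
    keys, queue = [], deque([root] if root else [])
    while queue:
        node = queue.popleft()
        keys.append(node.key)
        queue.extend(c for c in (node.left, node.right) if c is not None)
    return keys


root = None
for key in (4, 5, 7, 2, 1, 3, 6):
    root = insert(root, key)
```

For the key sequence 4, 5, 7, 2, 1, 3, 6 this performs a left rotation, a right rotation, an LR and an RL double rotation, and ends with root 4, children 2 and 6, and leaves 1, 3, 5, 7.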



AVL Trees

inserting keys (cont’d)
  example: inserting keys 4, 5, 7, 2, 1, 3, 6 (in that order) into an empty tree

[figure: the tree with balance factors after each insertion; the rotations L, R, LR, and RL are applied along the way; the final tree has root 4 with children 2 and 6 and leaves 1, 3, 5, 7]



AVL Trees

deleting keys
  principally, same approach as for standard binary trees
  afterwards the AVL condition has to be checked and, if necessary, the tree must be re‐balanced (via single or double rotations)
  therefore, ascend from the deleted element to the root, as only for nodes along this path the balance factors might have changed; all other nodes are unaffected by the deletion and, thus, still satisfy the AVL condition
  but in contrast to insertion, here several single or double rotations along the path towards the root might become necessary, nevertheless at most one for each node along that path
  for tree height h, there are at most h+1 nodes along that path
  as for every node there is at most one single or double rotation which takes $\mathcal{O}(1)$, the cost for deleting a key from an AVL tree can be computed as
    $T(n) = \mathcal{O}(\log_2(n)) \cdot \mathcal{O}(1) = \mathcal{O}(\log_2(n))$



AVL Trees

deleting keys (cont’d)
  example: deleting keys n and a (in that order)

[figure: AVL tree with keys f, c, n, a, j, r, g and balance factors before and after deleting n and a; a right rotation (R) is applied]







Hashing

fundamentals
  if the universe U of search keys is not too large, an efficient approach with search, insert, and delete operations taking $\mathcal{O}(1)$ time can be realised
    simply implement an array of size |U| that for every key k ∈ U has one entry storing the corresponding record or NIL in case it does not exist
  typically |U| is large while m ≪ |U| with m denoting the number of records
  hence, a so‐called hash function
    $h : U \to \{0, 1, \dots, m-1\}$
  maps U to the set {0, 1, ..., m−1} with typically m ≪ |U|
  for key k an element is stored at h(k) instead of position k within the array
  due to m < |U|, hash function $h : U \to \{0, 1, \dots, m-1\}$ cannot be injective
  hence, for every hash function two different keys k1 ∈ U and k2 ∈ U with h(k1) = h(k2) exist, i.e. the keys collide, which makes collision resolution inevitable



Hashing

fundamentals (cont’d)

problem: how to choose a good hash function

in general, a hash function should map the keys of $U$ as evenly as possible to the set $\{0, 1, \dots, m-1\}$, hence $h^{-1}(i) = \{k \in U \mid h(k) = i\}$ should have more or less the same size for every $i$

if the keys of $U$ are non‐negative integers, simple hash functions are given via
division method: $h(k) = k \bmod m$
multiplication method: $h(k) = \lfloor m \cdot (kA - \lfloor kA \rfloor) \rfloor$ with $A \in (0, 1)$

multiplication method
choose some constant value $A \in (0, 1)$
for key $k$, compute the fractional part of $kA$ via $kA - \lfloor kA \rfloor \in [0, 1)$
multiplying this value with $m$ and rounding it down (to the next integer) delivers the respective position $h(k)$ in the hash table

choosing $A \approx 0.61803...$ has empirically proven to work well
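Both methods fit into a few lines. The following is a minimal sketch, not a reference implementation: the function names are made up, and the default $A = (\sqrt{5} - 1)/2 \approx 0.61803$ is an assumption about which constant the value above abbreviates.

```python
import math


def hash_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def hash_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplication method: h(k) = floor(m * (k*A - floor(k*A)))."""
    frac = (k * A) % 1.0               # fractional part of k*A, in [0, 1)
    # min() guards against floating-point rounding of m * frac up to m
    return min(int(m * frac), m - 1)


print(hash_division(1042, 211))                          # 198
print([hash_multiplication(k, 8) for k in range(1, 9)])  # positions in 0..7
```

Since the multiplication method works on floating-point numbers here, very large keys lose precision; an integer-only variant would be needed for those.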



Hashing

fundamentals (cont’d)

division method

$h(k) = k \bmod m$ is computed as the remainder of a division (i.e. $0 \le h(k) < m$)

problem: how to choose $m$ best

any integer key can be expressed regarding some radix $r$ as

$k = a_n r^n + a_{n-1} r^{n-1} + \dots + a_2 r^2 + a_1 r + a_0$

example:

$r = 10$: $k = 1042 = 1 \cdot 10^3 + 0 \cdot 10^2 + 4 \cdot 10^1 + 2$
$r = 16$: $k = 1042 = 4 \cdot 16^2 + 1 \cdot 16^1 + 2$

choosing $m = r$, it turns out that $h(k) = a_0$, i.e. the hash value depends on the least significant digit only

hence, choosing $m$ as a prime number has empirically proven to work well
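The effect of choosing $m = r$ can be checked directly. The sketch below assumes an arbitrary set of sample keys and uses the prime $m = 13$ purely for comparison; the helper `digits` is hypothetical.

```python
def digits(k, r):
    """Digits a_n, ..., a_1, a_0 of the non-negative integer k in radix r."""
    out = []
    while True:
        k, a = divmod(k, r)
        out.append(a)
        if k == 0:
            break
    return out[::-1]


print(digits(1042, 10))    # [1, 0, 4, 2]
print(digits(1042, 16))    # [4, 1, 2]

keys = [1042, 2042, 9992, 52, 7]
by_radix = [k % 10 for k in keys]   # m = r = 10: only the last digit a_0 counts
by_prime = [k % 13 for k in keys]   # m prime: all digits influence the result
print(by_radix)    # [2, 2, 2, 2, 7]
print(by_prime)    # [2, 1, 8, 0, 7]
```

With $m = 10$ four of the five sample keys collide, whereas $m = 13$ happens to separate all of them.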

question: how to proceed with character keys?



Hashing

fundamentals (cont’d)

encoding character keys

encoding schema of characters (according to their lexicographic position)

char   a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
#lex   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

numerical representation (with radix $r = 29$) to be computed as

$k = c_n \cdot 29^n + c_{n-1} \cdot 29^{n-1} + \dots + c_2 \cdot 29^2 + c_1 \cdot 29^1 + c_0$

with $c_i$ denoting the lexicographic position of the $i$‐th character according to the table above

hence, ‘como’ (c = 3, o = 15, m = 13, o = 15) leads to

$k = 3 \cdot 29^3 + 15 \cdot 29^2 + 13 \cdot 29^1 + 15 = 86174$
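The encoding can be evaluated with Horner's scheme. A minimal sketch, assuming lowercase keys over a to z only; the function name `encode` is hypothetical.

```python
def encode(word, radix=29):
    """Interpret a lowercase word as a number in the given radix,
    using a = 1, b = 2, ..., z = 26 as digit values."""
    k = 0
    for ch in word:
        c = ord(ch) - ord("a") + 1    # lexicographic position of ch
        k = k * radix + c             # Horner's scheme
    return k


print(encode("como"))    # 86174
```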



Hashing

fundamentals (cont’d)

encoding character keys

choosing $m = 211$ leads to

$h(\text{como}) = 86174 \bmod 211 = 86$ or $86174 \equiv 86 \pmod{211}$

further keys

$h(\text{kiev}) = 276015 \bmod 211 = 27$ or $276015 \equiv 27 \pmod{211}$
$h(\text{lodz}) = 305425 \bmod 211 = 108$ or $305425 \equiv 108 \pmod{211}$
$h(\text{oslo}) = 382177 \bmod 211 = 56$ or $382177 \equiv 56 \pmod{211}$
$h(\text{pisa}) = 398345 \bmod 211 = 188$ or $398345 \equiv 188 \pmod{211}$
$h(\text{umea}) = 523248 \bmod 211 = 179$ or $523248 \equiv 179 \pmod{211}$
$h(\text{roco}) = 451719 \bmod 211 = 179$ or $451719 \equiv 179 \pmod{211}$

unfortunately, ‘roco’ belongs to the same coset as ‘umea’; hence, some more investigation about collision resolution is necessary
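The hash values listed above, and the collision between ‘umea’ and ‘roco’, can be reproduced with a short script. The helper names `encode` and `h` are hypothetical; the radix 29 and $m = 211$ are taken from the example.

```python
def encode(word, radix=29):
    """Radix-29 value of a lowercase word with a = 1, ..., z = 26."""
    k = 0
    for ch in word:
        k = k * radix + (ord(ch) - ord("a") + 1)
    return k


def h(word, m=211):
    """Division method applied to the encoded key."""
    return encode(word) % m


cities = ["como", "kiev", "lodz", "oslo", "pisa", "umea", "roco"]

# group the keys by hash value to detect collisions
buckets = {}
for city in cities:
    buckets.setdefault(h(city), []).append(city)

collisions = {value: words for value, words in buckets.items() if len(words) > 1}
print([h(city) for city in cities])    # [86, 27, 108, 56, 188, 179, 179]
print(collisions)                      # {179: ['umea', 'roco']}
```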



Hashing

collision resolution: closed addressing (open hashing)

array of size $m$ stores pointers to NIL or the first element of a linked list

linked list at $i$‐th position contains all records with keys $k$ where $h(k) = i$

due to the usage of several linked lists, also referred to as separate chaining

question: how good or bad is this approach?

example: choosing $m = 7$ and $h(k) = k \bmod 7$

key    s  e  a  r  c  h  i  n  g
#lex  19  5  1 18  3  8  9 14  7
h(k)   5  5  1  4  3  1  2  0  0

[figure: array with positions 0, ..., 6 and the attached linked lists: 0: n → g, 1: a → h, 2: i, 3: c, 4: r, 5: s → e, 6: NIL]
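The example can be rebuilt with a small separate chaining table. This is a sketch under assumptions: Python lists stand in for the linked lists, new records are appended at the end of a chain, and the class name `ChainedHashTable` is hypothetical.

```python
class ChainedHashTable:
    """Separate chaining with h(k) = k mod m; one chain per array position."""

    def __init__(self, m):
        self.m = m
        self.chains = [[] for _ in range(m)]

    def _h(self, key):
        return key % self.m

    def insert(self, key, value):
        self.chains[self._h(key)].append((key, value))

    def search(self, key):
        for k, v in self.chains[self._h(key)]:   # sequential search in one chain
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.chains[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


table = ChainedHashTable(7)
for ch in "searching":
    table.insert(ord(ch) - ord("a") + 1, ch)   # key = lexicographic position

for i, chain in enumerate(table.chains):
    print(i, [v for _, v in chain])
```

The printed chains are `n, g` at position 0, `a, h` at 1, `i` at 2, `c` at 3, `r` at 4, `s, e` at 5, and an empty chain at 6.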



Hashing

collision resolution: closed addressing (open hashing) (cont’d)

obviously, the cost for a search depends on the length of a list

in the worst case (unlikely for large values of $n$ and reasonable hash functions), all records collide, i.e. they will be stored in the same list, and a search takes $\mathcal{O}(n)$ time

in the average case, all keys are equally distributed among all lists, hence for $m < n$ the sequential search complexity reduces to $n/m$

in the previous example, six records will be found in the first place and three records in the second place; assuming each record is equally likely to be sought, a successful search takes $(6 \cdot 1 + 3 \cdot 2) / 9 \approx 1.33$ comparisons on average

in general, for a hash table with load factor $\alpha = n/m$, closed addressing needs $\mathcal{O}(1 + \alpha)$ time both in case of a successful and an unsuccessful search

hence, under the assumption $n = \mathcal{O}(m)$, it follows that search, insert, and delete operations take $\mathcal{O}(1)$ time
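The average of about 1.33 comparisons can be recomputed from the chain positions. A minimal sketch, assuming new keys are appended at the end of their chain; the function name is hypothetical.

```python
def average_successful_cost(keys, m):
    """Average number of comparisons of a successful search in a
    separate chaining table with h(k) = k mod m."""
    chains = [[] for _ in range(m)]
    for k in keys:
        chains[k % m].append(k)
    # a key at 0-based index j of its chain is found after j + 1 comparisons
    total = sum(j + 1 for chain in chains for j in range(len(chain)))
    return total / len(keys)


keys = [ord(ch) - ord("a") + 1 for ch in "searching"]
print(round(average_successful_cost(keys, 7), 2))   # 1.33
print(round(len(keys) / 7, 2))                      # load factor 9/7, i.e. 1.29
```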



Hashing

collision resolution: open addressing (closed hashing)

instead of linked lists, an array of fixed size $m \ge n$ is used as hash table

therefore, we consider the following hash function

$h : U \times \{0, 1, \dots, m-1\} \to \{0, 1, \dots, m-1\}$

when inserting a new key, it must first be tested whether position $h(k, 0)$ is empty (thus the key can be stored there); otherwise $h(k, 1)$ will be tested next, then $h(k, 2)$ and so on

for searching a key, a similar approach can be realised

i. position $h(k, 0)$ holds $k$: successful search
ii. position $h(k, 0)$ is empty: unsuccessful search
iii. position $h(k, 0)$ holds a key other than $k$: continue with $h(k, 1)$, $h(k, 2)$, ... until case i. or ii. applies

the above method is called probing and comes in different variants (e.g. linear, quadratic, double hashing) to find the next positions $h(k, 1)$, $h(k, 2)$, ...
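Insertion and the three search cases can be sketched with a table that receives the probing function $h(k, i)$ as a parameter. This is an illustration only: the class name `OpenAddressingTable` is hypothetical, deletions are not supported, and the linear probe in the usage part is just one possible choice.

```python
class OpenAddressingTable:
    """Open addressing without deletion; the probing function h(k, i)
    is passed in, so different probing variants can be plugged in."""

    def __init__(self, m, probe):
        self.m = m
        self.probe = probe            # probe(key, i) -> position in 0..m-1
        self.slots = [None] * m       # None marks an empty position

    def insert(self, key):
        for i in range(self.m):
            pos = self.probe(key, i)
            if self.slots[pos] is None or self.slots[pos] == key:
                self.slots[pos] = key
                return pos
        raise RuntimeError("no free position found")

    def search(self, key):
        for i in range(self.m):
            pos = self.probe(key, i)
            if self.slots[pos] == key:     # case i: successful search
                return pos
            if self.slots[pos] is None:    # case ii: unsuccessful search
                return None
            # case iii: another key is stored here, continue probing
        return None


m = 7
table = OpenAddressingTable(m, lambda k, i: (k % m + i) % m)
for key in (10, 3, 17):          # all three keys start at position 3
    table.insert(key)
print(table.slots)               # [None, None, None, 10, 3, 17, None]
print(table.search(17))          # 5
print(table.search(24))          # None
```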



Hashing

collision resolution: open addressing (closed hashing) (cont’d)

as long as no records are deleted from the hash table, search operations work fine, as any empty position terminates an unsuccessful search

problem: deleting an element might influence the correctness of a search

[figure: hash table storing the keys k1, ..., k7; the probe sequence h(k2, 0), h(k2, 1), h(k2, 2) of key k2 passes the position of k5, which is flagged F after the deletion]

deletion of $k_5$ leads to an unsuccessful search for $k_2$, as $h(k_2, 1)$ is empty

remedy: an additional flag (e.g. one bit with value F(alse)) is necessary in order to indicate that at the corresponding position a record has been deleted

unsuccessful search iff position $h(k, i)$ is empty and the corresponding bit is T(rue)

in the worst case, all positions have already been deleted, and a search takes $\mathcal{O}(m)$ time

hence, closed hashing is efficient for tables with no or few deletions only
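The remedy can be sketched with a marker for deleted positions. Note the assumptions: a sentinel object `DELETED` replaces the extra bit described above, linear probing with $h(k) = k \bmod m$ is used, and `insert` assumes the key is not yet stored.

```python
EMPTY = None
DELETED = object()        # marker for a position whose record was deleted


class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [EMPTY] * m

    def _positions(self, key):
        start = key % self.m
        return ((start + i) % self.m for i in range(self.m))

    def insert(self, key):
        # simplification: the key is assumed not to be stored already
        for pos in self._positions(key):
            if self.slots[pos] is EMPTY or self.slots[pos] is DELETED:
                self.slots[pos] = key
                return pos
        raise RuntimeError("table full")

    def search(self, key):
        for pos in self._positions(key):
            if self.slots[pos] is EMPTY:
                return None       # a never-used position ends the search
            if self.slots[pos] is not DELETED and self.slots[pos] == key:
                return pos
        return None               # all m positions inspected

    def delete(self, key):
        pos = self.search(key)
        if pos is not None:
            self.slots[pos] = DELETED
        return pos


table = LinearProbingTable(7)
for key in (10, 3, 17):           # stored at positions 3, 4, 5
    table.insert(key)
table.delete(3)                   # position 4 is marked, not emptied
print(table.search(17))           # 5, the search passes the marked position
```

If `delete` simply reset the position to `EMPTY`, the search for 17 would stop at position 4 and wrongly report the key as missing.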



Hashing

collision resolution: open addressing (closed hashing) (cont’d)

assuming a regular hash function $h : U \to \{0, 1, \dots, m-1\}$ already exists, then linear probing successively tests for a key $k \in U$ the positions $h(k)$, $h(k) + 1$, $h(k) + 2$ and so on

formally, we can define

$h(k, i) = (h(k) + i) \bmod m$

example: $m = 11$

key    s  e  a  r  c  h  i  n  g
#lex  19  5  1 18  3  8  9 14  7
h(k)   8  5  1  7  3  8  9  3  7
i      0  0  0  0  0  1  1  1  4

resulting hash table

position   0  1  2  3  4  5  6  7  8  9 10
key        g  a     c  n  e     r  s  h  i
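The example can be replayed step by step. A minimal sketch, assuming $h(k) = k \bmod 11$ on the lexicographic positions; the function name is hypothetical and the table stores only the letters for readability.

```python
def linear_probing_insert(slots, key, value):
    """Insert with h(k, i) = (k mod m + i) mod m; returns the offset i used."""
    m = len(slots)
    for i in range(m):
        pos = (key % m + i) % m
        if slots[pos] is None:
            slots[pos] = value
            return i
    raise RuntimeError("table full")


slots = [None] * 11
offsets = {}
for ch in "searching":
    offsets[ch] = linear_probing_insert(slots, ord(ch) - ord("a") + 1, ch)

print(slots)
# ['g', 'a', None, 'c', 'n', 'e', None, 'r', 's', 'h', 'i']
print(offsets)
# {'s': 0, 'e': 0, 'a': 0, 'r': 0, 'c': 0, 'h': 1, 'i': 1, 'n': 1, 'g': 4}
```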



Hashing

collision resolution: open addressing (closed hashing) (cont’d)

in the previous example, five records will be found in the first place ($i = 0$), three records in the second ($i = 1$), and one record in the fifth ($i = 4$)

assuming each record is equally likely to be sought, a successful search takes $(5 \cdot 1 + 3 \cdot 2 + 1 \cdot 5) / 9 \approx 1.78$ probes on average

as closed hashing is not very efficient in case of frequent deletions, the following analysis is based on the assumption that no deletions apply at all

again, the load factor of a hash table is defined as $\alpha = n/m$

hence, an unsuccessful search needs at most $1 / (1 - \alpha)$ probes and a successful search at most $\frac{1}{\alpha} \ln \frac{1}{1 - \alpha}$ probes

caution: as $\alpha$ approaches 1, the number of necessary probes for an unsuccessful search increases tremendously

$\alpha = 50\% \to 2$, $\alpha = 80\% \to 5$, $\alpha = 90\% \to 10$, $\alpha = 95\% \to 20$
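The growth of the bounds can be tabulated numerically. In this sketch the bound for the successful search, $\frac{1}{\alpha} \ln \frac{1}{1 - \alpha}$, is an assumption taken from the uniform hashing analysis in Introduction to Algorithms (Cormen et al.); the function names are hypothetical.

```python
import math


def probes_unsuccessful(alpha):
    """Upper bound on the expected probes of an unsuccessful search."""
    return 1 / (1 - alpha)


def probes_successful(alpha):
    """Upper bound on the expected probes of a successful search
    (assumed bound, uniform hashing analysis)."""
    return (1 / alpha) * math.log(1 / (1 - alpha))


for alpha in (0.5, 0.8, 0.9, 0.95):
    print(alpha,
          round(probes_unsuccessful(alpha), 2),
          round(probes_successful(alpha), 2))
```

The middle column reproduces the values 2, 5, 10, and 20 for load factors of 50%, 80%, 90%, and 95%.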



Hashing

collision resolution: open addressing (closed hashing) (cont’d)

unfortunately, linear probing tends to form longer consecutive blocks, which have a negative impact on the cost for search, insert, and delete

something about probabilities

[figure: hash table with $m = 8$ positions 0, 1, ..., 7; positions 0, 3, 4, 5 are filled, positions 1, 2, 6, 7 are empty]

when adding a new key $k$, we assume $h(k)$ spreads it uniformly within $\{0, 1, \dots, 7\}$

the new key will be inserted at any of the four empty positions, which do not (!) have the same probability

assume the position for insertion is denoted as $i$, then we get the probabilities $\Pr[i]$

$\Pr[i = 1] = \Pr[h(k) \in \{0, 1\}] = 2/8 = 1/4$
$\Pr[i = 2] = \Pr[h(k) = 2] = 1/8$
$\Pr[i = 6] = \Pr[h(k) \in \{3, 4, 5, 6\}] = 4/8 = 1/2$
$\Pr[i = 7] = \Pr[h(k) = 7] = 1/8$

hence, it is very likely that a new element is inserted at position 6 and the block grows in length
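These probabilities follow from enumerating all start positions. A minimal sketch, assuming positions 0, 3, 4, 5 are the filled ones (as implied by the probabilities above) and that at least one position is empty; the function name is hypothetical.

```python
from fractions import Fraction


def insertion_probabilities(filled, m):
    """For linear probing and uniformly distributed h(k), return
    {empty position i: probability that a new key ends up at i}."""
    prob = {}
    for start in range(m):
        pos = start
        while pos in filled:          # probe linearly to the next empty position
            pos = (pos + 1) % m
        prob[pos] = prob.get(pos, 0) + Fraction(1, m)
    return prob


probs = insertion_probabilities({0, 3, 4, 5}, 8)
for i in sorted(probs):
    print(i, probs[i])
# 1 1/4
# 2 1/8
# 6 1/2
# 7 1/8
```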



Hashing

collision resolution: open addressing (closed hashing) (cont’d)

an alternative to linear probing is quadratic probing

therefore, the following positions (each with mod $m$) are successively tested

$h(k),\ h(k) + 1^2,\ h(k) - 1^2,\ h(k) + 2^2,\ h(k) - 2^2,\ h(k) + 3^2,\ h(k) - 3^2,\ \dots$

for two keys $k_1$ and $k_2$ with $h(k_1, 0) = h(k_2, 1)$, in contrast to linear probing, different sequences are passed and thus $h(k_1, 1) = h(k_2, 2)$ typically does not hold

hence, longer consecutive blocks are avoided
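The alternating probe sequence can be generated as follows. This is a sketch with assumptions: $m = 11$ and the start positions 3 and 4 are arbitrary sample values, and the generator name is hypothetical.

```python
from itertools import islice


def quadratic_probe_sequence(h_k, m):
    """Yield h(k), h(k)+1^2, h(k)-1^2, h(k)+2^2, h(k)-2^2, ... (each mod m)."""
    yield h_k % m
    j = 1
    while True:
        yield (h_k + j * j) % m
        yield (h_k - j * j) % m
        j += 1


m = 11
seq_k2 = list(islice(quadratic_probe_sequence(3, m), 11))   # h(k2) = 3
seq_k1 = list(islice(quadratic_probe_sequence(4, m), 11))   # h(k1) = 4
print(seq_k2)    # [3, 4, 2, 7, 10, 1, 5, 8, 9, 6, 0]
print(seq_k1[0] == seq_k2[1])    # True:  h(k1, 0) = h(k2, 1) = 4
print(seq_k1[1] == seq_k2[2])    # False: h(k1, 1) = 5, h(k2, 2) = 2
```

For this particular $m$, the first 11 probes visit every position exactly once; this is not guaranteed for arbitrary table sizes.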

another approach with even better properties is called double hashing

idea: next to $h(k)$, a second hash function $h' : U \to \{0, 1, \dots, m-1\}$ exists

hence, $h(k, i)$ is defined as

$h(k, i) = (h(k) + i \cdot h'(k)) \bmod m$

not all choices of $h$ and $h'$ are reasonable (e.g. $h'$ should not become 0); for more details refer to Introduction to Algorithms, T.H. CORMEN et al.
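Double hashing can be sketched with two simple hash functions. Both concrete choices here are assumptions for illustration: `h1` is the division method and `h2(k) = 1 + (k mod (m - 1))` is one common way to keep the second function away from 0; neither is prescribed by the text above.

```python
def h1(k, m):
    """First hash function (division method)."""
    return k % m


def h2(k, m):
    """Hypothetical second hash function with values in 1..m-1, never 0."""
    return 1 + k % (m - 1)


def double_hash(k, i, m):
    """h(k, i) = (h1(k) + i * h2(k)) mod m."""
    return (h1(k, m) + i * h2(k, m)) % m


m = 11
for k in (14, 25):                 # both keys start at position 3
    print(k, [double_hash(k, i, m) for i in range(5)])
# 14 [3, 8, 2, 7, 1]
# 25 [3, 9, 4, 10, 5]
```

Although both keys collide at the first probe, their step widths differ (5 and 6), so the probe sequences separate immediately.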


