set and map adts: b-trees - university of washington · brian chan jade watkins yuma tou elena...


CSE373, Winter 2020 L08: B-Trees

Set and Map ADTs: B-Trees

CSE 373 Winter 2020

Instructor: Hannah C. Tang

Teaching Assistants:

Aaron Johnston Ethan Knutson Nathan Lipiarski

Amanda Park Farrell Fileas Sam Long

Anish Velagapudi Howard Xiao Yifan Bai

Brian Chan Jade Watkins Yuma Tou

Elena Spasova Lea Quan


Announcements

❖ Asymptotic Analysis: handout coming soon

❖🎊Workshops 🎊

▪ Student-centered study groups; bring your questions!

▪ Friday 11:30-12:30 @ CSE 203

❖ Extra Drop-in Time 🎇🎇🎇

▪ Saturday morning: 10:30am-12pm @ Odegaard 117E

❖ “tl;dr” slides are your per-topic learning objectives


Questions from Reading Quiz

❖ Why is the reading quiz borkened💔?

❖ Why is BST height in O(N²)?

❖ Why is BST height NOT in Θ(N)?

❖ How do you calculate average depth? Why do we care about depth? How does it relate to height?


Lecture Outline

❖ BST Remove (cont.)

❖ BST Tree Height

❖ 2-3 Trees

❖ B-Trees


Binary Search Trees: Remove

❖ 3 cases based on the number of children

1. Key has no children

2. Key has one child

3. Key has two children

❖ In each case, we must maintain the Binary Search Tree Invariant!

[Figure: example BST. Root dog has children baby and glug; baby has children ant and cat; glug has children ears and hippo; ears has a right child, frog.]

BST Remove: Case #1: Leaf

❖ Remove the node with the value hippo

[Figure: the same example BST, rooted at dog; hippo is a leaf.]

BSTNode remove(BSTNode n) {

}


BST Remove: Case #2: One Child

❖ Remove the node with the value ears

▪ What does the BST invariant say about the descendant's values?

[Figure: the same example BST, rooted at dog; ears has one child, frog.]

BSTNode remove(BSTNode n) {

}


BST Remove: Case #3: Two Children

❖ Remove the node with the value dog

❖ The replacement node:

▪ Must be ≻ than all keys in left subtree

▪ Must be ≺ than all keys in right subtree

[Figure: the same example BST, rooted at dog, which has two children: baby and glug.]

BST Remove: Case #3: Two Children

❖ Remove the node with the value dog

❖ The replacement node:

▪ Must be ≻ than all keys in left subtree: predecessor (cat)

▪ Must be ≺ than all keys in right subtree: successor (ears)

❖ The predecessor and the successor each have either 0 or 1 children
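
The three remove cases can be sketched as one recursive method. This is a minimal sketch, not the course's reference solution: the Node class, its String keys, the add helper, and the choice of the successor (rather than the predecessor) in case 3 are all assumptions made for illustration.

```java
final class BstRemoveSketch {
    // Hypothetical node type; the slides' BSTNode may look different.
    static final class Node {
        String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    // Standard BST insertion, used here only to build an example tree.
    static Node add(Node n, String key) {
        if (n == null) return new Node(key);
        int cmp = key.compareTo(n.key);
        if (cmp < 0) n.left = add(n.left, key);
        else if (cmp > 0) n.right = add(n.right, key);
        return n;
    }

    // Smallest node of a non-empty subtree: keep walking left.
    static Node smallest(Node n) {
        while (n.left != null) n = n.left;
        return n;
    }

    // Removes key from the subtree rooted at n and returns the new subtree root.
    static Node remove(Node n, String key) {
        if (n == null) return null;               // key not present
        int cmp = key.compareTo(n.key);
        if (cmp < 0) {
            n.left = remove(n.left, key);
        } else if (cmp > 0) {
            n.right = remove(n.right, key);
        } else {
            if (n.left == null) return n.right;   // case 1 (leaf) or case 2 (only a right child)
            if (n.right == null) return n.left;   // case 2 (only a left child)
            Node succ = smallest(n.right);        // case 3: successor has 0 or 1 children
            n.key = succ.key;                     // copy the successor's key up
            n.right = remove(n.right, succ.key);  // then remove the successor via case 1 or 2
        }
        return n;
    }
}
```

Building the example tree by adding dog, baby, glug, ant, cat, ears, hippo, frog and then calling remove on dog leaves ears at the root, with frog taking ears's old place under glug.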

[Figure: the same example BST; cat is the largest key in dog's left subtree and ears is the smallest key in dog's right subtree.]

BST Remove: Case #3: Two Children

[Figure: the example BST before removal, and the two possible results of removing dog. Replacing dog with its predecessor cat leaves baby with the single child ant. Replacing dog with its successor ears moves frog into ears's old place under glug.]

Aside: Finding the largest (or smallest) node

❖ The predecessor is the largest node in the left subtree

❖ The successor is the smallest node in the right subtree

❖ How do you find the largest (and smallest) node in a tree?

▪ Remember that subtrees are trees too

[Figure: the same example BST, rooted at dog.]

BSTNode largest(BSTNode n) {

while (n.right != null) {

n = n.right;

}

return n;

}


tl;dr

❖ Binary Search Trees implement both the Set and Map ADTs

❖ Binary Search Trees are recursively defined

❖ Binary Search Trees can be efficient Map/Set ADT implementations

          LinkedList Map, Worst Case    BST Map, Worst Case
Find      Θ(N)                          Θ(h)
Add       Θ(N)                          Θ(h)
Remove    Θ(N)                          Θ(h)

🤔


Lecture Outline

❖ BST Remove (cont.)

❖ BST Tree Height

❖ 2-3 Trees

❖ B-Trees


Binary Search Tree: Height

❖ Suppose we want to build a BST out of {1, 2, 3, 4, 5, 6, 7}

❖ Give a sequence of add operations that result in:

▪ a spindly tree (“worst case”)

▪ a bushy tree (“best case”)
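
One way to check an answer to this exercise is to build both trees and measure them. The sketch below assumes a hypothetical int-keyed Node class and counts height in edges (a single node has height 0). The two insertion orders are one possible answer each, not the only ones.

```java
final class BstHeightDemo {
    // Hypothetical node type for illustration.
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node add(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = add(n.left, key);
        else if (key > n.key) n.right = add(n.right, key);
        return n;
    }

    // Adds the keys in the given order to an initially empty BST.
    static Node build(int... keys) {
        Node root = null;
        for (int k : keys) root = add(root, k);
        return root;
    }

    // Height in edges; the empty tree has height -1 by convention.
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Sum of the depths of all nodes; divide by N for the average depth.
    static int depthSum(Node n, int depth) {
        if (n == null) return 0;
        return depth + depthSum(n.left, depth + 1) + depthSum(n.right, depth + 1);
    }

    public static void main(String[] args) {
        Node spindly = build(1, 2, 3, 4, 5, 6, 7);  // sorted order: every add goes right
        Node bushy = build(4, 2, 6, 1, 3, 5, 7);    // medians first: levels fill evenly
        System.out.println(height(spindly) + " " + depthSum(spindly, 0) / 7.0);
        System.out.println(height(bushy) + " " + depthSum(bushy, 0) / 7.0);
    }
}
```

Sorted insertion gives height 6 and depth sum 21 (average 3); the median-first order gives height 2 and depth sum 10 (average about 1.43).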

[Figure: a spindly tree (Height: 6, Average Depth: 3) and a bushy tree with root 4, children 2 and 6, and leaves 1, 3, 5, 7 (Height: 2, Average Depth: 1.43).]

Randomization: Mathematical Analysis

❖ Binary search tree height is in O(N)

▪ Worst case height: Θ(N)

▪ Best case height: Θ(log N)

▪ Θ(log N) via randomized insertion

• Randomized insertion with randomized deletion is still Θ(log N) height

❖ BSTs are frequently concerned with best- and worst-case tree structure

The Height of a Randomized Binary Search Tree (Reed/STOC 2000)

Average Depth of a Randomized BST: If N distinct keys are inserted in random order, the expected average depth is ~ 2 ln N = Θ(log N).

Total Height of a Randomized BST: If N distinct keys are inserted in random order, the expected height is ~ 4.311 ln N = Θ(log N).


What About “Real World” BSTs?

❖ These examples are contrived! What about real-world workloads?

❖ An approximation of the real-world: inserting random numbers

Random Insertion into a BST (Kevin Wayne/Princeton): https://www.youtube.com/watch?v=5dGkblzqdmc

Random trees have Θ(log N) average depth and height

Random trees are bushy, not spindly
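
The claim can be explored with a small experiment. This is a sketch under assumptions: the Node class, the seed 373, and N = 10,000 are arbitrary choices for illustration, and a single run only suggests the trend rather than proving the bound.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class RandomBstHeight {
    // Hypothetical node type for illustration.
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Iterative insertion, so even a very spindly tree cannot overflow the call stack here.
    static Node add(Node root, int key) {
        Node fresh = new Node(key);
        if (root == null) return fresh;
        Node cur = root;
        while (true) {
            if (key < cur.key) {
                if (cur.left == null) { cur.left = fresh; return root; }
                cur = cur.left;
            } else if (key > cur.key) {
                if (cur.right == null) { cur.right = fresh; return root; }
                cur = cur.right;
            } else {
                return root; // duplicate key: nothing to add
            }
        }
    }

    // Height in edges; the empty tree has height -1.
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Inserts the keys 1..n in a seeded random order and returns the resulting height.
    static int randomTreeHeight(int n, long seed) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 1; i <= n; i++) keys.add(i);
        Collections.shuffle(keys, new Random(seed));
        Node root = null;
        for (int k : keys) root = add(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.printf("height = %d, 4.311 ln N = %.1f, worst case = %d%n",
                randomTreeHeight(n, 373L), 4.311 * Math.log(n), n - 1);
    }
}
```

The measured height should land far closer to the logarithmic estimate than to the worst case of N - 1.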


Randomization is Pretty Good!

❖ BSTs have great runtime if we insert keys randomly

▪ Θ(log N) per insertion

❖ But:

▪ We can’t always insert our keys in a random order. Why?

▪ What if we need guaranteed Θ(log N) runtime?

[Figure: an example BST over the keys b, e, g, m, n, o, p, q, r, s.]

Bounding the Height

❖ Recall that a Binary Search Tree’s invariant is:

▪ The left subtree only contains values <k

▪ The right subtree only contains values >k

❖ What invariants could we add, to bound the height to log N?
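
Before adding new invariants, it helps to see the existing one as code. The sketch below checks the BST invariant by passing down the open interval (lo, hi) that every key in a subtree must fall in. The Node class and the use of null for "no bound" are assumptions made for illustration.

```java
final class BstInvariantCheck {
    // Hypothetical node type for illustration.
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // True if every key in the subtree lies strictly between lo and hi
    // (null means unbounded) and each subtree recursively satisfies the same rule.
    static boolean isBst(Node n, Integer lo, Integer hi) {
        if (n == null) return true;
        if (lo != null && n.key <= lo) return false;
        if (hi != null && n.key >= hi) return false;
        return isBst(n.left, lo, n.key)     // left subtree: values < k
            && isBst(n.right, n.key, hi);   // right subtree: values > k
    }

    static boolean isBst(Node root) {
        return isBst(root, null, null);
    }
}
```

Comparing each node only against its immediate children is not enough; a key deep in the left subtree must still be smaller than every ancestor it sits to the left of, which is what the bounds enforce.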

[Figure: the spindly tree and the bushy tree (root 4, children 2 and 6, leaves 1, 3, 5, 7) over the keys 1 through 7.]

Bounding the Height: Example Invariant

❖ Hypothesis: Every node has either 0 or 2 children

❖ Analysis: What is the worst-case height for this tree?

[Figure: the spindly tree and the bushy tree (root 4, children 2 and 6, leaves 1, 3, 5, 7) over the keys 1 through 7.]

pollev.com/uwcse373

What is the worst-case height of a BST where every node must have either 0 or 2 children?

A. Θ(1)

B. Θ(log_2 N)

C. Θ(N)

D. Θ(N log_2 N)

E. Θ(N²)

H(N) ∈ Θ(N)

How do you add a node to this tree?
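
To see why the answer is Θ(N), one can construct a tree that satisfies the 0-or-2-children rule and is still spindly. The sketch below is one such construction, not the only one: each internal node gets a leaf on the left and the rest of the keys on the right. The Node class and method names are hypothetical.

```java
final class FullBstWorstCase {
    // Hypothetical node type for illustration.
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Builds a BST over the keys lo..hi in which every node has 0 or 2 children.
    // Assumes an odd number of keys, since such trees always have odd size.
    static Node caterpillar(int lo, int hi) {
        if (lo > hi || (hi - lo) % 2 != 0) {
            throw new IllegalArgumentException("need an odd, positive number of keys");
        }
        if (lo == hi) return new Node(lo);
        Node root = new Node(lo + 1);
        root.left = new Node(lo);               // a leaf on the left
        root.right = caterpillar(lo + 2, hi);   // everything else hangs on the right
        return root;
    }

    // Height in edges; the empty tree has height -1.
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    // True if every node has either 0 or 2 children.
    static boolean zeroOrTwoChildren(Node n) {
        if (n == null) return true;
        if ((n.left == null) != (n.right == null)) return false;
        return zeroOrTwoChildren(n.left) && zeroOrTwoChildren(n.right);
    }
}
```

With N keys this tree has height (N - 1) / 2, so the hypothesis only halves the worst case; it does not bring the height down to log N.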

Page 21: Set and Map ADTs: B-Trees - University of Washington · Brian Chan Jade Watkins Yuma Tou Elena Spasova Lea Quan. L08: B-Trees CSE373, Winter 2020 Announcements

CSE373, Winter 2020L08: B-Trees

Adding Nodes Creates Worst-case Height Trees

❖ Unbalanced growth leads to worst-case height trees

❖ When does adding a new node affect the height of a tree?

▪ Can you explain in terms of the subtrees (ie, recursively)?
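
One way to phrase the recursive answer, assuming height is counted in edges and the empty tree is given height -1 (a convention chosen here, consistent with a single node having height 0):

```latex
H(\varnothing) = -1, \qquad
H(n) = 1 + \max\bigl(H(n_{\text{left}}),\; H(n_{\text{right}})\bigr)
```

Under this definition, adding a node to the left subtree raises H(n) only if it raises H(n_left) and H(n_left) ≥ H(n_right) held before the add; the symmetric statement holds for the right subtree. Growth that always lands in the already-taller subtree is what produces worst-case height.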

[Figure: an example BST over the keys 1 through 10 showing unbalanced growth.]

Your Turn: Generate Some Invariants

❖ Generate an invariant that might balance your tree

▪ Is it strong enough to roughly-balance the tree?

▪ Is it flexible enough to be maintainable?


Lecture Outline

❖ BST Remove (cont.)

❖ BST Tree Height

❖ 2-3 Trees

❖ B-Trees


Bounding the Height: Overstuff the leaves

❖ If we never add new leaves, the tree can never get unbalanced

▪ Instead: Overstuff existing leaves to avoid adding new leaves

[Figure: a tree with root 13, children 5 (with leaves 2 and 7) and 15 (with leaves 14 and "16 17"). The rightmost leaf grows from "16 17" to "16 17 18" and then to "16 17 18 19 20 21 22 23 24 25" as keys are stuffed into it.]

Overstuffed Leaves: Promote the Keys

❖ Set a limit L on number of keys

▪ e.g. L=3

❖ If any node has more than L keys, give (“promote”) a key to the parent

▪ e.g. the left-middle key

▪ Why not the leftmost or rightmost?

[Figure: the overstuffed leaf "16 17 18 19" promotes 17 to its parent, which becomes "15 17".]

Promoting Keys Splits the Leaf Node

❖ Set a limit L on number of keys

▪ e.g. L=3

❖ If any node has more than L keys, give (“promote”) a key to the parent

▪ e.g. the left-middle key

▪ Why not the leftmost or rightmost?

▪ Promoting a key splits the old overstuffed node into two new parts: left and right.
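
The promote-and-split step on a single overstuffed node can be sketched as follows. The Split class and method names are hypothetical; the only rule taken from the slides is that the left-middle key is the one promoted.

```java
import java.util.ArrayList;
import java.util.List;

final class LeafSplitSketch {
    // Hypothetical result type: the two new halves plus the key handed to the parent.
    static final class Split {
        final List<Integer> left;
        final int promoted;
        final List<Integer> right;
        Split(List<Integer> left, int promoted, List<Integer> right) {
            this.left = left;
            this.promoted = promoted;
            this.right = right;
        }
    }

    // Splits the sorted keys of an overstuffed node around its left-middle key.
    static Split split(List<Integer> keys) {
        int mid = (keys.size() - 1) / 2;  // left-middle index, e.g. 1 for four keys
        return new Split(
                new ArrayList<>(keys.subList(0, mid)),
                keys.get(mid),
                new ArrayList<>(keys.subList(mid + 1, keys.size())));
    }
}
```

For the overstuffed leaf 16 17 18 19 this promotes 17 and leaves the halves [16] and [18, 19]. Promoting the leftmost or rightmost key instead would leave one half empty, so the split would not relieve the overstuffed node.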

[Figure: after promoting 17, the old leaf splits into "16" and "18 19"; the parent "15 17" now has the children "14", "16", and "18 19".]

Practice: Adding More Keys

❖ Suppose we add the keys 20 and 21.

❖ If our cap is at most L=3 keys per node, draw the post-split tree.

[Figure: adding 20 and 21 overstuffs the leaf to "18 19 20 21"; promoting 19 gives the parent "15 17 19" with children "14", "16", "18", and "20 21".]

Promoting Keys Can Cascade Into Ancestors

❖ Add 25 and 26

[Figure: adding 25 and 26 overstuffs the leaf to "20 21 25 26"; promoting 21 gives the parent "15 17 19 21" with children "14", "16", "18", "20", and "25 26". The parent now holds more than L=3 keys itself (??).]

Overstuffing the Root Node

❖ If promotions can cascade up the tree, we may eventually need to split the root.

❖ Splitting the root is the only time a tree grows in height!
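
Putting the pieces together, insertion with cascading splits might look like the sketch below. It is a simplified illustration under assumptions, not the course's implementation: the class and field names are hypothetical, keys are ints, duplicates are ignored, and the left-middle key is always the one promoted.

```java
import java.util.ArrayList;
import java.util.List;

final class BTreeInsertSketch {
    // Hypothetical node type: sorted keys, plus children (empty for a leaf).
    static final class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
        boolean isLeaf() { return children.isEmpty(); }
    }

    private final int limit;        // L: maximum number of keys per node
    private Node root = new Node();

    BTreeInsertSketch(int limit) { this.limit = limit; }

    void add(int key) {
        insert(root, key);
        if (root.keys.size() > limit) {   // overstuffed root: the only way to gain height
            Node newRoot = new Node();
            newRoot.children.add(root);
            splitChild(newRoot, 0);
            root = newRoot;
        }
    }

    // Inserts into the subtree at node; a child left overstuffed is split on the way back up.
    private void insert(Node node, int key) {
        int i = 0;
        while (i < node.keys.size() && key > node.keys.get(i)) i++;
        if (i < node.keys.size() && node.keys.get(i) == key) return;  // already present
        if (node.isLeaf()) {
            node.keys.add(i, key);        // overstuff the leaf; the caller checks the limit
            return;
        }
        Node child = node.children.get(i);
        insert(child, key);
        if (child.keys.size() > limit) splitChild(node, i);
    }

    // Promotes the left-middle key of parent.children[i] and splits that child in two.
    private void splitChild(Node parent, int i) {
        Node left = parent.children.get(i);
        int size = left.keys.size();
        int mid = (size - 1) / 2;
        int promoted = left.keys.get(mid);
        Node right = new Node();
        right.keys.addAll(left.keys.subList(mid + 1, size));
        left.keys.subList(mid, size).clear();
        if (!left.isLeaf()) {             // hand the matching children to the right half
            List<Node> moved = left.children.subList(mid + 1, left.children.size());
            right.children.addAll(moved);
            moved.clear();
        }
        parent.keys.add(i, promoted);
        parent.children.add(i + 1, right);
    }

    // All leaves share one depth, so following the leftmost path measures the height.
    int height() {
        int h = 0;
        for (Node n = root; !n.isLeaf(); n = n.children.get(0)) h++;
        return h;
    }

    List<Integer> rootKeys() { return new ArrayList<>(root.keys); }
}
```

Because new keys only ever land in existing leaves and new nodes appear only through splits, the single place where a level is added is the root split inside add.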

[Figure: under the root "13 17 21", the leaf "22 23 24 25" is overstuffed. Promoting 23 overstuffs the root ("13 17 21 23"), so the root splits: 17 becomes the new root with children "13" and "21 23", above the leaves "5", "15", "19", "22", and "24 25".]

2-3, 2-3-4, and B-Trees

❖ We chose limit L=3 keys in each node. Formally, this is called a 2-3-4 Tree: each non-leaf node can have 2, 3, or 4 children

❖ 2-3 Tree. Choose L=2 keys. Each non-leaf node can have 2 or 3 children

❖ B-Trees are the generalization of this idea for any choice of L

[Figure: two example trees. 2-3-4 Tree: max 3 keys and 4 non-null children per node. 2-3 Tree: max 2 keys and 3 non-null children per node.]

2-3 Tree Practice

❖ Give an insertion order for the keys {1, 2, 3, 4, 5, 6, 7} that results in:

▪ a max-height 2-3 Tree

▪ a min-height 2-3 Tree

Demo: https://www.cs.usfca.edu/~galles/visualization/BTree.html

[Figure: inserting 1, 2, 3, 4, 5, 6, 7 gives a max-height 2-3 Tree with root "4", children "2" and "6", and leaves "1", "3", "5", "7". Inserting 2, 3, 4, 5, 6, 1, 7 gives a min-height 2-3 Tree with root "3 5" and leaves "1 2", "4", "6 7".]

Lecture Outline

❖ BST Remove (cont.)

❖ BST Tree Height

❖ 2-3 Trees

❖ B-Trees


B-Tree Invariants

❖ B-Tree invariants guarantee “bushy” trees (i.e., H(N) ∈ Θ(log_2 N))

1. All leaves must be the same depth from the root

• Achieved because the tree’s height only grows from the root

2. A non-leaf node with k keys must have exactly k + 1 non-null children

• Achieved because we remove two keys from an overstuffed child: one is promoted to the parent and the other becomes the new child of the newly-promoted parent key

3. A non-leaf non-root node must have at least ceil(L/2) children

• (A non-leaf root node must have >=2 children)
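
Invariants 1 and 2 are easy to check mechanically. The sketch below does only that: it does not check key ordering or the minimum-children rule in invariant 3. The Node class and its small builder-style helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BTreeInvariantCheck {
    // Hypothetical node type: keys plus children (empty for a leaf).
    static final class Node {
        final List<Integer> keys;
        final List<Node> children = new ArrayList<>();
        Node(Integer... keys) { this.keys = Arrays.asList(keys); }
        Node with(Node... kids) { children.addAll(Arrays.asList(kids)); return this; }
    }

    // Returns the common depth of all leaves below n, or -1 if an invariant is violated.
    static int leafDepth(Node n) {
        if (n.children.isEmpty()) return 0;
        if (n.children.size() != n.keys.size() + 1) return -1;  // invariant 2: k keys, k + 1 children
        int first = leafDepth(n.children.get(0));
        for (Node c : n.children) {
            if (first < 0 || leafDepth(c) != first) return -1;  // invariant 1: equal leaf depth
        }
        return first + 1;
    }

    static boolean satisfiesInvariants(Node root) {
        return leafDepth(root) >= 0;
    }
}
```

A tree fails the check either when some non-leaf node has the wrong number of children for its key count, or when two leaves sit at different depths.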

❖ Why are these invalid B-Trees?

[Figure: two invalid B-Trees over the keys 1 through 7.]

B-Tree Invariants Bound Its Height

❖ Smallest possible height (“shortest tree”) is when all nodes have L keys

▪ H(N) ~ log_{L+1} N ∈ Θ(log N)


❖ Largest possible height (“tallest tree”) is when all non-leaf nodes have just 1 key

▪ H(N) ~ log_2 N ∈ Θ(log N)
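
Both estimates can be derived by counting keys level by level. The derivation below is a sketch that assumes height H is counted in edges, so a tree has levels 0 through H, and that every leaf sits at depth H (invariant 1):

```latex
\underbrace{2^{H+1} - 1}_{\text{every node has 1 key}}
\;\le\; N \;\le\;
\underbrace{(L+1)^{H+1} - 1}_{\text{every node has } L \text{ keys}}
\quad\Longrightarrow\quad
\log_{L+1}(N+1) - 1 \;\le\; H \;\le\; \log_2(N+1) - 1
```

For N = 8 and L = 2 this gives 1 ≤ H ≤ log_2(9) - 1 ≈ 2.17, matching the heights 1 and 2 of the shortest and tallest example trees.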

[Figure: a tallest tree (N=8, L=2, H(N) = 2) and a shortest tree (N=8, L=2, H(N) = 1).]

Search Runtime

❖ Shortest-case number of nodes to inspect: log_{L+1} N

❖ Shortest-case number of keys to inspect per node: L

❖ Runtime: L · log_{L+1} N ∈ Θ(log N)

❖ Tallest-case number of nodes to inspect: log_2(N) + 1

❖ Tallest-case number of keys to inspect per node: 1

❖ Runtime: log_2(N) + 1 ∈ Θ(log N)
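
The two factors in these runtimes (nodes visited, keys scanned per node) show up directly as the two loops of a search. This is a minimal sketch with a hypothetical Node class and a hand-built tree; it scans keys linearly, which is what the "keys to inspect per node" count assumes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BTreeSearchSketch {
    // Hypothetical node type: sorted keys plus children (empty for a leaf).
    static final class Node {
        final List<Integer> keys;
        final List<Node> children = new ArrayList<>();
        Node(Integer... keys) { this.keys = Arrays.asList(keys); }
        Node with(Node... kids) { children.addAll(Arrays.asList(kids)); return this; }
    }

    static boolean contains(Node n, int key) {
        while (n != null) {                        // outer loop: one node per level
            int i = 0;
            while (i < n.keys.size() && key > n.keys.get(i)) {
                i++;                               // inner loop: at most L keys per node
            }
            if (i < n.keys.size() && key == n.keys.get(i)) return true;
            if (n.children.isEmpty()) return false;  // reached a leaf without finding key
            n = n.children.get(i);                 // descend between keys i - 1 and i
        }
        return false;
    }
}
```

Since L is a fixed constant, the inner loop contributes only a constant factor, which is why both the shortest and tallest cases are Θ(log N).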

[Figure: a tallest tree (N=8, L=2, H(N) = 2) and a shortest tree (N=8, L=2, H(N) = 1).]

Insertion Runtime

❖ Shortest-case number of nodes to inspect: log_{L+1} N

❖ Shortest-case number of keys to inspect per node: L

❖ Shortest-case number of splits: log_{L+1} N

❖ Runtime: 2L · log_{L+1} N ∈ Θ(log N)

❖ Tallest-case number of nodes to inspect: log_2(N) + 1

❖ Tallest-case number of keys to inspect per node: 1

❖ Tallest-case number of splits: log_2(N) + 1

❖ Runtime: 2 log_2(N) + 2 ∈ Θ(log N)

[Figure: a tallest tree (N=8, L=2, H(N) = 2) and a shortest tree (N=8, L=2, H(N) = 1).]

tl;dr

❖ Search Trees have great runtimes most of the time

▪ But they struggle with sorted (or mostly-sorted) input

▪ Must bound the height if we need runtime guarantees

❖ Plain BSTs: simple to reason about/implement. A good starting point

❖ B-Trees are a Search Tree variant that bounds the height to Θ(log N) by only allowing the tree to grow from its root

▪ A good choice for a Map and/or Set implementation

          LinkedList Map, Worst Case    BST Map, Worst Case    B-Tree Map, Worst Case
Find      Θ(N)                          h = Θ(N)               Θ(log N)
Add       Θ(N)                          h = Θ(N)               Θ(log N)
Remove    Θ(N)                          h = Θ(N)               Θ(log N)