Sequitur Algorithm — CSE 589 Applied Algorithms
Source: courses.cs.washington.edu/courses/csep521/01au/lectures/lecture7.pdf

TRANSCRIPT

CSE 589 Applied Algorithms
Autumn 2001

Sequitur Conclusion
Image Compression
Vector Quantization
Nearest Neighbor Search

CSE 589 - Lecture 7 - Autumn 2001

2

Sequitur Algorithm

Input the first symbol s to create the production S -> s;
repeat
    match an existing rule:   A -> ….XY…, B -> XY            becomes   A -> ….B…., B -> XY
    create a new rule:        A -> ….XY…., B -> ....XY....   becomes   A -> ….C...., B -> ....C...., C -> XY
    remove a rule:            A -> ….B…., B -> X1X2…Xk       becomes   A -> …. X1X2…Xk ….
    input a new symbol:       S -> X1X2…Xk                   becomes   S -> X1X2…Xk s
until no symbols left
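The four cases above can be exercised with a small offline sketch. This is not the linear-time online algorithm from the slides — it is just a loop that restores Sequitur's digram-uniqueness invariant (every digram appears at most once across all right-hand sides), reusing an existing rule when one matches ("match an existing rule") and creating a fresh rule otherwise; rule utility (removing rules used only once) is omitted, and the names `build_grammar`, `expand`, and `R1`, `R2`, ... are invented for illustration:

```python
def build_grammar(text):
    """Offline sketch: while some digram occurs more than once across the
    grammar's right-hand sides, replace every occurrence with a nonterminal
    (reusing an existing rule whose right-hand side is exactly that digram)."""
    rules = {"S": list(text)}
    count = 0
    while True:
        # count digram occurrences over all right-hand sides
        seen, dup = {}, None
        for seq in rules.values():
            for i in range(len(seq) - 1):
                d = (seq[i], seq[i + 1])
                seen[d] = seen.get(d, 0) + 1
        for d, c in seen.items():
            if c > 1:
                dup = d
                break
        if dup is None:
            return rules
        # "match an existing rule" if one has exactly this right-hand side
        nt = next((n for n, seq in rules.items() if tuple(seq) == dup), None)
        if nt is None:                      # otherwise "create a new rule"
            count += 1
            nt = "R%d" % count
            rules[nt] = list(dup)
        for name, seq in rules.items():
            if name == nt:
                continue
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == dup:
                    out.append(nt)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            rules[name] = out

def expand(rules, name="S"):
    """Expand a nonterminal back to the original string."""
    return "".join(expand(rules, s) if s in rules else s
                   for s in rules[name])
```

For example, `build_grammar("abcabc")` yields S -> R2 R2, R1 -> a b, R2 -> R1 c, and expanding S reproduces the input.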

CSE 589 - Lecture 7 - Autumn 2001

3

Data Structure Example

S -> CeBCe
A -> be
B -> AA
C -> bB

[Figure: the rules stored as linked lists of symbols, with a digram table indexing the digrams BC, eB, Ce, be, AA, and bB, a pointer to the current digram, and a reference count on each rule.]

CSE 589 - Lecture 7 - Autumn 2001

4

Basic Encoding of a Grammar

S -> DBD
A -> be
B -> AA
D -> bBe

Grammar Symbol Code:
b 000, e 001, S 010, A 011, B 100, D 101, # 110

Grammar Code:
D   B   D   #   b   e   #   A   A   #   b   B   e
101 100 101 110 000 001 110 011 011 110 000 100 001   (39 bits)

|Grammar Code| = (s + r - 1) ⌈log2(r + a + 1)⌉

r = number of rules
s = sum of the right-hand side lengths
a = number of symbols in the original alphabet
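As a sanity check on the formula, a few lines of Python (the function name `grammar_code_bits` is assumed here) reproduce the 39-bit count: the code stream is s symbols plus r - 1 '#' separators, each drawn from an alphabet of r nonterminals, a terminals, and '#':

```python
from math import ceil, log2

def grammar_code_bits(r, s, a):
    """(s + r - 1) symbols, each coded in ceil(log2(r + a + 1)) bits."""
    return (s + r - 1) * ceil(log2(r + a + 1))

# S -> DBD, A -> be, B -> AA, D -> bBe:
# r = 4 rules, s = 3 + 2 + 2 + 3 = 10, a = 2 (b, e)
# 13 symbols x 3 bits = 39 bits
```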

CSE 589 - Lecture 7 - Autumn 2001

5

Better Encoding of the Grammar

• Nevill-Manning and Witten suggest a more efficient encoding of the grammar that resembles LZ77.– The first time a nonterminal is sent, its right hand

side is transmitted instead.– The second time a nonterminal is sent the new

production rule is established with a pointer to the previous occurrence sent along with the length of the rule.

– Subsequently, the nonterminal is represented by the index of the production rule.

CSE 589 - Lecture 7 - Autumn 2001

6

Compression Quality

• Nevill-Manning and Witten 1997

file     size    compress  gzip  sequitur  PPMC
bib     111261     3.35    2.51    2.48    2.12
book    768771     3.46    3.35    2.82    2.52
geo     102400     6.08    5.34    4.74    5.01
obj2    246814     4.17    2.63    2.68    2.77
pic     513216     0.97    0.82    0.90    0.98
progc    38611     3.87    2.68    2.83    2.49

Files from the Calgary Corpus. Units in bits per character (8 bits).
compress - based on LZW
gzip - based on LZ77
PPMC - adaptive arithmetic coding with context


CSE 589 - Lecture 7 - Autumn 2001

7

Notes on Sequitur

• Very new and different from the standards.
• Yields compression and hierarchical structure simultaneously.
• With clever encoding, is competitive with the best of the standards.
• Practical linear-time encoding and decoding.
• Alternatives
  – Off-line algorithms: (i) find the most frequent digram, (ii) find the longest repeated substring.

CSE 589 - Lecture 7 - Autumn 2001

8

Exercise

• Apply Sequitur to the string aaaaaaaaaaaaaaaa (16 a’s).

• What is the length of the code assuming a 2-symbol alphabet?

CSE 589 - Lecture 7 - Autumn 2001

9

Lossy Image Compression Methods

• Basic theory - trade-off between bit rate and distortion.
• Vector quantization (VQ).
  – The indices of a set of representative blocks can be used to code an image, yielding good compression. Requires training.
• Wavelet Compression.
  – An image can be decomposed into a low-resolution version and the detail needed to recover the original. Sending the most significant bits of the wavelet coefficients yields excellent compression.

CSE 589 - Lecture 7 - Autumn 2001

10

JPEG Standard

• JPEG - Joint Photographic Experts Group
  – Current image compression standard. Uses discrete cosine transform, scalar quantization, and Huffman coding.
  – JPEG 2000 uses wavelet compression.

CSE 589 - Lecture 7 - Autumn 2001

11

Barbara

[Figure: original, JPEG, VQ, and Wavelet-SPIHT versions of the image.]

32:1 compression ratio, .25 bits/pixel (8-bit original)

CSE 589 - Lecture 7 - Autumn 2001

12

JPEG


CSE 589 - Lecture 7 - Autumn 2001

13

VQ

CSE 589 - Lecture 7 - Autumn 2001

14

SPIHT

CSE 589 - Lecture 7 - Autumn 2001

15

Original

CSE 589 - Lecture 7 - Autumn 2001

16

Distortion

• Lossy compression:

  original x  ->  Encoder  ->  compressed y  ->  Decoder  ->  decompressed x̂,  with  x̂ ≠ x

• The measure of distortion is commonly mean squared error (MSE). Assume x has n real components (pixels):

  MSE = (1/n) Σ_{i=1}^{n} (x_i - x̂_i)²
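The MSE formula is direct to compute; a minimal sketch (function name assumed for illustration):

```python
def mse(x, xhat):
    """Mean squared error between the original x and the decompressed xhat,
    both sequences of n real components (pixels)."""
    return sum((a - b) ** 2 for a, b in zip(x, xhat)) / len(x)
```

For instance, an image reconstructed with every pixel off by 1 has MSE 1.0.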

CSE 589 - Lecture 7 - Autumn 2001

17

PSNR

• Peak Signal to Noise Ratio (PSNR) is the standard way to measure fidelity.
• PSNR is measured in decibels (dB).
• .5 to 1 dB is said to be a perceptible difference.

  PSNR = 10 log_10 (m² / MSE)

where m is the maximum possible value of a pixel. For gray scale images (8 bits per pixel), m = 255.
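Combining this with the MSE definition from the previous slide gives a one-line computation (a sketch; names assumed):

```python
from math import log10

def psnr(mse, m=255):
    """Peak signal to noise ratio in dB; m is the maximum pixel value
    (255 for 8-bit gray scale images)."""
    return 10 * log10(m * m / mse)

# An MSE of 65.025 on an 8-bit image gives 255^2 / 65.025 = 1000, about 30 dB.
```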

CSE 589 - Lecture 7 - Autumn 2001

18

Rate-Fidelity Curve

[Figure: PSNR (dB, roughly 20-38) versus bits per pixel (about 0.02-0.92) for SPIHT-coded Barbara.]

Properties:
- Increasing
- Slope decreasing


CSE 589 - Lecture 7 - Autumn 2001

19

PSNR is not Everything

PSNR = 25.8 dB PSNR = 25.8 dB

VQ

CSE 589 - Lecture 7 - Autumn 2001

20

PSNR Reflects Fidelity (1)

PSNR 25.8 dB, .63 bpp, 12.8 : 1

VQ

CSE 589 - Lecture 7 - Autumn 2001

21

PSNR Reflects Fidelity (2)

PSNR 24.2 dB, .31 bpp, 25.6 : 1

VQ

CSE 589 - Lecture 7 - Autumn 2001

22

PSNR Reflects Fidelity (3)

PSNR 23.2 dB, .16 bpp, 51.2 : 1

VQ

CSE 589 - Lecture 7 - Autumn 2001

23

Vector Quantization

[Figure: each block of the source image is matched against a codebook of codewords 1..n; the encoder outputs i, the index of the nearest codeword, and the decoder looks up index i in the same codebook to produce the decoded image.]

CSE 589 - Lecture 7 - Autumn 2001

24

Vector Quantization Facts

• The image is partitioned into a x b blocks.
• The codebook has n representative a x b blocks called codewords, each with an index.
• Compression is (log2 n) / (ab) bpp.
• Example: a = b = 4 and n = 1,024
  – compression is 10/16 = .63 bpp
  – compression ratio is 8 : .63 = 12.8 : 1
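The rate formula in code (function name assumed), reproducing the example's numbers:

```python
from math import log2

def vq_rate_bpp(a, b, n):
    """Bits per pixel for VQ: each a x b block of ab pixels is coded by
    one codeword index of log2(n) bits."""
    return log2(n) / (a * b)

# a = b = 4, n = 1,024: 10/16 = 0.625 bpp; ratio 8 / 0.625 = 12.8 : 1
```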


CSE 589 - Lecture 7 - Autumn 2001

25

Examples

4 x 4 blocks, .63 bpp
4 x 8 blocks, .31 bpp
8 x 8 blocks, .16 bpp

Codebook size = 1,024

CSE 589 - Lecture 7 - Autumn 2001

26

Encoding and Decoding

• Encoding:
  – Scan the a x b blocks of the image. For each block find the nearest codeword in the codebook and output its index.
  – Nearest neighbor search.
• Decoding:
  – For each index output the codeword with that index into the destination image.
  – Table lookup.
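The two halves are asymmetric — a nearest neighbor search per block on the encoder side, a table lookup on the decoder side. A minimal sketch with naive linear search (function names assumed; blocks and codewords are flat tuples of pixel values):

```python
def vq_encode(blocks, codebook):
    """For each block, output the index of the nearest codeword."""
    def d2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(codebook)), key=lambda i: d2(blk, codebook[i]))
            for blk in blocks]

def vq_decode(indices, codebook):
    """Table lookup: each index becomes its codeword."""
    return [codebook[i] for i in indices]
```

Decoding is trivially fast; all the interesting work (and the motivation for the search structures later in the lecture) is on the encoder side.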

CSE 589 - Lecture 7 - Autumn 2001

27

The Codebook

• Both encoder and decoder must have the same codebook.
• The codebook must be useful for many images and be stored someplace.
• The codebook must be designed properly to be effective.
• Design requires a representative training set.
• These are major drawbacks to VQ.

CSE 589 - Lecture 7 - Autumn 2001

28

Codebook Design Problem

• Input: A training set X of vectors of dimension d and a number n. (d = a x b and n is the number of codewords.)
• Output: n vectors c1, c2, ..., cn that minimize the sum of the squared distances from each member of the training set to its nearest codeword. That is, minimize

  Σ_{x∈X} ||x - c_{n(x)}||² = Σ_{x∈X} Σ_{i=1}^{d} (x(i) - c_{n(x)}(i))²

where c_{n(x)} is the nearest codeword to x.

CSE 589 - Lecture 7 - Autumn 2001

29

Algorithms for Codebook Design

• The optimal codebook design problem appears to be an NP-hard problem.
• There is a very effective method, called the generalized Lloyd algorithm (GLA), for finding a good local minimum.
• GLA is also known in the statistics community as the k-means algorithm.
• GLA is slow.

CSE 589 - Lecture 7 - Autumn 2001

30

GLA

• Start with an initial codebook c1, c2, ..., cn and training set X.
• Iterate:
  – Partition X into X1, X2, ..., Xn where Xi includes the members of X that are closest to ci.
  – Let xi be the centroid of Xi:

      xi = (1/|Xi|) Σ_{x∈Xi} x

  – If Σ_{i=1}^{n} ||xi - ci|| is sufficiently small then quit; otherwise continue the iteration with the new codebook c1, c2, ..., cn = x1, x2, ..., xn.


CSE 589 - Lecture 7 - Autumn 2001, slides 31-36

GLA Example (1)-(6)

[Figures: the training vectors are partitioned around the codewords c_i into sets X_i; each codeword then moves to the centroid x_i of its partition, and the process repeats.]


CSE 589 - Lecture 7 - Autumn 2001, slides 37-40

GLA Example (7)-(10)

[Figures: further iterations of partitioning the training vectors and moving the codewords to the centroids, until the codewords stop moving.]

CSE 589 - Lecture 7 - Autumn 2001

41

Codebook

1 x 2 codewords

Note: codewords diagonally spread

CSE 589 - Lecture 7 - Autumn 2001

42

GLA Advice

• Time per iteration is dominated by the partitioning step, which is m nearest neighbor searches where m is the training set size.
  – Average time per iteration is O(m log n) assuming d is small.
• Training set size.
  – The training set should have at least 20 training vectors per codeword to get reasonable performance.
  – Too small a training set results in “over training”.
• The number of iterations can be large.


CSE 589 - Lecture 7 - Autumn 2001

43

Nearest Neighbor Search

• Preprocess a set of n vectors V in d dimensions into a search data structure.
• Input: A query vector q.
• Output: The vector v in V that is nearest to q. That is, the vector v in V that minimizes

  ||v - q||² = Σ_{i=1}^{d} (v(i) - q(i))²

CSE 589 - Lecture 7 - Autumn 2001

44

NNS in VQ

• Used in codebook design.
  – Used in GLA to partition the training set.
  – Since codebook design is seldom done, the speed of NNS is not too big an issue there.
• Used in VQ encoding.
  – Codebook size is commonly 1,000 or more.
  – Naive linear search would make encoding too slow.
  – Can we do better than linear search?

CSE 589 - Lecture 7 - Autumn 2001

45

Naive Linear Search

• Keep track of the current best vector, best-v, and best distance squared, best-squared-d.
  – For an unsearched vector v compute ||v - q||² to see if it is smaller than best-squared-d.
  – If so then replace best-v.
  – If d is moderately large it is a good idea not to compute the squared distance completely. Bail out when k < d and

    best-squared-d ≤ Σ_{i=1}^{k} (v(i) - q(i))²

CSE 589 - Lecture 7 - Autumn 2001

46

Orchard’s Method

• Invented by Orchard (1991).
• Uses a “guess codeword”. The guess codeword is the codeword of an adjacent coded vector.
• Orchard’s Principle.
  – If r is the distance between the guess codeword and the query vector q, then the nearest codeword to q must be within a distance of 2r of the guess codeword.

CSE 589 - Lecture 7 - Autumn 2001

47

Orchard’s Principle

[Figure: the query lies at distance r from the guess codeword; circles of radius r and 2r illustrate the principle.]

The nearest codeword is within distance 2r of the guess codeword.

CSE 589 - Lecture 7 - Autumn 2001

48

Orchard Data Structure

• For each codeword sort the other codewords by distance to the codeword.

[Figure: an array A[1..8, 1..7]; row i holds the other seven codeword indices sorted by distance to codeword i (e.g. row 1: 6 5 4 2 3 8 7; row 2: 3 6 7 5 8 1 4; row 3: 5 2 7 8 6 4 1), and each entry stores an index and a distance.]


CSE 589 - Lecture 7 - Autumn 2001

49

Basic Orchard Algorithm

Let i be the index of the initial guess codeword;
r := b := ||ci - q||;
best-index := i;
j := 1;
while A[i,j].distance < 2*r do
    k := A[i,j].index;
    if ||ck - q|| < b then
        best-index := k;
        b := ||ck - q||;
    j := j + 1;

This algorithm searches all the codewords within a distance 2r of the guess codeword.
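A runnable sketch of the pseudocode (names assumed; `lists[i]` plays the role of row A[i,*], holding the other codeword indices sorted by distance to codeword i):

```python
from math import dist   # Euclidean distance, Python 3.8+

def orchard_lists(codebook):
    """Precompute, for each codeword, the other codewords sorted by
    distance to it — the O(n^2)-space Orchard structure."""
    n = len(codebook)
    return [sorted((j for j in range(n) if j != i),
                   key=lambda j: dist(codebook[i], codebook[j]))
            for i in range(n)]

def orchard_search(q, codebook, lists, guess):
    """Scan codewords in order of distance from the guess codeword,
    stopping once that distance reaches 2r, where r is the distance
    from the guess codeword to the query: by Orchard's principle no
    later codeword in the list can be the nearest."""
    r = dist(codebook[guess], q)
    best_index, b = guess, r
    for k in lists[guess]:
        if dist(codebook[guess], codebook[k]) >= 2 * r:
            break               # everything beyond is outside the 2r ball
        d = dist(codebook[k], q)
        if d < b:
            best_index, b = k, d
    return best_index
```

On a one-dimensional codebook laid out along a line, the scan stops as soon as a codeword more than 2r from the guess is reached, skipping the rest of the list.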

CSE 589 - Lecture 7 - Autumn 2001

50

Orchard Improvements

• Early bailout using squared distance:
  – ||cj - q|| < b is done with early bailout using the comparison ||cj - q||² < b².
• Switching lists:
  – When a nearer codeword is found, switch the search to its list.
  – Care must be taken to avoid doing a distance computation for the same codeword twice. Marking visited codewords solves this problem.

CSE 589 - Lecture 7 - Autumn 2001

51

Switching Lists (1)-(2)

[Figures: circles of radius r and 2r around the query; when the search switches to the new best codeword's list, codewords already marked visited are skipped and only the unvisited ones are examined.]

CSE 589 - Lecture 7 - Autumn 2001

53

Orchard Notes

• Very fast.
  – Appears to take O(log n) average time per search, but there is no proof of this performance.
• Requires too much memory.
  – Requires O(n²) memory for the sorted lists.
  – Modification for large n: just store the m closest codewords to make an n x m array. If the search runs off the array, revert to linear search.

CSE 589 - Lecture 7 - Autumn 2001

54

k-d Tree

• Jon Bentley, 1975.
• Tree used to store spatial data.
  – Nearest neighbor search.
  – Range queries.
  – Fast look-up.
• k-d trees are guaranteed log2 n depth where n is the number of points in the set.
  – Traditionally, k-d trees store points in d-dimensional space, which are equivalent to vectors in d-dimensional space.


CSE 589 - Lecture 7 - Autumn 2001

55

k-d Tree Construction

• If there is just one point, form a leaf with that point.
• Otherwise, divide the points in half by a line perpendicular to one of the axes.
• Recursively construct k-d trees for the two sets of points.
• Division strategies
  – divide points perpendicular to the axis with widest spread.
  – divide in a round-robin fashion.
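The recursion, using the widest-spread division strategy, in Python (a sketch with assumed names; the `Node` fields mirror the 5-field node structure the slides describe):

```python
class Node:
    def __init__(self, axis=None, value=None, left=None, right=None, point=None):
        self.axis, self.value = axis, value        # splitting axis and value
        self.left, self.right = left, right        # subtrees
        self.point = point                         # set only at leaves

def build_kdtree(points):
    """Split perpendicular to the axis of widest spread; recurse on halves."""
    if len(points) == 1:
        return Node(point=points[0])
    d = len(points[0])
    axis = max(range(d),
               key=lambda i: max(p[i] for p in points) - min(p[i] for p in points))
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    value = (pts[mid - 1][axis] + pts[mid][axis]) / 2   # split between halves
    return Node(axis=axis, value=value,
                left=build_kdtree(pts[:mid]), right=build_kdtree(pts[mid:]))
```

Note this sketch re-sorts at every level, so it runs in O(dn log² n); the O(dn log n) bound on the next slides comes from sorting once per dimension up front and splitting the sorted lists.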

CSE 589 - Lecture 7 - Autumn 2001, slides 56-73

k-d Tree Construction (1)-(18)

[Figures: the point set {a, b, c, d, e, f, g, h, i} is split step by step by lines s1-s8 (s1, s3, s5 perpendicular to the x axis; s2, s4, s6, s7, s8 perpendicular to the y axis), while the corresponding tree grows one internal node per split and finishes with the leaves a, b, d, e, g, c, f, h, i.]

CSE 589 - Lecture 7 - Autumn 2001

74

k-d Tree Construction Complexity

• First sort the points in each dimension.
  – O(dn log n) time and dn storage.
  – These are stored in A[1..d, 1..n].
• Finding the widest spread and equally dividing into two subsets can be done in O(dn) time.
• Constructing the k-d tree can be done in O(dn log n) time and dn storage.

CSE 589 - Lecture 7 - Autumn 2001

75

k-d Tree Codebook Organization

CSE 589 - Lecture 7 - Autumn 2001

76

k-d Tree Splitting

[Figure: the point set a-i in the plane.]

sorted points in each dimension:
      1 2 3 4 5 6 7 8 9
  x:  a d g b e i c h f
  y:  a c b d f e h g i

• max spread is the max of fx - ax and iy - ay.
• In the selected dimension the middle point in the list splits the data.
• To build the sorted lists for the other dimensions, scan the sorted list, adding each point to one of two sorted lists.

CSE 589 - Lecture 7 - Autumn 2001

77

Node Structure for k-d Trees

• A node has 5 fields
  – axis (splitting axis)
  – value (splitting value)
  – left (left subtree)
  – right (right subtree)
  – point (holds a point if left and right children are null)

CSE 589 - Lecture 7 - Autumn 2001

78

k-d Tree Nearest Neighbor Search

NNS(q, root, p, infinity)    {initial call}

NNS(q: point, n: node, p: ref point, w: ref distance)
    if n.left = n.right = null then  {leaf case}
        w' := ||q - n.point||;
        if w' < w then w := w'; p := n.point;
    elseif w = infinity then
        if q(n.axis) < n.value then
            NNS(q, n.left, p, w);
            if q(n.axis) + w > n.value then NNS(q, n.right, p, w);
        else
            NNS(q, n.right, p, w);
            if q(n.axis) - w < n.value then NNS(q, n.left, p, w);
    else  {w is finite}
        if q(n.axis) - w < n.value then NNS(q, n.left, p, w);
        if q(n.axis) + w > n.value then NNS(q, n.right, p, w);
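The same search condensed into runnable Python (a sketch with assumed names; nodes are tuples instead of the 5-field records, and the infinite and finite cases collapse into one test because w becomes finite as soon as the first leaf is reached):

```python
from math import dist, inf

# A node is ("leaf", point) or ("split", axis, value, left, right).

def nns(q, node, best=(None, inf)):
    """Return (nearest point, its distance w): descend toward q's side of
    each split, then search the far side only if the splitting plane is
    within the current best distance w of q."""
    if node[0] == "leaf":
        w = dist(q, node[1])
        return (node[1], w) if w < best[1] else best
    _, axis, value, left, right = node
    near, far = (left, right) if q[axis] < value else (right, left)
    best = nns(q, near, best)
    if abs(q[axis] - value) < best[1]:    # far side could hold a nearer point
        best = nns(q, far, best)
    return best
```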


CSE 589 - Lecture 7 - Autumn 2001, slides 79-99

k-d Tree NNS (1)-(21)

[Figures: a nearest neighbor search for a query point in the k-d tree built from points a-i. The search descends to a leaf, sets w to the distance from the query point to that leaf's point, and backtracks up the tree, entering a subtree only when its splitting line lies within distance w of the query point; the current nearest point improves from g to e as the search proceeds.]

CSE 589 - Lecture 7 - Autumn 2001

100

Notes on k-d NNS

• Has been shown to run in O(log n) average time per search in a reasonable model (assuming d is a constant).
• For VQ it appears that O(log n) is correct.
• Storage for the k-d tree is O(n).
• Preprocessing time is O(n log n) assuming d is a constant.

CSE 589 - Lecture 7 - Autumn 2001

101

Alternative is PCP-Tree

• Zatloukal, Johnson, Ladner (1999).
• Organize a tree using principal components partitioning.
  – Partition the data perpendicular to the line that minimizes the sum of the distances of the points to the line. Eigenvector computation required.
• About as easy to construct as the k-d tree.
• In NNS, processing per node in the PCP tree is more expensive than in the k-d tree, but fewer codewords are searched.

CSE 589 - Lecture 7 - Autumn 2001

102

Principal Component Partition


CSE 589 - Lecture 7 - Autumn 2001

103

PCP Tree vs. k-d tree

[Figure: the same point set partitioned two ways, labeled PCP and k-d.]

CSE 589 - Lecture 7 - Autumn 2001

104

Comparison in Time per Search

[Figure: normalized time per search versus dimension (4D, 16D, 64D) for Orchard, k-d tree, and PCP tree, with 4,096 codewords; the vertical axis runs from 0 to 7.]

CSE 589 - Lecture 7 - Autumn 2001

105

NNS Summary

• Orchard
  – fastest
  – excessive memory
  – good guess codeword needed
• k-d tree
  – good in low dimension
  – small storage
  – no guess codeword
• PCP
  – best in high dimension
  – fewer codewords searched than k-d
  – small storage
  – no guess codeword

CSE 589 - Lecture 7 - Autumn 2001

106

Notes on VQ

• Works well in some applications.
  – Requires training.
• Has some interesting algorithms.
  – Codebook design.
  – Nearest neighbor search.
• Variable length codes for VQ.
  – PTSVQ - pruned tree structured VQ (Chou, Lookabaugh and Gray, 1989)
  – ECVQ - entropy constrained VQ (Chou, Lookabaugh and Gray, 1989)