Data Depth Jason Burrowes-Jones

Posted on 20-Dec-2015


Page 1: Data Depth Jason Burrowes-Jones Presentation Outline Background Review What is known Project Objectives Present Work and Results Future Goals

Data Depth

Jason Burrowes-Jones

Page 2:

Presentation Outline

• Background Review

• What is known

• Project Objectives

• Present Work and Results

• Future Goals

Page 3:

Background

Background Review

• Depth of X: the number of data points in the smallest half space containing X, i.e. the one holding the fewest data points

• Median: a point of maximum depth (not necessarily a data point)
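In one dimension the half spaces containing X are half-lines, and the smallest ones are (−∞, X] and [X, ∞), so the depth reduces to the smaller of the two counts of data points on either side of X (X itself included). A minimal Python sketch; the function name `depth_1d` is illustrative, not from the slides:

```python
def depth_1d(x, data):
    """Depth of x: the fewest data points in any half-line containing x."""
    at_or_below = sum(1 for a in data if a <= x)  # points in (-inf, x]
    at_or_above = sum(1 for a in data if a >= x)  # points in [x, +inf)
    return min(at_or_below, at_or_above)


data = [1, 2, 3, 4]
# Every x in [2, 3] reaches the maximum depth 2, so a median need not be a data point.
print([depth_1d(x, data) for x in (1, 2, 2.5, 3, 4)])  # [1, 2, 2, 2, 1]
```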

Page 4:

Background

• Why the interest in the median?

– Robustness: the median resists the effects of polluted data.

– It gives a sense of the data from the centre outwards.

• Importance of data depth

– Eliminating outliers

– Location estimate

Page 5:

Background

Generalization of Depth in R²

1. Tukey Depth

• Inspired by the 1-dimensional case

• $d(x) = \min_{h \ni x} \#\{a_i \in h\}$: the minimum number of data points $a_i$ in any half space $h$ containing $x$

• Proposed by John Tukey in 1974

Page 6:

Background

Finding the Tukey Depth of X

• Rotate a line through X by 180 degrees.

• Keep count of the data points on each side of the line.

• The depth is the smallest count seen as the line rotates (the red line in the figure).

• Cost: O(n log n)
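The rotating-line procedure can be sketched in Python as follows. This is an illustrative sketch, not the presenter's implementation, and the function names are hypothetical. It assumes integer coordinates so the orientation tests are exact. After an O(n log n) radial sort around X, a two-pointer sweep finds the largest number of points lying strictly on one side of some line through X; the remaining points form the sparsest closed half plane, so their count is the depth. Data points equal to X lie in every half plane containing X and are always counted.

```python
from functools import cmp_to_key


def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]


def in_upper(v):
    # True for directions with angle in [0, pi), False for [pi, 2*pi).
    return v[1] > 0 or (v[1] == 0 and v[0] > 0)


def by_angle(u, v):
    # Comparator: order nonzero vectors counter-clockwise starting from angle 0.
    if in_upper(u) != in_upper(v):
        return -1 if in_upper(u) else 1
    c = cross(u, v)
    return -1 if c > 0 else (1 if c < 0 else 0)


def within_half_turn(u, v):
    # True if v lies in the half-open range [0, pi) counter-clockwise from u.
    c = cross(u, v)
    return c > 0 or (c == 0 and u[0] * v[0] + u[1] * v[1] > 0)


def tukey_depth(x, points):
    """Tukey depth of x: fewest data points in a closed half plane containing x."""
    at_x, vecs = 0, []
    for px, py in points:
        v = (px - x[0], py - x[1])
        if v == (0, 0):
            at_x += 1  # a point equal to x lies in every half plane containing x
        else:
            vecs.append(v)
    vecs.sort(key=cmp_to_key(by_angle))  # O(n log n) radial sort around x
    m = len(vecs)
    # 'best' = largest number of points strictly on one side of a line through x.
    best, end = 0, 0
    for i in range(m):
        end = max(end, i + 1)  # the window end only moves forward: O(n) sweep
        while end < i + m and within_half_turn(vecs[i], vecs[end % m]):
            end += 1
        best = max(best, end - i)
    return at_x + (m - best)


square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
print(tukey_depth((0, 0), square))  # 2
print(tukey_depth((5, 5), square))  # 0
```

Sorting with an exact cross-product comparator instead of floating-point angles keeps points in the same direction from X grouped together, which the sweep relies on.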

Page 7:

Background

• Depth of the Tukey median, μ

• $C_n$: cost to compute the Tukey median

Page 8:

Background

2. Simplicial Depth in R²

• Proposed by Regina Liu in 1989

• Simplicial depth of X: the number of triangles, with vertices at data points, that contain X

• Median: a point of maximal depth

• There is always a point x such that

$d(x) \ge \frac{n^3}{27} + O(n^2)$
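Directly from the definition, d(x) can be computed by checking every triangle, in O(n³) time. A Python sketch (hypothetical names), assuming integer coordinates and no three collinear data points, and counting a triangle when x lies inside it or on its boundary:

```python
from itertools import combinations


def orient(a, b, c):
    # Twice the signed area of triangle abc (> 0 when a, b, c turn counter-clockwise).
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_closed_triangle(x, a, b, c):
    # x is inside or on abc iff the three edge orientations never take both signs.
    s = (orient(a, b, x), orient(b, c, x), orient(c, a, x))
    return not (min(s) < 0 < max(s))


def simplicial_depth_bruteforce(x, points):
    """Number of data triangles containing x, O(n^3).
    Assumes no three data points are collinear (no degenerate triangles)."""
    return sum(1 for a, b, c in combinations(points, 3) if in_closed_triangle(x, a, b, c))


pentagon = [(3, 0), (1, 3), (-2, 2), (-2, -2), (1, -3)]
print(simplicial_depth_bruteforce((0, 0), pentagon))  # 5 of the 10 triangles
```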

Page 9:

Background

Finding the Simplicial Depth of X

• Lemma 1: Given points A, B, C and a reference point X, let A′ be any point on the ray starting at X and going through A. Then

$X \in \triangle ABC \iff X \in \triangle A'BC$

[Figure: X inside triangle ABC, with A′ on the ray from X through A]
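Lemma 1 says that containment depends only on the direction of A as seen from X. A small exact check of this in Python (the helper names and sample triangle are illustrative, not from the slides), using an orientation test for non-degenerate triangles and `fractions.Fraction` so that sliding A along the ray introduces no rounding:

```python
from fractions import Fraction


def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def contains(x, a, b, c):
    # Closed-triangle test for a non-degenerate triangle abc.
    s = (orient(a, b, x), orient(b, c, x), orient(c, a, x))
    return not (min(s) < 0 < max(s))


def on_ray(x, a, t):
    # A' = X + t (A - X) lies on the ray from X through A for any t > 0.
    return (x[0] + t * (a[0] - x[0]), x[1] + t * (a[1] - x[1]))


def lemma1_holds(x, a, b, c, scales):
    """Check X in ABC <=> X in A'BC for each A' = on_ray(x, a, t), t in scales."""
    return all(contains(x, a, b, c) == contains(x, on_ray(x, a, t), b, c) for t in scales)


scales = [Fraction(1, 10), Fraction(1, 2), 2, 10]
print(lemma1_holds((0, 0), (3, 1), (-1, 2), (-1, -2), scales))  # True (X inside)
print(lemma1_holds((5, 5), (3, 1), (-1, 2), (-1, -2), scales))  # True (X outside)
```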

Page 10:

Background

• Lemma 2: Given points A, B and C on a unit circle centered at the origin, let A* be the point antipodal to A. Then △ABC contains the origin if and only if A* is on the short arc joining B and C.

[Figure: points A, B, C and the antipode A* on the unit circle]
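Combining the two lemmas gives a containment test that uses only directions from X: by Lemma 1 each vertex can be slid along its ray onto the unit circle around X without changing the answer, and by Lemma 2 the triangle contains X exactly when A* lies on the short arc between B and C. A Python sketch with X at the origin (hypothetical names), assuming general position (no two of A, B, C collinear with X, so B and C are not antipodal), checked against an orientation-based test:

```python
def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]


def antipode_on_short_arc(a, b, c):
    """Lemma 2 test with the origin as reference point X.

    a, b, c are the vectors from X to A, B, C; by Lemma 1 their lengths do not
    matter, so no normalisation onto the unit circle is needed.
    Assumes general position: no two of a, b, c are parallel.
    """
    s = (-a[0], -a[1])                     # direction of the antipode A*
    sign = 1 if cross(b, c) > 0 else -1    # turning direction of the short arc B -> C
    # A* is on the short arc iff it lies between B and C, turning the same way.
    return sign * cross(b, s) >= 0 and sign * cross(s, c) >= 0


def contains_origin(a, b, c):
    # Reference answer from orientation tests.
    def orient(p, q, r):
        return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    s = (orient(a, b, (0, 0)), orient(b, c, (0, 0)), orient(c, a, (0, 0)))
    return not (min(s) < 0 < max(s))


triangles = [((3, 1), (-1, 2), (-1, -2)), ((3, 1), (1, 3), (-1, 2))]
print([antipode_on_short_arc(*t) for t in triangles])  # [True, False]
print(all(antipode_on_short_arc(*t) == contains_origin(*t) for t in triangles))  # True
```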

Page 11:

Background

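Algorithm

• Sort the points in radial order $(\theta_1, \dots, \theta_n)$

– Upper half: $(\theta_1, \dots, \theta_t)$

– Lower half: $(\theta_{t+1}, \dots, \theta_n)$

for i = 1 to t

• Pick the diameter D through $\theta_i$

• Count triangles in the upper half

[Figure: points θ1, θ2, θ3, …, θt on the upper half of the circle]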

Page 14:

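Algorithm

• Sort the points in radial order $(\theta_1, \dots, \theta_n)$

– Upper half: $(\theta_1, \dots, \theta_t)$

– Lower half: $(\theta_{t+1}, \dots, \theta_n)$

for i = 1 to t

• Pick the diameter D through $\theta_i$

• Count triangles in the upper half

[Figure: x at the centre, with points θ1, θ2, θ3, …, θt on the upper half of the circle]

for i = t+1 to n

• Do the same for the lower half

• Depth = sum / 3, since each triangle containing x is counted once from each of its three vertices

• Once the points are sorted, computing d(x) can be done in O(n) time (O(n log n) including the sort)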
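A Python sketch of computing d(x) in O(n log n) time. It is a swapped-in formulation, not the slide's per-diameter count with sum / 3: a triangle misses x exactly when its vertices fit in an open half plane through x, and then exactly one vertex sees the other two within the half-turn counter-clockwise from it. Hence d(x) = C(n, 3) − Σᵢ C(kᵢ, 2), where kᵢ counts the points strictly inside the half-turn following point i, and all kᵢ come from one two-pointer sweep after the radial sort. It assumes general position (x is not a data point and no two data points are collinear with x) and integer coordinates; the names are hypothetical.

```python
from functools import cmp_to_key
from math import comb


def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]


def in_upper(v):
    # True for directions with angle in [0, pi), False for [pi, 2*pi).
    return v[1] > 0 or (v[1] == 0 and v[0] > 0)


def by_angle(u, v):
    # Comparator: order nonzero vectors counter-clockwise starting from angle 0.
    if in_upper(u) != in_upper(v):
        return -1 if in_upper(u) else 1
    c = cross(u, v)
    return -1 if c > 0 else (1 if c < 0 else 0)


def simplicial_depth(x, points):
    """Simplicial depth of x in O(n log n); assumes general position w.r.t. x."""
    vecs = sorted(((px - x[0], py - x[1]) for px, py in points), key=cmp_to_key(by_angle))
    n = len(vecs)
    missing = 0  # triangles that do not contain x
    end = 0
    for i in range(n):
        end = max(end, i + 1)
        # Advance while vecs[end] is strictly within the half-turn counter-clockwise of vecs[i].
        while end < i + n and cross(vecs[i], vecs[end % n]) > 0:
            end += 1
        k = end - i - 1        # points strictly inside that half-turn
        missing += comb(k, 2)  # triangles whose most clockwise vertex is point i
    return comb(n, 3) - missing


pentagon = [(3, 0), (1, 3), (-2, 2), (-2, -2), (1, -3)]
print(simplicial_depth((0, 0), pentagon))  # 5
```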

Page 15:

Background

• Computation of simplicial median:

• Depth of simplicial median:

$\frac{n^3}{27} + O(n^2) \le d(\mu) \le \frac{n^3}{24} + O(n^2)$
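Dividing by the total number of triangles on n data points turns these bounds into proportions: asymptotically, the simplicial median lies in at least 2/9 and at most 1/4 of all data triangles.

```latex
\binom{n}{3} = \frac{n(n-1)(n-2)}{6} = \frac{n^3}{6} + O(n^2)
\quad\Longrightarrow\quad
\frac{n^3/27}{\binom{n}{3}} \to \frac{6}{27} = \frac{2}{9},
\qquad
\frac{n^3/24}{\binom{n}{3}} \to \frac{6}{24} = \frac{1}{4}
```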

Page 16:

Background

What is known?

• Computing a centre point (Jadhav, Mukhopadhyay)

– Tukey depth

– O(n)

• Computing a high Tukey depth point in the plane (S. Langerman, W. Steiger)

– Pruning technique

– $O(n (\log n)^2)$

Page 17:

What is known

Computing a High Tukey Depth Point in the Plane

• Definition

A point Q ∈ R² is a deep point for a set S if d(Q) ≥ d(P) for every P ∈ S.

• Fact

If h is a witness halfspace for the point Q, then for any point P ∈ h, d(Q) ≥ d(P).
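The fact follows in one line from the definition of depth as a minimum over half spaces (S is the data set and h is a witness halfspace for Q, so h contains Q and |S ∩ h| = d(Q)):

```latex
P \in h \;\Longrightarrow\; d(P) = \min_{h' \ni P} |S \cap h'| \;\le\; |S \cap h| = d(Q)
```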

Page 18:

What is known

• What is a witness halfspace?

– A witness halfspace defines the depth of a point P.

– It is the halfspace containing P with the fewest data points.

– d(P) = k

[Figure: a line through P bounds the halfspace h holding k points; the other n − k points lie on the other side]
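A witness halfspace can be recorded while computing the depth with the rotating line. In this Python sketch (hypothetical names; integer coordinates assumed) the witness is represented by a direction v: h is P together with every point q for which q − P does not lie in the half-turn [0, π) counter-clockwise from v, i.e. the half plane to the right of the line through P along v, with the boundary ray pointing along v left out. By construction it holds exactly d(P) = k data points.

```python
from functools import cmp_to_key


def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]


def within_half_turn(u, v):
    # True if v lies in [0, pi) counter-clockwise from u.
    c = cross(u, v)
    return c > 0 or (c == 0 and u[0] * v[0] + u[1] * v[1] > 0)


def by_angle(u, v):
    # Comparator: counter-clockwise order starting from angle 0.
    up_u = u[1] > 0 or (u[1] == 0 and u[0] > 0)
    up_v = v[1] > 0 or (v[1] == 0 and v[0] > 0)
    if up_u != up_v:
        return -1 if up_u else 1
    c = cross(u, v)
    return -1 if c > 0 else (1 if c < 0 else 0)


def depth_and_witness(p, points):
    """Return (d(p), v), where v is a direction describing a witness halfspace of p."""
    diffs = [(q[0] - p[0], q[1] - p[1]) for q in points]
    at_p = diffs.count((0, 0))
    vecs = sorted((d for d in diffs if d != (0, 0)), key=cmp_to_key(by_angle))
    m = len(vecs)
    best, best_i, end = 0, 0, 0
    for i in range(m):
        end = max(end, i + 1)
        while end < i + m and within_half_turn(vecs[i], vecs[end % m]):
            end += 1
        if end - i > best:  # most points strictly on one side of a line through p
            best, best_i = end - i, i
    return at_p + (m - best), (vecs[best_i] if vecs else (1, 0))


def in_witness(q, p, v):
    # h: p itself plus every q with q - p outside [0, pi) counter-clockwise from v.
    d = (q[0] - p[0], q[1] - p[1])
    return d == (0, 0) or not within_half_turn(v, d)


square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
k, v = depth_and_witness((0, 0), square)
print(k, sum(in_witness(q, (0, 0), v) for q in square))  # 2 2
```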

Page 19:

What is known

• Algorithm Deep(S, A*): A ← A*, P* ← any point of A*

While A is not empty:

1. Find a point Q ∈ A such that d(Q) ≥ c|A| (depth taken with respect to A), for a constant 0 < c < 1, e.g. a centre point (c = 1/3)

2. Compute the depth of Q and let h be the witness halfspace for Q in S.

3. If d(Q) is greater than d(P*), then P* ← Q.

4. Prune: A ← A − (A ∩ h)
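A Python sketch of Deep(S, A*) for a finite candidate set A* (an assumption: the slide does not say what A* is). It reuses the rotating-line depth computation and records the witness halfspace h of Q as a direction v (h is Q plus every point q with q − Q outside the half-turn [0, π) counter-clockwise from v). Step 1 is replaced by a stand-in: instead of a centre point, it takes the point of A that is deepest with respect to A itself. That still guarantees Q, and so at least one point, is pruned each round, but not the constant fraction c|A| per round that the pruning technique is designed around. All names are hypothetical.

```python
from functools import cmp_to_key


def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]


def within_half_turn(u, v):
    # True if v lies in [0, pi) counter-clockwise from u.
    c = cross(u, v)
    return c > 0 or (c == 0 and u[0] * v[0] + u[1] * v[1] > 0)


def by_angle(u, v):
    # Comparator: counter-clockwise order starting from angle 0.
    up_u = u[1] > 0 or (u[1] == 0 and u[0] > 0)
    up_v = v[1] > 0 or (v[1] == 0 and v[0] > 0)
    if up_u != up_v:
        return -1 if up_u else 1
    c = cross(u, v)
    return -1 if c > 0 else (1 if c < 0 else 0)


def depth_and_witness(p, points):
    """Tukey depth of p w.r.t. points, and a direction v describing a witness halfspace."""
    diffs = [(q[0] - p[0], q[1] - p[1]) for q in points]
    at_p = diffs.count((0, 0))
    vecs = sorted((d for d in diffs if d != (0, 0)), key=cmp_to_key(by_angle))
    m = len(vecs)
    best, best_i, end = 0, 0, 0
    for i in range(m):
        end = max(end, i + 1)
        while end < i + m and within_half_turn(vecs[i], vecs[end % m]):
            end += 1
        if end - i > best:
            best, best_i = end - i, i
    return at_p + (m - best), (vecs[best_i] if vecs else (1, 0))


def in_witness(q, p, v):
    # h: p itself plus every q with q - p outside [0, pi) counter-clockwise from v.
    d = (q[0] - p[0], q[1] - p[1])
    return d == (0, 0) or not within_half_turn(v, d)


def deep(S, A_star):
    """Sketch of Deep(S, A*): a point of A* of maximum Tukey depth w.r.t. S."""
    A = list(A_star)
    P_star = A[0]  # P* <- any point of A* (A* must be non-empty)
    d_star = depth_and_witness(P_star, S)[0]
    while A:
        # Step 1 stand-in (assumption): the point of A deepest with respect to A.
        Q = max(A, key=lambda a: depth_and_witness(a, A)[0])
        d_Q, v = depth_and_witness(Q, S)  # step 2: depth and witness halfspace in S
        if d_Q > d_star:                  # step 3
            P_star, d_star = Q, d_Q
        # Step 4: A <- A - (A ∩ h); Q itself is always in h, so A shrinks.
        A = [a for a in A if not in_witness(a, Q, v)]
    return P_star, d_star


S = [(0, 0), (2, 0), (0, 2), (-2, 0), (0, -2)]
print(deep(S, S))  # ((0, 0), 3)
```

Correctness of the returned point rests only on the Fact above: every pruned candidate lies in a witness halfspace of some Q, so its depth is at most d(Q) ≤ d(P*).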

Page 20:

Project Objectives

• Efficient algorithm

– Simplicial median

Page 21:

Present Work and Results

• Find a point P of high simplicial depth:

$d(P) \ge \frac{n^3}{27}$

$d(\text{simplicial } \mu) \ge \frac{n^3}{27}$

$d(P) \ge d(X)$, where X is any data point

$d(P) \ge \frac{n^3}{27} + O(n^2)$

Page 22:

Proposal

• Take a random sample of size m

• Find the simplicial median of this sample

– This point would have high depth in the source data.

• Use the pruning technique to determine whether there is a data point of higher depth.

• $n (\log n)^c$

• Working on details
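The first two steps of the proposal can be sketched in Python as follows (hypothetical names). Two simplifications are assumptions, not part of the proposal: the sample's simplicial median is approximated by the sample point of largest depth within the sample (the true median need not be a data point), and depth is counted by brute force over triangles, assuming no three points are collinear. The pruning step is left out, since its details are still being worked out.

```python
import random
from itertools import combinations


def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_closed_triangle(x, a, b, c):
    s = (orient(a, b, x), orient(b, c, x), orient(c, a, x))
    return not (min(s) < 0 < max(s))


def simplicial_depth(x, points):
    # Brute force, O(n^3); assumes no three points are collinear.
    return sum(1 for a, b, c in combinations(points, 3) if in_closed_triangle(x, a, b, c))


def sample_median_candidate(data, m, seed=0):
    """Draw a random sample of size m and return its deepest sample point."""
    sample = random.Random(seed).sample(data, m)
    return max(sample, key=lambda p: simplicial_depth(p, sample))


data = [(3, 0), (1, 3), (-2, 2), (-2, -2), (1, -3), (0, 0)]
cand = sample_median_candidate(data, m=4)
print(cand, simplicial_depth(cand, data))  # depth of the candidate in the full data
```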

Page 23:

Future Goals

• Adapt the pruning technique to the simplicial median

• Write a paper

– Present at a conference

Page 24:

Background

Acknowledgements

• DIMACS REU

• Sponsors

• William Steiger

• Evil computer scientists and mathematicians