Post on 20-Dec-2015
Data Depth
Jason Burrowes-Jones
Presentation Outline
• Background Review
• What is known
• Project Objectives
• Present Work and Results
• Future Goals
Background
Background Review
• Depth of X: number of data points in the smallest half space containing X
• Median: a point of max depth (not necessarily a data point)
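A minimal sketch of the definition above, assuming the review slide refers to the one-dimensional case (half spaces are then half-lines). The name `depth_1d` and the sample data are hypothetical, chosen only for illustration.

```python
def depth_1d(x, data):
    """Fewest data points in a closed half-line that starts at x."""
    left = sum(1 for a in data if a <= x)   # half-line (-inf, x]
    right = sum(1 for a in data if a >= x)  # half-line [x, +inf)
    return min(left, right)


# Illustrative data with one polluted value (100).
data = [1, 2, 3, 4, 100]

# A median is a point of maximal depth; here we only search the data points.
median = max(data, key=lambda a: depth_1d(a, data))
```

In this sample the polluted value has the lowest possible depth, so it does not pull the median towards it.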
• Why the interest in the median?
– ROBUSTNESS, i.e. the median resists the effects of polluted data.
– Gives a sense of the data from the centre outwards.
• Importance of Data Depth
– Eliminating outliers
– Location estimate
Generalization of Depth in R²
1. Tukey “Depth”
• Inspired by 1-dimensional case
• d(X) = min. number of data points ai in any half space containing X
• Proposed by John Tukey in 1974
Finding the Tukey Depth of X
• Rotate a line through X by 180 degrees.
• Keep count of the data points on both sides of the line.
• Depth is the smallest count as the line is rotated (red line in this case).
• Cost: O(n log(n))
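The rotating-line idea can be sketched as follows. This is an illustrative version, not the presenter's code: the name `tukey_depth` is hypothetical, and it assumes general position (X is not a data point and no two data points are collinear with X). Sorting the angles dominates the cost, which matches the O(n log(n)) bound.

```python
import math
from bisect import bisect_left, bisect_right


def tukey_depth(x, points):
    """Smallest number of points on one side of any line through x.

    Assumes general position: x is not a data point and no two data
    points are collinear with x.
    """
    n = len(points)
    angles = sorted(math.atan2(py - x[1], px - x[0]) for px, py in points)
    # Append a shifted copy so that intervals can wrap around 2*pi.
    doubled = angles + [a + 2 * math.pi for a in angles]
    depth = n
    for a in angles:
        # Points whose direction lies strictly inside (a, a + pi):
        # one side of the line through x just past the point at angle a.
        c = bisect_left(doubled, a + math.pi) - bisect_right(doubled, a)
        depth = min(depth, c, n - c)
    return depth


# Centre of a regular pentagon (hypothetical example data).
pentagon = [(math.cos(math.radians(18 + 72 * k)),
             math.sin(math.radians(18 + 72 * k))) for k in range(5)]
triangle = [(0, 2), (-2, -1), (2, -1)]
```

Each data point contributes one candidate line position, so the loop examines every distinct split of the data as the line turns.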
• Depth of Tukey Median, μ
• Cn : cost to compute Tukey Median
2. Simplicial Depth in R²
• Proposed by Regina Liu in 1989
• Simplicial depth is the number of triangles (with data points as vertices) that contain X
• Median is a point of maximal depth
• There is always a point x such that:
$d(x) \ge \frac{n^3}{27} + O(n^2)$
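To make the definition concrete, here is a brute-force sketch that checks every triangle directly. It is only an illustration of the definition (cubic time), with hypothetical names, and it assumes X does not lie on any line through two data points.

```python
from itertools import combinations


def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(x, a, b, c):
    """True if x lies in the closed triangle abc."""
    d1, d2, d3 = cross(a, b, x), cross(b, c, x), cross(c, a, x)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def simplicial_depth_brute(x, points):
    """Number of data triangles that contain x, by direct enumeration."""
    return sum(1 for a, b, c in combinations(points, 3)
               if in_triangle(x, a, b, c))


# Hypothetical example data: four points around the origin.
pts = [(0, 2), (-2, -1), (2, -1), (1, -3)]
```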
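Finding the Simplicial Depth of X
• Lemma 1
– Given points A, B, C and a reference point X, let A′ be any point on the ray starting at X and going through A. Then:
$X \in \triangle ABC \iff X \in \triangle A'BC$
[Figure: triangle ABC containing X, with A′ marked on the ray from X through A]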
• Lemma 2
– Given points A, B and C on a unit circle centered at the origin, let A* be antipodal to A. Then ΔABC contains the origin if and only if A* is on the short arc joining B and C.
[Figure: unit circle with points A, B, C and the antipode A*]
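Lemma 2 can be checked numerically with a short sketch. The function names are hypothetical, and the check assumes no two of the three points are equal or antipodal, so that the short arc is well defined.

```python
import math
from itertools import combinations


def cross(u, v):
    """z-component of the cross product of two plane vectors."""
    return u[0] * v[1] - u[1] * v[0]


def contains_origin(a, b, c):
    """True if the origin is strictly inside triangle abc."""
    s1, s2, s3 = cross(a, b), cross(b, c), cross(c, a)
    return (s1 > 0 and s2 > 0 and s3 > 0) or (s1 < 0 and s2 < 0 and s3 < 0)


def antipode_on_short_arc(a, b, c):
    """True if A* = -A lies strictly inside the short arc from B to C."""
    a_star = (-a[0], -a[1])
    s = cross(b, c)  # orientation of the short arc from B to C
    return s * cross(b, a_star) > 0 and s * cross(a_star, c) > 0


def on_circle(deg):
    """Point on the unit circle at the given angle in degrees."""
    return (math.cos(math.radians(deg)), math.sin(math.radians(deg)))


# Compare both sides of the lemma on every triple of some sample angles.
sample = [on_circle(d) for d in (10, 75, 130, 200, 260, 320)]
lemma_holds = all(
    contains_origin(a, b, c) == antipode_on_short_arc(a, b, c)
    for a, b, c in combinations(sample, 3)
)
```

Since cross(B, A*) equals cross(A, B) and cross(A*, C) equals cross(C, A), the two predicates test the same three signs, which is one way to see why the lemma holds.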
Algorithm
• Sort points in radial order (θ1, …, θn)
– Upper half: (θ1, …, θt)
– Lower half: (θt+1, …, θn)
for i = 1 to t
• Pick the diameter D through θi
• Count triangles in upper half
for i = t+1 to n
• Do the same for lower half
• Depth = sum / 3
• Once the points are sorted, computing d(x) can be done in O(n)
[Figure: x at the centre of a circle, with the data points θ1, θ2, θ3, …, θt in radial order]
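A radial-sort computation of simplicial depth can be sketched as below. Note that this is a swapped-in variant, not the per-diameter count from the slide: it counts the triangles that miss x (all three vertices inside an open half plane through x) and subtracts them from the total. The name `simplicial_depth` is hypothetical, general position is assumed, and binary search is used in place of a linear two-pointer sweep, so the whole routine is O(n log n).

```python
import math
from bisect import bisect_left, bisect_right


def simplicial_depth(x, points):
    """Number of data triangles containing x, via radial sorting.

    Assumes general position: x is not a data point and no two data
    points are collinear with x.
    """
    n = len(points)
    angles = sorted(math.atan2(py - x[1], px - x[0]) for px, py in points)
    doubled = angles + [a + 2 * math.pi for a in angles]
    missing = 0
    for a in angles:
        # c = points strictly within the half turn after angle a.
        c = bisect_left(doubled, a + math.pi) - bisect_right(doubled, a)
        # A triangle misses x exactly when its first vertex (in radial
        # order) has the other two within the following half turn.
        missing += c * (c - 1) // 2
    return math.comb(n, 3) - missing


# Hypothetical example data.
pts = [(0, 2), (-2, -1), (2, -1), (1, -3)]
pentagon = [(math.cos(math.radians(18 + 72 * k)),
             math.sin(math.radians(18 + 72 * k))) for k in range(5)]
```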
• Computation of simplicial median:
• Depth of simplicial median, μ:
$\frac{n^3}{27} + O(n^2) \le d(\mu) \le \frac{n^3}{24} + O(n^2)$
What is known?
• Computing a centre point (Jadhav, Mukhopadhyay).
– Tukey Depth
– O(n)
• Computing a high Tukey depth point in the plane (S. Langerman, W. Steiger).
– Pruning technique
– O(n (log n)²)
Computing a High Tukey Depth Point in the Plane
• Definition
A point $Q \in \mathbb{R}^2$ is a deep point for a set $S$ if $d(Q) \ge d(P)$ for every $P \in S$.
• Fact
If $h$ is a witness halfspace for the point $Q$, then for any point $P \in h$, $d(Q) \ge d(P)$.
• What is a witness halfspace?
– A witness halfspace defines the depth of a point P.
– It is the halfspace containing P with the fewest data points.
– d(P) = k
[Figure: halfspace h through P, with k points on the side of h and n − k points on the other side]
• Algorithm Deep(S, A*)
A ← A*, P* ← any point of A*
While A is not empty
1. Find a point $Q \in A$ such that $d(Q) \ge c|A|$, $0 < c \le \frac{1}{3}$ (e.g. centre point)
2. Compute the depth of Q and let h be the witness halfspace for Q in S.
3. If d(Q) is greater than d(P*) then: P* = Q, end if
4. Prune: $A \leftarrow A \setminus (A \cap h)$
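The pruning loop can be sketched in a simplified form. Everything here is an assumption made for illustration: the names `depth_and_witness` and `deep` are hypothetical, the depth routine is a plain quadratic scan rather than an O(n log n) one, and step 1 is replaced by simply picking a member of A instead of a point of depth at least c|A|. Pruning still relies only on the Fact that every point inside the witness halfspace of Q is no deeper than Q. General position (no three data points collinear) is assumed.

```python
import math


def depth_and_witness(q, S):
    """Return (k, u): Tukey depth k of q in S and a unit normal u such
    that the witness halfspace is {p : (p - q) . u >= 0}."""
    vecs = [(px - q[0], py - q[1]) for px, py in S]
    # Directions at which a line through q passes over a data point.
    cuts = sorted(math.atan2(s * vy, s * vx) % (2 * math.pi)
                  for vx, vy in vecs if (vx, vy) != (0, 0)
                  for s in (1, -1))
    if cuts:
        # One candidate line direction strictly between consecutive cuts.
        phis = [(a + b) / 2
                for a, b in zip(cuts, cuts[1:] + [cuts[0] + 2 * math.pi])]
    else:
        phis = [0.0]
    best = None
    for phi in phis:
        u = (-math.sin(phi), math.cos(phi))
        k = sum(1 for vx, vy in vecs if vx * u[0] + vy * u[1] >= 0)
        if best is None or k < best[0]:
            best = (k, u)
    return best


def deep(S, candidates):
    """Deepest candidate with respect to S, found by witness pruning."""
    A = list(candidates)
    best_point, best_depth = None, -1
    while A:
        q = A[len(A) // 2]  # stand-in for step 1 (not a centre point)
        d, u = depth_and_witness(q, S)
        if d > best_depth:
            best_point, best_depth = q, d
        # Step 4: drop every candidate inside the witness halfspace of q.
        A = [p for p in A
             if (p[0] - q[0]) * u[0] + (p[1] - q[1]) * u[1] < 0]
    return best_point, best_depth


# Hypothetical example: four corners and one interior point.
S = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 1)]
```

Because Q itself always lies in its own witness halfspace, each round removes at least one candidate and the loop terminates. Choosing Q as a centre point of A, as on the slide, is what would guarantee that a constant fraction of A is pruned per round.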
Project Objectives
• Efficient Algorithm
– Simplicial Median
Present Work and Results
• Find a point P of high simplicial depth.
$d(P) \ge \frac{n^3}{27}$
$d(\text{simplicial } \mu) \ge \frac{n^3}{27}$
$d(P) \ge d(X)$, where X is any data point
$d(P) \ge \frac{n^3}{27} + O(n^2)$
Proposal
• Take random sample of size m
• Find the simplicial median of this sample.
– This point would have high depth in the source data.
• Use pruning technique to determine if there is a data point of higher depth.
• n(log(n))^c
• Working on details
Future Goals
• Adapt the pruning technique to the simplicial median
• Write a paper
– Present at a conference
Acknowledgements
• DIMACS REU
• Sponsors
• William Steiger
• Evil computer scientists and mathematicians