Agnostically Learning Decision Trees

Parikshit Gopalan (MSR-Silicon Valley, IITB'00), Adam Tauman Kalai (MSR-New England), Adam R. Klivans (UT Austin)


TRANSCRIPT

Page 1: Agnostically Learning Decision Trees

Parikshit Gopalan, MSR-Silicon Valley, IITB'00
Adam Tauman Kalai, MSR-New England
Adam R. Klivans, UT Austin

[Figure: a depth-2 decision tree querying X1 at the root, then X2 or X3, with 0/1 leaves]

Page 2: Computational Learning

Page 3: Computational Learning

Page 4: Computational Learning

Learning: Predict f from examples (x, f(x)).
f: {0,1}^n → {0,1}

Page 5: Valiant's Model

Examples (x, f(x)), where f: {0,1}^n → {0,1}.
Assumption: f comes from a nice concept class.

Halfspaces:
[Figure: + and − labeled points separated by a halfspace]

Page 6: Valiant's Model

Examples (x, f(x)), where f: {0,1}^n → {0,1}.
Assumption: f comes from a nice concept class.

Decision Trees:
[Figure: a depth-2 decision tree querying X1, then X2 or X3, with 0/1 leaves]

Page 7: The Agnostic Model [Kearns-Schapire-Sellie'94]

Examples (x, f(x)), where f: {0,1}^n → {0,1}.
No assumptions about f.
Learner should do as well as the best decision tree.

Decision Trees:
[Figure: a depth-2 decision tree querying X1, then X2 or X3, with 0/1 leaves]

Page 8: The Agnostic Model [Kearns-Schapire-Sellie'94]

Examples (x, f(x)); no assumptions about f.
Learner should do as well as the best decision tree.

Decision Trees:
[Figure: a depth-2 decision tree querying X1, then X2 or X3, with 0/1 leaves]

Page 9: Agnostic Model = Noisy Learning

f: {0,1}^n → {0,1}

Concept = Message. Truth table = Encoding. Function f = Received word.
Coding: Recover the message.
Learning: Predict f.

[Figure: a decision tree (the concept) plus noise equals the observed function f]

Page 10: Uniform Distribution Learning for Decision Trees

Noiseless Setting:
– No queries: n^{log n} [Ehrenfeucht-Haussler'89]
– With queries: poly(n) [Kushilevitz-Mansour'91]

Agnostic Setting:
– Polynomial time, uses queries [G.-Kalai-Klivans'08]
– Reconstruction for sparse real polynomials in the l1 norm.

Page 11: The Fourier Transform Method

Powerful tool for uniform distribution learning.
Introduced by Linial-Mansour-Nisan.
– Small-depth circuits [Linial-Mansour-Nisan'89]
– DNFs [Jackson'95]
– Decision trees [Kushilevitz-Mansour'94, O'Donnell-Servedio'06, G.-Kalai-Klivans'08]
– Halfspaces, Intersections [Klivans-O'Donnell-Servedio'03, Kalai-Klivans-Mansour-Servedio'05]
– Juntas [Mossel-O'Donnell-Servedio'03]
– Parities [Feldman-G.-Khot-Ponnuswami'06]

Page 12: The Fourier Polynomial

Let f: {-1,1}^n → {-1,1}. Write f as a polynomial.
– AND: ½ + ½X1 + ½X2 − ½X1X2
– Parity: X1X2

Parity χ_α for α ⊆ [n]: χ_α(x) = ∏_{i ∈ α} X_i
Write f(x) = Σ_α c(α) χ_α(x), where Σ_α c(α)² = 1.

[Figure: the truth table of f expressed in the standard basis and in the basis of parities]
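Not on the slides, but a quick brute-force check of the expansion above, in Python (the function names are mine). Under the encoding TRUE = −1, AND is −1 exactly when x1 = x2 = −1, and its coefficients come out to the ±½ values listed:

from itertools import combinations, product

def chi(alpha, x):
    """Parity chi_alpha(x): the product of x_i over i in alpha."""
    out = 1
    for i in alpha:
        out *= x[i]
    return out

def fourier_coefficients(f, n):
    """c(alpha) = E_x[f(x) * chi_alpha(x)] over the uniform distribution on {-1,1}^n."""
    points = list(product([-1, 1], repeat=n))
    return {alpha: sum(f(x) * chi(alpha, x) for x in points) / len(points)
            for k in range(n + 1)
            for alpha in combinations(range(n), k)}

# AND under the encoding TRUE = -1: output -1 exactly when x1 = x2 = -1.
AND = lambda x: -1 if x == (-1, -1) else 1
coeffs = fourier_coefficients(AND, 2)
print(coeffs)                                 # {(): 0.5, (0,): 0.5, (1,): 0.5, (0, 1): -0.5}
print(sum(c * c for c in coeffs.values()))    # 1.0, i.e. sum_alpha c(alpha)^2 = 1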

Page 13: The Fourier Polynomial

c(α): the weight of α.

Let f: {-1,1}^n → {-1,1}. Write f as a polynomial.
– AND: ½ + ½X1 + ½X2 − ½X1X2
– Parity: X1X2

Parity χ_α for α ⊆ [n]: χ_α(x) = ∏_{i ∈ α} X_i
Write f(x) = Σ_α c(α) χ_α(x), where Σ_α c(α)² = 1.

Page 14: Low-Degree Functions

Most of the Fourier weight lies on small subsets.
Examples: halfspaces, small-depth circuits.
The Low-Degree Algorithm [Linial-Mansour-Nisan] finds the low-degree Fourier coefficients.

Least Squares Regression: Find a low-degree P minimizing E_x[|P(x) – f(x)|²].
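A minimal sketch of the Low-Degree Algorithm from random examples (my own illustration; the sample size and the toy target are placeholders, not from the talk): estimate c(α) for every |α| ≤ d by an empirical average, then predict with the sign of the resulting polynomial.

import random
from itertools import combinations

def chi(alpha, x):
    out = 1
    for i in alpha:
        out *= x[i]
    return out

def low_degree_algorithm(examples, n, d):
    """Estimate c(alpha) = E_x[f(x) chi_alpha(x)] for every |alpha| <= d
    from random uniform examples (x, f(x))."""
    m = len(examples)
    return {alpha: sum(y * chi(alpha, x) for x, y in examples) / m
            for k in range(d + 1)
            for alpha in combinations(range(n), k)}

def predict(coeffs, x):
    """Predict with the sign of the estimated low-degree polynomial."""
    return 1 if sum(c * chi(a, x) for a, c in coeffs.items()) >= 0 else -1

# Toy usage: learn the dictator function f(x) = x1 (degree 1) from 2000 samples.
n, d = 5, 2
target = lambda x: x[0]
samples = [(x, target(x)) for x in
           (tuple(random.choice([-1, 1]) for _ in range(n)) for _ in range(2000))]
coeffs = low_degree_algorithm(samples, n, d)
print(round(coeffs[(0,)], 2))   # close to 1.0: the coefficient on x1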

Page 15: Sparse Functions

Sparse functions: most of the weight lies on a few subsets.
Decision trees: t leaves ⇒ O(t) subsets.
Sparse Algorithm [Kushilevitz-Mansour'91].

Sparse l2 Regression: Find a t-sparse P minimizing E_x[|P(x) – f(x)|²].

Page 16: Sparse l2 Regression

Sparse functions: most of the weight lies on a few subsets.
Decision trees: t leaves ⇒ O(t) subsets.
Sparse Algorithm [Kushilevitz-Mansour'91].

Sparse l2 Regression: Find a t-sparse P minimizing E_x[|P(x) – f(x)|²].
Finding large coefficients: Hadamard decoding [Kushilevitz-Mansour'91, Goldreich-Levin'89].

Page 17: Agnostic Learning via l2 Regression?

f: {-1,1}^n → {-1,1}

[Figure: the target f as a ±1-valued function]

Page 18: Agnostic Learning via l2 Regression?

[Figure: the ±1-valued target f, and a depth-2 decision tree querying X1, then X2 or X3]

Page 19: Agnostic Learning via l2 Regression?

l2 Regression: loss |P(x) – f(x)|².
Pay 1 for indecision (P(x) = 0), pay 4 for a mistake.

l1 Regression [KKMS'05]: loss |P(x) – f(x)|.
Pay 1 for indecision, pay 2 for a mistake.

[Figure: the ±1-valued target f and the best decision tree]

Page 20: Agnostic Learning via l1 Regression?

l2 Regression: loss |P(x) – f(x)|².
Pay 1 for indecision, pay 4 for a mistake.

l1 Regression [KKMS'05]: loss |P(x) – f(x)|.
Pay 1 for indecision, pay 2 for a mistake.

Page 21: Agnostic Learning via l1 Regression

Thm [KKMS'05]: l1 Regression always gives a good predictor.
l1 regression for low-degree polynomials via Linear Programming.

[Figure: the ±1-valued target f and the best decision tree]
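The LP behind that bullet can be sketched as follows (my own illustration, assuming numpy and scipy; the helper names and the degree cutoff d are mine, not from the talk): minimizing Σ_j |P(x_j) – y_j| over low-degree P becomes a linear program once each absolute value is replaced by a slack variable.

import numpy as np
from itertools import combinations
from scipy.optimize import linprog

def parity_features(X, d):
    """Design matrix whose columns are chi_alpha(x) for all |alpha| <= d (X has entries in {-1,1})."""
    m, n = X.shape
    cols = []
    for k in range(d + 1):
        for alpha in combinations(range(n), k):
            col = np.ones(m)
            for i in alpha:
                col = col * X[:, i]
            cols.append(col)
    return np.column_stack(cols)

def l1_regression(X, y, d):
    """Minimize sum_j |P(x_j) - y_j| over polynomials P of degree <= d, via an LP."""
    Phi = parity_features(X, d)
    m, F = Phi.shape
    # Variables: [coefficients c (F of them), slacks z (m of them)]; minimize sum of z
    # subject to  Phi c - z <= y  and  -Phi c - z <= -y,  i.e.  |Phi c - y| <= z.
    obj = np.concatenate([np.zeros(F), np.ones(m)])
    A_ub = np.block([[Phi, -np.eye(m)], [-Phi, -np.eye(m)]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * F + [(0, None)] * m
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:F]    # Fourier coefficients of the l1-optimal low-degree P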

Page 22: Agnostically Learning Decision Trees

[G.-Kalai-Klivans]: Polynomial time algorithm for Sparse l1 Regression.

Sparse l1 Regression: Find a t-sparse polynomial P minimizing E_x[|P(x) – f(x)|].

Why is this harder:
– l2 is basis independent, l1 is not.
– We don't know the support of P.

Page 23: The Gradient-Projection Method

Variables: the c(α)'s.
Constraint: Σ_α |c(α)| ≤ t.
Minimize: E_x|P(x) – f(x)|, where P(x) = Σ_α c(α) χ_α(x).

For Q(x) = Σ_α d(α) χ_α(x):
L1(P, Q) = Σ_α |c(α) – d(α)|
L2(P, Q) = [Σ_α (c(α) – d(α))²]^{1/2}
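As a concrete rendering of this setup (mine, not from the slides): store a sparse polynomial as a dict mapping each subset α to c(α); the constraint and the two distances are then one-liners.

def l1_norm(P):
    """The constraint on the slide: sum_alpha |c(alpha)| <= t."""
    return sum(abs(c) for c in P.values())

def l1_distance(P, Q):
    """L1(P, Q) = sum_alpha |c(alpha) - d(alpha)|."""
    keys = set(P) | set(Q)
    return sum(abs(P.get(a, 0.0) - Q.get(a, 0.0)) for a in keys)

def l2_distance(P, Q):
    """L2(P, Q) = [ sum_alpha (c(alpha) - d(alpha))^2 ]^(1/2)."""
    keys = set(P) | set(Q)
    return sum((P.get(a, 0.0) - Q.get(a, 0.0)) ** 2 for a in keys) ** 0.5

# Example: P = 0.5 + 0.5*x1 and Q = 0.5*x1*x2, written as {alpha: coefficient}.
P = {(): 0.5, (1,): 0.5}
Q = {(1, 2): 0.5}
print(l1_norm(P), l1_distance(P, Q), l2_distance(P, Q))   # 1.0  1.5  0.866...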

Page 24: The Gradient-Projection Method

Variables: the c(α)'s.
Constraint: Σ_α |c(α)| ≤ t.
Minimize: E_x|P(x) – f(x)|.

Gradient step.

Page 25: The Gradient-Projection Method

Variables: the c(α)'s.
Constraint: Σ_α |c(α)| ≤ t.
Minimize: E_x|P(x) – f(x)|.

Gradient step, then projection back onto the L1 ball.

Page 26: The Gradient

g(x) = sgn[f(x) – P(x)]
P(x) := P(x) + η·g(x)

Increase P(x) where it is too low; decrease P(x) where it is too high.

[Figure: the ±1-valued target f(x) and the current hypothesis P(x)]
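One way to see why this is the right update (filling in a step the slide leaves implicit): the subgradient of the loss E_x|P(x) – f(x)| with respect to the coefficient c(α) is E_x[sgn(P(x) – f(x)) χ_α(x)], which is exactly –ĝ(α) for g(x) = sgn[f(x) – P(x)]. So adding η·ĝ(α) to every c(α), i.e. setting P := P + η·g, is a subgradient-descent step on the l1 loss, carried out in the parity basis.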

Page 27: The Gradient-Projection Method

Variables: the c(α)'s.
Constraint: Σ_α |c(α)| ≤ t.
Minimize: E_x|P(x) – f(x)|.

Gradient step.

Page 28: The Gradient-Projection Method

Variables: the c(α)'s.
Constraint: Σ_α |c(α)| ≤ t.
Minimize: E_x|P(x) – f(x)|.

Gradient step, then projection back onto the L1 ball.

Page 29: Projection onto the L1 Ball

[Figure: bar chart of the Fourier spectrum of P]

Currently: Σ_α |c(α)| > t.
Want: Σ_α |c(α)| ≤ t.

Page 30: Projection onto the L1 Ball

[Figure: bar chart of the Fourier spectrum of P]

Currently: Σ_α |c(α)| > t.
Want: Σ_α |c(α)| ≤ t.

Page 31: Projection onto the L1 Ball

[Figure: bar chart of the Fourier spectrum of P with a cutoff line]

Below the cutoff: set the coefficient to 0.
Above the cutoff: subtract the cutoff.

Page 32: Projection onto the L1 Ball

[Figure: bar charts of the Fourier spectra of P and of Proj(P)]

Below the cutoff: set the coefficient to 0.
Above the cutoff: subtract the cutoff.
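A hedged sketch of this projection step (mine, not the authors' code): to project onto the L1 ball of radius t, choose a cutoff θ so that the surviving magnitudes sum to exactly t, zero out coefficients below θ, and subtract θ from the rest.

def project_onto_l1_ball(coeffs, t):
    """coeffs: dict {alpha: c(alpha)}. Returns the closest point (in L2) whose
    coefficient magnitudes sum to at most t."""
    if sum(abs(c) for c in coeffs.values()) <= t:
        return dict(coeffs)                          # already inside the ball
    mags = sorted((abs(c) for c in coeffs.values()), reverse=True)
    running, theta = 0.0, 0.0
    # The cutoff: for the largest k with (sum of top-k magnitudes - t)/k still
    # below the k-th magnitude, set theta = (that sum - t)/k.
    for k, m in enumerate(mags, start=1):
        running += m
        if (running - t) / k < m:
            theta = (running - t) / k
        else:
            break
    # Below the cutoff: set to 0.  Above the cutoff: subtract the cutoff.
    return {a: (abs(c) - theta) * (1 if c > 0 else -1)
            for a, c in coeffs.items() if abs(c) > theta}

# Example: project {x1: 0.9, x2: 0.6, x1x2: 0.1} onto the L1 ball of radius 1.
print(project_onto_l1_ball({(1,): 0.9, (2,): 0.6, (1, 2): 0.1}, 1.0))
# cutoff theta = 0.25, giving {(1,): 0.65, (2,): 0.35}; the magnitudes now sum to 1.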

Page 33: Analysis of Gradient-Projection [Zinkevich'03]

Progress measure: squared L2 distance from the optimum P*.

Key Equation:
|P_t – P*|² − |P_{t+1} – P*|² ≥ 2η(L(P_t) – L(P*)) − η²

Left side: progress made in this step. Right side: how suboptimal the current solution is.
Within ε of optimal in 1/ε² iterations.
A good L2 approximation to the gradient suffices.
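Filling in the step from the Key Equation to the iteration count: summing the inequality over t = 1, …, T telescopes the left-hand side, giving 2η Σ_t (L(P_t) – L(P*)) ≤ |P_1 – P*|² + Tη². Dividing by 2ηT, the best iterate satisfies min_t L(P_t) – L(P*) ≤ |P_1 – P*|²/(2ηT) + η/2. Taking η ≈ ε and T on the order of |P_1 – P*|²/ε² brings the loss within ε of optimal, which is the 1/ε² rate above (treating the initial distance to P* as a constant).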

Page 34: Agnostically Learning Decision Trees Parikshit Gopalan MSR-Silicon Valley, IITB00. Adam Tauman Kalai MSR-New England Adam R. Klivans UT Austin 01 0 0 1

-1

+1

f(x)

P(x)

0

0.05

0.1

0.15

0.2

0.25

0.3

Fourier Spectrum of P

P

GradientGradient

ProjectionProjection

g(x) = sgn[f(x) - P(x)].g(x) = sgn[f(x) - P(x)].

Page 35: The Gradient

g(x) = sgn[f(x) – P(x)]

Compute a sparse approximation g' = KM(g).
Is g' a good L2 approximation to g?
No. Initially g = f, and L2(g, g') can be as large as 1 (e.g., if f has no large Fourier coefficients, KM(g) ≈ 0).

[Figure: the ±1-valued target f(x) and hypothesis P(x)]

Page 36: Sparse l1 Regression

Variables: the c(α)'s.
Constraint: Σ_α |c(α)| ≤ t.
Minimize: E_x|P(x) – f(x)|.

Approximate gradient step.
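Putting the pieces together, here is a hedged end-to-end sketch of the method for tiny n (my reconstruction, not the authors' code; the real algorithm never enumerates all 2^n coefficients, it calls the KM algorithm with membership queries to get the approximate gradient):

from itertools import combinations, product

def chi(alpha, x):
    out = 1
    for i in alpha:
        out *= x[i]
    return out

def fourier(g, n):
    """Exact (brute-force) Fourier coefficients; a stand-in for KM(g) at tiny n."""
    pts = list(product([-1, 1], repeat=n))
    subsets = [a for k in range(n + 1) for a in combinations(range(n), k)]
    return {a: sum(g(x) * chi(a, x) for x in pts) / len(pts) for a in subsets}

def project_onto_l1_ball(coeffs, t):
    # Soft-threshold at a cutoff theta so the magnitudes sum to t (see the sketch after Page 32).
    if sum(abs(c) for c in coeffs.values()) <= t:
        return dict(coeffs)
    mags = sorted((abs(c) for c in coeffs.values()), reverse=True)
    running, theta = 0.0, 0.0
    for k, m in enumerate(mags, start=1):
        running += m
        if (running - t) / k < m:
            theta = (running - t) / k
        else:
            break
    return {a: (abs(c) - theta) * (1 if c > 0 else -1)
            for a, c in coeffs.items() if abs(c) > theta}

def sparse_l1_regression(f, n, t, eta=0.05, iters=500):
    coeffs = {}                                    # P = 0 initially
    for _ in range(iters):
        evaluate = lambda x: sum(c * chi(a, x) for a, c in coeffs.items())
        gradient = lambda x: 1 if f(x) > evaluate(x) else -1   # sgn[f(x) - P(x)]
        g_hat = fourier(gradient, n)               # stand-in for KM(gradient)
        coeffs = {a: coeffs.get(a, 0.0) + eta * g_hat.get(a, 0.0)
                  for a in set(coeffs) | set(g_hat)}
        coeffs = project_onto_l1_ball(coeffs, t)   # back onto the L1 ball
    return coeffs

# Example: f is the AND-like function from Page 12; t = 2 is enough to express it.
f = lambda x: -1 if x == (-1, -1) else 1
print(sparse_l1_regression(f, n=2, t=2.0))   # near the coefficients 1/2, 1/2, 1/2, -1/2 (up to O(eta) jitter)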

Page 37: Sparse l1 Regression

Variables: the c(α)'s.
Constraint: Σ_α |c(α)| ≤ t.
Minimize: E_x|P(x) – f(x)|.

Approximate gradient step: the projection compensates for the approximation error.

Page 38: KM as l2 Approximation

The KM Algorithm:
Input: g: {-1,1}^n → {-1,1}, and sparsity t.
Output: A t-sparse polynomial g' minimizing E_x[|g(x) – g'(x)|²].
Run Time: poly(n, t).

Page 39: KM as L1 Approximation

The KM Algorithm:
Input: A Boolean function g = Σ_α c(α) χ_α(x), and an error bound ε.
Output: An approximation g' = Σ_α c'(α) χ_α(x) such that |c(α) – c'(α)| ≤ ε for all α ⊆ [n].
Run Time: poly(n, 1/ε).

Page 40: KM as L1 Approximation

[Figure: bar chart of the Fourier spectra of g and of g' = KM(g)]

1) Identify the coefficients larger than ε.
2) Estimate them via sampling; set the rest to 0.

Only 1/ε² coefficients can exceed ε (since Σ_α c(α)² = 1).

Page 41: KM as L1 Approximation

[Figure: bar chart of the Fourier spectra of g and of g' = KM(g)]

1) Identify the coefficients larger than ε.
2) Estimate them via sampling; set the rest to 0.
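Step 2 is simple enough to sketch in Python (my own illustration, not the authors' code): a single coefficient c(α) = E_x[g(x)·χ_α(x)] is estimated by an empirical average over random queries, and Hoeffding's bound says O(log(1/δ)/ε²) samples give accuracy ε with probability 1 – δ. Step 1, finding which coefficients are large in the first place, is the part that needs the Kushilevitz-Mansour / Goldreich-Levin recursive search (the "Hadamard decoding" of Page 16) and membership queries.

import math
import random

def chi(alpha, x):
    out = 1
    for i in alpha:
        out *= x[i]
    return out

def estimate_coefficient(g, n, alpha, eps, delta=0.01):
    """Estimate c(alpha) of g: {-1,1}^n -> {-1,1} to additive error eps,
    with failure probability delta, using random uniform queries to g."""
    m = int(math.ceil(2 * math.log(2 / delta) / eps ** 2))   # Hoeffding sample count
    total = 0.0
    for _ in range(m):
        x = tuple(random.choice([-1, 1]) for _ in range(n))
        total += g(x) * chi(alpha, x)
    return total / m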

Page 42: Projection Preserves L1 Distance

[Figure: bar charts of the Fourier spectra of P + ηg and of P + ηg']

L1 distance is at most 2εt after projection.
Both lines stop within ε of each other.

Page 43: Projection Preserves L1 Distance

[Figure: bar charts of the Fourier spectra of P + ηg and of P + ηg']

L1 distance is at most 2εt after projection.
Both lines stop within ε of each other; otherwise the blue spectrum would dominate the red one.

Page 44: Projection Preserves L1 Distance

[Figure: bar charts of the Fourier spectra of P + ηg and of P + ηg']

L1 distance is at most 2εt after projection.
Projecting onto the L1 ball does not increase L1 distance.

Page 45: Sparse l1 Regression

Variables: the c(α)'s.
Constraint: Σ_α |c(α)| ≤ t.
Minimize: E_x|P(x) – f(x)|.

Comparing the iterate P (exact gradient g) with P' (approximate gradient g' = KM(g)):
• L∞(P, P') ≤ 2ε
• L1(P, P') ≤ 2εt
• L2(P, P')² ≤ 4ε²t

Can take ε = 1/t².
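These norms chain together: for any two polynomials, L2(P, P')² ≤ L1(P, P') · L∞(P, P') (each squared difference is at most the largest difference times that difference), and L2 ≤ L1. This is how per-coefficient closeness and total-weight closeness of P and P' translate into the squared-L2 progress measure of the Page 33 analysis; with ε = 1/t² the per-step drift becomes polynomially small in t.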

Page 46: Agnostically Learning Decision Trees

Sparse l1 Regression: Find a sparse polynomial P minimizing E_x[|P(x) – f(x)|].

[G.-Kalai-Klivans'08]: Can get within ε of the optimum in poly(t, 1/ε) iterations.
Algorithm for Sparse l1 Regression.
First polynomial-time algorithm for Agnostically Learning Sparse Polynomials.

Page 47: l1 Regression from l2 Regression

Function f: D → [-1,1], orthonormal basis B.

Sparse l2 Regression: Find a t-sparse polynomial P minimizing E_x[|P(x) – f(x)|²].
Sparse l1 Regression: Find a t-sparse polynomial P minimizing E_x[|P(x) – f(x)|].

[G.-Kalai-Klivans'08]: Given a solution to l2 Regression, one can solve l1 Regression.

Page 48: Agnostically Learning DNFs?

Problem: Can we agnostically learn DNFs in polynomial time? (uniform distribution, with queries)

Noiseless setting: Jackson's Harmonic Sieve.
Agnostically learning DNFs would imply a weak learner for depth-3 circuits, which is beyond current Fourier techniques.