
Page 1: Extractor-Based Approach to Memory-Sample Tradeoffs

Joint Works with Pravesh Kothari (CMU), Ran Raz (Princeton) and Avishay Tal (UC Berkeley)

Sumegha Garg (Harvard)

Page 2: Learning under Memory Constraints?

● Increasing scale of learning problems
● Many learning algorithms try to learn a concept/hypothesis by modeling it as a neural network. The algorithm keeps some neural network in memory and updates its weights as new samples arrive. The memory used is the size of the network (low if the network is small).

Page 3: Brief Description of the Learning Model

A learner tries to learn $f: A \to \{0,1\}$, $f \in H$ (hypothesis class), from samples of the form $(a_1, f(a_1)), (a_2, f(a_2)), \ldots$ that arrive one by one.

Page 4: [Shamir'14], [SVW'15]

Can one prove unconditional strong lower bounds on the number of samples needed for learning under memory constraints (when samples are viewed one by one)?

Page 5: Outline of the Talk

• The first example of such a memory-sample tradeoff: parity learning [Raz'16]
• Spark in the field: subsequent results
• Extractor-based approach to lower bounds [G-Raz-Tal'18]
• Shallow dive into the proofs
• Implications for refutation of CSPs [G-Kothari-Raz'20]

Page 6: Example: Parity Learning

Page 7: Parity Learning

$f \in_R \{0,1\}^n$ is unknown.

A learner tries to learn $f = (f_1, f_2, \ldots, f_n)$ from $(a_1, b_1), (a_2, b_2), \ldots, (a_m, b_m)$, where for all $t$:

$a_t \in_R \{0,1\}^n$ and $b_t = \langle a_t, f \rangle = \sum_i a_{t,i} \cdot f_i \bmod 2$ (inner product mod 2)

Page 8: Parity Learning

A learner tries to learn $f = (f_1, f_2, \ldots, f_n)$ from $(a_1, b_1), (a_2, b_2), \ldots, (a_m, b_m)$, where for all $t$:

$a_t \in_R \{0,1\}^n$ and $b_t = \sum_i a_{t,i} \cdot f_i \bmod 2$

In other words, the learner gets random binary linear equations in $f_1, f_2, \ldots, f_n$, one by one, and needs to solve them.
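To make the streaming model concrete, here is a minimal Python sketch of such a sample stream (my illustration, not code from the talk; `parity_sample_stream` and `secret` are made-up names). It yields random equations $(a_t, b_t)$ one by one, exactly as the learner sees them.

```python
import random

def parity_sample_stream(f, num_samples):
    """Yield samples (a_t, b_t) with b_t = <a_t, f> mod 2, one by one."""
    n = len(f)
    for _ in range(num_samples):
        a = [random.randint(0, 1) for _ in range(n)]
        b = sum(ai * fi for ai, fi in zip(a, f)) % 2
        yield a, b

# Example: the hidden vector used in the n=5 demo on the next slides.
secret = [0, 1, 1, 0, 1]
for a, b in parity_sample_stream(secret, 3):
    print(a, b)
```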

Page 9: Let's Try Learning (n=5)

$f_1 + f_3 + f_4 = 1$

Page 10: Let's Try Learning (n=5)

$f_2 + f_4 + f_5 = 0$

Page 11: Let's Try Learning (n=5)

$f_1 + f_2 + f_4 = 1$

Page 12: Let's Try Learning (n=5)

$f_2 + f_3 = 0$

Page 13: Let's Try Learning (n=5)

$f_1 + f_2 + f_4 + f_5 = 0$

Page 14: Let's Try Learning (n=5)

$f_1 + f_3 = 1$

Page 15: Let's Try Learning (n=5)

$f_2 + f_3 + f_4 + f_5 = 1$

Page 16: Let's Try Learning (n=5)

$f_1 + f_2 + f_5 = 0$

Page 17: Let's Try Learning (n=5)

$f_1 + f_2 + f_5 = 0$
$f_2 + f_3 + f_4 + f_5 = 1$
$f_1 + f_3 = 1$
$f_1 + f_2 + f_4 + f_5 = 0$
$f_2 + f_3 = 0$
$f_1 + f_2 + f_4 = 1$
$f_2 + f_4 + f_5 = 0$
$f_1 + f_3 + f_4 = 1$

$f = (0,1,1,0,1)$

Page 18: Parity Learners

• Solve independent linear equations (Gaussian elimination; see the sketch below): $O(n)$ samples and $O(n^2)$ memory
• Try all possibilities of $f$: $O(n)$ memory but an exponential number of samples

Page 19: Raz's Breakthrough '16

Any algorithm for parity learning of size $n$ requires either a memory of $n^2/25$ bits or an exponential number of samples to learn.

Memory-Sample Tradeoff

Page 20: Motivations from Other Fields

Bounded-Storage Cryptography [Raz'16, GZ'19, VV'16, KRT'16, TT'18, JT'19, DTZ'20, GZ'21]: key length $n$, encryption/decryption time $n$. Unconditional security, if the attacker's memory size is $< n^2/25$.

Complexity Theory: time-space lower bounds have been studied in many models [BJS'98, Ajt'99, BSSV'00, For'97, FLvMV'05, Wil'06, …]

Page 21: Subsequent Results

Page 22: Subsequent Generalizations

• [KRT'17]: Generalization to learning sparse parities
• [Raz'17, MM'17, MT'17, MM'18, DKS'19, GKLR'21]: Generalization to a larger class of learning problems

Page 23: [Garg-Raz-Tal'18]

Extractor-based approach to proving memory-sample tradeoffs for a large class of learning problems. Implied all the previous results + new lower bounds. Closely follows the proof technique of [Raz'17].

[Beame-Oveis Gharan-Yang'18] independently proved related lower bounds.

Page 24: Subsequent Generalizations

[Sharan-Sidford-Valiant'19]: Any algorithm performing linear regression over a stream of $d$-dimensional examples either uses a quadratic amount of memory or exhibits a slower rate of convergence than can be achieved without memory constraints. First-order methods for learning may thus have a provably slower rate of convergence.

Page 25: More Related Work

• [GRT'19] Learning under multiple passes over the stream
• [DS'18, AMN'18, DGKR'19, GKR'20] Distinguishing, testing under memory or communication constraints
• [MT'19, GLM'20] Towards a characterization of memory-bounded learning
• [GRZ'20] Comments on quantum memory-bounded learning

Page 26: Extractor-Based Approach to Lower Bounds [G-Raz-Tal'18]

Page 27: Learning Problem as a Matrix

$A$, $F$: finite sets; $M: A \times F \to \{-1,1\}$: a matrix

$F$: concept class $= \{0,1\}^n$

$A$: possible samples $= \{0,1\}^n$

Page 28: Learning Problem as a Matrix

$M: A \times F \to \{-1,1\}$: a matrix

$f \in_R F$ is unknown. A learner tries to learn $f$ from a stream $(a_1, b_1), (a_2, b_2), \ldots, (a_m, b_m)$, where for all $t$: $a_t \in_R A$ and $b_t = M(a_t, f)$.

[Figure: the matrix $M$, columns indexed by concepts $f \in F$, rows indexed by samples $a_1, a_2, \ldots \in A$; entry at $(a_t, f)$ is $M(a_t, f)$.]

Page 29: Main Theorem

Assume that any submatrix of $M$ with at least $2^{-k}|A|$ rows and at least $2^{-\ell}|F|$ columns has a bias of at most $2^{-r}$. Then, any learning algorithm requires either $\Omega(k \cdot \ell)$ memory bits or $2^{\Omega(r)}$ samples.

[Figure: a submatrix of $M$ with $2^{-k}|A|$ rows and $2^{-\ell}|F|$ columns.]
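Written out explicitly (my formalization of the slide's wording, taking "bias" to mean the average of the $\pm 1$ entries over the submatrix):

```latex
% The bias condition on M (an L2-extractor-type property), spelled out:
\forall\, A' \subseteq A,\ F' \subseteq F \ \text{ with }\
|A'| \ge 2^{-k}|A|,\ |F'| \ge 2^{-\ell}|F| :
\qquad
\Bigl|\, \mathop{\mathbb{E}}_{a \in A',\, f \in F'} \bigl[ M(a,f) \bigr] \Bigr|
\;\le\; 2^{-r}.
```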

Page 30: Applications

Low-degree polynomials: A learner tries to learn an $n$-variate multilinear polynomial $p$ of degree at most $d$ over $\mathbb{F}_2$, from random evaluations of $p$ over $\mathbb{F}_2^n$.

$\Omega(n^{d+1})$ memory or $2^{\Omega(n)}$ samples

Page 31: Applications

Learning from sparse equations: A learner tries to learn $f = (f_1, \ldots, f_n) \in \{0,1\}^n$ from random sparse linear equations of sparsity $\ell$ over $\mathbb{F}_2$ (each equation depends on $\ell$ variables).

$\Omega(n \cdot \ell)$ memory or $2^{\Omega(\ell)}$ samples

We will use this result for proving sample lower bounds for refuting CSPs under memory constraints.

Page 32: Proof Overview

Page 33: Branching Program (length $m$, width $d$)

Each layer represents a time step. Each vertex represents a memory state of the learner ($d = 2^{\text{memory bits}}$). Each non-leaf vertex has $2^{n+1}$ outgoing edges, one for each $(a, b) \in \{0,1\}^n \times \{-1,1\}$.

[Figure: a layered graph of length $m$ and width $d$, with edges labeled $(a_1, b_1), (a_2, b_2), \ldots, (a_m, b_m)$ and a generic edge $(a, b)$.]

Page 34: Branching Program (length $m$, width $d$)

The samples $(a_1, b_1), \ldots, (a_m, b_m)$ define a computation path. Each vertex $v$ in the last layer is labeled by some $\tilde{f}_v \in \{0,1\}^n$. The output is the label $\tilde{f}_v$ of the vertex reached by the path.

[Figure: the same layered graph, with the computation path defined by the samples highlighted.]
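A memory-bounded streaming learner is exactly a walk on such a branching program: its state is one of $d = 2^{\text{memory bits}}$ vertices per layer, and each incoming sample follows an edge. A toy Python sketch of this correspondence (the update rule below is an arbitrary stand-in, not any particular learner from the talk):

```python
import random

class StreamingLearner:
    """Toy branching program of width 2^memory_bits: the state is an
    integer in [0, 2^memory_bits); each sample (a, b) moves the state."""
    def __init__(self, memory_bits, update):
        self.width = 2 ** memory_bits
        self.state = 0          # start vertex
        self.update = update    # transition function = edge structure

    def feed(self, a, b):
        self.state = self.update(self.state, a, b) % self.width

# A stand-in update rule: hash the sample into the current state.
def toy_update(state, a, b):
    return hash((state, tuple(a), b))

learner = StreamingLearner(memory_bits=8, update=toy_update)
secret = [0, 1, 1, 0, 1]
for _ in range(10):
    a = [random.randint(0, 1) for _ in range(5)]
    b = sum(x * y for x, y in zip(a, secret)) % 2
    learner.feed(a, b)
print("final memory state (leaf vertex):", learner.state)
```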

Page 35: Proof Overview

$P_{f|v}$ = distribution of $f$ conditioned on the event that the computation path reaches $v$

Significant vertices: $v$ s.t. $\|P_{f|v}\|_2 \ge 2^{\ell} \cdot 2^{-n}$

$\Pr(v)$ = probability that the path reaches $v$

We prove that if $v$ is significant, $\Pr(v) \le 2^{-\Omega(k \cdot \ell)}$.

Hence, if there are fewer than $2^{\Omega(k \cdot \ell)}$ vertices, with high probability we reach a non-significant vertex.

Page 36: Proof Overview

Progress Function [Raz'17]: For layer $L_i$ and a significant vertex $s$,

$Z_i = \sum_{v \in L_i} \Pr(v) \cdot \langle P_{f|v}, P_{f|s} \rangle^{k}$

● $Z_0 = 2^{-2n \cdot k}$
● $Z_i$ is very slowly growing: $Z_0 \approx Z_i$
● If $s \in L_i$, then $Z_i \ge \Pr(s) \cdot 2^{2\ell \cdot k} \cdot 2^{-2n \cdot k}$

Hence: if $s$ is significant, $\Pr(s) \le 2^{-\Omega(k \cdot \ell)}$.
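Combining the three bullets gives the claimed bound (a step the slide leaves implicit; this is the arithmetic under the normalization used above):

```latex
% Lower bound on Z_i at the layer containing s, versus slow growth from Z_0:
\Pr(s)\cdot 2^{2\ell k}\cdot 2^{-2nk}
  \;\le\; Z_i \;\approx\; Z_0 \;=\; 2^{-2nk}
\quad\Longrightarrow\quad
\Pr(s) \;\le\; 2^{-2\ell k} \;=\; 2^{-\Omega(k \cdot \ell)}.
```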

Page 37: Hardness of Refuting CSPs [G-Kothari-Raz'20]

Page 38: Refuting CSPs under the Streaming Model

• Fix a predicate $P: \{0,1\}^k \to \{0,1\}$
• With bounded memory, distinguish between CSPs (with predicate $P$) where the constraints are randomly generated vs. ones where the constraints are generated using a satisfying assignment
• Constraints arrive one by one in a stream

Page 39: Refuting CSPs under the Streaming Model

• Fix a predicate $P: \{0,1\}^k \to \{0,1\}$
• With bounded memory, distinguish between CSPs (with predicate $P$) where the constraints are randomly generated vs. ones where the constraints are generated using a satisfying assignment
• Constraints arrive one by one in a stream

Refutation is at least as hard as distinguishing between random CSPs and satisfiable ones.

Page 40: Refuting CSPs under the Streaming Model

• $x$ is chosen uniformly from $\{0,1\}^n$
• At each step $i$, $S_i \subseteq [n]$ is a randomly chosen (ordered) subset of size $k$
• The distinguisher sees a stream of either $(S_i, P(x_{S_i}))$ or $(S_i, b_i)$, where $b_i$ is chosen uniformly from $\{0,1\}$

$x_{S_i}$: projection of $x$ to the coordinates of $S_i$
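Concretely, the two streams could be generated as follows (a Python sketch under my reading of the model; `constraint_stream` is a made-up name, and `predicate` can be any $P: \{0,1\}^k \to \{0,1\}$):

```python
import random

def constraint_stream(n, k, predicate, planted, num_constraints):
    """Yield (S_i, value) pairs. If planted, values come from a hidden
    assignment x (so every constraint is consistent with x); otherwise
    the values are uniform random bits."""
    x = [random.randint(0, 1) for _ in range(n)]
    for _ in range(num_constraints):
        S = random.sample(range(n), k)  # random ordered k-subset of [n]
        if planted:
            value = predicate([x[j] for j in S])
        else:
            value = random.randint(0, 1)
        yield S, value

# Example with the XOR predicate (k-XOR):
xor = lambda bits: sum(bits) % 2
for S, v in constraint_stream(n=10, k=3, predicate=xor,
                              planted=True, num_constraints=4):
    print(S, v)
```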

Page 41: Reduction from Learning from Sparse Equations

Any algorithm that distinguishes random $k$-XOR (random CSPs with the XOR predicate) on $n$ variables from $k$-XOR CSPs with a satisfying assignment (under the streaming model) requires either $\Omega(n \cdot k)$ memory or $2^{\Omega(k)}$ constraints.

Page 42: Reduction from Learning from Sparse Equations

Any algorithm that distinguishes random $k$-XOR (random CSPs with the XOR predicate) on $n$ variables from $k$-XOR CSPs with a satisfying assignment (under the streaming model) requires either $\Omega(n \cdot k)$ memory or $2^{\Omega(k)}$ constraints.

For $k \gg \log n$, refutation is hard even for algorithms using super-linear memory, when the number of constraints is $< 2^{\Omega(k)}$.

Page 43: Better Sample Bounds for Sub-Linear Memory

• Stronger lower bounds on the number of constraints needed to refute
• Even for CSPs with $t$-resilient predicates $P$ [OW'14, AL'16, KMOW'17]
• $P: \{0,1\}^k \to \{0,1\}$ is $t$-resilient if $P$ has zero correlation with every parity function on at most $t-1$ bits (see the sketch after this list)

Page 44: Better Sample Bounds for Sub-Linear Memory

For every $0 < \epsilon < 1$, any algorithm that distinguishes random CSPs (with a $t$-resilient predicate $P$) on $n$ variables from satisfiable ones (when the constraints arrive in a streaming fashion) requires either $n^{\epsilon}$ memory or $n^{\Omega(t(1-\epsilon))}$ constraints.

Page 45: Thank You!