SMART: The Stochastic Monotone Aggregated Root-Finding Algorithm

Damek Davis¹

Department of Mathematics, University of California, Los Angeles /
School of Operations Research and Information Engineering, Cornell University

¹ http://www.math.ucla.edu/~damek
minimize_{x ∈ R^m}  f(x) := (1/n) Σ_{i=1}^n f_i(a_i^T x)

• The empirical risk minimization (ERM) problem.
• A = (a_1, . . . , a_n) ∈ R^{m×n}.
• n = number of training examples; m = number of features.
• Nice properties:
  • f_i : R → R is smooth, one-dimensional, and convex.
  • ∇(f_i ∘ a_i^T) : R^m → R^m has its range in a one-dimensional subspace:

    ∇(f_i ∘ a_i^T)(x) = a_i f_i'(a_i^T x) ∈ Range(a_i).

• So to compute one gradient, we need only one inner product and one scalar derivative (a numpy sketch follows below).
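• A minimal numpy sketch of this structure (the names grad_i and f_prime_i are illustrative, not from the talk):

```python
import numpy as np

def grad_i(a_i, x, f_prime_i):
    """Gradient of f_i(a_i^T x): one inner product, one scalar derivative."""
    t = a_i @ x                 # one inner product
    return f_prime_i(t) * a_i   # a scaled copy of a_i, so it lies in Range(a_i)
```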
• Gradient descent (fast; high per-iteration cost; low memory):

  x^{k+1} = x^k − (γ/n) Σ_{i=1}^n a_i f_i'(a_i^T x^k)

• Need to compute A^T x^k and all the scalar derivatives, then sum them together.
• Stochastic gradient (slow; low per-iteration cost; low memory):

  Sample i_k ∈ {1, . . . , n} uniformly
  x^{k+1} = x^k − γ_k a_{i_k} f_{i_k}'(a_{i_k}^T x^k)

• Need γ_k → 0, which can be slow!
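• The two updates above, as a hedged numpy sketch (f_primes and f_prime are assumed callables returning the scalar derivatives):

```python
import numpy as np

def gd_step(A, x, f_primes, gamma):
    """One gradient-descent step; A is m-by-n with columns a_i, and
    f_primes maps the vector A^T x to the vector of f_i'(a_i^T x)."""
    n = A.shape[1]
    return x - (gamma / n) * (A @ f_primes(A.T @ x))

def sgd_step(A, x, f_prime, gamma_k, rng):
    """One stochastic-gradient step; f_prime(i, t) returns f_i'(t)."""
    i = rng.integers(A.shape[1])            # sample i_k uniformly
    return x - gamma_k * f_prime(i, A[:, i] @ x) * A[:, i]
```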
• Stochastic variance reduced gradient (SVRG) (fast; some high-cost iterations, but mostly low; low memory):

  Sample i_k ∈ {1, . . . , n} uniformly
  x^{k+1} = x^k − γ ( a_{i_k} f_{i_k}'(a_{i_k}^T x^k) − a_{i_k} f_{i_k}'(a_{i_k}^T φ^k) + (1/n) Σ_{i=1}^n a_i f_i'(a_i^T φ^k) )
  φ^{k+1} = x^k if k ≡ 0 mod τ; φ^k otherwise.

• Every τ iterations, recompute the full gradient

  ∇f(x^k) = (1/n) Σ_{i=1}^n a_i f_i'(a_i^T x^k);

  otherwise, reuse ∇f(φ^k).
• Two scalar derivatives are computed per iteration.
• ∇f(φ^k) is stored ⟹ memory is one m-dimensional vector.
• Strong convexity assumed.
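• A minimal SVRG sketch under the same assumed interface (f_prime(i, t) returns f_i'(t); A has columns a_i; all names are illustrative):

```python
import numpy as np

def svrg(A, f_prime, x0, gamma, tau, iters, rng):
    """Minimal SVRG sketch for f(x) = (1/n) sum_i f_i(a_i^T x)."""
    n = A.shape[1]
    x, phi = x0.copy(), x0.copy()
    def full_grad(p):   # (1/n) sum_i a_i f_i'(a_i^T p)
        return A @ np.array([f_prime(i, A[:, i] @ p) for i in range(n)]) / n
    g_phi = full_grad(phi)
    for k in range(iters):
        i = rng.integers(n)
        v = (f_prime(i, A[:, i] @ x) - f_prime(i, A[:, i] @ phi)) * A[:, i]
        x = x - gamma * (v + g_phi)
        if (k + 1) % tau == 0:       # every tau iterations, refresh the anchor
            phi = x.copy()
            g_phi = full_grad(phi)
    return x
```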
• Finito (fast; low per-iteration cost; HIGH memory):

  Sample i_k ∈ {1, . . . , n} uniformly
  x_i^{k+1} = (1/n) Σ_{l=1}^n ( x_l^k − γ a_l f_l'(a_l^T x_l^k) ) if i = i_k; x_i^k otherwise.

• Need to store the points x_1^k, . . . , x_n^k AND the derivatives f_1'(a_1^T x_1^k), . . . , f_n'(a_n^T x_n^k).
• Strong convexity assumed.
• Stochastic average gradient (SAG) (fast; low per-iteration cost; low memory):

  Sample i_k ∈ {1, . . . , n} uniformly
  x^{k+1} = x^k − (γ/n) ( a_{i_k} f_{i_k}'(a_{i_k}^T x^k) + Σ_{i≠i_k} a_i z_i^k )
  z_i^{k+1} = f_{i_k}'(a_{i_k}^T x^k) if i = i_k; z_i^k otherwise.

• Memory is an n-dimensional vector (z_1^k, . . . , z_n^k).
• Biased gradient estimate:

  E[ (1/n) ( a_{i_k} f_{i_k}'(a_{i_k}^T x^k) + Σ_{i≠i_k} a_i z_i^k ) | x^k, . . . , x^0 ]
    = (1/n) ∇f(x^k) + (1/n)(1 − 1/n) Σ_{i=1}^n a_i z_i^k ≠ ∇f(x^k).

• COMPLICATED PROOF.
• First incremental method where strong convexity is NOT assumed.
• SAGA (fast; low per-iteration cost; low memory):

  Sample i_k ∈ {1, . . . , n} uniformly
  x^{k+1} = x^k − γ ( a_{i_k} f_{i_k}'(a_{i_k}^T x^k) − a_{i_k} z_{i_k}^k + (1/n) Σ_{i=1}^n a_i z_i^k )
  z_i^{k+1} = f_{i_k}'(a_{i_k}^T x^k) if i = i_k; z_i^k otherwise.

• Memory is an n-dimensional vector (z_1^k, . . . , z_n^k).
• Unbiased gradient estimate.
• Relatively simple proof.
• Strong convexity NOT assumed.
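• A minimal SAGA sketch (same assumed interface as above; the running average avoids resumming the z-table every iteration):

```python
import numpy as np

def saga(A, f_prime, x0, gamma, iters, rng):
    """Minimal SAGA sketch; memory is the n scalars z_i plus their average."""
    n = A.shape[1]
    x = x0.copy()
    z = np.array([f_prime(i, A[:, i] @ x) for i in range(n)])  # n scalars
    avg = (A @ z) / n                         # (1/n) sum_i a_i z_i
    for _ in range(iters):
        i = rng.integers(n)
        g = f_prime(i, A[:, i] @ x)
        x = x - gamma * ((g - z[i]) * A[:, i] + avg)
        avg = avg + (g - z[i]) * A[:, i] / n  # keep the average current
        z[i] = g
    return x
```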
• Image stolen from the SAGA paper.
• SDCA only solves the ℓ2-regularized problem, so we ignored it.
• Point: all of the methods perform about the same, besides Finito perm (Finito with permuted sampling), which isn't guaranteed to converge.
• Today:
  • SMART extends incremental aggregated gradient and coordinate descent methods.
  • SMART solves the ERM problem, and this seems to be its most effective use, but it can go much further.
  • In addition, SMART recovers SAGA, Finito, SVRG, and SDCA.
• SAGA seems to be the catalyst for a lot of the other methods, so let's extend SAGA as much as possible.
• What if the data matrix is sparse?
• The gradient a_{i_k} f_{i_k}'(a_{i_k}^T x^k) has only a few nonzero components.
• SAGA still requires a dense update of x^k because the sum Σ_{i=1}^n a_i z_i^k is dense.
• We should only update the components of x that are in the support of a_{i_k}, i.e., apply a mask to the gradient sum.
• But that makes the gradient estimate biased, and there is no reason it should work.
• To avoid biased gradients, we need to scale the components of the updates.
• Let C_i ⊆ {1, . . . , m} be the support of a_i.
• Let e_{C_i} = Σ_{j∈C_i} e_j ← component mask.
• Let q and p_i be vectors of probabilities (easy to precompute; one assumed choice of q is sketched after this slide).
• Sparse SAGA:

  Sample i_k ∈ {1, . . . , n} uniformly
  x^{k+1} = x^k − γ e_{C_{i_k}} ⊙ q ⊙ ( p_{i_k} ⊙ (a_{i_k} f_{i_k}'(a_{i_k}^T x^k) − a_{i_k} z_{i_k}^k) + (1/n) Σ_{i=1}^n a_i z_i^k )
  z_i^{k+1} = f_{i_k}'(a_{i_k}^T x^k) if i = i_k; z_i^k otherwise.
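• One natural (assumed) choice of the probability vector q, precomputed from the support pattern of A; the paper's exact q and p_i may differ:

```python
import numpy as np

def sparsity_weights(A):
    """q_j = 1 / P(j in C_{i_k}) under uniform sampling of i_k: a natural,
    assumed choice; coordinates outside every support get weight 0."""
    n = A.shape[1]
    freq = (A != 0).sum(axis=1) / n     # P(j in C_{i_k}) for each coordinate j
    q = np.zeros(A.shape[0])
    q[freq > 0] = 1.0 / freq[freq > 0]
    return q
```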
• The sparse update equation is a block coordinate update equation.
• Block coordinate SAGA:

  Sample i_k ∈ {1, . . . , n} uniformly and S_k ⊆ {1, . . . , m} arbitrarily
  x^{k+1} = x^k − γ e_{S_k} ⊙ q ⊙ ( p_{i_k} ⊙ (a_{i_k} f_{i_k}'(a_{i_k}^T x^k) − a_{i_k} z_{i_k}^k) + (1/n) Σ_{i=1}^n a_i z_i^k )
  z_i^{k+1} = f_{i_k}'(a_{i_k}^T x^k) if i = i_k; z_i^k otherwise.

• The coordinates and the gradient can be coupled.
• Only one function ⟹ recover block coordinate descent.
• We only compute one gradient per iteration, but we can gain a bit in performance if we compute a few more.
• Introducing the trigger graph G = (V, E):
  1. V = {1, . . . , n}
  2. E ⊆ V × V.
• We say that index i ∈ V triggers i' ∈ V provided (i, i') ∈ E.
• Minibatching SAGA:

  Sample i_k ∈ {1, . . . , n} uniformly
  x^{k+1} = x^k − γ ( a_{i_k} f_{i_k}'(a_{i_k}^T x^k) − a_{i_k} z_{i_k}^k + (1/n) Σ_{i=1}^n a_i z_i^k )
  z_i^{k+1} = f_i'(a_i^T x^k) if i_k triggers i; z_i^k otherwise.

• Improves the theoretical convergence rate and practical performance.
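• A sketch of one minibatching step, with the trigger graph encoded as a hypothetical adjacency list triggers[i]:

```python
import numpy as np

def minibatch_saga_step(A, f_prime, x, z, avg, gamma, triggers, rng):
    """One step: the sampled i_k drives the primal update, then every index
    it triggers gets its stored scalar refreshed at x^k."""
    n = A.shape[1]
    i = rng.integers(n)
    g = f_prime(i, A[:, i] @ x)
    x_new = x - gamma * ((g - z[i]) * A[:, i] + avg)
    for j in triggers[i]:               # dual updates for triggered indices
        g_j = f_prime(j, A[:, j] @ x)   # evaluated at x^k, not x^{k+1}
        avg = avg + (g_j - z[j]) * A[:, j] / n
        z[j] = g_j
    return x_new, z, avg
```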
• What if we're not solving the ERM problem, but instead solve

  min Σ_{i=1}^n f_i(x)?

• Then SAGA becomes a high-memory method!

  Sample i_k ∈ {1, . . . , n} uniformly
  x^{k+1} = x^k − γ ( ∇f_{i_k}(x^k) − y_{i_k}^k + (1/n) Σ_{i=1}^n y_i^k )
  y_i^{k+1} = ∇f_{i_k}(x^k) if i = i_k; y_i^k otherwise.

• Question: instead of saving individual gradients, can we just store the sum (1/n) Σ_{i=1}^n y_i^k, and periodically recompute it?
• Randomized delay ε_k + complete trigger graph = SVRG clone:

  Sample i_k ∈ {1, . . . , n} uniformly and ε_k ∈ {0, 1}
  x^{k+1} = x^k − γ ( ∇f_{i_k}(x^k) − y_{i_k}^k + (1/n) Σ_{i=1}^n y_i^k )
  y_i^{k+1} = y_i^k + ε_k (∇f_i(x^k) − y_i^k).

• The trick: y_i^k = ∇f_i(φ^k) for an old iterate φ^k.

  Sample i_k ∈ {1, . . . , n} uniformly and ε_k ∈ {0, 1}
  x^{k+1} = x^k − γ ( ∇f_{i_k}(x^k) − ∇f_{i_k}(φ^k) + (1/n) Σ_{i=1}^n ∇f_i(φ^k) )
  φ^{k+1} = φ^k + ε_k (x^k − φ^k).

• On average, the full gradient is computed once every 1/E[ε_k] iterations (and E[ε_k] can be chosen however you want).
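• A sketch of one step of the SVRG clone, with ε_k ~ Bernoulli(p); grad(i, x) is an assumed callable returning ∇f_i(x):

```python
import numpy as np

def svrg_clone_step(grad, n, x, phi, g_phi, gamma, p, rng):
    """g_phi = (1/n) sum_i grad f_i(phi). With probability p = E[eps_k],
    the complete trigger graph fires and the anchor phi is refreshed."""
    i = rng.integers(n)
    x_new = x - gamma * (grad(i, x) - grad(i, phi) + g_phi)
    if rng.random() < p:                      # eps_k = 1
        phi = x.copy()                        # phi^{k+1} = x^k
        g_phi = sum(grad(j, phi) for j in range(n)) / n
    return x_new, phi, g_phi
```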
• Back to the ERM problem....
• We can also add importance sampling, which increases the range of step sizes we can take:

  Without importance sampling: γ ≤ (2 max_i{L_i})^{−1}
  With importance sampling: γ ≤ ((2/n) Σ_i L_i)^{−1}

• SAGA with importance sampling:

  Sample i_k ∈ {1, . . . , n} arbitrarily
  x^{k+1} = x^k − γ ( p_{i_k} ⊙ (a_{i_k} f_{i_k}'(a_{i_k}^T x^k) − a_{i_k} z_{i_k}^k) + (1/n) Σ_{i=1}^n a_i z_i^k )
  z_i^{k+1} = f_{i_k}'(a_{i_k}^T x^k) if i = i_k; z_i^k otherwise.

• Improves the theoretical convergence rate and practical performance.
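• A common (assumed) importance-sampling choice: sample i_k with probability proportional to L_i:

```python
import numpy as np

def lipschitz_sampler(L, rng):
    """Return a sampled index i_k with P(i_k = i) proportional to L_i,
    together with the probability vector (one natural choice; the
    analysis allows arbitrary distributions)."""
    p = np.asarray(L, dtype=float)
    p = p / p.sum()
    return rng.choice(len(p), p=p), p
```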
• These algorithms are serial; only one gradient is touched per iteration.
• Let's parallelize: choose delays d_k ∈ {1, . . . , τ}^m and e_i^k ∈ {1, . . . , τ}, and set

  x^{k−d_k} = (x_1^{k−d_{k,1}}, . . . , x_m^{k−d_{k,m}}).

• Asynchronous SAGA:

  Sample i_k ∈ {1, . . . , n} uniformly
  x^{k+1} = x^k − γ ( a_{i_k} f_{i_k}'(a_{i_k}^T x^{k−d_k}) − a_{i_k} z_{i_k}^{k−e_{i_k}^k} + (1/n) Σ_{i=1}^n a_i z_i^{k−e_i^k} )
  z_i^{k+1} = f_{i_k}'(a_{i_k}^T x^{k−d_k}) if i = i_k; z_i^k otherwise.

• Can be combined with block coordinate updates.
• It's like running serial SAGA on n different processors, without ever syncing them up.
• These algorithms are nice, but they're lacking generality.
• Recall that SAGA and the other algorithms solve

  find x ∈ H such that: Σ_{i=1}^n ∇f_i(x) = 0.

• SMART solves the following root-finding problem:

  find x ∈ H such that: S(x) := (1/n) Σ_{i=1}^n S_i(x) = 0,

  where the operators S_i : H → H are gradient-like.
• The coherence condition: (∃β_{ij} > 0) : (∀x ∈ H), (∀x* ∈ zer(S))

  Σ_{j=1}^m Σ_{i=1}^n β_{ij} ‖(S_i(x))_j − (S_i(x*))_j‖_j^2 ≤ ⟨S(x), x − x*⟩.

• Smooth convex functions satisfy (∀x, y ∈ H)

  (1/L_i) ‖∇f_i(x) − ∇f_i(y)‖^2 ≤ ⟨∇f_i(x) − ∇f_i(y), x − y⟩

  if ∇f_i is L_i-Lipschitz.
• ⟹ the property can be summed over multiple smooth functions:

  Σ_{i=1}^n (1/(nL_i)) ‖∇f_i(x) − ∇f_i(x*)‖^2 ≤ ⟨(1/n) Σ_{i=1}^n ∇f_i(x), x − x*⟩

  if Σ_{i=1}^n ∇f_i(x*) = 0.
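• The step hiding in the last implication: summing cocoercivity at y = x* and using Σ_i ∇f_i(x*) = 0 to drop the cross term (a standard calculation, spelled out here):

```latex
\sum_{i=1}^n \frac{1}{nL_i}\|\nabla f_i(x)-\nabla f_i(x^*)\|^2
  \le \frac{1}{n}\sum_{i=1}^n \big\langle \nabla f_i(x)-\nabla f_i(x^*),\, x-x^*\big\rangle
  = \Big\langle \frac{1}{n}\sum_{i=1}^n \nabla f_i(x),\, x-x^*\Big\rangle
    - \frac{1}{n}\Big\langle \underbrace{\textstyle\sum_{i=1}^n \nabla f_i(x^*)}_{=\,0},\, x-x^*\Big\rangle .
```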
• Proximal operators: (∀x, y ∈ H)

  ‖(I − prox_{γf})(x) − (I − prox_{γf})(y)‖^2 ≤ ⟨(I − prox_{γf})(x) − (I − prox_{γf})(y), x − y⟩

  (for smooth and nonsmooth f).
• Projection operators: (∀x, y ∈ H)

  ‖(I − P_C)(x) − (I − P_C)(y)‖^2 ≤ ⟨(I − P_C)(x) − (I − P_C)(y), x − y⟩

  (for closed convex sets).
• Subgradient projectors: (∀x ∈ H), (∀x* ∈ [f ≤ 0])

  ‖([f(x)]_+ / ‖g(x)‖^2) g(x)‖^2 ≤ ⟨([f(x)]_+ / ‖g(x)‖^2) g(x), x − x*⟩,

  where g(x) ∈ ∂f(x) is a subgradient selector.
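• A concrete instance of the first inequality: soft-thresholding is the prox of γ‖·‖₁, and we can spot-check firm nonexpansiveness of I − prox numerically (a sanity-check sketch, not from the talk):

```python
import numpy as np

def prox_l1(x, gamma):
    """Soft-thresholding: the proximal operator of gamma * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

rng = np.random.default_rng(0)
x, y, gamma = rng.normal(size=5), rng.normal(size=5), 0.3
u = (x - prox_l1(x, gamma)) - (y - prox_l1(y, gamma))
assert u @ u <= u @ (x - y) + 1e-12   # ||u||^2 <= <u, x - y>
```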
Algorithm (SMART)
Let {λ_k}_{k∈N} be a sequence of stepsizes. Choose x^0 ∈ H and y_1^0, . . . , y_n^0 ∈ H arbitrarily, except that y_{i,j}^0 = 0 if S*_{ij} = 0. Then for k ∈ N, perform the following three steps:

1. Sampling: choose a set of coordinates S_k, an operator index i_k, and a dual update decision ε_k.
2. Primal update: set

  (∀j ∈ S_k) x_j^{k+1} = x_j^k − (λ_k q_j / (mn)) ( (1/p_{i_k j}) ( (S_{i_k}(x^{k−d_k}))_j − y_{i_k,j}^{k−e_{i_k}^k} ) + Σ_{i=1}^n y_{i,j}^{k−e_i^k} );
  (∀j ∉ S_k) x_j^{k+1} = x_j^k.

3. Dual update: if i_k triggers i, set

  (∀j ∈ S_k with S*_{ij} ≠ 0) y_{i,j}^{k+1} = y_{i,j}^k + ε_k ( (S_i(x^{k−d_k}))_j − y_{i,j}^k );
  (∀j ∉ S_k) y_{i,j}^{k+1} = y_{i,j}^k.

  Otherwise, set y_{i,j}^{k+1} = y_{i,j}^k.
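• A deliberately simplified, synchronous SMART sketch: every coordinate is updated (S_k = {1, . . . , m}), there are no delays (d_k = e_i^k = 0), q_j ≡ 1, p_{i_k j} is coordinate-independent, and the trigger graph has only self-loops — all simplifying assumptions, not the general algorithm:

```python
import numpy as np

def smart_sync(S_ops, x0, lam, p, iters, rng, eps_prob=0.5):
    """S_ops[i](x) evaluates S_i(x); p[i] = P(i_k = i); eps_prob = E[eps_k]."""
    n, m = len(S_ops), len(x0)
    x = x0.copy()
    y = np.array([S(x) for S in S_ops])   # dual variables y_i
    ysum = y.sum(axis=0)
    for _ in range(iters):
        i = rng.choice(n, p=p)
        Si_x = S_ops[i](x)                # S_{i_k}(x^k)
        x = x - (lam / (m * n)) * ((Si_x - y[i]) / p[i] + ysum)
        if rng.random() < eps_prob:       # eps_k = 1: dual update (self-loop)
            ysum = ysum + (Si_x - y[i])
            y[i] = Si_x
    return x
```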
• Linear feasibility problem:

  Find x ∈ H such that Ax = b.

• Randomized asynchronous Kaczmarz algorithm:

  Sample i_k ∈ {1, . . . , n} uniformly
  x^{k+1} = x^k + λ (b_{i_k} − ⟨a_{i_k}, x^{k−d_k}⟩) a_{i_k}

• Here we used C_i = {x | ⟨a_i, x⟩ = b_i} and S_i := I − P_{C_i}.
• No memory is needed, precisely because S_i(x*) = 0 at any solution.
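• A serial, no-delay sketch; rows are pre-normalized so that the slide's update is exactly x^k − λ S_{i_k}(x^k) with S_i = I − P_{C_i}:

```python
import numpy as np

def kaczmarz(A, b, x0, lam, iters, rng):
    """Randomized Kaczmarz for Ax = b (rows of A play the role of the a_i)."""
    norms = np.linalg.norm(A, axis=1)
    An, bn = A / norms[:, None], b / norms         # rescale each equation
    x = x0.copy()
    for _ in range(iters):
        i = rng.integers(A.shape[0])               # sample i_k uniformly
        x = x + lam * (bn[i] - An[i] @ x) * An[i]  # relaxed projection onto C_i
    return x
```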
• Nonsmooth regularization of ERM?

  minimize_{x∈H}  g(x) + (1/N) Σ_{i=1}^N f_i(a_i^T x)

• Operators (that satisfy the coherence condition):

  S_i = (1/(L_i ‖a_i‖^2 N)) a_i (∇f_i ∘ a_i^T) ∘ prox_{L^{−1}g},  i = 1, . . . , N;
  S_{N+1} = I − prox_{L^{−1}g}.

• Roots x* ∈ zer(S) are not minimizers, but prox_{L^{−1}g}(x*) is a minimizer.
• Every time we evaluate an S_i we have to evaluate prox_{L^{−1}g}, so make the trigger graph a star and always update the (N+1)st dual variable.
• Asynchronous Proximal SAGA:

  Sample i_k ∈ {1, . . . , N+1} with P(i_k = i) = 1/(2N) if i ≤ N; 1/2 if i = N+1.

  if i_k ≤ N:
    x^{k+1} = x^k − (λ/(N+1)) ( (1/p_{i_k}) ( S_{i_k}(x^{k−d_k}) − a_{i_k} z_{i_k}^{k−e_{i_k}^k} ) + y_{N+1}^{k−e_{N+1}^k} + Σ_{i=1}^N a_i z_i^{k−e_i^k} );
  else:
    x^{k+1} = x^k − (λ/(N+1)) ( (1/p_{N+1}) ( S_{N+1}(x^{k−d_k}) − y_{N+1}^{k−e_{N+1}^k} ) + y_{N+1}^{k−e_{N+1}^k} + Σ_{i=1}^N a_i z_i^{k−e_i^k} );
  end

  z_i^{k+1} = (1/(L_i ‖a_i‖^2 N)) f_i'(a_i^T prox_{L^{−1}g}(x^{k−d_k})) if i = i_k; z_i^k otherwise.
  y_{N+1}^{k+1} = (I − prox_{L^{−1}g})(x^{k−d_k}).
• Monotropic programming?

  minimize_{x_j ∈ H_j}  Σ_{j=1}^M g_j(x_j) + f(x_1, . . . , x_M)
  subject to:  Σ_{j=1}^M A_j x_j = b

• Operator S : Π_{j=1}^{M+1} H_j → Π_{j=1}^{M+1} H_j:

  (S(x))_{M+1} := −γ_{M+1} ( Σ_{l=1}^M A_l x_l − b );
  (S(x))_j := x_j − prox_{γ_j g_j} ( x_j − γ_j A_j^* ( x_{M+1} + 2γ_{M+1} ( Σ_{l=1}^M A_l x_l − b ) ) − γ_j ∇_j f(x) ).

• x* ∈ zer(S) ⟹ (x_1^*, . . . , x_M^*) solves the monotropic programming problem. (Why?)
• TropicSMART (write x̄^{k+1} for the full candidate update; only coordinate j_k actually moves):

  Sample a coordinate j_k ∈ {1, . . . , M+1} uniformly and set S_k = {j_k}.

  x̄_{M+1}^{k+1} = x_{M+1}^k + γ_{M+1} ( Σ_{l=1}^M A_l x_l^k − b );
  x̄_j^{k+1} = prox_{γ_j g_j} ( x_j^k − γ_j A_j^* (2x̄_{M+1}^{k+1} − x_{M+1}^k) − γ_j ∇_j f(x^k) );
  (∀j ∈ S_k) x_j^{k+1} = x_j^k − λ ( x_j^k − x̄_j^{k+1} );
  (∀j ∉ S_k) x_j^{k+1} = x_j^k.
• More special cases in the paper.
• What about theory?
Theorem (SMART converges)
1. The sequence {x^k}_{k∈N} weakly converges to a root of S.
2. If S is essentially strongly quasi-monotone (ESQM), i.e.,

  (∃μ > 0) : (∀x ∈ H)  ⟨S(x), x − P_{zer(S)}(x)⟩ ≥ μ ‖x − P_{zer(S)}(x)‖^2,

  then {x^k}_{k∈N} linearly converges to a root of S.

• Examples of ESQM operators include S(x) = Σ_i a_i f_i'(a_i^T x) if each f_i is strongly convex.
• The proof is not difficult, but it is kind of long.
1. Construct a supermartingale sequence

  E[X_{k+1} | F_k] + Y_k ≤ X_k,

  where

  X_k = (distance to solution)^2 + (asynchrony residual) + (dual variable residual),
  Y_k = (residuals that force convergence if they vanish).

2. Then, through a series of magical steps, show that the sequence weakly converges.
• Linear convergence is somewhat more difficult to show.
[Figure: objective error vs. time (s) on log-log axes, with curves for 1, 4, 8, and 16 cores.]
Figure: ℓ2-regularized logistic regression with N = 1000, m = 10000, condition number = 10, matrix A random Gaussian, vector b uniformly distributed.
• A lot left to do:
  • Nonconvex case (asynchronous matrix factorization algorithms soon).
  • Do more numerical experiments: Brent Edmunds at UCLA is making a program that takes operators as input and runs SMART to find roots.
  • Characterize sublinear convergence rates.
  • Make more operators ⟹ more algorithms.
Thanks!
• Paper available here: http://arxiv.org/abs/1601.00698
• This material is based upon work supported by the National Science Foundation under Award No. 1502405.