Page 1: Title

Accelerating the ‘fields’ Package in R: Theory and Application

John Paige¹ ([email protected])
Isaac Lyngaas² ([email protected])
Srinath Vadlamani³ ([email protected])
Doug Nychka³ ([email protected])

¹Macalester College
²Florida State University
³National Center for Atmospheric Research

July 31, 2014

Page 2: Outline

Introduction
• Introduction to Problem
• Goals and Motivation

Computation in fields
• Mathematical Foundation

Accelerating fields
• Eigen and Cholesky Decompositions
• Parameter Optimization

Conclusions and Future Work

References

Appendix

Page 3: Analyzing Spatial Data with fields: Kriging

[Figure: "Colorado Average Spring High Temperature (Celsius)" — observations plotted over longitude −108 to −102 and latitude 37 to 41, color scale 5–20 °C.]

Dataset from fields package (Nychka et al., 2014b)


Page 5: Analyzing Spatial Data with fields: Kriging

[Figure: Kriging estimate of Colorado average spring high temperature for λ = 0.103, θ = 5.45.]

Dataset from fields package (Nychka et al., 2014b)


Page 12: Difficult to Analyze Large Datasets

Difficulties in Kriging:
• Kriging with many observations
  • O(n³)
  • 100,000 observations, fixed parameters: ∼8 hours
  • 100,000 observations, 10 parameter samples: ∼80 hours
• Kriging over time for many parameter sets:
  • 5,000 observations, 30 years (monthly data), 20 parameter samples: nearly 14 hours

[Figure: "fields’ CO2 Dataset" — observation locations over longitude −150 to 150 and latitude −50 to 50.]

CO2 dataset in fields package (Nychka et al., 2014b)

Page 13: Research Goals

• Learn Kriging theory
• Accelerate the fields package in R
  • Cholesky and eigen decompositions
  • Spatial parameter estimation (maximum likelihood estimation)


Page 15: Why Focus on fields?

• Made in R (Nychka et al., 2014b)
• Free, open source
• Popular with statisticians
• Easy to use
• Fast (for R)

[Figure: "Spatial Problem Workflow Times" — time (minutes, log scale 0.1–100) vs. number of observations (0–10000) for geoR and for fields (with mKrig).]

geoR and fields packages used in this plot: Diggle and Ribeiro (2007); Nychka et al. (2014b); Ribeiro Jr and Diggle (2001)

Page 16: Why Focus on Matrix Decompositions and Optimization?

• Eigen and Cholesky decompositions take a long time
  • Over 10,000 observations means over 40% of the computation time
• Spatial parameter optimization requires matrix decompositions
• Better optimization means fewer matrix decompositions

Note: running on a Caldera node on Jellystone
• GPUs: 2 Tesla M2070-Q
• CPUs: 2 8-core 2.6-GHz Intel Xeon E5-2670 (Sandy Bridge)

[Figure: "Spatial Surface Estimation Computation Time" — time (minutes, 0–6) vs. number of observations (0–10000) for the complete workflow and for the workflow’s Cholesky decompositions.]

Page 17: Why Focus on Matrix Decompositions and Optimization?

[Figure: "Percent Workflow Time Taken by Cholesky Decompositions" — percent (0–100) vs. number of observations (0–10000), measured on the same Caldera node.]



Page 29: Kriging Set-Up (With No Trend Component)

y_i = g(x_i) + e_i

y_i: observations
x_i: locations
g: Kriging surface
e_i: independent errors following a N(0, σ²) distribution

Assumptions (Nychka et al., 2014a):
• E(g(x)) = 0
  • Kriging surface has zero mean
• E[g(x)g(x′)] = ρ · k(x, x′):
  • k: correlation function (e.g. e^(−||x − x′||/θ))
  • ρ: correlation strength (or signal strength)

Reparameterization:
λ = σ²/ρ: smoothing parameter (1/λ: signal-to-noise ratio)
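The exponential-correlation assumption above is easy to sketch numerically. The following is an illustrative Python/NumPy version — not the package’s R code — and the name `exp_cov` and its arguments are my own labels:

```python
import numpy as np

def exp_cov(X, theta, rho=1.0):
    """Exponential covariance: rho * exp(-||x_i - x_j|| / theta)."""
    # Pairwise Euclidean distances between rows of X (an n x d array of locations)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return rho * np.exp(-d / theta)

# Toy check on three 2-D locations
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
Sigma = exp_cov(X, theta=0.5)
assert np.allclose(np.diag(Sigma), 1.0)   # k(x, x) = 1
assert np.allclose(Sigma, Sigma.T)        # covariance matrices are symmetric
```

For fixed locations, θ controls how quickly correlation decays with distance, and ρ scales the whole matrix; λ enters later through Σ + λI.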


Page 33: Parameter Optimization: Maximum Likelihood Estimation

Definition. A likelihood function, L, gives the chance that a set of observations occurs in a model given the model parameters.

Definition. α̂_MLE is a maximum likelihood estimate of α if α̂_MLE maximizes the data likelihood:

L(α̂_MLE) ≥ L(α), ∀α

The data log-likelihood given in Nychka et al. (2014a) for a given θ and λ is:

ln L(θ, λ) = −(1/2)[ yᵀ(Σ_θ + λI)⁻¹ y + ln|Σ_θ + λI| ] + C
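Dropping the constant C, this log-likelihood can be evaluated directly with one linear solve and one determinant — an illustrative Python/NumPy sketch (not the fields implementation), where `K` plays the role of Σ_θ:

```python
import numpy as np

def log_lik(y, K, lam):
    """ln L(theta, lam) up to the constant C, by a direct O(n^3) solve."""
    M = K + lam * np.eye(len(y))
    quad = y @ np.linalg.solve(M, y)     # y^T (Sigma_theta + lam*I)^{-1} y
    logdet = np.linalg.slogdet(M)[1]     # ln|Sigma_theta + lam*I|
    return -0.5 * (quad + logdet)

# Sanity check: K = I, lam = 0  =>  ln L = -0.5 * y^T y
assert abs(log_lik(np.ones(3), np.eye(3), 0.0) + 1.5) < 1e-12
```

The appendix slides show how the same quantity is computed through Cholesky and eigen decompositions instead of this naive solve.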


Page 39: Parameter Optimization in fields

ln L(θ, λ) = −(1/2)[ yᵀ(Σ_θ + λI)⁻¹ y + ln|Σ_θ + λI| ] + C

• y: vector of observation values
• Σ_θ: covariance matrix, where (Σ_θ)_ij = ρ k(x_i, x_j) = ρ e^(−||x_i − x_j||/θ)
• C: constant

How can this likelihood be computed quickly?
• Krig uses the eigendecomposition
• mKrig uses the Cholesky decomposition


Page 41: Accelerating fields: Eigen and Cholesky Decompositions

• MAGMA (Agullo et al., 2009)
  • Freely available library with multi-GPU computing capability
  • Much faster than default R

Page 42: Accelerated Workflow Time

[Figure: "Spatial Problem Workflow Times" — time (minutes, 0–8) vs. number of observations (0–15000) for default R, one GPU, and two GPUs.]

• > 2,500 observations ⇒ the accelerated workflow is faster
• > 10,000 observations ⇒ ≥ 1.55× speedup (for 1 or 2 GPUs)

Page 43: Accelerated Workflow Time

[Figure: same "Spatial Problem Workflow Times" plot as Page 42.]

• > 13,000 observations ⇒ the two-GPU workflow is faster than one GPU by ≥ 1 second

Page 44: Accelerating fields: Maximum Likelihood Estimation

Goal:
• Quickly find the set of model parameters maximizing the data likelihood

Questions:
• Is the eigen or Cholesky decomposition faster for maximizing likelihood?
• How do different ways of splitting up these decompositions among cores and GPUs affect likelihood maximization time?

Observations from CO2 dataset in fields package (Nychka et al., 2014b)


Page 48: Cholesky Wins Over Eigendecomposition

[Figure: "Cholesky Decomposition Speedup Over Eigendecomposition (Default Implementations)" — speedup (0–140) vs. number of observations (0–15000).]

• Optimizing λ for fixed θ:
  • 10 to 15 Cholesky decompositions
  • One eigendecomposition
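The "10 to 15 Cholesky decompositions" figure reflects a one-dimensional search over λ that pays one factorization per likelihood evaluation. A hypothetical sketch in Python/SciPy (not fields’ actual optimizer; the function name, bracket values, and evaluation counter are mine):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def optimize_lambda(y, Sigma, bracket=(1e-4, 1e2)):
    """1-D search over lambda; each likelihood evaluation costs one Cholesky."""
    n = len(y)
    evals = [0]  # count the O(n^3) factorizations performed

    def neg_ll(log_lam):
        evals[0] += 1
        M = Sigma + np.exp(log_lam) * np.eye(n)
        L = np.linalg.cholesky(M)                 # one Cholesky per evaluation
        z = np.linalg.solve(L, y)                 # z = L^{-1} y
        quad = z @ z                              # y^T M^{-1} y
        logdet = 2.0 * np.sum(np.log(np.diag(L))) # ln|M| = 2 sum ln L_ii
        return 0.5 * (quad + logdet)              # -ln L up to the constant

    res = minimize_scalar(neg_ll, bounds=np.log(bracket), method="bounded")
    return np.exp(res.x), evals[0]
```

Each eigendecomposition, by contrast, can be reused for every λ, which is the trade-off this slide sequence is weighing.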

Page 49: Cholesky Wins Over Eigendecomposition

[Figure: same speedup plot as Page 48.]

• < 20,000 observations ⇒ Cholesky is at least 18× faster

Page 50: Cholesky Wins Over Eigendecomposition

[Figure: same speedup plot as Page 48.]

• The Cholesky decomposition achieved better speedups with GPUs

Page 51: Cholesky Wins Over Eigendecomposition

[Figure: same speedup plot as Page 48.]

• Multidimensional parameter space:
  • The likelihood gradient is easier to estimate with the Cholesky decomposition

Page 52: Splitting Up Cholesky Decompositions in Likelihood Calculations

• Two GPUs per Caldera node, so either:
  • Use both GPUs per Cholesky decomposition
  • Use one GPU per Cholesky decomposition
• Compare with likelihood calculation times using:
  • Default Cholesky decomposition run serially
  • Default Cholesky decomposition parallelized in Rmpi

[Diagram: 16 likelihood evaluations assigned four ways — both GPUs working through all 16 problems in sequence; one GPU per problem (GPU1: problems 1–8, GPU2: problems 9–16); a single core running all 16 serially; and 16 cores taking one problem each.]


Page 56: Splitting Up Likelihood Calculations: Results

[Figure: "Likelihood Calculation Time (For 16 Problems)" — time (seconds, 0–250) vs. number of observations (0–15000) for serial default, 16-core parallel default, serial with 2 GPUs per problem, and parallel with 1 GPU per problem.]

• Substantial speedups versus serial, default R (10,000 observations):
  • Two GPUs in parallel (one per problem): 8.3×
  • Two GPUs in serial (two per problem): 5.2×

Page 57: Splitting Up Likelihood Calculations: Results

[Figure: same "Likelihood Calculation Time (For 16 Problems)" plot as Page 56.]

• Speedups versus parallel, default R (12,500 observations):
  • Two GPUs in parallel (one per problem): 1.67×
  • Two GPUs in serial (two per problem): 1.02×

Page 58: Conclusions

• Evidence suggests that using the Cholesky rather than the eigen decomposition for likelihood maximization is faster (at least for ≤ 20,000 observations)
• Successfully accelerated fields’ spatial workflow computations
  • > 10,000 observations ⇒ ≥ 1.55× speedup (for 1 or 2 GPUs)
• Demonstrated the viability of two-GPU parallelized likelihood calculations using the Cholesky decomposition
  • 2 GPUs, 2 per problem: 5.2× speedup over the current implementation for ≥ 10,000 observations
  • 2 GPUs, 1 per problem: 8.3× speedup over the current implementation for ≥ 10,000 observations
• Splitting up likelihood calculations between two GPUs in parallel is faster than using two GPUs serially or 16 cores in parallel

Page 59: Future Work

• Reach even higher GPU speedups (by improving memory usage in the R wrapper)
• Further investigate Cholesky vs. eigendecomposition maximum likelihood estimation speed
• Implement a multidimensional parallel optimization algorithm
  • Latin hypercube sampling
  • L-BFGS-B (Zhu et al., 1997)
• Fully incorporate the code into the fields package
• Test fields parallelization across nodes

Page 60: References

Emmanuel Agullo, Jim Demmel, Jack Dongarra, Bilel Hadri, Jakub Kurzak, Julien Langou, Hatem Ltaief, Piotr Luszczek, and Stanimire Tomov. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects. In Journal of Physics: Conference Series, volume 180, page 012037. IOP Publishing, 2009.

Peter Diggle and Paulo Justiniano Ribeiro. Model-based Geostatistics. Springer, 2007.

Douglas Nychka, Reinhard Furrer, and Stephan Sain. Smoothing and spatial statistics: a unified and practical approach with fields. To be published by Springer, 2014a.

Douglas Nychka, Reinhard Furrer, and Stephan Sain. fields: Tools for spatial data, 2014b. URL http://CRAN.R-project.org/package=fields. R package version 7.1.

Paulo J Ribeiro Jr and Peter J Diggle. geoR: A package for geostatistical analysis. R News, 1(2):14–18, 2001.

Ciyou Zhu, Richard H Byrd, Peihuang Lu, and Jorge Nocedal. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Transactions on Mathematical Software (TOMS), 23(4):550–560, 1997.

Page 61: MAGMA Cholesky Decomposition Times

[Figure: "Default and MAGMA-Accelerated Cholesky Times in R" — time (seconds, 0–40) vs. number of observations (0–15000) for default, MAGMA (1 GPU), and MAGMA (2 GPUs).]

Page 62: Parameter Optimization in fields: Cholesky Decomposition

ln L(θ, λ) = −(1/2)[ yᵀ(Σ + λI)⁻¹ y + ln|Σ + λI| ]

Solving with the Cholesky decomposition (Nychka et al., 2014a):

Note: Σ + λI is symmetric positive definite. Let (Σ + λI) = LLᵀ. Then:

yᵀ(Σ + λI)⁻¹ y = yᵀ(LLᵀ)⁻¹ y

Also:

|Σ + λI| = |LLᵀ| = |L|²
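The Cholesky route above can be sketched numerically — an illustrative Python/SciPy version (not the package’s R code), dropping the constant C:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_lik_chol(y, Sigma, lam):
    """ln L up to the constant, via (Sigma + lam*I) = L L^T."""
    M = Sigma + lam * np.eye(len(y))
    c, low = cho_factor(M, lower=True)
    quad = y @ cho_solve((c, low), y)            # y^T (Sigma + lam*I)^{-1} y
    # |M| = |L|^2, so ln|M| = 2 * sum(log(diag(L)))
    logdet = 2.0 * np.sum(np.log(np.diag(c)))
    return -0.5 * (quad + logdet)
```

Triangular solves against L replace the explicit inverse, and the determinant falls out of the factor’s diagonal for free — the reason one factorization suffices per (θ, λ) evaluation.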

Page 63: Parameter Optimization in fields: Eigendecomposition

ln L(θ, λ) = −(1/2)[ yᵀ(Σ + λI)⁻¹ y + ln|Σ + λI| ]

Solving with the eigendecomposition (Nychka et al., 2014a): Let Σ = UDU⁻¹, so that (Σ + λI) = U(D + λI)U⁻¹. Then:

yᵀ(Σ + λI)⁻¹ y = yᵀ(U(D + λI)U⁻¹)⁻¹ y = yᵀ U(D + λI)⁻¹ U⁻¹ y.

Also:

|Σ + λI| = |U(D + λI)U⁻¹| = |U| · |D + λI| · |U⁻¹| = |D + λI|.
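Because Σ is symmetric (so U⁻¹ = Uᵀ), the decomposition above is computed once and then reused for every λ: inverting D + λI and taking its determinant are O(n) operations on the eigenvalues. An illustrative Python/NumPy sketch of that profiling trick (names are mine, not fields internals):

```python
import numpy as np

def log_lik_eigen_profile(y, Sigma, lams):
    """ln L (up to the constant) for many lambda values from ONE eigendecomposition."""
    D, U = np.linalg.eigh(Sigma)       # one O(n^3) decomposition of symmetric Sigma
    z = U.T @ y                        # rotate y once: z = U^{-1} y
    out = []
    for lam in lams:                   # each lambda is now O(n)
        d = D + lam                    # eigenvalues of Sigma + lam*I
        quad = np.sum(z**2 / d)        # y^T (Sigma + lam*I)^{-1} y
        logdet = np.sum(np.log(d))     # ln|Sigma + lam*I| = ln|D + lam*I|
        out.append(-0.5 * (quad + logdet))
    return np.array(out)
```

This is the trade-off behind Krig vs. mKrig: one expensive eigendecomposition amortized over all λ, versus one cheaper Cholesky per λ.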


Page 68: Analyzing Spatial Data with fields: Walkthrough

• Call mKrig or Krig with locations, observations, model parameters, and a covariance function
  • In this case, exponential covariance: θ = 20 (correlation range), 1/λ = 10 (signal-to-noise ratio)
• predictSurface computes the spatial surface
• predictSE computes the standard errors of the surface
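For the zero-mean set-up on Page 29, the surface value at a new location x₀ takes the standard Kriging form ĝ(x₀) = k₀ᵀ(K + λI)⁻¹y, where (k₀)_i = k(x₀, x_i). A minimal Python/NumPy sketch of that predictor — illustrative only, not the mKrig internals:

```python
import numpy as np

def krig_predict(X, y, X0, theta, lam):
    """Zero-mean Kriging predictor: g_hat(x0) = k0^T (K + lam*I)^{-1} y."""
    def k(A, B):
        # exponential correlation between rows of A and rows of B
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        return np.exp(-d / theta)
    K = k(X, X)                                   # n x n correlation at the data
    w = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return k(X0, X) @ w                           # predictions at the new locations
```

With λ = 0 the surface interpolates the observations exactly; larger λ smooths the surface toward zero, which is the role of the smoothing parameter throughout the talk.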
