Machine Learning for Scientific Applications
David Lary (http://davidlary.info)


DESCRIPTION

Machine Learning for Scientific Applications at the European Space Agency Summer School on Remote Sensing

TRANSCRIPT

Page 1: Machine Learning for Scientific Applications

Machine Learning for Scientific Applications

David Lary (http://davidlary.info)

Need: accounting for complex multi-variate context, which is often not fully described by theory

Monday, August 11, 14

Page 2: Machine Learning for Scientific Applications

Long Term Data Sets: Uncertainty, Cross-Calibration,

Data Fusion & Machine Learning

Motivated by Data Assimilation

With examples from Land, Atmosphere & Ocean

Monday, August 11, 14

Page 3: Machine Learning for Scientific Applications

Bias Detection

"Who may discern his errors, ...." Psalm 19:12

7

Monday, August 11, 14

Page 4: Machine Learning for Scientific Applications

Why is it an issue?

• When fusing multiple datasets, bias is often an issue (very relevant for climate variables).

• Data assimilation is a least-squares approach, i.e., a Best Linear Unbiased Estimator (BLUE), and so implicitly assumes the observations are unbiased.

8

Monday, August 11, 14

Page 5: Machine Learning for Scientific Applications

.... runs deeper still

• Instrument teams have a keen sense of faithfully reporting the data as it is, warts and all. They are naturally loath to empirically correct biases; they would like to understand the cause of the bias and data issues theoretically, from first principles.

The Earth system is so complex, with many interacting processes, and the instruments are often also complex, that this is not always possible.

Residual data issues can, and usually do, remain.

• Modelers know that data biases exist, but are very reluctant to make changes to data products.

.... we therefore have a problem of closure.

9

Monday, August 11, 14

Page 6: Machine Learning for Scientific Applications

The problem!

• Biases are ubiquitous, and not all of them can be explained theoretically. Yet we typically need to fuse multiple datasets to construct long-term time series and/or improve global coverage.

• If the biases are not corrected before data fusion, we introduce further problems, such as ...

• spurious trends, leading to the possibility of unsuitable policy decisions.

• when assimilation is involved, the suboptimal use of observations, non-physical structures in the analysis, biases in the assimilated fields, and extrapolation of biases due to multivariate background constraints.

10

Monday, August 11, 14

Page 7: Machine Learning for Scientific Applications

A Further Problem

The instruments whose data we would like to fuse are often not making coincident measurements in time or space.

It is imperative to inter-compare observations in their appropriate context.

11

Monday, August 11, 14

Page 8: Machine Learning for Scientific Applications

Integrate multiple satellite datasets for applications

The comparison above shows the total ozone column observed by EP TOMS and Aura OMI. The higher-resolution coverage that Aura OMI provides is clearly seen. The particular event shown includes a tropopause fold over Texas.

12

Monday, August 11, 14

Page 9: Machine Learning for Scientific Applications

An Example

13

representativeness

Monday, August 11, 14

Page 10: Machine Learning for Scientific Applications

14

Monday, August 11, 14

Page 11: Machine Learning for Scientific Applications

[Figure: Relative frequency distributions of O3 v.m.r. (of order 1e-6) for all years, month 01 (1900 K < theta < 2300 K, -90° < equivalent latitude < -79°), comparing Aura MLS O3 (23), CLAES v9 O3 (207), ISAMS v10 O3 (19), UARS MLS v5 183 GHz O3 (379), UARS MLS v5 205 GHz O3 (490), SAGE 2 v6.2 O3 (21), and SBUV v8 O3 (33); the number of observations is given in parentheses.]

15

Monday, August 11, 14

Page 12: Machine Learning for Scientific Applications

Geophysical Insights

(a) (b)

(c) (d)

Figure 2: N2O equivalent PV latitude - potential temperature cross sections of (a) representativeness uncertainty (v.m.r.), (b) observational uncertainty (v.m.r.), (c) observation (v.m.r.), and (d) analysis uncertainty (v.m.r.). The data used are from the Upper Atmosphere Research Satellite (UARS) Cryogenic Limb Array Etalon Spectrometer (CLAES) version 9 for January 1992.

16

Monday, August 11, 14

Page 13: Machine Learning for Scientific Applications

Bias is Spatially Dependent

[Figure: % bias (UARS MLS v5 183 GHz O3 - HALOE v19 O3) for January of all years, shown as equivalent PV latitude (-75° to 75°) versus potential temperature (250-2500 K) cross sections; colour scale from -30% to +30%.]

17

Monday, August 11, 14

Page 14: Machine Learning for Scientific Applications

So what can we do about this?

.... we do not have a theoretical explanation

18

Monday, August 11, 14

Page 15: Machine Learning for Scientific Applications

Machine Learning: for when our understanding is incomplete

19

... and that is quite often!

Monday, August 11, 14

Page 16: Machine Learning for Scientific Applications

What is Machine Learning?

• Machine learning is a sub-field of artificial intelligence that is concerned with the design and development of algorithms that allow computers to learn the behavior of data sets empirically.

• A major focus of machine-learning research is to produce (induce) empirical models from data automatically.

• This approach is usually used in the absence of adequate and complete theoretical models, which would be more desirable conceptually.

20

Monday, August 11, 14

Page 17: Machine Learning for Scientific Applications

What is Machine Learning?

The use of machine learning can actually help us to construct a more complete theoretical model, as it allows us to determine which factors are statistically capable of providing the data mappings we seek, e.g. the multi-variate, non-linear, non-parametric mapping between satellite radiances and a suite of ocean products.

21

Monday, August 11, 14

Page 18: Machine Learning for Scientific Applications

Machine Learning

Is for:

Regression

➡ Multivariate, non-linear, non-parametric

Classification

➡ Supervised and unsupervised

22

Monday, August 11, 14

Page 19: Machine Learning for Scientific Applications

Machine Learning

Comes in Several Flavors, for example:

• Neural Networks

• Support Vector Machines

• Gaussian Process Models

• Decision Trees

• Random Forests

23

Monday, August 11, 14

Page 20: Machine Learning for Scientific Applications

Machine Learning Regression

[Diagram: inputs x1, x2, x3, x4, x5, ..., xn feed a learned mapping that produces the output(s) y.]

y = f(x1, x2, x3, x4, x5, ..., xn)

Multivariate, non-linear, non-parametric; n can be very large

Training Data

Monday, August 11, 14
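Given training data of this form, here is a minimal Python sketch of a multivariate, non-linear, non-parametric regression. It is an illustration only, not anything from the talk: the synthetic data and the choice of a random forest as the regressor are assumptions made purely for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic training data: n inputs, one output (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(5000, 8))            # x1 ... xn, here n = 8
y = np.sin(3 * X[:, 0]) * X[:, 1] + X[:, 2] ** 2 + 0.05 * rng.normal(size=len(X))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A non-parametric, non-linear regressor: no functional form is assumed for f.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print("held-out R^2:", model.score(X_test, y_test))
```

The same pattern applies whatever the learner is (neural network, SVM, Gaussian process): supply examples of the inputs and the target output, and let the algorithm induce the mapping empirically.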

Page 21: Machine Learning for Scientific Applications

Machine Learning Supervised Classification

[Diagram: inputs x1, x2, x3, x4, x5, ..., xn together with a labelled output y form the training data for the classifier.]

Multivariate, non-linear, non-parametric; n can be very large

Training Data

Monday, August 11, 14

Page 22: Machine Learning for Scientific Applications

Machine Learning Unsupervised Classification

Multivariate, non-linear, non-parametric; n can be very large

[Diagram: inputs x1, x2, x3, x4, x5, ..., xn with no labelled output; the algorithm groups the training data into classes by itself.]

Training Data

Monday, August 11, 14

Page 23: Machine Learning for Scientific Applications

Neural Networks

In a neural network model, simple nodes (neurons) are connected together to form a network of nodes. Its practical use comes from algorithms designed to alter the strength (weights) of the connections in the network to produce a desired signal flow.

27

Monday, August 11, 14
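To make the idea concrete, here is a minimal Python sketch (an illustration, not the network architecture used in the talk) of a feed-forward neural network regressor whose connection weights are adjusted during training; the data, layer sizes, and activation are arbitrary assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Illustrative data: map two inputs to one output through a non-linear function.
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(2000, 2))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] ** 2

# Two hidden layers of simple nodes; training iteratively adjusts the weights.
net = MLPRegressor(hidden_layer_sizes=(32, 32), activation="tanh",
                   max_iter=2000, random_state=1)
net.fit(X, y)

print("training R^2:", net.score(X, y))
```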

Page 24: Machine Learning for Scientific Applications

Support Vector Machines

Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression.

Intuitively, an SVM model is a representation of the training examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible.

Vladimir Vapnik

28

Monday, August 11, 14
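A minimal sketch of the classification case described above, using scikit-learn's SVC on synthetic two-class data; everything here (the data, the kernel, the parameters) is an illustrative assumption rather than anything from the talk.

```python
import numpy as np
from sklearn.svm import SVC

# Two synthetic classes separated by a (noisy) gap.
rng = np.random.default_rng(2)
class_a = rng.normal(loc=[-2.0, 0.0], scale=0.8, size=(200, 2))
class_b = rng.normal(loc=[+2.0, 0.0], scale=0.8, size=(200, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 200 + [1] * 200)

# The SVM places the decision boundary so that the margin (the gap between
# the two categories) is as wide as possible.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```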

Page 25: Machine Learning for Scientific Applications

Gaussian Process Models

Gaussian processes (GPs) (Rasmussen and Williams 2006) fit a multivariate Gaussian probability distribution to any set of regressors, allowing for analytic inference. As a principled Bayesian technique, GPs go beyond SVMs by allowing us to supply a full posterior distribution for our regressors, giving us both mean estimates and an indication of the uncertainty in them.

29

Monday, August 11, 14
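The following Python sketch illustrates the point made above, that a GP returns both a mean prediction and an uncertainty estimate; the synthetic 1-D data, the RBF plus white-noise kernel, and the default hyper-parameter optimisation are all assumptions made for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Sparse, noisy observations of an unknown function.
rng = np.random.default_rng(3)
X = np.sort(rng.uniform(0.0, 10.0, size=(25, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=len(X))

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, random_state=3)
gp.fit(X, y)

# The posterior gives a mean estimate and a standard deviation at every point.
X_new = np.linspace(0.0, 10.0, 5).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)
for x, m, s in zip(X_new.ravel(), mean, std):
    print(f"x = {x:4.1f}  mean = {m:+.3f}  std = {s:.3f}")
```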

Page 27: Machine Learning for Scientific Applications

A key issue is training dataset size: the bigger the better!

..... until we run out of memory

31

Monday, August 11, 14

Page 28: Machine Learning for Scientific Applications

Variations in Stratospheric Cly Between 1991 and the present

David Lary, Anne Douglass, Darryn Waugh, Richard Stolarski, Paul Newman, Hamse Mussa

• Data can be biased, maybe as a function of many parameters.

• May be observing a proxy for what we really want to know.

32

Monday, August 11, 14

Page 29: Machine Learning for Scientific Applications

ozone reductions there (SOCOL and E39C), and the model with the largest cold bias in the Antarctic lower stratosphere in spring (LMDZrepro) simulates very low ozone.

CCMs show a large range of ozone trends over the past 25 years (see left panels in Figure 3-26 of Chapter 3) and large differences from observations. Some of these differences may in part be related to differences in the simulated Cly, e.g., E39C and SOCOL show a trend smaller than observed, whereas AMTRAC and UMETRAC show a trend larger than observed in extrapolar area-weighted mean column ozone. However, other factors also contribute, e.g., biases in tropospheric ozone (Austin and Wilson, 2006).

The CCM evaluation discussed above and in Eyring et al. (2006) has guided the level of confidence we place on each model simulation. The CCMs vary in their skill in representing different processes and characteristics of the atmosphere. Because the focus here is on ozone recovery due to declining ODSs, we place importance on the models' ability to correctly simulate stratospheric Cly as well as the representation of transport characteristics and polar temperatures. Therefore, more credence is given to those models that realistically simulate these processes. Figure 6-7 shows a subset of the diagnostics used to evaluate these processes, and the CCMs shown with solid curves in Figures 6-7, 6-8, 6-10 and 6-12 to 6-14 are those that are in good agreement with the observations in Figure 6-7. However, these line styles should not be over-interpreted, as both the ability of the CCMs to represent these processes and the relative importance of Cly, temperature, and transport vary between different regions and altitudes. Also, analyses of model dynamics in the Arctic, and differences in the chlorine budget/partitioning in these models, when available, might change this evaluation for some regions and altitudes.

Figure 6-8. October zonal mean values of total inorganic chlorine (Cly in ppb) at 50 hPa and 80°S from CCMs. Panel (a) shows Cly and panel (b) the difference in Cly from that in 1980. The symbols in (a) show estimates of Cly in the Antarctic lower stratosphere in spring from measurements from the UARS satellite in 1992 and the Aura satellite in 2005, yielding values around 3 ppb (Douglass et al., 1995; Santee et al., 1996) and around 3.3 ppb (see Figure 4-8), respectively.

[Figure panels: Cly (ppbv) and Cly - Cly(1980) (ppbv) versus year, at 50 hPa, 80°S, in October.]

33

Monday, August 11, 14

Page 30: Machine Learning for Scientific Applications


A large range of Cly in the model simulations

Constrained by a limited number of Cly observations

33

Monday, August 11, 14

Page 31: Machine Learning for Scientific Applications

• We need to know the distribution of inorganic chlorine (Cly) in the stratosphere to:

• Attribute changes in stratospheric ozone to changes in halogens.

• Assess the realism of chemistry-climate models.

34

Monday, August 11, 14

Page 32: Machine Learning for Scientific Applications

Cly = HCl + ClONO2 + ClO + HOCl + 2Cl2O2 + 2Cl2

Long time-series

Sporadic

Long time-series

Since 2004

Estimating Cly is hampered by lack of observations

Estimating Cly is hampered by inter-instrument biases

35

Monday, August 11, 14

Page 33: Machine Learning for Scientific Applications

Using PDFs for Bias Detection

[Figure: Relative frequency distributions of HCl v.m.r. for 2005/01 (460 K < theta < 590 K, 49° < equivalent latitude < 61°), comparing ACE v2.2 HCl (75), Aura MLS HCl (1544), and HALOE v19 HCl (101); the number of observations is given in parentheses. Annotation: HALOE - Aura HCl offset.]

http://www.pdfcentral.info/

If we now repeat this globally for all periods of overlap

36

Monday, August 11, 14
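As a rough illustration of the PDF-based approach (not the actual pdfcentral.info machinery), the Python sketch below builds relative-frequency distributions for two synthetic "instruments" observing the same quantity and reports the apparent offset between them; the data, bins, and offset are assumptions made for the example.

```python
import numpy as np

# Two synthetic instruments measuring the same quantity, one biased high.
rng = np.random.default_rng(4)
instrument_a = rng.normal(loc=1.50e-9, scale=0.12e-9, size=1500)   # v.m.r.
instrument_b = rng.normal(loc=1.62e-9, scale=0.12e-9, size=900)    # v.m.r.

# Relative-frequency PDFs on a common set of bins.
bins = np.linspace(0.8e-9, 2.2e-9, 60)
pdf_a, _ = np.histogram(instrument_a, bins=bins, density=True)
pdf_b, _ = np.histogram(instrument_b, bins=bins, density=True)

centres = 0.5 * (bins[:-1] + bins[1:])
mode_a = centres[np.argmax(pdf_a)]
mode_b = centres[np.argmax(pdf_b)]
print(f"distribution modes: {mode_a:.3e} vs {mode_b:.3e}")
print(f"apparent inter-instrument offset: {mode_b - mode_a:.3e} v.m.r.")
```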

Page 34: Machine Learning for Scientific Applications

[Scatter plot: ATMOS HCl (ppbv) versus HALOE HCl (ppbv), 0-4 ppbv, showing the data, the 1:1 line, and a weighted fit. Slope = 1.05, intercept = 0.23 ppbv.]

HCl Inter-comparisons

37

Monday, August 11, 14
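The slopes and intercepts on these inter-comparison slides come from weighted straight-line fits of one instrument against another. A minimal sketch of one way to do such a fit (with hypothetical coincident data and per-point uncertainties used as weights) is:

```python
import numpy as np

# Hypothetical coincident measurements from two instruments, with 1-sigma errors.
rng = np.random.default_rng(5)
haloe = rng.uniform(0.5, 3.5, size=300)                    # ppbv
other = 1.05 * haloe + 0.23 + rng.normal(0.0, 0.1, 300)    # ppbv, biased
sigma = np.full_like(other, 0.1)

# Weighted least-squares straight-line fit: weight each point by 1/sigma.
slope, intercept = np.polyfit(haloe, other, deg=1, w=1.0 / sigma)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f} ppbv")
```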

Page 35: Machine Learning for Scientific Applications

[Scatter plot: ACE v2.2 HCl (ppbv) versus HALOE HCl (ppbv), 0-4 ppbv, showing the data, the 1:1 line, and a weighted fit. Slope = 1.18, intercept = -0.050 ppbv.]

HCl Inter-comparisons

37

Monday, August 11, 14

Page 36: Machine Learning for Scientific Applications

[Scatter plot: MLS HCl (ppbv) versus HALOE HCl (ppbv), 0-3 ppbv, showing the data, the 1:1 line, and a weighted fit. Slope = 1.09, intercept = 0.070 ppbv.]

HCl Inter-comparisons

37

Monday, August 11, 14

Page 37: Machine Learning for Scientific Applications

[Scatter plots: MLS HCl (ppbv) versus HALOE HCl (ppbv) before and after neural network adjustment, each showing the data, the 1:1 line, and a weighted fit. Before: slope = 1.09, intercept = 0.070 ppbv. After NN adjustment: slope = 0.995, intercept = 0.0093 ppbv.]

HCl Inter-comparisons

37

Monday, August 11, 14

Page 38: Machine Learning for Scientific Applications

Neurological algorithms

[Diagram: Inputs → Process → Outputs]

38

Monday, August 11, 14

Page 39: Machine Learning for Scientific Applications

An example neural network

[Diagram: Inputs → Process → Outputs]

39

Monday, August 11, 14

Page 40: Machine Learning for Scientific Applications

An example neural network

[Diagram: Inputs → Process → Outputs]

39

Objective design of neural networks using genetic algorithms

Monday, August 11, 14
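The slide above mentions using genetic algorithms to design neural networks objectively. As a heavily simplified sketch of that idea (selection and mutation only, no crossover, toy data, cross-validated skill as the fitness; none of this is the actual procedure used in the talk), one could evolve the hidden-layer sizes of a network as follows:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

# Toy data standing in for a real training set.
rng = np.random.default_rng(6)
X = rng.uniform(-1.0, 1.0, size=(800, 4))
y = np.sin(2 * X[:, 0]) + X[:, 1] * X[:, 2]

def fitness(layers):
    """Cross-validated skill of a candidate architecture."""
    net = MLPRegressor(hidden_layer_sizes=tuple(layers), max_iter=1500, random_state=0)
    return cross_val_score(net, X, y, cv=3).mean()

# Evolve a small population of architectures: keep the fittest, mutate the rest.
population = [list(rng.integers(4, 64, size=2)) for _ in range(6)]
for generation in range(5):
    scored = sorted(population, key=fitness, reverse=True)
    best = scored[0]
    population = [best] + [
        [max(2, n + int(rng.integers(-8, 9))) for n in best] for _ in range(5)
    ]
    print(f"generation {generation}: best layers {best}, score {fitness(best):.3f}")
```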

Page 41: Machine Learning for Scientific Applications

An example neural network

40

Monday, August 11, 14

Page 42: Machine Learning for Scientific Applications

Re-calibration using a Neural Network

[Scatter plots of neural network outputs A versus targets T for HCl (v.m.r., of order 1e-9), each showing the data points, the best linear fit, and the A = T line. Training: linear fit A = (0.97)T + (5e-11), R = 0.98739. Validation: linear fit A = (0.98)T + (2.9e-11), R = 0.99232.]

41

Monday, August 11, 14
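A minimal sketch of this kind of neural-network re-calibration (illustrative only: the "instruments", contextual inputs, and data below are hypothetical stand-ins, not the actual HALOE/MLS processing): train a network to map one instrument's measurements, plus context, onto a reference instrument, then check the slope and intercept on held-out validation data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Hypothetical coincident data: instrument B is biased relative to reference A,
# with a bias that depends on context (potential temperature, latitude).
rng = np.random.default_rng(7)
n = 4000
theta = rng.uniform(400.0, 2000.0, n)            # potential temperature (K)
lat = rng.uniform(-80.0, 80.0, n)                # equivalent latitude (deg)
ref_a = rng.uniform(0.5, 3.5, n)                 # reference instrument (ppbv)
inst_b = (ref_a * (1.05 + 0.05 * (theta - 1200.0) / 800.0)
          + 0.07 * np.cos(np.radians(lat)) + rng.normal(0.0, 0.05, n))

X = np.column_stack([inst_b, theta, lat])
X_train, X_val, y_train, y_val = train_test_split(X, ref_a, random_state=7)

# The network learns the context-dependent mapping from instrument B onto A.
recal = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000,
                                   random_state=7))
recal.fit(X_train, y_train)

slope, intercept = np.polyfit(y_val, recal.predict(X_val), deg=1)
print(f"validation: slope = {slope:.3f}, intercept = {intercept:.3f} ppbv")
```

The point of splitting off a validation set, as on this slide and the next, is that the slope near 1 and intercept near 0 are demonstrated on data the network never saw during training.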

Page 43: Machine Learning for Scientific Applications

Re-calibration using a Neural Network

[Same outputs-versus-targets scatter plots as the previous slide, highlighting the validation panel.]

Totally independent validation

41

Monday, August 11, 14

Page 44: Machine Learning for Scientific Applications

Long-term continuity

42

Monday, August 11, 14

Page 45: Machine Learning for Scientific Applications

Long-term continuity

Applied Neural Network Re-calibration to HALOE

42

Monday, August 11, 14

Page 46: Machine Learning for Scientific Applications

[Figure: Monthly average Cly (v.m.r., up to about 4e-9) versus year (1995-2005) at 2° and 61° equivalent latitude, for the 800 K and 525 K levels, with curves for mean ages of air of 2 to 6 years.]

October

Use neural networks to infer Cly from HCl, CH4, ϕpv, and θ.

Long-term continuity for Cly

43

Monday, August 11, 14

Page 47: Machine Learning for Scientific Applications


43

Monday, August 11, 14

Page 48: Machine Learning for Scientific Applications


43

Monday, August 11, 14

Page 49: Machine Learning for Scientific Applications

44

Monday, August 11, 14

Page 50: Machine Learning for Scientific Applications

45

Monday, August 11, 14

Page 51: Machine Learning for Scientific Applications

Other uses of machine learning

• Cross calibration of vegetation indices from AVHRR, MODIS, SPOT and SeaWIFS

• Inferring CO2 fluxes from vegetation indices and surface temperature

• Inferring ocean pigment concentrations and other parameters

• Inferring drought stress and endophyte infection in cacao (cocoa)

• Learning the chaotically tumbling orbit of the Hubble Space Telescope

• Detecting online eBay fraud

• Acceleration of expensive code elements

46

Monday, August 11, 14

Page 52: Machine Learning for Scientific Applications

Another application: dissolved organic carbon

47

Monday, August 11, 14

Page 53: Machine Learning for Scientific Applications

48

Monday, August 11, 14


Page 61: Machine Learning for Scientific Applications

Method used to estimate DOC     R

SeaWiFS bands GP NL             0.99977
MODIS bands GP NL               0.9997
All bands GP NL                 0.99901
UV & SeaWiFS bands GP NL        0.99899
All bands NN                    0.95859
UV & SeaWiFS bands NN           0.94609
MODIS bands NN                  0.92585
SeaWiFS bands NN                0.91653

49

Monday, August 11, 14

Page 62: Machine Learning for Scientific Applications

[Taylor diagram summarising standard deviation, correlation coefficient, and RMSD for the candidate models A-H.]

Gaussian Process Models

50

Monday, August 11, 14

Page 63: Machine Learning for Scientific Applications

Relative Importance of the Inputs

Wavelength     Relative Importance

Rrs490         0.00087123
Rrs555         0.011976
Rrs670         1.5876
Rrs510         9.8423
Rrs443         13.0898
Rrs412         20.2553

The GPM hyper-parameters give an indication of the relative importance of the inputs. For the DOC SeaWiFS bands, the best inputs are those with the smallest values; here they are sorted in order of importance, from most important (top) to least important (bottom).

51

Monday, August 11, 14
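This is essentially automatic relevance determination. As an illustrative sketch (not the actual GPM used for the DOC work), an anisotropic RBF kernel in scikit-learn gives one fitted length scale per input; after training, a smaller length scale means the output varies more quickly with that input, i.e. the input matters more. The data and input names below are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic stand-in: four inputs, only the first two actually drive the output.
rng = np.random.default_rng(8)
X = rng.uniform(0.0, 1.0, size=(300, 4))
y = np.sin(6 * X[:, 0]) + 2.0 * X[:, 1] + 0.01 * rng.normal(size=len(X))

# One length scale per input (automatic-relevance-determination style kernel).
kernel = RBF(length_scale=np.ones(4))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=8)
gp.fit(X, y)

# Smaller fitted length scale => more relevant input (listed most relevant first).
for name, scale in sorted(zip(["x1", "x2", "x3", "x4"], gp.kernel_.length_scale),
                          key=lambda item: item[1]):
    print(f"{name}: fitted length scale = {scale:.3f}")
```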

Page 64: Machine Learning for Scientific Applications

[Scatter plot: salinity versus a412 - a443, with fits overlaid: polynomial (r2 = 0.928), NN (r2 = 0.933), SVM (r2 = 0.933).]

52

Monday, August 11, 14

Page 65: Machine Learning for Scientific Applications

Visibility

Variable   R

Td        -0.29
q         -0.26
T         -0.19
U         -0.18
RH         0.13
SLP        0.05

53

Monday, August 11, 14

Page 66: Machine Learning for Scientific Applications

High Resolution Identification of Dust Sources Using Machine Learning and Remote Sensing Data

Annette Walker and David J. Lary

A42A-08

Monday, August 11, 14

Page 67: Machine Learning for Scientific Applications

NRL High-resolution Dust Source Database

[Imagery: 20030820 NRL DEP over Iran and Pakistan.]

• 10 years of DEP (2 yr MSG/RGB) imagery
• COAMPS 10 m wind overlays
• Surface weather plots
• ENVI (GIS-like software)
• NGDC topographical 10° x 10° tiles
• Overlay 0.25° grid or use Google Earth (GE)

• Dust source area entered into database (cursor location tool = 1 km precision)
• Cross-correlate land and water features using maps, atlases, and Landsat images (detailed topographic, geographic, and geomorphic information, GE)
• Technical and governmental reports

Approach and Methodology

20110630 NRL MSG/RGB

Saudi Arabia

20030820 MODIS True Color

Monday, August 11, 14

Page 68: Machine Learning for Scientific Applications

NRL High-resolution Dust Source Database

20030820 NRL DEP20030820 NRL DEP

Iran

Pakistan

Iran

Pakistan

• 10 years of DEP (2 yr MSG/RGB) imagery• COAMPS 10 m wind overlays • Surface weather plots • ENVI (Gis-like software)• NGDC topographical 10ºX10º tiles• Overlay 0.25º grid or use Google Earth (GE)

• Dust source area entered into database (cursor location tool = 1km precision)• Cross-correlate land and water features using maps, atlases, Landsat images (detailed topographic, geographic, and geomorphic information, GE) • Technical and governmental reports

Approach and Methodology

20110630 NRL MSG/RGB

Saudi Arabia

20030820 MODIS True Color20030820 NRL DEP

Iran

Pakistan

Monday, August 11, 14

Page 69: Machine Learning for Scientific Applications

NRL High-resolution Dust Source Database

Solid red and purple shapes identify dust source areas located using DEP and MSG.

SW Asia DSD East Asia DSD

Mongolia

Saudi Arabia

Monday, August 11, 14

Page 70: Machine Learning for Scientific Applications

Self-Organizing Map

Self-organizing maps (SOMs) are a data visualization and unsupervised classification technique, invented by Professor Teuvo Kohonen (Kohonen 1982; 1990), that reduces the dimensions of data through the use of self-organizing neural networks.

They help us address the issue that humans simply cannot visualize high dimensional data.

Monday, August 11, 14

Page 71: Machine Learning for Scientific Applications

Self-Organizing Map

SOMs reduce dimensionality by producing a map that objectively plots the similarities of the data, grouping similar data items together.

SOMs learn to classify input vectors according to how they are grouped in the input space.

SOMs learn both the distribution and topology of the input vectors they are trained on. This allows SOMs to accomplish two things: reduce dimensions and display similarities.

Monday, August 11, 14
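To show what "grouping similar data items together on a grid" means in practice, here is a toy, from-scratch Python sketch of a SOM; it is not the implementation used for the MODIS classification, and the synthetic 7-band "spectra", grid size, and decay schedules are all assumptions made for the example.

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a toy self-organizing map: a grid of weight vectors is pulled
    towards the data so that nearby grid nodes represent similar inputs."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.uniform(0.0, 1.0, size=(rows, cols, data.shape[1]))
    # Grid coordinates, used by the neighbourhood function.
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            # Best-matching unit: the node whose weights are closest to x.
            dist = np.linalg.norm(weights - x, axis=2)
            bi, bj = np.unravel_index(np.argmin(dist), dist.shape)
            # Learning rate and neighbourhood radius decay during training.
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 0.5
            # Pull the BMU and its grid neighbours towards the input vector.
            grid_dist2 = (ii - bi) ** 2 + (jj - bj) ** 2
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, :, None] * (x - weights)
            step += 1
    return weights

# Illustrative use: cluster synthetic 7-band "reflectance" spectra.
rng = np.random.default_rng(1)
spectra = np.clip(rng.normal(0.3, 0.15, size=(500, 7)), 0.0, 1.0)
som = train_som(spectra)
# Classify one spectrum by its best-matching map node (its SOM class).
d = np.linalg.norm(som - spectra[0], axis=2)
print("SOM class of first spectrum:", np.unravel_index(np.argmin(d), d.shape))
```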

Page 72: Machine Learning for Scientific Applications

Detecting Dust Sources

Monday, August 11, 14

Page 73: Machine Learning for Scientific Applications

Self Organizing Map Classification

7 Bands

MODIS MCD43C3

bihemispherical reflectance

Monday, August 11, 14

Page 74: Machine Learning for Scientific Applications

All 1000-Classes mapped for North Africa

Monday, August 11, 14

Page 75: Machine Learning for Scientific Applications

Libyan Dust Event: May 9, 2010 (8Z - 12Z)

Jabal al Akhdar (الجبل الأخضر, Al Ǧabal al 'Aḫḍar; English: Green Mountains), a coastal mountain range with heights of 1.0-1.5 km.

Monday, August 11, 14


Page 84: Machine Learning for Scientific Applications

Plumes originate on the leeward side of Al Jabal al Akhdar, where drainage occurs along the slopes.

Corresponding SOM-Classes: 49, 93, 94

Libyan Dust Event: May 9, 2010 (6Z - 8Z)

Jabal al Akhdar (الجبل الأخضر, Al Ǧabal al 'Aḫḍar; English: Green Mountains), a coastal mountain range with heights of 1.0-1.5 km.

Monday, August 11, 14

Page 85: Machine Learning for Scientific Applications

Chad: Bodélé Depression Dust Event: March 16, 2010 (7Z -12Z)

Located at the southern edge of the Sahara Desert in north-central Africa, the Bodélé Depression is the lowest point in Chad. Dust storms from the Bodélé Depression occur on average about 100 days per year. The Bodélé Depression is a single spot in the Sahara that provides most of the mineral dust reaching the Amazon forest.

Monday, August 11, 14


Page 95: Machine Learning for Scientific Applications

Selected SOM Classes

Chad: Bodélé Depression

NRL MSG-RGB 20110109

The source area is not designated in the first pass of the MODIS reflectance and land-surface classification.

1000 SOM Classes

Monday, August 11, 14

Page 96: Machine Learning for Scientific Applications

Selected Classes with Class 137

Chad: Bodélé Depression

NRL MSG-RGB 20110109

Class 137 maps diatom sediment in depression.

1000 SOM Classes

Monday, August 11, 14

Page 97: Machine Learning for Scientific Applications

Selected Classes Without Class 137

Chad: Bodélé Depression

NRL MSG-RGB 20110109

1000 SOM Classes

Class 137 maps diatom sediment in depression.

Monday, August 11, 14

Page 98: Machine Learning for Scientific Applications

Solid black circles/ovals show plume source

Corresponding SOM Classes within open circles/ovals

Northern Sahara: 36, 40, 63, 100 Sahel: 147, 229, 230, 405

West Africa: Feb 2, 2011 13Z

Monday, August 11, 14

Page 99: Machine Learning for Scientific Applications

Selected Classes for North Africa (This involves 40 distinct classes)

Monday, August 11, 14

Page 100: Machine Learning for Scientific Applications

Jan 1, 2006 True Color

Jan 1, 2006 NRL DEP

Sources along New Mexico/Texas border

The North American sources have a different spectral signature from those we saw in SW Asia.

Agricultural areas on the high plains

Blue desert areas

Monday, August 11, 14

Page 101: Machine Learning for Scientific Applications

Sources in Arizona and Colorado

Apr 17, 2006 NRL DEP

Apr 17, 2006 True color

Monday, August 11, 14

Page 102: Machine Learning for Scientific Applications

Selected Classes for North America (n=64)

Monday, August 11, 14

Page 103: Machine Learning for Scientific Applications

All 1000-Classes mapped for South America

Monday, August 11, 14

Page 104: Machine Learning for Scientific Applications

All 1000-Classes mapped for South America

Blue-colored SOM-Classes are concentrated in the Atacama and Salar de Uyuni deserts

White areas are salt flats

Monday, August 11, 14

Page 105: Machine Learning for Scientific Applications

South America: Bolivia and Chile

July 18, 2010 MODIS Terra True Color

Monday, August 11, 14

Page 106: Machine Learning for Scientific Applications

South America: Bolivia and Chile

July 18, 2010 MODIS Terra True Color Selected SOM-Classes in 200s, 300s, and 400s

Monday, August 11, 14

Page 107: Machine Learning for Scientific Applications

• SOMs provide an effective mechanism for automating the identification of dust sources.

• Using SOMs lets us map dust sources globally at high resolution (1-10 km).

• This saved time in finding dust sources when comparing with satellite imagery.

• It can be done in real time, giving a dynamically changing map of dust sources.

Monday, August 11, 14

Page 108: Machine Learning for Scientific Applications

[Diagram labels: Model; Existing; New]

78

Monday, August 11, 14

Page 109: Machine Learning for Scientific Applications

[Diagram labels: Model; Existing; New]

• Personalized Health Care

• Proactive Health Care System

• Business Analytics

• Smart Logistics

• Disaster Response

• Fraud Detection

http://holistics3.com

Monday, August 11, 14

Page 110: Machine Learning for Scientific Applications

Visualization

Decision Support

Machine Learning

Insight & Discovery

Existing
• Social Media
• Socioeconomic, Census
• News feeds
• Environmental
• Weather
• Satellite
• Sensors
• Health
• Economic

New
• Business Analytics 2.0
• UAVs
• Hyper-spectral Imaging
• Smart Dust
• Wearable Sensors
• Autonomous Cars

Simulation
• Global Weather Models
• Economic Models
• Earthquake Models

GigaPop Pipe

TACC

Monday, August 11, 14