matthew greenstein | meteo 485 | apr. 26, 2004 using neural networks and lagged climate indices to...

Matthew Greenstein | METEO 485 | Apr. 26, 2004

Using Neural Networks and Lagged Climate Indices to Predict Monthly Temperature and Precipitation Anomalies

Overview

• To correlate monthly temperature and precipitation anomalies with a number of climate indices lagged several months

• To use neural networks because they simulate non-linear interactions between variables (as opposed to linear regression)

Overview

1. Introduction to neural networks

2. Data collection

• Temperature and precipitation anomalies

• Climate indices

3. Methods of attack (“how to”)

4. Results

5. Discussion

6. Future ideas

Neural Networks


• Creates categorical and numerical forecasts

• Uses categorical and numerical predictors


• Layered regression equations

• Predictors are linearly regressed (weighted) to create the hidden layer of intermediate forecasts

• Hidden layer forecasts used as predictors to produce either another hidden layer (and so on) or a final forecast


• Layered regression captures non-linear relationships, i.e. mimics whatever equation best fits the data

• You don’t need to know the form of the equation ahead of time

• Each dot is a node

• Human brain: 10 billion nodes

• Neural net: 10 – 1000 nodes
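The layered-regression idea above can be sketched as a tiny forward pass. This is only an illustration: the weights are made up, and the sigmoid applied at each hidden node is an assumption (the slides do not name an activation function). Some such non-linear step is what lets stacked weighted sums represent non-linear relationships.

```python
import math

def sigmoid(x):
    """Squashing function applied at each hidden node (assumed, not from the slides)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases, activation):
    """One layer: every node is a weighted sum of the inputs passed through an activation."""
    return [activation(sum(w * x for w, x in zip(node_w, inputs)) + b)
            for node_w, b in zip(weights, biases)]

def forward(predictors, hidden_w, hidden_b, out_w, out_b):
    """Predictors -> hidden layer of intermediate forecasts -> single numeric forecast."""
    hidden = layer(predictors, hidden_w, hidden_b, sigmoid)
    return layer(hidden, [out_w], [out_b], lambda z: z)[0]

# Hypothetical weights: 2 predictors -> 2 hidden nodes -> 1 forecast
hidden_w = [[0.5, -0.2], [0.1, 0.8]]
hidden_b = [0.0, 0.0]
out_w = [1.0, -1.0]
out_b = 0.0

forecast = forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
```

Adding another hidden layer just means feeding the output of one `layer` call into the next.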


Training a network

1. Training data (66% of dataset) run through neural net / forecasts generated

2. Error calculated → skill scores

3. Neural net tuned (weights changed) to improve scores

4. Repeat fixed number of times (epochs) or until weights stop changing



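$$w(t+1) = w(t) + \Delta w(t)$$

$$\Delta w(t) = -\eta \, \frac{\partial E}{\partial w} + \alpha \, \Delta w(t-1)$$

where $E$ is the error, $\eta$ the learning rate, and $\alpha$ the momentum.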

Learning rate: how much weights are changed compared to error slope

Momentum: use a portion of previous weight change for less “jumpiness”




Decay: eliminates useless weights / interactions
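As a rough illustration of the training loop and of how learning rate and momentum enter the weight update, here is a minimal sketch on a single weight. The error surface E(w) = (w - 2)^2, the function name `train_weight`, and the stopping tolerance are all assumptions chosen for the example; a real network updates many weights at once, and decay is not modeled here.

```python
def train_weight(grad, w0, learning_rate=0.3, momentum=0.3, epochs=500, tol=1e-10):
    """Gradient descent with momentum on a single weight.

    grad(w) returns the error slope dE/dw. Training stops after `epochs`
    passes or once two successive weight changes are both smaller than `tol`.
    """
    w, prev_delta = w0, 0.0
    for epoch in range(1, epochs + 1):
        # new change = step down the error slope + a portion of the previous change
        delta = -learning_rate * grad(w) + momentum * prev_delta
        w += delta
        if abs(delta) < tol and abs(prev_delta) < tol:
            break  # weights have stopped changing
        prev_delta = delta
    return w, epoch

# Toy error surface E(w) = (w - 2)^2, so dE/dw = 2 (w - 2); minimum at w = 2
w_final, epochs_used = train_weight(lambda w: 2.0 * (w - 2.0), w0=0.0)
```

A larger learning rate takes bigger steps per epoch; the momentum term smooths successive steps by carrying part of the last change forward.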


WEKA

• Waikato Environment for Knowledge Analysis (University of Waikato, New Zealand)

• Weka: flightless bird with an inquisitive nature found only in New Zealand

• Set values of learning rate, momentum, number of nodes, & epochs to fit data well without overfitting

• Overfitting = fit too perfectly to training data → performs poorly on new data

Data Collection

What data is needed?

• Monthly anomalies

• 6 regions of the U.S. (NW, SW, NC, SC, NE, SE)

• Temperature and precipitation

• U.S. Climate Division data since 1895 available

• Climate indices

• Monthly values lagged 2, 3, & 4 months

• Since 1948 available

Anomaly Data

• Divide country into 6 pieces (NW/SW/NC/SC/NE/SE)

Anomaly Data

• Obtained average monthly anomaly data for the U.S. Climate Divisions in each of the 6 regions

• Dataset from Jeremy Ross

• Averaged using GrADS

• Monthly, 1950 – present

• °F, inches

Climate Index Data

• Obtained from CDC’s climate indices page:

• http://www.cdc.noaa.gov/ClimateIndices/

• From 1950 – present

• SOI, PNA, NAO, EPO, MEI, Nino3, Nino1+2, Nino3.4, Nino4, AO, NOI, WP, NP, QBO

Climate Index Data

Some years & months missing!

• No SOI until 1951

• No AO until 1958

• No PNA for June & July

• No EPO for Aug & Sept

• WEKA throws out cases with missing data → no forecasts were made for Aug – Jan!! (the months whose 2- to 4-month lagged PNA or EPO values are missing)

• Need to re-run without PNA and EPO to get a neural net that can be used during any month
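To see why the gaps in PNA (June, July) and EPO (Aug, Sept) knock out Aug through Jan, it helps to shift each missing month forward by the three lags. The helper below is a small sketch with a made-up name, `affected_months`; it only does the calendar arithmetic.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def affected_months(missing, lags=(2, 3, 4)):
    """Forecast months that need an index value from one of the `missing` months."""
    hit = set()
    for month in missing:
        for lag in lags:
            # a value missing in `month` is the lag-`lag` predictor for this later month
            hit.add((MONTHS.index(month) + lag) % 12)
    return [MONTHS[i] for i in sorted(hit)]

# PNA is missing for Jun and Jul, EPO for Aug and Sep
lost = affected_months(["Jun", "Jul", "Aug", "Sep"])
```

Here `lost` covers Jan plus Aug through Dec, which is the Aug – Jan window noted on the slide.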

Data Processing

Excel file

• Row for each month (Jan 1950 – Dec 2000)

• Columns of month; each anomaly; and each index lagged 2, 3, and 4 months
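The row layout described above (month, anomaly, and each index lagged 2, 3, and 4 months) could be built along these lines. This is a sketch, not the spreadsheet actually used: the function name, the dictionary layout, the use of `None` for unavailable values, and the assumption that both series start in January are all illustrative.

```python
def lagged_rows(index_series, anomaly_series, lags=(2, 3, 4)):
    """Pair each month's anomaly with index values from 2, 3, and 4 months earlier.

    Both inputs are monthly values in chronological order, assumed to start
    in January. A lagged value that falls before the record begins is None.
    """
    rows = []
    for t, anomaly in enumerate(anomaly_series):
        lagged = [index_series[t - k] if t - k >= 0 else None for k in lags]
        rows.append({"month": t % 12 + 1, "lags": lagged, "anomaly": anomaly})
    return rows

# Hypothetical six-month record
index_values = [10, 11, 12, 13, 14, 15]
anomalies = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6]
rows = lagged_rows(index_values, anomalies)
```

With fourteen indices, the same lagging would simply be repeated per index, giving three predictor columns for each.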

Data Processing

Conversion to ARFF (Attribute-Relation File Format)

• Save as a CSV

• Fix blanks: ,, replaced by ,?,

• Change file extension: .csv → .arff
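The conversion steps above can be scripted. One caveat: an ARFF file also carries a header (`@relation`, one `@attribute` line per column, then `@data`) ahead of the comma-separated rows, and the slides do not say how that header was produced. The sketch below generates one and declares every column numeric, which is an assumption; the function name, relation name, and column names are made up for the example.

```python
import csv
import io

def csv_to_arff(csv_text, relation="indices"):
    """Turn CSV text (first row = column names) into ARFF text.

    Blank cells become '?', the ARFF marker for a missing value.
    Every column is declared numeric (an assumption for this sketch),
    and column names are assumed to contain no spaces.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in header]
    lines += ["", "@data"]
    for row in reader:
        lines.append(",".join(cell.strip() if cell.strip() else "?" for cell in row))
    return "\n".join(lines) + "\n"

# Hypothetical two-row sample with one blank cell
sample = "month,soi_lag2,ne_t\n1,0.5,-1.2\n2,,0.3\n"
arff_text = csv_to_arff(sample)
```

For the binary predictands of a classification run, the class column would need a nominal declaration such as `{0,1}` instead of `numeric`.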

Method I

• The following procedure was followed for each anomaly

(NE T, NE P, SE T, SE P, SW T, SW P, NW T, NW P)

• Build neural nets

• Vary learning rate (L), momentum (M), layers, epochs

• Decay

• Indices and month predict anomaly

• Takes a long time to try many possibilities

Method I

Skill scores

• Calculated with remaining 34% of dataset

• Many scores provided

• 2 used

1. Correlation coefficient (r)

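2. Root relative squared error (RRSE)

• Relative to error if prediction = average of actual values

• Outliers are penalized strongly

$$\mathrm{RRSE} = \sqrt{\frac{\sum_i (p_i - a_i)^2}{\sum_i (a_i - \bar{a})^2}}$$

where $p_i$ is the predicted value, $a_i$ the actual value, and $\bar{a}$ the average of the actual values.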
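Both skill scores are easy to compute by hand, which helps when reading the results: an RRSE of 100% means the forecasts are no better than always predicting the average of the actual values. The sketch below uses invented function names and toy numbers; it is not WEKA's own code.

```python
import math

def correlation(pred, actual):
    """Pearson correlation coefficient r between predictions and actual values."""
    n = len(actual)
    mean_p, mean_a = sum(pred) / n, sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(pred, actual))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)

def rrse(pred, actual):
    """Root relative squared error in percent, relative to predicting the mean."""
    mean_a = sum(actual) / len(actual)
    num = sum((p - a) ** 2 for p, a in zip(pred, actual))
    den = sum((a - mean_a) ** 2 for a in actual)
    return 100.0 * math.sqrt(num / den)

# Toy data: predicting the mean of the actual values scores exactly 100%
actual = [1.0, 2.0, 3.0, 4.0]
mean_forecast = [2.5, 2.5, 2.5, 2.5]
baseline_rrse = rrse(mean_forecast, actual)
```

Because the errors are squared before summing, a single large miss raises RRSE much more than several small ones.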

Results I

NE Temperature

• Linear regression: r = 0.1067, RRSE = 102.25%

• Neural nets:

Neural Net Layers   8,4,2      9,6
Momentum            0.3        0.3
Learning Rate       0.3        0.3
Epochs              500        500
r                   -0.0205    -0.1190
RRSE                100.25%    100.30%

Results I

SE Temperature

Neural Net Layers   4,2        15,7
Momentum            0.3        0.3
Learning Rate       0.2        0.2
Epochs              500        500
r                   0.0372     -0.0389
RRSE                100.13%    100.74%

Results I

SW Temperature

Neural Net Layers   9,3        9,3
Momentum            0.2        0.4
Learning Rate       0.2        0.2
Epochs              500        500
r                   0.063      0.025
RRSE                99.99%     99.97%

Results I

NW Temperature

Neural Net Layers   Auto       9,5
Momentum            0.05       0.8
Learning Rate       0.2        0.2
Epochs              500        500
r                   0.176      0.224
RRSE                98.66%     99.08%

Results I

NE Precipitation

Neural Net Layers   5,3        5,3
Momentum            0.5        0.2
Learning Rate       0.2        0.1
Epochs              500        2000
r                   0.061      0.115
RRSE                99.732%    99.999%

Results I

SE Precipitation

Neural Net Layers   8,3        Auto
Momentum            0.7        0.5
Learning Rate       0.2        0.2
Epochs              500        500
r                   0.0651     0.1179
RRSE                99.85%     101.05%

Results I

SW Precipitation

Neural Net Layers   9,5        9,5
Momentum            0.5        0.3
Learning Rate       0.2        0.2
Epochs              500        500
r                   0.280      0.289
RRSE                92.25%     96.12%

Results I

NW Precipitation

Neural Net Layers   9,5        Auto
Momentum            0.3        0.3
Learning Rate       0.5        0.6
Epochs              500        500
r                   0.052      0.098
RRSE                99.91%     102.71%

Results I

• Putrid results!!

• Not worth trying NC/SC… away from oceans

• RRSE ~ 100%, r ~ 0.10

• No big improvement over linear regression

• SW Precipitation predicted the best (although still bad)… El Nino-related?

Method II

• Predict positive or negative anomaly instead of actual value!

• Anomalies changed to binary (1, 0) predictands

• Vary indices used

• Does that cause significant changes?

• This became the most interesting part of the study

• Limited time available: NE T, NE P, SW P
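Recoding the anomalies as binary predictands is a one-line transformation. In the sketch below, the function name, the treatment of an exactly zero anomaly as 0, and the use of `None` for a missing value are assumptions; the slides do not say how those cases were handled.

```python
def to_binary(anomalies):
    """Map each anomaly to 1 (positive) or 0 (not positive); missing values stay None."""
    return [None if a is None else (1 if a > 0 else 0) for a in anomalies]

# Hypothetical temperature anomalies in °F
signs = to_binary([1.2, -0.4, 0.0, None, 3.1])
```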

Method II

Skill scores

• Many scores provided

• 3 used

1. Percent Correctly Classified

2. TP (True Positive) Rate

3. TN (True Negative) Rate
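The three scores follow directly from counting hits among the positive and negative cases. A minimal sketch, with an invented function name and toy labels (1 = positive anomaly, 0 = negative):

```python
def classification_scores(predicted, actual):
    """Percent correctly classified, true positive rate, and true negative rate."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)  # positives forecast as positive
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)  # negatives forecast as negative
    n_pos = sum(1 for a in actual if a == 1)
    n_neg = len(actual) - n_pos
    return {
        "percent_correct": 100.0 * (tp + tn) / len(actual),
        "tp_rate": tp / n_pos,
        "tn_rate": tn / n_neg,
    }

# Toy example: 3 positive cases, 2 negative cases
scores = classification_scores(predicted=[1, 1, 0, 0, 1], actual=[1, 1, 1, 0, 0])
```

Reading the TP and TN rates together matters: a net that forecasts "positive" almost every time gets a high TP rate and a low TN rate without showing any real skill.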

Results II

NE Temperature

Neural Net Setup        Momentum   Learning Rate   Epochs   Classified Correctly   TP Rate   TN Rate
Auto                    0.5        0.5             500      56.04%                 .617      .513
No Month                0.5        0.5             500      54.59%                 .628      .478
Only Nino: Nino 3.4     0.5        0.5             500      51.69%                 .606      .442
No Nino’s               0.5        0.5             500      54.59%                 .617      .487
No Nino’s, SOI, MEI     0.5        0.5             500      53.62%                 .553      .522
More Epochs             0.5        0.5             1000     56.52%                 .628      .513

Auto = WEKA automatically chooses node setup


NE Precipitation

Neural Net Setup        Momentum   Learning Rate   Epochs   Classified Correctly   TP Rate   TN Rate
Auto                    0.3        0.3             500      54.11%                 .664      .402
No Month                0.3        0.3             500      48.79%                 .573      .392
Only Nino: Nino 3.4     0.3        0.3             500      53.14%                 .691      .351
No Nino’s               0.3        0.3             500      49.28%                 .645      .320
(unlabeled)             0.3        0.3             500      51.21%                 .755      .237
Auto **                 0.3        0.3             1000     53.62%                 .645      .412

Learning rate is 0.2 in every column.

** Changing the epochs results in overfitting!


SW Precipitation

Neural Net Setup          Momentum   Learning Rate   Epochs   Classified Correctly   TP Rate   TN Rate
Auto                      0.1        0.2             500      60.39%                 .611      .596
No Month                  0.1        0.2             500      60.39%                 .611      .596
Only Nino: Nino 3.4 **    0.1        0.2             500      62.32%                 .646      .596
No Nino’s                 0.1        0.2             500      61.83%                 .628      .606
(unlabeled)               0.1        0.2             500      59.42%                 .841      .298

** Changing the epochs did not change the ‘Only Nino: Nino 3.4’ value

Discussion

• NE Temperature: 94 +, 113 –

• Always predicting negative is correct 54.59% of the time

• Best neural net: 56.52% correctly classified

• NE Precipitation: 110 +, 97 –

• Always predicting positive is correct 53.14% of the time

• Best neural net: 54.11% correctly classified

• SW Precipitation: 113 +, 94 –

• Always predicting positive is correct 54.59% of the time

• Best neural net: 62.32% correctly classified
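The "guessing" percentages above come from always forecasting the more common sign in the test set. The short sketch below reproduces that arithmetic from the counts listed and sets it beside the best neural net score for each predictand; the function and variable names are made up for the example.

```python
def majority_baseline(n_pos, n_neg):
    """Percent correct from always forecasting the more common sign."""
    return 100.0 * max(n_pos, n_neg) / (n_pos + n_neg)

# (positive cases, negative cases, best neural net % correct) from the slides
cases = {
    "NE Temperature": (94, 113, 56.52),
    "NE Precipitation": (110, 97, 54.11),
    "SW Precipitation": (113, 94, 62.32),
}

gains = {}
for name, (n_pos, n_neg, best_net) in cases.items():
    baseline = majority_baseline(n_pos, n_neg)
    gains[name] = best_net - baseline  # percentage points gained over guessing
    print(f"{name}: baseline {baseline:.2f}%, best net {best_net:.2f}%, "
          f"gain {gains[name]:+.2f} points")
```

Only SW Precipitation beats its baseline by more than about two percentage points.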


• These types of neural nets do not provide significant skill over ‘guessing’

• Similar to Method I: no significant difference in skill between logistic regression and neural nets

• There is some sensitivity to which variables are included in the neural net… even though the decay factor would attempt to eliminate useless interactions

• Different sensitivities in each region

• Using the ‘auto’ setting for layers produced better results


• The study was originally supposed to predict the anomaly, but predicting the sign of the anomaly seems to show more promise

• Time constraints prevented a more in-depth look at Method II → possible Meteo 485 project in future semesters

• Missing June – Sept data could have caused problems with this study

Future Work

1. Obtain missing PNA & EPO data

2. Build neural nets for other regions of the country for Method II

3. Use different lag times and combinations of lag times

4. Use different climate indices

5. Omit different indices from current set

6. Try other tools that WEKA offers

Special thanks to…

• Jeremy Ross

For gathering anomaly data

• Climate Diagnostics Center (CDC)

For climate indices

• Dr. George Young

Neural net info from Meteo 474 notes

Useful Info

• WEKA website with software downloads: http://www.cs.waikato.ac.nz/ml/weka/

• Results data file

• ARFF file: indicesbinary.arff
