improving subseasonal forecasting in the western u.s. with ... · the subseasonal climate forecast...

1
Improving Subseasonal Forecasting in the Western U.S. with Machine Learning Paulo Orenstein * , Jessica Hwang * , Karl Pfeiffer , Judah Cohen and Lester Mackey * Stanford University, Atmospheric and Environmental Research, Microsoft Research Summary To improve the accuracy of long-term weather forecasts, the Bureau of Reclamation (USBR) and the National Oceanic and Atmospheric Administration (NOAA) launched the Subseasonal Climate Forecast Rodeo, a year-long real-time forecasting challenge, in which participants aimed to predict temperature and precipitation in the western U.S. two to four weeks and four to six weeks in advance. Our machine learning solution is an ensemble of two mod- els: (i) MultiLLR, a local linear regression with multitask model selection; and (ii) AutoKNN, a multitask k -nearest neighbor autoregression. The ensemble significantly out- performs the government baselines as well as the top Rodeo competitor for each target variable and forecast horizon. We also release our SubseasonalRodeo dataset, collected to train and evaluate our forecasting system. Forecast Rodeo Goal Water managers in the western U.S. rely on sub- seasonal forecasts of temperature and precipitation to prepare for droughts and other wet weather extremes. This motivated the USBR and NOAA to conduct the Subseasonal Climate Forecast Rodeo: a real-time fore- casting competition in which, every two weeks, contes- tants submitted four forecasts: for each variable temp: average temperature ( C) prec: total precipitation (mm), predict it at each of two forecast horizons, weeks 3-4: 15-28 days ahead weeks 5-6: 29-42 days ahead. The geographic region of interest is shown in Figure 1, with a total of G = 514 grid points. The contest ran from April 18, 2017 to April 3, 2018. Fig 1: Forecast Rodeo prediction region. Evaluation Let t be a date, and let year(t ), doy(t ), and monthday(t ) denote the associated year, day of the year, and month-day combination (e.g., January 1). For the two-week period beginning on t , associate an observed average temperature or total precipitation y t R G and an observed anomaly and climatology a t = y t - c monthday(t ) , c d , 1 30 X t : monthday(t )=d, 1981year(t )2010 y t . Contestants were judged on the skill between their fore- cast anomalies ˆ a t y t -c monthday(t ) and observed anoma- lies: skill(ˆ a t , a t ) , cos(ˆ a t , a t )= hˆ a t , a t i kˆ a t k 2 ka t k 2 . The overall contest skill was the mean over all dates. Baselines To qualify for a prize, contestants had to achieve higher mean skill than two government bench- marks: (i) a debiased 32-member ensemble version of the operational, physics-based U.S. Climate Forecasting System (CFSv2), and (ii) a damped persistence forecast. SubseasonalRodeo Dataset As the Rodeo did not provide data for training predictive models, we constructed our own dataset from a diverse collection of data sources (see [1] for details): temperature: daily maximum and minimum tempera- ture measurements at 2 meters, in C; the official con- test target temperature variable is tmp2m , tmax+tmin 2 ; precipitation: daily precipitation, in mm; sea surface temperature (SST), sea ice concen- tration (SIC): daily; top three principal components for grid points in the Pacific basin region; Multivariate ENSO index (MEI): bimonthly values; MEI is a scalar summary of the El Ni˜ no/Southern Os- cillation, an ocean-atmosphere coupled climate mode; Madden-Julian oscillation (MJO): daily values of phase and amplitude; MJO is a metric of tropical con- vection influencing western U.S. subseasonal climate; relative humidity and pressure: daily relative humid- ity and pressure near the surface (sigma level 0.995); geopotential height: daily; top three principal com- ponents for height at which 10mb of pressure occurs; North American Multi-Model Ensemble (NMME): monthly mean forecast for each model in a collection of 10 physics-based models were averaged according to the number of target period days that fell in each month; the predictions from each model were then averaged. Models Our forecasting system relies on two regression models. MultiLLR: multitask model selection with full data. predictors : lagged measurements from all data sources in the SubseasonalRodeo dataset; method : multitask backward stepwise procedure adapted to cosine objective, built atop per-grid-point local linear regression (with locality determined by day of the year). AutoKNN: autoregression model with k NN features. predictors : lagged measurements of target variable and anomalies of the 20 most similar historical dates (simi- larity measured as skill between anomaly trajectory lead- ing up to target date and up to each historical date); method : weighted local linear regression, with weights given by 1/(variance of the target anomaly vector). Ensembling For a given target date, the ensemble fore- cast anomalies are the average of the 2 -normalized pre- dicted anomalies of the two models: ˆ a ensemble , 1 2 ˆ a multillr k|ˆ a multillr k 2 + 1 2 ˆ a autoknn kˆ a autoknn k 2 . Prop. Skill of ˆ a ensemble is better than the average skill of ˆ a multillr and ˆ a autoknn whenever that average skill is positive. nbr Dec nbr Nov nbr Oct nbr Sep nbr Aug nbr Jul nbr Jun nbr May nbr Apr nbr Mar nbr Feb nbr Jan target Jan target Feb target Mar target Apr target May target Jun target Jul target Aug target Sep target Oct target Nov target Dec Fig 2: Prec 3-4: Distribution of the month of the most similar neighbor learned by AutoKNN as a function of month of target date. precip_shift86 phase_shift31 precip_shift86_anom rhum_shift44 nmme0_wo_ccsm3_nasa tmp2m_shift86_anom icec_2_shift44 mei_shift59 sst_2_shift44 precip_shift43_anom wind_hgt_10_1_shift44 wind_hgt_10_2_shift44 precip_shift43 tmp2m_shift43_anom icec_1_shift44 icec_3_shift44 sst_1_shift44 sst_3_shift44 tmp2m_shift43 tmp2m_shift86 ones nmme_wo_ccsm3_nasa pres_shift44 0 20 40 60 80 inclusion frequency precipitation, weeks 5-6 phase_shift31 wind_hgt_10_1_shift44 icec_3_shift44 icec_2_shift44 mei_shift59 wind_hgt_10_2_shift44 tmp2m_shift43_anom rhum_shift44 sst_2_shift44 icec_1_shift44 sst_1_shift44 nmme0_wo_ccsm3_nasa tmp2m_shift43 sst_3_shift44 nmme_wo_ccsm3_nasa tmp2m_shift86 tmp2m_shift86_anom pres_shift44 ones 0 30 60 90 120 inclusion frequency temperature, weeks 5-6 Fig 3: Frequency with which each candidate feature was selected in MultiLLR. Pressure, temperature, intercept and NMME are common. Results Contest-period skills The table below shows the aver- age skills for each of our methods and each of the base- lines over the contest period, April 2017 to April 2018. All three of our methods (MultiLLR, AutoKNN, ensemble) outperform the official contest baselines (debiased CFSv2 and damped persistence), and our ensemble outperforms the top Rodeo competitor in all four prediction tasks. task multillr autoknn ens cfsv2 damp top temp 3-4 0.285 0.280 0.341 0.158 0.195 0.2855 temp 5-6 0.237 0.281 0.307 0.219 -0.076 0.235 prec 3-4 0.167 0.215 0.238 0.071 -0.146 0.214 prec 5-6 0.221 0.187 0.241 0.022 -0.161 0.216 Historical forecast evaluation In the period from April 2011 to April 2017, AutoKNN and the ensembled model surpass the reconstructed debiased CFSv2 baseline. Fur- ther ensembling with CFSv2 gives even better skill. task multillr autoknn ens cfsv2 ens-cfsv2 temp 3-4 0.223 0.311 0.307 0.256 0.351 temp 5-6 0.220 0.281 0.296 0.214 0.328 precip 3-4 0.157 0.151 0.189 0.086 0.196 precip 5-6 0.131 0.140 0.170 0.069 0.176 Fig 4: Performance of ens, cfs, and ens-cfs in the historical period. Conclusion Main takeaways from our work: Statistical models offer significant subseasonal forecast improvements over operational physics-based models. Ensembling statistical and physics-based forecasts can produce further improvements. New SubseasonalRodeo dataset for training and eval- uation (https://doi.org/10.7910/DVN/IHBANG). Main References [1 ] J Hwang, P Orenstein, K Pfeiffer, J Cohen, and L Mackey. 2018. Improving Subseasonal Forecasting in the Western US with Machine Learning. arXiv:1809.07394. [2] K Nowak, RS Webb, R Cifelli, and LD Brekke. 2017. Sub-Seasonal Climate Forecast Rodeo. In 2017 AGU Fall Meeting, New Orleans, LA, 11-15 Dec.

Upload: others

Post on 07-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Improving Subseasonal Forecasting in the Western U.S. with ... · the Subseasonal Climate Forecast Rodeo, a year-long real-time forecasting challenge, in which participants aimed

Improving Subseasonal Forecasting in the Western U.S. with Machine Learning

Paulo Orenstein*, Jessica Hwang*, Karl Pfeiffer†, Judah Cohen† and Lester Mackey‡

* Stanford University, † Atmospheric and Environmental Research, ‡ Microsoft Research

SummaryTo improve the accuracy of long-term weather forecasts,the Bureau of Reclamation (USBR) and the NationalOceanic and Atmospheric Administration (NOAA) launchedthe Subseasonal Climate Forecast Rodeo, a year-longreal-time forecasting challenge, in which participants aimedto predict temperature and precipitation in the westernU.S. two to four weeks and four to six weeks in advance.Our machine learning solution is an ensemble of two mod-els: (i) MultiLLR, a local linear regression with multitaskmodel selection; and (ii) AutoKNN, a multitask k-nearestneighbor autoregression. The ensemble significantly out-performs the government baselines as well as the topRodeo competitor for each target variable and forecasthorizon. We also release our SubseasonalRodeo dataset,collected to train and evaluate our forecasting system.

Forecast RodeoGoal Water managers in the western U.S. rely on sub-seasonal forecasts of temperature and precipitation toprepare for droughts and other wet weather extremes.This motivated the USBR and NOAA to conduct theSubseasonal Climate Forecast Rodeo: a real-time fore-casting competition in which, every two weeks, contes-tants submitted four forecasts: for each variable

• temp: average temperature (◦C)

•prec: total precipitation (mm),

predict it at each of two forecast horizons,

•weeks 3-4: 15-28 days ahead

•weeks 5-6: 29-42 days ahead.

The geographic region of interest is shown in Figure 1,with a total of G = 514 grid points. The contest ranfrom April 18, 2017 to April 3, 2018.

Fig 1: Forecast Rodeo prediction region.

Evaluation Let t be a date, and let year(t), doy(t),and monthday(t) denote the associated year, day of theyear, and month-day combination (e.g., January 1). Forthe two-week period beginning on t, associate an observedaverage temperature or total precipitation yt ∈ RG andan observed anomaly and climatology

at = yt − cmonthday(t), cd ,1

30

∑t : monthday(t)=d,1981≤year(t)≤2010

yt.

Contestants were judged on the skill between their fore-cast anomalies at = yt−cmonthday(t) and observed anoma-lies:

skill(at, at) , cos(at, at) =〈at, at〉‖at‖2‖at‖2

.

The overall contest skill was the mean over all dates.

Baselines To qualify for a prize, contestants had toachieve higher mean skill than two government bench-marks: (i) a debiased 32-member ensemble version ofthe operational, physics-based U.S. Climate ForecastingSystem (CFSv2), and (ii) a damped persistence forecast.

SubseasonalRodeo DatasetAs the Rodeo did not provide data for training predictivemodels, we constructed our own dataset from a diversecollection of data sources (see [1] for details):

• temperature: daily maximum and minimum tempera-ture measurements at 2 meters, in ◦C; the official con-test target temperature variable is tmp2m , tmax+tmin

2 ;

•precipitation: daily precipitation, in mm;

• sea surface temperature (SST), sea ice concen-tration (SIC): daily; top three principal componentsfor grid points in the Pacific basin region;

•Multivariate ENSO index (MEI): bimonthly values;MEI is a scalar summary of the El Nino/Southern Os-cillation, an ocean-atmosphere coupled climate mode;

•Madden-Julian oscillation (MJO): daily values ofphase and amplitude; MJO is a metric of tropical con-vection influencing western U.S. subseasonal climate;

• relative humidity and pressure: daily relative humid-ity and pressure near the surface (sigma level 0.995);

•geopotential height: daily; top three principal com-ponents for height at which 10mb of pressure occurs;

•North American Multi-Model Ensemble (NMME):monthly mean forecast for each model in a collectionof 10 physics-based models were averaged according tothe number of target period days that fell in each month;the predictions from each model were then averaged.

ModelsOur forecasting system relies on two regression models.

MultiLLR: multitask model selection with full data.

• predictors : lagged measurements from all data sourcesin the SubseasonalRodeo dataset;

•method : multitask backward stepwise procedure adaptedto cosine objective, built atop per-grid-point local linearregression (with locality determined by day of the year).

AutoKNN: autoregression model with kNN features.

• predictors : lagged measurements of target variable andanomalies of the 20 most similar historical dates (simi-larity measured as skill between anomaly trajectory lead-ing up to target date and up to each historical date);

•method : weighted local linear regression, with weightsgiven by 1/(variance of the target anomaly vector).

Ensembling For a given target date, the ensemble fore-cast anomalies are the average of the `2-normalized pre-dicted anomalies of the two models:

aensemble ,1

2

amultillr

‖|amultillr‖2+

1

2

aautoknn

‖aautoknn‖2.

Prop. Skill of aensemble is better than the average skill ofamultillr and aautoknn whenever that average skill is positive.

nbr

Dec

nbr N

ov

nbr O

ct

nbr S

ep

nbr Aug

nbr Jul

nbr Junnbr Maynbr Apr

nbr Mar

nbr Feb

nbr Jan

targ

et J

anta

rget

Feb

targ

et M

ar

target A

pr target May target Jun target Jul target Aug target Sep target Oct target N

ovtarget D

ec

Fig 2: Prec 3-4: Distribution of the month of the most similar neighbor

learned by AutoKNN as a function of month of target date.

precip_shift86phase_shift31

precip_shift86_anomrhum_shift44

nmme0_wo_ccsm3_nasatmp2m_shift86_anom

icec_2_shift44mei_shift59

sst_2_shift44precip_shift43_anom

wind_hgt_10_1_shift44wind_hgt_10_2_shift44

precip_shift43tmp2m_shift43_anom

icec_1_shift44icec_3_shift44sst_1_shift44sst_3_shift44tmp2m_shift43tmp2m_shift86

onesnmme_wo_ccsm3_nasa

pres_shift44

0 20 40 60 80inclusion frequency

precipitation, weeks 5-6

phase_shift31

wind_hgt_10_1_shift44

icec_3_shift44

icec_2_shift44

mei_shift59

wind_hgt_10_2_shift44

tmp2m_shift43_anom

rhum_shift44

sst_2_shift44

icec_1_shift44

sst_1_shift44

nmme0_wo_ccsm3_nasa

tmp2m_shift43

sst_3_shift44

nmme_wo_ccsm3_nasa

tmp2m_shift86

tmp2m_shift86_anom

pres_shift44

ones

0 30 60 90 120inclusion frequency

temperature, weeks 5-6

Fig 3: Frequency with which each candidate feature was selected in

MultiLLR. Pressure, temperature, intercept and NMME are common.

ResultsContest-period skills The table below shows the aver-age skills for each of our methods and each of the base-lines over the contest period, April 2017 to April 2018. Allthree of our methods (MultiLLR, AutoKNN, ensemble)outperform the official contest baselines (debiased CFSv2and damped persistence), and our ensemble outperformsthe top Rodeo competitor in all four prediction tasks.

task multillr autoknn ens cfsv2 damp top

temp 3-4 0.285 0.280 0.341 0.158 0.195 0.2855temp 5-6 0.237 0.281 0.307 0.219 -0.076 0.235prec 3-4 0.167 0.215 0.238 0.071 -0.146 0.214prec 5-6 0.221 0.187 0.241 0.022 -0.161 0.216

Historical forecast evaluation In the period from April2011 to April 2017, AutoKNN and the ensembled modelsurpass the reconstructed debiased CFSv2 baseline. Fur-ther ensembling with CFSv2 gives even better skill.

task multillr autoknn ens cfsv2 ens-cfsv2

temp 3-4 0.223 0.311 0.307 0.256 0.351temp 5-6 0.220 0.281 0.296 0.214 0.328precip 3-4 0.157 0.151 0.189 0.086 0.196precip 5-6 0.131 0.140 0.170 0.069 0.176

Fig 4: Performance of ens, cfs, and ens-cfs in the historical period.

Conclusion Main takeaways from our work:

•Statistical models offer significant subseasonal forecastimprovements over operational physics-based models.

•Ensembling statistical and physics-based forecasts canproduce further improvements.

•New SubseasonalRodeo dataset for training and eval-uation (https://doi.org/10.7910/DVN/IHBANG).

Main References

[1 ] J Hwang, P Orenstein, K Pfeiffer, J Cohen, and LMackey. 2018. Improving Subseasonal Forecasting inthe Western US with Machine Learning. arXiv:1809.07394.

[2 ] K Nowak, RS Webb, R Cifelli, and LD Brekke. 2017.Sub-Seasonal Climate Forecast Rodeo. In 2017 AGUFall Meeting, New Orleans, LA, 11-15 Dec.