
Five-year Progress in the Performance of Air Quality Forecast Models: Analysis on Categorical Statistics for the National Air Quality Forecast Capacity (NAQFC)

Daiwen Kang1, Rohit Mathur2, Brian Eder2, Kenneth Schere2, and S. Trivikrama Rao2

1Computer Science Corporation

2Atmospheric Modeling and Analysis Division, NERL/U.S. EPA

8th Annual CMAS Conference, Chapel Hill, NC, October 19 – 21, 2009

Motivations

• Assess the progress in performance improvements for categorical metrics of the NAQFC system for O3 forecasts over the past 5 years

• Identify the categorical metrics that best characterize AQF performance for categorical forecasts

• Assess AQI-based categorical performances

• Propose guidelines for AQF categorical evaluations based on the analysis of KF bias-adjusted forecasts and human forecasts.

Traditional Categorical Metrics

Observed Exceedances & Non-Exceedances versus Forecast Exceedances & Non-Exceedances

                            Observed Exceedance
                            No        Yes
Forecast Exceedance   Yes   a         b
                      No    c         d

Accuracy (A):

$$A = \frac{b + c}{a + b + c + d} \times 100\%$$

Bias (B):

$$B = \frac{a + b}{b + d}$$

False Alarm Ratio (FAR):

$$FAR = \frac{a}{a + b} \times 100\%$$

Critical Success Index (CSI):

$$CSI = \frac{b}{a + b + d} \times 100\%$$

Hit Rate (H):

$$H = \frac{b}{b + d} \times 100\%$$
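The traditional metrics can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for the NAQFC: the function names, the sample values, the threshold of 75, and the choice to count a value as an exceedance when it is strictly greater than the threshold are all assumptions made for the example.

```python
import numpy as np

def contingency_counts(obs, fcst, threshold):
    """Count the four quadrants of the exceedance contingency table."""
    obs_exc = np.asarray(obs) > threshold    # assumption: exceedance means > threshold
    fcst_exc = np.asarray(fcst) > threshold
    a = int(np.sum(fcst_exc & ~obs_exc))     # forecast exceedance, none observed
    b = int(np.sum(fcst_exc & obs_exc))      # forecast and observed exceedance
    c = int(np.sum(~fcst_exc & ~obs_exc))    # neither forecast nor observed
    d = int(np.sum(~fcst_exc & obs_exc))     # observed exceedance, not forecast
    return a, b, c, d

def categorical_metrics(a, b, c, d):
    """A, FAR, CSI and H in percent; B as a ratio.

    Assumes at least one forecast and one observed exceedance,
    otherwise the divisions below fail.
    """
    return {
        "A": 100.0 * (b + c) / (a + b + c + d),
        "B": (a + b) / (b + d),
        "FAR": 100.0 * a / (a + b),
        "CSI": 100.0 * b / (a + b + d),
        "H": 100.0 * b / (b + d),
    }

# Hypothetical paired values, for illustration only.
obs = [60, 80, 90, 50, 70, 95]
fcst = [78, 85, 70, 55, 80, 99]
a, b, c, d = contingency_counts(obs, fcst, threshold=75)
metrics = categorical_metrics(a, b, c, d)
```

With these sample values the counts are a = 2, b = 2, c = 1, d = 1, so B is greater than 1 (more exceedances forecast than observed) while H is about 67%.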

AQI Definition and Categories

Air Quality Index (AQI) Values    Levels of Health Concern          Colors
When the AQI is in this range:    ...air quality conditions are:    ...as symbolized by this color:
0 to 50                           Good                              Green (1)
51 to 100                         Moderate                          Yellow (2)
101 to 150                        Unhealthy for Sensitive Groups    Orange (3)
151 to 200                        Unhealthy                         Red (4)
201 to 300                        Very Unhealthy                    Purple (5)
301 to 500                        Hazardous                         Maroon (6)

$$I_p = \frac{I_{Hi} - I_{Lo}}{BP_{Hi} - BP_{Lo}} \left(C_p - BP_{Lo}\right) + I_{Lo}$$

Where:

I_p = the index for pollutant p (O3 in this case)

C_p = the rounded concentration of pollutant p

BP_Hi = the breakpoint that is ≥ C_p

BP_Lo = the breakpoint that is ≤ C_p

I_Hi = the AQI value corresponding to BP_Hi

I_Lo = the AQI value corresponding to BP_Lo
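The AQI formula is a linear interpolation between two breakpoints, which the following sketch illustrates. The breakpoint values in `ILLUSTRATIVE_BREAKPOINTS` are placeholders chosen for the example and are not taken from these slides; the official breakpoint tables should be consulted for real use. The function name and the use of Python's built-in `round` (which rounds halves to the nearest even integer and may differ from the official rounding convention) are likewise assumptions.

```python
def aqi_from_concentration(c_p, breakpoints):
    """Return (AQI value, category number) for a pollutant concentration.

    breakpoints: list of (BP_Lo, BP_Hi, I_Lo, I_Hi) tuples ordered from the
    lowest category to the highest.
    """
    c = round(c_p)  # C_p is the rounded concentration
    for category, (bp_lo, bp_hi, i_lo, i_hi) in enumerate(breakpoints, start=1):
        if bp_lo <= c <= bp_hi:
            index = (i_hi - i_lo) / (bp_hi - bp_lo) * (c - bp_lo) + i_lo
            return round(index), category
    raise ValueError(f"concentration {c_p} is outside the breakpoint table")

# Placeholder breakpoints (concentration units arbitrary), illustration only.
ILLUSTRATIVE_BREAKPOINTS = [
    (0, 59, 0, 50),      # category 1 (Green)
    (60, 75, 51, 100),   # category 2 (Yellow)
    (76, 95, 101, 150),  # category 3 (Orange)
]

example = aqi_from_concentration(68, ILLUSTRATIVE_BREAKPOINTS)
```

For a concentration of 68 with these placeholder breakpoints, the interpolation gives (100 - 51) / (75 - 60) × (68 - 60) + 51 ≈ 77.1, which rounds to an AQI of 77 in category 2.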

AQI-based Metrics Definition

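$$\mathrm{cH}^{i} = \frac{N_{fo}^{i}}{N_{o}^{i}}$$

$$\mathrm{cCSI}^{i} = \frac{N_{fo}^{i}}{N_{o}^{i} + N_{f}^{i} - N_{fo}^{i}}$$

$$\mathrm{eH} = \frac{\sum_{i=3}^{m} N_{fo}^{i}}{\sum_{i=3}^{m} N_{o}^{i}}$$

$$\mathrm{oH} = \frac{\sum_{i=1}^{m} N_{fo}^{i}}{N}$$

$$\mathrm{oCSI} = \frac{\sum_{i=1}^{m} N_{fo}^{i}}{\sum_{i=1}^{m} \left(N_{o}^{i} + N_{f}^{i} - N_{fo}^{i}\right)} = \frac{\sum_{i=1}^{m} N_{fo}^{i}}{2N - \sum_{i=1}^{m} N_{fo}^{i}}$$

$$\mathrm{eCSI} = \frac{\sum_{i=3}^{m} N_{fo}^{i}}{\sum_{i=3}^{m} \left(N_{o}^{i} + N_{f}^{i} - N_{fo}^{i}\right)}$$

$$\mathrm{eFAR} = \frac{\sum_{i=3}^{m} \left(N_{f}^{i} - N_{fo}^{i}\right)}{\sum_{i=3}^{m} N_{f}^{i}}$$

where i is the AQI index category (1, 2, 3, 4, 5) or the color scheme (green, yellow, orange, red, purple), $N_{o}^{i}$ and $N_{f}^{i}$ are the number of observed and forecast instances in the ith category, respectively, $N_{fo}^{i}$ is the number of correctly forecast instances in the ith category, $N$ is the total number of records, and m is the index of the highest category. The sums that start at i = 3 cover the categories from orange upward.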
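As a rough sketch of how the AQI-based metrics could be computed from paired category numbers, the Python below counts, per category, the observed, forecast, and correctly forecast instances and then forms the ratios. The function name, the argument names, the sample data, and the choice to report percentages are assumptions for illustration; categories with no instances yield NaN for the per-category metrics.

```python
import numpy as np

def aqi_category_metrics(obs_cat, fcst_cat, m=5, first_exceedance=3):
    """AQI-based categorical metrics (percent) from paired category numbers."""
    obs_cat = np.asarray(obs_cat)
    fcst_cat = np.asarray(fcst_cat)
    cats = np.arange(1, m + 1)
    n_o = np.array([(obs_cat == i).sum() for i in cats])    # observed per category
    n_f = np.array([(fcst_cat == i).sum() for i in cats])   # forecast per category
    n_fo = np.array([((obs_cat == i) & (fcst_cat == i)).sum() for i in cats])
    n = obs_cat.size                                         # total records
    union = n_o + n_f - n_fo
    exc = cats >= first_exceedance                           # categories 3..m
    with np.errstate(divide="ignore", invalid="ignore"):
        c_h = 100.0 * n_fo / n_o        # NaN where a category was never observed
        c_csi = 100.0 * n_fo / union
    # The exceedance metrics assume at least one observed and one forecast
    # instance in categories >= first_exceedance.
    return {
        "cH": c_h,
        "cCSI": c_csi,
        "eH": 100.0 * n_fo[exc].sum() / n_o[exc].sum(),
        "eFAR": 100.0 * (n_f[exc] - n_fo[exc]).sum() / n_f[exc].sum(),
        "eCSI": 100.0 * n_fo[exc].sum() / union[exc].sum(),
        "oH": 100.0 * n_fo.sum() / n,
        "oCSI": 100.0 * n_fo.sum() / union.sum(),
    }

# Hypothetical paired AQI categories, for illustration only.
obs = [1, 1, 2, 2, 3, 3, 4, 2]
fcst = [1, 2, 2, 3, 3, 4, 4, 2]
result = aqi_category_metrics(obs, fcst)
```

In this toy sample five of the eight records fall in the correct category, so oH is 62.5%, and the denominator of oCSI equals 2N minus the number of correct forecasts (16 - 5 = 11).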

Categorical Stats over 3x domain (1)

Accuracy (A)

Bias (B)

The accuracy is always high (>90%) because correctly forecast non-exceedances dominate the sample. The bias indicates that the model overestimated the number of exceedances in every year.


eFAR

eH

False alarm ratios are high in all years, ranging from 70 to 90% on average. Mean hit rates are generally above 40%, except in 2006, when the meteorological model underwent a major transition from Eta to WRF.


Critical Success Index (CSI)

Critical success index reflects the combination of false alarm ratio and hit rate. A forecast system can have both high FAR and high H or low FAR and low H, both resulting in low CSI. High CSI values indicate moderate FAR and reasonable H.
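The way CSI combines the two other metrics can be made explicit. With a the number of false alarms, b the number of hits, and d the number of misses, H = b/(b + d), FAR = a/(a + b), and CSI = b/(a + b + d), so 1/CSI = (a + b)/b + (b + d)/b - 1 = 1/(1 - FAR) + 1/H - 1. The short sketch below evaluates this identity for a few made-up combinations of H and FAR (expressed as fractions); the function name and the sample values are assumptions for illustration.

```python
def csi_from_h_and_far(h, far):
    """Return CSI as a fraction, given H and FAR as fractions (H > 0, FAR < 1)."""
    return 1.0 / (1.0 / h + 1.0 / (1.0 - far) - 1.0)

# Hypothetical forecast systems, for illustration only.
high_both = csi_from_h_and_far(h=0.9, far=0.8)   # many hits, many false alarms
low_both = csi_from_h_and_far(h=0.2, far=0.1)    # few false alarms, few hits
balanced = csi_from_h_and_far(h=0.6, far=0.4)    # moderate FAR, reasonable H
```

Both extreme cases give a CSI of about 0.20, while the balanced case gives about 0.43, in line with the statement that high CSI values go with moderate FAR and reasonable H.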

Metropolitan Statistical Area (MSA)

Local forecasters generally forecast the maximum AQI value that they expect to occur anywhere within an MSA, and then verify this forecast against the maximum monitored value within that area. For example, the Charlotte MSA comprises 8 counties (7 in NC, 1 in SC) that contain 8 AQS monitors (7 in NC, 1 in SC), and the NAQFC represents it with 103 12-km grid cells.
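The MSA-level pairing described here amounts to taking a daily maximum on each side before comparing. The sketch below shows one way to do that; the function name, the dates, and all AQI values are hypothetical, and it assumes the forecast maximum is taken over whatever set of model values represents the MSA.

```python
def msa_daily_max(values_by_day):
    """Map each day to the maximum value reported anywhere in the MSA."""
    return {day: max(values) for day, values in values_by_day.items()}

# Hypothetical AQI values for one MSA, for illustration only.
monitor_aqi = {"2008-07-01": [45, 88, 102], "2008-07-02": [60, 71, 55]}
model_aqi = {"2008-07-01": [50, 97, 110, 93], "2008-07-02": [66, 80, 104, 75]}

observed = msa_daily_max(monitor_aqi)
forecast = msa_daily_max(model_aqi)

# One (observed, forecast) pair per day is then used for the categorical statistics.
pairs = {day: (observed[day], forecast[day]) for day in observed}
```

Each day contributes a single pair per MSA, regardless of how many monitors or grid cells the MSA contains.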


MSAs used in this research

• Atlanta

• Charlotte

• Dallas

• Houston

• Washington DC

Kalman Filter Bias-adjustment

• Kalman Filter (KF) was used to bias-adjust the raw model forecasts for the continental U.S. domain during 2005-2008 summer seasons at all locations where AIRNow monitoring data were available.

• The categorical performance of both raw model and KF forecasts was assessed over:
  1. all sites (paired observation and model grid cell) within the domain,
  2. sites within all MSAs, and
  3. the MSA value (the maximum value among all the sites within the MSA for each day).
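The slides do not describe the filter configuration, so the following is only a generic sketch of how a scalar Kalman filter can track a slowly varying forecast bias at one site and remove it from the next forecast. The random-walk bias model, the variance parameters `q` and `r`, the initial values, and the function name are all assumptions and should not be read as the settings used in this study.

```python
def kf_bias_adjust(raw_forecasts, observations, q=0.06, r=1.0):
    """Bias-adjust a forecast series with a scalar Kalman filter.

    The bias is modeled as a random walk with process variance q; each day's
    forecast error (forecast minus observation) is treated as a noisy
    measurement of the bias with variance r. Each forecast is corrected using
    only the bias estimated from earlier days.
    """
    bias, p = 0.0, 1.0            # initial bias estimate and its variance
    adjusted = []
    for fcst, obs in zip(raw_forecasts, observations):
        adjusted.append(fcst - bias)      # correct with the prior bias estimate
        p_pred = p + q                    # predict: uncertainty grows
        gain = p_pred / (p_pred + r)      # Kalman gain
        bias += gain * ((fcst - obs) - bias)   # update with today's error
        p = (1.0 - gain) * p_pred
    return adjusted

# Hypothetical series with a constant +10 bias, for illustration only.
observations = [50.0 + (day % 7) for day in range(60)]
raw_forecasts = [value + 10.0 for value in observations]
adjusted = kf_bias_adjust(raw_forecasts, observations)
```

In this toy case the first adjusted value equals the raw forecast, because no error has been seen yet, and later values converge toward the observations as the filter learns the constant bias.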

NAQFC Categorical Performance vs. Human Forecast

[Figure: bar charts of Exceedance Hit Rate, eH (%), Exceedance False Alarm Rate, eFAR (%), and eCSI (%) for the human and NAQFC forecasts; categories: All, Atlanta, Charlotte, Dallas, DC, Houston]

Because the NAQFC is positively biased, it tends to achieve a higher exceedance hit rate than the human forecasts, but also a higher false alarm ratio. The critical success index results were mixed across MSAs, but on average the NAQFC performed better than the human forecasts.

cH for the raw model and KF forecasts at all sites and MSAs

[Figure: cH (%) by year, 2005-2008, for AQI categories 1 to 4, raw model and KF forecasts; panels: Domain All Sites, MSA All Sites, MSA]

Domain All Sites: all AIRNow sites within the domain are included in the calculation.

MSA All Sites: all the AIRNow sites located in one of the MSAs listed earlier.

MSA: the maximum values from both the AIRNow sites and the model forecasts within each of the MSAs are used to generate the stats.

cCSI for the raw model and KF forecasts at all sites and MSAs

[Figure: cCSI (%) by year, 2005-2008, for AQI categories 1 to 4, raw model and KF forecasts; panels: Domain All Sites, MSA All Sites, MSA]

eH for the raw model and KF forecasts at all sites and MSAs

[Figure: eH (%) by year, 2005-2008, raw model and KF forecasts; panels: Domain All Sites, MSA All Sites, MSA]

Exceedance hit rates are substantially higher when evaluated over MSAs than over individual sites. The KF bias adjustment improved the hit rate, especially when the raw model suffered from large systematic biases, as in 2006.

eFAR for the raw model and KF forecasts at all sites and MSAs

[Figure: eFAR (%) by year, 2005-2008, raw model and KF forecasts; panels: Domain All Sites, MSA All Sites, MSA]

False alarm ratios are substantially lower when evaluated over MSAs than over individual sites. The KF bias-adjusted forecasts reduced FAR markedly in all three evaluation settings and in every year.

eCSI for the raw model and KF forecasts at all sites and MSAs

[Figure: eCSI (%) by year, 2005-2008, raw model and KF forecasts; panels: Domain All Sites, MSA All Sites, MSA]

eCSI values almost doubled when evaluated over MSAs rather than over individual sites. The KF bias-adjusted forecasts had larger eCSI values than the raw model forecasts, especially when evaluated over individual sites.

oH for the raw model and KF forecasts at all sites and MSAs

[Figure: oH (%) by year, 2005-2008, raw model and KF forecasts; panels: Domain All Sites, MSA All Sites, MSA]

The overall hit rates were consistent and stable, and improved slowly over the years, for both the KF and raw model forecasts. KF forecasts always had larger oH values than the raw model. Compared with the evaluation over individual sites, oH values decreased when evaluated over MSAs (but remained above 50%) because low AQI values are overestimated.

oCSI for the raw model and KF forecasts at all sites and MSAs

[Figure: oCSI (%) by year, 2005-2008, raw model and KF forecasts; panels: Domain All Sites, MSA All Sites, MSA]

The overall critical success index (oCSI) is quite consistent and increases over the years. The oCSI values are lower when evaluated over MSAs than over individual sites because each MSA value is the maximum over all the sites within the MSA, which lowers the hit rate for low AQI values (low AQI is overestimated).

Minimum values of H and CSI during the years 2005-2008 over the continental US domain and MSAs

Stats Type   eH (%)            oH (%)            eCSI (%)          oCSI (%)
             Raw Model  KF     Raw Model  KF     Raw Model  KF     Raw Model  KF
All Sites    14.5       39.3   47.2       61.2   12.2       29.4   52.3       63.2
MSA          47.2       61.2   52.2       59.5   32.0       43.3   35.4       42.3

(1) MSA based analysis provides a more objective assessment of the practical use of the guidance, consistent with the way local forecasts are typically developed;

(2) Bias-adjustment further improves the predictive skill of the system thereby improving the utility of the forecast products.

Guidelines for AQF models

Stats Type   eH (%)   oH (%)   eCSI (%)   oCSI (%)
All Sites    30       50       20         50
MSA          50       50       30         30

These guideline values lie roughly between the rounded minimum values of the raw model and the KF-adjusted forecasts. They serve (1) as targets that raw models can realistically achieve through model improvements in the short term, and (2) as a reference level that any AQF model should reach when combined with KF adjustment.
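A forecast system can be checked against the proposed guidelines with a simple comparison. In the sketch below the thresholds come from the guideline table and the sample inputs are the 2005-2008 minimum values reported for all sites; the dictionary layout and the function name are assumptions made for illustration.

```python
# Proposed minimum values (percent), taken from the guideline table.
GUIDELINES = {
    "all_sites": {"eH": 30, "oH": 50, "eCSI": 20, "oCSI": 50},
    "msa": {"eH": 50, "oH": 50, "eCSI": 30, "oCSI": 30},
}

def meets_guidelines(metrics, stats_type):
    """Return, for each metric, whether it reaches the proposed minimum."""
    return {
        name: metrics[name] >= minimum
        for name, minimum in GUIDELINES[stats_type].items()
    }

# Minimum values over 2005-2008 for all sites, as reported in the table.
raw_model_min = {"eH": 14.5, "oH": 47.2, "eCSI": 12.2, "oCSI": 52.3}
kf_min = {"eH": 39.3, "oH": 61.2, "eCSI": 29.4, "oCSI": 63.2}

raw_check = meets_guidelines(raw_model_min, "all_sites")
kf_check = meets_guidelines(kf_min, "all_sites")
```

For all sites, the KF-adjusted minimums clear every threshold, while the raw model minimums clear only the oCSI threshold, which matches the intended role of the guidelines as short-term targets for the raw model.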

Conclusions

• Comparisons indicate that the NAQFC performed at least as well as, if not better than, the human forecasts over MSAs.

• The categorical performance of the NAQFC was consistent and stable from 2005 to 2008, except in 2006, when significant changes to the model degraded its categorical performance.

• Kalman filter bias-adjustment resulted in improvement over almost all categorical statistics, especially when the raw model was systematically biased in 2006.

Conclusions

• Hit Rate (H), False Alarm Ratio (FAR), and Critical Success Index (CSI) are the three most appropriate metrics for gauging the categorical performance of an AQF; CSI is the most informative of the three because it reflects the combination of H and FAR.

• The AQI based H and CSI over all sites and MSAs are good indicators of overall performance for categorical forecasts.

• Based on the analysis in this study, the following guidelines are proposed: eH >= 30%, eCSI >= 20%, oH and oCSI >= 50% for all sites; eH and oH >= 50%, eCSI and oCSI >= 30% for MSAs.

Acknowledgements

The authors would like to thank the NOAA/EPA air quality forecast program and the EPA’s AIRNow program for providing forecast and observed O3 data. Thanks also go to Scott Jackson for providing the human forecast data.

Disclaimer

The United States Environmental Protection Agency, through its Office of Research and Development, funded and managed the research described here. It has been subjected to the Agency’s administrative review and approved for presentation.
