Five-year Progress in the Performance of Air Quality Forecast Models: Analysis on
Categorical Statistics for the National Air Quality Forecast Capacity (NAQFC)
Daiwen Kang1, Rohit Mathur2, Brian Eder2, Kenneth Schere2, and S. Trivikrama Rao2
1Computer Science Corporation
2Atmospheric Modeling and Analysis Division, NERL/U.S. EPA
8th Annual CMAS Conference, Chapel Hill, NC, October 19 – 21, 2009
Motivations
• Assess the progress in performance improvements for categorical metrics of the NAQFC system for O3 forecasts over the past 5 years
• Identify categorical metrics that can well characterize AQF performance for categorical forecasts
• Assess AQI-based categorical performances
• Propose guidelines for AQF categorical evaluations based on the analysis of KF bias-adjusted forecasts and human forecasts.
Traditional Categorical Metrics
Observed Exceedances & Non-Exceedances versus Forecast Exceedances & Non-Exceedances

                            Observed Exceedance
                            No      Yes
Forecast Exceedance   Yes   a       b
                      No    c       d

\[ A = \frac{b + c}{a + b + c + d} \times 100\% \]
\[ B = \frac{a + b}{b + d} \]
\[ FAR = \frac{a}{a + b} \times 100\% \]
\[ CSI = \frac{b}{a + b + d} \times 100\% \]
\[ H = \frac{b}{b + d} \times 100\% \]

[Figure: Forecast vs. Observation diagram with quadrants a, b, c, d]
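As a minimal sketch of how these five metrics could be computed, assume paired observed and forecast values and a single exceedance threshold, with a = false alarms, b = hits, c = correct non-exceedances, and d = misses. The function name, the sample values, the 85 ppb threshold, and the choice of a strict `>` comparison are illustrative assumptions, not taken from the slides.

```python
def categorical_metrics(obs, fcst, threshold):
    """Return A, B, FAR, CSI, H for one exceedance threshold.

    A, FAR, CSI and H are percentages; B is a plain ratio.
    """
    a = b = c = d = 0
    for o, f in zip(obs, fcst):
        obs_exc = o > threshold    # assumption: exceedance means strictly above
        fcst_exc = f > threshold
        if fcst_exc and not obs_exc:
            a += 1                 # false alarm
        elif fcst_exc and obs_exc:
            b += 1                 # hit
        elif not fcst_exc and not obs_exc:
            c += 1                 # correct non-exceedance
        else:
            d += 1                 # miss

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "A": 100.0 * ratio(b + c, a + b + c + d),
        "B": ratio(a + b, b + d),
        "FAR": 100.0 * ratio(a, a + b),
        "CSI": 100.0 * ratio(b, a + b + d),
        "H": 100.0 * ratio(b, b + d),
    }


# Made-up daily maximum O3 values (ppb), for illustration only.
obs = [60, 90, 70, 88, 50, 95, 40, 80]
fcst = [70, 92, 86, 80, 55, 99, 45, 89]
stats = categorical_metrics(obs, fcst, threshold=85.0)
```

With these made-up values there are 2 false alarms, 2 hits, 3 correct non-exceedances, and 1 miss, so accuracy is dominated by how many non-exceedance days are in the sample.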
AQI Definition and Categories
Air Quality Index (AQI) Values    Levels of Health Concern          Colors
When the AQI is in this range:    ...air quality conditions are:    ...as symbolized by this color:
0 to 50                           Good                              Green (1)
51 to 100                         Moderate                          Yellow (2)
101 to 150                        Unhealthy for Sensitive Groups    Orange (3)
151 to 200                        Unhealthy                         Red (4)
201 to 300                        Very Unhealthy                    Purple (5)
301 to 500                        Hazardous                         Maroon (6)
\[ I_p = \frac{I_{Hi} - I_{Lo}}{BP_{Hi} - BP_{Lo}} (C_p - BP_{Lo}) + I_{Lo} \]
Where:
$I_p$ = the index for pollutant p (O3 in this case)
$C_p$ = the rounded concentration of pollutant p
$BP_{Hi}$ = the breakpoint that is ≥ $C_p$
$BP_{Lo}$ = the breakpoint that is ≤ $C_p$
$I_{Hi}$ = the AQI value corresponding to $BP_{Hi}$
$I_{Lo}$ = the AQI value corresponding to $BP_{Lo}$
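The interpolation can be sketched in a few lines, assuming the index is a linear interpolation between the breakpoints that bracket the concentration. The breakpoint table below is an illustrative placeholder (concentrations in ppb) and is not taken from the slides; substitute the official O3 breakpoints in force for the period being evaluated. The function names are likewise hypothetical.

```python
# Hypothetical breakpoint rows: (BP_Lo, BP_Hi, I_Lo, I_Hi), concentrations in ppb.
ILLUSTRATIVE_O3_BREAKPOINTS = [
    (0, 59, 0, 50),
    (60, 75, 51, 100),
    (76, 95, 101, 150),
    (96, 115, 151, 200),
    (116, 374, 201, 300),
]


def aqi_from_concentration(c_p, breakpoints=ILLUSTRATIVE_O3_BREAKPOINTS):
    """I_p = (I_Hi - I_Lo) / (BP_Hi - BP_Lo) * (C_p - BP_Lo) + I_Lo, rounded."""
    for bp_lo, bp_hi, i_lo, i_hi in breakpoints:
        if bp_lo <= c_p <= bp_hi:
            return round((i_hi - i_lo) / (bp_hi - bp_lo) * (c_p - bp_lo) + i_lo)
    raise ValueError(f"concentration {c_p} is outside the breakpoint table")


def aqi_category(aqi):
    """Map an AQI value to its category number (1 = Green ... 6 = Maroon)."""
    for category, upper in enumerate([50, 100, 150, 200, 300, 500], start=1):
        if aqi <= upper:
            return category
    raise ValueError(f"AQI {aqi} is above the top of the scale")


example_aqi = aqi_from_concentration(68)      # already-rounded concentration, ppb
example_category = aqi_category(example_aqi)
```

The category number returned by `aqi_category` corresponds to the color index in parentheses in the AQI table (1 = Green through 6 = Maroon).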
AQI-based Metrics Definition
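\[ cH_i = \frac{N_{fo}^{i}}{N_{o}^{i}} \]
\[ cCSI_i = \frac{N_{fo}^{i}}{N_{o}^{i} + N_{f}^{i} - N_{fo}^{i}} \]
\[ eH = \frac{\sum_{i=3}^{m} N_{fo}^{i}}{\sum_{i=3}^{m} N_{o}^{i}} \]
\[ oH = \frac{\sum_{i=1}^{m} N_{fo}^{i}}{N} \]
oCSI = [equation not recoverable from the extracted slide text]
\[ eCSI = \frac{\sum_{i=3}^{m} N_{fo}^{i}}{\sum_{i=3}^{m} \left( N_{o}^{i} + N_{f}^{i} - N_{fo}^{i} \right)} \]
\[ eFAR = \frac{\sum_{i=3}^{m} \left( N_{f}^{i} - N_{fo}^{i} \right)}{\sum_{i=3}^{m} N_{f}^{i}} \]
where i is the AQI index category (1, 2, 3, 4, 5) or the color scheme (green, yellow, orange, red, purple), m is the number of categories, $N_{o}^{i}$ and $N_{f}^{i}$ are the numbers of observed and forecast instances in the ith category, respectively, $N_{fo}^{i}$ is the number of correctly forecast instances in the ith category, and N is the total number of records.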
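A minimal sketch of the category-based metrics, assuming paired lists of observed and forecast AQI category numbers and reading the definitions as cH_i = N_fo^i / N_o^i, cCSI_i = N_fo^i / (N_o^i + N_f^i - N_fo^i), with eH, eFAR, and eCSI formed from sums over the exceedance categories (i = 3..m) and oH from the sum over all categories divided by N. Values are returned as percentages, oCSI is omitted, and the function name and sample data are hypothetical.

```python
from collections import Counter


def aqi_category_metrics(obs_cat, fcst_cat, m=5, first_exceedance=3):
    """Category-based hit rate, false alarm ratio and CSI, in percent."""
    n = len(obs_cat)
    n_o = Counter(obs_cat)     # observed instances per category
    n_f = Counter(fcst_cat)    # forecast instances per category
    # Correct forecasts: forecast category equals observed category.
    n_fo = Counter(o for o, f in zip(obs_cat, fcst_cat) if o == f)

    def pct(num, den):
        return 100.0 * num / den if den else float("nan")

    cats = range(1, m + 1)
    exc = range(first_exceedance, m + 1)
    hits_e = sum(n_fo[i] for i in exc)
    obs_e = sum(n_o[i] for i in exc)
    fcst_e = sum(n_f[i] for i in exc)
    return {
        "cH": {i: pct(n_fo[i], n_o[i]) for i in cats},
        "cCSI": {i: pct(n_fo[i], n_o[i] + n_f[i] - n_fo[i]) for i in cats},
        "eH": pct(hits_e, obs_e),
        "eFAR": pct(fcst_e - hits_e, fcst_e),
        "eCSI": pct(hits_e, obs_e + fcst_e - hits_e),
        "oH": pct(sum(n_fo[i] for i in cats), n),
    }


# Made-up category series, for illustration only.
obs_cat = [1, 1, 2, 2, 3, 3, 4, 1, 2, 3]
fcst_cat = [1, 2, 2, 3, 3, 4, 4, 1, 2, 2]
metrics = aqi_category_metrics(obs_cat, fcst_cat)
```

Note that under this reading a day observed in category 3 but forecast in category 4 does not count as a hit for eH, because the categories must match exactly.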
Categorical Stats over 3x domain (1)
[Figure: Accuracy (A) and Bias (B)]
The accuracy is always high (>90%) because the correctly forecast non-exceedance points dominate. The bias indicates that the model has consistently overestimated exceedances through the years.
Categorical Stats over 3x domain (2)
[Figure: eFAR and eH]
False alarm ratios are quite high across all the years, ranging from 70 to 90% on average. Mean hit rates are generally greater than 40% except in 2006, when the meteorological model underwent a major transition from Eta to WRF.
Categorical Stats over 3x domain (3)
Critical Success Index (CSI)
Critical success index reflects the combination of false alarm ratio and hit rate. A forecast system can have both high FAR and high H or low FAR and low H, both resulting in low CSI. High CSI values indicate moderate FAR and reasonable H.
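One way to make this relationship explicit, using the contingency-table counts from the traditional metrics (a = false alarms, b = hits, d = misses) and treating H and FAR as fractions rather than percentages, is the following identity. It is a standard algebraic consequence of the definitions, added here as an illustration rather than taken from the slides.

```latex
\frac{1}{CSI} = \frac{a + b + d}{b}
             = \frac{a + b}{b} + \frac{b + d}{b} - 1
             = \frac{1}{1 - FAR} + \frac{1}{H} - 1
```

For example, H = 0.9 with FAR = 0.9 gives 1/CSI = 10 + 1.11 - 1 = 10.11, so CSI is about 0.10; H = 0.1 with FAR = 0.1 gives 1/CSI = 1.11 + 10 - 1 = 10.11, the same low CSI.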
Metropolitan Statistical Area (MSA)
Local forecasters generally forecast the maximum AQI value that they expect to occur anywhere within an MSA, and then verify this forecast against the maximum monitored value within that area. For example, the Charlotte MSA comprises 8 counties (7 in NC, 1 in SC) and contains 8 AQS monitors (7 in NC, 1 in SC). The NAQFC represents this MSA with 103 12-km grid cells.
[Figure: O3 AQI]
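The MSA-level pairing can be sketched as follows, assuming site-level records that already carry an MSA label and taking the daily maximum separately for the observed and forecast values. The record layout, site identifiers, and numbers are hypothetical.

```python
def msa_daily_max(records):
    """Collapse site-level pairs to one (max observed, max forecast) pair
    per MSA and day.

    records: iterable of (msa, date, site_id, obs_aqi, fcst_aqi) tuples.
    """
    out = {}
    for msa, date, _site_id, obs, fcst in records:
        key = (msa, date)
        if key in out:
            prev_obs, prev_fcst = out[key]
            out[key] = (max(prev_obs, obs), max(prev_fcst, fcst))
        else:
            out[key] = (obs, fcst)
    return out


# Made-up site records, for illustration only.
records = [
    ("Charlotte", "2007-07-01", "site-1", 80, 95),
    ("Charlotte", "2007-07-01", "site-2", 110, 90),
    ("Charlotte", "2007-07-02", "site-1", 45, 60),
    ("Atlanta", "2007-07-01", "site-9", 120, 130),
]
msa_values = msa_daily_max(records)
```

The observed and forecast maxima for a given day may come from different sites, which mirrors how a forecaster verifies a single MSA-wide maximum rather than site-by-site values.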
MSAs used in this research
• Atlanta
• Charlotte
• Dallas
• Houston
• Washington DC
Kalman Filter Bias-adjustment
• Kalman Filter (KF) was used to bias-adjust the raw model forecasts for the continental U.S. domain during 2005-2008 summer seasons at all locations where AIRNow monitoring data were available.
• The categorical performance of both raw model and KF forecasts was assessed over:
  1. all sites (paired observation-model grid cell) within the domain,
  2. sites within all MSAs, and
  3. the MSA value (the maximum value out of all the sites within the MSA for each day).
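The slides do not spell out the filter equations. As a minimal sketch of the general idea, assume a scalar Kalman filter at a single site that tracks the forecast bias (forecast minus observation) as a random walk and subtracts the latest bias estimate from the next forecast. The fixed variance ratio `ratio` and the initial variance `p0` are hypothetical tuning choices, not values from this study.

```python
def kf_bias_adjust(forecasts, observations, ratio=0.06, p0=1.0):
    """Return bias-adjusted forecasts for one site.

    Each day's forecast is corrected using only the bias estimated from
    previous days. Variances are expressed relative to the observation-noise
    variance, so `ratio` is process variance / observation variance.
    """
    bias = 0.0   # running estimate of forecast bias
    p = p0       # error variance of the bias estimate (relative units)
    adjusted = []
    for fcst, obs in zip(forecasts, observations):
        # Correct today's forecast before today's observation is known.
        adjusted.append(fcst - bias)
        # Predict step: bias persists, its uncertainty grows.
        p_pred = p + ratio
        # Update step once the observation arrives.
        gain = p_pred / (p_pred + 1.0)
        bias = bias + gain * ((fcst - obs) - bias)
        p = (1.0 - gain) * p_pred
    return adjusted


# A forecast that is always 10 units too high, for illustration only.
obs_series = [50.0] * 40
fcst_series = [60.0] * 40
adjusted_series = kf_bias_adjust(fcst_series, obs_series)
```

In this toy case the first forecast is left unchanged and later forecasts converge toward the observations as the bias estimate approaches 10. A larger `ratio` makes the filter react faster to changes in bias at the cost of a noisier estimate.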
NAQFC Categorical Performance vs. Human Forecast
[Figure: bar charts comparing Human and NAQFC forecasts for All, Atlanta, Charlotte, Dallas, DC, and Houston, in three panels: eH (%), eFAR (%), and eCSI (%)]
Because the NAQFC is positively biased, it achieves higher exceedance hit rates than the human forecasts, but also higher false alarm ratios. The critical success index results were mixed across MSAs, but on average the NAQFC performed better than the human forecasts.
cH for the raw model and KF forecasts at all sites and MSAs
[Figure: cH (%) by year (2005-2008) for AQI categories 1-4, raw model vs. KF, in three panels: Domain All Sites, MSA All Sites, MSA]
Domain All Sites: All AIRNow sites within the domain are included in the calculation.
MSA All Sites: All the AIRNow sites that are located in one of the MSAs listed earlier.
MSA: The maximum values from both AIRNow sites and the model forecasts within each of the MSAs are used to generate the stats.
cCSI for the raw model and KF forecasts at all sites and MSAs
[Figure: cCSI (%) by year (2005-2008) for AQI categories 1-4, raw model vs. KF, in three panels: Domain All Sites, MSA All Sites, MSA]
eH for the raw model and KF forecasts at all sites and MSAs
[Figure: eH (%) by year (2005-2008), raw model vs. KF, in three panels: Domain All Sites, MSA All Sites, MSA]
Hit rates are significantly higher when evaluated over MSAs than over individual sites. KF bias-adjusted forecasts improved the hit rate, especially when the raw model was significantly flawed by systematic biases, as in 2006.
eFAR for the raw model and KF forecasts at all sites and MSAs
[Figure: eFAR (%) by year (2005-2008), raw model vs. KF, in three panels: Domain All Sites, MSA All Sites, MSA]
False alarm ratios are significantly lower when evaluated over MSAs than over the individual sites. The KF bias-adjusted forecasts significantly reduced FAR in all cases and in all years.
eCSI for the raw model and KF forecasts at all sites and MSAs
[Figure: eCSI (%) by year (2005-2008), raw model vs. KF, in three panels: Domain All Sites, MSA All Sites, MSA]
eCSI values almost doubled when evaluated over MSAs compared to those evaluated over the individual sites. The KF bias-adjusted forecasts had larger eCSI values than the raw model forecasts, especially when evaluated over the individual sites.
oH for the raw model and KF forecasts at all sites and MSAs
[Figure: oH (%) by year (2005-2008), raw model vs. KF, in three panels: Domain All Sites, MSA All Sites, MSA]
The overall hit rates were consistent and stable, and slowly improved over the years, for both the KF and raw model forecasts. KF forecasts always had larger oH values than the raw model. Compared to those evaluated over individual sites, oH values decreased when evaluated over MSAs (but were still > 50%) due to overestimation at low AQIs.
oCSI for the raw model and KF forecasts at all sites and MSAs
[Figure: oCSI (%) by year (2005-2008), raw model vs. KF, in three panels: Domain All Sites, MSA All Sites, MSA]
The overall critical success index (oCSI) is quite consistent and increases over the years. The oCSI values are lower when evaluated over MSAs than over individual sites because the MSA values are the maximum of all the sites within the MSA, which results in a lower hit rate for low AQI values (low AQIs are overestimated).
Minimum values of H and CSI during the years 2005-2008 over the continental US domain and MSAs
Stats Type   eH (%)            oH (%)            eCSI (%)          oCSI (%)
             Raw Model  KF     Raw Model  KF     Raw Model  KF     Raw Model  KF
All Sites    14.5       39.3   47.2       61.2   12.2       29.4   52.3       63.2
MSA          47.2       61.2   52.2       59.5   32.0       43.3   35.4       42.3
(1) MSA based analysis provides a more objective assessment of the practical use of the guidance, consistent with the way local forecasts are typically developed;
(2) Bias-adjustment further improves the predictive skill of the system thereby improving the utility of the forecast products.
Guidelines for AQF models
Stats Type   eH (%)   oH (%)   eCSI (%)   oCSI (%)
All Sites    30       50       20         50
MSA          50       50       30         30
These guideline values lie between the (rounded) minimum values of the raw model and the KF-adjusted forecasts. They serve (1) as targets that raw models can realistically achieve in the short term through model improvements, and (2) as a reference level that any AQF model should reach when combined with KF bias-adjustment.
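A minimal sketch of how the guideline table might be applied as a pass/fail check. The threshold values come from the guideline table and the test inputs are the All Sites minimum values reported for 2005-2008; the dictionary layout and function name are hypothetical.

```python
# Proposed minimum values (%), keyed by stats type.
GUIDELINES = {
    "All Sites": {"eH": 30, "oH": 50, "eCSI": 20, "oCSI": 50},
    "MSA": {"eH": 50, "oH": 50, "eCSI": 30, "oCSI": 30},
}


def check_guidelines(stats, stats_type):
    """Return {metric: True/False} for whether each metric meets its guideline."""
    thresholds = GUIDELINES[stats_type]
    return {name: stats[name] >= minimum for name, minimum in thresholds.items()}


# All Sites minimum values for the raw model and the KF forecasts.
raw_all_sites = {"eH": 14.5, "oH": 47.2, "eCSI": 12.2, "oCSI": 52.3}
kf_all_sites = {"eH": 39.3, "oH": 61.2, "eCSI": 29.4, "oCSI": 63.2}
raw_result = check_guidelines(raw_all_sites, "All Sites")
kf_result = check_guidelines(kf_all_sites, "All Sites")
```

With these inputs the KF forecasts meet every All Sites guideline, while the raw model meets only the oCSI guideline.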
Conclusions
• Comparisons indicate that the NAQFC performed at least as well as, if not better than, the human forecasts over MSAs.
• The categorical performance of NAQFC has been consistent and stable over the years from 2005 to 2008, with the exception in 2006 when the model underwent significant changes resulting in degraded categorical performance.
• Kalman filter bias-adjustment resulted in improvement over almost all categorical statistics, especially when the raw model was systematically biased in 2006.
• Hit Rate (H), False Alarm Ratio (FAR), and Critical Success Index (CSI) are the three most appropriate metrics to gauge the categorical performance of an AQF; CSI is the most informative of the three because it reflects the combination of H and FAR.
• The AQI-based H and CSI over all sites and MSAs are good indicators of overall performance for categorical forecasts.
• Based on the analysis in this study, the following guidelines are proposed: eH >= 30%, eCSI >= 20%, oH and oCSI >= 50% for all sites; eH and oH >= 50%, eCSI and oCSI >= 30% for MSAs.
Acknowledgements
The authors would like to thank the NOAA/EPA air quality forecast program and the EPA’s AIRNow program for providing forecasted and observed O3 data. Thanks also go to Scott Jackson for providing the Human forecast data.
Disclaimer
The United States Environmental Protection Agency through its Office of Research and Development funded and managed the research described here. It has been subjected to the Agency’s administrative review and approved for presentation.