
Developing a global chlorophyll product by integrating regionally-specific algorithms

Timothy S. Moore and Janet W. Campbell

University of New Hampshire

Durham, NH 03824

ABSTRACT

Phytoplankton are the primary source of optical variability in the ocean, and thus their concentration in surface waters can be observed by ocean color sensors. Currently, there are two families of algorithms that derive chlorophyll a concentration (Chl) – a proxy for phytoplankton biomass. These are the empirical and semi-analytic algorithms. Such algorithms, parameterized from in situ data, are currently used operationally to produce global maps of Chl. However, it is generally accepted that a single universal algorithm is not accurate everywhere, regardless of which type of algorithm is used.

Global empirical algorithms (e.g., OC4v4) have been shown to exhibit biases specific to the geographic ocean basin (e.g., Southern Ocean, North Atlantic). Similarly, semi-analytic algorithms require empirical parameterizations derived from in situ data that are often regionally specific. Most models parameterize the relationship between the inherent optical properties (IOPs) of the water (absorption, scattering) and its constituents. On the global scale, IOPs vary over two orders of magnitude (Barnard et al., 1998) due to variations in particle size, pigment composition and packaging of algal cells, and overall particle composition. Since the constituents can vary from place to place and seasonally, it is believed that model parameterizations have to be locally derived for a particular water type, thus requiring the algorithm to decide when and where to use appropriate parameters.

INTRODUCTION

The standard NASA suite of empirical chlorophyll a (Chl) algorithms, represented by OC4v4 for SeaWiFS and OC3M for MODIS (O’Reilly et al., 2000), is based on statistically-derived relationships between Chl and the ratio of water-leaving radiance in two or more spectral bands. Such algorithms, parameterized from in situ data, are currently used operationally to produce global maps of Chl. These are intended to serve as global algorithms, with the understood caveat that regionally-parameterized algorithms may be better suited (i.e., tuned) to their respective regions (e.g., MEDOC3), and as a consequence show better performance (i.e., reduced uncertainty) compared to the global algorithms. Currently, the average relative errors of the global algorithms, defined as the mean of the absolute differences between measured Chl and the derived Chl, as assessed using the NOMAD version 2 data set, exceed the desired goal of 35%. The average relative error for OC3M is 56% as computed with this data set. One way to reduce the overall uncertainty in global Chl retrieval is to integrate regional algorithms that perform better than the global algorithm into a merged product.
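The error metric described above can be sketched as follows. This is a minimal illustration, assuming that each absolute difference is normalized by the measured value and that the result is expressed in percent; the function name and the sample values are hypothetical and are not NOMAD data.

```python
def mean_relative_error(measured, derived):
    """Mean absolute relative difference between derived and measured Chl, in percent.

    Assumption: each absolute difference is divided by the measured value.
    """
    if len(measured) != len(derived) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(d - m) / m for m, d in zip(measured, derived)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical values in mg m^-3, for illustration only.
measured_chl = [0.10, 0.50, 2.00]
derived_chl = [0.15, 0.40, 2.60]
error_percent = mean_relative_error(measured_chl, derived_chl)
```

For these three made-up points the individual relative differences are 50%, 20% and 30%, so the mean is about 33%.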

While it is generally accepted that a single universal algorithm is not accurate everywhere, little attention has been given to the challenge of how to combine different algorithms that are parameterized for specific regions. How does one decide when to choose one algorithm over another? And how can abrupt discontinuities in product retrievals be avoided at the boundaries where two algorithms meet? The decision of which algorithm to select has sometimes been based on fixed geographic boundaries (Mueller and Lange, 1989; Longhurst, 2007; Hoepffner et al., 1999), or on criteria such as Chl or radiance levels (Gordon et al., 1983; Morel and Berthon, 1989). These approaches often create artificial boundaries in the ocean where the algorithm switching occurs. To avoid this problem, Carder et al. (1999) used a linear weighting function to blend results from algorithms parameterized for nutrient-replete “packaged” and nutrient-depleted “unpackaged” algal populations. Moore et al. (2001) implemented a fuzzy logic switching algorithm that blended retrievals for different optical water types based on radiance levels. The work reported here follows the fuzzy logic methodology as a means of blending algorithms that are specifically tailored for different geographic ocean regions into a merged global product. This is demonstrated with 9 km global imagery from MODIS Aqua at the monthly time scale. The end result is a global Chl map with reduced uncertainties across different oceanic regions that approach the desired error level of 35%.

METHODS

Regional OC3M development

Regional algorithms for the major ocean basins were parameterized from the NOMAD version 2 data set (adapted from Werdell and Bailey, 2005). Data points were assigned to one of five major ocean regions – the Atlantic Ocean, Pacific Ocean, Indian Ocean, Southern Ocean and the Arctic Ocean (Figure 1). The total number of points after quality control was 2202 (determined from points having positive reflectance measurements at the MODIS satellite wavelengths and fluorometric chlorophyll values). The distribution of data points across the ocean basins was as follows: Southern Ocean – 265; Atlantic – 1164; Pacific – 582; Indian – 119; Arctic – 67.

From these ocean-basin subsets, a regional empirical algorithm was fitted following the form of the OC3M algorithm as a 4th order polynomial:

$$\log_{10}(\mathrm{Chl}) = A + BX + CX^2 + DX^3 + EX^4 \tag{1}$$

where X is the maximum reflectance at 443 nm or 490 nm divided by the reflectance at 551 nm. The coefficients were computed in Matlab by applying the ‘polyfit’ function to each subset of data for the five ocean basins. After quality control, a total of 2374 data points remained.
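The fitting step can be sketched in Python with NumPy's `polyfit`, which plays the same role as the Matlab function named above. This is only an illustration under stated assumptions: the function names, the `use_log10` switch and the coefficient values are hypothetical, and the data are synthetic rather than NOMAD points. The standard OC3M formulation takes X as the base-10 logarithm of the maximum band ratio; since the text above describes X as the ratio itself, the sketch exposes that choice as an explicit option rather than asserting which one the regional fits used.

```python
import numpy as np


def band_ratio(rrs443, rrs490, rrs551, use_log10=True):
    """Maximum band ratio used as the predictor X in Eq. 1.

    Assumption: use_log10=True follows the standard OC3M form.
    """
    ratio = np.maximum(rrs443, rrs490) / rrs551
    return np.log10(ratio) if use_log10 else ratio


def fit_regional_oc3(x, chl):
    """Fit log10(Chl) as a 4th-order polynomial in X; returns (A, B, C, D, E)."""
    # np.polyfit returns the highest power first, so reverse to match Eq. 1.
    return np.polyfit(x, np.log10(chl), deg=4)[::-1]


def apply_regional_oc3(coeffs, x):
    """Evaluate Eq. 1 with coefficients (A, B, C, D, E) and return Chl."""
    return 10.0 ** np.polynomial.polynomial.polyval(x, coeffs)


# Synthetic, noise-free demonstration with arbitrary coefficients.
true_coeffs = np.array([0.3, -2.7, 1.5, 0.6, -1.4])
x_demo = np.linspace(-0.3, 1.0, 50)
chl_demo = apply_regional_oc3(true_coeffs, x_demo)
fitted_coeffs = fit_regional_oc3(x_demo, chl_demo)
```

In practice one such fit would be run per basin subset, giving one coefficient set per regional algorithm.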

A basin mask was then constructed to cover the potential area of application for each algorithm (Figure 2). Each mask covers a wider area than the basin to which it was assigned, as the intent of the mask is both to restrict the area to which an algorithm is applied and to create areas of overlap between adjacent basins. The areas of overlap (where more than one mask is assigned to a pixel) are areas where there is some uncertainty as to which water mass the pixel belongs. For these regions, the fuzzy logic algorithm decides where the boundaries are, and to what degree (the membership) each algorithm is weighted. In areas where only one mask is present, only one algorithm is applied. Thus, the fuzzy logic code acts only in areas with mask overlap, where it decides when and where boundaries exist and which algorithm to apply. This is reviewed in the following section.
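The distinction between single-mask and overlap pixels can be illustrated with boolean arrays. The strip of six pixels and the two masks below are hypothetical and serve only to show how overlap areas might be identified.

```python
import numpy as np

# Hypothetical strip of six pixels; True marks where a basin's mask applies.
masks = {
    "Atlantic": np.array([True, True, True, True, False, False]),
    "Southern": np.array([False, False, True, True, True, True]),
}

# Number of masks present at each pixel.
mask_count = np.sum(list(masks.values()), axis=0)

single = mask_count == 1   # one regional algorithm is applied directly
overlap = mask_count > 1   # fuzzy memberships decide the weighting here
```

Only the two middle pixels fall in the overlap, so only those would be passed to the membership calculation.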

Fuzzy memberships and ocean basin characterization

Fuzzy logic was first introduced by Zadeh (1965) as a mathematical way to represent vagueness and imprecision inherent in data. The idea behind fuzzy sets simply states that an object can have partial membership to more than one set. This concept is a departure from classical set theory, which states that an object belongs exclusively to only one set. In fuzzy set theory, full membership to exclusively one set is still permitted, and thus it is a superset of classical set theory.

This concept has been applied to ocean color remote sensing as a means of blending retrievals from algorithms tailored to water types into a unified product (Moore et al., 2001). In that case, semi-analytic algorithms were blended based on reflectance spectra. In the present case, membership to a specific ocean basin is defined in terms of sea surface temperature (SST), diffuse attenuation at 490 nm (Kd490), photosynthetically available radiation (PAR), and longitude. The characterization of these variables for each basin was obtained from NASA monthly climatologies at 9 km pixel resolution. Thus, for each month of the year, separate statistical properties were calculated for a subregion within each ocean basin (Figure 3). The subregions were based on the geographic location of the basins and guided by the Longhurst (2007) provinces. This characterization served as the basis of the membership functions, which were used to assign pixels to one or more of the basins. The critical aspect of the basin characterization is that the variables chosen as descriptors of the water mass must be measurable from space. This limits the pool of variables available, and it is acknowledged that there are more suitable variables to characterize water masses (e.g., salinity), but they are not measurable from satellite platforms.

Statistical properties (the mean and covariance matrix) of the set of these four variables were determined and used as the basis of the fuzzy membership function that defines the membership to each ocean basin for areas of overlap. In general terms, for any measured vector, $x$, the fuzzy membership was defined in terms of the squared distance between $x$ and the $i$th class mean $y_i$. For this, we used the squared Mahalanobis distance given by:

$$Z_i^2 = (x - y_i)^t S_i^{-1} (x - y_i) \tag{2}$$

where $t$ indicates the matrix transpose and $S_i$ is the covariance matrix of class $i$. The Mahalanobis distance is a generalized distance from $x$ to $y_i$ in units of standard deviations adjusted for the covariance.
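The class statistics and the squared Mahalanobis distance of Eq. 2 can be sketched with NumPy as follows. The function names and the synthetic sample values are hypothetical; the four columns stand in for SST, Kd490, PAR and longitude but carry no real climatological data.

```python
import numpy as np


def class_statistics(samples):
    """Mean vector and covariance matrix of an (n_pixels, n_variables) array."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(axis=0), np.cov(samples, rowvar=False)


def squared_mahalanobis(x, mean, cov):
    """Squared Mahalanobis distance of vector x from a class (Eq. 2)."""
    diff = np.asarray(x, dtype=float) - mean
    # Solve S z = diff instead of forming the explicit inverse of S.
    return float(diff @ np.linalg.solve(cov, diff))


# Synthetic subregion samples (illustrative values only).
rng = np.random.default_rng(0)
samples = rng.normal(loc=[2.0, 0.05, 30.0, -40.0],
                     scale=[1.0, 0.01, 5.0, 20.0],
                     size=(500, 4))
class_mean, class_cov = class_statistics(samples)

# A pixel exactly at the class mean has zero distance.
z2_at_mean = squared_mahalanobis(class_mean, class_mean, class_cov)
```

Solving the linear system is a numerical design choice; it gives the same value as multiplying by the inverse covariance matrix in Eq. 2.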

If the satellite vectors belonging to class $i$ are multivariate normal, and if $x$ belongs to class $i$, then $Z_i^2$ has a $\chi^2$ distribution with $n$ degrees of freedom (where $n$ is the dimension of $x$). As a measure of likelihood that $x$ is drawn from class $i$, we defined the membership function to be:

$$f_i = 1 - F_n(Z_i^2) \tag{3}$$

where $F_n(Z^2)$ is the cumulative $\chi^2$ distribution function with $n$ degrees of freedom. Thus defined, the membership function returns a value ranging from 0 to 1. If the observation $x$ is exactly equal to $y_i$, then $Z_i^2 = 0$, and $f_i = 1$. This would be interpreted as the pixel having full membership in class $i$. As $x$ becomes more distant from $y_i$, $f_i$ decreases from 1 to 0, indicating a decreasing likelihood that $x$ comes from class $i$.
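The membership function can be evaluated without a statistics library when the number of degrees of freedom is even, as it is for the four variables used here. The sketch below relies on the closed-form series for the upper tail of the chi-square distribution with even degrees of freedom; the function name is hypothetical, and a general routine such as `scipy.stats.chi2.sf` would be the usual choice when odd dimensions are needed.

```python
import math


def fuzzy_membership(z2, dof=4):
    """Membership f = 1 - F_n(Z^2) for an even number of degrees of freedom.

    Uses the identity, valid for n = 2k:
        1 - F_n(z) = exp(-z/2) * sum_{j=0}^{k-1} (z/2)^j / j!
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form requires a positive, even dof")
    if z2 < 0:
        raise ValueError("squared distance must be non-negative")
    half = z2 / 2.0
    series = sum(half ** j / math.factorial(j) for j in range(dof // 2))
    return math.exp(-half) * series


full_membership = fuzzy_membership(0.0)    # pixel exactly at the class mean
partial_membership = fuzzy_membership(4.0)  # equals 3 * exp(-2) for dof = 4
far_membership = fuzzy_membership(40.0)     # far from the class mean
```

The three calls show the behavior described above: the membership is 1 at the class mean and falls toward 0 as the distance grows.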

Application to satellite data

Monthly averaged satellite data (level 3) were obtained from the NASA Goddard DAAC for the following data sets: SST – MODIS Aqua; Kd490 – MODIS Aqua; PAR – SeaWiFS. The nominal spatial resolution of each pixel was approximately 9 km. For each month of global satellite data, membership maps were generated showing the distribution of the five major ocean basins.

For each pixel, a chlorophyll value was calculated from each regional algorithm when the mask value for that ocean type was present. In most cases, only one mask was present and thus only one algorithm was used. In cases where masks overlapped, multiple algorithms were considered plausible. In such cases, memberships were calculated to each ocean type that had a mask value present. The resulting memberships were used to weight the ocean-specific chlorophyll retrievals. The end result was a weighted average of the chlorophyll retrievals derived from the different algorithms for these pixels.
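The per-pixel blending step can be sketched as a membership-weighted average. Several details here are assumptions rather than statements of the method: the weights are normalized by their sum, the averaging is done on Chl itself rather than on its logarithm, and a pixel whose memberships are all zero returns `None`. The function name and the numbers are hypothetical.

```python
def blend_chl(retrievals, memberships):
    """Blend regional Chl retrievals for one pixel.

    retrievals:  dict mapping basin name -> Chl from that basin's algorithm,
                 containing only basins whose mask covers the pixel.
    memberships: dict mapping basin name -> fuzzy membership in [0, 1].
    """
    if not retrievals:
        return None
    if len(retrievals) == 1:
        # Only one mask present: apply that algorithm without weighting.
        return next(iter(retrievals.values()))
    total = sum(memberships[basin] for basin in retrievals)
    if total == 0:
        return None  # assumption: no basin claims the pixel
    weighted = sum(memberships[basin] * chl for basin, chl in retrievals.items())
    return weighted / total


# Hypothetical overlap pixel between two basins.
blended = blend_chl({"Southern": 0.40, "Atlantic": 0.20},
                    {"Southern": 0.75, "Atlantic": 0.25})
```

With these made-up inputs the blended value is 0.35, which lies between the two regional retrievals and closer to the one with the higher membership.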

RESULTS

Regional OC3M Error Assessment

Figure 4 shows the polynomial fits for the different ocean basins. For the Southern Ocean, the original OC3M algorithm has a strong bias towards underestimating Chl, while in the Atlantic Ocean Chl is overestimated in mid-Chl ranges. The Pacific Ocean also has a bias toward underestimating Chl, but to a lesser extent. The Arctic Ocean and Indian Ocean (not shown) have fewer data points than the other regions, and it is difficult to gauge the overall relationship in these basins.

Table 1 lists the relative errors, computed against the NOMAD V2 in situ data, for the original OC3M algorithm and for the new regional algorithms. The relative errors for each row (i.e., basin) were calculated using only the NOMAD points from that region.

                 Relative Error (%)
Ocean Basin      Global OC3M      Regional OC3M
Atlantic         70               52
Pacific          33               31
Southern         56               38
Indian           31               28
Average          56               44

Table 1. Relative error between measured NOMAD Chl and Chl retrieved from the OC3 family of algorithms.

Significant improvements occurred in Chl retrievals for the Atlantic and Southern Ocean algorithms parameterized from data points exclusive to each region compared with the global data set. A minor improvement was seen in the Pacific and Indian Oceans. The overall Chl error was reduced from 56% to 44%. These improvements in Chl retrievals should in turn increase the accuracy of global primary productivity algorithms that use Chl as an input. It should be noted that the data used for error estimates were also used in the algorithm parameterization and thus are not a true independent validation data set. The error estimates provided are more a measure of ‘goodness of fit’.

Satellite fuzzy memberships

The input and output images for the fuzzy membership function are shown in Figure 5 for January 2005. Only the membership map for the Southern Ocean is shown in this figure. Based on the overlap region of the Southern Ocean regional mask with the other masks (shown in Figure 2) and the satellite input data, a geographic extent is generated and mapped. The boundary between the Southern Ocean and other regions is shown to be uneven, and graded across the boundary in a north-south direction.

This final membership map is much smaller in geographic extent than the original mask template, which served as the initial starting point. The dynamic nature of the ocean is manifested in fluid boundaries that change location over space and time, and thus the mask is the template from which the membership maps are ‘carved’. It is assumed that within the mask overlap regions the boundaries between ocean regions exist, but the precise locations are not known. The fuzzy methodology determines the locations and the degree of transition between the boundaries within these overlap regions. In a broad sense, the classification scheme differentiates the ocean basins and grades the transitions without step-wise discontinuities.

Global Chlorophyll

A merged global chlorophyll product is shown in Figure 6, and compared to the standard OC3M chlorophyll product. The difference between the two images is also shown.

Chlorophyll values in the Southern Ocean are higher in the image with the integrated regional algorithms than in the image from the global algorithm. Conversely, Chl is lower in the same image comparison for the North Atlantic. These two areas contribute significantly to global primary productivity and to carbon flux interaction with the atmosphere on an annual basis. Thus, the reduction in error in Chl fields via the empirical algorithms will have an impact on primary productivity algorithms that use satellite-derived Chl as an input, and ultimately should improve global primary productivity estimates. These effects are yet to be quantified.

CONCLUSIONS

A single global chlorophyll algorithm does not achieve the desired level of accuracy (35%) that has been specified by NASA. The current global empirical algorithm OC3M for MODIS satellites has a mean relative error of approximately 56% based on the NOMAD V2 data set. When subdivided into oceanographic regions, errors can be as high as 70% for the Atlantic Ocean. When empirical algorithms were developed for each ocean basin, the mean global relative error declined to 44%. While this is still outside the NASA requirement, we conclude that global Chl errors will be reduced only through the integration of regional algorithms. It has been demonstrated that fuzzy logic offers a workable framework for integrating these regional algorithms using global monthly satellite products. This is a significant step toward uniting regional algorithms, and this framework could also be applied to semi-analytic bio-optical algorithms and primary productivity algorithms.

FIGURES

Figure 1 – Geographic distribution of the NOMAD V2 data points by ocean basin (total N=2202). The numbers of points assigned to each basin are: Southern Ocean (red) – 265; Atlantic (blue) – 1164; Pacific (green) – 688; Indian (cyan) – 56; Arctic (black) – 80.

Figure 2. Ocean basin masks where regional algorithms are ‘potentially’ applied. Upper left: Arctic; middle left: Pacific; lower left: Indian; upper right: Atlantic; middle right: Southern Ocean. The figure in the lower right indicates areas where different combinations of the masks overlap. In these areas, the fuzzy logic decides where the boundaries exist, and how to weight the chlorophyll retrievals for multiple algorithms.

Figure 3 – Regions where ocean basin statistics were extracted from satellite imagery. For each climatological month, SST, PAR, Kd and longitude were extracted from monthly climatologies for Aqua (SST, Kd) and SeaWiFS (PAR). The mean and covariance matrix for each were calculated and used as the basis of the fuzzy membership function. Regions were guided by the distribution of the Longhurst provinces (Longhurst, 2007).

Figure 4 – Regional OC3M algorithm fits for NOMAD V2 subsets for the Atlantic Ocean (upper left), Southern Ocean (upper right), Pacific Ocean (lower left), and Arctic Ocean (lower right). Green line is the standard OC3M, and red lines are new regional fits. Indian Ocean results are not shown.

Figure 5. Fuzzy membership map derived for the Southern Ocean for January 2005. Satellite input data are PAR (SeaWiFS), SST (Aqua), Kd490 (Aqua), and longitude (not shown). The mask limits the area of the eventual membership map, which was derived from statistical distributions of distinct ocean regions (Figure 3) and mask overlap areas (Figure 2).

Figure 6. Chlorophyll images for the standard OC3M (upper left) and the blended Chl from the regional algorithms (upper right) for January 2005. The difference between the two images is below (OC3M – Fuzzy OC3M). The standard OC3M is shown to underestimate Chl in the Southern Ocean, while overestimating Chl in the North Atlantic.

REFERENCES

Barnard, A. H., W.S. Pegau, and J.R.V. Zaneveld, Global relationships of the inherent optical properties of the ocean, J. Geophys. Res., 103, 24,955-24,968, 1998.

Carder, K.L., F.R. Chen, Z.P. Lee, S.K. Hawes, and D. Kamykowski, Semianalytic Moderate-Resolution Imaging Spectrometer algorithms for chlorophyll a and absorption with bio-optical domains based on nitrate-depletion temperatures, J. Geophys. Res., 104, 5403-5421, 1999.

Gordon, H.R., D.K. Clark, J.W. Brown, O.B. Brown, R.H. Evans, and W.W. Broenkow, Phytoplankton pigment concentrations in the Middle Atlantic Bight: comparison of ship determinations and CZCS estimates, Appl. Opt., 22, 20-36, 1983.

Hoepffner, N., B. Sturm, Z. Finenko, and D. Larkin, Depth-integrated primary production in the eastern tropical and subtropical North Atlantic basin from ocean colour imagery, Int. J. Remote Sensing, 20, 1435-1456, 1999.

Longhurst, A.R., Ecological Geography of the Sea, Academic Press, 2007.

Moore, T.S., J.W. Campbell, and H. Feng, A fuzzy logic classification scheme for selecting and blending satellite ocean color algorithms, IEEE Trans. Geosci. Rem. Sens., 39, 1764-1776, 2001.

Morel, A. and J.F. Berthon, Surface pigments, algal biomass, and potential production of the euphotic layer: relationships reinvestigated in view of remote-sensing applications, Limnol. Oceanogr., 34, 1545-1562, 1989.

Mueller, J.L. and R.E. Lange, Bio-optical provinces of the Northeast Pacific Ocean: A provisional analysis, Limnol. Oceanogr., 34, 1572-1586, 1989.

O’Reilly, J.J. (and 21 co-authors), Ocean Color Chlorophyll a Algorithms for SeaWiFS, OC2 and OC4: Version 4. Chapter 2 in SeaWiFS Post-launch Calibration and Validation Analyses, Part 3, SeaWiFS Post-launch Technical Memorandum Series, 11, NASA, 2000.

Werdell, P.J. and S.W. Bailey, An improved in situ bio-optical data set for ocean color algorithm development and satellite data product validation, Remote Sens. Environ., 98, 122-140, 2005.

Zadeh, L., Fuzzy Sets, Inform. Cont., 8, 338-353, 1965.
