

Page 1: Clustering nutrients and toxic cyanobacteria communities ... · Clustering nutrients and toxic cyanobacteria communities using a self-organizing map (SOM) A bloom near Venise-en-Quebec

Clustering nutrients and toxic cyanobacteria communities using a self-organizing map (SOM)

www.lcbp.org

A bloom near Venise-en-Quebec in August, 2008. Credit: Quebec Ministry of Sustainable Development, Environment and Parks.

Andrea Pearce (1), Donna Rizzo (1), Lori Stevens (2), Mary Watzin (3)

(1) School of Engineering, University of Vermont, Votey Hall, 33 Colchester Ave, Burlington, VT 05405

(2) Department of Biology, University of Vermont, Marsh Life Science, 109 Carrigain Dr, Burlington, VT 05450

(3) Rubenstein School of Environment and Natural Resources, University of Vermont, 324 Aiken Center, Burlington, VT 05405

Page 2

Acknowledgements

• Vermont EPSCoR - Graduate Research Fellowship

• NSF - EAR Award #061154

• Casella Waste Management, EPA - Landfill Data

• Rubenstein Lab crew, VT ANR & volunteer monitors - Lake Champlain Data

• CSYS Group

Background Photo Credit: Larry Dupont www.lcbp.org

Page 3

Big Picture Goal

Develop a computational method to use microbial diversity and other data to describe spatial patterns in the environment

Lake Champlain Cyanobacteria Dataset

• Many correlated variables

• Data correlated in space and time

• Multiple indicators of water quality

• Observation of trends requires costly, long-term monitoring

• Lack of appropriate computational tools for

  • data mining multiple data types

  • producing maps/estimations at multiple scales

Page 4

Introduction: Computational Methodology

Self-Organizing Map (SOM) - A clustering artificial neural network (ANN)

Why ANNs?

• Data-driven methods

• Exploit complex functional relationships in a dataset without explicitly defining them

Page 5

Introduction: Computational Methodology

Self-Organizing Map (SOM) - A clustering artificial neural network (ANN)

Why the SOM?

• Unsupervised ANN

• Reduces data dimensionality

• Does not require a priori knowledge of the number of groupings or features of those groups

• Outperforms many clustering methods on noisy datasets

Kohonen 1990, Mangiameli et al. 1996

Page 6

Method Demonstration: Kohonen Animal Example

animal  small  medium  big  2 legs  4 legs  hair  hooves  mane  feathers  hunt  run  fly  swim
dove        1       0    0       1       0     0       0     0         1     0    0    1     0
hen         1       0    0       1       0     0       0     0         1     0    0    0     0
duck        1       0    0       1       0     0       0     0         1     0    0    1     1
goose       1       0    0       1       0     0       0     0         1     0    0    1     1
owl         1       0    0       1       0     0       0     0         1     1    0    1     0
hawk        1       0    0       1       0     0       0     0         1     1    0    1     0
eagle       0       1    0       1       0     0       0     0         1     1    0    1     0
fox         0       1    0       0       1     1       0     0         0     1    0    0     0
dog         0       1    0       0       1     1       0     0         0     0    1    0     0
wolf        0       1    0       0       1     1       0     0         0     1    1    0     0
cat         1       0    0       0       1     1       0     0         0     1    0    0     0
tiger       0       0    1       0       1     1       0     0         0     1    1    0     0
lion        0       0    1       0       1     1       0     1         0     1    1    0     0
horse       0       0    1       0       1     1       1     1         0     0    1    0     0
zebra       0       0    1       0       1     1       1     1         0     0    1    0     0
cow         0       0    1       0       1     1       1     0         0     0    0    0     0

Page 7

Introduction: Computational Methodology

SOM Network Architecture

[Figure: input nodes x1, x2, x3, ..., x13 (features such as Hair, 2-Legs, Hooves, Swim) connected by weights w(i,j)1, w(i,j)2, w(i,j)3, ..., w(i,j)13 to node (i, j) of the 2-D output map.]

Page 8

Introduction: Computational Methodology

SOM Network Architecture

[Figure: input nodes x1, x2, x3, ..., x13 (features such as Hair, 2-Legs, Hooves, Swim) connected by weights w(i,j)1, w(i,j)2, w(i,j)3, ..., w(i,j)13 to node (i, j) of the 2-D output map.]

Minimize Euclidean Distance

$$\left( \sum_{k=1}^{K} \left( x_k - w(i,j)_k \right)^2 \right)^{1/2}$$
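The distance minimization on this slide can be sketched in Python. This is a minimal illustration rather than the authors' implementation: the function name `best_matching_unit`, the nested-list layout `weights[i][j][k]`, and the toy 2 x 2 map with K = 3 are assumptions made for the example.

```python
import math

def best_matching_unit(x, weights):
    """Return ((i, j), distance) for the output node closest to input x.

    weights[i][j] holds the K-element weight vector w(i,j) of node (i, j);
    this nested-list layout is an assumption for illustration.
    """
    best_node, best_dist = None, math.inf
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            # Euclidean distance between the input x and weight vector w(i,j)
            dist = math.sqrt(sum((xk - wk) ** 2 for xk, wk in zip(x, w)))
            if dist < best_dist:
                best_node, best_dist = (i, j), dist
    return best_node, best_dist

# Toy 2 x 2 output map with K = 3 features (hypothetical weights).
weights = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
]
node, dist = best_matching_unit([1.0, 0.0, 0.2], weights)
print(node, round(dist, 3))
```

The node returned here is the one whose weights are then adjusted toward the input during training.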

Page 9

Introduction: Computational Methodology

SOM Network Architecture

[Figure: input nodes x1, x2, x3, ..., x13 (features such as Hair, 2-Legs, Hooves, Swim) connected by weights w(i,j)1, w(i,j)2, w(i,j)3, ..., w(i,j)13 to node (i, j) of the 2-D output map.]

$$w(i,j)_k^{new} = w(i,j)_k^{old} + \alpha \left( x_k - w(i,j)_k^{old} \right)$$

where \alpha is the learning rate.
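A single application of the weight update can be sketched as follows. The helper name `update_weights`, the learning rate `alpha = 0.5`, and the three-element vectors are assumptions chosen for illustration; the slide does not state numerical values.

```python
def update_weights(w_old, x, alpha):
    """Move a node's weight vector toward the input x by a fraction alpha.

    Implements w_new[k] = w_old[k] + alpha * (x[k] - w_old[k]) for each k.
    alpha (the learning rate) is an assumed parameter in [0, 1].
    """
    return [wk + alpha * (xk - wk) for wk, xk in zip(w_old, x)]

# Hypothetical weight vector and input with K = 3 features.
w_new = update_weights([0.2, 0.8, 0.5], [1.0, 0.0, 0.5], alpha=0.5)
print([round(v, 3) for v in w_new])
```

With `alpha = 0.5` each weight moves halfway to the corresponding input value, and a weight that already equals its input stays unchanged.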

Page 10

Introduction: Computational Methodology

SOM Network Training

• Present input data vectors to the network one by one

• Iterate through all input vectors a predetermined number of times

[Figure: example input vectors (Duck, Fox, Horse, Lion) presented to input nodes x1, x2, x3, ..., x13, which connect by weights w(i,j)1, w(i,j)2, w(i,j)3, ..., w(i,j)13 to node (i, j) of the 2-D output map.]

$$w(i,j)_k^{new} = w(i,j)_k^{old} + \alpha \left( x_k - w(i,j)_k^{old} \right)$$

where \alpha is the learning rate.
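The training procedure can be sketched as a short loop. This is an illustrative sketch under stated assumptions, not the code used for the talk: the linear decay of the learning rate, the Gaussian neighborhood with a shrinking width `sigma`, the 3 x 3 map size, the random initialization, and the function names are all assumptions. The four input vectors are the duck, fox, horse and lion rows of the animal table.

```python
import math
import random

def bmu(x, weights):
    """Grid position (i, j) of the node whose weight vector is closest to x."""
    nodes = [(i, j) for i in range(len(weights)) for j in range(len(weights[0]))]
    return min(nodes, key=lambda n: math.dist(x, weights[n[0]][n[1]]))

def train_som(data, rows, cols, epochs, alpha0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols SOM and return weights[i][j][k]."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):             # predetermined number of passes
        frac = 1.0 - epoch / epochs
        alpha = alpha0 * frac               # assumed linear learning-rate decay
        sigma = max(sigma0 * frac, 0.3)     # assumed shrinking neighborhood width
        for x in data:                      # present input vectors one by one
            bi, bj = bmu(x, weights)
            for i in range(rows):
                for j in range(cols):
                    grid_d2 = (i - bi) ** 2 + (j - bj) ** 2
                    h = math.exp(-grid_d2 / (2 * sigma ** 2))
                    w = weights[i][j]
                    for k in range(dim):
                        # Update rule scaled by the neighborhood factor h
                        w[k] += alpha * h * (x[k] - w[k])
    return weights

def mean_bmu_distance(data, weights):
    """Average Euclidean distance from each input to its best matching unit."""
    total = 0.0
    for x in data:
        i, j = bmu(x, weights)
        total += math.dist(x, weights[i][j])
    return total / len(data)

animals = {
    "duck":  [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1],
    "fox":   [0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0],
    "horse": [0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0],
    "lion":  [0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0],
}
data = list(animals.values())
before = mean_bmu_distance(data, train_som(data, 3, 3, epochs=0))
after = mean_bmu_distance(data, train_som(data, 3, 3, epochs=200))
print(f"mean distance to BMU: before={before:.3f}, after={after:.3f}")
```

Calling `train_som` with `epochs=0` returns the randomly initialized map, which gives a baseline for comparing how closely the trained map fits the inputs.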

Page 11

Method Demonstration: Kohonen Animal Example

[Figure: SOM output map before training and after training.]

Page 12

Method Demonstration - Animal Example U-Matrix
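A U-matrix is commonly computed as the average distance between each node's weight vector and the weight vectors of its neighbors on the output grid, so that high values mark boundaries between clusters. The sketch below assumes a rectangular grid with four-way neighbors and a hypothetical 1 x 3 map; the grid topology used for the slide is not stated.

```python
import math

def u_matrix(weights):
    """Average distance from each node's weights to its grid neighbors.

    Assumes a rectangular grid with up/down/left/right neighbors
    (an assumption; hexagonal grids are also common for SOMs).
    """
    rows, cols = len(weights), len(weights[0])
    umat = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(math.dist(weights[i][j], weights[ni][nj]))
            umat[i][j] = sum(dists) / len(dists) if dists else 0.0
    return umat

# Hypothetical 1 x 3 map: two similar nodes, then one distant node.
toy = [[[0.0, 0.0], [0.0, 0.1], [1.0, 1.0]]]
umat = u_matrix(toy)
print([round(v, 3) for v in umat[0]])
```

In this toy map the third node has the largest U-matrix value, reflecting the gap between it and the first two nodes.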

Page 13

Method Demonstration - Component Planes
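A component plane shows the value of one weight, w(i,j)k, across every node of the output map. As a minimal sketch, assuming the weights are stored as a nested list `weights[i][j][k]` and using a hypothetical 2 x 2 map with two features:

```python
def component_plane(weights, k):
    """Return the 2-D grid of the k-th weight, w(i,j)k, for every node (i, j)."""
    return [[node[k] for node in row] for row in weights]

# Hypothetical 2 x 2 map with two features per node: [hair, feathers].
toy_weights = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.1, 0.9], [0.0, 1.0]],
]
hair_plane = component_plane(toy_weights, 0)
feathers_plane = component_plane(toy_weights, 1)
print(hair_plane)
print(feathers_plane)
```

Plotting each plane as a heat map with the same grid layout makes it possible to compare where different variables are high or low on the map.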

Page 14

Method Demonstration

Field Site - Schuyler Falls Landfill, NY

Mouser 2006

Page 15

Method Demonstration

Field Site - Schuyler Falls Landfill, NY

Mouser 2006

Page 16

Describe the Dataset

Hydrochemistry: Sp. Cond., Eh, pH, Turbidity, BOD, COD, Ammonia, Nitrate, Cations, Anions, Heavy Metals, Acetone, Benzene, Carbon Tetrachloride, Xylene

Page 17

Describe the Dataset

Hydrochemistry: Sp. Cond., Eh, pH, Turbidity, BOD, COD, Ammonia, Nitrate, Cations, Anions, Heavy Metals, Acetone, Benzene, Carbon Tetrachloride, Xylene

http://rdp8.cme.msu.edu/html/t-rflp_jul02.html

Mouser et al. 2005

Roling et al. 2001

Page 18

Microbial Data from the Landfill

Community profiles of Archaea, Bacteria and Geobacteraceae

Relative abundance of 'Operational Taxonomic Units', or OTUs, for each target

Community data from 25 monitoring wells

Reduce OTU data dimensionality by computing principal components; 75% of the variance is captured by:

2 PCs of Archaea

3 PCs of Bacteria

3 PCs of Geobacteraceae
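Choosing how many principal components to keep for a variance target can be sketched as below. The function name `components_for_variance` and the eigenvalues are hypothetical; the sketch assumes the eigenvalues of the covariance matrix (the variance along each component) have already been computed elsewhere.

```python
def components_for_variance(eigenvalues, target=0.75):
    """Smallest number of leading components whose variance share meets target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count
    return len(ordered)

# Hypothetical eigenvalues for one community profile (not data from the study).
n_components = components_for_variance([5.2, 2.6, 1.1, 0.7, 0.4], target=0.75)
print(n_components)
```

The retained component scores, rather than the full OTU abundance table, would then serve as inputs to the SOM.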

Page 19

Method Demonstration - Data Organization

[Figure: SOM output map after training. Legend: C = Clean, F = Fringe, P = Polluted; a separate symbol marks unused nodes.]

Page 20

Non-Parametric MANOVA

F = Between Group Variability / Within Group Variability

How Many Clusters?

Anderson, M. J. (2001), McArdle, B.H. and M.J. Anderson (2001), Jones, D. (2003)

• Non-parametric

• Suitable for unbalanced designs

• Can be applied to any distance matrix
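The F ratio can be computed directly from a distance matrix, following the sums-of-squares partition described by Anderson (2001), with significance assessed by permuting group labels. The sketch below is an illustration under assumptions, not the Fathom toolbox code: the function names, the one-dimensional toy data, and the choice of 999 permutations are hypothetical.

```python
import random

def pseudo_f(dist, labels):
    """Pseudo-F statistic from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares from all pairwise squared distances
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, one term per group
    ss_within = 0.0
    for g in groups:
        members = [i for i in range(n) if labels[i] == g]
        pairs = sum(dist[i][j] ** 2
                    for idx, i in enumerate(members) for j in members[idx + 1:])
        ss_within += pairs / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permutation_test(dist, labels, n_perm=999, seed=0):
    """Return (observed F, permutation p-value) by shuffling group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical 1-D observations in two groups (C = clean, P = polluted).
values = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
labels = ["C", "C", "C", "P", "P", "P"]
dist = [[abs(u - v) for v in values] for u in values]
f_stat, p_value = permutation_test(dist, labels)
print(f_stat, p_value)
```

Because the statistic depends only on pairwise distances, the same functions accept any distance measure, and groups need not be the same size.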

Page 21

Non-Parametric MANOVA

F = Between Group Variability / Within Group Variability

How Many Clusters?

Anderson, M. J. (2001), McArdle, B.H. and M.J. Anderson (2001), Jones, D. (2003)

[Figure: F-statistic (y-axis, 0 to 30) plotted against number of clusters (x-axis, 0 to 12).]

Page 22

Preliminary Results

[Figure panels: 2-Clusters; 3-Clusters.]

Page 23

Preliminary Results

[Figure panel: 4-Clusters.]

Page 24

Preliminary Results

[Figure panels: 4-Clusters, Bacteria & Geobacter Only; 4-Clusters, Archaea, Bacteria & Geobacter.]

Page 25

Landfill Application Preliminary Conclusions

• The SOM is effective at distinguishing a gradient of contamination at the landfill based on microbial communities.

• The spatial pattern of the groups generated by the algorithm agrees with the hydrochemical analysis.

• Microbial communities may be able to serve as an advance indicator of migrating pollution.

• More knowledge of specific sub-groups of organisms (and primers to amplify relevant DNA) could improve the algorithm's ability to distinguish contamination levels.

Page 26

Map courtesy of M. Watzin

Page 27

Lake Champlain Dataset

Column groups: Sample Collection Details; Algae Community Composition; Lake Chemistry; Cyanotoxin Concentration

Columns: ID; datetime collected; location; rep; sample type; diatoms; greens; chrysophytes; cryptophytes; dinoflagellates; euglenophytes; bacillariophyceae; indeterminate; Potential Toxin Producers; total; Net, Plankton or Whole Water Toxin; Microcystin, ug/L; Whole Water Chl, ug/L; Total P; Total N; SRP

Sample rows (blank cells are omitted, so each row lists fewer values than there are columns):

32 7/2/07 13:20 Chapman Bay 1 net 8.3 15.7 4.4 0.0 570.2 598.6 ww <0.05 3.25 27.41 0.77 7.0
33 7/2/07 13:20 Chapman Bay 1 net 8.3 15.7 4.4 0.0 570.2 598.6 wwp 0.006 3.25 27.41 0.77 7.0
34 7/2/07 13:20 Chapman Bay 2 net 63.5 24.4 2.3 0.1 151.8 245.5 wwp 0.002 2.75 28.80 0.78 6.9
35 7/2/07 13:20 Chapman Bay 2 net 63.5 24.4 2.3 0.1 151.8 245.5 ww <0.05 2.75 28.80 0.78 6.9
36 7/2/07 14:25 Highgate Cliffs 1 net 5.5 84.0 0.0 0.0 680.6 775.9 wwp 0.009 0.00 46.76 1.04 20.3
37 7/2/07 14:25 Highgate Cliffs 1 net 5.5 84.0 0.0 0.0 680.6 775.9 ww <0.05 0.00 46.76 1.04 20.3
38 7/2/07 14:25 Highgate Cliffs 2 net 2.6 44.3 0.0 0.0 929.6 976.4 ww 0.061 4.97 46.57 0.96 20.7
39 7/2/07 14:25 Highgate Cliffs 2 net 2.6 44.3 0.0 0.0 929.6 976.4 wwp 0.043 4.97 46.57 0.96 20.7
40 7/2/07 14:40 Highgate Springs 1 net 0.7 73.0 0.0 0.1 317.0 390.8 ww <0.05 4.07 57.31 1.03 21.2
41 7/2/07 14:40 Highgate Springs 2 net 2.8 49.5 0.0 0.0 198.2 250.5 ww 0.062 4.07 56.76 1.16 21.1

Page 28

Lake Champlain Dataset: Preliminary Analysis


Input Dataset:

• All sampling locations in Missisquoi Bay ('03-'07)

• All sampling dates (May - October)

• 356 samples total

Variables:

• Total N (ug/L)

• Total P (ug/L)

• Chlorophyll (ug/L)

• Anabaena (cells/mL)

• Aphanizomenon (cells/mL)

• Microcystis (cells/mL)

Page 29

Lake Champlain Dataset - BMU by Microcystin Conc.

[Figure: SOM output map with each best matching unit (BMU) marked by microcystin concentration. Legend: < 10 ug/L toxin (n = 339); 10 - 100 ug/L toxin (n = 14); > 100 ug/L toxin (n = 3); Unused Node.]
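Labeling each sample by toxin class before placing it on the map can be sketched as follows. The function names and the treatment of non-detects such as `<0.05` (counted in the lowest class) are assumptions, as is the handling of values exactly equal to 10 or 100. The first five readings come from the sample rows of the dataset table; the last two are hypothetical values added to exercise the higher classes.

```python
from collections import Counter

def parse_concentration(raw):
    """Convert a table entry such as '0.061' or '<0.05' to a float in ug/L."""
    text = str(raw).strip()
    if text.startswith("<"):
        # Non-detect: treated as 0.0 here (an assumption), i.e. the lowest class
        return 0.0
    return float(text)

def toxin_class(raw):
    """Assign a microcystin reading to one of the three legend classes."""
    value = parse_concentration(raw)
    if value < 10:
        return "< 10 ug/L"
    if value <= 100:
        return "10 - 100 ug/L"
    return "> 100 ug/L"

# Five readings from the table plus two hypothetical high values.
readings = ["<0.05", "0.006", "0.002", "0.061", "0.062", 25.0, 150.0]
counts = Counter(toxin_class(r) for r in readings)
print(counts)
```

Each sample's class label can then be drawn at the map position of its best matching unit.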

Page 30

Lake Champlain Dataset - Component Planes

[Figure panels: Anabaena; Aphanizomenon; Microcystis; Total Nitrogen; Total Phosphorus; Chlorophyll.]

Page 31

Lake Champlain Dataset - Ongoing Work

1) Create new input variables (Principal Components)

A. Lake Chemistry (N, P, Chl, H2O Temp)

B. Biology (Cyanobacteria Community)

C. Environmental Conditions (Air Temp, Cloud Cover)

2) Explore the spatial and temporal autocorrelation in the dataset

Page 32

References

Canfield, D.E., B. Thamdrup, and E. Kristensen (2005). Aquatic Geomicrobiology. Elsevier, New York.

Anderson, M.J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology 26:32-46.

Jones, D. (2003). Fathom: a MATLAB toolbox for ecological and oceanographic data analysis. University of Miami RSMAS, Department of Marine Biology and Fisheries. Available at: http://www.rsmas.miami.edu/personal/djones Accessed: 3 December 2008.

Roling, W.F.M., B.M. van Breukelen, M. Braster, B. Lin, and H.W. van Verseveld (2001). Relationships between Microbial Community Structure and Hydrochemistry in a Landfill Leachate-Polluted Aquifer. Applied and Environmental Microbiology 67(10):4619-4629.

Grant, L.M., L.M. Muckian, N.J.W. Clipson, and E.M. Doyle (2006). Microbial community changes during the bioremediation of creosote-contaminated soil. Letters in Applied Microbiology 44:293-300.

McArdle, B.H. and M.J. Anderson (2001). Fitting multivariate models to community data: A comment on distance-based redundancy analysis. Ecology 82(1):290-297.

Mouser, P.J., D.M. Rizzo, W.F.M. Roling, and B.M. van Breukelen (2005). A multivariate statistical approach to spatial representation of groundwater contamination using hydrochemistry and microbial community profiles. Environmental Science and Technology 39:7551-7559.

Mouser (2006). Improving Detection and Long-Term Monitoring Strategies for Landfill Leachate Contaminated Groundwater with Molecular-Based Microbiological Data Using Geostatistics and Artificial Neural Networks. Doctoral Dissertation, University of Vermont.

Mangiameli, P., S.K. Chen, and D. West (1996). A comparison of SOM neural network and hierarchical clustering methods. European Journal of Operational Research 93:402-417.

Pace, N.R. (1997). A Molecular View of Microbial Diversity and the Biosphere. Science 276(5313):734-740.

Page 33

Preliminary Results