Download - Geophysical Anomaly Detection for Archaeology using …cs.brown.edu/courses/cs237/2010/proceedings2010.pdf · 2012. 8. 24. · ways are always better than one. Southeastern Archaeology,

Geophysical Anomaly Detection for Archaeology using

Multimodal Volume Visualization

Andrew LoomisComputer Science, Brown University

Tyler ParkerComputer Science, Brown University

Margaret WatersJoukowsky Institute, Brown University

Abstract—We present a multimodal volume visualization technique

that allows for simultaneous interactive exploration of multiple geophys-

ical surveys to aid in the detection and analysis of subsurface anomalies

at archaeological dig sites. An initial survey of archaeological students

revealed that our proposed technique is preferred for visualizing Ground

Penetrating Radar (GPR) and Electrical Resistivity Imaging (ERI) over

existing techniques used in software packages such as Radan and Geoplot.

I. INTRODUCTION

Archaeologists use multiple geophysical surveys to noninvasively

measure different geophysical properties and identify buried features

at archaeological sites. Current software tools used to analyze and

visualize these surveys are targeted primarily towards a single sur-

vey type. In addition, these tools do not fully exploit the three-

dimensional nature of some of these datasets, and instead prefer to

treat them as two-dimensional slices. Previous work has show that the

integration of multiple surveys into a single visualization is a more

effective technique for the exploration of a particular site than any

one survey alone [5], [1], [3]. Additional work has also been done to

visualize these datasets in a three-dimensional space [6], [2], [4]. We

approach both issues within the framework of multimodal volume

visualization, which allows us to highlight even subtle features of

the subsurface.

II. METHODS

Our project is focused on the integrated visualization of two

particular types of geophysical surveys, GPR and ERI. Both are

collected as two-dimensional slices arranged on a grid. The slices

combine to form three-dimensional scalar volumes. Our general

approach was to use volumetric ray casting on modern graphics

hardware to render both the GPR and ERI volumes simultaneously.

The color at any given point in the space was assigned using the

values of each volume as indices into a multidimensional transfer

function.

In order to evaluate the effectiveness of our multimodal visual-

ization, we enlisted an undergraduate archaeology class. We pre-

sented them with data from the Catholme Ceremonial Complex, an

archaeological dig site in the United Kingdom. The students had

previously been investigating this site through the use of two software

applications, Radan and Geoplot. The central features of this site

include a ring ditch, postholes, and plow furrows [6]. We gave each

student several minutes to explore the site using our application. They

were able to control several aspects of the visualization including

the density, lighting, transfer function lookup, and scaling between

the two datasets. This allowed them to both highlight and diminish

different features within the volumes. After they were finished we

gave each student a short survey asking them to compare different

aspects of our tool with their current tools by rating statements on a

Likert scale.

III. RESULTS

When asked if the proposed software was more effective for

identifying archaeological features than their current software, 62%

Fig. 1. Combined GPR and ERI volume visualization. The diagonal blueline is a medieval plow furrow.

felt that it was for GPR data and 50% felt that it was for ERI

data. When asked if the proposed software was more effective at

visualizing features, 87% felt that it was for GPR data and 62%

felt that it was for ERI data. All students felt that visualizing both

datasets simultaneously was more effective than visualizing either

dataset alone.

IV. CONCLUSION

The results from the survey are promising, but there are a number

of factors that may have influenced them. The students themselves

were not experts in the field and potentially lack experience in

analyzing these types of data. In addition, they were already familiar

with the test data using their current software. Finally the proposed

software had a user interface that was difficult to learn, which may

have hindered their ability to use the application. Overall there was

a great deal of enthusiasm among the students for continuing work

in this area.

REFERENCES

[1] R.B. Clay. Complimentary Geophysical Survey Techniques: why twoways are always better than one. Southeastern Archaeology, pages 31–43, 2001.

[2] S. Kadioglu and J.J. Daniels. 3D visualization of integrated groundpenetrating radar data and EM-61 data to determine buried objects andtheir characteristics. Journal of Geophysics and Engineering, 5:448, 2008.

[3] K.L. Kvamme. Integrating Multidimensional Geophysical Data. Archae-

ological Prospection, 13(1):57–72, 2006.[4] L. Nuzzo, G. Leucci, S. Negri, M.T. Carrozzo, and T. Quarta. Appli-

cation of 3D Visualization Techniques in the Analysis of GPR Data forArchaeology. Annals of Geophysics, 45(2), 2002.

[5] S. Piro, P. Mauriello, and F. Cammarano. Quantitative Integration ofGeophysical Methods for Archaeological Prospection. Archaeological

Prospection, 7(4):203–213, 2000.[6] M.S. Watters. Geovisualization: An Example from the Catholme Cere-

monial Complex. Archaeological Prospection, 13(4):282–290, 2006.

��AB��CAB�D��E��FA��A�C��D��A��A��A

��C��A�F��D�D

�F��C�A��D�A��C�A��C��A��C�A ��C�D

��A�B�CDE�F��A��AE��

��A�C��F��A��B��C��E��C��A�B��

��A��!��A��B��C��C��"��#��A��A��F��!��C��

CA��CDCF�AA!�D��AC"��FA��A��A�C�DA

��C��D�DAF!��CA��A�F�C��C�A��CA

�C��D��A��ADC�D#A��CA��A�DAC!��C�A��A��CA

�DC��FCDDA��A��DA��A�C��F�$�CDA��A��A��A!�D��A

CF�F�CAD�%��CAD�%D��CAF��CDAF�A�F��CDCA�FC&DA%��A

��A$��AF�A��C��A��CF��AF�A��DD��A��C��A

�C��CD#A CAD��C�A��A�DC�AD��A��A��CDCA�C��F�$�CDA

�CAC��C��!CA��CFA��C�A��A��'AF�A(')A��A��C��C�A

��A��C��AD��CD#A��A��A!��CA!�D��E��F�A�CD�D��!��A!��CA

!�D��E��F�A��C��A��

)#A*��F�

��C�FA��C��D�DA�C��A�C!��A�FAF�F+�F!D�!CA

�C��D��AD��!C�DA�FA��C�A��A%C��C�A�F�C�D�F�A��CA

D�%D��CAD��CA��A��CA��C��AD��CAF�A��DD�%��A

��CF��A�C��CDA��A�F�C�CD�#A��D�A��CF��A!��%�CAD��CA

��A�C��F�A��DA��A��DCDA�FA��CA!�D��E��FAF�AF��D�DA

��AAD�F��CA��ADC�A��C#A,��A��DC�AD��CA�DCDA!��CA

!�D��E��FA�C��F�$�CDA��A��CAF�AF��ECA��ADC�DA��A

��A�A��CAD�CA��C#

))#AA�C��D

-��A��AD��A�CA�DC�A��F�A�CFC��F�A.��'/A

F�A(�C��A'CD�D��!��A)��F�A.(')/A��A��C��C�A��A

��CA0��CA0C�C��F��A0��C"�A��A�DAFA

��C��AD��CA��A��D�DAA��C��FA��A��A��F��CF�D#A

��CDCA��A�C�CA��F��A�FA��CA��A��A��CAD��CDAF�A

�C�CA��CFA��CDDC�A�F��A1+��CFD��F�A�D�C�A!��CD�AF�AA

��C�CF�A��A��FD�C�A��F��FA�DA��C�A��A%��A��A

!��CD#

��A��CA��AA�DC�A�FA!�C�AC��C�A��ADC�A

�F��!��A��A�FA!��F�AD��CDA��AD��C��D��F#A(��A

��ADC�A�FA�!CA!�D��A��%��CDA�2�D�C��AD��ADA%��FCDD�A

!��CAD��F��AF�A�CFD��#A��CA��A�D�A��DAA�DC�A��A

�C��FCA��CA��CFD��FDA��AAD��CA��A�FA%CA��!C�A��A��CA

!��C.D/#

�A�DC�A��AD��A�DA��F��C��A��C�CA��F�DA

�DC�A��CA��DC�AD��CA��A��CF��A��C��A�C��CD�A

D��ADAA��A��DAF�A��!C�A�C��D��D#A��CDCA�DC�DA�C�CA

��C��A��A��A��CA��A��A��CA��C��AD��C�A

�D�F�A��CA��DA'�FA.��'/AF�A�C��A.(')/A��AF��ECA��CA

��#A��CA�DC�DA�C�CA��CFA��!CFAAD��!C�A��F�A��A

��FA��A��CAD��CA��C�A��CF��A�DC#A��F�DA�FA

��CAD��A�C�CA��CFAD�C�A��A��CA�FAA��C��A3��CA��A��A

C"�CF�A��C�A��CC�A��A��!CFAD��C�CF�D#

)))#AA'CD��D

'CD��DA��A��CAD��AD��C�A��A45#67A��A��CA�DC�DA

�C��A��A��A��FA�DA��CAC��C��!CA��FA��C��A��CF�A

D��CA�A!�D��E�F�A(')A��A89#67A��C�A��A��FA

DA��CAC��C��!CA��FA��C��A��CF�AD��CA�A!�D��E�F�A

��'A��#A:;;7A-C��A��A��CA��%�FC�A!�C�A��A��'AF�A(')A

�DA��CAC��C��!CA��FA!�C��F�AC��A��ADC�A�F��!��#

)B#A��D��DD��F

��%��DA��A��DAD��A�C�CA��A�CA�DC�AD��CF�D�A

��C�A��FA��CDD��F�D�A��AC!��CA��AD��C#A��CDCA

D��CF�DA�C�CA��C��A��CA��A��CA�C��CDA��C�AD��A%CA

��F�A��AF�A��C��AC!��FA�DAD�%2C��!C#A��CA<)A�DA

��C��AF�A��A�!CA�F��C�A��AD�D�C�#A��A��CDCA

��F�DA��A�!CA��DD�%��A��F��F�C�A��A�CD��D#

B#A0�F��D��F

CA��CDCF�C�AA��A��A!�D��E�F�A��'AF�A(')A

��A�D�F�A��+!��CA�CF�C��F�A�C��F�$�CD#A CAD��C�A

��A��CDCA�C��F�$�CDA�CAC��C��!C�A��CFA��C�A��A��A

��C��C�A��A��C��AD��CD�A��A��D��!C��F�AF�A

C"��D�F�AD�%��CA��C��A�C��CDA��FA��CA��#

B)#A'(-('(=0(3

>:?A0��A #AF�A3�D�A�#A@��A)F�C��"�F�AF�A��+

!��CA'CF�C��F�#@A"��$�CA��$��B�%��A:8A.:AAA/BA

16A+14A#

>5?A0��A'#A*#A@0��C�CF��A�C��D��A3��!C�A

�C��F�$�CDBA��A��A��DA�CA��DA%C��C�A��FA�FC#@A

&��C�A�BCA��'��A��D�A5;;:BA1:+C1#

>1?AD!��C�AD#�#A@)F�C��F�A��CFD��F�A�C��D��A

��#@A'��A��F��B$A�C��A.E��FA ��C�AFA3�FD/A:1�A

F�#A:A.5;;4/BA69++95#

>C?A=�EE��A�#AF�A�C��A�#AF�A=C��A3#AF�A0��EE��A�#�#A

F�AG��A�#A@��FA��A1�AB�D��E��FA�C��F�$�CDA�FA

��CA�F��D�DA��A��'A��A��A��C��#@A'��B��

�A�$�DB��BAC6�AF�#A5A.5;;5/#

>6?A'�%�C��A-#AF�A�C2��A(#AF�A-F��C�C��A�#AF�A(��A�#A

F�ADF��A�#A@��<+%DC�A��+!��CA�CF�C��F�A��A��CA

!�D��E��FA��A��F��F�A%��FA��CD#@AF��AA��B��

&��(�B�A5;;4#A1;6++1:8#

>4?A3C��AD��H��AF�AEC��C�AEA�F�C�DA@1�A!�D��E��FA��A

�F�C��C�A��F�A�CFC��F�A��A��AF�A(�+4:A��A��A

�C�C��FCA%��C�A�%2C��DAF�A��C��A��C��D��D@AA5;;8AE#A

�C��D#A(F�#ABACC8

>9?A�F��AI#AF�A��CFDC�C��AE#�#AF�A*��AJ#�#AF�A��A

�#�#A@�A��+B��CAB�D��E��FA-��C��A��A3��A

��FC�AB��CDA��C�A1�K1�A)��CA'C��D��F#@A

��C��&��A��A��!��AA��)�"�&!*�A�C�A5;;ABA

169:++169C#

Visualization of Hyperspectral Images Through Interactive Sections

Ryan P. Cabeen, Department of Computer Science, Brown University

Introduction: The presented work consists of a novel

method for visualizing hyperspectral images and an eval-

uation with expert users in the context of minerology.

Advances in remote sensing and progress in scientific

missions to collect data of the Earth, Moon and Mars

are creating a need for useful visualization methods.

In particular, hyperspectral images are challenging

to visualize due to their high dimensionality, and the

presented method addresses this by reducing the spatial

data dimension to a path in the image and rendering a

visualization of the spectra along this path.

Method: Each hyperspectral image consists of a set of

images of the same scene taken at different wavelengths,

which commonly described as an image stack or cube.

Since the number of spectral measurements will typically

vastly outnumber the number of colors that can be sensed

by the human visual system, these images cannot be

visualized directly. A variety of tools exist for spectral di-

mensionality reduction, often by algebraic manipulation

of certain bands or decomposition of the entire spectrum

into several representitive groups. Spatial dimensionality

reduction techniques exist, but are limited to point

sampling and statistics on simple regions of interest.

The presented work is a spatial dimensionality reduc-

tion technique that interactively visualizes the full spec-

trum along a path, which can be imagined as a piecewise-

linear section through the image in the spatial domain.

The user chooses a series of points in spatial coordinates

to define the path, and the spectra at regular intervals of

this path are sampled. This collection of spectra is then

rendered as an image, where one dimension is the spec-

tral band, one dimension is the arclength along the path,

and the color is a mapping hyperspectral image intensi-

ties in a given band at a given arclength. The path and

color map can be edited interactively, and a traditional

point sampled spectrum is provided additionally. The

views of the spatial domain and section visualization are

linked to allow the user to track features across displays.

Implementation: The work is available as a cross-

platform and open-source software package written in

C++. Many image file formats are supported through

use of the Geospatial Data Abstraction Library (GDAL).

Qt 4 and Qt Widgets for Technical applications (Qwt)

are used for the user interface. The sampling of spectra

along the path is performed by parameterizing the curve

by arclength and bilinearly resampling at a rate of one

sample per pixel.

Results: The computational demands of curve editing,

visualization and sampling were found to be practical for

real-time applications, and the test data sets were found

to be compatible with the GDAL library.

The tool was used in an informal evaluation by mem-

bers of the Brown Department of Geology in the context

of minerological applications. They found the tool to be

generally useful and expect it to be used in the course

of research. In the context of minerology, the general

shape variation of thermal infrared spectra was found

to be readily visualized with the tool. However, the

benefits for near-infrared spectra were found to be fewer,

due to the localization of the absorbtion bands. Several

potential additions were suggested, including rendering

the spectra relative to a given spectrum, supporting

multiple paths, and overlaying an image that has been

segmented by spectral dimension reduction methods.

Discussion: Due to the nature of hyperspectral images,

there is no natural way of visualizing an entire dataset.

Consequently, a variety of methods must be employed.

The presented work aims to address the reduction of

spatial domain, allowing visualization of the full spec-

trum at more than a single point. In addition to a variety

of methods, there are many types of data and scenes

that influence the efficacy of a tool. Hence, a variety of

tools and an understanding of their relative strengths and

weaknesses is necessary to address each scenario.

References

[1] NP Jacobson. Design goals and solutions for display of

hyperspectral images. Geo. and Rem. Sens., IEEE Trans.,

43(11), 2005.

[2] M Torson. Interactive image cube visualization and

analysis. In VVS, pages 33–38, 1989.

[3] J Zhang. Vdm-rs: A visual data mining system for

exploring and classifying remotely sensed images. Comp.

Geosci., 35(9):1827–1836, 2009.

1. Both researchers are at Brown University, Providence RI, 02912, and would like to thank Steven Sloman, Nathan Seigel, David Laidlaw, and Laidlaw’s CSCI2370 graduate course for their helpful comments during the course of this investigation.

!

Assessing confidence in mechanistic and covariational explanations and diagrams in medical diagnosis

Gideon Goldin

1 ([email protected])

Cognitive, Linguistic & Psychological Sciences

Diem Tran ([email protected])

Computer Science

Abstract

We conducted a study comparing the relative effects that

covariational (COV) and mechanistic (MECH) explanations,

and their corresponding diagrams (Euler and causal), have on

peoples’ confidence ratings for hypothetical diagnoses. We

empirically demonstrated that providing an explanation leads to

more confidence than not providing one. In addition, the

inclusion of Euler circles produced similar findings, where

diagnoses with these diagrams obtained better ratings than

those without them, though these ratings were not significantly

higher than those for diagnoses using causal diagrams.

1 Introduction

COV explanations rely on statistical information that

appeals to the co-occurrences of variables (e.g., Amy wrecked her car because she often gets into motor accidents), while

MECH explanations provide a causal link that explains how one

event entails another (e.g., Amy wrecked her car because she

was drunk). Here we compared these two types of explanations,

along with their diagrammatic counterparts (Euler circles and

causal diagrams) against each other and against a lack of

explanation/diagram in order to assess the levels-of-confidence

each evokes in a medical diagnosis task. We predicted that the

provision of an explanation would result in increased

confidence.

2 Related Work

The slow adoption of decision-support systems by doctors

may be due to a lack of trust or confidence in proposed

diagnoses. We hypothesize that this is due to issues with

uncertainty visualization in tasks involving risk representation.

The roles of various explanation types in expert systems have

been addressed [1] but we are not aware of any work that

compares explanation and diagram types in eliciting confidence.

3 Experiment

Design & Materials. Our design entails a 3-way, mixed model ANOVA. Both explanation (none, COV, MECH) and

diagram (none, Euler diagram, causal diagram) were within

subject, while the group variable likelihood (high, low) was

between-subject. Variables were crossed to form 18 conditions.

Each subject provided a confidence rating from 1 (not

confident at all) to 5 (very confident) for each of 9 novel,

hypothetical, counter-balanced diagnoses. For example, consider

a high-likelihood COV explanation with no accompanying

diagram: “A patient has red, square-like bumps on their body. They may or may not have Quandres’ Condition. Diagnosis: It is likely that the patient has Quandres’ Condition because people that have red, square-like bumps on their bodies usually have Quandres’ Condition.” Participants & Methods. A total of 113 U.S. subjects 18 years of age or older completed the study. Recruitment took

place online at Mechanical Turk [2] where subjects were

compensated $0.10 for their participation, which on average

required less than three minutes. Subjects were randomly

assigned to the high- or low-likelihood group. We removed 16 people from analyses because they provided identical ratings

across all items. Thus, 52 subjects constituted the low-likelihood

group—45 for the high-likelihood group.

Results. A main effect of group was found, with the high-

likelihood group showing more confidence than the low, p=.047.

Overall, providing no explanation resulted in less confidence

than providing COV or MECH ones, p<.0001. Providing no

diagram led to lower ratings than providing Euler diagrams

(p=.001), but not causal ones. Examining the cases in which the

explanation type and diagram type match reveals that ratings for

the no explanation/diagram condition (M=2.84; SD=0.13) were

significantly smaller than those for the COV/Euler condition (M=3.30; SD=0.10; p<.0001) and the MECH/causal condition

(M=3.28; SD=0.12; p=0.001). The difference between the latter

two conditions was not significant, p=.093.

!

Figure 1. Ratings for all 18 conditions.

4 Discussion

Though a group effect was not part of our prediction, we report it nonetheless as it implies that confidence may be related

to posterior probabilities in diagnosis. Notably, it seems that

providing a causal diagram for an unlikely diagnosis actually

hampers confidence. In general however, providing an

explanation (or even just an Euler diagram) greatly increases

confidence. The implications of these findings are important not

only for medicine, but for other areas of risk analysis as well,

such as public policy and scientific reasoning.

5 References

[1] C. Lacave and F. J. Díez. A review of explanation methods for Bayesian networks. The Knowledge Engineering

Review, 17(02):107-127, 2003.

[2] Amazon Mechanical Turk,

http://aws.amazon.com/mturk/. Retrieved 11-10-2010.

Mechanistic, covariational explanations and diagrams in medical diagnosis

Gideon Goldin – Diem Tran – Steven Sloman. Brown University – Fall 2010

Abstract – We hypothesize that 2 different kinds of explanations and diagrams will result in higher confidence in making medical diagnoses.

We conduct a user study to justify the hypotheses. Results from the study show significant difference in confidence when users are presented

with explanations and diagrams versus none of them.

1. BACKGROUND

In medical diagnosis, uncertainty occurs due to the natural

complexity of diseases and symptoms. Several expert systems

have been developed to help doctors in making rational diagnosis

by representing medical data in different formats, and several

researches have proved that different representation yields

different judgements [2]. We focus on 2 types of explanations

when describing uncertainty: covariational and mechanistic. The

corresponding diagrams for those 2 types are Euler and causal

diagrams.

Covariational explanation involves the coexistence of a

symptom and a disease, with some probability. For example: If

90% of time a person eats cheese and gets a stomachache, he is

likely to be lactose intolerant.

Mechanistic explanation describes the mechanism behind the

formation of a symptom caused by a disease. For example:

Patients with Alzheimer's disease have dementia because of

development of abnormal features harmful to the brain.

a. Contribution

Effects of certain types of explanations on medical diagnosis are

still in question, as physicians are hesitant to adopt expert systems

[4]. We investigate 2 discussed explanations by conducting a user

study. Results from the study provides basis to deeper research in

the field, which promises to help doctor in making medical

decisions.

b. Hypotheses

Hypothesis 1: Users give higher confidence ratings for diagnoses

with explanations and diagrams versus the control ones.

Hypothesis 2: Mechanistic explanations with causal diagrams

yield higher confidence than for covariational explanations with

Euler circles, as they are psychologically natural [1].

2. STUDY DESIGN

!The study is a 2x3x3 Mixed Model Design: High / Low

probability (Between subjects) x Explanation Type x Diagram

Type (Within Subjects) (Figure 2). We use made-up scenarios with

ordinary subjects

on Mechanical

Turk. High and low

probability

scenarios are

divided equally

among subjects.

For each condition,

a subject uses 5-

points Likert scale

to rate his

confidence on the

explanation.

!"#$%&'(')'*&+"#,'-$../%0

3. RESULTS

N = 97 after we filtered responses take less than 1 minute or have

all same answers. Figure 1 shows the mean responses. Figure 3

shows corresponding significant values. α = 0.05.

Diagram Types None vs.

Euler

None vs.

Causal

Euler vs.

Causal

Low Prob. 0.123 0.017 0.003

High Prob. 0.001 0.03 0.132

Explanation

Types

Control vs.

Cov

Control vs.

Mech

Cov vs.

Mech

Low Prob. 0.05 0.012 0.365

High Prob, 0 0.001 0.275

!"#$%&'1')'-"#,"2"3/,4'5/6$&+'7'28%'&976/,/4"8,'/,:':"/#%/.'407&+

High probability scenarios result in more confidence than low

ones, with p = 0.047. Moreover, comparing control conditions

(cn) versus covariational explanations– Euler diagrams (ve) yields

significant difference (p = 0.005 for low and p = 0.001 for high

probability). Similarly, comparing control conditions - Euler

circles (ce) versus covariational explanations– causal (vc)

diagrams yield significant difference (p = 0.041 for low and p =

0.036 for high probability). The latter conditions are higher in both

cases.

4. DISCUSSION

From obtained results, we accept hypothesis 1 and reject

hypothesis 2. In other words, users are more confident in

diagnoses with explanations and diagrams than the ones with none

of them. Rejection of hypothesis 2 might be due to a fact that

people are more comfortable with covariational explanations [3].

There are factors that this study did not consider: demographic

analysis, difference between probabilities and natural frequencies,

real medical situations and doctors as study subjects. We believe

by taking into account those factors, the study should produce

different outcomes.

References [1]. Ahn, W., & Kalish, C. 2000. The role of covariation vs. mechanism information in

causal attribution. In R. Wilson & F. Keil, Cognition and explanation (Vol. 54). MIT

Press.

[2]. Elting L., Martin C., Cantor S., & Rubenstein E. 1999. Influence of data display

formats on physician investigators' decisions to stop clinical trials: prospective trial with

repeated measures. British Medical Journal. 318(7197), 1527.

[3]. Nava Tintarev and Judith Masthoff. 2007. Effective explanations of

recommendations: user-centered design. In Proceedings of the 2007 ACM conference on

Recommender systems (RecSys '07). ACM, New York, NY, USA, 153-156.

[4]. Egea, J.M.O., & Gonzalez, M.V.R., (2010). Explaining physicians' acceptance of

EHCR systems: An extension of TAM with trust and risk factors. Computers in Human

Behavior, In Press"

!"#$%&';<'=&/,'82'%&+78,+&+'

The PeakScore Pipeline: Automated Peak Rating inSelected Ion Chromatograms

Justin A. DeBrabantBrown University

[email protected]

Samuel BirchBrown University

[email protected]

Arthur SalomonBrown [email protected]

ABSTRACT

We present a system for an automated analysis of individ-ual peak quality in selected ion chromatograms (SIC) thatallows real-time processing of data. We apply curve fittingtechniques to individual peaks in the SIC and use root meansquare deviation (RMSD) as a goodness of fit metric, whichwe call PeakScore. This automated quantitation of peakquality allows researchers to analyze the data that is beingcollected which is a significant speedup over the previousprocess of SIC peak analysis, manual inspection using ex-pert user intuition. We allow processing in batch mode usinga command line utility or individually using an integratedGUI.

1. BACKGROUND

In recent years, proteomics has emerged as a technique forbetter understanding the metabolic pathways of the cell,to which proteins are essential. A large part of proteomicsresearch is based around the technique of mass spectrometry(MS), a tool for measuring the molecular mass of moleculesin a sample. One of the interpretations of the MS data is theselected ion chromatogram (SIC). The SIC is a plot of timevs. intensity for a given m/z (mass over ion charge) value.In the chromatograms, the data quality varies greatly dueto uncontrollable external factors. Currently, users rely onexpert intuition to determine whether a peak is “good” or“bad” and whether or not the peak is comparable with otherexperiments. Because of the massive quantities of MS databeing produced in recent years, this manual inspection ofSIC peaks is no longer feasible. An automated quantitationmethod is needed to efficiently and accurately model peakquality. Our system provides both a metric that closelymodels expert user intuition of peak quality and can handlethe massive datasets in real-time.

2. METHODS

Our system uses several open-source APIs in the data con-version pipeline. The initial format is the Thermo RAW

CS237: Interdisciplinary Scientific Visualization 2010

RAW File Data Conversion MySQL

Peak Analysis MySQL

GUI

Figure 1: A diagram of the PeakScore Pipeline.

file, a propietarty binary format. We use the ProteoWizard[1] tool to convert from RAW to mzML. To convert frommzML to text, we use the OpenMS [2] API. Finally, we loadthe text data into a MySQL database with a Java programusing JDBC. For the PeakScore metric, we applied curve-fitting techniques and measured the goodness of fit of theresulting curve. We tried fitting several different curves tothe raw data points, and found that a gaussian best mod-eled the ideal peaks. We used the SciPy package in Pythonfor all the curve-fitting and goodness-of-fit analysis. Finally,in the interactive GUI, the user can specify a m/z value ofinterest, which will then be loaded from the database andanalyzed with the PeakScore metric. Similarly, a commandline tool allows batch processing of multiple SICs, indepen-dent of the GUI. The output of this batch processing tool isautomatically stored in a MySQL database for use later inthe protein analysis pipeline.

3. CONCLUSIONS

Our system received very positive reviews from our collabo-rators. Of particular importance was that our system couldpipeline data from RAW file to visual analysis in real time(i.e. less time than it took to run the experiment). Usersindicated that the PeakScore metric closely modeled theirexpert intuition of the underlying peak quality. Thus, oursystem has improved the efficiency of the data analysis whilemaintaining correctness with respect to manual peak inspec-tion. This automation of the analysis process will allow re-searchers to analyze their data on scale with the data beinggenerated.

4. REFERENCES[1] D. Kessner. Proteowizard: Open source software for

rapid proteomics tools development. Bioinformatics,24(21):2534–2536, 2008.

[2] O. Kohlbacher and K. Reinert. OpenMS and TOPP:Open source software for lc-ms data analysis. Proteome

Bioinformatics, 604(2):chap. 14, 2009.

1

Peak Quality Quantitation for Selected Ion Chromatograms

J. Debrabant∗, S. Birch†, A. Salomon‡

December 17, 2010

Abstract

We devloped a system to assist in interactively quan-titating peak quality for selected ion chromatograms(SICs). This allows for faster, or even automatedSIC quality quantitation, which in turn allows bi-ologists to more easily approach large datasets incontrast to slow manual classification relying on ex-pert labor. Our system works end-to-end, from theraw data of the mass-spectrometer to a visualization(or plain output) of a peak quality metric.

1 Background

This work aims to ease and automate this processesof assesing SIC quality by augmenting the SIC dis-play with an automatic quality metric and by pro-viding a more intuitive interface to do so. As pro-teomics has become increasingly important in biol-ogy scientists have worked to automate the processof measuring protein expression. The most effectivecontemporary method is through mass spectrome-try, where proteins are enzymatically broken downinto their component peptides, tagged with ions, andsampled. The result of this technique is effectivelya three-dimensional dataset: time (retention-time),intensity, and mass-charge ratio (m/z). A SIC is aplot of intensity over time for a given (small; merelyto account for measurement error) range of mass-charge ratio, and it represents the expression of agiven peptide over time. A set of SICs can thenbe used to probabalistically reconstruct the proteinsthey came from in the biological sample. SICs arenot entirely accurate themselves, however, and cur-rent processes rely on human intuition to providequality assesments for peptide presence. This workreplaces the element of intuition with an automatic

∗Brown University, [email protected]†Brown University, [email protected]‡Brown University, [email protected]

process.

2 Implementation

Our system is implemented in the following phases:

1. Conversion — We use the open-source Prote-oWizard to convert vendor-specific raw mass-spectrometric data to the mzML standard.

2. Data extraction — We use the open-sourceOpenMS library to interpret the mzML andfeed the data to a relational database.

3. Peak extraction — We extract the appropri-ate chromatogram from the database and findthe nearest peak to the user-supplied retention-time1 using a continuous wavelet transform, apeak identification algorithm.

4. Peak quantitation — We use the Levenberg-Marquardt algorithm to fit a three-parameter(µ, σ, and a scaling factor) Gaussian curve tothe extracted peak range, and combine multi-ple factors to give a quality metric.

5. Visualization — The user is presented withan interface to annotate SIC peaks given theSIC, peak retention-time, the fit curve, and thequality-of-fit metric.

3 Conclusion

Our system allows for the automation of a key pro-cess in the mass-spectrometry pipeline, which al-lows scientists to more rapidly and objectively tacklelarger datasets which weren’t assailable before. Ourcollaborator was very enthusiastic about our resultsbecause it allowed him to process unprecedentedamounts of data while integrating into his existingpipeline.

1This is supplied separately by the mass-spectrometer.

1

Interactive 2D Maps for Functional Brain Connectivity Queries

Steven R. Gomez∗

Figure 1: Interactive map of the efferent projections from the primarymotor area (MOp) in the rat brain.

ABSTRACT

We present and evaluate a method for visualizing directed neuronprojections in the brain using interactive maps of landmark brainparts. Our method improves on existing brain query tools by com-positing projections onto a reference map of brain structures; sci-entists familiar with brain anatomy may leverage this spatial orga-nization to execute and verify queries more quickly. A quantitativeevaluation suggests that our interactive maps may be faster at sometypes of brain projections queries than non-visual query tools with-out sacrificing accuracy, and that maps may be more enjoyable andeasy to use than non-visual tools.

1 BACKGROUND

Researchers in neural circuitry are concerned with understandingconnections between spatially distributed and functionally differ-entiated parts of the brain [1]. Insights from these connections maysuggest clinical applications and research directions for neurolog-ical disorders. The scale and complexity of available connectivitydata make it difficult to gain insight from individual, textual queries– the standard interface to circuitry databases in current tools likethe Brain Architecture Management System (BAMS) [2]. Recently,Bruckner et al. [3] presented a visual query system for the fruitfly brain, but the tool focuses on queries for volume sections inthe brain rather than representations of relationships between theseparts. Our map method, on the other hand, visually overlays func-tional structure on top of brain parts that serve as map landmarks.

2 METHODS

We used Java and the Prefuse toolkit to manage and drawprojection graphs. Test data included projections from theSwanson-1998 rat brain nomenclature available in XML on-line (http://brancusi.usc.edu/bkms/), which we rewrote into theGraphML format for use with graph exploration toolkits. The brainslice map background is available online (http://brainmaps.org). Inour tool, users can interactively navigate parts of interest and tog-gle incoming and outgoing neural projections as directed edges be-tween landmark nodes on the map.

2.1 User Study

We evaluated four computer science graduate students on accuracyand completion time for projection query tasks using both our maptool and BAMS. Each subject received approximately 10 minutes of

∗email: [email protected]

Projection Existence Projection Density0

50

100

150

200

250

300

350

400

450

Query Tasks

Mean T

ime (

sec)

Task Completion Times

Visual Maps

BAMS

Projection Existence Projection Density0

0.2

0.4

0.6

0.8

1

Query Tasks

Mean A

ccura

cy (

%)

Task Completion Accuracy

Visual Maps

BAMS

Figure 2: Results of query completion time (left) and accuracy (right)on prepared sets of query tasks for interactive maps and BAMS. Errorbars show ± 1 standard deviation.

training with both tools. Subjects were asked to solve two tasks asquickly as possible on each tool, each with a different set of queriesto avoid ordering bias via a Latin squares design. Tasks included:

T1: Projection existence. We asked users to answer 8 questionswith the form “Does part X project to part Y?” and “Does Xreceive a projection from part Z?”

T2: Projection density estimation. We asked users to order setsof brain parts by the number of efferent projections of each.Subjects were shown 3 sets of varying difficulty, containing3 brain parts each. For accuracy, subjects earned a point foreach pair in the set that was correctly ordered (3 possible).

Finally, we collected anecdotal feedback and had subjects rate eachtool’s usability and enjoyability on a Likert scale of 1 to 5.

3 RESULTS AND CONCLUSIONS

Results from our user study are shown in Figure 2. Users were sig-nificantly faster with the map tool than with BAMS on the densityestimation task without significantly sacrificing accuracy. This maybe due to users preferring a counting strategy with BAMS’s resultlists, which is more difficult on the map, where faster estimationtechniques may have been used. On the projection existence task,users were on average faster using BAMS, though not significantly.Some users noted that browser proficiency allowed them to navi-gate HTML results faster than in the map tool. Many strategies byusers in BAMS could be enabled by further feature development inthe map interface.

We were encouraged that surveyed users found our map, on av-erage, easier and more fun to use than BAMS. We believe a hybridapproach with textual and visual result-sets may help users performfaster and more accurately on each query task.

REFERENCES

[1] J. W. Bohland, C. Wu, et al. A proposal for a coordinated effort for

the determination of brainwide neuroanatomical connectivity in model

organisms at a mesoscopic scale. PLoS Comput Biol, 5(3):e1000334,

03 2009.

[2] M. Bota, H.-W. Dong, and L. Swanson. Brain Architecture Manage-

ment System. Neuroinformatics, 3(1):15–48, 2005.

[3] S. Bruckner, V. Solteszova, E. Groller, et al. Braingazer – visual queries

for neurobiology research. IEEE Transactions on Visualization and

Computer Graphics, 15:1497–1504, 2009.

Download - Geophysical Anomaly Detection for Archaeology using …cs.brown.edu/courses/cs237/2010/proceedings2010.pdf · 2012. 8. 24. · ways are always better than one. Southeastern Archaeology,

Top Related