Geophysical Anomaly Detection for Archaeology using
Multimodal Volume Visualization
Andrew LoomisComputer Science, Brown University
Tyler ParkerComputer Science, Brown University
Margaret WatersJoukowsky Institute, Brown University
Abstract—We present a multimodal volume visualization technique
that allows for simultaneous interactive exploration of multiple geophys-
ical surveys to aid in the detection and analysis of subsurface anomalies
at archaeological dig sites. An initial survey of archaeological students
revealed that our proposed technique is preferred for visualizing Ground
Penetrating Radar (GPR) and Electrical Resistivity Imaging (ERI) over
existing techniques used in software packages such as Radan and Geoplot.
I. INTRODUCTION
Archaeologists use multiple geophysical surveys to noninvasively
measure different geophysical properties and identify buried features
at archaeological sites. Current software tools used to analyze and
visualize these surveys are targeted primarily towards a single sur-
vey type. In addition, these tools do not fully exploit the three-
dimensional nature of some of these datasets, and instead prefer to
treat them as two-dimensional slices. Previous work has show that the
integration of multiple surveys into a single visualization is a more
effective technique for the exploration of a particular site than any
one survey alone [5], [1], [3]. Additional work has also been done to
visualize these datasets in a three-dimensional space [6], [2], [4]. We
approach both issues within the framework of multimodal volume
visualization, which allows us to highlight even subtle features of
the subsurface.
II. METHODS
Our project is focused on the integrated visualization of two
particular types of geophysical surveys, GPR and ERI. Both are
collected as two-dimensional slices arranged on a grid. The slices
combine to form three-dimensional scalar volumes. Our general
approach was to use volumetric ray casting on modern graphics
hardware to render both the GPR and ERI volumes simultaneously.
The color at any given point in the space was assigned using the
values of each volume as indices into a multidimensional transfer
function.
In order to evaluate the effectiveness of our multimodal visual-
ization, we enlisted an undergraduate archaeology class. We pre-
sented them with data from the Catholme Ceremonial Complex, an
archaeological dig site in the United Kingdom. The students had
previously been investigating this site through the use of two software
applications, Radan and Geoplot. The central features of this site
include a ring ditch, postholes, and plow furrows [6]. We gave each
student several minutes to explore the site using our application. They
were able to control several aspects of the visualization including
the density, lighting, transfer function lookup, and scaling between
the two datasets. This allowed them to both highlight and diminish
different features within the volumes. After they were finished we
gave each student a short survey asking them to compare different
aspects of our tool with their current tools by rating statements on a
Likert scale.
III. RESULTS
When asked if the proposed software was more effective for
identifying archaeological features than their current software, 62%
Fig. 1. Combined GPR and ERI volume visualization. The diagonal blueline is a medieval plow furrow.
felt that it was for GPR data and 50% felt that it was for ERI
data. When asked if the proposed software was more effective at
visualizing features, 87% felt that it was for GPR data and 62%
felt that it was for ERI data. All students felt that visualizing both
datasets simultaneously was more effective than visualizing either
dataset alone.
IV. CONCLUSION
The results from the survey are promising, but there are a number
of factors that may have influenced them. The students themselves
were not experts in the field and potentially lack experience in
analyzing these types of data. In addition, they were already familiar
with the test data using their current software. Finally the proposed
software had a user interface that was difficult to learn, which may
have hindered their ability to use the application. Overall there was
a great deal of enthusiasm among the students for continuing work
in this area.
REFERENCES
[1] R.B. Clay. Complimentary Geophysical Survey Techniques: why twoways are always better than one. Southeastern Archaeology, pages 31–43, 2001.
[2] S. Kadioglu and J.J. Daniels. 3D visualization of integrated groundpenetrating radar data and EM-61 data to determine buried objects andtheir characteristics. Journal of Geophysics and Engineering, 5:448, 2008.
[3] K.L. Kvamme. Integrating Multidimensional Geophysical Data. Archae-
ological Prospection, 13(1):57–72, 2006.[4] L. Nuzzo, G. Leucci, S. Negri, M.T. Carrozzo, and T. Quarta. Appli-
cation of 3D Visualization Techniques in the Analysis of GPR Data forArchaeology. Annals of Geophysics, 45(2), 2002.
[5] S. Piro, P. Mauriello, and F. Cammarano. Quantitative Integration ofGeophysical Methods for Archaeological Prospection. Archaeological
Prospection, 7(4):203–213, 2000.[6] M.S. Watters. Geovisualization: An Example from the Catholme Cere-
monial Complex. Archaeological Prospection, 13(4):282–290, 2006.
���������AB����CAB�D���E���FA��A�C����D���A��A���A
����C�������A�F��D�D
�F��C�A�����D�A���C�A���C��A����C�A ��C�D
���������A�B�CDE�F����A��AE���
�����������A�C���F������A��B������C���E���C������A�B����������
����A��!�������A��B������C����������C��"��#��A���A������F������!�����C���
CA��CDCF�AA!�D��AC"�������FA����A���A�C�DA
���C�����D�DAF!���CA�������A�F�C���C�A�������CA
�C����D���A��ADC�D#A��CA����A�DAC!���C�A���A��CA
�DC���FCDDA��A��DA���������A�C��F�$�CDA���A��A��A!�D����A
CF�F�CAD�%��CAD�%D����CAF����CDAF�A�F��CDCA�FC&DA%�����A
��A$������AF�A�����C��A��CF����AF�A��DD���A���C�������A
�C���CD#A CAD���C�A�������A�DC�AD����A���A��CDCA�C��F�$�CDA
�CAC��C���!CA��CFA����C�A��A��'AF�A(')A��A����C��C�A
����A���C�������AD��CD#A��������A���A!����CA!�D���E���F�A�CD�D��!���A!����CA
!�D���E���F�A���C������A���������
)#A*������F�
���C�FA���C�����D�DA�C��A�C!���A�FAF�F+�F!D�!CA
�C����D���AD��!C�DA�FA���C�A��A%C��C�A�F�C�D�F�A��CA
D�%D����CAD�������CA��A��CA���C�������AD��CAF�A��DD�%��A
��CF����A�C���CDA��A�F�C�CD�#A��D�A����CF���A!��%�CAD�����CA
���A�C��F�A���DA��A����DCDA�FA��CA!�D���E���FAF�AF��D�DA
��AAD�F��CA��ADC�A���C#A,��A�����DC�AD�����CA�DCDA!����CA
!�D���E���FA�C��F�$�CDA��A�����CAF�AF��ECA���ADC�DA��A
��A�A��CAD�CA���C#
))#AA�C����D
-��A���AD�����A�CA�DC�A����F�A�CFC����F�A.��'/A
F�A(�C������A'CD�D��!���A)���F�A.(')/A��A����C��C�A����A
��CA0�����CA0C�C��F��A0����C"�A�����A�DAFA
���C�������AD��CA���A��D�DAA����C����FA��A�����A��F��CF�D#A
��CDCA��A�C�CA�����F���A�FA��CA����A��A���CAD���CDAF�A
�C�CA��CFA����CDDC�A�F��A1+���CFD��F�A�D�C�A!����CD�AF�AA
����C�CF�A�����A��FD�C�A��F����FA�DA����C�A��A%���A��A
!����CD#
���A��CA�����AA�DC�A�FA!�C�AC���C�A��ADC�A
�F��!�������A��A�FA!���F�AD��CDA��AD��C�����D����F#A(��A
��ADC�A�FA�!CA!�D��A����%��CDA�2�D�C��AD���ADA%�����FCDD�A
!����CAD���F��AF�A�CFD���#A��CA����A�D�A����DAA�DC�A��A
�C��FCA��CA���CFD��FDA��AAD���CA���A�FA%CA��!C�A�������A��CA
!����C.D/#
�A�DC�A����AD����A�DA��F����C��A��C�CA�������F�DA
�DC�A��CA�����DC�AD�����CA��A��CF����A���C�������A�C���CD�A
D���ADAA����A���DAF�A��!C�A�C��D��D#A��CDCA�DC�DA�C�CA
��C��A������A����A��CA��A����A��CA���C�������AD��C�A
�D�F�A��CA����DA'�FA.��'/AF�A�C�����A.(')/A��AF��ECA��CA
��#A��CA�DC�DA�C�CA��CFA��!CFAAD��!C�A������F�A���A
��������FA��A��CAD�����CA��C�A����CF���A�DC#A�������F�DA�FA
��CAD����A�C�CA��CFAD�C�A��A��CA�FAA���C��A3��CA��A���A
C"�CF�A��C�A��CC�A����A��!CFAD��C�CF�D#
)))#AA'CD���D
'CD���DA��A��CAD����AD���C�A���A45#67A��A��CA�DC�DA
�C��A���A���A��������FA�DA���CAC��C���!CA��FA��C��A����CF�A
D�����CA�A!�D���E�F�A(')A���A89#67A��C�A���A��������FA
DA���CAC��C���!CA��FA��C��A����CF�AD�����CA�A!�D���E�F�A
��'A��#A:;;7A-C��A���A��CA���%�FC�A!�C�A��A��'AF�A(')A
�DA���CAC��C���!CA��FA!�C��F�AC��A��ADC�A�F��!������#
)B#A��D��DD��F
���%��DA��A���DAD����A�C�CA���A�CA�DC�AD���CF�D�A
���C�A��FA����CDD��F�D�A��AC!���CA���AD�����C#A��CDCA
D���CF�DA�C�CA��C��A��CA��A��CA�C���CDA��C�AD�����A%CA
�����F�A����AF�A��C��AC!�����FA�DAD�%2C���!C#A��CA<)A�DA
�����C��AF�A��A�!CA�F�����C�A���AD�D�C�#A���A��CDCA
���F�DA�����A�!CA��DD�%��A��F���F�C�A���A�CD���D#
B#A0�F���D��F
CA��CDCF�C�AA����A���A!�D���E�F�A��'AF�A(')A
��A�D�F�A�����+!����CA�CF�C��F�A�C��F�$�CD#A CAD���C�A
���A��CDCA�C��F�$�CDA�CAC��C���!C�A��CFA����C�A��A��A
����C��C�A����A���C�������AD��CD�A���A��D��!C��F�AF�A
C"��D�F�AD�%��CA���C�������A�C���CDA�����FA��CA��#
B)#A'(-('(=0(3
>:?A0��A #AF�A3�D�A�#A@��A)F�C���"�F�AF�A�����+
!����CA'CF�C��F�#@A"��$�CA�����$���B�%����A:8A.:AAA/BA
16A+14A#
>5?A0���A'#A*#A@0����C�CF���A�C����D���A3��!C�A
�C��F�$�CDBA���A���A��DA�CA���DA%C��C�A��FA�FC#@A
&��C�A�BCA���'����A����D�A5;;:BA1:+C1#
>1?AD!��C�AD#�#A@)F�C����F�A��������CFD��F�A�C����D���A
��#@A'����A���������F��B$A�C���A.E��FA ��C�AFA3�FD/A:1�A
F�#A:A.5;;4/BA69++95#
>C?A=�EE��A�#AF�A�C�����A�#AF�A=C����A3#AF�A0���EE��A�#�#A
F�AG����A�#A@���������FA��A1�AB�D���E���FA�C��F�$�CDA�FA
��CA�F��D�DA��A��'A��A���A����C�����#@A'����B��� �
�A�$�DB��BAC6�AF�#A5A.5;;5/#
>6?A'�%�C��A-#AF�A�C2��A(#AF�A-F��C�C��A�#AF�A(����A�#A
F�ADF����A�#A@��<+%DC�A�����+!����CA�CF�C��F�A���A��CA
!�D���E���FA��A��F����F�A%��FA���CD#@AF���AA����B��� �
&��(�B�A5;;4#A1;6++1:8#
>4?A3C��AD���H��AF�AEC���C�AEA�F�C�DA@1�A!�D���E���FA��A
�F�C���C�A����F�A�CFC����F�A���A��AF�A(�+4:A��A��A
�C�C���FCA%���C�A�%2C��DAF�A��C��A�����C��D���D@AA5;;8AE#A
�C����D#A(F�#ABACC8
>9?A�F��AI#AF�A����CFDC�C��AE#�#AF�A*��AJ#�#AF�A����A
�#�#A@�A�����+B����CAB�D���E���FA-��C����A���A3����A
����FC�AB����CDA��C�A1�K1�A)��CA'C��D�����F#@A
�������C����&��A��A�����!����AA�����)�"�&!*�A�C�A5;;ABA
169:++169C#
Visualization of Hyperspectral Images Through Interactive Sections
Ryan P. Cabeen, Department of Computer Science, Brown University
Introduction: The presented work consists of a novel
method for visualizing hyperspectral images and an eval-
uation with expert users in the context of minerology.
Advances in remote sensing and progress in scientific
missions to collect data of the Earth, Moon and Mars
are creating a need for useful visualization methods.
In particular, hyperspectral images are challenging
to visualize due to their high dimensionality, and the
presented method addresses this by reducing the spatial
data dimension to a path in the image and rendering a
visualization of the spectra along this path.
Method: Each hyperspectral image consists of a set of
images of the same scene taken at different wavelengths,
which commonly described as an image stack or cube.
Since the number of spectral measurements will typically
vastly outnumber the number of colors that can be sensed
by the human visual system, these images cannot be
visualized directly. A variety of tools exist for spectral di-
mensionality reduction, often by algebraic manipulation
of certain bands or decomposition of the entire spectrum
into several representitive groups. Spatial dimensionality
reduction techniques exist, but are limited to point
sampling and statistics on simple regions of interest.
The presented work is a spatial dimensionality reduc-
tion technique that interactively visualizes the full spec-
trum along a path, which can be imagined as a piecewise-
linear section through the image in the spatial domain.
The user chooses a series of points in spatial coordinates
to define the path, and the spectra at regular intervals of
this path are sampled. This collection of spectra is then
rendered as an image, where one dimension is the spec-
tral band, one dimension is the arclength along the path,
and the color is a mapping hyperspectral image intensi-
ties in a given band at a given arclength. The path and
color map can be edited interactively, and a traditional
point sampled spectrum is provided additionally. The
views of the spatial domain and section visualization are
linked to allow the user to track features across displays.
Implementation: The work is available as a cross-
platform and open-source software package written in
C++. Many image file formats are supported through
use of the Geospatial Data Abstraction Library (GDAL).
Qt 4 and Qt Widgets for Technical applications (Qwt)
are used for the user interface. The sampling of spectra
along the path is performed by parameterizing the curve
by arclength and bilinearly resampling at a rate of one
sample per pixel.
Results: The computational demands of curve editing,
visualization and sampling were found to be practical for
real-time applications, and the test data sets were found
to be compatible with the GDAL library.
The tool was used in an informal evaluation by mem-
bers of the Brown Department of Geology in the context
of minerological applications. They found the tool to be
generally useful and expect it to be used in the course
of research. In the context of minerology, the general
shape variation of thermal infrared spectra was found
to be readily visualized with the tool. However, the
benefits for near-infrared spectra were found to be fewer,
due to the localization of the absorbtion bands. Several
potential additions were suggested, including rendering
the spectra relative to a given spectrum, supporting
multiple paths, and overlaying an image that has been
segmented by spectral dimension reduction methods.
Discussion: Due to the nature of hyperspectral images,
there is no natural way of visualizing an entire dataset.
Consequently, a variety of methods must be employed.
The presented work aims to address the reduction of
spatial domain, allowing visualization of the full spec-
trum at more than a single point. In addition to a variety
of methods, there are many types of data and scenes
that influence the efficacy of a tool. Hence, a variety of
tools and an understanding of their relative strengths and
weaknesses is necessary to address each scenario.
References
[1] NP Jacobson. Design goals and solutions for display of
hyperspectral images. Geo. and Rem. Sens., IEEE Trans.,
43(11), 2005.
[2] M Torson. Interactive image cube visualization and
analysis. In VVS, pages 33–38, 1989.
[3] J Zhang. Vdm-rs: A visual data mining system for
exploring and classifying remotely sensed images. Comp.
Geosci., 35(9):1827–1836, 2009.
1. Both researchers are at Brown University, Providence RI, 02912, and would like to thank Steven Sloman, Nathan Seigel, David Laidlaw, and Laidlaw’s CSCI2370 graduate course for their helpful comments during the course of this investigation.
!
Assessing confidence in mechanistic and covariational explanations and diagrams in medical diagnosis
Gideon Goldin
Cognitive, Linguistic & Psychological Sciences
Diem Tran ([email protected])
Computer Science
Abstract
We conducted a study comparing the relative effects that
covariational (COV) and mechanistic (MECH) explanations,
and their corresponding diagrams (Euler and causal), have on
peoples’ confidence ratings for hypothetical diagnoses. We
empirically demonstrated that providing an explanation leads to
more confidence than not providing one. In addition, the
inclusion of Euler circles produced similar findings, where
diagnoses with these diagrams obtained better ratings than
those without them, though these ratings were not significantly
higher than those for diagnoses using causal diagrams.
1 Introduction
COV explanations rely on statistical information that
appeals to the co-occurrences of variables (e.g., Amy wrecked her car because she often gets into motor accidents), while
MECH explanations provide a causal link that explains how one
event entails another (e.g., Amy wrecked her car because she
was drunk). Here we compared these two types of explanations,
along with their diagrammatic counterparts (Euler circles and
causal diagrams) against each other and against a lack of
explanation/diagram in order to assess the levels-of-confidence
each evokes in a medical diagnosis task. We predicted that the
provision of an explanation would result in increased
confidence.
2 Related Work
The slow adoption of decision-support systems by doctors
may be due to a lack of trust or confidence in proposed
diagnoses. We hypothesize that this is due to issues with
uncertainty visualization in tasks involving risk representation.
The roles of various explanation types in expert systems have
been addressed [1] but we are not aware of any work that
compares explanation and diagram types in eliciting confidence.
3 Experiment
Design & Materials. Our design entails a 3-way, mixed model ANOVA. Both explanation (none, COV, MECH) and
diagram (none, Euler diagram, causal diagram) were within
subject, while the group variable likelihood (high, low) was
between-subject. Variables were crossed to form 18 conditions.
Each subject provided a confidence rating from 1 (not
confident at all) to 5 (very confident) for each of 9 novel,
hypothetical, counter-balanced diagnoses. For example, consider
a high-likelihood COV explanation with no accompanying
diagram: “A patient has red, square-like bumps on their body. They may or may not have Quandres’ Condition. Diagnosis: It is likely that the patient has Quandres’ Condition because people that have red, square-like bumps on their bodies usually have Quandres’ Condition.” Participants & Methods. A total of 113 U.S. subjects 18 years of age or older completed the study. Recruitment took
place online at Mechanical Turk [2] where subjects were
compensated $0.10 for their participation, which on average
required less than three minutes. Subjects were randomly
assigned to the high- or low-likelihood group. We removed 16 people from analyses because they provided identical ratings
across all items. Thus, 52 subjects constituted the low-likelihood
group—45 for the high-likelihood group.
Results. A main effect of group was found, with the high-
likelihood group showing more confidence than the low, p=.047.
Overall, providing no explanation resulted in less confidence
than providing COV or MECH ones, p<.0001. Providing no
diagram led to lower ratings than providing Euler diagrams
(p=.001), but not causal ones. Examining the cases in which the
explanation type and diagram type match reveals that ratings for
the no explanation/diagram condition (M=2.84; SD=0.13) were
significantly smaller than those for the COV/Euler condition (M=3.30; SD=0.10; p<.0001) and the MECH/causal condition
(M=3.28; SD=0.12; p=0.001). The difference between the latter
two conditions was not significant, p=.093.
!
Figure 1. Ratings for all 18 conditions.
4 Discussion
Though a group effect was not part of our prediction, we report it nonetheless as it implies that confidence may be related
to posterior probabilities in diagnosis. Notably, it seems that
providing a causal diagram for an unlikely diagnosis actually
hampers confidence. In general however, providing an
explanation (or even just an Euler diagram) greatly increases
confidence. The implications of these findings are important not
only for medicine, but for other areas of risk analysis as well,
such as public policy and scientific reasoning.
5 References
[1] C. Lacave and F. J. Díez. A review of explanation methods for Bayesian networks. The Knowledge Engineering
Review, 17(02):107-127, 2003.
[2] Amazon Mechanical Turk,
http://aws.amazon.com/mturk/. Retrieved 11-10-2010.
Mechanistic, covariational explanations and diagrams in medical diagnosis
Gideon Goldin – Diem Tran – Steven Sloman. Brown University – Fall 2010
Abstract – We hypothesize that 2 different kinds of explanations and diagrams will result in higher confidence in making medical diagnoses.
We conduct a user study to justify the hypotheses. Results from the study show significant difference in confidence when users are presented
with explanations and diagrams versus none of them.
1. BACKGROUND
In medical diagnosis, uncertainty occurs due to the natural
complexity of diseases and symptoms. Several expert systems
have been developed to help doctors in making rational diagnosis
by representing medical data in different formats, and several
researches have proved that different representation yields
different judgements [2]. We focus on 2 types of explanations
when describing uncertainty: covariational and mechanistic. The
corresponding diagrams for those 2 types are Euler and causal
diagrams.
Covariational explanation involves the coexistence of a
symptom and a disease, with some probability. For example: If
90% of time a person eats cheese and gets a stomachache, he is
likely to be lactose intolerant.
Mechanistic explanation describes the mechanism behind the
formation of a symptom caused by a disease. For example:
Patients with Alzheimer's disease have dementia because of
development of abnormal features harmful to the brain.
a. Contribution
Effects of certain types of explanations on medical diagnosis are
still in question, as physicians are hesitant to adopt expert systems
[4]. We investigate 2 discussed explanations by conducting a user
study. Results from the study provides basis to deeper research in
the field, which promises to help doctor in making medical
decisions.
b. Hypotheses
Hypothesis 1: Users give higher confidence ratings for diagnoses
with explanations and diagrams versus the control ones.
Hypothesis 2: Mechanistic explanations with causal diagrams
yield higher confidence than for covariational explanations with
Euler circles, as they are psychologically natural [1].
2. STUDY DESIGN
!The study is a 2x3x3 Mixed Model Design: High / Low
probability (Between subjects) x Explanation Type x Diagram
Type (Within Subjects) (Figure 2). We use made-up scenarios with
ordinary subjects
on Mechanical
Turk. High and low
probability
scenarios are
divided equally
among subjects.
For each condition,
a subject uses 5-
points Likert scale
to rate his
confidence on the
explanation.
!"#$%&'(')'*&+"#,'-$../%0
3. RESULTS
N = 97 after we filtered responses take less than 1 minute or have
all same answers. Figure 1 shows the mean responses. Figure 3
shows corresponding significant values. α = 0.05.
Diagram Types None vs.
Euler
None vs.
Causal
Euler vs.
Causal
Low Prob. 0.123 0.017 0.003
High Prob. 0.001 0.03 0.132
Explanation
Types
Control vs.
Cov
Control vs.
Mech
Cov vs.
Mech
Low Prob. 0.05 0.012 0.365
High Prob, 0 0.001 0.275
!"#$%&'1')'-"#,"2"3/,4'5/6$&+'7'28%'&976/,/4"8,'/,:':"/#%/.'407&+
High probability scenarios result in more confidence than low
ones, with p = 0.047. Moreover, comparing control conditions
(cn) versus covariational explanations– Euler diagrams (ve) yields
significant difference (p = 0.005 for low and p = 0.001 for high
probability). Similarly, comparing control conditions - Euler
circles (ce) versus covariational explanations– causal (vc)
diagrams yield significant difference (p = 0.041 for low and p =
0.036 for high probability). The latter conditions are higher in both
cases.
4. DISCUSSION
From obtained results, we accept hypothesis 1 and reject
hypothesis 2. In other words, users are more confident in
diagnoses with explanations and diagrams than the ones with none
of them. Rejection of hypothesis 2 might be due to a fact that
people are more comfortable with covariational explanations [3].
There are factors that this study did not consider: demographic
analysis, difference between probabilities and natural frequencies,
real medical situations and doctors as study subjects. We believe
by taking into account those factors, the study should produce
different outcomes.
References [1]. Ahn, W., & Kalish, C. 2000. The role of covariation vs. mechanism information in
causal attribution. In R. Wilson & F. Keil, Cognition and explanation (Vol. 54). MIT
Press.
[2]. Elting L., Martin C., Cantor S., & Rubenstein E. 1999. Influence of data display
formats on physician investigators' decisions to stop clinical trials: prospective trial with
repeated measures. British Medical Journal. 318(7197), 1527.
[3]. Nava Tintarev and Judith Masthoff. 2007. Effective explanations of
recommendations: user-centered design. In Proceedings of the 2007 ACM conference on
Recommender systems (RecSys '07). ACM, New York, NY, USA, 153-156.
[4]. Egea, J.M.O., & Gonzalez, M.V.R., (2010). Explaining physicians' acceptance of
EHCR systems: An extension of TAM with trust and risk factors. Computers in Human
Behavior, In Press"
!"#$%&';<'=&/,'82'%&+78,+&+'
The PeakScore Pipeline: Automated Peak Rating inSelected Ion Chromatograms
Justin A. DeBrabantBrown University
Samuel BirchBrown University
Arthur SalomonBrown [email protected]
ABSTRACT
We present a system for an automated analysis of individ-ual peak quality in selected ion chromatograms (SIC) thatallows real-time processing of data. We apply curve fittingtechniques to individual peaks in the SIC and use root meansquare deviation (RMSD) as a goodness of fit metric, whichwe call PeakScore. This automated quantitation of peakquality allows researchers to analyze the data that is beingcollected which is a significant speedup over the previousprocess of SIC peak analysis, manual inspection using ex-pert user intuition. We allow processing in batch mode usinga command line utility or individually using an integratedGUI.
1. BACKGROUND
In recent years, proteomics has emerged as a technique forbetter understanding the metabolic pathways of the cell,to which proteins are essential. A large part of proteomicsresearch is based around the technique of mass spectrometry(MS), a tool for measuring the molecular mass of moleculesin a sample. One of the interpretations of the MS data is theselected ion chromatogram (SIC). The SIC is a plot of timevs. intensity for a given m/z (mass over ion charge) value.In the chromatograms, the data quality varies greatly dueto uncontrollable external factors. Currently, users rely onexpert intuition to determine whether a peak is “good” or“bad” and whether or not the peak is comparable with otherexperiments. Because of the massive quantities of MS databeing produced in recent years, this manual inspection ofSIC peaks is no longer feasible. An automated quantitationmethod is needed to efficiently and accurately model peakquality. Our system provides both a metric that closelymodels expert user intuition of peak quality and can handlethe massive datasets in real-time.
2. METHODS
Our system uses several open-source APIs in the data con-version pipeline. The initial format is the Thermo RAW
CS237: Interdisciplinary Scientific Visualization 2010
RAW File Data Conversion MySQL
Peak Analysis MySQL
GUI
Figure 1: A diagram of the PeakScore Pipeline.
file, a propietarty binary format. We use the ProteoWizard[1] tool to convert from RAW to mzML. To convert frommzML to text, we use the OpenMS [2] API. Finally, we loadthe text data into a MySQL database with a Java programusing JDBC. For the PeakScore metric, we applied curve-fitting techniques and measured the goodness of fit of theresulting curve. We tried fitting several different curves tothe raw data points, and found that a gaussian best mod-eled the ideal peaks. We used the SciPy package in Pythonfor all the curve-fitting and goodness-of-fit analysis. Finally,in the interactive GUI, the user can specify a m/z value ofinterest, which will then be loaded from the database andanalyzed with the PeakScore metric. Similarly, a commandline tool allows batch processing of multiple SICs, indepen-dent of the GUI. The output of this batch processing tool isautomatically stored in a MySQL database for use later inthe protein analysis pipeline.
3. CONCLUSIONS
Our system received very positive reviews from our collabo-rators. Of particular importance was that our system couldpipeline data from RAW file to visual analysis in real time(i.e. less time than it took to run the experiment). Usersindicated that the PeakScore metric closely modeled theirexpert intuition of the underlying peak quality. Thus, oursystem has improved the efficiency of the data analysis whilemaintaining correctness with respect to manual peak inspec-tion. This automation of the analysis process will allow re-searchers to analyze their data on scale with the data beinggenerated.
4. REFERENCES[1] D. Kessner. Proteowizard: Open source software for
rapid proteomics tools development. Bioinformatics,24(21):2534–2536, 2008.
[2] O. Kohlbacher and K. Reinert. OpenMS and TOPP:Open source software for lc-ms data analysis. Proteome
Bioinformatics, 604(2):chap. 14, 2009.
1
Peak Quality Quantitation for Selected Ion Chromatograms
J. Debrabant∗, S. Birch†, A. Salomon‡
December 17, 2010
Abstract
We devloped a system to assist in interactively quan-titating peak quality for selected ion chromatograms(SICs). This allows for faster, or even automatedSIC quality quantitation, which in turn allows bi-ologists to more easily approach large datasets incontrast to slow manual classification relying on ex-pert labor. Our system works end-to-end, from theraw data of the mass-spectrometer to a visualization(or plain output) of a peak quality metric.
1 Background
This work aims to ease and automate this processesof assesing SIC quality by augmenting the SIC dis-play with an automatic quality metric and by pro-viding a more intuitive interface to do so. As pro-teomics has become increasingly important in biol-ogy scientists have worked to automate the processof measuring protein expression. The most effectivecontemporary method is through mass spectrome-try, where proteins are enzymatically broken downinto their component peptides, tagged with ions, andsampled. The result of this technique is effectivelya three-dimensional dataset: time (retention-time),intensity, and mass-charge ratio (m/z). A SIC is aplot of intensity over time for a given (small; merelyto account for measurement error) range of mass-charge ratio, and it represents the expression of agiven peptide over time. A set of SICs can thenbe used to probabalistically reconstruct the proteinsthey came from in the biological sample. SICs arenot entirely accurate themselves, however, and cur-rent processes rely on human intuition to providequality assesments for peptide presence. This workreplaces the element of intuition with an automatic
∗Brown University, [email protected]†Brown University, [email protected]‡Brown University, [email protected]
process.
2 Implementation
Our system is implemented in the following phases:
1. Conversion — We use the open-source Prote-oWizard to convert vendor-specific raw mass-spectrometric data to the mzML standard.
2. Data extraction — We use the open-sourceOpenMS library to interpret the mzML andfeed the data to a relational database.
3. Peak extraction — We extract the appropri-ate chromatogram from the database and findthe nearest peak to the user-supplied retention-time1 using a continuous wavelet transform, apeak identification algorithm.
4. Peak quantitation — We use the Levenberg-Marquardt algorithm to fit a three-parameter(µ, σ, and a scaling factor) Gaussian curve tothe extracted peak range, and combine multi-ple factors to give a quality metric.
5. Visualization — The user is presented withan interface to annotate SIC peaks given theSIC, peak retention-time, the fit curve, and thequality-of-fit metric.
3 Conclusion
Our system allows for the automation of a key pro-cess in the mass-spectrometry pipeline, which al-lows scientists to more rapidly and objectively tacklelarger datasets which weren’t assailable before. Ourcollaborator was very enthusiastic about our resultsbecause it allowed him to process unprecedentedamounts of data while integrating into his existingpipeline.
1This is supplied separately by the mass-spectrometer.
1
Interactive 2D Maps for Functional Brain Connectivity Queries
Steven R. Gomez∗
Figure 1: Interactive map of the efferent projections from the primarymotor area (MOp) in the rat brain.
ABSTRACT
We present and evaluate a method for visualizing directed neuronprojections in the brain using interactive maps of landmark brainparts. Our method improves on existing brain query tools by com-positing projections onto a reference map of brain structures; sci-entists familiar with brain anatomy may leverage this spatial orga-nization to execute and verify queries more quickly. A quantitativeevaluation suggests that our interactive maps may be faster at sometypes of brain projections queries than non-visual query tools with-out sacrificing accuracy, and that maps may be more enjoyable andeasy to use than non-visual tools.
1 BACKGROUND
Researchers in neural circuitry are concerned with understandingconnections between spatially distributed and functionally differ-entiated parts of the brain [1]. Insights from these connections maysuggest clinical applications and research directions for neurolog-ical disorders. The scale and complexity of available connectivitydata make it difficult to gain insight from individual, textual queries– the standard interface to circuitry databases in current tools likethe Brain Architecture Management System (BAMS) [2]. Recently,Bruckner et al. [3] presented a visual query system for the fruitfly brain, but the tool focuses on queries for volume sections inthe brain rather than representations of relationships between theseparts. Our map method, on the other hand, visually overlays func-tional structure on top of brain parts that serve as map landmarks.
2 METHODS
We used Java and the Prefuse toolkit to manage and drawprojection graphs. Test data included projections from theSwanson-1998 rat brain nomenclature available in XML on-line (http://brancusi.usc.edu/bkms/), which we rewrote into theGraphML format for use with graph exploration toolkits. The brainslice map background is available online (http://brainmaps.org). Inour tool, users can interactively navigate parts of interest and tog-gle incoming and outgoing neural projections as directed edges be-tween landmark nodes on the map.
2.1 User Study
We evaluated four computer science graduate students on accuracyand completion time for projection query tasks using both our maptool and BAMS. Each subject received approximately 10 minutes of
∗email: [email protected]
Projection Existence Projection Density0
50
100
150
200
250
300
350
400
450
Query Tasks
Mean T
ime (
sec)
Task Completion Times
Visual Maps
BAMS
Projection Existence Projection Density0
0.2
0.4
0.6
0.8
1
Query Tasks
Mean A
ccura
cy (
%)
Task Completion Accuracy
Visual Maps
BAMS
Figure 2: Results of query completion time (left) and accuracy (right)on prepared sets of query tasks for interactive maps and BAMS. Errorbars show ± 1 standard deviation.
training with both tools. Subjects were asked to solve two tasks asquickly as possible on each tool, each with a different set of queriesto avoid ordering bias via a Latin squares design. Tasks included:
T1: Projection existence. We asked users to answer 8 questionswith the form “Does part X project to part Y?” and “Does Xreceive a projection from part Z?”
T2: Projection density estimation. We asked users to order setsof brain parts by the number of efferent projections of each.Subjects were shown 3 sets of varying difficulty, containing3 brain parts each. For accuracy, subjects earned a point foreach pair in the set that was correctly ordered (3 possible).
Finally, we collected anecdotal feedback and had subjects rate eachtool’s usability and enjoyability on a Likert scale of 1 to 5.
3 RESULTS AND CONCLUSIONS
Results from our user study are shown in Figure 2. Users were sig-nificantly faster with the map tool than with BAMS on the densityestimation task without significantly sacrificing accuracy. This maybe due to users preferring a counting strategy with BAMS’s resultlists, which is more difficult on the map, where faster estimationtechniques may have been used. On the projection existence task,users were on average faster using BAMS, though not significantly.Some users noted that browser proficiency allowed them to navi-gate HTML results faster than in the map tool. Many strategies byusers in BAMS could be enabled by further feature development inthe map interface.
We were encouraged that surveyed users found our map, on av-erage, easier and more fun to use than BAMS. We believe a hybridapproach with textual and visual result-sets may help users performfaster and more accurately on each query task.
REFERENCES
[1] J. W. Bohland, C. Wu, et al. A proposal for a coordinated effort for
the determination of brainwide neuroanatomical connectivity in model
organisms at a mesoscopic scale. PLoS Comput Biol, 5(3):e1000334,
03 2009.
[2] M. Bota, H.-W. Dong, and L. Swanson. Brain Architecture Manage-
ment System. Neuroinformatics, 3(1):15–48, 2005.
[3] S. Bruckner, V. Solteszova, E. Groller, et al. Braingazer – visual queries
for neurobiology research. IEEE Transactions on Visualization and
Computer Graphics, 15:1497–1504, 2009.