Trust and Epistemic Communities in Biodiversity Data Sharing
Nancy Van House
SIMS, UC Berkeley
www.sims.berkeley.edu/~vanhouse
Digital libraries (DLs): ready access to unpublished information by a variety of users, crossing sociotechnical boundaries
  Raises issues of trust and credibility
Knowledge is social
  What we know, and whom we believe, is determined by/within epistemic cultures
Biodiversity data
  Great variety of information, sources, purposes
CalFlora: an example of a user-oriented DL
  Incorporating users’ practices of trust and credibility
  Negotiating differences across epistemic cultures
Implications
DLs Facilitate Access
To a greater variety of information:
  Unpublished (unreviewed) information
  “Raw” data, such as reports of observations
  Information from outside one’s own reference group
  Problems: Which information and sources do we believe? How do we evaluate information from unfamiliar sources? Which information do we use for what purposes?
By people from outside the data owner’s reference group:
  Inappropriate use of information?
  Burden on the data owner of making data available, usable, and understandable to reduce misuse
Examples of Risks: Botanical Information
Unreliable information
  Erroneous or duplicate observations >> belief that a species is prevalent >> failure to preserve a population of a rare species
  Chasing after an erroneously reported sighting of a rare species, or discounting a significant sighting as an amateur’s error
Inappropriate use of information
  Private landowners destroying specimens of a rare plant to avoid legal limits on land development
  Collectors (over-)collecting specimens of rare species
Knowledge is Social
What we know comes primarily from others:
  Cognitive efficiency: we don’t have the time or resources to verify everything ourselves
  Expertise: we don’t have sufficient knowledge in all areas
We have to decide whom we trust and what we believe.
What we consider “good” work, whom we believe, and how we decide are determined and learned in epistemic communities.
DLs need to support the diverse practices of epistemic communities.
Social Nature of Knowledge is of Concern in Many Areas
Science studies
  Inquires into the construction of scientific knowledge and authority
Social epistemology
  Asks: How should the collective pursuit of knowledge be organized?
Situated action/learning
  Posits knowledge, action, identity, and community to be mutually constituted
Knowledge management
  Is concerned with how to share knowledge
Cognitive Trust and DLs
For people to use a DL:
  Information must be credible
  Sources must be trustworthy
  The DL itself must be perceived to be trustworthy
How can DLs be designed to:
  Facilitate users’ assessments of the trustworthiness and credibility of information and sources?
  Demonstrate their own trustworthiness?
Epistemic Cultures
“…those amalgams of arrangements and mechanisms … which, in a given field, make up how we know what we know.”
“Epistemic cultures…create and warrant knowledge, and the premier knowledge institution throughout the world is, still, science.”
Karen Knorr-Cetina, Epistemic Cultures
Culture
  Context of history and ongoing events
  Practice: how people actually do their day-to-day work
  Artifacts: information artifacts include documents, images, thesauri, classification systems
  Diversity: if all were the same, there would be no culture; this includes diversity across areas of science
Epistemic Cultures Differ
Practices of work
Practices of trust
Artifacts, e.g. genres
Methods of data collection and analysis
Meanings, interpretations, understandings
Tacit knowledge and understandings
Values
Methods, standards, and information for evaluating other participants’ work and values
Institutional arrangements
Communities and Knowledge
Becoming a member of a community of practice = identity: learning its practices, values, and orientation to the world.
We learn what to believe, whom to believe, and how to decide in epistemic communities.
We tend to trust people from within our own epistemic communities:
  Similar values, orientation, practices, standards
  Ability to assess their credibility
DLs and Epistemic Cultures
DLs enable information to cross epistemic communities:
  More easily, and more often than before
  Raw data, not just syntheses and analyses (e.g. publications)
Crossing communities often undermines our practices of trust:
  Who are these people? How did they collect the data? What do they know? What are their goals, values, priorities?
DLs need to be designed to support practices of assessing trustworthiness and credibility.
Biodiversity Data
Biodiversity research studies the diversity of life and the ecosystems that maintain it
  Central question: change over space and time
Uses large quantities of data that vary in:
  Precision and accuracy
  Methods of data collection, description, storage
Old data are particularly valuable
Broad range of datasets: biological, geographical, meteorological, geological…
  Created and used by different professions, disciplines, types of institutions… for different purposes
Politically and economically sensitive data
“Citizen Science”
Fine-grained data from observers in the field
Observers with varying levels and types of expertise
  E.g., expertise in an area, habitat, taxon…
  Expert amateurs
Private-public cooperation
  Government agencies, environmental action groups, university herbaria, membership organizations, concerned individuals…
CalFlora
http://www.calflora.org
Comprehensive web-accessible database of plant distribution information for California
Independent non-profit organization
Designed and managed by people from the botanical community, not librarians or technologists
Free
In conjunction with the UC Berkeley Digital Library (http://elib.cs.berkeley.edu)
CalFlora Target Users
Researchers and professionals in land management: ready access to data for
  Addressing critical issues in plant biodiversity
  Analyzing the consequences of land use alternatives and environmental change on the distribution of native and exotic species
The public: promoting interest in biodiversity
  Active engagement in biodiversity issues/work
  Wildflowers as “charismatic”
CalFlora Priorities
Focus on people; put technology in the back seat
Pay attention to how the world works for the people who produce and use information
Honor existing traditions of data exchange
Botanists at Work
Components of Interest Today
CalPhotos
CalFlora Occurrence Database
CalPhotos
In conjunction with the UC Berkeley Digital Library Project (http://elib.cs.berkeley.edu)
More than 28,000 images of California plants
Approx. half of all California species are represented
Sources
  Some institutions, e.g. the California Academy of Sciences
  Many from “native plant enthusiasts”
  Currently accepting/soliciting contributions from users
Major reported uses
  Plant identification
  Illustrations
CalFlora Occurrence Database
More than 800,000 geo-referenced reports of observations
  Specimens in collections
  Reports from the literature
  Reports from the field
  Checklists
Sources
  19 institutions
  About to begin accepting reports from registered contributors via the Internet
CalFlora Occurrence Database
Users can
  “Click through the map to underlying data”
  Download data for their own analyses and tools
Uses
  Land management decisions
  Legally mandated environmental reports (NEPA, CEQA)
  Identifying plants (though not designed for this)
Common analyses
  Which species are present in an area
  Which are common, which are rare
  Which species are restricted to a habitat affected by proposed actions
  Analyzing various species in combination, by geographic area
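The first two common analyses (which species are present in an area, and which are common or rare) can be sketched over downloaded occurrence records. This is a minimal sketch, assuming a hypothetical record layout (species name, latitude, longitude) rather than CalFlora's actual download format, a rectangular area, and a hypothetical rarity threshold on the number of reports. The example records are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    # Hypothetical record layout; CalFlora's real download fields may differ.
    species: str
    lat: float
    lon: float


def in_box(rec: Occurrence, south: float, north: float, west: float, east: float) -> bool:
    """True if the record's coordinates fall inside the bounding box."""
    return south <= rec.lat <= north and west <= rec.lon <= east


def species_in_area(records, south, north, west, east) -> Counter:
    """Count occurrence reports per species inside a bounding box."""
    return Counter(r.species for r in records if in_box(r, south, north, west, east))


def split_common_rare(counts: Counter, rare_max: int = 2):
    """Label species with at most `rare_max` reports as rare (hypothetical threshold).

    Report counts reflect where and how hard people looked, not only abundance.
    """
    rare = sorted(s for s, n in counts.items() if n <= rare_max)
    common = sorted(s for s, n in counts.items() if n > rare_max)
    return common, rare


if __name__ == "__main__":
    records = [  # made-up example records
        Occurrence("Eschscholzia californica", 37.87, -122.26),
        Occurrence("Eschscholzia californica", 37.88, -122.25),
        Occurrence("Eschscholzia californica", 37.86, -122.27),
        Occurrence("Lupinus nanus", 37.87, -122.24),
        Occurrence("Lupinus nanus", 36.60, -121.90),  # outside the box below
    ]
    counts = species_in_area(records, 37.8, 37.9, -122.3, -122.2)
    print(split_common_rare(counts))  # (['Eschscholzia californica'], ['Lupinus nanus'])
```

Counting reports rather than individuals is the simplest choice for a sketch, but it is exactly where duplicate or erroneous observations would make a species look more prevalent than it is.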
CalFlora Occurrence Database: Significance
By far the most comprehensive source for California
Common as well as rare taxa
  Biodiversity research is beginning to be interested in all populations, not just rare ones, which requires vastly more data
Data are downloadable and manipulable
Easy to use (for professionals, anyway)
Remote access via the Internet
  E.g. a botanist in a remote National Forest…
About to accept observations from “the public”
  A source of valuable data on rare and especially common species
Dilemmas and Conflicts
A useful place to see tensions, breakdowns, and conflicts across epistemic cultures
Not a question of who’s right or wrong, but of underlying differences in values, priorities, practices, understandings
CalFlora Dilemmas
Quality filtering: made centrally vs. pushed down to the user
  Inclusiveness of observations vs. selectivity
  Speed of additions vs. review, filtering
  Labelling data for quality vs. providing information for users
Access
  Benefits vs. dangers of wide access to information
Free vs. fee
  Cost recovery
  Discouraging frivolous use
  Who bears the costs? Externalities
Dilemmas, Cont.
Institutional independence:
  Autonomy and the ability to be responsive to multiple stakeholder communities vs. the security and credibility of institutional sponsorship
How (Some) Experts Assess Occurrence Reports
The evidence:
  Type of report (specimen, field observation, list)
  Type of search (casual, directed)
The source:
  Personal knowledge of the contributor’s expertise
  Examination of other contributions by the same person
  Annotations by trusted others
Ancillary conditions:
  Likelihood of that species appearing at that time, habitat, geographical location
  Other, similar reports
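One way to push quality filtering down to the user, as in the CalFlora dilemma of central vs. user-side filtering, is to expose evidence and source criteria like these as choices each user makes. This is a minimal sketch, not CalFlora's interface: the field and class names are hypothetical, it assumes report type, search type, contributor ID, and annotator IDs are stored with each report, and it leaves out the ancillary conditions, which need habitat and phenology data.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReportType(Enum):
    SPECIMEN = "specimen"
    FIELD_OBSERVATION = "field observation"
    LIST = "list"


class SearchType(Enum):
    CASUAL = "casual"
    DIRECTED = "directed"


@dataclass
class Report:
    # Hypothetical fields mirroring the evidence/source criteria above.
    species: str
    report_type: ReportType
    search_type: SearchType
    contributor_id: str
    annotated_by: list[str] = field(default_factory=list)  # IDs of annotators


@dataclass
class UserCriteria:
    """One user's own choices; the system does not decide for them."""
    accepted_types: set[ReportType]
    accepted_searches: set[SearchType]
    require_trusted_source: bool = True
    trusted_contributors: set[str] = field(default_factory=set)
    trusted_annotators: set[str] = field(default_factory=set)


def acceptable(report: Report, crit: UserCriteria) -> bool:
    """Keep a report if its evidence type is accepted and, when required, it comes
    from a trusted contributor or has been annotated by someone the user trusts."""
    if report.report_type not in crit.accepted_types:
        return False
    if report.search_type not in crit.accepted_searches:
        return False
    if not crit.require_trusted_source:
        return True
    return (report.contributor_id in crit.trusted_contributors
            or any(a in crit.trusted_annotators for a in report.annotated_by))


if __name__ == "__main__":
    reports = [
        Report("Lupinus nanus", ReportType.SPECIMEN, SearchType.DIRECTED, "herbarium-1"),
        Report("Lupinus nanus", ReportType.FIELD_OBSERVATION, SearchType.CASUAL,
               "volunteer-7", annotated_by=["expert-3"]),
        Report("Lupinus nanus", ReportType.LIST, SearchType.CASUAL, "volunteer-9"),
    ]
    cautious = UserCriteria(
        accepted_types={ReportType.SPECIMEN, ReportType.FIELD_OBSERVATION},
        accepted_searches={SearchType.CASUAL, SearchType.DIRECTED},
        trusted_contributors={"herbarium-1"},
        trusted_annotators={"expert-3"},
    )
    print([r.contributor_id for r in reports if acceptable(r, cautious)])
    # ['herbarium-1', 'volunteer-7']
```

The volunteer's field observation passes only because a trusted expert annotated it, which is the sketch's stand-in for "annotations by trusted others."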
How CalFlora Presents Occurrence Data
Links to data source(s), personal and institutional
Compliance with the institutional source’s requirements
  Fuzzed locations
  Links to the institutional source’s caveats, explanations
Publicly contributed observations
  Information about the observer
  Information about the observation
  Annotations by experts
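The slides do not say how locations are fuzzed. One common approach, shown here purely as an assumption, is to publish only the center of a coarse grid cell so that a rare population cannot be pinpointed. The function name and the 0.1-degree cell size are hypothetical, not any source's actual requirement.

```python
import math


def fuzz_location(lat: float, lon: float, cell_deg: float = 0.1) -> tuple[float, float]:
    """Snap a point to the center of its grid cell (hypothetical fuzzing rule).

    Every point in the same cell_deg x cell_deg cell maps to the same published
    coordinates, so the exact site is not disclosed. Readers of the fuzzed record
    should be told the position is uncertain by up to half a cell in each axis.
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")

    def snap(x: float) -> float:
        # floor() keeps negative longitudes in the correct cell.
        return (math.floor(x / cell_deg) + 0.5) * cell_deg

    return round(snap(lat), 6), round(snap(lon), 6)


if __name__ == "__main__":
    print(fuzz_location(37.8715, -122.2727))  # (37.85, -122.25)
```

Publishing the cell size alongside the fuzzed point preserves the uncertainty instead of silently presenting a false precise location.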
Contributor Registration
Biography, credentials (free text)
Expertise/interests (free text)
Affiliation
Contact info/web site
“I will submit only my own observations of wild plants. I realize that this system is only for first-hand reports about plants, native and introduced, that are growing without deliberate planting or cultivation.”
“I will…make sure I have the correct scientific name…I will submit uncertain identifications only if I believe them to be very important and time sensitive, and will label such reports ‘uncertain.’”
Contributor Registration (cont.)
Experience level (self-assessment; check one):
  I am a professional biologist/botanist, or have professional training in botany.
  Although I do not have formal credentials, I am recognized as a peer by professional botanists.
  Although I do not consider myself to have professional-level knowledge, I am quite experienced in the use of keys and descriptions, and/or have expertise with the plants for which I will be submitting observations.
  I do not have extensive experience or background in botany, but I am confident that I can accurately identify the plants for which I will be submitting observations.
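The registration form amounts to a small data model: free-text self-description plus a single self-assessed experience level. This is a minimal sketch of that model with hypothetical class and field names. The four levels paraphrase the form's checkboxes, and ordering them numerically is an assumption. Because the level is self-assessed, the sketch stores it as the contributor's claim, not as a verified credential.

```python
from dataclasses import dataclass
from enum import IntEnum


class ExperienceLevel(IntEnum):
    """Self-assessed levels paraphrasing the form; the ordering is an assumption."""
    CONFIDENT_WITH_OWN_PLANTS = 1  # no extensive background, confident for submitted plants
    EXPERIENCED_WITH_KEYS = 2      # experienced with keys/descriptions or these plants
    PEER_RECOGNIZED = 3            # no formal credentials, recognized as a peer
    PROFESSIONAL = 4               # professional biologist/botanist or trained


@dataclass
class Contributor:
    # Hypothetical field names for the registration form's contents.
    name: str
    biography: str                # free text: biography, credentials
    expertise: str                # free text: expertise/interests
    affiliation: str
    contact: str                  # contact info or web site
    self_assessed_level: ExperienceLevel
    agreed_first_hand_only: bool  # accepted the "only my own observations" pledge


def at_least(contributors, minimum: ExperienceLevel):
    """Contributors whose *self-reported* level is at or above `minimum`."""
    return [c for c in contributors if c.self_assessed_level >= minimum]
```

A user who wants only self-described professionals could call `at_least(contributors, ExperienceLevel.PROFESSIONAL)`. Since the level is self-reported, it complements rather than replaces the source-level checks that experts describe.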
Occurrence Form
Species identification, habitat, location, date
Method of identification:
  “I recognize …from prior determinations and experience”
  “I compared this plant with herbarium specimens”
  “I keyed this plant in a botanical reference”
  “I compared … with published taxonomic descriptions”
  “An expert reviewed and confirmed this identification”
Certainty of identification:
  “I am confident of this identification, and submit this as a positive observation.”
  “I am not certain of this identification but believe it to be a significant observation and submit it here as an alert only.”
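The occurrence form records not just what was seen but how it was identified and how sure the observer is, which lets a later reader apply their own standards. This is a minimal sketch with hypothetical names. The enum values paraphrase the form's options. Allowing more than one identification method per report is an assumption, and `positive_observations` is an illustrative helper that keeps "alert only" reports out of the positive set, as the form's wording asks.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class IdMethod(Enum):
    # Paraphrases of the form's "method of identification" options.
    PRIOR_EXPERIENCE = "recognized from prior determinations and experience"
    HERBARIUM_COMPARISON = "compared with herbarium specimens"
    KEYED = "keyed in a botanical reference"
    PUBLISHED_DESCRIPTION = "compared with published taxonomic descriptions"
    EXPERT_CONFIRMED = "expert reviewed and confirmed"


class Certainty(Enum):
    CONFIDENT = "positive observation"
    UNCERTAIN_ALERT = "alert only"


@dataclass
class OccurrenceReport:
    # Hypothetical field names for the form's contents.
    contributor_id: str  # links the report back to a registered contributor
    species: str
    habitat: str
    lat: float
    lon: float
    observed_on: date
    methods: frozenset[IdMethod]  # assumption: more than one method may apply
    certainty: Certainty

    def is_alert(self) -> bool:
        """Uncertain reports are alerts, not positive observations."""
        return self.certainty is Certainty.UNCERTAIN_ALERT


def positive_observations(reports):
    """Only the reports their observers submitted as confident identifications."""
    return [r for r in reports if not r.is_alert()]
```

Keeping the method and certainty as separate fields, instead of collapsing them into a single quality score, preserves the "modalities" a reader may want to weigh differently.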
Annotations
Herbarium practice: experts annotate records with corrections, comments.
CalFlora: registered experts can annotate photos and occurrence records.
Annotation by an expert raises the credibility of a record.
  In practice, though, how often are records annotated?
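The question of how often expert annotation actually happens can be answered directly if annotations are stored with each record. This is a minimal sketch, assuming a hypothetical mapping from record IDs to the IDs of everyone who annotated them, and a hypothetical set of registered expert IDs. It computes the share of records carrying at least one expert annotation.

```python
from collections.abc import Iterable, Mapping


def expert_annotation_rate(annotations: Mapping[str, Iterable[str]], experts: set[str]) -> float:
    """Fraction of records annotated by at least one registered expert.

    `annotations` maps each record ID to the IDs of its annotators (empty if
    none); `experts` holds the IDs of registered experts.
    """
    if not annotations:
        return 0.0
    annotated = sum(1 for who in annotations.values() if any(a in experts for a in who))
    return annotated / len(annotations)


if __name__ == "__main__":
    annotations = {
        "rec-1": ["expert-3"],
        "rec-2": [],
        "rec-3": ["volunteer-7"],
        "rec-4": ["expert-3", "volunteer-7"],
    }
    print(expert_annotation_rate(annotations, {"expert-3"}))  # 0.5
```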
CalFlora Data and Trust
Trusting data
  Every observation trackable to its source(s)
  Detailed information and contact information for source, observer
  Detailed information about the observation
  Observations categorized by type
  Annotation
Trusting users
  NOT registering or charging users
  Respecting the source’s limits and caveats on data
  Leaving quality decisions to the users
Trusting CalFlora
  Detailed list of contributing organizations, advisors
  NOT affiliated with another organization
Concerns
CalFlora relies on record-by-record examination
  Looking at methods of classifying records in collections
CalFlora relies on voluntary contributions of data
  Experts with lots of data and no time to contribute
  Well-meaning volunteers with time but not expertise
Users need to be able to track back to the source of each record, each data point
  Concern about “modalities” and uncertainties being lost
Archiving
  Concern about the dynamism of CalFlora
  Stability of electronic media
  Stability of the organization
Delegating decisions about the quality of observations to (inexpert) users
Implications for DLs and Other Information Systems
The social nature of knowledge
  We have to decide on whom we will depend
  We learn from others whom and what we can depend on
  Information must be credible to be used
The importance of culture in constituting knowledge
  Practice, values, orientations…
Epistemic cultures differ
  Not simply a matter of experts vs. the public
Therefore: DLs need to accommodate practices
  Including practices of trust and credibility
  Users need to know the provenance of data
Users differ, and not just as experts vs. nonexperts
  DLs serve multiple, varied epistemic cultures
  The same person may belong to multiple cultures
  Users need flexibility to adapt the DL to their needs and practices
  Some users need decisions made for them
>> involvement of users in design
Implications for DL Creation and Management
Different epistemic cultures participate in the design and management of DLs as well:
  Librarians
  Technologists
  Various, differing user groups
Differences in practices, understandings, values >> differences in priorities and decisions
A continual process of negotiation and translation