franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Post on 16-Apr-2017


Non-unitary syntheses of systematic knowledge

Please

@taxonbytes

Nico Franz

School of Life Sciences, Arizona State University

CIRSS Seminar – Center for Informatics Research in Science and Scholarship

February 17, 2017 – iSchool, University of Illinois Urbana-Champaign

@ http://www.slideshare.net/taxonbytes/franz-2017-uiuc-cirss-non-unitary-syntheses-of-systematic-knowledge

• Why are phylogenies and classifications (so) unstable?

• How (well) can taxonomic names and relationships – the "Linnaean system" – manage the taxonomic similarities and differences across versions?

• Introducing the Euler/X alignment tool

• The primate use case (classifications)

• The avian use case (phylogenies)

• Biodiversity data aggregation

• Implications of achieving synthesis (CSCW..)

Overview


doi:10.1038/nature.2016.20567

"Here, we use new genomic data from over 1,000 uncultivated and little known organisms, together with published sequences, to infer a dramatically expanded version of the tree of life, with Bacteria, Archaea and Eukarya included."

doi:10.1038/nmicrobiol.2016.48

The pluralistic domain of human taxonomy making

• Taxonomies are endorsed by us (humans); more or less democratically.

• They consist of sets of labels, data, and theories about the natural world.

• Over time, these theories change – converge or conflict (often in parallel).

"100 years of primate taxonomies"

Source: Rylands & Mittermeier. 2014. Primate taxonomy: species and conservation. doi:10.1002/evan.21387

A model to separate the human-made versus natural domains

• While human taxonomy making unfolds (e.g. 1758 onwards), natural taxa – which 'took' millions of years to realize – tend to not change much.

• At any time, our labels and theories (concepts) aim to stand for taxa; yet the correspondence may be approximate.

Domain of human taxonomy making ("mimic")

Natural domain ("model")

Reliable?

Remsen: Using names, we're lucky when revisions are infrequent

"In biology, there are many taxa that are so under-studied that they are only known from their original description and none or very few subsequent references […].

The name alone, so long as it is a unique name, is sufficient to locate all related material."

– David Remsen 2016: 213

Source: Remsen. 2016. The use and limits of scientific names in biological informatics. doi:10.3897/zookeys.550.9546


The challenge: names refer to non-type specimens contingently

Source: Dubois. 2005. Zoosystema 27: 365-426. http://sciencepress.mnhn.fr/sites/default/files/articles/pdf/z2005n2a8.pdf

Names

Non-types

doi:10.1017/S1477200003001063

4: Amauris (Amaura) (damocles) hyalites makuyuensis Carcasson (1964) sec. Vane-Wright (2003)

Rank labels in figure: genus, superspecies, subspecies, subgenus, semispecies

Oscillating meanings of the species epithet hyalites – 1911 to 2003

Phenotypic diversity

Type-anchored name identity relations

Narrowest holotype "region"

Connecting to the occurrence level.

Introducing the SERNEC portal (sustained by Symbiota)

sernecportal.org

Holdings, May 2015:

• 27 herbarium collections

• 607,300 occurrences

• 17,300 species-level units

Andropogon glomeratus – Bushy bluestem

Photo by Max Licher (ASU Herbarium); Cottonwood, Arizona.

http://swbiodiversity.org/seinet/imagelib/imgdetails.php?imgid=431755

Ok. SERNEC search!

Search for "Andropogon glomeratus" returns 255 occurrences [1]

• Source herbaria: 9

• Year collected: 1885-2013

• Year identified: 1973-2010

• Identifier named: 161 occ.

[1] SERNEC portal, May 15, 2015; with synonyms, raw taxonomy.

Isn't that one similar to virginicus?

Search for "Andropogon virginicus" returns 442 occurrences [1]

• Source herbaria: 13

• Year collected: 1873-2013

• Year identified: 1973-2015

• Identifier named: 200 occ.

[1] SERNEC portal, May 15, 2015; with synonyms, raw taxonomy.

What about the nominal subspecies?

Search for "A. virginicus var. virginicus" returns 101 occurrences [1]

• Source herbaria: 6

• Year collected: 1920-2013

• Year identified: 2003

• Identifier named: 66 occ.

[1] SERNEC portal, May 15, 2015; with synonyms, raw taxonomy.

I believe some Floras recognize capillipes.

Search for "Andropogon capillipes" returns 72 occurrences [1]

• Source herbaria: 5

• Year collected: 1940-2006

• Year identified: 1986

• Identifier named: 1 occ.

[1] SERNEC portal, May 15, 2015; with synonyms, raw taxonomy.

Show four-in-one occurrence-based maps.

Combined four-in-one search returns 769 occurrences [1]

• Source herbaria: 13

• Year collected: 1873-2013

• Year identified: 1973-2015

• Identifier named: 407

[1] SERNEC portal, May 15, 2015; with synonyms, raw taxonomy.

Ready to do science?

Maybe. There are some issues.

Taxonomic concept alignment, Andropogon glomeratus-virginicus complex, spanning across 11 classifications authored 1889-2015

• 36 unique taxonomic names

• 88 taxonomic concept labels (name sec. author strings)

• Alignment by A.S. Weakley; row position = congruence

• 1/36 names with unique 1 : 1 name : meaning cardinality across all classifications

• Andropogon virginicus

• Source: Franz et al. 2016 [1]

[1] Franz et al. 2016. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web Journal (IOS). doi:10.3233/SW-160220

Also: This is how we built this. (provenance tracking)

"When I first came here, this was all swamp. Everyone said I was daft to build a castle on a swamp, but I built it all the same, just to show them."

"It sank into the swamp."

"So I built a second one."

"That sank into the swamp."

"So I built a third."

"That burned down, fell over, then sank into the swamp."

"But the fourth one stayed up. And that's what you're going to get, Lad, the strongest castle in all of England."


Why Euler/X?

Concepts: tracking progress and conflict in the human domain

• Taxonomic names and nomenclatural relationships are only so-so in terms of tracking congruent and incongruent taxonomic perspectives.

• Logic-based multi-taxonomic alignments require better contextualization of labels and relationships, and better specification of "taxonomic sameness".

1912 vs. 1967: Logically reconcilable?

Δ = ?

Querying systematic advancement – premises & questions

• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).

• Therefore it would be useful to have a "systematic knowledge advancement service". The service satisfies queries such as:

1. "Does this sequence of related systematic inferences have a stabilizing or destabilizing trend?"

2. "Are two or more tree hierarchies – each differentially sub-sampled at lower levels – in congruence or in conflict?"

3. "How can an applied comparative study tied to one (earlier) hierarchy be "updated" (integrated) with another (later) hierarchy?"

Service: We can prioritize research agendas accordingly.

Service: Sampling an issue? Or are signals complementary?

Service: Effects of "systematic variable" on conclusions can be controlled for.

An update on Euler/X:

Logic, use cases, and novel services

Euler/X – logically consistent RCC–5 alignments

• Input: multiple taxonomies and/or phylogenies; expert-provided articulations.

• Output: logic consistency checking; Maximally Informative Relations (MIR); alignment visualizations.

Products – concept taxonomy in theory and in practice

ZooKeys. doi:10.3897/zookeys.528.6001

Semantic Web. doi:10.3233/SW-160220

Biological Theory. doi:10.1007/s13752-017-0259-5

PLoS ONE. doi:10.1371/journal.pone.0118247

Systematics Biodiv. doi:10.1080/14772000.2013.806371

Systematic Biology. doi:10.1093/sysbio/syw023

Biodiversity Data Journal. doi:10.3897/BDJ.5.e10469

Research Ideas and Outcomes. doi:10.3897/rio.2.e10610

Region Connection Calculus (set constraints)

== < > >< !

• Two regions N, M are either:

• congruent (N == M)

• properly inclusive (N < M)

• inversely properly inclusive (N > M)

• overlapping (N >< M)

• exclusive of each other (N ! M)

• RCC–5 articulations answer the query: "can we join regions N and M?"

• Taxonomies have multiple RCC–5 alignable components: nodes (parents, children), node-associated traits, even node-anchoring specimens.

Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf
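The five RCC–5 relations listed above can be illustrated with plain sets. This is a minimal sketch, not the Euler/X implementation (Euler/X reasons over expert-provided articulations and does not require member lists). It assumes each region is given as a non-empty set of member identifiers; the function name `rcc5` and the example members are hypothetical.

```python
from typing import AbstractSet, Hashable


def rcc5(n: AbstractSet[Hashable], m: AbstractSet[Hashable]) -> str:
    """Return the RCC-5 symbol relating region n to region m.

    Assumption: regions are modeled as non-empty sets of members
    (e.g. specimens); this is an illustration only.
    """
    if not n or not m:
        raise ValueError("RCC-5 regions must be non-empty")
    if n == m:
        return "=="   # congruent
    if n < m:
        return "<"    # properly inclusive: n lies inside m
    if n > m:
        return ">"    # inversely properly inclusive: n contains m
    if n & m:
        return "><"   # overlapping: shared and unshared members
    return "!"        # exclusive of each other


# Hypothetical member sets, for illustration only.
earlier = {"s1", "s2", "s3"}
later = {"s1", "s2"}
relation = rcc5(earlier, later)
```

Exactly one of the five symbols holds for any pair of non-empty regions, which is what makes the relations usable as articulations in an alignment.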

Use cases – primate classifications & avian phylogenies

1. Primate classifications sec. MSW2 (1993) versus MSW3 (2005)

a. Microcebus + Mirza sec. MSW3 (2005) with coverage constraint

b. Quantifying name (identifier) reliability

c. Reasoning achieves scalability (matrix)

2. Avian phylogenies sec. Prum et al. (2015) versus Jarvis et al. (2014)

a. Psittaciformes with & without coverage

b. Alignment of the "Neoavian explosion"

Use case 1:

Two primate classifications –

MSW2 (1993) versus MSW3 (2005)

Starts with a live demo.

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

"Taxonomic concept labels" identify input concept regions

RCC–5 articulations provided for each species-level concept

• Input visualization: MSW3 (2005) versus MSW2 (1993)

• Alignment visualization: "grey means taxonomically congruent"

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

• Application of coverage constraint: parent-to-parent articulations (><) are fully defined by alignment signal propagated from their respective children. Sensible when complete sampling of children is intended.

Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023

Use case 1.b.: Quantifying name (identifier) reliability

• Alignment visualization: RCC–5 as an identifier assessment tool [good / not]

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

• Query services rendered: (1) MSW3 destabilizes MSW2; (2) non-congruence is not only caused by differential low-level sampling; (3) alignment constitutes a taxonomic meaning integration map to navigate across MSW3 & MSW2.

1 in 3 names is unreliable across MSW2/MSW3 classifications

Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
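One possible reading of the legend above is that each articulated pair of concepts can be scored by two facts: whether both concepts carry the same name, and whether their RCC–5 relation is congruent. The sketch below tallies pairs that way. It is an illustration under stated assumptions, not the procedure used in the cited paper: the function `assess`, the category strings, and the placeholder names ("Aus bus" etc.) are hypothetical, and the fifth legend category ("new names & exclusive regions") needs knowledge of which names are new, which this pairwise sketch does not model.

```python
from collections import Counter

NON_CONGRUENT = {"<", ">", "><"}


def assess(name_a: str, relation: str, name_b: str) -> str:
    """Classify one articulation by name identity and RCC-5 congruence."""
    same_name = name_a == name_b
    if relation == "==":
        if same_name:
            return "one name & congruent region"
        return "many names & congruent region"
    if relation in NON_CONGRUENT:
        if same_name:
            return "one name & non-congruent regions"
        return "many names & non-congruent regions"
    if relation == "!":
        # Pairwise data alone cannot tell whether a name is "new".
        return "exclusive regions"
    raise ValueError(f"not an RCC-5 symbol: {relation!r}")


# Placeholder articulations (later name, relation, earlier name).
articulations = [
    ("Aus bus", "==", "Aus bus"),
    ("Aus cus", "==", "Aus dus"),
    ("Aus eus", ">", "Aus eus"),
    ("Aus fus", "><", "Aus gus"),
]
tally = Counter(assess(a, rel, b) for a, rel, b in articulations)
```

Under this reading, only the "one name & congruent region" cases count as reliable identifiers; every other category signals that the name alone does not carry a stable meaning across the two classifications.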

Use case 1.c.: Reasoning achieves scalability (MIR matrix)

Source: Dang et al. 2015. ProvenanceMatrix: a visualization tool for multi-taxonomy alignments. CEUR Workshop Proceedings 1456: 13–24. http://ceur-ws.org/Vol-1456/paper2.pdf

• Input: 402 articulations. Output: 153,111 Maximally Informative Relations

Salmon cells ↔ reasoning

Use case 2:

Avian phylogenies sec. Prum et al. (2015)

versus Jarvis et al. (2014)

Source: Thomas, G.H. 2015. An avian explosion. Nature 526: 516–517. doi:10.1038/nature15638

2015 2014

Phylogenetic inferences can vary over time.

Use case 2: Aves sec. Prum et al. (2015) versus Jarvis et al. (2014)

• Sampling is highly differential: 198 versus 48 species-level entities

• Only 12 species-level concept pairs are congruent [green cells]

Use case 2.a.: Psittaciformes with & without coverage constraint

• Psittaciformes sec. 2015 – with global coverage constraint

• No low-level congruence ↔ no congruent alignment regions

Input visualization: Only disjoint articulations

Alignment visualization: 108 MIR; all disjoint

• Psittaciformes sec. 2015 – with coverage locally relaxed

• "No coverage" constraint for 2014/2015. [Psittacidae, Nestor]

• Allows for 3 congruent & 7 inclusive RCC–5 articulations

Input visualization

• Higher-level congruence despite low-level non-congruence

• 160 MIR: 10 congruent; 65 (inversely) properly inclusive

Alignment visualization: Additional 2015 low-level sampling

Use case 2.b.: Alignment of the "Neoavian explosion"

• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed

Non-congruence within 2015.Paleognathae

Non-congruence within 2014.Pelecanimorphae

Non-congruence within 2015/2014.Neoaves (see next slide)

Use case 2.b.: Precise semiotics for the "avian explosion"

• Neoaves sec. 2015/2014, and 3–4 less inclusive levels

26 overlapping articulations in the sub-Neoavian alignment region cannot be assigned to differential sampling → 'genuine' phylogenetic conflict


Largely derived from doi:10.3897/rio.2.e10610

Thesis: Unitary hierarchies create mistrust in aggregated data

• Many aggregators are designed to impose a single taxonomic hierarchy – one at a time – onto all taxonomically annotated records.

• By design, these "backbones" are rarely attributable to individual (expert) authors, but instead are newly created systematic theories that only appear at the system level.

• Data are aggregated accordingly; yet backbone-driven modifications may newly disrupt the original integrity of submitted data packages.

• By deflecting on responsibilities, aggregators may cause additional self-harm. Ultimately, the power balance – as presently built in – must shift to bring experts back into the process of licensing succinct, trustworthy data packages.

Taxonomic views of a frequently revised organismal lineage

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

• 9 schemata for the NA Cleistes/Cleistesiopsis complex (orchids, "pogonias")

Snapshot of a more frequently revised organismal lineage

• 9 schemata for the NA Cleistes/Cleistesiopsis complex (orchids, "pogonias")

• Vertical sections identify taxonomic concept regions

• Colors identify lineages of taxonomic names (epithets) in use

• There is no consensus! Five incongruent schemata are used concurrently

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Further diagnosis:

If incongruent taxonomies are endorsed – locally, provisionally, and democratically – then what is the impact for aggregated biodiversity data?

Further diagnosis:

Taxonomy becomes a variable that we need to represent,

and thereby control for (at the system level)

Example: the Cleistes use case

"Controlling the taxonomic variable"

• Query: "Where do these orchid species occur?"

• Same set of 250 orchid specimens, according to 4 taxonomies.

The 'consensus'

The 'bible'

The (formerly) federal 'standard'

The 'best', latest regional flora

"Just bad"

Expert views are in conflict

Impact: Name-based aggregation has created a novel synthesis that nobody believes in

Solution: Instead of aggregating an artificial 'consensus', build translation services

Expert views are reconciled

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Challenges:

How can we redesign aggregation to yield high-quality biodiversity data packages?

What does this mean for Darwin Core [1] and how we use this aggregation standard?

[1] Wieczorek et al. 2012. Darwin Core: an evolving […]. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715

Preview of solution with eight steps

• DwC is insufficient, and part of the problem

# 5: Identify occurrence records only to TCLs

Records: EKY39235 MTSU003611 NCSC00040204 …

Records: BOON8098 CLEMS0061133 WILLI39399 …

Records: GMUF-0039355 IBE006808 USCH58399 …

Records: CONV0006268 MDKY00006482 NCU00038930 …

Records: BRYV0023582, BRYV0023584 KHD00032030, MISS0016604 MMNS000227, NCSC00040206 USMS_000002923, USMS_000002924 VSC0053223, VSC0065528 …

Records: ARIZ393087 DBG39049 USCH51217 …

Records: NCU00040710 USCH96248 VSC0053218 …

Records: CLEMS0012881 FUGR0003293 GA023130 …

Records: BOON8100 NCSC00040210 SJNM45487 …

Records: GA023144 LSU00012494 MISS0016608 …

Records: IBE006810, IND-0012374, MMNS000227

Records: NY8654

• Syntax (ID): Occurrence / organism is identified to TCL

"CLEMS0012881" is identified to Cleistes divaricata sec. Smith et al. 2004

[additional ID metadata]
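The identification syntax above pairs an occurrence record with a taxonomic concept label (TCL) of the form "name sec. source". A minimal sketch of how such an identification might be represented follows. The class names, field names, and the `parse` helper are hypothetical and are not part of Darwin Core or Euler/X; the record ID and label are the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonomicConceptLabel:
    """A name as used according to ("sec.") one source."""
    name: str
    source: str

    @classmethod
    def parse(cls, label: str) -> "TaxonomicConceptLabel":
        # Split on the first " sec. " separator only.
        name, sep, source = label.partition(" sec. ")
        if not sep or not name.strip() or not source.strip():
            raise ValueError(f"not a 'name sec. source' label: {label!r}")
        return cls(name.strip(), source.strip())

    def __str__(self) -> str:
        return f"{self.name} sec. {self.source}"


@dataclass(frozen=True)
class Identification:
    """An occurrence record identified to a TCL, not to a bare name."""
    record_id: str
    concept: TaxonomicConceptLabel


tcl = TaxonomicConceptLabel.parse("Cleistes divaricata sec. Smith et al. 2004")
ident = Identification("CLEMS0012881", tcl)
```

Keeping the source inside the identifier is the design point: two records that share the name Cleistes divaricata but cite different sources remain distinguishable until an alignment says their concepts are congruent.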

# 6: Generate comprehensive, consistent RCC–5 alignments

• Euler/X is a toolkit that infers logically consistent RCC–5 alignments

• Value added: MIR – set of Maximally Informative Relations containing the RCC–5 articulation for every possible TCL pair → scalability

Reasoner inference

# 7: Joining occurrence-to-TCL identifications & RCC–5 alignments

Records: BOON8098, CLEMS0061133, CONV0006268, EKY39235 GMUF-0039355, IBE006808, IBE006810, IND-0012374 MDKY00006482, MMNS000227, MTSU003611, NCSC00040204 NCU00038930, NY8654, USCH58399, WILLI39399 …

Records: ARIZ393087, BRYV0023582, BRYV0023584, DBG39049 KHD00032030, MISS0016604, MMNS00022, NCSC00040206 USMS_000002923, USMS_000002924, VSC0053223, VSC0065528 …

Records: BOON8100, CLEMS0012881, FUGR0003293 GA023130, GA023144, LSU00012494 MISS0016608, NCSC00040210, NCU00040710 SJNM45487, USCH96248, VSC0053218 …

• Specimen integration is fully driven by TCL-to-TCL RCC–5 signals

"Controlling the taxonomic variable"

The 'consensus'

The 'bible'

The (formerly) federal 'standard'

The 'best', latest regional flora

Impact: "Please select your preference (A – D); we can perform all translations"

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

# 8: "Do you trust us now?" Aggregation as a translational service

• We can now respond to queries such as:

• "Show all specimens identified to the taxonomic name Cleistes divaricata"

• Returns many records; resolves the incongruent lineage of name usages

• "Now show specimens with the TCL Cleistesiopsis divaricata sec. Weakley 2015"

• Returns record subset resolving only one narrowly circumscribed concept

• "Now show specimens identified to the TCL Cleistes divaricata sec. RAB 1968, yet translated into the more granular TCLs sec. Weakley 2015"

• Returns (again) many records, yet represents and contrasts two treatments, as opposed to providing the ambiguous lineage view (above)

• "Show all specimens with ambiguous 2010/2015 TCL identifications…" (etc.)
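The three query types above can be sketched as a join between occurrence-to-TCL identifications and a table of TCL-to-TCL RCC–5 relations. This is a toy illustration under stated assumptions, not the service described in the talk. All record IDs, placeholder names ("Aus bus"), sources ("Alpha 1968", "Beta 2015"), and function names are hypothetical; the relation table is assumed to hold an entry for every TCL pair (as a MIR set would); and target-source concepts are assumed to be mutually exclusive.

```python
from collections import defaultdict

INVERSE = {"==": "==", "<": ">", ">": "<", "><": "><", "!": "!"}

# Hypothetical occurrence-to-TCL identifications.
identifications = {
    "REC-1": "Aus bus sec. Alpha 1968",
    "REC-2": "Aus bus sec. Beta 2015",
    "REC-3": "Aus cus sec. Beta 2015",
    "REC-4": "Aus bus sec. Alpha 1968",
}

# Hypothetical alignment: one RCC-5 relation per TCL pair.
alignment = {
    ("Aus bus sec. Alpha 1968", "Aus bus sec. Beta 2015"): ">",
    ("Aus bus sec. Alpha 1968", "Aus cus sec. Beta 2015"): ">",
    ("Aus bus sec. Beta 2015", "Aus cus sec. Beta 2015"): "!",
}


def name_of(tcl):
    return tcl.split(" sec. ")[0]


def source_of(tcl):
    return tcl.split(" sec. ")[1]


def relation(a, b):
    """RCC-5 relation from a to b; raises KeyError if the pair is missing."""
    if a == b:
        return "=="
    if (a, b) in alignment:
        return alignment[(a, b)]
    return INVERSE[alignment[(b, a)]]


def records_by_name(name):
    """Name-based query: mixes every usage of the name."""
    return sorted(r for r, t in identifications.items() if name_of(t) == name)


def records_by_tcl(tcl):
    """Concept-based query: one circumscription only."""
    return sorted(r for r, t in identifications.items() if t == tcl)


def translate(tcl, target_source):
    """Group records within `tcl` by the target-source TCL they fit."""
    targets = sorted({t for pair in alignment for t in pair
                      if source_of(t) == target_source})
    groups = defaultdict(list)
    for rec, ident in sorted(identifications.items()):
        if relation(ident, tcl) not in {"==", "<"}:
            continue  # record is not known to fall within `tcl`
        hits = [t for t in targets if relation(ident, t) in {"==", "<"}]
        groups[hits[0] if hits else "ambiguous"].append(rec)
    return dict(groups)


by_name = records_by_name("Aus bus")
by_tcl = records_by_tcl("Aus bus sec. Beta 2015")
translated = translate("Aus bus sec. Alpha 1968", "Beta 2015")
```

In this toy data, records identified only to the broader 1968 concept end up in the "ambiguous" group, because the alignment alone cannot say which narrower 2015 concept they belong to; that is the kind of record the last query on the slide asks for.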


"As the ongoing efforts to integrate prokaryote phylogeny into universal phylogeny demonstrate, integration does not always mean greater inclusiveness of data, methods or explanation […].

Integration may involve considerable exclusiveness to achieve the desired integrative aim."

– Maureen O'Malley 2013: 559

Source: O'Malley. 2013. When integration fails. Stud. Hist. Philos. Biol. Biomed. Sci. doi:10.1016/j.shpsc.2012.10.003

Rethinking systematic synthesis as an agreed-upon conflict alignment

Acknowledgements & links to products and references

• CIRSS hosts: Bertram L. & Janet Eke!

• Euler/X & ETC teams (extended): Shawn Bowers, Mingmin Chen, Hong Cui, Parisa Kianmajd, James Macklin, Timothy McPhillips, Robert Morris, Thomas Rodenhausen, and Shizhuo Yu.

• ProvenanceMatrix: Tuan Nhon Dang.

• NSF DEB–1155984, DBI–1342595 (PI Franz).

• NSF IIS–118088, DBI–1147273 (PI Ludäscher).

• Information @ http://taxonbytes.org/tag/concept-taxonomy/

• Euler/X code @ https://github.com/EulerProject/EulerX

Interested in exploring multi-taxonomy & -phylogeny alignments? Please contact us.

nico.franz@asu.edu

@taxonbytes

https://biokic.asu.edu/
