franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Non-unitary syntheses of systematic knowledge. Please @taxonbytes. Nico Franz, School of Life Sciences, Arizona State University. CIRSS Seminar – Center for Informatics Research in Science and Scholarship. February 17, 2017, iSchool, University of Illinois Urbana-Champaign. @ http://www.slideshare.net/taxonbytes/franz-2017-uiuc-cirss-non-unitary-syntheses-of-systematic-knowledge


Page 1: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Non-unitary syntheses of systematic knowledge

Please

@taxonbytes

Nico Franz

School of Life Sciences, Arizona State University

CIRSS Seminar – Center for Informatics Research in Science and Scholarship

February 17, 2017 – iSchool, University of Illinois Urbana-Champaign

@ http://www.slideshare.net/taxonbytes/franz-2017-uiuc-cirss-non-unitary-syntheses-of-systematic-knowledge

Page 2: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Why are phylogenies and classifications (so) unstable?

• How (well) can taxonomic names and relationships – the "Linnaean system" – manage the taxonomic similarities and differences across versions?

• Introducing the Euler/X alignment tool

• The primate use case (classifications)

• The avian use case (phylogenies)

• Biodiversity data aggregation

• Implications of achieving synthesis (CSCW..)

Overview

Page 4: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

doi:10.1038/nature.2016.20567

Page 5: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge
Page 6: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

"Here, we use new genomic data from over 1,000 uncultivated and little known organisms, together with published sequences, to infer a dramatically expanded version of the tree of life, with Bacteria, Archaea and Eukarya included."

doi:10.1038/nmicrobiol.2016.48

Page 7: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The pluralistic domain of human taxonomy making

Source: Rylands & Mittermeier. 2014. Primate taxonomy: species and conservation. doi:10.1002/evan.21387

"100 years of primate taxonomies"

Page 8: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The pluralistic domain of human taxonomy making

• Taxonomies are endorsed by us (humans); more or less democratically.

• They consist of sets of labels, data, and theories about the natural world.

Source: Rylands & Mittermeier. 2014. Primate taxonomy: species and conservation. doi:10.1002/evan.21387

"100 years of primate taxonomies"

Page 9: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The pluralistic domain of human taxonomy making

• Taxonomies are endorsed by us (humans); more or less democratically.

• They consist of sets of labels, data, and theories about the natural world.

• Over time, these theories change – converge or conflict (often in parallel).

Source: Rylands & Mittermeier. 2014. Primate taxonomy: species and conservation. doi:10.1002/evan.21387

"100 years of primate taxonomies"

Page 10: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

A model to separate the human-made versus natural domains

• While human taxonomy making unfolds (e.g. 1758 onwards), natural taxa – which 'took' millions of years to realize – tend to not change much.

Domain of human taxonomy making ("mimic")

Page 11: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• While human taxonomy making unfolds (e.g. 1758 onwards), natural taxa – which 'took' millions of years to realize – tend to not change much.

Natural domain ("model")

A model to separate the human-made versus natural domains

Domain of human taxonomy making ("mimic")

Page 12: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• While human taxonomy making unfolds (e.g. 1758 onwards), natural taxa – which 'took' millions of years to realize – tend to not change much.

• At any time, our labels and theories (concepts) aim to stand for taxa; yet the correspondence may be approximate.

Reliable?

Reliable?

Reliable?

A model to separate the human-made versus natural domains

Natural domain ("model")

Domain of human taxonomy making ("mimic")

Page 13: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Remsen: Using names, we're lucky when revisions are infrequent

"In biology, there are many taxa that are so under-studied that they are only known from their original description and none or very few subsequent references […]. The name alone, so long as it is a unique name, is sufficient to locate all related material."

– David Remsen 2016: 213

Source: Remsen. 2016. The use and limits of scientific names in biological informatics. doi:10.3897/zookeys.550.9546

Page 14: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Why are phylogenies and classifications (so) unstable?

• How (well) can taxonomic names and relationships – the "Linnaean system" – manage the taxonomic similarities and differences across versions?

• Introducing the Euler/X alignment tool

• The primate use case (classifications)

• The avian use case (phylogenies)

• Biodiversity data aggregation

• Implications of achieving synthesis (CSCW..)

Overview

Page 16: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The challenge: names refer to non-type specimens contingently

Source: Dubois. 2005. Zoosystema 27: 365-426. http://sciencepress.mnhn.fr/sites/default/files/articles/pdf/z2005n2a8.pdf

Names

Non-types

Page 17: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

doi:10.1017/S1477200003001063

Page 18: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

4: Amauris (Amaura) (damocles) hyalites makuyuensis Carcasson (1964) sec. Vane-Wright (2003)

[Name-part labels: genus, subgenus, superspecies, semispecies, subspecies]

Page 19: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Oscillating meanings of the species epithet hyalites – 1911 to 2003

[Figure; axis labels: "Phenotypic diversity", "Type-anchored name identity relations"; annotation: Narrowest holotype "region"]

Page 20: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Connecting to the occurrence level.

Page 21: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Holdings, May 2015

• 27 herbarium collections
• 607,300 occurrences
• 17,300 species-level units

sernecportal.org

Introducing the SERNEC portal (sustained by Symbiota)

Page 22: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Andropogon glomeratus – Bushy bluestem

Photo by Max Licher (ASU Herbarium); Cottonwood, Arizona. http://swbiodiversity.org/seinet/imagelib/imgdetails.php?imgid=431755

Ok. SERNEC search!

Page 23: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Search for "Andropogon glomeratus" returns 255 occurrences¹

Source herbaria: 9
Year collected: 1885-2013
Year identified: 1973-2010
Identifier named: 161 occ.

1 SERNEC portal, May 15, 2015; with synonyms, raw taxonomy.

Page 24: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Isn't that one similar to virginicus?

Page 25: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Source herbaria: 13
Year collected: 1873-2013
Year identified: 1973-2015
Identifier named: 200 occ.

Search for "Andropogon virginicus" returns 442 occurrences¹

1 SERNEC portal, May 15, 2015; with synonyms, raw taxonomy.

Page 26: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

What about the nominal subspecies?

Page 27: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Source herbaria: 6
Year collected: 1920-2013
Year identified: 2003
Identifier named: 66 occ.

Search for "A. virginicus var. virginicus" returns 101 occurrences¹

1 SERNEC portal, May 15, 2015; with synonyms, raw taxonomy.

Page 28: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

I believe some Floras recognize capillipes.

Page 29: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Source herbaria: 5
Year collected: 1940-2006
Year identified: 1986
Identifier named: 1 occ.

Search for "Andropogon capillipes" returns 72 occurrences¹

1 SERNEC portal, May 15, 2015; with synonyms, raw taxonomy.

Page 30: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Show four-in-one occurrence-based maps.

Page 31: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Combined four-in-one search returns 769 occurrences¹

Source herbaria: 13
Year collected: 1873-2013
Year identified: 1973-2015
Identifier named: 407

1 SERNEC portal, May 15, 2015; with synonyms, raw taxonomy.
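
The four separate searches sum to 255 + 442 + 101 + 72 = 870, while the combined search returns 769. One reading, not stated on the slides, is that the 101 var. virginicus records are already counted among the 442 virginicus records, since 255 + 442 + 72 = 769. A minimal sketch of that reading, assuming each name search returns a set of record ids (all ids below are made up; only the counts come from the slides):

```python
# Hypothetical occurrence ids; only the set sizes are taken from the slides.
glomeratus = set(range(0, 255))            # 255 records
virginicus = set(range(1000, 1442))        # 442 records
var_virginicus = set(range(1000, 1101))    # 101 records, ASSUMED nested in virginicus
capillipes = set(range(2000, 2072))        # 72 records

searches = [glomeratus, virginicus, var_virginicus, capillipes]

# Adding the four result counts double-counts the nested records.
naive_total = sum(len(s) for s in searches)

# A combined search behaves like a set union over record ids.
combined = set().union(*searches)

print(naive_total, len(combined))
```

Under this assumption the name-based totals depend on how the names nest, which is exactly the kind of information a name string alone does not carry.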

Page 32: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Ready to do science?

Maybe. There are some issues.

Page 33: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Taxonomic concept alignment, Andropogon glomeratus-virginicus complex, spanning across 11 classifications authored 1889-2015

• 36 unique taxonomic names

• 88 taxonomic concept labels (name sec. author strings)

• Alignment by A.S. Weakley; row position = congruence

• 1/36 names with unique 1 : 1 name : meaning cardinality across all classifications

• Andropogon virginicus

• Source: Franz et al. 2016¹

1 Franz et al. 2016. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web Journal (IOS). doi:10.3233/SW-160220
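
The "name sec. author" labels and the 1 : 1 name : meaning test can be sketched as follows. This is an illustration under assumptions, not the method of the cited paper: the names, sources, and region ids are invented, and a shared region id stands in for "same row position = congruent".

```python
from collections import defaultdict

# Hypothetical (name, source, region) triples. A shared region id means the
# concepts are treated as congruent (the same "row" in the alignment).
concepts = [
    ("Genus alpha", "Author A 1900", "r1"),
    ("Genus alpha", "Author B 2000", "r1"),
    ("Genus beta", "Author A 1900", "r2"),
    ("Genus beta", "Author B 2000", "r3"),   # same name, different meaning
    ("Genus gamma", "Author B 2000", "r2"),  # different name, same meaning
]


def concept_label(name, source):
    """Build a taxonomic concept label of the form 'name sec. source'."""
    return f"{name} sec. {source}"


def one_to_one_names(concepts):
    """Names that map to exactly one region which carries no other name."""
    regions_of = defaultdict(set)
    names_of = defaultdict(set)
    for name, _source, region in concepts:
        regions_of[name].add(region)
        names_of[region].add(name)
    return sorted(
        name
        for name, regions in regions_of.items()
        if len(regions) == 1 and len(names_of[next(iter(regions))]) == 1
    )


print([concept_label(n, s) for n, s, _ in concepts])
print(one_to_one_names(concepts))
```

In this toy input only "Genus alpha" keeps one meaning under one name across both sources.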

Page 34: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Also: This is how we built this. (provenance tracking)

Page 35: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

"When I first came here, this was all swamp. Everyone said I was daft to build a castle on a swamp, but I built it all the same, just to show them."

Page 36: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

"It sank into the swamp."

Page 37: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

"So I built a second one."

Page 38: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

"That sank into the swamp."

Page 39: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

"So I built a third."

Page 40: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

"That burned down, fell over, then sank into the swamp."

Page 41: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

"But the fourth one stayed up. And that's what you're going to get, Lad, the strongest castle in all of England."

Page 42: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Why are phylogenies and classifications (so) unstable?

• How (well) can taxonomic names and relationships – the "Linnaean system" – manage the taxonomic similarities and differences across versions?

• Introducing the Euler/X alignment tool

• The primate use case (classifications)

• The avian use case (phylogenies)

• Biodiversity data aggregation

• Implications of achieving synthesis (CSCW..)

Overview

Page 43: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Why Euler/X?

Page 44: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Concepts: tracking progress and conflict in the human domain

• Taxonomic names and nomenclatural relationships are only so-so in terms of tracking congruent and incongruent taxonomic perspectives.

Page 45: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Taxonomic names and nomenclatural relationships are only so-so in terms of tracking congruent and incongruent taxonomic perspectives.

• Logic-based multi-taxonomic alignments require better contextualization of labels and relationships, and better specification of "taxonomic sameness".

[Figure: 1912 vs. 1967. Logically reconcilable? Δ = ?]

Concepts: tracking progress and conflict in the human domain

Page 46: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Querying systematic advancement – premises & questions

• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).

Page 47: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).

• Therefore it would be useful to have a "systematic knowledge advancement service". The service satisfies queries such as:

Querying systematic advancement – premises & questions

Page 48: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).

• Therefore it would be useful to have a "systematic knowledge advancement service". The service satisfies queries such as:

1. "Does this sequence of related systematic inferences have a stabilizing or destabilizing trend?"

Querying systematic advancement – premises & questions

Page 49: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).

• Therefore it would be useful to have a "systematic knowledge advancement service". The service satisfies queries such as:

1. "Does this sequence of related systematic inferences have a stabilizing or destabilizing trend?"

2. "Are two or more tree hierarchies – each differentially sub-sampled at lower levels – in congruence or in conflict?"

Querying systematic advancement – premises & questions

Page 50: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).

• Therefore it would be useful to have a "systematic knowledge advancement service". The service satisfies queries such as:

1. "Does this sequence of related systematic inferences have a stabilizing or destabilizing trend?"

2. "Are two or more tree hierarchies – each differentially sub-sampled at lower levels – in congruence or in conflict?"

3. "How can an applied comparative study tied to one (earlier) hierarchy be "updated" (integrated) with another (later) hierarchy?"

Querying systematic advancement – premises & questions

Page 51: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).

• Therefore it would be useful to have a "systematic knowledge advancement service". The service satisfies queries such as:

1. "Does this sequence of related systematic inferences have a stabilizing or destabilizing trend?"

2. "Are two or more tree hierarchies – each differentially sub-sampled at lower levels – in congruence or in conflict?"

3. "How can an applied comparative study tied to one (earlier) hierarchy be "updated" (integrated) with another (later) hierarchy?"

Service: We can prioritize research agendas accordingly.

Service: Sampling an issue? Or are signals complementary?

Service: Effects of "systematic variable" on conclusions can be controlled for.

Querying systematic advancement – premises & questions

Page 52: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

An update on Euler/X:

Logic, use cases, and novel services

Page 53: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Euler/X – logically consistent RCC–5 alignments

• Input: multiple taxonomies and/or phylogenies; expert-provided articulations.

• Output: logic consistency checking; Maximally Informative Relations (MIR); alignment visualizations.

Page 54: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Products – concept taxonomy in theory and in practice

ZooKeys. doi:10.3897/zookeys.528.6001

Semantic Web. doi:10.3233/SW-160220

Biological Theory. doi:10.1007/s13752-017-0259-5

PloS ONE. doi:10.1371/journal.pone.0118247

Systematics Biodiv. doi:10.1080/14772000.2013.806371

Systematic Biology. doi:10.1093/sysbio/syw023

Biodiversity Data Journal. doi:10.3897/BDJ.5.e10469

Research Ideas and Outcomes. doi:10.3897/rio.2.e10610

Page 55: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf

Region Connection Calculus (set constraints)

Symbols: == < > >< !

• Two regions N, M are either:

  • congruent (N == M)
  • properly inclusive (N < M)
  • inversely properly inclusive (N > M)
  • overlapping (N >< M)
  • exclusive of each other (N ! M)

Page 56: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf

Region Connection Calculus (set constraints)

Symbols: == < > >< !

• Two regions N, M are either:

  • congruent (N == M)
  • properly inclusive (N < M)
  • inversely properly inclusive (N > M)
  • overlapping (N >< M)
  • exclusive of each other (N ! M)

• RCC–5 articulations answer the query: "can we join regions N and M?"

• Taxonomies have multiple RCC–5 alignable components: nodes (parents, children), node-associated traits, even node-anchoring specimens.
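
The five RCC–5 relations can be illustrated by treating each region as a finite, non-empty set of members. This is only a sketch: Euler/X reasons over logic constraints rather than enumerated members, and the function name `rcc5` and the example sets are assumptions made for illustration.

```python
def rcc5(n, m):
    """Return the RCC-5 symbol relating two non-empty sets n and m."""
    if n == m:
        return "=="   # congruent
    if n < m:
        return "<"    # n properly included in m
    if n > m:
        return ">"    # n properly includes m
    if n & m:
        return "><"   # partial overlap
    return "!"        # exclusive of each other


# Hypothetical regions, given as sets of member ids.
a = frozenset({1, 2, 3})
b = frozenset({1, 2})
c = frozenset({3, 4})
d = frozenset({5})

for left, right in [(a, a), (b, a), (a, b), (a, c), (a, d)]:
    print(sorted(left), rcc5(left, right), sorted(right))
```

Exactly one of the five symbols holds for any pair of non-empty sets, which is what makes an articulation between two concepts unambiguous.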

Page 57: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Use cases – primate classifications & avian phylogenies

1. Primate classifications sec. MSW2 (1993) versus MSW3 (2005)

a. Microcebus + Mirza sec. MSW3 (2005) with coverage constraint

b. Quantifying name (identifier) reliability

c. Reasoning achieves scalability (matrix)

2. Avian phylogenies sec. Prum et al. (2015) versus Jarvis et al. (2014)

a. Psittaciformes with & without coverage

b. Alignment of the "Neoavian explosion"

Page 58: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Use case 1:

Two primate classifications –

MSW2 (1993) versus MSW3 (2005)

Starts with a live demo.

Page 59: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

"Taxonomic concept labels" identify input concept regions

RCC–5 articulations provided for each species-level concept

• Input visualization: MSW3 (2005) versus MSW2 (1993)

Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023

Page 60: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Alignment visualization: "grey means taxonomically congruent"

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

Page 61: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

One name & congruent region

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

• Alignment visualization: "grey means taxonomically congruent"

Page 62: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

One name & congruent region

Many names & congruent region

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

• Alignment visualization: "grey means taxonomically congruent"

Page 63: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

• Alignment visualization: "grey means taxonomically congruent"

Page 64: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

• Alignment visualization: "grey means taxonomically congruent"

Page 65: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

• Alignment visualization: "grey means taxonomically congruent"

Page 66: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

• Application of coverage constraint: parent-to-parent articulations (><) are fully defined by alignment signal propagated from their respective children.

Sensible when complete sampling of children is intended.

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

• Alignment visualization: "grey means taxonomically congruent"
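
The coverage constraint can be sketched with sets: when coverage applies, a parent region is taken to be exactly the union of its children, so the parent-to-parent relation follows from the child-level signal. The helper names and member ids below are hypothetical, and sets stand in for the logic constraints that Euler/X actually uses.

```python
def rcc5(n, m):
    """Return the RCC-5 symbol relating two non-empty sets."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def parent_region(child_regions):
    """Under coverage, a parent is exactly the union of its children."""
    return frozenset().union(*child_regions)


# Hypothetical children of one parent concept in each of two taxonomies.
t1_children = [frozenset({"x1", "x2"}), frozenset({"x3"})]
t2_children = [frozenset({"x3"}), frozenset({"x4"})]

p1 = parent_region(t1_children)
p2 = parent_region(t2_children)

# The parents share x3 but each also has members the other lacks.
print(rcc5(p1, p2))
```

Without coverage a parent may hold members beyond its sampled children, so the same child-level input would no longer fix the parent-level relation.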

Page 67: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Use case 1.b.: Quantifying name (identifier) reliability

One name & congruent region

• Alignment visualization: RCC–5 as an identifier assessment tool [good / not]

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

Page 68: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

One name & congruent region

• Alignment visualization: RCC–5 as an identifier assessment tool [good / not]

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

• Query services rendered: (1) MSW3 destabilizes MSW2; (2) non-congruence is not only caused by differential low-level sampling; (3) alignment constitutes a taxonomic meaning integration map to navigate across MSW3 & MSW2.

Use case 1.b.: Quantifying name (identifier) reliability

Page 69: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

1 in 3 names is unreliable across MSW2/MSW3 classifications

Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023

Page 70: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Use case 1.c.: Reasoning achieves scalability (MIR matrix)

Source: Dang et al. 2015. ProvenanceMatrix: a visualization tool for multi-taxonomy alignments. CEUR Workshop Proceedings 1456: 13–24. http://ceur-ws.org/Vol-1456/paper2.pdf

• Input: 402 articulations. Output: 153,111 Maximally Informative Relations

Salmon cells↔ reasoning
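
A rough way to read the scale-up from 402 articulations to 153,111 MIR: if, as assumed here, the MIR hold one RCC–5 relation for every pair of concepts drawn from the two taxonomies, their number is the product of the two concept counts. The figure 153,111 does factor as 317 × 483; treating those factors as the concept counts of the two classifications is an assumption not stated on the slide.

```python
def mir_count(n_concepts_t1, n_concepts_t2):
    """Number of cross-taxonomy concept pairs (one relation per pair)."""
    return n_concepts_t1 * n_concepts_t2


articulations = 402   # expert-provided input, from the slide
mirs = 153_111        # reasoner output, from the slide

# ASSUMED concept counts; they are simply the factors of 153,111.
total = mir_count(317, 483)

# Share of pairwise relations that had to be asserted by an expert.
share = articulations / mirs

print(total, f"{share:.2%}")
```

On this reading, well under one percent of the pairwise relations are asserted by hand and the rest are inferred.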

Page 71: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Use case 2:

Avian phylogenies sec. Prum et al. (2015)

versus Jarvis et al. (2014)

Page 72: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Source: Thomas, G.H. 2015. An avian explosion. Nature 526: 516–517. doi:10.1038/nature15638

2015 2014

Phylogenetic inferences can vary over time.

Page 73: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Use case 2: Aves sec. Prum et al. (2015) versus Jarvis et al. (2014)

• Sampling is highly differential: 198 versus 48 species-level entities

• Only 12 species-level concept pairs are congruent [green cells]

Page 74: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Use case 2.a.: Psittaciformes with & without coverage constraint

• Psittaciformes sec. 2015 – with global coverage constraint

Input visualization: only disjoint articulations

Page 75: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Psittaciformes sec. 2015 – with global coverage constraint

• No low-level congruence ↔ no congruent alignment regions

Input visualization: only disjoint articulations

Alignment visualization: 108 MIR; all disjoint

Use case 2.a.: Psittaciformes with & without coverage constraint

Page 76: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Psittaciformes sec. 2015 – with coverage locally relaxed

Input visualization

Use case 2.a.: Psittaciformes with & without coverage constraint

Page 77: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Psittaciformes sec. 2015 – with coverage locally relaxed

• "No coverage" constraint for 2014/2015. [Psittacidae, Nestor]

Input visualization

Use case 2.a.: Psittaciformes with & without coverage constraint

Page 78: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Psittaciformes sec. 2015 – with coverage locally relaxed

• "No coverage" constraint for 2014/2015. [Psittacidae, Nestor]

• Allows for 3 congruent & 7 inclusive RCC–5 articulations

Input visualization

Use case 2.a.: Psittaciformes with & without coverage constraint

Page 79: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Psittaciformes sec. 2015 – with coverage locally relaxed

• Higher-level congruence despite low-level non-congruence

• 160 MIR: 10 congruent; 65 (inversely) properly inclusive

Alignment visualization

Use case 2.a.: Psittaciformes with & without coverage constraint

Page 80: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Psittaciformes sec. 2015 – with coverage locally relaxed

• Higher-level congruence despite low-level non-congruence

• 160 MIR: 10 congruent; 65 (inversely) properly inclusive

Alignment visualization

Additional 2015 low-level sampling

Use case 2.a.: Psittaciformes with & without coverage constraint

Page 81: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Use case 2.b.: Alignment of the "Neoavian explosion"

• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed

Page 82: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed

Non-congruence within 2015.Paleognathae

Non-congruence within 2014.Pelecanimorphae

Use case 2.b.: Alignment of the "Neoavian explosion"

Page 83: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed

Non-congruence within 2015/2014.Neoaves

(see next slide)

Use case 2.b.: Precise semiotics for the "avian explosion"

Page 84: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Neoaves sec. 2015/2014, and 3–4 less inclusive levels

26 overlapping articulations in the sub-Neoavian alignment region cannot be assigned to differential sampling

'Genuine' phylogenetic conflict

Use case 2.b.: Precise semiotics for the "avian explosion"

Page 85: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Why are phylogenies and classifications (so) unstable?

• How (well) can taxonomic names and relationships – the "Linnaean system" – manage the taxonomic similarities and differences across versions?

• Introducing the Euler/X alignment tool

• The primate use case (classifications)

• The avian use case (phylogenies)

• Biodiversity data aggregation

• Implications of achieving synthesis (CSCW..)

Overview

Page 86: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Largely derived from doi:10.3897/rio.2.e10610


Page 87: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Thesis: Unitary hierarchies create mistrust in aggregated data

• Many aggregators are designed to impose a single taxonomic hierarchy – one at a time – onto all taxonomically annotated records.

Page 88: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Many aggregators are designed to impose a single taxonomic hierarchy – one at a time – onto all taxonomically annotated records.

• By design, these "backbones" are rarely attributable to individual (expert) authors, but instead are newly created systematic theories that only appear at the system level.

Thesis: Unitary hierarchies create mistrust in aggregated data

Page 89: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Many aggregators are designed to impose a single taxonomic hierarchy – one at a time – onto all taxonomically annotated records.

• By design, these "backbones" are rarely attributable to individual (expert) authors, but instead are newly created systematic theories that only appear at the system level.

• Data are aggregated accordingly; yet backbone-driven modifications may newly disrupt the original integrity of submitted data packages.

Thesis: Unitary hierarchies create mistrust in aggregated data

Page 90: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Many aggregators are designed to impose a single taxonomic hierarchy – one at a time – onto all taxonomically annotated records.

• By design, these "backbones" are rarely attributable to individual (expert) authors, but instead are newly created systematic theories that only appear at the system level.

• Data are aggregated accordingly; yet backbone-driven modifications may newly disrupt the original integrity of submitted data packages.

• By deflecting on responsibilities, aggregators may cause additional self-harm. Ultimately, the power balance – as presently built in – must shift to bring experts back into the process of licensing succinct, trustworthy data packages.

Thesis: Unitary hierarchies create mistrust in aggregated data

Page 92: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Taxonomic views of a frequently revised organismal lineage

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

• 9 schemata for the NA Cleistes/Cleistesiopsis complex (orchids, "pogonias")

Page 93: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Snapshot of a more frequently revised organismal lineage

• 9 schemata for the NA Cleistes/Cleistesiopsis complex (orchids, "pogonias")

• Vertical sections identify taxonomic concept regions

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 94: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Snapshot of a more frequently revised organismal lineage

• 9 schemata for the NA Cleistes/Cleistesiopsis complex (orchids, "pogonias")

• Vertical sections identify taxonomic concept regions

• Colors identify lineages of taxonomic names (epithets) in use

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 95: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Snapshot of a more frequently revised organismal lineage

• 9 schemata for the NA Cleistes/Cleistesiopsis complex (orchids)

• Vertical sections identify taxonomic concept regions

• Colors identify lineages of taxonomic names (epithets) in use

• There is no consensus! Five incongruent schemata are used concurrently

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 96: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Further diagnosis:

If incongruent taxonomies are endorsed – locally, provisionally, and democratically – then what is the impact for aggregated biodiversity data?

Page 97: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Further diagnosis:

Taxonomy becomes a variable that we need to represent,

and thereby control for (at the system level)

Page 98: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The 'consensus'

• Query: "Where do these orchid species occur?"

• Same set of 250 orchid specimens, according to 4 taxonomies.

"Controlling the taxonomic variable"

Example: the Cleistes use case

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 99: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The 'consensus' The 'bible'

"Controlling the taxonomic variable"

• Query: "Where do these orchid species occur?"

• Same set of 250 orchid specimens, according to 4 taxonomies.

Example: the Cleistes use case

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 100: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The 'consensus' The 'bible'

The (formerly) federal 'standard'

"Controlling the taxonomic variable"

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 101: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The 'consensus' The 'bible'

The (formerly) federal 'standard'

The 'best', latest regional flora

"Controlling the taxonomic variable"

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 102: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The 'consensus' The 'bible'

The (formerly) federal 'standard'

The 'best', latest regional flora

"Controlling the taxonomic variable"

Expert views are in conflict

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 103: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The 'consensus' The 'bible'

The (formerly) federal 'standard'

The 'best', latest regional flora

"Controlling the taxonomic variable"

Expert views are in conflict

"Just bad"

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 104: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The 'consensus' The 'bible'

The (formerly) federal 'standard'

The 'best', latest regional flora

Impact: Name-based aggregation has created a novel synthesis that nobody believes in

"Controlling the taxonomic variable"

"Just bad"

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 105: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The 'consensus' The 'bible'

The (formerly) federal 'standard'

The 'best', latest regional flora

"Controlling the taxonomic variable"

"Just bad"

Expert views are in conflict

Solution: Instead of aggregating an artificial 'consensus', …

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 106: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The 'consensus' The 'bible'

The (formerly) federal 'standard'

The 'best', latest regional flora

"Controlling the taxonomic variable"

"Just bad"

Expert views are reconciled

Solution: Instead of aggregating an artificial 'consensus', build translation services

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 107: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Challenges:

How can we redesign aggregation to yield high-quality biodiversity data packages?

Page 108: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Challenges:

How can we redesign aggregation to yield high-quality biodiversity data packages?

What does this mean for Darwin Core¹ and how we use this aggregation standard?

¹ Wieczorek et al. 2012. Darwin Core: an evolving […]. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715

Page 109: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Preview of solution with eight steps

• DwC is insufficient, and part of the problem

Page 110: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

# 5: Identify occurrence records only to TCLs

Records: EKY39235 MTSU003611 NCSC00040204 …

Records: BOON8098 CLEMS0061133 WILLI39399 …

Records: GMUF-0039355 IBE006808 USCH58399 …

Records: CONV0006268 MDKY00006482 NCU00038930 …

Records: BRYV0023582, BRYV0023584 KHD00032030, MISS0016604 MMNS000227, NCSC00040206 USMS_000002923, USMS_000002924 VSC0053223, VSC0065528 …

Records: ARIZ393087 DBG39049 USCH51217 …

Records: NCU00040710 USCH96248 VSC0053218 …

Records: CLEMS0012881 FUGR0003293 GA023130 …

Records: BOON8100 NCSC00040210 SJNM45487 …

Records: GA023144 LSU00012494 MISS0016608 …

Records: IBE006810, IND-0012374, MMNS000227

Records: NY8654

• Syntax (ID): Occurrence / organism is identified to TCL

"CLEMS0012881" is identified to

Cleistes divaricata sec. Smith et al. 2004

[additional ID metadata]
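The identification syntax above can be sketched as a small data structure. This is only an illustration, not the schema used in the cited study: the class names `TCL` and `Identification` and their fields are assumptions introduced here. The one pairing shown is the example given on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TCL:
    """Taxonomic concept label: a name plus the source ("sec.") that fixes its meaning."""
    name: str
    sec: str

    def label(self) -> str:
        return f"{self.name} sec. {self.sec}"


@dataclass(frozen=True)
class Identification:
    """Links one occurrence record to exactly one TCL (hypothetical structure)."""
    occurrence_id: str
    tcl: TCL


# The example from the slide: a record identified to a concept, not just to a name.
ident = Identification("CLEMS0012881", TCL("Cleistes divaricata", "Smith et al. 2004"))
print(f'"{ident.occurrence_id}" is identified to {ident.tcl.label()}')
```

Keeping the source as a separate field means that two records carrying the same name but different sources remain distinguishable.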

Page 111: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

# 6: Generate comprehensive, consistent RCC–5 alignments

• Euler/X is a toolkit that infers logically consistent RCC–5 alignments

Page 112: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

# 6: Generate comprehensive, consistent RCC–5 alignments

• Value-added: MIR – set of Maximally Informative Relations containing

the RCC–5 articulation for every possible TCL pair (scalability)

Reasoner inference
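To make the idea of "one RCC–5 articulation for every TCL pair" concrete, here is a minimal sketch. It does not reproduce how Euler/X works: the toolkit infers alignments by logic reasoning over input articulations, whereas this sketch assumes the members of each concept are known and simply compares sets. The concept labels, the member sets, and the symbols used for the five relations are all hypothetical choices made for illustration.

```python
from itertools import product


def rcc5(a: frozenset, b: frozenset) -> str:
    """Return the RCC-5 relation between two non-empty member sets."""
    if a == b:
        return "=="   # congruence
    if a > b:
        return ">"    # a properly includes b
    if a < b:
        return "<"    # a is properly included in b
    if a & b:
        return "><"   # overlap
    return "!"        # exclusion


# Hypothetical concepts from two taxonomies T1 and T2, with invented member sets.
t1 = {
    "A sec. T1": frozenset({1, 2, 3, 4}),
    "B sec. T1": frozenset({5, 6}),
}
t2 = {
    "A sec. T2": frozenset({1, 2}),
    "C sec. T2": frozenset({3, 4, 5}),
    "B sec. T2": frozenset({6}),
}

# One relation for every cross-taxonomy pair of concept labels.
mir = {(x, y): rcc5(t1[x], t2[y]) for x, y in product(t1, t2)}

for (x, y), rel in mir.items():
    print(f"{x} {rel} {y}")
```

With two and three concepts respectively, the table holds six pairwise relations, one per pair.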

Page 113: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

# 7: Joining occurrence-to-TCL identifications & RCC–5 alignments

Records: BOON8098, CLEMS0061133, CONV0006268, EKY39235 GMUF-0039355, IBE006808, IBE006810, IND-0012374 MDKY00006482, MMNS000227, MTSU003611, NCSC00040204 NCU00038930, NY8654, USCH58399, WILLI39399 …

Records: ARIZ393087, BRYV0023582, BRYV0023584, DBG39049 KHD00032030, MISS0016604, MMNS00022, NCSC00040206 USMS_000002923, USMS_000002924, VSC0053223, VSC0065528 …

Records: BOON8100, CLEMS0012881, FUGR0003293 GA023130, GA023144, LSU00012494 MISS0016608, NCSC00040210, NCU00040710 SJNM45487, USCH96248, VSC0053218 …

• Specimen integration is fully driven by TCL-to-TCL RCC–5 signals
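The join of identifications with alignments might look like the sketch below. Everything in it is assumed for illustration: the record identifiers, the concept labels, the alignment table, and the rule for reading relations. The rule used here is that a record carries over with certainty when its source concept is congruent with ("==") or properly included in ("<") the target concept, and only possibly when the source concept properly includes (">") or overlaps ("><") the target.

```python
CERTAIN = {"==", "<"}
POSSIBLE = {">", "><"}

# Hypothetical record-to-TCL identifications, all made under taxonomy T1.
identified_to = {
    "rec-001": "A sec. T1",
    "rec-002": "A sec. T1",
    "rec-003": "B sec. T1",
}

# Hypothetical RCC-5 alignment from T1 concepts to T2 concepts.
alignment = {
    ("A sec. T1", "A sec. T2"): ">",
    ("A sec. T1", "C sec. T2"): "><",
    ("A sec. T1", "B sec. T2"): "!",
    ("B sec. T1", "A sec. T2"): "!",
    ("B sec. T1", "C sec. T2"): "!",
    ("B sec. T1", "B sec. T2"): "==",
}


def translate(record_id: str) -> dict:
    """Translate a record's T1 identification into certain and possible T2 concepts."""
    source = identified_to[record_id]
    result = {"certain": [], "possible": []}
    for (src, tgt), rel in alignment.items():
        if src != source:
            continue
        if rel in CERTAIN:
            result["certain"].append(tgt)
        elif rel in POSSIBLE:
            result["possible"].append(tgt)
    return result


print(translate("rec-001"))
print(translate("rec-003"))
```

Records that land only in the "possible" list are the ones that would need re-examination before they can be assigned to a single concept in the target taxonomy.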

Page 114: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

The 'consensus' The 'bible'

The (formerly) federal 'standard'

The 'best', latest regional flora

"Controlling the taxonomic variable"

Impact: "Please select your preference (A – D); we can perform all translations"

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 115: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• We can now respond to queries such as:

• "Show all specimens identified to the taxonomic name Cleistes divaricata"

• Returns many records; resolves the incongruent lineage of name usages

# 8: "Do you trust us now?" Aggregation as a translational service

Page 116: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• We can now respond to queries such as:

• "Show all specimens identified to the taxonomic name Cleistes divaricata"

• Returns many records; resolves the incongruent lineage of name usages

• "Now show specimens with the TCL Cleistesiopsis divaricata sec. Weakley 2015"

• Returns record subset resolving only one narrowly circumscribed concept

# 8: "Do you trust us now?" Aggregation as a translational service

Page 117: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

# 8: "Do you trust us now?" Aggregation as a translational service

• We can now respond to queries such as:

• "Show all specimens identified to the taxonomic name Cleistes divaricata"

• Returns many records; resolves the incongruent lineage of name usages

• "Now show specimens with the TCL Cleistesiopsis divaricata sec. Weakley 2015"

• Returns record subset resolving only one narrowly circumscribed concept

• "Now show specimens identified to the TCL Cleistes divaricata sec. RAB 1968,

yet translated into the more granular TCLs sec. Weakley 2015"

• Returns (again) many records, yet represents and contrasts two treatments,

as opposed to providing the ambiguous lineage view (above)

• "Show all specimens with ambiguous 2010/2015 TCL identifications…" (etc.)
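The contrast between the first two queries can be sketched as follows. The record identifiers and their identifications are invented for illustration; only the names and sources are taken from the slides. The function names `by_name` and `by_tcl` are hypothetical.

```python
# Hypothetical identifications: (record id, taxonomic name, source of the concept).
identifications = [
    ("rec-001", "Cleistes divaricata", "RAB 1968"),
    ("rec-002", "Cleistes divaricata", "Smith et al. 2004"),
    ("rec-003", "Cleistesiopsis divaricata", "Weakley 2015"),
    ("rec-004", "Cleistes divaricata", "RAB 1968"),
]


def by_name(name: str) -> list:
    """Name-based query: ignores which source defines the name's meaning."""
    return sorted(r for r, n, _ in identifications if n == name)


def by_tcl(name: str, sec: str) -> list:
    """Concept-based query: matches name and source together."""
    return sorted(r for r, n, s in identifications if n == name and s == sec)


print(by_name("Cleistes divaricata"))
print(by_tcl("Cleistesiopsis divaricata", "Weakley 2015"))
print(by_tcl("Cleistes divaricata", "RAB 1968"))
```

The name-based query mixes records identified under different sources, while the concept-based query returns only records tied to one circumscription.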

Page 118: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

• Why are phylogenies and classifications (so) unstable?

• How (well) can taxonomic names and relationships – the "Linnaean system" – manage the taxonomic similarities and differences across versions?

• Introducing the Euler/X alignment tool

• The primate use case (classifications)

• The avian use case (phylogenies)

• Biodiversity data aggregation

• Implications of achieving synthesis (CSCW..)

Overview

Page 119: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

"As the ongoing efforts to integrate prokaryote phylogeny into universal phylogeny demonstrate, integration does not always mean greater inclusiveness of data, methods or explanation […].

Integration may involve considerable exclusiveness to achieve the desired integrative aim."

– Maureen O'Malley 2013: 559

Source: O'Malley. 2013. When integration fails. Stud. Hist. Philos. Biol. Biomed. Sci. doi:10.1016/j.shpsc.2012.10.003

Rethinking systematic synthesis as an agreed-upon conflict alignment

Page 120: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Acknowledgements & links to products and references

• CIRSS hosts: Bertram L. & Janet Eke!

• Euler/X & ETC teams (extended): Shawn Bowers, Mingmin Chen, Hong Cui, Parisa Kianmajd, James Macklin, Timothy McPhillips, Robert Morris, Thomas Rodenhausen, and Shizhuo Yu.

• ProvenanceMatrix: Tuan Nhon Dang.

• NSF DEB–1155984, DBI–1342595 (PI Franz).

• NSF IIS–118088, DBI–1147273 (PI Ludäscher).

• Information @ http://taxonbytes.org/tag/concept-taxonomy/

• Euler/X code @ https://github.com/EulerProject/EulerX

Page 121: Franz 2017 uiuc cirss non unitary syntheses of systematic knowledge

Interested in exploring multi-taxonomy & -phylogeny alignments? Please contact us.

[email protected]

@taxonbytes

https://biokic.asu.edu/