franz et al evol 2016 representing phylogeny as a logically tractable variable

Representing phylogeny as a logically tractable variable. Please @taxonbytes. Nico M. Franz¹, Guanyang Zhang¹, Shizhuo Yu² & Bertram Ludäscher³. ¹ School of Life Sciences, Arizona State University; ² Computer Science, University of California at Davis; ³ iSchool, University of Illinois at Urbana-Champaign. Systematics / Bioinformatics Session, Evolution 2016 Meetings, June 20, 2016 - Austin, Texas (#Evol2016) @ http://www.slideshare.net/taxonbytes/franz-et-al-evol-2016-representing-phylogeny-as-a-logically-tractable-variable

Upload: taxonbytes

Post on 09-Jan-2017


TRANSCRIPT

Page 1: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Representing phylogeny as a logically tractable variable

Please

@taxonbytes

Nico M. Franz¹, Guanyang Zhang¹, Shizhuo Yu² & Bertram Ludäscher³

¹ School of Life Sciences, Arizona State University
² Computer Science, University of California at Davis
³ iSchool, University of Illinois at Urbana-Champaign

Systematics / Bioinformatics Session – Evolution 2016 Meetings
June 20, 2016 - Austin, Texas (#Evol2016)

@ http://www.slideshare.net/taxonbytes/franz-et-al-evol-2016-representing-phylogeny-as-a-logically-tractable-variable

Page 2: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Source: Thomas, G.H. 2015. An avian explosion. Nature 526: 516–517. doi:10.1038/nature15638

[Figure: avian phylogenies, 2015 versus 2014]

Phylogenetic inferences can vary over time.

Page 3: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Querying phylogenetic advancement – premises & questions

• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).

Page 4: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).

• Therefore it would be useful to have a "phylogenetic knowledge advancement service". The service satisfies queries such as:

Querying phylogenetic advancement – premises & questions

Page 5: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).

• Therefore it would be useful to have a "phylogenetic knowledge advancement service". The service satisfies queries such as:

1. "Does this sequence of related phylogenetic inferences have a stabilizing or destabilizing trend?"

Querying phylogenetic advancement – premises & questions

Page 6: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).

• Therefore it would be useful to have a "phylogenetic knowledge advancement service". The service satisfies queries such as:

1. "Does this sequence of related phylogenetic inferences have a stabilizing or destabilizing trend?"

2. "Are two or more phylogenies – each differentially sub-sampled at lower levels – in congruence or in conflict?"

Querying phylogenetic advancement – premises & questions

Page 7: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).

• Therefore it would be useful to have a "phylogenetic knowledge advancement service". The service satisfies queries such as:

1. "Does this sequence of related phylogenetic inferences have a stabilizing or destabilizing trend?"

2. "Are two or more phylogenies – each differentially sub-sampled at lower levels – in congruence or in conflict?"

3. "How can an evolutionary study tied to one (earlier) phylogeny be 'updated' (integrated) with another (later) phylogeny?"

Querying phylogenetic advancement – premises & questions

Page 8: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).

• Therefore it would be useful to have a "phylogenetic knowledge advancement service". The service satisfies queries such as:

1. "Does this sequence of related phylogenetic inferences have a stabilizing or destabilizing trend?"

2. "Are two or more phylogenies – each differentially sub-sampled at lower levels – in congruence or in conflict?"

3. "How can an evolutionary study tied to one (earlier) phylogeny be 'updated' (integrated) with another (later) phylogeny?"

Querying phylogenetic advancement – premises & questions

Service (query 1): We can prioritize research agendas accordingly.

Service (query 2): Is sampling an issue? Or are the signals complementary?

Service (query 3): Effects of the "phylogenetic variable" on conclusions can be controlled for.

Page 9: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Research question:

How to build such a service?

Page 10: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Phylogenetic advancement service – design specifications

Example service queries:

1. Do succeeding inferences have stabilizing or destabilizing trends?

2. Are multiple, differentially sub-sampled phylogenies congruent or conflicting?

3. How to control for the "phylogenetic variable" affecting evolutionary inferences?

Page 11: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Phylogenetic advancement service – design specifications

Example service queries:

1. Do succeeding inferences have stabilizing or destabilizing trends?

2. Are multiple, differentially sub-sampled phylogenies congruent or conflicting?

3. How to control for the "phylogenetic variable" affecting evolutionary inferences?

• The service needs more granular identifiers than "just Linnaean names" to account for study-specific phylogenetic (sub-)tree concepts.

Page 12: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Phylogenetic advancement service – design specifications

Example service queries:

1. Do succeeding inferences have stabilizing or destabilizing trends?

2. Are multiple, differentially sub-sampled phylogenies congruent or conflicting?

3. How to control for the "phylogenetic variable" affecting evolutionary inferences?

• The service needs more granular identifiers than "just Linnaean names" to account for study-specific phylogenetic (sub-)tree concepts.

• The service needs identifier-to-identifier relationships that match the query semantics by representing congruence, conflict, and achieving reconciliation across concepts associated with multiple inferences.

Page 13: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Phylogenetic advancement service – design specifications

Example service queries:

1. Do succeeding inferences have stabilizing or destabilizing trends?

2. Are multiple, differentially sub-sampled phylogenies congruent or conflicting?

3. How to control for the "phylogenetic variable" affecting evolutionary inferences?

• The service needs more granular identifiers than "just Linnaean names" to account for study-specific phylogenetic (sub-)tree concepts.

• The service needs identifier-to-identifier relationships that match the query semantics by representing congruence, conflict, and achieving reconciliation across concepts associated with multiple inferences.

• Compatibility with both human cognitive constraints [1] and computational logic is desirable to balance service quality (allowing expert interactions) with scalability (semi-automated reasoning products).

[1] Atran, S. 1998. Folk biology and the anthropology of science: cognitive universals and cultural particulars. Behavioral and Brain Sciences 21: 547–569.
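The identifier requirement above can be sketched in code. This is a minimal Python illustration, assuming a hypothetical `TaxonomicConcept` record that pairs a name with the source that uses it ("sec." reads "according to"). The class and field names are invented for illustration and are not part of Euler/X.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonomicConcept:
    """A name as used by one specific source (hypothetical illustration)."""
    name: str    # Linnaean or clade name, e.g. "Microcebus"
    source: str  # the classification or study, e.g. "MSW3 (2005)"

    @property
    def label(self) -> str:
        # "sec." (secundum) reads "according to"
        return f"{self.name} sec. {self.source}"


# The same name yields two distinct identifiers, one per source.
msw2 = TaxonomicConcept("Microcebus", "MSW2 (1993)")
msw3 = TaxonomicConcept("Microcebus", "MSW3 (2005)")
```

Because the two records compare as unequal even though they share a name, a relationship such as congruence or overlap can be stated between them explicitly rather than being assumed from the shared name.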

Page 14: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Proposed solution: Euler/X – logically consistent RCC–5 alignments

• Input: multiple taxonomies and/or phylogenies; expert-provided articulations

• Output: logic consistency checking; Maximally Informative Relations (MIR); alignment visualizations

Page 15: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf

Region Connection Calculus (RCC–5) articulations

• Two regions N, M are either:

• congruent (N == M)
• properly inclusive (N < M)
• inversely properly inclusive (N > M)
• overlapping (N >< M)
• exclusive of each other (N ! M)

Page 16: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf

Region Connection Calculus (RCC–5) articulations

• Two regions N, M are either:

• congruent (N == M)
• properly inclusive (N < M)
• inversely properly inclusive (N > M)
• overlapping (N >< M)
• exclusive of each other (N ! M)

• RCC–5 articulations match the query: "can we join regions N and M?"

• Phylogenies have multiple RCC–5 alignable components: nodes (parents, children), node-associated traits, even node-anchoring specimens
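The five articulations can be illustrated with ordinary sets. The sketch below is a minimal Python example, assuming that each region is modeled as a non-empty set of members and that "<" is read as "N is a proper part of M". The enum and function names are hypothetical and do not come from Euler/X.

```python
from enum import Enum


class RCC5(Enum):
    """The five RCC-5 relations, using the symbols from the slide."""
    CONGRUENT = "=="
    PROPER_PART = "<"          # assumption: N lies properly inside M
    INVERSE_PROPER_PART = ">"  # assumption: M lies properly inside N
    OVERLAPS = "><"
    DISJOINT = "!"


def rcc5(n: frozenset, m: frozenset) -> RCC5:
    """Return the single RCC-5 relation holding between two non-empty regions."""
    if not n or not m:
        raise ValueError("RCC-5 regions must be non-empty")
    if n == m:
        return RCC5.CONGRUENT
    if n < m:  # proper subset
        return RCC5.PROPER_PART
    if n > m:  # proper superset
        return RCC5.INVERSE_PROPER_PART
    if n & m:  # shared members, but neither contains the other
        return RCC5.OVERLAPS
    return RCC5.DISJOINT
```

For any two non-empty sets exactly one of the five branches applies, which is what makes the relations usable as answers to the "can we join regions N and M?" query.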

Page 17: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Overview of RCC–5 alignment use cases

1. Two primate classifications – MSW2 (1993) versus MSW3 (2005)

a. Microcebus + Mirza sec. MSW3 (2005)
b. Quantifying name (identifier) reliability
c. Reasoning achieves scalability (matrix)

2. Lorisiformes sec. MSW3 (2005) versus Springer et al. (2012) phylogeny

a. Phylogeny refines classification
b. Sampling causes non-congruence

3. Avian phylogenies sec. Prum et al. (2015) versus Jarvis et al. (2014)

a. Psittaciformes with & without coverage
b. Precise semiotics for the "avian explosion"

Page 18: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Use case 1:

Two primate classifications –

MSW2 (1993) versus MSW3 (2005)

Page 19: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

"Taxonomic concept labels" identify input concept regions

RCC–5 articulations provided for each species-level concept

• Input visualization: MSW3 (2005) versus MSW2 (1993)

Source: Franz et al. 2016. Two influential primate classifications logically aligned. Systematic Biology 65: 561–582.

Page 20: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• Alignment visualization: "grey means congruent"

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

Page 21: Franz et al evol 2016 representing phylogeny as a logically tractable variable

One name & congruent region

• Alignment visualization: "grey means congruent"

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

Page 22: Franz et al evol 2016 representing phylogeny as a logically tractable variable

One name & congruent region

Many names & congruent region

• Alignment visualization: "grey means congruent"

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

Page 23: Franz et al evol 2016 representing phylogeny as a logically tractable variable

One name & congruent region

Many names & congruent region

One name & non-congruent regions

• Alignment visualization: "grey means congruent"

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

Page 24: Franz et al evol 2016 representing phylogeny as a logically tractable variable

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

• Alignment visualization: "grey means congruent"

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

Page 25: Franz et al evol 2016 representing phylogeny as a logically tractable variable

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

• Alignment visualization: "grey means congruent"

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

Page 26: Franz et al evol 2016 representing phylogeny as a logically tractable variable

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

• Application of coverage constraint: parent-to-parent articulations (><) are fully defined by alignment signal propagated from their respective children.

Sensible when complete sampling of children is intended.

• Alignment visualization: "grey means congruent"

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
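The coverage constraint described above can be sketched as follows. This is a minimal Python illustration, assuming each species-level concept is modeled as a set of hypothetical member tokens (`s1`, `s2`, `s3`) and each parent region is exactly the union of its children. The taxonomy labels `T1.*` and `T2.*` and both function names are invented for the example and do not reflect how Euler/X encodes its input.

```python
def region(node, children, leaf_regions):
    """Under coverage, a parent's region is exactly the union of its children's regions."""
    if node in leaf_regions:
        return frozenset(leaf_regions[node])
    members = frozenset()
    for child in children[node]:
        members |= region(child, children, leaf_regions)
    return members


def relation(n, m):
    """RCC-5 symbol for two non-empty regions modeled as sets."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


# Hypothetical input: two genus-level parents with partly shared children.
children = {"T1.G": ["T1.a", "T1.b"], "T2.G": ["T2.b", "T2.c"]}
leaf_regions = {
    "T1.a": {"s1"},
    "T1.b": {"s2"},
    "T2.b": {"s2"},  # congruent with T1.b
    "T2.c": {"s3"},
}

# The parent-to-parent articulation follows from the children alone.
parent_relation = relation(
    region("T1.G", children, leaf_regions),
    region("T2.G", children, leaf_regions),
)
```

In this toy input the two parents share one child region and each has one region the other lacks, so the propagated parent-to-parent articulation is an overlap (`><`).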

Page 27: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Use case 1.b.: Quantifying name (identifier) reliability

One name & congruent region

• Alignment visualization: RCC–5 as an identifier assessment tool [good / not]

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

Page 28: Franz et al evol 2016 representing phylogeny as a logically tractable variable

One name & congruent region

• Alignment visualization: RCC–5 as an identifier assessment tool [good / not]

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

• Query services rendered: (1) MSW3 destabilizes MSW2; (2) non-congruence is not only caused by differential low-level sampling; (3) alignment constitutes a taxonomic meaning integration map to navigate across MSW3 & MSW2.

Use case 1.b.: Quantifying name (identifier) reliability
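The five name/region categories shown on these slides can be expressed as a small classifier. The following is a minimal Python sketch, assuming each aligned pair is given as two names plus an RCC-5 symbol. The function name, the placeholder names "A" to "H", and the treatment of same-name disjoint pairs as "non-congruent" are assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter


def classify(name_a: str, name_b: str, relation: str) -> str:
    """Map one aligned concept pair to a name/region category from the slides."""
    same_name = name_a == name_b
    if relation == "==":
        return "one name & congruent region" if same_name else "many names & congruent region"
    if relation == "!" and not same_name:
        return "new names & exclusive regions"
    prefix = "one name" if same_name else "many names"
    return prefix + " & non-congruent regions"


# Hypothetical aligned pairs: (name in taxonomy 1, name in taxonomy 2, RCC-5 symbol).
pairs = [
    ("A", "A", "=="),
    ("B", "C", "=="),
    ("D", "D", "><"),
    ("E", "F", "<"),
    ("G", "H", "!"),
]

category_counts = Counter(classify(*pair) for pair in pairs)
```

Only the first category describes a name that identifies the same region in both sources; tallying the others gives one possible way to quantify how often a name alone is an unreliable identifier.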

Page 29: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Source: Franz et al. 2016. Two influential primate classifications logically aligned. Systematic Biology 65: 561–582.

1 in 3 names are unreliable across MSW2/MSW3 classifications

Page 30: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Use case 1.c.: Reasoning achieves scalability (MIR matrix)

Source: Dang et al. 2015. ProvenanceMatrix: a visualization tool for multi-taxonomy alignments. CEUR Workshop Proceedings 1456: 13–24. http://ceur-ws.org/Vol-1456/paper2.pdf

• Input: 402 articulations. Output: 153,111 Maximally Informative Relations

Salmon cells ↔ reasoning
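The jump from a few hundred input articulations to a much larger set of inferred relations can be made concrete. The sketch below is a minimal Python illustration, assuming that the MIR set holds one entry for every cross-taxonomy concept pair. The function name and the concept labels are hypothetical. The slide does not state the concept counts, so the final line is only an arithmetic consistency check: 317 × 483 is one factorization of 153,111.

```python
from itertools import product


def mir_slots(concepts_a, concepts_b):
    """List every cross-taxonomy concept pair that needs a relation."""
    return list(product(concepts_a, concepts_b))


# Hypothetical toy input: 3 concepts in one taxonomy, 4 in the other.
slots = mir_slots(["A1", "A2", "A3"], ["B1", "B2", "B3", "B4"])
n_slots = len(slots)  # 3 * 4 pairs

# Consistency check only; the slide does not give the concept counts.
consistency_check = 317 * 483
```

Under this assumption the number of relations grows with the product of the two taxonomy sizes, while the expert only supplies articulations for a small subset of pairs and the reasoner infers the rest.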

Page 31: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Use case 2:

Lorisiformes sec. MSW3 versus

Springer et al. (2012) phylogeny

Page 32: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Use case 2.a.: Phylogeny (2012) refines classification (2005)

Springer et al. (2012) @ OpenTree https://tree.opentreeoflife.org/curator/study/view/pg_2656

Page 33: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Convert OpenTree-supported NeXML files into Euler/X input (credit: Daisie Huang): https://github.com/daisieh/phylogenomics/blob/master/converting/nexml_to_euler.py & https://github.com/daisieh/phylogenomics/blob/master/converting/relabel_euler.py

Springer et al. (2012) @ OpenTree https://tree.opentreeoflife.org/curator/study/view/pg_2656

Euler/X input visualization

Use case 2.a.: Phylogeny (2012) refines classification (2005)

Page 34: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Convert OpenTree-supported NeXML files into Euler/X input (credit: Daisie Huang): https://github.com/daisieh/phylogenomics/blob/master/converting/nexml_to_euler.py & https://github.com/daisieh/phylogenomics/blob/master/converting/relabel_euler.py

Springer et al. (2012) @ OpenTree https://tree.opentreeoflife.org/curator/study/view/pg_2656

Euler/X input visualization

MSW3 (2005)

Use case 2.a.: Phylogeny (2012) refines classification (2005)

Page 35: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Alignment visualization

Use case 2.a.: Phylogeny (2012) refines classification (2005)

Page 36: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• At low-to-mid levels, phylogeny often congruently refines classification.

Congruent refinement

Alignment visualization

Use case 2.a.: Phylogeny (2012) refines classification (2005)

Page 37: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• At low-to-mid levels, phylogeny often congruently refines classification.

Congruent refinement

Unique species concepts

Alignment visualization

Use case 2.a.: Phylogeny (2012) refines classification (2005)

Page 38: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Use case 2.b.: Sampling alone causes high-level non-congruence

• At low-to-mid levels, phylogeny often congruently refines classification.

• However, differential low-level sampling causes 'excessive' non-congruence at higher levels that may not reflect the authors' intentions.

Congruent refinement

Unique species concepts

Higher level conflict

Alignment visualization

Page 39: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Use case 3:

Avian phylogenies sec. Prum et al. (2015)

versus Jarvis et al. (2014)

Page 40: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Source: Thomas, G.H. 2015. An avian explosion. Nature 526: 516–517. doi:10.1038/nature15638

[Figure: avian phylogenies, 2015 versus 2014]

Let's recall...

Page 41: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Use case 3: Aves sec. Prum et al. (2015) versus Jarvis et al. (2014)

• Sampling is highly differential: 198 versus 48 species-level entities

• Only 12 species-level concept pairs are congruent [green cells]

Page 42: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Use case 3.a.: Psittaciformes with & without coverage constraint

• Psittaciformes sec. 2015 – with global coverage constraint

Input visualization: only disjoint articulations

Page 43: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• Psittaciformes sec. 2015 – with global coverage constraint

• No low-level congruence ↔ no congruent alignment regions

Input visualization: only disjoint articulations

Alignment visualization: 108 MIR; all disjoint

Use case 3.a.: Psittaciformes with & without coverage constraint

Page 44: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• Psittaciformes sec. 2015 – with coverage locally relaxed

Input visualization

Use case 3.a.: Psittaciformes with & without coverage constraint

Page 45: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• Psittaciformes sec. 2015 – with coverage locally relaxed

• No coverage constraint for 2014/2015.[Psittacidae, Nestor]

Input visualization

Use case 3.a.: Psittaciformes with & without coverage constraint

Page 46: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• Psittaciformes sec. 2015 – with coverage locally relaxed

• No coverage constraint for 2014/2015.[Psittacidae, Nestor]

• Allows for 3 congruent & 7 inclusive RCC–5 articulations

Input visualization

Use case 3.a.: Psittaciformes with & without coverage constraint

Page 47: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• Psittaciformes sec. 2015 – with coverage locally relaxed

• Higher-level congruence despite low-level non-congruence

• 160 MIR: 10 congruent; 65 (inversely) properly inclusive

Alignment visualization

Use case 3.a.: Psittaciformes with & without coverage constraint

Page 48: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• Psittaciformes sec. 2015 – with coverage locally relaxed

• Higher-level congruence despite low-level non-congruence

• 160 MIR: 10 congruent; 65 (inversely) properly inclusive

Alignment visualization

Additional 2015 low-level sampling

Use case 3.a.: Psittaciformes with & without coverage constraint
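The contrast between the global coverage constraint and locally relaxed coverage can be sketched with sets. This is a minimal Python illustration, assuming that relaxing coverage means a parent region may contain members beyond its sampled children. The member tokens (`p1`, `q1`, `other`) and the function names are hypothetical and do not represent the actual 2014 or 2015 sampling.

```python
def parent_region(sampled_children, unsampled=frozenset()):
    """Union of sampled children plus members implied but not sampled.

    With the coverage constraint in force, `unsampled` stays empty.
    """
    members = frozenset(unsampled)
    for child in sampled_children:
        members |= frozenset(child)
    return members


def relation(n, m):
    """RCC-5 symbol for two non-empty regions modeled as sets."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


# Hypothetical, fully disjoint low-level sampling in the two studies.
kids_2014 = [{"p1"}, {"p2"}]
kids_2015 = [{"q1"}, {"q2"}, {"q3"}]

# Global coverage: parents are no more than their sampled children.
with_coverage = relation(parent_region(kids_2014), parent_region(kids_2015))

# Relaxed coverage: both parents are taken to span the same wider group.
whole_group = frozenset({"p1", "p2", "q1", "q2", "q3", "other"})
relaxed = relation(
    parent_region(kids_2014, whole_group - {"p1", "p2"}),
    parent_region(kids_2015, whole_group - {"q1", "q2", "q3"}),
)
```

In this toy case the parents are disjoint under coverage and congruent once coverage is relaxed, which mirrors the slide's point that higher-level congruence is possible despite low-level non-congruence.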

Page 49: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Use case 3.b.: Precise semiotics for the "avian explosion"

• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed

Page 50: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed

Non-congruence within 2015.Paleognathae

Non-congruence within 2014.Pelecanimorphae

Use case 3.b.: Precise semiotics for the "avian explosion"

Page 51: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed

Non-congruence within 2015/2014.Neoaves

(see next slide)

Use case 3.b.: Precise semiotics for the "avian explosion"

Page 52: Franz et al evol 2016 representing phylogeny as a logically tractable variable

• Neoaves sec. 2015/2014, and 3–4 less inclusive levels

26 overlapping articulations in the sub-Neoavian alignment region cannot be attributed to differential sampling: 'genuine' phylogenetic conflict

Use case 3.b.: Precise semiotics for the "avian explosion"
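Counting relation types and isolating overlaps, as reported on these slides, can be sketched as follows. This is a minimal Python illustration, assuming the alignment output is available as (concept, concept, RCC-5 symbol) triples. The clade labels and both function names are hypothetical, and whether an overlap reflects genuine conflict rather than sampling still requires the expert judgement described in the talk.

```python
from collections import Counter


def tally(mir):
    """Count how often each RCC-5 symbol occurs in a list of relations."""
    return Counter(rel for _, _, rel in mir)


def conflict_candidates(mir):
    """Return the concept pairs whose regions overlap (><)."""
    return [(a, b) for a, b, rel in mir if rel == "><"]


# Hypothetical relations between clade concepts of two phylogenies.
mir = [
    ("2015.X", "2014.X", "=="),
    ("2015.Y", "2014.X", "<"),
    ("2015.Z", "2014.W", "><"),
    ("2015.Z", "2014.V", "!"),
    ("2015.U", "2014.W", "><"),
]

counts = tally(mir)
candidates = conflict_candidates(mir)
```

A summary such as "160 MIR: 10 congruent; 65 (inversely) properly inclusive" is the kind of tally this produces, and the overlap list is a starting point for deciding which non-congruence goes beyond differential sampling.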

Page 53: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Example service queries:

1. Do succeeding inferences have stabilizing or destabilizing trends?

2. Are multiple, differentially sub-sampled phylogenies congruent or conflicting?

3. How to control for the "phylogenetic variable" affecting evolutionary inferences?

In conclusion:

How to build such a service?

Page 54: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Creating a Phylogenetic Knowledge Advancement Service

• Explicitly represent human theories, and only indirectly taxa (species, clades).

• Identify study-specific phylogenetic (sub-)tree concepts with precision.

Page 55: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Creating a Phylogenetic Knowledge Advancement Service

• Explicitly represent human theories, and only indirectly taxa (species, clades).

• Identify study-specific phylogenetic (sub-)tree concepts with precision.

• Utilize identifier-to-identifier relationships to represent the multi-component nature of phylogenetic congruence & non-congruence:
  • RCC–5 for nodes & terminals: w/o coverage (sampling)
  • RCC–5 for character concepts & specimen sets (not shown)

Page 56: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Creating a Phylogenetic Knowledge Advancement Service

• Explicitly represent human theories, and only indirectly taxa (species, clades).

• Identify study-specific phylogenetic (sub-)tree concepts with precision.

• Utilize identifier-to-identifier relationships to represent the multi-component nature of phylogenetic congruence & non-congruence:
  • RCC–5 for nodes & terminals: w/o coverage (sampling)
  • RCC–5 for character concepts & specimen sets (not shown)

• Excel at representing and explaining conflict, as opposed to 'voting' (which is external, a posteriori); enforce multi-theory alignment, not synthesis.

Page 57: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Creating a Phylogenetic Knowledge Advancement Service

• Explicitly represent human theories, and only indirectly taxa (species, clades).

• Identify study-specific phylogenetic (sub-)tree concepts with precision.

• Utilize identifier-to-identifier relationships to represent the multi-component nature of phylogenetic congruence & non-congruence:
  • RCC–5 for nodes & terminals: w/o coverage (sampling)
  • RCC–5 for character concepts & specimen sets (not shown)

• Excel at representing and explaining conflict, as opposed to 'voting' (which is external, a posteriori); enforce multi-theory alignment, not synthesis.

• Explicitly identify speaker intentions – by modeling phylogenetically localized intentions of semantic precision (and in what specific sense?) versus ambiguity – vis-à-vis their own products, and in relation to those of other author teams. Otherwise, signal interpretation remains unreliable.

Page 58: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Acknowledgements & links to products and references

• Euler/X team: Shawn Bowers, Parisa Kianmajd, Timothy McPhillips.

• ProvenanceMatrix: Tuan Nhon Dang.

• NSF DEB–1155984, DBI–1342595 (PI Franz).

• NSF IIS–118088, DBI–1147273 (PI Ludäscher).

• Information @ http://taxonbytes.org/tag/concept-taxonomy/

• Euler/X code @ https://github.com/EulerProject/EulerX

• Franz et al. 2015. Reasoning over taxonomic change: exploring alignments for the Perelleschus use case. PLoS ONE 10(2): e0118247.

• Franz et al. 2016. Two influential primate classifications logically aligned. Systematic Biology 65(4): 561–582.

Page 59: Franz et al evol 2016 representing phylogeny as a logically tractable variable

Interested in exploring multi-phylogeny alignments?

Please contact me.

[email protected]

@taxonbytes

https://biokic.asu.edu/