Representing phylogeny as a logically tractable variable
Please
@taxonbytes
Nico M. Franz1, Guanyang Zhang1, Shizhuo Yu2 & Bertram Ludäscher3
1 School of Life Sciences, Arizona State University
2 Computer Science, University of California at Davis
3 iSchool, University of Illinois at Urbana-Champaign
Systematics / Bioinformatics Session – Evolution 2016 Meetings
June 20, 2016 - Austin, Texas (#Evol2016)
@ http://www.slideshare.net/taxonbytes/franz-et-al-evol-2016-representing-phylogeny-as-a-logically-tractable-variable
Source: Thomas, G.H. 2015. An avian explosion. Nature 526: 516–517. doi:10.1038/nature15638
[Figure: two avian phylogenies, labeled 2015 and 2014]
Phylogenetic inferences can vary over time.
Querying phylogenetic advancement – premises & questions
• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).
• Therefore it would be useful to have a "phylogenetic knowledge advancement service". The service satisfies queries such as:
1. "Does this sequence of related phylogenetic inferences have a stabilizing or destabilizing trend?"
2. "Are two or more phylogenies – each differentially sub-sampled at lower levels – in congruence or in conflict?"
3. "How can an evolutionary study tied to one (earlier) phylogeny be 'updated' (integrated) with another (later) phylogeny?"
Service for query 1: We can prioritize research agendas accordingly.
Service for query 2: Is sampling an issue? Or are the signals complementary?
Service for query 3: Effects of the "phylogenetic variable" on conclusions can be controlled for.
Research question:
How to build such a service?
Phylogenetic advancement service – design specifications
Example service queries:
1. Do succeeding inferences have stabilizing or destabilizing trends?
2. Are multiple, differentially sub-sampled phylogenies congruent or conflicting?
3. How to control for the "phylogenetic variable" affecting evolutionary inferences?
• The service needs more granular identifiers than "just Linnaean names" to account for study-specific phylogenetic (sub-)tree concepts.
• The service needs identifier-to-identifier relationships that match the query semantics by representing congruence and conflict, and by achieving reconciliation, across concepts associated with multiple inferences.
• Compatibility with both human cognitive constraints [1] and computational logic is desirable to balance service quality (allowing expert interactions) with scalability (semi-automated reasoning products).
1 Atran, S. 1998. Folk biology and the anthropology of science: cognitive universals and cultural particulars. Behavioral and Brain Sciences 21: 547–569.
Proposed solution: Euler/X – logically consistent RCC–5 alignments
• Input: multiple taxonomies and/or phylogenies; expert-provided articulations
• Output: logic consistency checking; Maximally Informative Relations (MIR); alignment visualizations
Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf
Region Connection Calculus (RCC–5) articulations
• Two regions N, M are either:
  • congruent (N == M)
  • properly inclusive (N < M)
  • inversely properly inclusive (N > M)
  • overlapping (N >< M)
  • exclusive of each other (N ! M)
• RCC–5 articulations match the query: "can we join regions N and M?"
• Phylogenies have multiple RCC–5 alignable components: nodes (parents, children), node-associated traits, even node-anchoring specimens
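The five articulations above can be illustrated with a minimal Python sketch that treats each region as a finite, non-empty set of members. This set-based reading is an assumption made for illustration only: Euler/X reasons over expert-provided articulations logically, and the names `RCC5`, `rcc5`, and the member labels are hypothetical.

```python
from enum import Enum


class RCC5(Enum):
    """The five RCC-5 relations, using the symbols from the slide."""
    CONGRUENT = "=="
    PROPER_PART = "<"
    INVERSE_PROPER_PART = ">"
    OVERLAPPING = "><"
    DISJOINT = "!"


def rcc5(n: frozenset, m: frozenset) -> RCC5:
    """Return the RCC-5 relation of region n to region m (both non-empty)."""
    if not n or not m:
        raise ValueError("RCC-5 regions must be non-empty")
    if n == m:
        return RCC5.CONGRUENT
    if n < m:  # proper subset
        return RCC5.PROPER_PART
    if n > m:  # proper superset
        return RCC5.INVERSE_PROPER_PART
    if n & m:  # shared members, but neither contains the other
        return RCC5.OVERLAPPING
    return RCC5.DISJOINT


# Hypothetical regions, each modeled as a set of member labels.
a = frozenset({"s1", "s2"})
b = frozenset({"s1", "s2", "s3"})
c = frozenset({"s2", "s4"})
d = frozenset({"s5"})

for left, right in [(a, a), (a, b), (b, a), (a, c), (a, d)]:
    print(sorted(left), rcc5(left, right).value, sorted(right))
```

Exactly one of the five relations holds for any pair of non-empty regions, which is what makes the "can we join N and M?" query answerable per pair.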
Overview of RCC–5 alignment use cases
1. Two primate classifications – MSW2 (1993) versus MSW3 (2005)
   a. Microcebus + Mirza sec. MSW3 (2005)
   b. Quantifying name (identifier) reliability
   c. Reasoning achieves scalability (matrix)
2. Lorisiformes sec. MSW3 (2005) versus Springer et al. (2012) phylogeny
   a. Phylogeny refines classification
   b. Sampling causes non-congruence
3. Avian phylogenies sec. Prum et al. (2015) versus Jarvis et al. (2014)
   a. Psittaciformes with & without coverage
   b. Precise semiotics for the "avian explosion"
Use case 1: Two primate classifications – MSW2 (1993) versus MSW3 (2005)
Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
"Taxonomic concept labels" identify input concept regions
RCC–5 articulations provided for each species-level concept
• Input visualization: MSW3 (2005) versus MSW2 (1993)
Source: Franz et al. 2016. Two influential primate classifications logically aligned. Systematic Biology 65: 561–582.
• Alignment visualization: "grey means congruent"
One name & congruent region
Many names & congruent region
One name & non-congruent regions
Many names & non-congruent regions
New names & exclusive regions
• Application of coverage constraint: parent-to-parent articulations (><) are fully defined by alignment signal propagated from their respective children.
Sensible when complete sampling of children is intended.
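A minimal sketch of what the coverage constraint implies, assuming a set-based reading in which a parent region under coverage is exactly the union of its child regions. The function names and member labels are hypothetical, and this is not the Euler/X implementation.

```python
def region_under_coverage(child_regions):
    """Under coverage, a parent region is the union of its child regions."""
    return frozenset().union(*child_regions)


def rcc5_symbol(n, m):
    """RCC-5 symbol for two non-empty regions modeled as sets."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    if n & m:
        return "><"
    return "!"


# Hypothetical children of two parent concepts from different sources.
parent_1_children = [frozenset({"x1", "x2"}), frozenset({"x3"})]
parent_2_children = [frozenset({"x3"}), frozenset({"x4"})]

p1 = region_under_coverage(parent_1_children)
p2 = region_under_coverage(parent_2_children)
print(rcc5_symbol(p1, p2))  # parents overlap because their children do

# Sparse, non-matching sampling: with coverage the parents come out disjoint.
sparse_1 = region_under_coverage([frozenset({"y1"})])
sparse_2 = region_under_coverage([frozenset({"y2"})])
print(rcc5_symbol(sparse_1, sparse_2))
```

The second case shows why coverage is only sensible when complete sampling of children is intended: if the children are merely exemplars, a disjoint parent-level result reflects sampling rather than the authors' view of the parent.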
Use case 1.b.: Quantifying name (identifier) reliability
• Alignment visualization: RCC–5 as an identifier assessment tool [good / not]
One name & congruent region
Many names & congruent region
One name & non-congruent regions
Many names & non-congruent regions
New names & exclusive regions
• Query services rendered: (1) MSW3 destabilizes MSW2; (2) non-congruence is not only caused by differential low-level sampling; (3) alignment constitutes a taxonomic meaning integration map to navigate across MSW3 & MSW2.
Source: Franz et al. 2016. Two influential primate classifications logically aligned. Systematic Biology 65: 561–582.
1 in 3 names are unreliable across MSW2/MSW3 classifications
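One possible way to tally name reliability from a set of articulations is sketched below. It assumes that a name counts as reliable only when the same name labels congruent regions in both sources. That definition, the function `classify`, and the placeholder names are illustrative assumptions, not necessarily the metric used in the cited study.

```python
from collections import Counter


def classify(name_a, relation, name_b):
    """Assign an articulation to one of the five name/region categories."""
    same_name = name_a == name_b
    if relation == "==":
        return ("one name & congruent region" if same_name
                else "many names & congruent region")
    if relation == "!" and not same_name:
        return "new names & exclusive regions"
    return ("one name & non-congruent regions" if same_name
            else "many names & non-congruent regions")


# Hypothetical articulations: (name in source A, RCC-5 symbol, name in source B).
articulations = [
    ("Aus bus", "==", "Aus bus"),
    ("Aus cus", "==", "Bus cus"),
    ("Aus dus", "<", "Aus dus"),
    ("Aus eus", "><", "Aus fus"),
    ("Aus gus", "!", "Aus hus"),
    ("Aus ius", "==", "Aus ius"),
]

counts = Counter(classify(*art) for art in articulations)
reliable = counts["one name & congruent region"]
for category, n in counts.items():
    print(f"{category}: {n}")
print(f"reliable share: {reliable}/{len(articulations)}")
```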
Use case 1.c.: Reasoning achieves scalability (MIR matrix)
Source: Dang et al. 2015. ProvenanceMatrix: a visualization tool for multi-taxonomy alignments. CEUR Workshop Proceedings 1456: 13–24. http://ceur-ws.org/Vol-1456/paper2.pdf
• Input: 402 articulations. Output: 153,111 Maximally Informative Relations
Salmon cells ↔ reasoning
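The scale of the reasoning output can be checked with simple arithmetic, assuming the MIR fill a matrix with one cell per cross-classification pair of concepts. The output count factors as 483 × 317; reading those two factors as the per-classification concept counts is an assumption here, since the slide states only the totals.

```python
def mir_matrix_size(n_concepts_1, n_concepts_2):
    """One maximally informative relation per cross-taxonomy concept pair."""
    return n_concepts_1 * n_concepts_2


# Assumed matrix shape (not stated on the slide); the product matches the output.
ROWS, COLS = 483, 317
total_mir = mir_matrix_size(ROWS, COLS)

asserted = 402  # input articulations, from the slide
print(total_mir)                       # 153111
print(f"{asserted / total_mir:.2%}")   # share of cells the experts supplied
print(total_mir - asserted)            # cells left to the reasoner, at least
```

Under these assumptions the experts assert well under 1% of the matrix and the reasoner supplies the rest, which is the sense in which reasoning achieves scalability.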
Use case 2: Lorisiformes sec. MSW3 versus Springer et al. (2012) phylogeny
Use case 2.a.: Phylogeny (2012) refines classification (2005)
Springer et al. (2012) @ OpenTree https://tree.opentreeoflife.org/curator/study/view/pg_2656
Convert OpenTree-supported NeXML files into Euler/X input (credit: Daisie Huang): https://github.com/daisieh/phylogenomics/blob/master/converting/nexml_to_euler.py & https://github.com/daisieh/phylogenomics/blob/master/converting/relabel_euler.py
Euler/X input visualizations: Springer et al. (2012) and MSW3 (2005)
• At low-to-mid levels, phylogeny often congruently refines classification.
Alignment visualization labels: Congruent refinement; Unique species concepts
Use case 2.b.: Sampling alone causes high-level non-congruence
• However, differential low-level sampling causes 'excessive' non-congruence at higher levels that may not reflect the authors' intentions.
Alignment visualization labels: Congruent refinement; Unique species concepts; Higher level conflict
Use case 3: Avian phylogenies sec. Prum et al. (2015) versus Jarvis et al. (2014)
Source: Thomas, G.H. 2015. An avian explosion. Nature 526: 516–517. doi:10.1038/nature15638
[Figure: two avian phylogenies, labeled 2015 and 2014]
Let's recall...
Use case 3: Aves sec. Prum et al. (2015) versus Jarvis et al. (2014)
• Sampling is highly differential: 198 versus 48 species-level entities
• Only 12 species-level concept pairs are congruent [green cells]
Use case 3.a.: Psittaciformes with & without coverage constraint
• Psittaciformes sec. 2015 – with global coverage constraint
• No low-level congruence ↔ no congruent alignment regions
Input visualization: only disjoint articulations
Alignment visualization: 108 MIR; all disjoint
• Psittaciformes sec. 2015 – with coverage locally relaxed
• No coverage constraint for 2014/2015.[Psittacidae, Nestor]
• Allows for 3 congruent & 7 inclusive RCC–5 articulations
Input visualization
• Higher-level congruence despite low-level non-congruence
• 160 MIR: 10 congruent; 65 (inversely) properly inclusive
Alignment visualization: additional 2015 low-level sampling
Use case 3.b.: Precise semiotics for the "avian explosion"
• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed
Non-congruence within 2015.Paleognathae
Non-congruence within 2014.Pelecanimorphae
Non-congruence within 2015/2014.Neoaves (see next slide)
• Neoaves sec. 2015/2014, and 3–4 less inclusive levels
26 overlapping articulations in the sub-Neoavian alignment region cannot be assigned to differential sampling: 'genuine' phylogenetic conflict
Example service queries:
1. Do succeeding inferences have stabilizing or destabilizing trends?
2. Are multiple, differentially sub-sampled phylogenies congruent or conflicting?
3. How to control for the "phylogenetic variable" affecting evolutionary inferences?
In conclusion:
How to build such a service?
Creating a Phylogenetic Knowledge Advancement Service
• Explicitly represent human theories, and only indirectly taxa (species, clades).
• Identify study-specific phylogenetic (sub-)tree concepts with precision.
• Utilize identifier-to-identifier relationships to represent the multi-component nature of phylogenetic congruence & non-congruence:
  • RCC–5 for nodes & terminals: w/o coverage (sampling)
  • RCC–5 for character concepts & specimen sets (not shown)
• Excel at representing and explaining conflict, as opposed to 'voting' (which is external, a posteriori); enforce multi-theory alignment, not synthesis.
• Explicitly identify speaker intentions – by modeling phylogenetically localized intentions of semantic precision (and in what specific sense?) versus ambiguity – vis-à-vis their own products, and in relation to those of other author teams. Otherwise, signal interpretation remains unreliable.
Acknowledgements & links to products and references
• Euler/X team: Shawn Bowers, Parisa Kianmajd, Timothy McPhillips.
• ProvenanceMatrix: Tuan Nhon Dang.
• NSF DEB–1155984, DBI–1342595 (PI Franz).
• NSF IIS–118088, DBI–1147273 (PI Ludäscher).
• Information @ http://taxonbytes.org/tag/concept-taxonomy/
• Euler/X code @ https://github.com/EulerProject/EulerX
• Franz et al. 2015. Reasoning over taxonomic change: exploring alignments for the Perelleschus use case. PLoS ONE 10(2): e0118247.
• Franz et al. 2016. Two influential primate classifications logically aligned. Systematic Biology 65(4): 561–582.
Interested in exploring multi-phylogeny alignments?
Please contact me.
[email protected]
@taxonbytes
https://biokic.asu.edu/