integrating data with phylogenies, at scale

Post on 16-Apr-2017

Integrating data with phylogenies, at scale

Nico Cellinese, University of Florida

& Hilmar Lapp, Duke University

WHAT’S IN A NAME?

What’s in a name?

Chaos!
• Names and concepts do not reconcile that easily
• Names are text strings
• Context is lacking or subjective
• Meaning is not computable

Linnean names point to concepts

Antoine Laurent de Jussieu, Genera Plantarum, 1789

I don’t understand any of those concepts, whether in Latin or English, but I can still link them to their names, as in one object to one object.

… and 200+

… and 400+

Idiosyncratic Russian dolls syndrome

From a human perspective, we lose track of concepts. Hard to reconcile all of them. We need help! Can we compute them?

• We can unclutter concepts, and thereby nomenclature
• How do we navigate along the Tree of Life repurposing Linnean names, which are linked to traditional concepts?

Dark taxa!

How do we integrate data with this tree?

Tree-thinking

Common descent → evolution at the center of taxonomy

[Figure: tree with tips A, B, C, D, annotated with branches, synapomorphies, and clades = taxa]

Discovery

Communication

How??

[Figure: density plot of diversification rate]

Tree-thinking

[Figure: tree labeled with Linnean names at mixed ranks: Berberidopsidaceae, Opiliones, Zingiberaceae, Hamamelidaceae, Sarcolaenaceae, Lingulidae, Hymenoptera, Mammalia, Apocynaceae, Galliformes, Rubiaceae, Anarthriaceae, Lineidae, Crocodylidae, Stylosiphonia, Andrenidae, Cracidae, Gavialis, Globba, Micrella, Rhodoleia, Phalangiidae, Tachyglossa, Lyginia, Mediusella, Chamaeclitandra]

These names are not generated in an evolutionary-based framework (groups defined by character similarity vs. common descent).

Both the Encyclopedia of Life (EOL) and the Open Tree of Life suggest that Campanuloideae is a misspelling of Campaniloidea (marine gastropods!). GBIF does not currently have Campanuloideae in its backbone taxonomy.

Are you kidding me?

These are the Campanuloideae!

Wang et al. 2014
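
The "misspelling" suggestion for Campanuloideae is the kind of result one would expect when names are compared purely as text strings. The sketch below illustrates this with approximate string matching from Python's standard `difflib` module. It is only an illustration: how EOL or the Open Tree of Life actually match names is not stated in the slides, and the `KNOWN_NAMES` index, the `similarity` and `suggest` helpers, and the 0.85 cutoff are assumptions made for the example.

```python
from difflib import SequenceMatcher, get_close_matches
from typing import List

# Hypothetical name index that happens to lack the plant subfamily Campanuloideae.
KNOWN_NAMES: List[str] = ["Campaniloidea", "Campanulaceae", "Rubiaceae"]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; knows nothing about taxonomy."""
    return SequenceMatcher(None, a, b).ratio()


def suggest(query: str, known: List[str], cutoff: float = 0.85) -> List[str]:
    """Return the single closest known name above the (assumed) cutoff, if any."""
    return get_close_matches(query, known, n=1, cutoff=cutoff)


if __name__ == "__main__":
    query = "Campanuloideae"  # a clade of flowering plants
    print(similarity(query, "Campaniloidea"))
    print(suggest(query, KNOWN_NAMES))  # the closest string is a gastropod group
```

The strings are nearly identical, so a text-only matcher proposes the gastropod name. Nothing in the comparison encodes what either name refers to.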

Life as a street map

How to navigate life as a machine

Mapping data to phylogenetic knowledge space

Street signs serve people, not machines

• How do we build a reliable GPS for phylogenies?
• How do we reproducibly find the right nodes?

FEED

Textual Definition –

The hyoglossus is a muscle that attaches to the hyoid and tongue and is innervated by Cranial Nerve XII.

Computable Definition –

('attached to' some 'hyoid bone') and ('attached to' some tongue) and ('innervated by' some 'hypoglossal nerve') and spatially disjoint with 'intrinsic tongue muscle'

Druzinsky et al. (2015): Logic definitions of mammalian feeding muscles by means of necessary and sufficient conditions true for all mammals
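
To make "computable definition" concrete, the hyoglossus definition above can be read as a set of necessary and sufficient conditions that a program can check. The following is a minimal sketch in plain Python, not an OWL reasoner: it matches terms exactly and does no subclass inference. The `Relations` type, the `satisfies` function, and the two candidate records are hypothetical names invented for the example.

```python
from typing import Dict, Set

# Maps a relation name to the set of terms an entity is asserted to hold it with.
Relations = Dict[str, Set[str]]

# Conditions taken from the computable definition of the hyoglossus.
HYOGLOSSUS_DEFINITION: Relations = {
    "attached to": {"hyoid bone", "tongue"},
    "innervated by": {"hypoglossal nerve"},
    "spatially disjoint with": {"intrinsic tongue muscle"},
}


def satisfies(entity: Relations, definition: Relations) -> bool:
    """True if the entity asserts every relation/term pair the definition requires."""
    return all(
        required <= entity.get(relation, set())
        for relation, required in definition.items()
    )


# Hypothetical muscle records, made up for illustration.
candidate_1: Relations = {
    "attached to": {"hyoid bone", "tongue"},
    "innervated by": {"hypoglossal nerve"},
    "spatially disjoint with": {"intrinsic tongue muscle"},
}
candidate_2: Relations = {
    "attached to": {"tongue"},
    "innervated by": {"hypoglossal nerve"},
}

if __name__ == "__main__":
    print(satisfies(candidate_1, HYOGLOSSUS_DEFINITION))
    print(satisfies(candidate_2, HYOGLOSSUS_DEFINITION))
```

The textual definition needs a human reader. The computable one lets a machine classify any record that uses the same vocabulary.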

Nomenclature ≠ Semantics

Phyloreference = logic definition of a clade, using the property common to all of life

Phyloreferences

Statements formally expressing the patterns we discover (analogous to map coordinates)

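[Figure: three copies of a tree with tips A, B, C, one per definition type; the apomorphy X is marked on the third]

Node-based: The clade originating with the last common ancestor of B and C.

Branch-based: The clade originating with the first ancestor of B that is not an ancestor of A.

Apomorphy-based: The clade originating with the first ancestor of C to evolve X.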
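
The three definition types above can be sketched as small functions over a toy tree. This is a minimal illustration, not the authors' implementation. The child-to-parent table `PARENT`, the node names `root` and `n1`, the character table `HAS_X`, and all function names are assumptions made for the example. A node is counted as part of its own lineage, and the sketch identifies a clade by its crown node only, so it does not distinguish the stem branch from the node.

```python
from typing import Dict, List, Optional

# Toy tree (A,(B,C)) stored as child -> parent; names are illustrative only.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "A": "root",
    "n1": "root",
    "B": "n1",
    "C": "n1",
}

# Hypothetical character states: X is present in n1 and its descendants.
HAS_X: Dict[str, bool] = {"root": False, "A": False, "n1": True, "B": True, "C": True}


def lineage(node: str) -> List[str]:
    """The node itself followed by its ancestors, ending at the root."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def node_based(x: str, y: str) -> str:
    """Last common ancestor of x and y."""
    lineage_of_y = set(lineage(y))
    for node in lineage(x):  # walk from x toward the root
        if node in lineage_of_y:
            return node
    raise ValueError("no common ancestor")


def branch_based(included: str, excluded: str) -> str:
    """Earliest node on the lineage of `included` that is not on the lineage of `excluded`."""
    lineage_of_excluded = set(lineage(excluded))
    candidates = [n for n in lineage(included) if n not in lineage_of_excluded]
    if not candidates:
        raise ValueError("included taxon lies on the lineage of the excluded taxon")
    return candidates[-1]  # the candidate closest to the root


def apomorphy_based(tip: str, has_trait: Dict[str, bool]) -> str:
    """Earliest node on the lineage of `tip` that shows the trait."""
    for node in reversed(lineage(tip)):  # walk from the root toward the tip
        if has_trait[node]:
            return node
    raise ValueError("trait not found on this lineage")


if __name__ == "__main__":
    print(node_based("B", "C"))
    print(branch_based("B", "A"))
    print(apomorphy_based("C", HAS_X))
```

On this toy tree all three definitions resolve to the same node. On a different tree they can diverge, which is why the definition, rather than the name, has to carry the meaning.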

Phyloreferences yield a coordinate system for the Tree of Life

• Any node, branch, subtree is referenceable
• References are unambiguous
• References are computable
• References are portable
• Adapts to new and changing knowledge

Many needed technologies already exist

• OWL ontologies designed for
  – Phylogenetic knowledge: CDAO
  – Phenotypic knowledge: Uberon, PATO, …
  – Efficient and expressive reasoners: FaCT++, HermiT, Racer, ELK

[Figure: phylogeny with tips Campanula_rotundifolia, Pseudonemacladus_oppositifolius, Lobelia_cardinalis, Campanula_latifolia, Cyphocarpus_rigescens, Wahlenbergia_linifolia, Nemacladus_ramosissmus, Lobelia_coronopifolia, Cyphia_elata, Pentaphragma, Crysanthemum, Sphenoclea, Platycodon_grandiflorus, Cyphia_bulbosa; internal nodes are numbered, and node labels include Campanula, Lobelia, and Cyphia]

Class: Campanulaceae_1889_to_1980
  EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Crysanthemum

Class: Campanulaceae_1980
  EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Lobelia

Class: Campanulaceae_after_1995
  EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea
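
The point of the three Campanulaceae definitions is that one included taxon combined with different excluded taxa picks out different clades. The sketch below evaluates that pattern on a tree. The topology in `TOY_TREE` is hypothetical and is not the phylogeny shown on the slides; it only reuses some of the tip labels. The function names, and the rule that a genus name such as `Lobelia` matches tips like `Lobelia_cardinalis`, are also assumptions made for the example.

```python
from typing import Iterator, Set, Tuple, Union

# A tree is either a tip label or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]

# Hypothetical topology for illustration only; NOT the tree from the slides.
TOY_TREE: Tree = (
    "Crysanthemum",
    (
        "Sphenoclea",
        (
            ("Lobelia_cardinalis", "Cyphia_elata"),
            ("Campanula_latifolia", "Platycodon_grandiflorus"),
        ),
    ),
)


def tips(tree: Tree) -> Set[str]:
    """All tip labels below (or equal to) this subtree."""
    if isinstance(tree, str):
        return {tree}
    found: Set[str] = set()
    for child in tree:
        found |= tips(child)
    return found


def subtrees(tree: Tree) -> Iterator[Tree]:
    """Yield this subtree and every subtree nested inside it."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from subtrees(child)


def matches(tip: str, taxon: str) -> bool:
    """Assumed rule: a taxon matches a tip of the same name or a species in that genus."""
    return tip == taxon or tip.startswith(taxon + "_")


def resolve(tree: Tree, included: str, excluded: str) -> Set[str]:
    """Tips of the largest clade that contains `included` and nothing matching `excluded`."""
    candidates = []
    for sub in subtrees(tree):
        sub_tips = tips(sub)
        has_included = any(matches(t, included) for t in sub_tips)
        has_excluded = any(matches(t, excluded) for t in sub_tips)
        if has_included and not has_excluded:
            candidates.append(sub_tips)
    if not candidates:
        raise ValueError("no clade satisfies the definition on this tree")
    return max(candidates, key=len)


if __name__ == "__main__":
    for excluded in ("Crysanthemum", "Lobelia", "Sphenoclea"):
        print(excluded, sorted(resolve(TOY_TREE, "Campanula_latifolia", excluded)))
```

Only the relative pattern matters here: each choice of excluded taxon resolves to a clade of a different size on the same tree, so each carries a different meaning for the same name.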

Phyloreferences as ontological expressions

Phyloreference expressions:
• Can be easily generated by anyone
• Can work on any tree
• Can be named and registered
  – To promote reuse and consistency
  – To improve usability and accessibility

Class: Campanulaceae
  Annotations:
    rdfs:label "Campanulaceae_after_1995",
    dc:description "the clade that includes Campanula latifolia but not Sphenoclea"
  EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea

vs.

Class: AGF4-SHRU-3560
  EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea
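
Because these expressions follow a fixed pattern, they are easy to generate. The sketch below renders a branch-based phyloreference as text in the style of the two class definitions above. The property names `cdao:has_Descendant` and `phyloref:excludes_lineage` are taken from the slides; the function `render_phyloreference` and its parameters are hypothetical, and the output is plain text that has not been validated against an OWL parser.

```python
from typing import List, Optional


def render_phyloreference(
    class_name: str,
    included: str,
    excluded: str,
    label: Optional[str] = None,
    description: Optional[str] = None,
) -> str:
    """Render a class definition in the style shown on the slides."""
    lines: List[str] = [f"Class: {class_name}"]

    # Optional human-readable annotations; Manchester syntax separates them with commas.
    annotations: List[str] = []
    if label is not None:
        annotations.append(f'rdfs:label "{label}"')
    if description is not None:
        annotations.append(f'dc:description "{description}"')
    if annotations:
        lines.append("  Annotations:")
        lines.append("    " + ",\n    ".join(annotations))

    # The logical definition: one included and one excluded specifier.
    lines.append("  EquivalentTo:")
    lines.append(f"    cdao:has_Descendant value taxon:{included}")
    lines.append(f"    and phyloref:excludes_lineage value taxon:{excluded}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_phyloreference(
        "Campanulaceae",
        included="Campanula_latifolia",
        excluded="Sphenoclea",
        label="Campanulaceae_after_1995",
        description="the clade that includes Campanula latifolia but not Sphenoclea",
    ))
    print()
    print(render_phyloreference("AGF4-SHRU-3560", "Campanula_latifolia", "Sphenoclea"))
```

Both calls produce the same logical definition. Only the first carries a label and description that a person can search for and reuse.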

Challenges

• OWL-based data model to satisfy phylogenetic taxonomy, reasoning expressivity, scalability
• Conventions for data transformation, and consequences of different choices
• Least common ancestor reasoning for OWL data
• Lack of canonical specimen identifier system
• Specifier mapping ontologies

Tree of Life, ontologized: A universal coordinate system

• The Tree of Life is itself an aggregation and integration of our phylogenetic knowledge.
• Phyloreferencing is addressing into a knowledge universe.
• Ontologies, reasoning, and other KR techniques are powerful tools for this.

Acknowledgements

• National Science Foundation (DBI-1458484)
• Ken and Linda McGurn
• Phenoscape
• EvoIO
