whole-tale: the experience of research
TRANSCRIPT
Whole Tale: The Experience of Research… through reproducible, computational narratives
YesWorkflow: Revealing workflow, provenance from scripts
Kurator: Automating data cleaning workflows
Euler/X: Agreeing to disagree about variant taxonomies
Bertram Ludäscher, [email protected]
BCoN Workshop, 2018-02-13..14, U Kansas
Director, Center for Informatics Research in Science & Scholarship (CIRSS), School of Information Sciences (iSchool@Illinois)
& National Center for Supercomputing Applications (NCSA) & Department of Computer Science (CS@Illinois)
Whole Tale: The next step in the evolution of the scholarly article: The “Living” Paper
• 1st Generation: – narrative (prose)
• 2nd Generation: plus … – name..identify..include (access to) data
• 3rd Generation: plus … – name..reference..include code (software).. – and provenance … and exec environment (containers)
Ludäscher:Whole-Tale++ 2
Whole Tale
Whole Tale Dashboard
Whole Tale: What’s in a name?
(1) Whole Tale ⇔ Whole Story:
  ◦ Support (computational / data) scientists
  ◦ … along the complete research lifecycle
  ◦ ... from experiment to (new kind of) publication
  ◦ ... and back!
(2) Whole Tale ⇔ for the Long Tail of Science
  – Easy sharing of your computational narratives, data, and exec-env since 2017!
  – Power applications for everyone!
The Whole Tale: Merging Science and Cyberinfrastructure Pathways
NSF-DIBBS award (5 years, 5 institutions)
• Illinois (NCSA & iSchool): Bertram Ludäscher (PI), MT Campbell (PM) [Kandace Turner], Victoria Stodden (coPI), Matt Turk (coPI), Kacper Kowalik (sw-architect), Craig Willis (dev)
• U of Chicago: Kyle Chard (coPI), Mihael Hategan (dev)
• UT Austin / TACC: Niall Gaffney (coPI), Siva Kulasekaran (dev)
• U Notre Dame: Jarek Nabrzyski (coPI), Ian Taylor (dev), Adam Brinckman (dev)
• UCSB / NCEAS: Matt Jones (coPI), Bryce Mecum (dev)
Whole Tale Motivation
• Can't reproduce result because:
  – Don't know how to run analysis
  – Can't get the software running
  – Can't pay for the computer or compute power the result was computed on
Source: Bryce Mecum, WT team @NCEAS
Whole Tale Vision
• Living publication (data + code + environment)
• Facilitate reproducibility
• Encourage investigation of results by making it easy to recreate the environment the result was created in
Another example Tale: LIGO gravitational wave detection
(tutorial Jupyter notebook)
New & Upcoming Features in WT...
• Add your own Frontends (e.g. OpenRefine, ..)
• Persistent, shared or personal files:
  – /data/ (registered/external data, read-only, associated with a tale)
  – /home/ (your own data, r/w, associated with all your tales)
  – /workspace/ (shared r/w data, associated with a tale, across all users)
• WT “Derived Tales”:
  – take a tale; modify it to your liking; and publish as a derived work
• WT “Take-Out”:
  – Want to run your tales elsewhere?
  – Take out your tale and run it on your own (or cloud) platform
• WT “Scale-Out”:
  – If the WT dashboard isn’t enough ⇒ run your own WT system!
• WT Provenance support:
  – … via DataONE provenance tools, ProvONE model (W3C PROV extension)
  – … via YesWorkflow
• Interest in joining a WT Biodiversity Informatics Working Group!?
  – We already have: archaeology & ecology, astronomy, materials science
  – Your input wanted! (Is WT developing something useful for you?)
  – Try out WT, create some examples (in R, Python, ...) and provide feedback!
  – => possibility to fund a summer intern!
Provenance is: keeping records …
• Grand Canyon’s rock layers are a record of the early geologic history of North America. The Ancestral Puebloan granaries at Nankoweap Creek tell archaeologists about more recent human history. (By Drenaline, licensed under CC BY-SA 3.0)
• Not shown: computational archaeologists reconstructing past climate from multiple tree-ring databases ⇒ computational provenance is key for transparency & reproducibility
Ludäscher:Workflows&Provenance=>Understanding 30
... and provenance is: Understanding what happened!
Zrzavý, Jan, David Storch, and Stanislav Mihulka. Evolution: Ein Lese-Lehrbuch. Springer-Verlag, 2009.
Author: Jkwchui (Based on drawing by Truth-seeker2004)
Computational Provenance …
• Origin, processing history of artifacts
  – data products, figures, ...
  – also: underlying workflow ⇒ understand methods, dataflow, and dependencies
Climate Change Impacts in the United States
U.S. National Climate Assessment, U.S. Global Change Research Program
YesWorkflow: Prospective & Retrospective Provenance … (almost) for free!
• YW annotations in a (Python, R, …) script recreate a workflow view from the script…
[Figure: YW workflow view recreated from an annotated data-collection script. Steps: initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, log_average_image_intensity. Data include cassette_id, sample_score_cutoff, sample_spreadsheet (file:cassette_{cassette_id}_spreadsheet.csv), calibration_image (file:calibration.img), run_log (file:run/run_log.txt), sample_name, sample_quality, rejected_sample, accepted_sample, num_images, energies, rejection_log (file:/run/rejected_samples.txt), sample_id, energy, frame_number, raw_image (file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw), corrected_image (file:data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img), total_intensity, pixel_count, corrected_image_path, collection_log (file:run/collected_images.csv)]
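To illustrate how comments alone can yield such a workflow view, here is a toy parser. It is a minimal sketch under stated assumptions, not the YesWorkflow tool: it reads only the first `@BEGIN`/`@END`/`@IN`/`@OUT` tag on each comment line, treats every block nested inside the outermost one as a step, ignores `@URI` and `@LOG`, and prints a Graphviz DOT graph. The annotated script and its step and data names are hypothetical.

```python
import re

# Matches the first YW tag on a line, e.g. "# @IN sample_spreadsheet @URI file:samples.csv".
YW_TAG = re.compile(r"@(BEGIN|END|IN|OUT)\s+(\S+)")

# Hypothetical annotated script; only its comments matter to the parser.
SCRIPT = """
# @BEGIN main
# @IN sample_spreadsheet @URI file:samples.csv
# @OUT run_log
# @BEGIN load_samples
# @IN sample_spreadsheet
# @OUT sample_name
# @END load_samples
# @BEGIN log_samples
# @IN sample_name
# @OUT run_log @URI file:run/run_log.txt
# @END log_samples
# @END main
"""


def parse_steps(script):
    """Map each nested @BEGIN block (a step) to its (inputs, outputs)."""
    stack, steps = [], {}
    for line in script.splitlines():
        match = YW_TAG.search(line)
        if not match:
            continue
        tag, name = match.groups()
        if tag == "BEGIN":
            stack.append(name)
            if len(stack) > 1:  # the outermost block stands for the whole script
                steps[name] = ([], [])
        elif tag == "END":
            stack.pop()
        elif len(stack) > 1:  # @IN/@OUT belong to the innermost open step
            inputs, outputs = steps[stack[-1]]
            (inputs if tag == "IN" else outputs).append(name)
    return steps


def to_dot(steps):
    """Render steps (boxes) and data (default nodes) as a left-to-right digraph."""
    lines = ["digraph yw {", "  rankdir=LR;"]
    for step, (inputs, outputs) in steps.items():
        lines.append(f'  "{step}" [shape=box];')
        lines.extend(f'  "{data}" -> "{step}";' for data in inputs)
        lines.extend(f'  "{step}" -> "{data}";' for data in outputs)
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(parse_steps(SCRIPT)))
```

Piping the printed text into Graphviz (for example `dot -Tpng -o yw.png`) draws data nodes feeding box-shaped steps.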
YW annotations: @BEGIN .. @END .. @IN .. @OUT .. @URI .. @LOG ..
[Figure: YW view of the paleoclimate reconstruction script, with nodes GetModernClimate, PRISM_annual_growing_season_precipitation, SubsetAllData, dendro_series_for_calibration, dendro_series_for_reconstruction, CAR_Analysis_unique, cellwise_unique_selected_linear_models, CAR_Analysis_union, cellwise_union_selected_linear_models, CAR_Reconstruction_union, raster_brick_spatial_reconstruction, raster_brick_spatial_reconstruction_errors, CAR_Reconstruction_union_output, ZuniCibola_PRISM_grow_prcp_ols_loocv_union_recons.tif, ZuniCibola_PRISM_grow_prcp_ols_loocv_union_errors.tif, plus master_data_directory, prism_directory, tree_ring_data, calibration_years, retrodiction_years]
Paleoclimate Reconstruction (openSKOPE.org)
• … explained using YesWorkflow!
Kyle B., (computational) archaeologist: "It took me about 20 minutes to comment. Less than an hour to learn and YW-annotate, all told."
DwCA TaxonLookupWorkflow
• Declare inputs, outputs, and steps of a script (or wf) with YW annotations to...
  – communicate provenance graphically (via graphviz)
  – combine different forms of provenance
  – query provenance
• Simple YW annotations in comments:
  – @BEGIN Step, @END Step
  – @IN Data, @OUT Data
  – @URI Template, @LOG Pattern
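As a minimal sketch of what such annotations look like in a real script, here is a Python version loosely modeled on the marine/non-marine routing in the taxon lookup story. The step names, the record fields (`name`, `is_marine`) and both lookup functions are hypothetical stand-ins: they return canned values and do not call the real GBIF or WoRMS services. Only the `@BEGIN`/`@END`/`@IN`/`@OUT` tags follow the YW conventions listed above.

```python
# @BEGIN taxon_lookup
# @IN input_records
# @OUT validated_records


def lookup_worms(name):
    """Hypothetical stub standing in for a WoRMS name lookup (no network call)."""
    return {"name": name, "authority": "WoRMS (stub)"}


def lookup_gbif(name):
    """Hypothetical stub standing in for a GBIF name lookup (no network call)."""
    return {"name": name, "authority": "GBIF (stub)"}


def taxon_lookup(input_records):
    # @BEGIN split_by_habitat
    # @IN input_records
    # @OUT marine_records
    # @OUT non_marine_records
    marine_records = [r for r in input_records if r["is_marine"]]
    non_marine_records = [r for r in input_records if not r["is_marine"]]
    # @END split_by_habitat

    # @BEGIN lookup_worms_names
    # @IN marine_records
    # @OUT worms_matches
    worms_matches = [lookup_worms(r["name"]) for r in marine_records]
    # @END lookup_worms_names

    # @BEGIN lookup_gbif_names
    # @IN non_marine_records
    # @OUT gbif_matches
    gbif_matches = [lookup_gbif(r["name"]) for r in non_marine_records]
    # @END lookup_gbif_names

    # @BEGIN merge_matches
    # @IN worms_matches
    # @IN gbif_matches
    # @OUT validated_records
    validated_records = worms_matches + gbif_matches
    # @END merge_matches
    return validated_records


if __name__ == "__main__":
    records = [
        {"name": "Mola mola", "is_marine": True},
        {"name": "Quercus alba", "is_marine": False},
    ]
    print(taxon_lookup(records))

# @END taxon_lookup
```

Since the tags live entirely in comments, the script runs unchanged; it is a YW tool (github.com/yesworkflow-org/yw) reading those comments that turns them into a graph.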
The story of two individual records
• One took the GBIF route, while…
• … the other went all WORMS!
Non-Marine? ⇒ GBIF
Marine? ⇒ WORMS
The aggregate story..
• How many records were observed as inputs or outputs of workflow steps?
• Were there any NULL values? How many?
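Questions like these amount to simple queries over retrospective provenance. A minimal sketch, assuming a hypothetical flat log of `(step, direction, record_id, value)` rows with `None` standing for NULL; real YW retrospective provenance is reconstructed from the files and log lines matched by `@URI` and `@LOG`, which this sketch does not model. Step names and values are invented for illustration.

```python
from collections import Counter

# Hypothetical retrospective provenance log: (step, direction, record_id, value).
LOG = [
    ("lookup_gbif_names", "in", "r1", "Quercus alba"),
    ("lookup_gbif_names", "out", "r1", "Quercus alba L."),
    ("lookup_worms_names", "in", "r2", "Mola mola"),
    ("lookup_worms_names", "out", "r2", None),
]


def records_per_step(log):
    """Count distinct records observed per (step, direction) pair."""
    seen = {(step, direction, rid) for step, direction, rid, _ in log}
    return Counter((step, direction) for step, direction, _ in seen)


def null_values(log):
    """Return (step, direction, record_id) for every logged NULL (None) value."""
    return [(step, direction, rid) for step, direction, rid, value in log if value is None]


if __name__ == "__main__":
    print(records_per_step(LOG))
    print(null_values(LOG))
```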
YesWorkflow Summary
• Lightweight YW annotations can be added easily to your scripts to reap workflow benefits:
  – Documentation of what’s important
  – Visualization of dependencies
  – Querying provenance (prospective, retrospective, and hybrid)
  – Independent of system or language used (R, Python, MATLAB, workflow tools, …)
⇒ make provenance actionable
⇒ provenance for self!
=> github.com/yesworkflow-org/yw
=> try.yesworkflow.org
Demo Time
(Disclaimer)
https://github.com/idaks/dataone-ahm-2016-poster
https://github.com/idaks/wt-prov-summer-2017
https://github.com/yesworkflow-org/yw-idcc-17
Adding YesWorkflow to DataONE
Yaxing’s script with inputs & output products
Christopher’s YesWorkflow model
Christopher using Yaxing’s outputs as inputs for his script
Christopher’s results can be traced back all the way to Yaxing’s input
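Tracing Christopher's results back to Yaxing's input is a backward reachability question over the combined provenance graph. A minimal sketch, assuming the two scripts' models have already been merged into one hypothetical edge list in which Christopher's input uses the same identifier as Yaxing's output; all node names are invented for illustration.

```python
from collections import deque

# Hypothetical lineage edges (upstream -> downstream) pooled from two YW models.
# The join works only because Christopher's input id equals Yaxing's output id.
EDGES = [
    ("yaxing/raw_data.csv", "yaxing/clean_data_step"),
    ("yaxing/clean_data_step", "yaxing/clean_data.csv"),
    ("yaxing/clean_data.csv", "christopher/fit_model_step"),
    ("christopher/fit_model_step", "christopher/result_plot.png"),
]


def upstream(target, edges):
    """Return every node reachable backwards from target (breadth-first)."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(upstream("christopher/result_plot.png", EDGES)))
```

The four nodes returned include `yaxing/raw_data.csv`, i.e. Christopher's plot depends on Yaxing's original input.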
Yi-Yun Cheng¹, Nico Franz², Jodi Schneider¹, Shizhuo Yu³, Thomas Rodenhausen⁴, Bertram Ludäscher¹
¹ School of Information Sciences, University of Illinois at Urbana-Champaign; ² School of Life Sciences, Arizona State University; ³ Department of Computer Science, University of California at Davis; ⁴ School of Information, University of Arizona
Agreeing to Disagree: Reconciling Conflicting Taxonomic Views using a Logic-based Approach
Acknowledgments
Support of the authors’ research through the National Science Foundation is kindly acknowledged (DEB-1155984, DBI-1342595, and DBI-1643002). The authors thank Professor Kathryn La Barre for her comments and suggestions. We would also like to thank Dr. Laetitia Navarro and Jeff Terstriep for help with creating map overlays in QGIS.
CONCLUSION
• Our logic-based taxonomy alignment approach can be used to solve crosswalking issues: we will be able to mitigate the membership condition problems that occur in equivalent crosswalking.
• The RCC-5 approach preserves the original taxonomies while providing an alignment view: we can solve data integration problems that arise in the more coarse-grained relative crosswalking, which is otherwise subject to information loss.
• Our study also underscores the benefits of designing different alignment workflows (bottom-up vs. top-down) to match the needs of specific taxonomy alignment problems.
  – Bottom-up approach: seems to work well whenever we have non-overlapping relationships at the leaf-level (lowest-level) articulations, and we are not sure how the higher-level concepts should be aligned.
  – Top-down approach: seems favorable when there is an expectation of certain higher-level articulations in conjunction with under-specified, complex, and often overlapping leaf-level relations.
RELATED WORK
• Taxonomy Alignment Problems (TAP): Taxonomies T1, T2 are inter-linked via a set of input articulations A, defined as RCC-5 relations, to yield a “merged” taxonomy T3.
• Euler/X Articulations: a constraint or rule that defines a relationship (a set constraint) between two concepts from different taxonomies.
Region Connection Calculus (RCC-5)
• Possible Worlds: When encoding and solving TAPs via ASP (Answer Set Programming), the different answer sets represent alternative taxonomy merge solutions, or possible worlds (PWs).
INTRODUCTION
Tina: Hey Amy, can you recommend a signature dish from where you live?
Amy: Oh, definitely the half-smokes from the Northeast! They are these tasty half-pork and half-beef sausages.
Tina: What a coincidence! We have half-smokes in the South, too! Where do you live in the Northeast? New York? Boston?
Amy: Wrong guesses! Where do you live in the South?
Tina and Amy together: Washington, D.C.
[The two of them look at each other, confused.]
“In the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required…” (Bowker & Star, 2000).
CASE 1 RESULTS: CEN vs. NDC
• State-level alignments are all congruent (bottom-up)
• Inferred new articulations for regional-level alignments
CASE 2 RESULTS: CEN vs. TZ
Figure 3. (Left) CEN-NDC taxonomy alignment problem with 49 input articulations between T_CEN and T_NDC
Figure 4. (Right) The unique possible world (PW) T3 reconciling T_CEN and T_NDC via inferred relationships
Figure 1. National Diversity Council map (NDC) vs. Census Bureau map (CEN)
• GitHub link: https://github.com/EulerProject/ASIST17
• Email: [email protected]
[Figure 1 map labels: NDC regions West, Southwest, Southeast, Midwest, Northeast; CEN regions West, South, Midwest, Northeast; time zones (TZ) Pacific, Mountain, Central, Eastern]
RESEARCH DESIGN
Step 1. Supply input taxonomies T1 and T2
Step 2. Formulate RCC-5 articulations between T1 and T2
Step 3. Iteratively edit articulations in Euler/X
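[Figure: the five RCC-5 relations: Congruence X == Y; Inclusion X > Y; Inverse Inclusion X < Y; Overlap X >< Y; Disjointness X ! Y]
[Figure 2 flow: T1, T2 and articulations A go into Euler/X, which computes N possible worlds; N=1 yields the merged taxonomy T3; N=0 (inconsistent) or N>1 (ambiguous) means articulations A must be added or edited]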
[Figure 10 graph: combined concepts for T_CEN and T_TZ; nodes: CEN 4, newComb 18, comb 1, TZ 4; edges: input 6, inferred 37]
[Figure 3 graph: 49 state-level input articulations (CEN.XX == NDC.XX); nodes: CEN 54, NDC 55; edges: isa_CEN 53, isa_NDC 54, Art. 49]
[Figure 4 graph: unique PW for T_CEN and T_NDC; nodes: CEN 3, NDC 4, comb 51; edges: input 61, inferred 3, overlaps inferred 3]
[Figure 5 graph: top-down input articulations between T_CEN and T_TZ; nodes: CEN 5, TZ 5; edges: isa_CEN 4, isa_TZ 4, Art. 12]
[Figure 6 graph: unique PW for T_CEN and T_TZ; nodes: CEN 4, comb 1, TZ 4; edges: input 7, overlaps input 6, overlaps inferred 1, inferred 1]
[Map regions labeled R1..R9]
Figure 2. The process of aligning taxonomies T1 and T2 with Euler/X
Figure 5. Top-down input alignments between T_CEN and T_TZ
Figure 6. The unique PW for the T_CEN with T_TZ alignment
Figure 10. Combined concepts solution for T_CEN and T_TZ
taxonomy CEN Census_Regions
(USA Northeast Midwest South West)
(Northeast CT MA ME NH NJ NY PA RI VT)
(Midwest IL IN IA KS MI MN MO NE ND OH SD WI)
(South AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV)
(West AZ CA CO ID MT NV NM OR UT WA WY)
taxonomy NDC National_Diversity_Council
(USA Midwest Northeast Southeast Southwest West)
(Northeast CT DC DE MD MA ME NH NJ NY PA RI VT)
(Midwest IA IL IN KS MI MN MO ND NE OH SD WI)
(Southeast AL AR FL GA KY LA MS NC SC TN VA WV)
(Southwest AZ NM OK TX)
(West CA CO ID MT NV OR WA WY UT)
articulations CEN NDC
[CEN.AL equals NDC.AL]
[CEN.AR equals NDC.AR]
[CEN.AZ equals NDC.AZ]
[CEN.CA equals NDC.CA]
[CEN.CO equals NDC.CO]
[CEN.CT equals NDC.CT]
[CEN.DC equals NDC.DC]
[CEN.DE equals NDC.DE]
[CEN.FL equals NDC.FL]
[CEN.GA equals NDC.GA]
[CEN.IA equals NDC.IA]
[CEN.ID equals NDC.ID]
[CEN.IL equals NDC.IL]
[CEN.IN equals NDC.IN]
[CEN.KS equals NDC.KS]
[CEN.KY equals NDC.KY]
[CEN.LA equals NDC.LA]
[CEN.MA equals NDC.MA]
[CEN.MD equals NDC.MD]
[CEN.ME equals NDC.ME]
[CEN.MI equals NDC.MI]
[CEN.MN equals NDC.MN]
...
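The taxonomy input format nests each parent with its children in parentheses. A minimal parsing sketch, assuming exactly that shape (a `taxonomy PREFIX Label` header followed by `(parent child …)` groups, here kept on one line); it is an illustration, not the Euler/X parser, and the function names are hypothetical.

```python
import re

# One-line taxonomy definitions, copied from the CEN and NDC inputs.
CEN = ("taxonomy CEN Census_Regions(USA Northeast Midwest South West)"
       "(Northeast CT MA ME NH NJ NY PA RI VT)"
       "(Midwest IL IN IA KS MI MN MO NE ND OH SD WI)"
       "(South AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV)"
       "(West AZ CA CO ID MT NV NM OR UT WA WY)")
NDC = ("taxonomy NDC National_Diversity_Council(USA Midwest Northeast Southeast Southwest West)"
       "(Northeast CT DC DE MD MA ME NH NJ NY PA RI VT)"
       "(Midwest IA IL IN KS MI MN MO ND NE OH SD WI)"
       "(Southeast AL AR FL GA KY LA MS NC SC TN VA WV)"
       "(Southwest AZ NM OK TX)"
       "(West CA CO ID MT NV OR WA WY UT)")


def parse_taxonomy(text):
    """Return (prefix, {parent: [children]}) for 'taxonomy PREFIX Label(...)(...)'."""
    header = text.partition("(")[0]
    _, prefix, _label = header.split()
    children = {}
    for group in re.findall(r"\(([^()]*)\)", text):
        parent, *kids = group.split()
        children[parent] = kids
    return prefix, children


def leaves(children, node):
    """Collect the leaf concepts (here: states and DC) below node."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    return set().union(*(leaves(children, kid) for kid in kids))


if __name__ == "__main__":
    for text in (CEN, NDC):
        prefix, tree = parse_taxonomy(text)
        print(prefix, len(leaves(tree, "USA")), "leaves")
```

Both taxonomies come out with the same 49 leaves (48 states plus DC), matching the 49 state-level articulations.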
taxonomy CEN Census_Regions
(USA Midwest South West Northeast)
taxonomy TZ Time_Zone
(USA Pacific Mountain Central Eastern)
articulations CEN TZ
[CEN.Midwest disjoint TZ.Pacific]
[CEN.Midwest overlaps TZ.Eastern]
[CEN.Midwest overlaps TZ.Mountain]
[CEN.Northeast is_included_in TZ.Eastern]
[CEN.South disjoint TZ.Pacific]
[CEN.South overlaps TZ.Central]
[CEN.South overlaps TZ.Eastern]
[CEN.South overlaps TZ.Mountain]
[CEN.USA equals TZ.USA]
[CEN.West disjoint TZ.Central]
[CEN.West disjoint TZ.Eastern]
[CEN.West overlaps TZ.Mountain]
For another time? Non-unitary syntheses of systematic knowledge
Nico Franz
School of Life Sciences, Arizona State University
CIRSS Seminar – Center for Informatics Research in Science and Scholarship
February 17, 2017 – iSchool, University of Illinois Urbana-Champaign
@ http://www.slideshare.net/taxonbytes/franz-2017-uiuc-cirss-non-unitary-syntheses-of-systematic-knowledge
Tracing taxonomic names (concepts!) over time…
Taxonomic concept alignment, Andropogon glomeratus-virginicus complex, spanning across 11 classifications authored 1889-2015
• 36 unique taxonomic names
• 88 taxonomic concept labels ⇒ name sec. author strings
• Alignment by A.S. Weakley ⇒ row position = congruence
• 1/36 names with unique 1 : 1 name : meaning cardinality across all classifications
• Andropogon virginicus
• Source: Franz et al. 2016¹
¹ Franz et al. 2016. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web Journal (IOS). doi:10.3233/SW-160220
http://taxonbytes.org/wp-content/uploads/2014/10/Peet-BIGCB-2014-Changing-Perspectives-on-Plant-Distributions.pdf
Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
"Taxonomic concept labels" identify input concept regions
RCC-5 articulations provided for each species-level concept
• Input visualization: MSW3 (2005) versus MSW2 (1993)
Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
• Alignment visualization: "grey means taxonomically congruent"
One name & congruent region
Many names & congruent region
One name & non-congruent regions
Many names & non-congruent regions
New names & exclusive regions
• Application of coverage constraint: parent-to-parent articulations (><) are fully defined by the alignment signal propagated from their respective children.
⇒ Sensible when complete sampling of children is intended.
1 in 3 names is unreliable across MSW2/MSW3 classifications
Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
[Figure labels: The 'consensus'; The 'bible'; The (formerly) federal 'standard'; The 'best', latest regional flora; "Just bad"]
"Controlling the taxonomic variable"
Expert views are in conflict
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
Impact: Name-based aggregation has created a novel synthesis that nobody believes in
Expert views are reconciled
Solution: Instead of aggregating an artificial 'consensus', build translation services
Leaving taxon and species headaches…
• To illustrate Euler, think of a simpler use case:
• Agreeing to disagree!
• … when there are multiple, legitimate perspectives
• Sorting things out!
  – Euler as a taxon concept (& name) “microscope”...
  – .. or “time machine”?
Two Taxonomies: NDC vs CEN
“…in the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required” [Bowker & Star, 2000, p.159]
[Figure: National Diversity Council map (NDC), regions West, Southwest, Southeast, Midwest, Northeast; US Census Bureau map (CEN), regions West, South, Midwest, Northeast]
Source: Yi-Yun (Jessica) Cheng (PhD student, iSchool @Illinois)
The taxonomies
• The Census Regions Map (CEN) consists of four regions: West, Midwest, Northeast, and South, together covering the contiguous 48 states and Washington D.C.
• The National Diversity Council Map (NDC) consists of five regions: West, Southwest, Midwest, Northeast, and Southeast, covering the same 48 states and Washington D.C.
[Figure: NDC map (with states): West, Southwest, Southeast, Midwest, Northeast]
• NDC splits South into SW and SE
• Do NDC and CEN agree on “West”? “Midwest”? …
• How can we sort this out?
Sorting things out…
[Figure: input taxonomies T_CEN (nodes: CEN 5; edges: is_a (CEN) 4) and T_NDC (nodes: NDC 6; edges: is_a (NDC) 5), shown alone and again linked by 9 articulations]
• Given:
  – taxonomies T1, T2
  – and relations T1 ~ T2 (articulations, alignment)
• Find:
  – merged taxonomy T3
• Such that:
  – T1, T2 are preserved
  – all pairwise relations are explicit
5 ways to relate concepts (regions)
• Idea: relate concepts X and Y with articulations
• Articulation language: Region Connection Calculus (RCC-5): congruence, inclusion, inverse inclusion, overlap, disjointness
  – Congruence: X == Y
  – Inclusion: X > Y
  – Inverse Inclusion: X < Y
  – Overlap: X >< Y
  – Disjointness: X ! Y
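Read extensionally, taking each concept as the non-empty set of its members (for example, the states in a region), the five articulations have a simple set-theoretic reading. This is a simplification for illustration, not a definition taken from Euler/X.

```latex
\begin{align*}
X \mathrel{==} Y &\iff X = Y \\
X \mathrel{>} Y  &\iff Y \subsetneq X \\
X \mathrel{<} Y  &\iff X \subsetneq Y \\
X \mathrel{><} Y &\iff X \cap Y \neq \emptyset,\ X \not\subseteq Y,\ Y \not\subseteq X \\
X \mathrel{!} Y  &\iff X \cap Y = \emptyset
\end{align*}
```

For non-empty X and Y exactly one of the five holds. For instance, CEN.Northeast < NDC.Northeast: the NDC Northeast contains all nine CEN Northeast states plus DC, DE, and MD.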
Merged taxonomy T3
[Figure: merged taxonomy T3 (nodes: CEN 3, NDC 4, congruent 2; edges: is_a (input) 8, overlaps (input) 3), obtained from T1, T2 and the articulations T1~T2]
How we align two taxonomies T1 and T2
• Step 1. Supply input taxonomies T1 and T2
• Step 2. Describe the relationships between T1 and T2
• Step 3. Iteratively edit articulations in Euler/X
[Figure: Euler/X takes T1, T2 and articulations A and computes N possible worlds; N=1 yields the merged taxonomy T3; N=0 (inconsistent) or N>1 (ambiguous) sends the user back to add/edit articulations A]
• … but where do the articulations come from??
  – expert opinion
  – automatically derived from data
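Euler/X obtains the possible worlds by encoding the alignment problem in ASP. A much cruder brute-force stand-in (not the Euler/X method) shows why N can be 0, 1, or larger. It assumes two flat taxonomies whose roots are congruent and whose children partition the root, so a world is just the choice of which cells (T1 leaf, T2 leaf) are non-empty; configurations that satisfy the articulations are kept and their distinct RCC-5 outcomes counted. Concept names are placeholders, and the enumeration is exponential in the number of cells.

```python
from itertools import combinations, product


def rcc5(x, y):
    """RCC-5 relation between two finite sets, in Euler/X-style symbols."""
    if x == y:
        return "=="
    if y < x:  # x properly includes y
        return ">"
    if x < y:  # x is properly included in y
        return "<"
    return "><" if x & y else "!"


def possible_worlds(leaves1, leaves2, articulations):
    """Distinct RCC-5 outcomes over all leaf pairs that satisfy the articulations.

    Assumes congruent roots and leaves that partition each root, so a world is
    the set of non-empty cells (leaf1, leaf2). Leaf names must be distinct.
    """
    cells = list(product(leaves1, leaves2))
    worlds = set()
    for size in range(len(cells) + 1):
        for chosen in combinations(cells, size):
            extent = {a: frozenset(c for c in chosen if c[0] == a) for a in leaves1}
            extent.update({b: frozenset(c for c in chosen if c[1] == b) for b in leaves2})
            if not all(extent.values()):  # every concept must be non-empty
                continue
            if all(rcc5(extent[x], extent[y]) == rel for x, rel, y in articulations):
                worlds.add(tuple(rcc5(extent[a], extent[b]) for a in leaves1 for b in leaves2))
    return worlds


def classify(n):
    """Interpret the number of possible worlds N as in the Euler/X workflow."""
    if n == 0:
        return "inconsistent"
    return "unique" if n == 1 else "ambiguous"


if __name__ == "__main__":
    t1, t2 = ["T1.a", "T1.b"], ["T2.a", "T2.b"]
    for arts in ([], [("T1.a", "==", "T2.a")], [("T1.a", "==", "T2.a"), ("T1.a", "!", "T2.a")]):
        n = len(possible_worlds(t1, t2, arts))
        print(n, classify(n))
```

With the placeholder names above the loop prints `7 ambiguous`, `1 unique` and `0 inconsistent`: without articulations the leaves can line up in seven ways, a single congruence pins down the rest, and contradictory articulations leave no world at all.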
Case 1: Census Region vs. National Diversity Council
[Figure: CEN map (West, South, Midwest, Northeast) vs. NDC map with states (West, Southwest, Southeast, Midwest, Northeast)]
• … but where do the articulations come from??
  – automatically derived from data
  – expert input
[Figure: input alignment of T_CEN and T_NDC with 49 state-level articulations (CEN.XX == NDC.XX); nodes: CEN 54, NDC 55; edges: isa_CEN 53, isa_NDC 54, Art. 49]
[Figure: resulting merged view of T_CEN and T_NDC; nodes: CEN 3, NDC 4, comb 51; edges: input 61, inferred 3, overlaps inferred 3]
USA, Midwest, and state-level alignments are all congruent
The overlapping relations are automatically derived from data
[Figure: CEN/NDC alignment graph, shown in two views. Nodes: CEN 3, NDC 4, comb 51. Edges: input 61, inferred 3, overlaps inferred 3.]
DC is in both the South and the Northeast
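The overlap on this slide can be reproduced mechanically from membership data. A minimal sketch, assuming each region is represented extensionally as a set of state codes: the RCC-5 relation (written ==, <, >, ><, ! as in the alignment graphs) then follows from plain set comparison. The function name `rcc5` and the membership lists are hypothetical and deliberately partial; they are not the actual CEN or NDC region definitions.

```python
def rcc5(a: set, b: set) -> str:
    """Return the RCC-5 relation between two non-empty extensions a and b."""
    if not a or not b:
        raise ValueError("this sketch only handles non-empty regions")
    if a == b:
        return "=="  # congruent
    if a < b:
        return "<"   # a is properly included in b
    if a > b:
        return ">"   # a properly includes b
    if a & b:
        return "><"  # partial overlap: each side has shared and unshared members
    return "!"       # disjoint


# Hypothetical, partial membership lists for illustration only.
cen_south = {"DC", "MD", "TX"}
ndc_northeast = {"DC", "NY", "PA"}
cen_northeast = {"NY", "PA"}

print(rcc5(cen_south, ndc_northeast))      # ><  (DC is the only shared member)
print(rcc5(cen_northeast, ndc_northeast))  # <
print(rcc5(cen_south, cen_northeast))      # !
```

A single shared member such as DC is enough to turn what would otherwise be a disjointness into an overlap, which is why one state can change the higher-level alignment.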
Case 2: Census Region vs Time Zone
[Figure: maps of the census regions (CEN: West, South, Midwest, Northeast) and the time zones (TZ: Pacific, Mountain, Central, Eastern)]
• … but where do the articulations come from??
  – automatically derived from data
  – expert input
[Figure: input graph (Nodes: CEN 5, TZ 5; Edges: isa_CEN 4, isa_TZ 4, Art. 12) and possible-world output graph (Nodes: CEN 4, comb 1, TZ 4; Edges: input 7, overlaps input 6, overlaps inferred 1, inferred 1)]
Input / Output: Possible World
Top-down regional alignment
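The top-down alignment takes the two taxonomies plus expert articulations as input and reports what remains possible. The alignment tool itself relies on a logic-based reasoner; the sketch below is only a brute-force illustration of the same idea, assuming every concept is a non-empty region and a world is fixed by which Venn-diagram cells of the concepts are non-empty. The function names `relation` and `possible_worlds` and the concepts A, B, C are hypothetical.

```python
from itertools import combinations, product


def relation(cells, x, y):
    """RCC-5 relation between concepts x and y in one world.

    A world is given by the list of its non-empty Venn cells; each cell is
    the frozenset of concept names that contain it.
    """
    both = any(x in c and y in c for c in cells)
    only_x = any(x in c and y not in c for c in cells)
    only_y = any(y in c and x not in c for c in cells)
    if not both:
        return "!"
    if only_x and only_y:
        return "><"
    if only_x:
        return ">"
    if only_y:
        return "<"
    return "=="


def possible_worlds(concepts, articulations):
    """Yield every choice of non-empty Venn cells that satisfies the articulations.

    articulations maps a pair (x, y) to the set of RCC-5 relations allowed
    between x and y; a set with several members is a disjunctive articulation.
    """
    all_cells = [frozenset(s) for k in range(1, len(concepts) + 1)
                 for s in combinations(concepts, k)]
    for mask in product((False, True), repeat=len(all_cells)):
        cells = [c for c, keep in zip(all_cells, mask) if keep]
        # every concept must denote a non-empty region
        if not all(any(name in c for c in cells) for name in concepts):
            continue
        if all(relation(cells, x, y) in allowed
               for (x, y), allowed in articulations.items()):
            yield cells


# Hypothetical input: A < B and B ! C; the pair (A, C) is not articulated.
worlds = possible_worlds(["A", "B", "C"], {("A", "B"): {"<"}, ("B", "C"): {"!"}})
print({relation(w, "A", "C") for w in worlds})  # {'!'}

# Hypothetical input: A < B and A < C leave the B-C relation open.
worlds = possible_worlds(["A", "B", "C"], {("A", "B"): {"<"}, ("A", "C"): {"<"}})
print(sorted({relation(w, "B", "C") for w in worlds}))  # ['<', '==', '>', '><']
```

The sketch reports which relations remain possible for an unarticulated pair rather than counting worlds: different cell sets can induce the same RCC-5 relations, so the number of yielded cell sets need not match the number of possible worlds a reasoner reports. Parent-child (isa) edges could be added as extra "<" articulations; sibling-disjointness and coverage constraints are not modeled here.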
How do we know if our ‘expert articulations’ are correct?
[Figure: map with regions labeled R1–R9]
GIS solution as the ground truth
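With a GIS overlay as ground truth, each concept becomes a set of the atomic regions R1–R9, and every expert articulation can be checked by recomputing the relation from those sets. A minimal sketch, assuming articulations are given as sets of allowed RCC-5 relations; the helper names and the cell assignments below are hypothetical and do not reproduce the map on the slide.

```python
def rcc5(a, b):
    """RCC-5 relation between two non-empty sets of atomic regions."""
    if a == b:
        return "=="
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "><" if a & b else "!"


def check_articulations(extension, expert):
    """Return {(x, y): actual_relation} for every expert articulation
    that the ground truth contradicts.

    extension maps a concept to the set of atomic regions it covers;
    expert maps a pair (x, y) to the set of RCC-5 relations the expert allows.
    """
    mismatches = {}
    for (x, y), allowed in expert.items():
        actual = rcc5(extension[x], extension[y])
        if actual not in allowed:
            mismatches[(x, y)] = actual
    return mismatches


# Hypothetical cell assignments and expert input, for illustration only.
extension = {
    "CEN.West": {"R1", "R2"},
    "TZ.Pacific": {"R1"},
    "TZ.Mountain": {"R2", "R3"},
}
expert = {
    ("TZ.Pacific", "CEN.West"): {"<"},        # agrees with the overlay
    ("CEN.West", "TZ.Mountain"): {"!", "<"},  # disjunction; neither option holds
}
print(check_articulations(extension, expert))
# {('CEN.West', 'TZ.Mountain'): '><'}
```

An empty result means every expert articulation is consistent with the overlay; each reported pair shows the relation the ground truth actually supports.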
[Figure: combined-concept graph linked to regions R1–R9, with nodes such as CEN.Midwest*TZ.Central and TZ.Central\CEN.Midwest. Nodes: CEN 4, newComb 18, comb 1, TZ 4. Edges: input 6, inferred 37.]
Combined concepts solution for regional-level alignments
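The combined concepts in the figure are read here as A*B for the part shared by two concepts and A\B for the part of A outside B. A minimal sketch of how such nodes can be generated from extensions, assuming concepts are sets of atomic regions and that only partially overlapping (><) pairs are split; this is an illustration, not the tool's merge algorithm, and the function name and data are hypothetical.

```python
def combined_concepts(left, right):
    """Split every partially overlapping pair into its three non-empty parts.

    left and right map concept names to sets of atomic regions.
    Returns a dict from combined-concept label to the regions it covers.
    """
    parts = {}
    for a, ext_a in left.items():
        for b, ext_b in right.items():
            shared = ext_a & ext_b
            if not shared or ext_a <= ext_b or ext_b <= ext_a:
                continue  # disjoint, congruent or nested pairs are not split here
            parts[f"{a}*{b}"] = shared
            parts[f"{a}\\{b}"] = ext_a - ext_b
            parts[f"{b}\\{a}"] = ext_b - ext_a
    return parts


# Hypothetical extensions over atomic regions, for illustration only.
cen = {"CEN.South": {"R1", "R2"}, "CEN.Midwest": {"R3", "R4"}}
tz = {"TZ.Central": {"R2", "R3"}}

for label, regions in sorted(combined_concepts(cen, tz).items()):
    print(label, sorted(regions))
# CEN.Midwest*TZ.Central ['R3']
# CEN.Midwest\TZ.Central ['R4']
# CEN.South*TZ.Central ['R2']
# CEN.South\TZ.Central ['R1']
# TZ.Central\CEN.Midwest ['R2']
# TZ.Central\CEN.South ['R3']
```

Differences are taken pairwise, so TZ.Central\CEN.Midwest still contains the region shared with CEN.South; a full merge would refine such parts further until they no longer overlap.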
Do the taxonomies have to be spatial in order to use RCC-5?
• No! The more typical cases for taxonomy alignment are between non-spatial taxonomies
  – for which no “GIS route” or direct visual cues about regional extensions are available
  – the use of RCC-5 as an alignment vocabulary is a suitable approach for a wide range of multi-hierarchy reconciliations
Conclusion & Discussion
• Underscores the benefits of designing different alignment workflows (bottom-up vs. top-down)
  – Bottom-up: when the lowest-level articulations are non-overlapping but it is not clear how to align the higher-level concepts
  – Top-down: when leaf-level relations often overlap; expert input will frequently be needed to establish such expectations under the top-down approach
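In the bottom-up workflow only the lowest-level (leaf) articulations are asserted, and the relations between higher-level concepts are left to be inferred. A minimal sketch of that inference, assuming every leaf has exactly one congruent (==) partner in the other taxonomy and that each parent covers exactly its children: a parent's extension is then the set of leaves beneath it, and higher-level relations follow from set comparison. The taxonomies, helper names and leaf assignments are hypothetical toy data.

```python
def rcc5(a, b):
    """RCC-5 relation between two non-empty sets."""
    if a == b:
        return "=="
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "><" if a & b else "!"


def leaves(tree, node):
    """Leaves below node in a parent -> children mapping (a leaf is its own leaf set)."""
    children = tree.get(node, [])
    if not children:
        return {node}
    return set().union(*(leaves(tree, child) for child in children))


def bottom_up(left_tree, right_tree, leaf_map):
    """Infer relations between internal nodes from leaf-level congruences.

    leaf_map sends each left leaf to the right leaf it is congruent with.
    """
    relations = {}
    for a in left_tree:  # internal nodes are the keys of the mapping
        ext_a = {leaf_map[leaf] for leaf in leaves(left_tree, a)}
        for b in right_tree:
            relations[(a, b)] = rcc5(ext_a, leaves(right_tree, b))
    return relations


# Hypothetical toy taxonomies, not the real CEN/NDC hierarchies.
cen = {"CEN.USA": ["CEN.Northeast", "CEN.South"],
       "CEN.Northeast": ["CEN.NY", "CEN.PA"],
       "CEN.South": ["CEN.DC", "CEN.TX"]}
ndc = {"NDC.USA": ["NDC.Northeast", "NDC.Southwest"],
       "NDC.Northeast": ["NDC.NY", "NDC.PA", "NDC.DC"],
       "NDC.Southwest": ["NDC.TX"]}
leaf_map = {f"CEN.{s}": f"NDC.{s}" for s in ("NY", "PA", "DC", "TX")}

inferred = bottom_up(cen, ndc, leaf_map)
print(inferred[("CEN.USA", "NDC.USA")])              # ==
print(inferred[("CEN.South", "NDC.Northeast")])      # ><
print(inferred[("CEN.Northeast", "NDC.Northeast")])  # <
```

Even with purely congruent leaves, a single leaf placed under different parents (here DC) makes the parents overlap; when such overlaps are expected at the leaf level, the top-down workflow with expert input is the better fit.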
https://github.com/EulerProject/
Implications
• Logic-based taxonomy alignment approach
  – Disambiguate name-based taxonomy alignment over time
    • 40% of the concepts in biological taxonomies undergo name changes over time (Franz et al., 2016)
  – May mitigate problems in equivalent crosswalking
    • Membership-condition problem, which was often criticized in crosswalking
  – Preserves the original taxonomies while providing an alignment view
    • Solves data integration problems that arise in the coarser-grained relative crosswalking
• …Aristotle…
• …Euler…
• …
• …Greg Whitbread…
• [BPB93] J. H. Beach, S. Pramanik, and J. H. Beaman. Hierarchic taxonomic databases. In Advances in Computer Methods for Systematic Biology: Artificial Intelligence, Databases, Computer Vision, 1993.
• [Ber95] W. G. Berendsohn. The concept of “potential taxa” in databases. Taxon, 44:207–212, 1995.
• [Ber03] W. G. Berendsohn. MoReTax – Handling Factual Information Linked to Taxonomic Concepts in Biology. No. 39 in Schriftenreihe für Vegetationskunde. Bundesamt für Naturschutz, 2003.
• [GG03] M. Geoffroy and A. Güntsch. Assembling and navigating the potential taxon graph. In [Ber03], pages 71–82, 2003.
• [TL07] D. Thau and B. Ludäscher. Reasoning about taxonomies in first-order logic. Ecological Informatics, 2(3):195–209, 2007.
• [FP09] N. M. Franz and R. K. Peet. Perspectives: towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity, 7(1):5–20, 2009.
• …
Some EulerX History