Biodiversity Information Standards: are we going wrong, or just not quite right?
Jim Croft
Australian National Herbarium
Centre for Plant Biodiversity Research
Australian National Botanic Gardens
Parks Australia
Taxonomy Research and Information Network
Parks Australia
Department of the Environment, Water, Heritage and the Arts
TDWG IN AUSTRALIA
[Map slides; location labels: Hobart, Devonport, Launceston, Adelaide, Perth, Melbourne, Townsville, Armidale, Darwin, Brisbane, Lismore, Orange, Sydney, Canberra, Maroochydore, Gosford, Alice Springs]
INSTITUTIONS – Northern Territory
• Darwin
Institutions listed:
• Australian National Insect Collection (CSIRO)
• Australian National Herbarium (CSIRO)
• Australian National Wildlife Collection (CSIRO)
• GAUBA Herbarium
• Australian Biological Resources Study
INSTITUTIONS – Queensland
Australian examples
• Australian Plant Name Index
  – Australian Plant Census
• Australian Fauna Directory
• Australia’s Virtual Herbarium
• Online Zoological Catalogue of Australian Museums
• Flora of Australia On-line
• Atlas of Living Australia
• Identify Life
• Taxonomy Research and Information Network
HISCOM
• Herbarium Information Systems Committee
  – Representatives at TDWG 2008
    – Ben Richardson, Alex Chapman (PERTH)
    – Bill Barker (AD)
    – Alison Vaughan (MEL)
    – Karen Wilson (NSW)
    – Donna Lewis (DNA)
    – Jerry Cooper (CHR, NZ)
    – Helen Thompson (ABRS)
    – Greg Whitbread, Jim Croft (CANB)
  – The crucible of biodiversity informatics creativity
TDWG principle # 0
• A good idea has a thousand fathers
• A bad one is a bastard
TDWG: making anarchy chaos the standard
TDWG principle # VI-a
“Before the beginning of great brilliance, there must be chaos.
Before a brilliant person begins something great, they must look foolish in the crowd.”
- I Ching
TDWG: the art of herding cats
TDWG: changing standards, or making change the standard?
TDWG: Standardizing stuff...
or stuffing standards?
Outline
• What is TDWG?
• TDWG and ‘Standards’
• Where TDWG Standards are needed
• Some TDWG projects
• TDWG Standards compliance
• Tensions for TDWG
• Future
WHAT IS TDWG?
TDWG Mission
• Develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms
• Promote the use of standards through the most appropriate and effective means and
• Act as a forum for discussion through holding meetings and through publications
Who are we?
‘TDWG is us’
Who are we?
• Intersection of specimens, taxonomy, knowledge, information management
• Biologists, taxonomists, computer scientists
  – Each with an interest in the other’s domains
  – Each with something to offer each other’s domains
Who are we?
• If TDWG did not exist, we would have to invent it
• Successful
  – Enduring
  – Popular
  – Moderately well recognized
When are we?
• Phases of TDWG
  – Phase 0 (1985)
    • Seemed like a good idea at the time
  – Phase 1 (first decade)
    • Data dictionaries, data models
  – Phase 2 (second decade)
    • E-R models, DIGIR, DwC, XML, etc.
  – Phase 3 (nowish)
    • Schemas, ontologies, RDF
  – Phase 4 (?)
    • ?
Why are we?
• Collaboration and sharing is essential
  – Taxonomy has become too big
  – Too diverse
  – Too complex
  – No one person can do it all
  – A ‘complete’ treatment requires collaboration
  – Collaboration requires consistency, standards
[Notes: Biodiversity Tower of Babel]
Why are we?
• Untangle the ‘biodiversity Babel’
• Develop common communication
• Harness efficiency of collaboration
• Economic pressures to reduce duplication
Why are we?
• Science of information meets science of information technology
• Take advantage of new technology
• Taxonomy needs to be seen to be evolving
• “Business as usual is not an option”
Why are we?
• An annual excuse to meet in warm places when it is cold elsewhere?
Where do we fit?
[Figure, credited to xkcd.com; labels: taxonomists, computerists, TDWG, informaticists]
Where have we come from?
• Frustrated taxonomists
  – Looking for a better way
  – Largely self taught
• Bored computer scientists
  – Looking for excitement, challenge
• Misfits and visionaries
  – In search of a ‘Brave New World’
• Egomaniacs
  – In search of glory, fame, power, riches
What are we now?
• Frustrated taxonomists
  – Looking for a better way
  – Largely self taught
• Bored computer scientists
  – Looking for excitement, challenge
• Misfits and visionaries
  – In search of a ‘Brave New World’
• Egomaniacs
  – In search of glory, fame, power, riches
Where are we going?
?
Where are we going?
• Did we go wrong?
  – Where did we go wrong?
  – Why did we go wrong?
• Lost the plot?
  – Regain credibility?
    • Our community?
    • Our funders?
    • Ourselves?
Where are we going?
• Perceptions of TDWG?
  – First decade
    • Taxonomists organizing their domain
    • Content focused
    • Understandable by taxonomists
  – Second decade
    • Taxonomists reaching limitations
    • Engaging technologists
    • Protocol and systems focussed
    • Opaque to taxonomists
  – Third decade?
Where are we going?
• Perceptions of TDWG?
  – First decade
    • Content
    • Data dictionaries
    • Lists, vocabularies
  – Second decade
    • Protocols
    • Formats, structure
    • Applications
  – Third decade?
    • Ontologies?
    • Semantics?
Where are we going?
• What should TDWG be about?
– The data?
– The technology?
– The applications?
– The community?
TDWG Impediments
• Resources, funds
• Time
• Impetus, will, drive
• Complexity, domain knowledge
• Conservatism
• Rivalry
• Intellectual property, revenue advantage
THE TDWG VISION
A vision for TDWG
• Our domain in biodiversity?
  – Taxonomy?
  – Systematics?
  – Collections?
  – Biodiversity?
  – Publications?
  – Knowledge Management?
  – Knowledge discovery?
  – All of the above?
A vision for TDWG
• Our Community?
  – Herbaria and museums?
  – Researchers?
  – Government and policy?
  – Conservation agencies? NGOs?
  – Natural resource management?
  – Education?
  – Public?
  – All of the above?
A vision for TDWG
• Our questions?
  – What is it? How can I find out?
  – What does it look like?
  – Where does it occur?
  – Was it still there? When?
  – What occurs there with it?
  – What might occur there with it?
  – What is it related to?
  – Who says so?
  – How? Why?
  – All of the above?
A vision for TDWG
• Our Products?
  – Data content standards?
  – Data storage standards?
  – Data communications protocols?
  – Data management applications?
  – Data management infrastructure?
  – Data visualization applications?
  – Data analysis applications?
  – All of the above?
Knowledge pyramid
[Pyramid diagram; labels: The Real World, Samples, Data, Information, Knowledge, Wisdom]
TDWG AND STANDARDS
What is a standard?
• In common English:
  – A flag
  – An upright pole or beam
  – A backing for currency
  – American automobile
  – A bush on a long stalk
  – An ideal to be judged against
  – Model of authority or excellence
  – A basis for comparison
  – 1,980 board feet of wood
  – A newspaper
  – An established norm
What is a standard?
• Rarely implies:
  – Requirement
  – Obligation
  – Compulsion
  – Compliance
  – ‘The law’
• But not so ‘technical standards’
  – Specify behaviour
  – Mandate behaviour
What is a standard?
• “an explicit set of requirements to be satisfied by a material, product, or service”
- (ASTM International)
TDWG STANDARDS
TDWG Standards categories
• Technical specification (TS) (3)
  – Protocol, service, procedure, format
• Applicability statement (AS) (1 draft)
  – How a tech. spec. might be applied
• Best current practice (BCP) (0)
  – A description of good behaviour
• Data standard (DS) (0)
  – Content or controlled vocabularies
TDWG Standards status
• Current standard (3)
• Current 2005 Standard (3?)
• Draft Standard (3)
• Prior Standard (7 tech specs; 6 data standards)
• Retired Standard (0)
THE STANDARDS PROCESS
ISO Standards process
• ISO standards are:
  – Consensus
  – Industry wide
  – Voluntary
ISO Standards process
• 0 preliminary
  – Study period underway
• 1 proposal
  – New project under consideration
• 2 preparatory
  – Working draft(s) under consideration
• 3 committee
  – Committee draft(s) under consideration
• 4 approval
  – Final draft standard under consideration
• 5 publication
  – Standard prepared for publication
TDWG Standards process
• TDWG standards are:
  – Consensus
  – Community wide (+/-)
  – Voluntary
TDWG Standards Process
TDWG STANDARDS PRESENT
TDWG standards – present
• ABCD
  – Access to Biological Collections Data
• SDD
  – Structured Descriptive Data
• TCS
  – Taxon Concept Schema
Not bad for 22 years work...
TDWG STANDARDS PAST
TDWG standards - past
• ‘Prior Standards’
• Technical Specs (protocol stds):
  – HISPID 3 (now on v.5)
  – POSS (Plant Occurrence and Status)
  – Economic Botany Data Collection Std
  – Plant Names in Botanical Databases
  – XDF – language for definition and exchange
  – ITF – Botanic Gardens Records
  – DELTA
TDWG standards - past
• ‘Prior Standards’
• Data standards (Content stds)
  – Authors of Plant Names
  – World Geographic Scheme for Plant Distributions
  – Botanico-Periodicum-Huntianum
  – Index Herbariorum
  – Floristic Regions of the World
  – TL2 – Taxonomic Literature and suppl.
TDWG STANDARDS FUTURE
TDWG standards – future
• ‘Draft standards’
  – Real soon now
• Standards documentation spec.
  – The standard way to do standards
• LSID Applicability Statement
  – How to do LSIDs
• NCD
  – Natural Collections Description
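As a note on the LSID item above: a Life Science Identifier is a URN of the form urn:lsid:authority:namespace:object, with an optional trailing revision. The following is a minimal sketch of splitting such an identifier into its parts; the function name, the `Lsid` type and the example identifier are hypothetical and are not taken from the draft applicability statement.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Parts of an LSID: urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its components (illustrative helper only)."""
    parts = text.split(":")
    # An LSID has five colon-separated parts, or six when a revision is given.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The 'urn' and 'lsid' prefixes are treated case-insensitively here.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(authority, namespace, object_id, revision)


# Hypothetical identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:names:12345:2")
```

This sketch only checks the shape of the identifier; resolving an LSID to data or metadata is a separate step that it does not attempt.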
TDWG standards – future
• Watch this space?
• Observation data
  – Occurrence without specimens?
  – Ecological metadata language
• Phylogenetics data
  – Phylogeny repositories
  – Trees of life
  – Phylocode
TDWG standards – future
• Watch this space?
• SPM – Species Profile Model
  – Online Journals; On-line Floras
  – Interactive Keys
• Images and multimedia
• Ethnobotany ontology
TDWG standards – future
• How are we going to manage this?
  – Activities straddle many standards
  – Potential for duplication, conflict
• Technical Architecture Group
  – Ontologies
  – Vocabularies
  – Conflict identification, resolution
  – Evaluation, advice, recommendations
WHERE TDWG STANDARDS ARE NEEDED
Where are TDWG standards needed?
• Nomenclature
• Taxonomy
• Bibliographic
• Specimens
• Identification
• Description
• Images
• Multimedia
• Occurrence
• Spatial
• Observation
• Molecular
• Phylogeny
• People
• Institutions
• etc.
Where are TDWG standards needed?
• The problem:
• TDWG activities have been activity and discipline based
  – ABCD as an example
    • Names, taxa, specimens, places, people, etc.
• Need to look at data from an ontological perspective
  – Data based
    • Not activity based
TDWG – the 3-legged stool
• (definition of ‘stool’?)
• GUIDs
• Ontologies
• Exchange protocols
TDWG – the 3-legged stool
• Management cliche
• Planning
• Money
• Management
---
• Production
• Marketing
• Administration
---
• etc.
TDWG – the 3-legged stool
TDWG STANDARDS COMPLIANCE
TDWG standards compliance
• Pretty poor
  – Within institutions / projects
  – Between institutions / projects
• Partial compliance is not compliance
• Enhancement is not compliance
• Extension is not compliance
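Read strictly, the three statements above mean that a record complies only when it carries exactly the elements a standard requires: nothing missing, nothing added. A minimal sketch of that strict reading follows; the required field names are hypothetical placeholders and do not come from any TDWG standard.

```python
# Hypothetical required elements of an imaginary standard (illustration only).
REQUIRED_FIELDS = frozenset({"name", "collector", "date"})


def compliance_report(record: dict) -> dict:
    """Report missing and extra fields; compliant only if there are neither."""
    present = set(record)
    missing = sorted(REQUIRED_FIELDS - present)  # partial compliance
    extra = sorted(present - REQUIRED_FIELDS)    # enhancement or extension
    return {
        "compliant": not missing and not extra,
        "missing": missing,
        "extra": extra,
    }


# A partial record and an extended record both fail under this strict reading.
partial = compliance_report({"name": "Aus bus", "collector": "A. Collector"})
extended = compliance_report(
    {"name": "Aus bus", "collector": "A. Collector", "date": "2008-10-19", "local_code": "X1"}
)
```

Real standards often define optional elements as well, which this sketch deliberately ignores; it shows only the all-or-nothing position taken on the slide.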
TDWG standards compliance
• Why not?
  – Too complicated?
  – Inappropriate?
  – Deficient?
  – Too costly to implement?
  – Conservatism?
  – Apathy?
  – Individual arrogance?
  – Institutional arrogance?
TDWG standards compliance
• Need for stability
• TDWG has a reputation
  – Pursuing the ‘bleeding edge’
  – “Keeping up with the Joneses”
  – Introducing new recommendations before old ones settled
  – Frustrating users
    • Especially smaller institutions
TDWG standards compliance
• Total cost of ownership
  – Ultra technical solutions
    • Rare specialist skills
    • Expensive contractors
  – Maintenance costs
  – Upgrade costs
  – Migration costs
  – Users get stuck
TDWG standards compliance
• What can be done?
  – Rationalization of standards?
  – More control of standards process?
  – Seek ‘appropriate technology’?
    • Not necessarily the best
  – Seek cheaper solutions?
  – Focus on the ontologies, not activities?
  – Apply institutional pressure?
  – Institutional mentorship and support?
THE TENSIONS FOR TDWG
Tensions in TDWG
• Taxonomy / technology
• Innovation / stability
• Innovation / conservatism
• Names / taxonomy
• Names / specimens
• Names / names
• Authority / credit
• Ownership / responsibility
• Data / metadata
Why not?
• Why not web 2.0 / 3.0?
• Why not annotations?
• Why not Wikipedia?
• Why not microformatting?
Disconnects
• Free access / ownership
  – Licensing, attribution, IP, credit
• Taxonomy / specimens
  – The big lie
• Concepts / names
  – Another big lie
• Linking taxa through basionyms
  – Another big lie
• Data / metadata
• Distributed systems vs cache
Metadata
• So-called ‘data about data’
• “One man’s data is another’s metadata”
• Not a good or inspiring look
• Need a common and agreed understanding in TDWG domain
Metadata
• Problem of LSID byte persistence
  – Applies to data
  – Does not apply to metadata
  – Redefine data as metadata?
  – Sophistry?
  – Distorting our ontologies?
• Need to sort this out
• Need to communicate the result
Metadata
Yesterday upon the stair
Metadata wasn't there
It wasn't there again today
How I wish it would go away
The 3 big lies
• Names and specimens
  – That there is some real connection between specimens bearing the same name
  – That distribution maps of specimens bearing the same name are meaningful
  – That identifications bearing the same name represent the same taxon
  – The ‘taxon concept problem’
  – Concept not explicit
The 3 big lies
• Names and concepts
  – That names somehow imply an unambiguous taxon concept
  – That a taxon concept can be inferred from a name
  – An assumption
  – The ‘taxon concept problem’
  – Concept not explicit
The 3 big lies
• Names and types
  – That if we are talking about names based on the same type they are the same taxon concept
  – That lists of names and synonyms based on the same type can be automatically merged
  – The ‘taxon concept problem’
  – Concept not explicit
The 3 big lies
• What can we do?
  – Taxon reporting not unambiguous
  – Our results are at best indicative
    • Users assume or infer concepts
  – Perhaps biggest problem in taxonomy and biodiversity informatics
  – Be absolutely rigorous in talking about names and named concepts
  – Educate taxonomists
  – Educate clients
    • Limitations of data, applications
    • Implications of using data, limitations
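The ‘taxon concept problem’ can be illustrated with a small sketch: grouping identifications by name alone silently merges specimens that were identified under different concepts, whereas grouping by name plus an explicit "according to" reference keeps them apart. All specimen ids, names and references below are invented for illustration, and the record layout is an assumption rather than any TDWG schema.

```python
from collections import defaultdict

# (specimen id, name, "according to" reference); all values are invented.
identifications = [
    ("S1", "Aus bus", "Smith 1950"),
    ("S2", "Aus bus", "Jones 2001"),
    ("S3", "Aus bus", "Smith 1950"),
]


def group_specimens(records, key):
    """Group specimen ids by whatever key function is supplied."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record[0])
    return dict(groups)


# Name only: one group, as if all three specimens were the same taxon.
by_name = group_specimens(identifications, lambda r: r[1])

# Name plus reference: the concept is explicit, so two groups appear.
by_concept = group_specimens(identifications, lambda r: (r[1], r[2]))
```

Here `by_name` holds a single group of three specimens, while `by_concept` separates S2 from S1 and S3. A distribution map drawn from the first grouping would mix two circumscriptions without saying so.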
TDWG value for money
• Are we worth it?
  – This meeting cost c. $1,000,000
    • Airfares, accommodation, salaries, etc.
  – What did we accomplish?
    • Tangibles?
    • Intangibles?
  – What have we produced so far?
    • 3 standards, several +/- standards
    • Compliance?
    • A ‘state of mind’?
TDWG value for money
• Can we do it better?
  – Can we do it cheaper, faster?
    • Use the wiki/listserv better
  – Accomplish more?
    • New standards
    • Better standards
  – Produce more?
    • New standards?
    • Retire standards?
    • Rationalize standards?
WHERE TO FROM HERE
Where to from here?
• Tools at our disposal
  – TDWG Executive
  – Technical Architecture Group
  – TDWG working groups
  – On-line forums, lists
  – Web and Wiki
  – On-line Journal
Where to from here?
• Increase TDWG Profile
  – ‘Market penetration’
  – Greater implementation, compliance
  – Attention to smaller institutions
    • ‘The long tail’
  – Multilingual standards
  – Strengthen partnerships, collaboration
    • GBIF, EoL, etc.
    • National initiatives
Where to from here?
• TAG
  – Coordination of standards
  – Ontologies
  – Resolve metadata issues
  – Retire or deprecate standards
• ‘Us’
  – Participation
  – Implementation
  – Compliance
Where to from here?
[Cartoon, credited to xkcd.com]
TDWG – a glass half full
• TDWG has a lot to do
• But it has accomplished a lot
• Without the foundation of TDWG there could be:
  – No AVH
  – No ALA
  – No GBIF
  – No EoL
  – No [name your biodiversity acronym]
TDWG – a glass half full
• TDWG has strong participant support
  – C. 200 participants in TDWG 2008
• Key institutional engagement
  – International
  – National
  – Regional
  – Local
• Increasing demand for products
  – Global change, habitat depletion, etc.
TDWG Mission
• Develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms
• Promote the use of standards through the most appropriate and effective means and
• Act as a forum for discussion through holding meetings and through publications
TDWG?