uBio presentation to Jim Edwards 2006

universal Biological indexer and organizer: New Dimensions in Managing Biological Information @ the MBLWHOI LIBRARY. David Remsen, June 27, 2006

Upload: david-remsen

Post on 13-Apr-2017

TRANSCRIPT

Page 1: uBio presentation to Jim Edwards 2006

universal Biological indexer and organizer 1

New Dimensions in Managing Biological Information @ the MBLWHOI LIBRARY

David Remsen

June 27, 2006

Page 2: uBio presentation to Jim Edwards 2006

All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.

- Grimaldi & Engel, 2005, Evolution of the Insects

Page 3: uBio presentation to Jim Edwards 2006

a name that serves as a link between what has been learned in the past

From T.E. Glover, The Fishes of Southwestern Japan, c.1870

Page 4: uBio presentation to Jim Edwards 2006

Universal Biological Indexer and Organizer

Research Funded by the Andrew W. Mellon Foundation

MBL / WHOI LIBRARY

…and what we today add to the body of knowledge.

Page 5: uBio presentation to Jim Edwards 2006

Page 6: uBio presentation to Jim Edwards 2006

The challenge of names as keywords

Finding this…

Type keyword…

With this…

Page 7: uBio presentation to Jim Edwards 2006

Names – the only universal metadata for Biology

Names offer a logical way to search for and index content

• Names annotate data objects
• All names annotate all data objects
• A compilation of all names ever used is the foundation of a universal index for biology
• or for a semantic web for biology

Page 8: uBio presentation to Jim Edwards 2006

The Taxonomic Names Problem in Biology

• Many names refer to one concept
  – Vernacular concept
  – Lexical or Nominal synonym
  – Nomenclatural synonym
  – Taxonomic Synonym

• Single name refers to many concepts
  – Homonyms
  – Taxonomic concepts
  – Vernacular concepts
  – Taxonomic Groups/Classifications
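
The two halves of the names problem can be pictured as two lookups built from the same list of (name, concept) assertions. The sketch below is only an illustration: the concept identifiers and the vernacular name are assumptions made up for the example, not uBio data. The scientific names are taken from the homonym and synonym slides of this talk.

```python
from collections import defaultdict

# Hypothetical (name, concept) assertions; the concept IDs are invented labels.
ASSERTIONS = [
    ("Loligo pealeii", "squid:longfin-inshore"),
    ("Doryteuthis pealei", "squid:longfin-inshore"),
    ("longfin inshore squid", "squid:longfin-inshore"),  # assumed vernacular name
    ("Peranema", "fern-genus"),
    ("Peranema", "euglenid-genus"),
]

def build_indexes(assertions):
    """Build both directions of the name <-> concept relationship."""
    names_by_concept = defaultdict(set)
    concepts_by_name = defaultdict(set)
    for name, concept in assertions:
        names_by_concept[concept].add(name)
        concepts_by_name[name].add(concept)
    return names_by_concept, concepts_by_name

names_by_concept, concepts_by_name = build_indexes(ASSERTIONS)

# Many names, one concept (synonyms and vernaculars).
many_to_one = {c: n for c, n in names_by_concept.items() if len(n) > 1}
# One name, many concepts (homonyms).
one_to_many = {n: c for n, c in concepts_by_name.items() if len(c) > 1}
```

Keeping both directions explicit is the point of the sketch: a search index needs the first lookup to widen a query and the second to flag a name as ambiguous.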

Page 9: uBio presentation to Jim Edwards 2006

Many to One: Vernacular Concepts

• Equivalence implicit through co-occurrence

Page 10: uBio presentation to Jim Edwards 2006

Many to One: Lexical Synonyms

Many to One: Nomenclatural Synonyms

Page 11: uBio presentation to Jim Edwards 2006

Retention of lexical & nomenclatural variation

Loligo pealeii
Loligo pealii
Loligo pealei

Doryteuthis pealei
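
One way to picture the difference between lexical and nomenclatural variation is a small grouping routine: spelling variants share a genus and differ only slightly in the epithet, while a recombination such as Doryteuthis pealei changes the genus and cannot be found by string similarity. This is a rough sketch using Python's standard `difflib`. The 0.8 threshold and the function names are assumptions for illustration, not how NameBank reconciles names.

```python
from difflib import SequenceMatcher

NAMES = ["Loligo pealeii", "Loligo pealii", "Loligo pealei", "Doryteuthis pealei"]

def is_lexical_variant(a, b, threshold=0.8):
    """Treat two binomials as spelling variants if the genus is identical
    and the epithets are nearly the same string (threshold is an assumption)."""
    genus_a, _, epithet_a = a.partition(" ")
    genus_b, _, epithet_b = b.partition(" ")
    if genus_a != genus_b:
        return False
    return SequenceMatcher(None, epithet_a, epithet_b).ratio() >= threshold

def cluster_variants(names):
    """Greedy grouping: compare each name with the first member of each cluster."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if is_lexical_variant(cluster[0], name):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

clusters = cluster_variants(NAMES)
```

The three Loligo spellings fall into one cluster and Doryteuthis pealei stays on its own, which is why the nomenclatural link has to be recorded as a stated fact rather than inferred from spelling.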

Page 12: uBio presentation to Jim Edwards 2006

One to Many: Homonyms

Peranema – the fern

Peranema – the euglenid

Page 13: uBio presentation to Jim Edwards 2006

Taxonomic Concept

Page 14: uBio presentation to Jim Edwards 2006

Libraries

Publishers

Museums

Federal Agencies

Name IR impediments in current systems: NLM, JSTOR

Page 15: uBio presentation to Jim Edwards 2006

Name IR impediments in current systems: OBIS

One organism

4 scientific names

4 maps

We want one map
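
Getting from four maps to one amounts to grouping occurrence records by the organism they refer to rather than by the name string they were filed under. The sketch below assumes a synonymy lookup is already available. The four names are reused from the squid example elsewhere in this talk purely as an illustration (the slide does not say which organism it shows), and the coordinates are placeholders, not real OBIS records.

```python
from collections import defaultdict

# Hypothetical synonymy: every name string points at one shared concept key.
CONCEPT_OF = {
    "Loligo pealeii": "longfin-squid",
    "Loligo pealii": "longfin-squid",
    "Loligo pealei": "longfin-squid",
    "Doryteuthis pealei": "longfin-squid",
}

# Placeholder occurrence records: (name as recorded, latitude, longitude).
RECORDS = [
    ("Loligo pealeii", 41.5, -70.7),
    ("Loligo pealii", 40.9, -72.1),
    ("Loligo pealei", 39.8, -73.5),
    ("Doryteuthis pealei", 36.2, -75.0),
]

def merge_by_concept(records, concept_of):
    """Group points by concept; names without a known concept keep their own map."""
    merged = defaultdict(list)
    for name, lat, lon in records:
        key = concept_of.get(name, name)
        merged[key].append((lat, lon, name))  # keep the original name with each point
    return dict(merged)

maps = merge_by_concept(RECORDS, CONCEPT_OF)
```

Each point keeps the name it was recorded under, so the merged map loses no information about the source data.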

Page 16: uBio presentation to Jim Edwards 2006

Division of Concepts

• Basis for Relationship: Facts
  – Vernacular concept
  – Lexical or Nominal synonym
  – Nomenclatural synonym
  – Homonyms

• Basis for Relationship: Opinion
  – Taxonomic Synonym
  – Vernacular concepts
  – Taxonomic Groups/Classifications

Page 17: uBio presentation to Jim Edwards 2006

Primary Components of uBio

Indexes to content
• Lexical Synonyms
• Nomenclatural Synonyms
• Vernacular Names

Indexes to taxonomic views
• Taxonomic Hierarchies
• Taxonomic Synonyms

Page 18: uBio presentation to Jim Edwards 2006

NameBank: An index of names and sources

Page 19: uBio presentation to Jim Edwards 2006

ClassificationBank

An index of taxon concepts

Page 20: uBio presentation to Jim Edwards 2006

Fitting In

Page 21: uBio presentation to Jim Edwards 2006

Fitting In: A datacentric perspective

Page 22: uBio presentation to Jim Edwards 2006

Network Service: Attribution

• Every datum sent out via service is logged
  – nameBankID
  – datestamp
  – Client IP
  – Calling method
  – requestorIP

• <client optional>
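
A log entry with the fields listed on this slide might look like the record below. This is a minimal sketch, not the uBio service code: the class and function names, the JSON serialisation, the method name, and the example values are all assumptions. The IP addresses come from ranges reserved for documentation.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass
class AttributionLogEntry:
    # Field names mirror the slide; the types are assumptions.
    namebank_id: int
    datestamp: str
    client_ip: str
    calling_method: str
    requestor_ip: str
    client: Optional[str] = None  # the "<client optional>" field

def log_datum(log, namebank_id, client_ip, calling_method, requestor_ip, client=None):
    """Append one JSON line per datum served and return the entry."""
    entry = AttributionLogEntry(
        namebank_id=namebank_id,
        datestamp=datetime.now(timezone.utc).isoformat(),
        client_ip=client_ip,
        calling_method=calling_method,
        requestor_ip=requestor_ip,
        client=client,
    )
    log.append(json.dumps(asdict(entry)))
    return entry

log = []
entry = log_datum(log, 12345, "192.0.2.10", "namebank_search", "198.51.100.7")
```

Logging one line per datum, rather than per request, is what would let a name provider see how often each of its records was actually served.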

Page 23: uBio presentation to Jim Edwards 2006

Tools and Applications: FindIT

• Is trainable

• Locates names & authorities

• Finds names it doesn’t know

• Finds names mangled by OCR
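
The slide does not describe how FindIT works, and the sketch below is not its algorithm (in particular, it is not trainable). It only illustrates two of the listed behaviours with standard-library tools: a capitalised word followed by a lowercase word is treated as a candidate binomial, and the genus is matched fuzzily against a known list, so an unknown epithet or an OCR-damaged genus can still be picked up. The genus list, the 0.8 cutoff, and the sample sentence are assumptions.

```python
import re
from difflib import get_close_matches

KNOWN_GENERA = ["Loligo", "Doryteuthis", "Peranema"]

# Candidate binomial: Capitalised word, whitespace, lowercase word.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{2,})\b")

def find_names(text, known_genera, cutoff=0.8):
    """Return (text as found, matched genus) for each plausible binomial."""
    found = []
    for match in CANDIDATE.finditer(text):
        genus, epithet = match.groups()
        # Fuzzy genus lookup tolerates small OCR errors such as l -> i.
        close = get_close_matches(genus, known_genera, n=1, cutoff=cutoff)
        if close:
            found.append((f"{genus} {epithet}", close[0]))
    return found

TEXT = "Specimens of Loligo pealeii and Loiigo pealei were collected near Woods Hole."
found = find_names(TEXT, KNOWN_GENERA)
```

Here "Specimens of" is rejected because "Specimens" resembles no known genus, while the misread "Loiigo" is still resolved to Loligo.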

Page 24: uBio presentation to Jim Edwards 2006

Tools and Applications: LinkIT

Page 25: uBio presentation to Jim Edwards 2006

Applications

Page 26: uBio presentation to Jim Edwards 2006

Taxonomic intelligence applied to search

Synonymies expand the scope of queries
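
Query expansion of this kind can be sketched in a few lines: look up the synonym set that contains the user's name and search for every member. The synonym set below reuses the squid names from this talk. The function name and the quoted, OR-joined query syntax are assumptions, since the target search engine is not specified.

```python
SYNONYM_SETS = [
    {"Loligo pealeii", "Loligo pealii", "Loligo pealei", "Doryteuthis pealei"},
]

def expand_query(name, synonym_sets):
    """Return an OR query covering every known synonym of the name."""
    for group in synonym_sets:
        if name in group:
            terms = sorted(group)
            break
    else:
        terms = [name]  # unknown name: search for it unchanged
    return " OR ".join(f'"{term}"' for term in terms)

query = expand_query("Loligo pealei", SYNONYM_SETS)
```

A user who types one spelling then retrieves content indexed under any of the four.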

Page 27: uBio presentation to Jim Edwards 2006

uBioRSS: Embedding taxonomies into literature retrieval

Page 28: uBio presentation to Jim Edwards 2006

Embedding uBio into remote services: uBioRSS

Page 29: uBio presentation to Jim Edwards 2006

Taxonomic hierarchies enhance data browsing

• Birds of the Belgian Congo
• 4500 pages
• One page has a species of dipteran
• How would someone interested find it?
• 50,000+ Diptera species to choose from

Both enhancements apply to all name-annotated content
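
The browsing case can be sketched as a walk up a parent table: a page is a hit for "Diptera" if any name found on it has Diptera among its ancestors, so the reader never has to guess which of the 50,000+ species is involved. The slide does not say which dipteran appears in the book, so the genus Glossina is a stand-in. The page numbers and function names are assumptions as well.

```python
# Child -> parent links for a tiny slice of a classification.
PARENT = {
    "Glossina": "Glossinidae",
    "Glossinidae": "Diptera",
    "Diptera": "Insecta",
    "Passer": "Passeridae",
    "Passeridae": "Passeriformes",
    "Passeriformes": "Aves",
}

# Hypothetical page index: page number -> names found on that page.
PAGE_NAMES = {12: {"Passer"}, 3071: {"Glossina"}, 3072: {"Passer"}}

def is_within(taxon, ancestor, parent):
    """True if ancestor is the taxon itself or appears above it in the hierarchy."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = parent.get(taxon)
    return False

def pages_for(ancestor, page_names, parent):
    """Pages carrying at least one name that falls under the given taxon."""
    return sorted(
        page
        for page, names in page_names.items()
        if any(is_within(name, ancestor, parent) for name in names)
    )

diptera_pages = pages_for("Diptera", PAGE_NAMES, PARENT)
```

Swapping in a different classification changes only the `PARENT` table, which fits the talk's point that hierarchies are opinions layered over the name index.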

Page 30: uBio presentation to Jim Edwards 2006

uBio Portal: Building communities, enabling connections

Page 31: uBio presentation to Jim Edwards 2006

Elements of the Portal

Indexing power from NameBank

Page 32: uBio presentation to Jim Edwards 2006

Alternative names

Vernacular names

Expert view

More or less specific

Suggestions & corrections

Indexing power from NameBank

Page 33: uBio presentation to Jim Edwards 2006

Results from an array of resources

Page 34: uBio presentation to Jim Edwards 2006

Additional information from specific projects

Page 35: uBio presentation to Jim Edwards 2006

content certified linkouts to authoritative resources

XML source

Additional information from specific projects

Page 36: uBio presentation to Jim Edwards 2006

Text source

Additional information from specific projects

Page 37: uBio presentation to Jim Edwards 2006

• data from various sources may be merged

• red dots on the map link back to the website that provided the geographical co-ordinates

Specimen distribution data from remote sources