Resource Curation and Automated Resource Discovery

Uploaded by august-reeves, posted on 13-Jan-2016

NIF Resources

• NIF (the Neuroscience Information Framework) is cataloging websites that house information about databases, atlases, software tools, data, transgenic mice, and other resources we consider valuable to the neuroscience community.

Definition of Resource

• Individual resource boundary: a site shall be considered an individual resource if it is maintained by a single entity and has the properties of one or more web pages that are related by a theme and connected by HTML links.
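The boundary rule above can be approximated mechanically for the parts a crawler can observe. A minimal sketch, assuming each crawled page already carries a known maintainer (the `Page` type, its fields, and the example URLs are hypothetical): pages that share a maintainer and are connected by links are merged with union-find. Theme relatedness is not checked; that judgment stays with a curator.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A crawled web page (hypothetical structure, for illustration only)."""
    url: str
    maintainer: str  # assumed to be known already, e.g. from curation
    links: list[str] = field(default_factory=list)  # outgoing HTML links


def group_into_resources(pages: list[Page]) -> list[set[str]]:
    """Group pages into candidate resources: same maintainer + linked.

    Whether the pages share a theme is NOT checked here.
    """
    parent = {p.url: p.url for p in pages}

    def find(u: str) -> str:
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    by_url = {p.url: p for p in pages}
    for page in pages:
        for target in page.links:
            other = by_url.get(target)
            # Only merge with pages we crawled that share the same maintainer.
            if other is not None and other.maintainer == page.maintainer:
                parent[find(page.url)] = find(other.url)

    groups: dict[str, set[str]] = {}
    for url in parent:
        groups.setdefault(find(url), set()).add(url)
    return list(groups.values())
```

For example, an atlas page and its help page from one lab end up in one group, while another lab's tool page that merely links to the atlas stays separate.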

Resource Nomination

[Figure: Resource nomination workflow. Labels: Registry (4,500); Public Registry (2,100); NIF Web (499,952); Level 2/3 (24); User Feedback; *Automated tools; Web Crawl; Registry Subset; Nomination; Check: links, annotation, vocabulary; *Automated updates, Level 2 tools. *In development]
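The check step on this slide (links, annotation, vocabulary) can be sketched as a validator for a nominated record. This is a minimal sketch under stated assumptions: the record is a plain dict, the required field names and the `VOCABULARY` set are hypothetical stand-ins for NIF's real schema and controlled vocabulary, and the link check validates URL syntax only (confirming the link resolves would need an HTTP request).

```python
from urllib.parse import urlparse

# Hypothetical controlled vocabulary; a real registry would load ontology terms.
VOCABULARY = {"mouse", "rat", "human", "microscopy", "electrophysiology", "hippocampus"}

# Hypothetical required annotation fields.
REQUIRED_ANNOTATIONS = ("short_name", "url", "description")


def check_record(record: dict) -> list[str]:
    """Return a list of problems found in a nominated registry record."""
    problems = []

    # Links: syntax check only, no network access.
    parsed = urlparse(record.get("url", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append(f"bad link: {record.get('url')!r}")

    # Annotation: required fields must be present and non-empty.
    for name in REQUIRED_ANNOTATIONS:
        if not str(record.get(name, "")).strip():
            problems.append(f"missing annotation: {name}")

    # Vocabulary: every keyword must be a known term (case-insensitive).
    for keyword in record.get("keywords", []):
        if keyword.lower() not in VOCABULARY:
            problems.append(f"unknown keyword: {keyword}")

    return problems
```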

• Resource is nominated: NIF staff, contact at meetings, web form
• In NIF already?
• Assign metadata:
  – short name, long name, URL
  – description (short description of 1-3 sentences; longer description)
  – parent organization (physical location, university)
  – support (grant numbers)
  – keywords (species, technique, structure, age, level, disease, topic)
• Decision: Should it be included?
  – Include: assign resource type
  – Do not include: keep record
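The nomination workflow on this slide can be sketched as a small registry. The sketch assumes that "In NIF already?" is answered by looking the URL up in existing records, and that the include decision and resource type come from a human curator; the class and field names are hypothetical, with fields mirroring the metadata listed above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResourceRecord:
    short_name: str
    long_name: str
    url: str
    short_description: str = ""    # 1-3 sentences
    long_description: str = ""
    parent_organization: str = ""  # physical location, university
    support: list[str] = field(default_factory=list)  # grant numbers
    # Keyword facets, e.g. {"species": ["mouse"], "technique": [...]}.
    keywords: dict[str, list[str]] = field(default_factory=dict)
    resource_type: Optional[str] = None
    included: bool = False


class Registry:
    def __init__(self) -> None:
        self.records: dict[str, ResourceRecord] = {}  # keyed by URL (assumption)

    def curate(self, record: ResourceRecord, include: bool,
               resource_type: Optional[str] = None) -> ResourceRecord:
        # "In NIF already?": return the existing record if the URL is known.
        existing = self.records.get(record.url)
        if existing is not None:
            return existing
        # Metadata is assumed to be assigned on `record` already.
        # Curator decision: excluded resources are still kept on record.
        record.included = include
        if include:
            if resource_type is None:
                raise ValueError("an included resource needs a resource type")
            record.resource_type = resource_type
        self.records[record.url] = record
        return record
```

Keeping excluded records lets a later nomination of the same URL be recognized instead of re-reviewed.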

Resources Difficult to Categorize
• Link aggregates
• Large organizations (NIH)
• Poorly documented databases
• Private data sites
• Clinical trials that are still recruiting
  – Experimental protocol
• Commercial entities
• Journals
  – JoVE
  – Supplemental materials

CINdy: the resource curation tool

Biomedical Resource Ontology (BRO)
• Data Resource: provides access to data; database, atlas, book
• Software Resource: software programs or source code
• Material Resource: reagents, tissue samples or organisms
• Funding Resource: grants or contracts
• Training Resource: educational materials, training programs
• Job Resource: employment opportunities
• People Resource: access to individual people’s web sites
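The top-level BRO classes above can be represented as an enumeration, with the slide's example kinds used as a lookup table. This is an illustrative sketch only: the class labels and examples come from the slide, but the kind-to-class mapping, the key spellings, and the `bro_class` helper are assumptions, not part of BRO itself.

```python
from enum import Enum
from typing import Optional


class BROClass(Enum):
    """Top-level BRO resource classes as listed on the slide."""
    DATA = "Data Resource"
    SOFTWARE = "Software Resource"
    MATERIAL = "Material Resource"
    FUNDING = "Funding Resource"
    TRAINING = "Training Resource"
    JOB = "Job Resource"
    PEOPLE = "People Resource"


# Example kinds taken from the slide's descriptions; the lookup is hypothetical.
KIND_TO_CLASS = {
    "database": BROClass.DATA, "atlas": BROClass.DATA, "book": BROClass.DATA,
    "software program": BROClass.SOFTWARE, "source code": BROClass.SOFTWARE,
    "reagent": BROClass.MATERIAL, "tissue sample": BROClass.MATERIAL,
    "organism": BROClass.MATERIAL,
    "grant": BROClass.FUNDING, "contract": BROClass.FUNDING,
    "educational material": BROClass.TRAINING,
    "training program": BROClass.TRAINING,
    "employment opportunity": BROClass.JOB,
    "personal web site": BROClass.PEOPLE,
}


def bro_class(kind: str) -> Optional[BROClass]:
    """Map a resource kind to its top-level BRO class, or None if unknown."""
    return KIND_TO_CLASS.get(kind.strip().lower())
```

Unknown kinds return `None`, which is where a curator would step in (journals, for instance, are not covered by this table).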

NIF Service vs BRO Service

Solutions: Consolidating Classes
• Synonyms where appropriate: e.g., Material storage service vs. Material storage repository
• Temporary mapping, where appropriate
  – *Deprecated terms must be maintained*
• Data loss
• Moving forward with a joint descriptive terminology!
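Synonyms and temporary mappings both reduce to resolving any label to one preferred term, while deprecated labels stay resolvable rather than being deleted. A minimal sketch, assuming labels are plain strings; the `TermMap` class is hypothetical, and the choice of which term in the slide's pair is "preferred" is arbitrary here.

```python
class TermMap:
    """Resolve class labels to a single preferred term.

    Synonyms map permanently to a preferred label. Deprecated labels are
    kept in the map so old annotations still resolve.
    """

    def __init__(self) -> None:
        self._preferred: dict[str, str] = {}  # any label -> next label
        self.deprecated: set[str] = set()

    def add_synonym(self, preferred: str, synonym: str) -> None:
        self._preferred[preferred] = preferred
        self._preferred[synonym] = preferred

    def deprecate(self, old: str, replacement: str) -> None:
        # Keep the old label resolvable instead of deleting it.
        self._preferred[old] = self.resolve(replacement)
        self.deprecated.add(old)

    def resolve(self, label: str) -> str:
        # Follow mappings until a label maps to itself (or is unmapped).
        seen = set()
        while self._preferred.get(label, label) != label:
            if label in seen:
                raise ValueError(f"mapping cycle at {label!r}")
            seen.add(label)
            label = self._preferred[label]
        return label
```

In this sketch, "Storage service" would be a hypothetical old label that is deprecated in favor of the consolidated class.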

Evolution of the NIF Resource Ontology

• Object: Materials (Biomaterials, Reagents), Software, People, Grants, Jobs, Information
• Function: Service (Storage, Production), Funding, Job Service, Community-building
• Target Audience: General, Kids, Student, Medical, Researcher
• Data Type: Structured (Database, Atlas), Unstructured (Journal, Webpage)
• Data Format: Text, RDF Text, Picture, Video
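The faceted structure on this slide lets each resource be annotated independently along several axes. A minimal sketch, assuming annotations are dicts from facet name to a list of values; the facet values are copied from the slide with the sub-levels flattened, and the `validate_annotation` helper is hypothetical.

```python
# Facet values from the slide; nesting (e.g. Materials > Biomaterials) is flattened.
FACETS: dict[str, set[str]] = {
    "Object": {"Materials", "Biomaterials", "Reagents", "Software", "People",
               "Grants", "Jobs", "Information"},
    "Function": {"Service", "Storage", "Production", "Funding", "Job Service",
                 "Community-building"},
    "Target Audience": {"General", "Kids", "Student", "Medical", "Researcher"},
    "Data Type": {"Structured", "Database", "Atlas",
                  "Unstructured", "Journal", "Webpage"},
    "Data Format": {"Text", "RDF Text", "Picture", "Video"},
}


def validate_annotation(annotation: dict[str, list[str]]) -> list[str]:
    """Return errors for unknown facets or values outside a facet's allowed set."""
    errors = []
    for facet, values in annotation.items():
        allowed = FACETS.get(facet)
        if allowed is None:
            errors.append(f"unknown facet: {facet}")
            continue
        for value in values:
            if value not in allowed:
                errors.append(f"{facet}: unexpected value {value!r}")
    return errors
```

Because facets are independent, a single resource can have several audiences or formats without multiplying classes in one hierarchy.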

Resource Boundary?
• Software Library
  – Software tool
    – Plugin: i2b2
• Our solution: use the URL as a uniqueness qualifier
  – Our problem: a single URL may house several resources
  – Individual plugins can have individual URLs
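Using the URL as a uniqueness qualifier only works if equivalent spellings of a URL compare equal, and the slide's problem case shows up as several resources sharing one normalized URL. A minimal sketch, assuming a simple normalization (lowercase scheme and host, drop default ports, trailing slashes, and fragments); the helper names and example URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings compare equal.

    Simplified: userinfo and IPv6 literal hosts are not handled.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep only non-default ports.
    if port and not ((scheme == "http" and port == 80) or
                     (scheme == "https" and port == 443)):
        host = f"{host}:{port}"
    path = parts.path.rstrip("/") or "/"
    # Fragments are never sent to the server, so they are dropped.
    return urlunsplit((scheme, host, path, parts.query, ""))


def url_collisions(resources: dict[str, str]) -> dict[str, list[str]]:
    """Given {resource name: url}, return normalized URLs shared by several resources."""
    by_url: dict[str, list[str]] = defaultdict(list)
    for name, url in resources.items():
        by_url[normalize_url(url)].append(name)
    return {u: sorted(names) for u, names in by_url.items() if len(names) > 1}
```

Any collision reported here is a case where the URL alone cannot serve as the key, and a curator has to decide the boundary.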

Boundary (cont.)
• Individual resource boundary: a site shall be considered an individual resource if it is maintained by a single entity and has the properties of one or more web pages that are related by a theme and connected by HTML links.
• Solution to the random boundary problem: a human curator

Issues of Scope
• Single line or short paragraph + keywords
  – Resource discovery problem
  *The Stanford ontologies description is very short (as are many others), so finding this resource by keyword will be difficult unless we index the content of the website.
• Data dump
  – Small vs. large databases
  – Updates
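The discovery problem with one-line descriptions, and why indexing site content helps, can be shown with a tiny inverted index. A minimal sketch: the tokenizer is deliberately naive (no stemming or stop words), and the resource id and both texts in the usage are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a stand-in for a real text analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: token -> ids of resources whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for resource_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(resource_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Resources containing every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))
```

With only the short description ("Ontology collection.") indexed, a search for "neuroanatomy" finds nothing. After appending crawled page text that mentions a neuroanatomy ontology, the same resource is found.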

Internal referencing
• Stanford example:
  – License: “same as bioportal”, which does not match any license type in any list.
  – Problem: nonstandard terminology and a reference to another project (with no URL); such references can create loops.
    • This also happens in publications: e.g., “used the same protocol as paper X”, where paper X used the same protocol as paper Y.
  – Automated text mining tools have a hard time recognizing these.
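Once such internal references are recognized, resolving them is a chain-following problem that must guard against loops and dangling targets. A minimal sketch, assuming a field value of the form "same as <project>" names another project's key exactly; the function and the project names in the usage are hypothetical.

```python
def resolve_reference(field_values: dict[str, str], start: str) -> str:
    """Follow 'same as <project>' references until a concrete value is found.

    `field_values` maps a project name to its raw value for one field
    (e.g. license). Raises ValueError on loops or unknown projects.
    """
    prefix = "same as "
    path = [start]
    value = field_values[start]
    while value.lower().startswith(prefix):
        target = value[len(prefix):].strip()
        if target in path:
            raise ValueError("reference loop: " + " -> ".join(path + [target]))
        if target not in field_values:
            raise ValueError(f"{path[-1]} refers to unknown project {target!r}")
        path.append(target)
        value = field_values[target]
    return value
```

A "same as" chain of length one resolves directly; a pair of projects that point at each other raises an error naming the loop.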

What can we gain from automated systems?

• Basic information: name, URL, contact info
• Some keywords
• Some descriptive text
• No resource boundary
• No resource description
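The kind of basic information an automated system can pull from a page is easy to illustrate with Python's standard `html.parser`. A minimal sketch, assuming the page uses a `<title>`, `<meta name="keywords">`/`<meta name="description">` tags, and `mailto:` links; real pages vary widely. Note what it cannot produce: neither a resource boundary nor a curated description.

```python
from html.parser import HTMLParser


class BasicInfoParser(HTMLParser):
    """Collect page title, meta keywords/description, and mailto: contacts."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.keywords: list[str] = []
        self.description = ""
        self.contacts: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content") or ""
            if name == "keywords":
                self.keywords += [k.strip() for k in content.split(",") if k.strip()]
            elif name == "description":
                self.description = content.strip()
        elif tag == "a":
            href = attrs.get("href") or ""
            if href.lower().startswith("mailto:"):
                self.contacts.append(href[len("mailto:"):])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_basic_info(html: str, url: str) -> dict:
    """Return name, URL, contacts, keywords, and descriptive text for one page."""
    parser = BasicInfoParser()
    parser.feed(html)
    parser.close()
    return {"name": parser.title.strip(), "url": url, "contact": parser.contacts,
            "keywords": parser.keywords, "description": parser.description}
```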

How do we help the computers?

• Common naming project (Neurocommons): http://sharedname.org/page/Main_Page
• Automated URIs
• Community building:
  – Shared data models
  – Shared ontology
  – RDF entity tags? (mouse vs. mouse)
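Automated URIs have to keep homonyms apart, which is one reading of the "mouse vs. mouse" question. A minimal sketch, assuming each entity is identified by a label plus a sense; the `BASE` namespace is a placeholder rather than a real naming service, and the senses in the usage ("organism", "input device") are illustrative. SHA-1 is used only as a stable fingerprint, not for security.

```python
import hashlib
import re

# Hypothetical namespace; a shared naming service would supply the real base.
BASE = "http://example.org/id/"


def mint_uri(label: str, sense: str) -> str:
    """Mint a stable URI for a (label, sense) pair.

    Including the sense keeps homonyms apart, while re-minting the same
    pair (case-insensitively) always yields the same URI.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    key = f"{label.lower()}|{sense.lower()}".encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()[:8]  # stable fingerprint, not security
    return f"{BASE}{slug}-{digest}"
```

Here `mint_uri("mouse", "organism")` and `mint_uri("mouse", "input device")` produce different URIs under the same readable slug, so RDF statements about one cannot silently attach to the other.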