
1/8/2003 © Bradley Hemminger UNC-CH

NeoRef

the knowledge management system of the future

Bradley Hemminger

School of Information and Library Science, University of North Carolina, Chapel Hill

http://ils.unc.edu/bioinfo


What if…

• Publishing material were as simple as formatting it in a standard format like PDF or JPEG and clicking a button to submit it to an archive and/or journal? Your article would keep all its links to color pictures and graphs, even dynamic graphs or videos, just as you originally produced them.

• Power tools for publishing


What if…

• You could name the subjects you were interested in and instantly be given a list of all articles on those topics, and any time a new one was published you would receive a link to it? Each article would automatically be added to your Reference Database so you could add a citation with a click.

• Universal archive (OAI), controlled vocabulary indexing and retrieval


What if…

• You could search for any arguments in any literature, public comment, review, or database that related to your new research proposal or paper?

• Controlled vocabulary indexing and retrieval, full support of Dublin Core and qualifiers.

• Examples:

– What genes are linked with causing schizophrenia?

– What articles disagree with the claims in my research proposal?


What if…

• You could filter the 1000 articles returned by the PubMed search “breast cancer” and “smoking” down to only the 31 articles that specifically refer to clinical studies establishing whether smoking is a causal factor of breast cancer?

• Controlled vocabulary indexing and retrieval, full support of Dublin Core and qualifiers, domain-specific extensions, and “concepts/claims”.


Design for Open Archives

[Diagram. Labels shown: MIT DSpace, UNC, Duke, NC State, Stanford Digital Library, University of Washington, Cornell arXiv, OAI harvester, User, Contributor.]


Methodology

• DCI = Digital Content Items

• DOI = Digital Object Identifiers

• OAI = Open Archives Initiative


NeoRef Methodology

• Anyone can submit a DCI.

• All submitted DCIs, regardless of type (journal article or Joe Bob’s comments), receive DOIs and are stored on one or more OAI archives.

• Authors provide initial metadata with submission.

• Articles reside on one or more physical archives. All archives together, operating under OAI, form one logical universal archive that is harvestable and searchable by one interface.
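The idea of several physical archives forming one logical archive can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the NeoRef implementation: the `Archive` and `UniversalArchive` classes, the DOI strings, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """One physical archive holding DCI metadata keyed by DOI (hypothetical model)."""
    name: str
    records: dict = field(default_factory=dict)  # DOI -> metadata dict

    def deposit(self, doi, metadata):
        self.records[doi] = metadata


class UniversalArchive:
    """One logical archive: a single search interface over many physical archives."""

    def __init__(self, archives):
        self.archives = list(archives)

    def harvest(self):
        """Merge metadata from every archive, remembering where each copy lives."""
        merged = {}
        for archive in self.archives:
            for doi, metadata in archive.records.items():
                entry = merged.setdefault(doi, {"metadata": metadata, "held_by": []})
                entry["held_by"].append(archive.name)
        return merged

    def search(self, term):
        """Return the DOIs whose title contains the term (case-insensitive)."""
        term = term.lower()
        return sorted(
            doi for doi, entry in self.harvest().items()
            if term in entry["metadata"].get("title", "").lower()
        )


# Made-up sample data: the same item may reside on more than one archive.
unc = Archive("UNC")
duke = Archive("Duke")
unc.deposit("doi:example/1", {"title": "Smad4 signalling notes", "type": "research note"})
duke.deposit("doi:example/1", {"title": "Smad4 signalling notes", "type": "research note"})
duke.deposit("doi:example/2", {"title": "Comments on fish oil trials", "type": "review"})

universal = UniversalArchive([unc, duke])
hits = universal.search("smad4")
holders = universal.harvest()["doi:example/1"]["held_by"]
```

The searcher only ever talks to `UniversalArchive`; which physical archive holds a given DOI is recorded but hidden behind the single interface.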


NeoRef Methodology cont’d

• Standardize domain extensions in science and medicine by extending Dublin Core Subject encoding schemes to include GO (Gene Ontology), etc.

• Extend types in Dublin Core metadata to finer granularity, specifically “concepts” and “claims”.

• Add more structure to what’s indexed: instead of narrative descriptions only (journal articles), allow more structured, logic-based statements.


DCI Example Types

– Journal articles

– Books

– Research notes

– Genetic Sequence data

– Concepts

– Abstracts

– Indexing

– Reviews

– Claims


Methodology, Metadata

• Metadata (Dublin Core) is required for DCI Submissions.

• Date: ISO 8601 (YYYY-MM-DD)

• Format: controlled vocabulary (CV) of Internet media types, or an extension of it

• Resource ID: URI (DOI, URL, ISBN)

• Language: ISO 639 (Internet RFC 1766: en, fr, ...)

• Creator: Name

• Publisher: Name


Methodology, Metadata

• Contributor: Name

• Rights Management: CV for intellectual property rights, or IP text

• Title: Text

• Subject and Keywords: author-provided CV from LCSH, DDC, UDC, LCC, MeSH, GO, etc., and/or free text.

• Description: author-provided free-text abstract


Methodology, Metadata

• Resource Type: CV from DCMI Type Vocabulary or similar (we may need to extend).

• Source: CV reference to another resource this is based on (URI).

• Coverage: CV from Thesaurus of Geographic Names, or similar. (spatial or temporal mainly)

• Relation: CV from Dublin Core, or CV from NeoRef extensions (ScholOnto, MeSH, GO, etc).
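A submission record built from the fields on the metadata slides might look like the sketch below. Only the field names and the ISO date format come from the slides; the `DCIRecord` class, the choice of which checks to run, and the sample values are assumptions made for illustration.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class DCIRecord:
    """Hypothetical container for the Dublin Core fields listed on the slides."""
    title: str
    creator: str
    date: str                 # ISO YYYY-MM-DD
    format: str               # Internet media type, e.g. "application/pdf"
    identifier: str           # Resource ID as a URI: DOI, URL, or ISBN
    language: str = "en"      # ISO 639 code
    publisher: str = ""
    contributor: str = ""
    rights: str = ""
    subject: list = field(default_factory=list)   # CV terms and/or free text
    description: str = ""
    type: str = ""
    source: str = ""
    coverage: str = ""
    relation: list = field(default_factory=list)

    def problems(self):
        """Return a list of validation problems (empty when the record passes)."""
        found = []
        for name in ("title", "creator", "identifier"):
            if not getattr(self, name).strip():
                found.append(f"{name} is required")
        try:
            datetime.date.fromisoformat(self.date)
        except ValueError:
            found.append("date must be ISO YYYY-MM-DD")
        if "/" not in self.format:
            found.append("format should be an Internet media type")
        return found


# Made-up sample submissions.
good = DCIRecord(
    title="NeoRef overview",
    creator="Example Author",
    date="2003-01-08",
    format="application/pdf",
    identifier="doi:example/3",
    subject=["bioinformatics"],
)
bad = DCIRecord(title="", creator="Example Author", date="1/8/2003",
                format="pdf", identifier="doi:example/4")
```

Checking at submission time fits the model in which authors supply their own metadata: problems are reported back to the contributor before the item reaches an archive.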


Methodology cont’d

• Reviews, abstracts, and indexing information can be stored as their own items (related to the item they reference).

• Claims (concept A relates to concept B) would be stored as Concept items. Concepts give finer granularity than a paper, and support more structured logic than simple keyword searching.


Representing Claims

• DOI #1: Concept “geneX” in article U

• DOI #2: Concept “lung cancer” in article U

• DOI #3: Claim U: DOI #1 (concept “geneX”) has relation “causes” to DOI #2 (concept “lung cancer”).

• Claim U has the relationship “is inconsistent with” to claim V.

• Claims are statements about concepts in an item, and how they relate to other items.
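The claim representation above maps naturally onto three small record types. The sketch below is one possible encoding, not the NeoRef data model: the class names, the `doi:V` placeholder for claim V (the slide gives no DOI for it), and the decision to treat “is inconsistent with” as symmetric are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    doi: str
    label: str
    article: str


@dataclass(frozen=True)
class Claim:
    doi: str
    subject_doi: str   # DOI of the concept the claim starts from
    relation: str      # e.g. "causes"
    object_doi: str    # DOI of the concept the claim points to


@dataclass(frozen=True)
class ClaimLink:
    source: str        # DOI of a claim
    relation: str      # e.g. "is inconsistent with"
    target: str        # DOI of another claim


def describe(claim, concepts):
    """Render a claim as 'subject relation object' using concept labels."""
    subject = concepts[claim.subject_doi].label
    obj = concepts[claim.object_doi].label
    return f"{subject} {claim.relation} {obj}"


def inconsistent_with(claim_doi, links):
    """Return DOIs of claims linked to claim_doi by 'is inconsistent with'.

    Assumption: the relation is treated as symmetric.
    """
    found = set()
    for link in links:
        if link.relation != "is inconsistent with":
            continue
        if link.source == claim_doi:
            found.add(link.target)
        elif link.target == claim_doi:
            found.add(link.source)
    return sorted(found)


# The example from the slide; "doi:V" is a placeholder for claim V.
concepts = {
    "doi:1": Concept("doi:1", "geneX", "U"),
    "doi:2": Concept("doi:2", "lung cancer", "U"),
}
claim_u = Claim("doi:3", "doi:1", "causes", "doi:2")
links = [ClaimLink("doi:3", "is inconsistent with", "doi:V")]
```

Because each claim has its own DOI, a link between two claims is just another record, which is what makes it possible to follow disagreements through the literature.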


Representing Claims via Concepts

• DOI #1: Concept “lung cancer” in article U

• DOI #2: Concept “geneX” in article U has relation “causes” to DOI #1 (concept “lung cancer”).

• PROBLEM: claims can’t be referenced directly, only indirectly via concepts. For example, how do you indicate that DOI #4 (claim B) is inconsistent with DOI #2 (claim A)?


Example Retrievals

• Retrieve all articles on Smad4 published in any refereed journal, or any article reviewed by someone in my Respected_Reviewer list.

• Retrieve any article with (index term Fish Oil OR concept Fish Oil) having any relationship with (index term Raynaud’s disease OR concept Raynaud’s disease).
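The second retrieval can be sketched as a filter over article records. Everything here is illustrative: the `Article` class, the way relations are stored as (subject, relation, object) triples, the neutral relation name “relates to”, and the sample DOIs are assumptions, not part of NeoRef.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    doi: str
    index_terms: set = field(default_factory=set)
    concepts: set = field(default_factory=set)
    # (subject, relation, object) triples recorded for this article
    relations: list = field(default_factory=list)

    def mentions(self, term):
        """True if the term appears as an index term OR as a concept."""
        return term in self.index_terms or term in self.concepts


def related(article, a, b):
    """True if the article mentions both terms and links them by any relation."""
    if not (article.mentions(a) and article.mentions(b)):
        return False
    return any({s, o} == {a, b} for s, _, o in article.relations)


def retrieve(articles, a, b):
    """Return the DOIs of articles in which a and b are related."""
    return [article.doi for article in articles if related(article, a, b)]


# Made-up sample records.
articles = [
    Article("doi:example/a",
            index_terms={"Fish Oil"},
            concepts={"Raynaud's disease"},
            relations=[("Fish Oil", "relates to", "Raynaud's disease")]),
    Article("doi:example/b",            # mentions both, but records no relation
            index_terms={"Fish Oil", "Raynaud's disease"}),
    Article("doi:example/c",            # mentions only one of the two terms
            concepts={"Fish Oil"}),
]
result = retrieve(articles, "Fish Oil", "Raynaud's disease")
```

Only the first record is returned: co-occurrence of the two terms is not enough, a stored relation between them is required.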


What Changes with the NeoRef Model?

• Anyone can submit.

• Anyone can review/comment/index.

• Anyone can retrieve any item in the universal archive, based on Dublin Core metadata.

• Reviews, ratings, journal acceptance, citations, and hits become measures of quality. The scale is not binary (accepted in journal) but more continuous.

• Significantly improve ability to track arguments about concepts throughout the literature.


Where does the Work go?

• Submission: work is pushed onto the self-contributing author, who must describe and index their material properly.

• Search & retrieval: work is pushed to the retrieval side, which must provide powerful filtering and good user interfaces so that the searcher is not overwhelmed.


Tools/Services Needed (NeoRef)

• Automatic metadata extraction (authors, date, title, keywords) to save the author from manually repeating this. In the future, Word style sheets or XML entry may make this automatic.

• Support for putting your materials on the open archive. (Librarian).


Tools/Services Needed (NeoRef)

• Choice of classification schema to code keywords in, and easy selection and addition of keywords (e.g. MeSH tree).

• Support for putting your materials on the open archive. (Librarian).

• PubMed-type interface to search the metadata of OAI archives.

• Google to search full text of articles?


Part 2--uncOpenArchive

• Open Archives

• Digital Libraries

• Publishers (will they disappear?)

• Copyright

• uncOpenArchive


OAI helps facilitate new Publishing Models

• Now that all the parts of the publication process are digital, the independent parts can be separated. Separable are:

– Classification (editorial staff: appropriateness)

– Review (scholars: quality rating, acceptance judgment, feedback to author)

– Copy editing (publishing staff)

– Printing or rendering into permanent form (publishing staff)


Status Quo

[Diagram. Roles shown: Publisher (Commercial), Creator (Academic), Consumer (Academic), Purchaser Representative (Library), Archive, Reviewer (Academic).]


Modest Proposal (NeoRef)

[Diagram. Roles shown: Professional Society, Creator (Academic), Consumer (Academic), University Library, Archive, Reviewer (Academic).]


For-Profit Licensing Model

• Publisher: commercial company

• Cost: $0 to $4000 for full review and copy edit, plus operation costs, and profit.

• Cost paid for by purchasing library.

• Copyright: Author transfers copyright of final (value-added) version to journal. Publisher negotiates licensing with libraries to recoup cost. Publisher requires that author give up rights to final version, and may require that preliminary versions not be available (e.g. Chemical Abstracts).


Non-Profit Licensing Model

• Publisher: individual, professional society, institution, government.

• Cost: $0 to $4000 for full review and copy edit

• Cost paid for by purchasing library, possibly with cost offset by publisher or author as in Free model.

• Copyright: Author transfers copyright of final version to journal so that they can license to libraries, but retains rights to preliminary version and possibly final version, which may be put on web for free access.


Free Model

• Publisher: individual, professional society, institution, government.

• Cost: $0 to $4000 for full review and copy edit

• Cost paid for by:

– Subsidized by institution (e.g. university library; GenBank by NCBI)

– Subsidized by professional society

– Paid by author (e.g. $250 for MRS Internet Journal of Nitride Semiconductor Research, $500 for BioMed Central)

• Copyright: fully maintained by author


Other costs

• Submissions: essentially $0 cost to prepare a final, reasonably high-quality PDF with available tools.

• Review: (covered on previous pages).

• Archive: paid by some combination of Review participants: professional society, publisher, institution (university), government (NLM, NSF)

• Retrieval/Searching: either the archive, or OAI harvesters (free, e.g. CiteBase, Arc; or commercial).

• For-profit models generally control all these services, while other models allow separate entities to provide archiving or retrieval services.


Opportunities

• Review of digital objects (free and professional)

• Indexing of digital objects (author, free, professional)

• Archiving of digital objects (universities, commercial)

• Search and retrieval of digital objects (free harvesters, commercial tools).


Related work

• Digital archives

• Archive standards

• Harvesting and searching

• Digital library software

• Publisher policies

• E-journals

• Peer Review


Digital Library/Archives

• arXiv: digital e-print archive, an example of academic community support.

• MIT’s DSpace: an excellent example of university support (with industry help)

• Arizona’s DLIST (Information Science and Technology digital archive).


Digital Publisher/Libraries

• Public Library of Science (editorial board, peer review, etc in house; free access, $1500(?) author submission cost).

• BioMed Central (individual e-journals participate as part of this, utilizing their infrastructure; free access; $500 author submission cost; reviews, images, other additional materials cost extra).

• Stanford’s HighWire


Archive standards

• Open Archives Initiative (OAI): standards for open federated archives and metadata harvesting. Current registered OAI archives

• Dublin Core: standard minimal set of metadata common across domains. Dublin Core Library Profile.
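The two standards meet in a harvest request: a harvester asks an archive for records through the OAI metadata harvesting protocol (OAI-PMH) and receives Dublin Core back. The sketch below builds such a request and reads a Dublin Core fragment. The `ListRecords` verb, the `metadataPrefix` and `from` arguments, and the two namespace URIs are part of the published standards; the base URL, the helper names, and the sample record are made up for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # only records changed on or after this date
    return f"{base_url}?{urlencode(params)}"


def dc_fields(xml_text):
    """Collect Dublin Core elements from an XML fragment into a dict of lists."""
    fields = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith("{" + DC_NS + "}"):
            name = element.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


# Hypothetical record in the oai_dc format, as an archive might return it.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example title</dc:title>
  <dc:creator>Example Author</dc:creator>
  <dc:date>2003-01-08</dc:date>
  <dc:subject>bioinformatics</dc:subject>
  <dc:subject>digital libraries</dc:subject>
</oai_dc:dc>"""

# archive.example.org is a placeholder, not a real repository.
url = list_records_url("https://archive.example.org/oai", from_date="2003-01-01")
fields = dc_fields(SAMPLE)
```

Storing every element as a list reflects that Dublin Core elements are repeatable, as the two `subject` values in the sample show.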


Harvesting & Search

• Cite-base

• Arc

• Open Journal Systems (example)


Digital Library Commercial Software

• Endeavor Systems (ENCompass)

• Ex Libris (DigiTool)

• Sirsi (Hyperion)

• Artesia (TEAMS)


Publisher Policies

• Listing of what publishers allow regarding copyright and submission of articles to e-print servers. Publisher Survey


E-only Journals

• Survey of E-only journals and discussion of their tradeoffs (Llewellyn 2002):

– Subject: 100+ journals, 10+ subjects

– 85% are free access

– Indexing a problem (33% not indexed)

– Cataloging: 3% have no OCLC holdings

– Few citations (probably primarily because not indexed or cataloged).

• Peer-reviewed E-only journals listing


Peer Review

• Faculty of 1000: (prior pages on publication).

• ScholOnto: paid by some combination of Review participants (professional society, non-profit, for profit)

• dEbates in Science Magazine


Where to Next? (NeoRef)

• NeoRef (bioivlab prototype)

– Make possible the inclusion of reviews and claims with DC metadata.

– Extend existing Open Archives efforts to include metadata (keywords, indexing, concepts) from the bioinformatics domain.

• Develop convenient and accurate author deposit of materials and metadata, and searching and retrieval.

• Create Information and Library Science Research in Bioinformatics E-journal/E-print archive.


Where to next? (UNC)

• Create a developmental digital archive resource at SILS to support the submission and archival of scholarly materials by anyone in UNC (uncOpenArchive)

• Create a collection of digital libraries at SILS

• Help work towards the creation of a UNC-wide production digital library hosted by the libraries.


SILS Center for Digital Libraries (CDL)

Collections: Botnet, DocSouth, GovStat, Minds of Carolina, NeoRef, OpenVideo, UNC Courses, uncOpenArchive

Submit Material

Search

Contact a CDL Librarian

School of Information and Library Science

UNC Libraries: Davis, HSL