Evidence-Based Information Retrieval in Bioinformatics

Timothy B. Patrick, PhD
Healthcare Administration and Informatics, University of Wisconsin-Milwaukee

Post on 19-Dec-2015


Goal of the Project

• The overall, long-term goal of this research project is to contribute to evidence-based information retrieval in post-genomic medicine: proof of the effectiveness of the way particular information resources are used and combined in order to retrieve that information.

Aims

• Specific Aim 1: Determine existing pitfalls in accessing literature on gene function

• Specific Aim 2: Based on user warrant, determine the current state of evidence-based functional genomic retrieval

• Specific Aim 3: Based on literary warrant, determine the current state of evidence-based functional genomic retrieval

“Determine existing pitfalls in accessing literature on gene function”

• That is the topic of my talk later today.

• “Asymmetries in Retrieval of Gene Function Information”

The Study

• Investigated an example of different paths to the literature that might look to a user to be equivalent but which are not equivalent due to various features of the resources involved.

• Knowledge that they are not equivalent requires knowledge of metadata about the resources.

Three Paths

[Figure: Affymetrix → Genbank Accession number → Pubmed ID, by three paths: Pubmed; Nucleotide (via Pubmed links); Gene (via Pubmed links).]

http://www.affymetrix.com/corporate/media/genechip_essentials/gene_expression/Features_and_probes.affx


Methods

• We first collected representative DNA Accession numbers associated with genes expressed in a microarray experiment designed to identify changes in gene expression associated with skeletal muscle recovery from immobilization-induced sarcopenia.

Methods

• Next, we retrieved the Unique Identifiers (UI’s) of Entrez Pubmed citations that were associated with the Accession numbers by each of the three Entrez resources.
– Directly in the case of Entrez Pubmed
– Indirectly, via Pubmed links, in the case of Entrez Nucleotide and Entrez Gene

• Next, we compared the number of Pubmed ID's retrieved by the three resources for each of the Accession numbers.

Summary of Pubmed ID’s by Accession Number

# of Pubmed ID’s   # of Accession numbers
                   Pubmed   Nucleotide   Gene
0                  198      132          216
1                  36       112          34
2                  10       5            0
3                  4        2            1
4                  1        0            0
5                  2        0            0
Total              251      251          251
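The counts in the summary table can be turned into a quick per-path comparison. The sketch below assumes the three tables correspond, in order, to the Pubmed, Nucleotide, and Gene paths, as the slide labels suggest. The variable and function names are illustrative and not from the study.

```python
# Counts transcribed from the summary table.
# List index = number of Pubmed IDs (0 to 5); value = number of Accession numbers.
# Assumption: the tables are, in order, Pubmed, Nucleotide, Gene.
counts = {
    "Pubmed":     [198, 36, 10, 4, 1, 2],
    "Nucleotide": [132, 112, 5, 2, 0, 0],
    "Gene":       [216, 34, 0, 1, 0, 0],
}

def summarize(dist):
    """Return (total accessions, accessions with >= 1 ID, accession-to-ID links)."""
    total = sum(dist)
    with_hits = total - dist[0]
    links = sum(n_ids * n_acc for n_ids, n_acc in enumerate(dist))
    return total, with_hits, links

summary = {path: summarize(dist) for path, dist in counts.items()}

for path, (total, with_hits, links) in summary.items():
    share = 100 * with_hits / total
    print(f"{path}: {with_hits}/{total} accession numbers ({share:.1f}%) "
          f"returned at least one Pubmed ID; {links} accession-to-ID links")
```

Note that the last figure counts accession-to-ID links, not distinct Pubmed IDs, because the table does not say whether two Accession numbers share a citation.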

Methods

• Compared number of Pubmed ID’s produced for each Accession number by each path.

• Applied non-parametric test: Kendall’s W
– Pubmed versus Nucleotide versus Gene
– p < .05
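For readers unfamiliar with the test, here is a minimal sketch of a tie-corrected Kendall's W in plain Python. It treats each Accession number as one row and ranks the three paths within that row. The slides do not give the per-accession data or the software used, so the rows below are invented for illustration. The p-value helper uses the large-sample chi-square approximation, which has a closed form only because three paths give 2 degrees of freedom.

```python
import math

def average_ranks(values):
    """Rank values 1..k, giving tied values the mean of their ranks.
    Also returns the size of each tie group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes

def kendalls_w(rows):
    """Tie-corrected Kendall's W for m rows (accessions) by k columns (paths)."""
    m, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        ranks, tie_sizes = average_ranks(row)
        for col, rank in enumerate(ranks):
            rank_sums[col] += rank
        tie_term += sum(t ** 3 - t for t in tie_sizes)
    numerator = 12 * sum(r * r for r in rank_sums) - 3 * m * m * k * (k + 1) ** 2
    denominator = m * m * k * (k * k - 1) - m * tie_term
    if denominator == 0:
        raise ValueError("every row is fully tied; W is undefined")
    return numerator / denominator

def approx_p_three_paths(w, m):
    """Chi-square approximation, valid only for k = 3 (df = 2):
    chi2 = m * (k - 1) * W, and the df = 2 survival function is exp(-chi2 / 2)."""
    return math.exp(-(m * 2 * w) / 2)

# Invented example rows: (Pubmed, Nucleotide, Gene) ID counts per accession.
example = [(0, 1, 0), (1, 1, 0), (0, 1, 1), (2, 1, 0), (0, 0, 0), (0, 1, 0)]
w = kendalls_w(example)
print(f"W = {w:.3f}, approximate p = {approx_p_three_paths(w, len(example)):.3f}")
```

W runs from 0 (no consistent ordering of the paths across Accession numbers) to 1 (every Accession number orders the paths the same way).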

The Three Paths Are Not Equivalent

[Figure: the three-path diagram (Affymetrix → Genbank Accession number → Pubmed ID via Pubmed, Nucleotide, or Gene), with ≠ signs between the paths.]

The SI field identifies secondary source databanks and accession numbers of outside resources discussed in MEDLINE articles. The field is composed of the source followed by a slash followed by an accession number and can be searched with one or both components, e.g., genbank [si], AF001892 [si], genbank/AF001892 [si].

The SI field and the Entrez sequence database links are not linked. The PubMed links to these databases are created from the reference field of the GenBank or GenPept flat file. These references include citations that discuss the specific sequence presented in these flat files.

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed.box.pubmedhelp.Box_1_Search_Field_D#pubmedhelp.Secondary_Source_ID_
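The difference between the two mechanisms described above can be made concrete as two separate requests for the same accession number. The sketch below only builds request URLs and makes no network calls. It uses NCBI's current E-utilities endpoints (`esearch.fcgi` and `elink.fcgi`), which are an assumption here: the slides do not say what interface the study used, and whether `elink` accepts a bare accession without a version suffix should be checked against the NCBI documentation. The function names are illustrative.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities (an assumption; not named in the slides).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def si_search_url(accession, source="genbank"):
    """Path 1: search the Pubmed SI field, e.g. genbank/AF001892 [si]."""
    term = f"{source}/{accession}[si]"
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": "pubmed", "term": term})

def sequence_link_url(accession):
    """Path 2: follow Pubmed links from the sequence record, which come from
    the reference field of the flat file rather than from the SI field."""
    params = {"dbfrom": "nuccore", "db": "pubmed", "id": accession}
    return f"{EUTILS}/elink.fcgi?" + urlencode(params)

# AF001892 is the example accession number quoted in the Pubmed help text.
print(si_search_url("AF001892"))
print(sequence_link_url("AF001892"))
```

Because the two requests draw on different underlying data, there is no reason to expect them to return the same set of Pubmed IDs.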

“Based on user warrant, determine the current state of evidence-based functional genomic retrieval”

• Interviews with biologists who use microarrays to study gene expression levels

• Questions concern which IR methods are used, why the biologists consider those methods effective, what their criteria of success and failure are, and how they see the role of biomedical librarians in the process

Interviews in Progress

• Five interviews currently scheduled at the University of Missouri-Columbia

• Interviews being scheduled at University of Wisconsin-Milwaukee

• In March we interviewed two subjects at NIG in Japan

“Based on literary warrant, determine the current state of evidence-based functional genomic retrieval”

• We wanted to investigate how and to what extent biological science researchers reported their information retrieval methods, including details of why they used the methods they did.

Methods

• We searched OVID Medline on October 1, 2004 for the period 1966 to September Week 4 2004 with the query “Oligonucleotide Array Sequence Analysis/”, producing 10746 results.

• We then limited the results to English (10374), excluded “review articles” (9049), and limited to the years 2003 – 2004 (4798). We next ranked journals in the results by number of articles, and selected a population of all of the articles from the 13 top journals (n=1373). We randomly sampled 150 articles from that population.
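The selection funnel and the final random draw can be written down so that the sample is reproducible. This is a sketch only: the slides do not say how the random sample was drawn, so the seed and the use of article index numbers 1 to 1373 are assumptions.

```python
import random

# Result counts at each stage, as reported on the slide.
funnel = [
    ("query results", 10746),
    ("limited to English", 10374),
    ("review articles excluded", 9049),
    ("limited to 2003-2004", 4798),
    ("articles in the 13 top journals", 1373),
]
POPULATION = funnel[-1][1]
SAMPLE_SIZE = 150

def draw_sample(population_size, sample_size, seed):
    """Draw article index numbers without replacement, reproducibly."""
    rng = random.Random(seed)  # recording the seed lets others redraw the sample
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

sample = draw_sample(POPULATION, SAMPLE_SIZE, seed=2004)  # seed is arbitrary

for label, count in funnel:
    print(f"{label}: {count}")
print(f"sampled {len(sample)} of {POPULATION} "
      f"({100 * SAMPLE_SIZE / POPULATION:.1f}% of the population)")
```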

Methods

• If the authors of the paper did report gene function, we wanted to know which information sources and retrieval methods they used, as well as the reasons they had for using them.
– Functional Attribution Reported
– Sources of Information Reported
– Retrieval Strategy Reported
– Grounds for Choice of Sources Reported
– Grounds for Retrieval Strategy Reported

Methods

• How were details of sources and retrieval methods reported?
– Methods or Procedures
– Results
– Discussion
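One way to picture the coding scheme is as a record filled in for each sampled article. The sketch below is hypothetical: the field names paraphrase the coding categories listed above, and the two example records are invented, not data from the study.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class ArticleCoding:
    """One coded article; field names are illustrative, not the study's own."""
    article_id: str
    functional_attribution: bool
    sources_reported: bool = False
    retrieval_strategy_reported: bool = False
    grounds_for_sources_reported: bool = False
    grounds_for_strategy_reported: bool = False
    # Section where retrieval details appear:
    # "Methods or Procedures", "Results", "Discussion", or None.
    reported_in: Optional[str] = None

def tally(codings):
    """Count each category among articles that attribute gene function."""
    counts = Counter()
    for c in codings:
        if not c.functional_attribution:
            continue
        counts["functional_attribution"] += 1
        counts["sources_reported"] += c.sources_reported
        counts["retrieval_strategy_reported"] += c.retrieval_strategy_reported
        counts["grounds_for_sources_reported"] += c.grounds_for_sources_reported
        counts["grounds_for_strategy_reported"] += c.grounds_for_strategy_reported
        if c.reported_in is not None:
            counts[f"reported_in:{c.reported_in}"] += 1
    return counts

# Invented records for illustration only.
codings = [
    ArticleCoding("example-1", True, sources_reported=True, reported_in="Results"),
    ArticleCoding("example-2", False),
]
print(tally(codings))
```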

Results

• Typical evidence for attribution of gene function consists of literature citations.

• When a literature search (e.g. Pubmed search), or a search of other knowledge sources (e.g. NCBI databases), is cited as the source of evidence to support attribution of function, rarely are details of the search reported.

• Reasons for using particular sources and retrieval methods are not reported.

Results

• When information retrieval methods are described in the paper, they are typically mentioned only in the “Results” or “Discussion” sections of the paper, and not in the “Methods” section.

• Wet bench methods are reported in more detail than dry bench methods.

Implications for Information Practice


• There is a need to embrace a workflow concept

• There is a need to develop standards for documentation in e-science

• There is a need to use multidisciplinary teams to develop workflows

“There is a need to embrace a workflow concept”

• Call a scenario in which multiple information resources, databases, and analysis tools are used in combination a workflow

• Workflows are increasingly important for information retrieval and processing in the Life Sciences
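As an illustration of what a documented workflow might look like, the sketch below records each step with its resource, action, and result count, then serializes the record. The structure and field names are assumptions, not a proposed standard. The example steps reuse the OVID Medline search reported earlier in this talk; a date is given only where the slides state one.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class WorkflowStep:
    """One step of a retrieval workflow; field names are illustrative."""
    resource: str
    action: str
    result_count: int
    date: Optional[str] = None  # ISO date, only where the source states it

workflow = [
    WorkflowStep("OVID Medline",
                 'query "Oligonucleotide Array Sequence Analysis/", '
                 "1966 to September Week 4 2004",
                 10746, date="2004-10-01"),
    WorkflowStep("OVID Medline", "limit to English", 10374),
    WorkflowStep("OVID Medline", "exclude review articles", 9049),
    WorkflowStep("OVID Medline", "limit to 2003-2004", 4798),
]

# A plain JSON record that could accompany a paper's Methods section.
record = json.dumps([asdict(step) for step in workflow], indent=2)
print(record)
```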

[Figure: Traditional Science; Computer-based information retrieval and processing; The Digitization of Science, or E-science.]

“There is a need to develop standards for documentation in e-science”

[Figure, built up over four slides: Life Science Information Retrieval and Processing Workflows; documentation; technology to facilitate documentation; editorial policy drivers.]

[Figure: KNOWLEDGE-ENABLED WORKFLOWS, with INFORMATION ITEMS, METADATA, and TOOLS.]

“There is a need to use multidisciplinary teams to develop workflows”

[Figure, built up over several slides: KNOWLEDGE-ENABLED WORKFLOWS with INFORMATION ITEMS, METADATA, and TOOLS; a domain expert (scientist) is added first, then a domain metadata expert (information specialist).]