A literature network of human genes for high-throughput analysis of gene expression

Speaker: Shih-Te, Yang
Advisor: Ueng-Cheng, Yang
The Institute of Biochemistry, NYMU
Bioinformatics Program and Core Lab

Tor-Kristian Jenssen, Astrid Laegreid, Jan Komorowski & Eivind Hovig
Nature Genetics, Volume 28, May 2001


Page 2:

Goals for systems biology

?

Cell, 100(1):57–70, Review, 2000.
PNAS, Vol. 95, 14863-14868

Page 3:

How to Find Biologically Significant Events Using Microarray Technology?

Fitting to current knowledge

Sifting out variations

Page 4:

Mapping Gene Expression Data to KEGG Pathways

Page 5:

Linking Molecular Information to Phenotypes Can Provide Insights into Biological Processes

Pathways: metabolic, signal transduction, etc.

Phenotype: angiogenesis, metastasis

Page 6:

Information Hidden in Literature

Molecular functions
•Protein-protein interactions
•Protein-DNA (RNA) interactions

Phenotypic information
•Physiological and pathological processes (e.g. angiogenesis, tumor metastasis)
•Drug and chemical response

Page 7:

No Efficient Way to Find Genes Related to Angiogenesis

http://www3.ncbi.nlm.nih.gov/htbin-post/Entrez/query?db=0&form=1&term=angiogenesis

Page 8:

Strategies of Literature Mining

•Keyword indexing (a gene): protein annotation
•Semantics (genes): protein binding and interaction
•Keyword co-occurrence (terms and genes): biomedical terms vs. genes -> biological processes

Page 9:

Medicine and Related Subjects from MeSH Classified by NLM

http://wwwcf.nlm.nih.gov/class/schedule.html

Page 10:

Gene Ontology (GO) Can Provide Links between Biological Processes and Genes

Page 11:

Approach to construct the literature network (part one)

Step One: genes and terms are associated through a common set of articles

[Diagram: Gene -> (index) -> Articles <- (index) <- Term annotation (MeSH, Gene Ontology™)]
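
Step One can be sketched in a few lines of Python. This is only an illustration under assumptions, not the PubGene implementation: the two indexes are plain dictionaries from a gene or term to a set of article IDs, and the function name `gene_term_links`, the gene labels, and the article IDs are all made up for the example.

```python
from collections import defaultdict

def gene_term_links(gene_index, term_index):
    """Count, for each (gene, term) pair, the articles indexed under both."""
    # Invert the term index so each article maps to the terms annotating it.
    terms_by_article = defaultdict(set)
    for term, articles in term_index.items():
        for article in articles:
            terms_by_article[article].add(term)

    # A gene is linked to a term once per article that mentions the gene
    # and carries the term annotation.
    links = defaultdict(int)
    for gene, articles in gene_index.items():
        for article in articles:
            for term in terms_by_article.get(article, ()):
                links[(gene, term)] += 1
    return dict(links)

# Hypothetical toy indexes (article IDs are invented).
gene_index = {"Gene 1": {101, 102, 103}, "Gene 2": {103, 104}}
term_index = {"Angiogenesis": {101, 102}, "Metastasis": {103, 104}}

links = gene_term_links(gene_index, term_index)
for (gene, term), n_articles in sorted(links.items()):
    print(gene, term, n_articles)
```

The count per pair is kept so that weak links (a single shared article) can be told apart from well-supported ones.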

Page 12:

Approach to construct the literature network (part two)

Step Two: gene-to-gene co-citation (co-mention, co-occurrence)

[Diagram: Gene A -> (index) -> Articles <- (index) <- Gene B; shared articles suggest a biological relation]

Global approach

Network Extension and Expansion
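
A minimal sketch of Step Two, assuming the gene-to-article index is a dictionary from gene symbol to a set of article IDs. The function name `cocitation_network` and the toy data are hypothetical; the sketch only shows the co-citation idea, with the edge weight being the number of articles that mention both genes.

```python
from collections import Counter
from itertools import combinations

def cocitation_network(gene_index):
    """Return a Counter mapping each gene pair to its number of shared articles."""
    # Invert the index: article -> genes mentioned in it.
    genes_by_article = {}
    for gene, articles in gene_index.items():
        for article in articles:
            genes_by_article.setdefault(article, set()).add(gene)

    # Every pair of genes mentioned in the same article gains one co-citation.
    edges = Counter()
    for genes in genes_by_article.values():
        for pair in combinations(sorted(genes), 2):
            edges[pair] += 1
    return edges

# Hypothetical toy index (article IDs are invented).
gene_index = {"Gene A": {1, 2, 3}, "Gene B": {2, 3}, "Gene C": {3}}
edges = cocitation_network(gene_index)
for pair, weight in edges.most_common():
    print(pair, weight)
```

Sorting the genes before pairing keeps each edge under a single key, so ("Gene A", "Gene B") and ("Gene B", "Gene A") are never counted separately.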

Page 13:

Linking gene-gene, gene-term, and term-term relations

[Diagram: network connecting Gene 1, Gene 2, Gene 3, Gene 4 and Gene 5 with Term 1 (Angiogenesis) and Term 2 (Metastasis)]

Page 14:

Research design, step by step

Mapping/matching symbol to gene

Filtering procedure

Gene-articles index; Term-articles index (MeSH, Gene Ontology™)

Gene-gene network; Gene-term network

PubGene Database

Gene network browser

Internet

PubGene™ Gene Database and Tools
http://www.pubgene.org/

Page 15:

Automated indexing of named human genes

Gene nomenclature database (13712)

HUGO (9722)

LocusLink (2729)

GENATLAS (1239)

GDB (358)

•Primary symbol
•Gene name
•Alternative symbol

63 63 352

14048

13570 (142)
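
The symbol-matching step might look roughly like the sketch below. It is not the authors' procedure: the exact matching and filtering rules are not given on the slide, so the two filters used here (skip symbols that map to more than one gene, skip very short symbols) and the `min_length` threshold are assumptions, and all gene records are invented.

```python
import re

def build_lookup(records):
    """Map every symbol (primary or alternative) to the primary symbols using it."""
    lookup = {}
    for primary, aliases in records.items():
        for symbol in {primary, *aliases}:
            lookup.setdefault(symbol, set()).add(primary)
    return lookup

def index_genes(text, lookup, min_length=3):
    """Return the genes whose symbols occur in the text (exact, case-sensitive)."""
    tokens = set(re.findall(r"[A-Za-z0-9-]+", text))
    hits = set()
    for token in tokens:
        genes = lookup.get(token, set())
        # Assumed filters: drop very short symbols and ambiguous symbols.
        if len(token) >= min_length and len(genes) == 1:
            hits |= genes
    return hits

# Hypothetical nomenclature records: primary symbol -> alternative symbols.
records = {"GENE1": ["G1A", "XYZ"], "GENE2": ["XYZ", "G2"]}
lookup = build_lookup(records)
hits = index_genes("XYZ and G1A were induced, but G2 was not.", lookup)
print(hits)
```

In this toy run only GENE1 is indexed: "XYZ" is shared by two genes and "G2" falls below the length threshold.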

Page 16:

Contribution to the gene-to-article index over time

The total number of gene occurrences

MEDLINE records from before 1975 do not contain abstracts

More articles from 1999 and 2000 were expected to be added to MEDLINE

Page 17:

Distribution of genes with respect to the number of articles found to be relevant

Distribution of genes with respect to the number of gene neighbors

•The histograms show ‘smoothed’ values.
•The distribution of genes by article ref. is almost exponentially decreasing.

Genes tended to be mentioned in triplets almost as much as for the ref.

Page 18:

Types of gene relationships found in PubGene

To examine over-represented or incorrectly assigned relationships

(40%) (29%)

Symbols belonging to more than one gene

Very general symbols coinciding with general acronyms

Very short gene names

Page 19:

Comparison of PubGene with manually curated databases

To examine the under-represented gene pairs

DIP: C(171,2)
OMIM: C(6404,2)? 8643?

•DIP: “Number of actual links” “Number of genes”
•OMIM: “Number of genes” “Number of actual links”
•“Number of actual links” “PubGene” “Number of actual links found in PubGene”
•“Number of possible links” “PubGene” “Number of all links found in PubGene”

(51%) (45%)
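
The C(n,2) notation on this slide counts the possible links among n genes, and the percentages compare curated gene pairs with those found in the network. A small sketch of both calculations follows; the function names and the toy pairs are hypothetical, and only the gene counts 171 and 6404 come from the slide.

```python
from math import comb

def possible_links(n_genes):
    """Number of distinct gene pairs among n genes, i.e. C(n, 2)."""
    return comb(n_genes, 2)

def fraction_recovered(curated_pairs, network_pairs):
    """Share of curated gene pairs that also appear as links in the network."""
    # frozenset makes each pair order-independent: (A, B) equals (B, A).
    curated = {frozenset(pair) for pair in curated_pairs}
    network = {frozenset(pair) for pair in network_pairs}
    return len(curated & network) / len(curated)

print(possible_links(171))    # 14535
print(possible_links(6404))   # 20502406

# Hypothetical toy pairs, not data from DIP, OMIM, or PubGene.
curated = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
network = [("B", "A"), ("C", "D"), ("A", "C")]
recovered = fraction_recovered(curated, network)
print(recovered)              # 0.5
```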

Page 20:

Reasons for under-representation of DIP-derived gene pairs

(a) insufficient synonym lists
(b) synonym case variation
(c) complex gene families with immature or complex naming conventions

Page 21:

Summary of the verification against DIP and OMIM

The numbers of interactions in DIP and OMIM contained in PubGene reflect that PubGene captures substantial amounts of the existing biological information on protein-protein interactions and on gene mapping and disease.

Page 22:

Linking relations to expression profiles (microarray, proteomics, etc.)

[Diagram: network connecting Gene 1, Gene 2, Gene 3, Gene 4 and Gene 5 with Term 1 (Angiogenesis) and Term 2 (Metastasis)]

Time series, expression levels, patterns, etc.

Page 23:

Verify the applicability of the tools by analyzing two publicly available microarray data sets

•Discrimination analysis: literature associations highlight background knowledge for signature genes in patient sample data.
•Kinetic and mechanism study: detection of complex co-regulatory patterns between biologically related genes.

Page 24:

The “signature gene cluster” from unsupervised hierarchical clustering analysis

(Nature. 403, 503-511)

•Cell type
•Biological process

Page 25:

To explore the correlation between unsupervised clustering and the supervised PubGene approach

(Nature. 403, 503-511)

•4062 clones -> 1032 symbols (PubGene) -> 50 (up/down regulated)
•(7+14)/50 = 42%
•6% (1302, 50) B-cell signature
•42/6 = 7 times higher than expected at random
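
The arithmetic on this slide can be checked directly. The counts 7, 14 and 50 and the 6% random expectation are taken as given from the slide; exact fractions are used so the ratio does not pick up floating-point noise.

```python
from fractions import Fraction

# Observed share of the 50 selected genes, as stated on the slide: (7 + 14) / 50.
observed = Fraction(7 + 14, 50)
# Share expected at random, as stated on the slide: 6%.
expected = Fraction(6, 100)

enrichment = observed / expected

print(f"observed share: {float(observed):.0%}")
print(f"enrichment over random: {float(enrichment):.0f}x")
```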

Page 26:

Network of the genes in the GC-B signature

•GC-B signature: 25 genes, of which only 20 map to the network, plus the most important neighbors

•Underlying biological relationships between these genes

•Link signature genes to disease MeSH terms: Fragile X, Angelman syndrome, lymphoma, leukaemia,…

•Link signature genes to Gene Ontology: transcriptional regulator

Translocation in lymphomas

Immunoglobulin recombination

Page 27:

To visualize complex co-regulatory patterns of gene expression and simultaneously highlight biological relationships

1 hour, 8 hours

(from Science. 283, 83-87)

Transcription factors

8613 clones -> 517 clones -> 340 genes; 1-hour expression levels superimposed onto a sub-network of PubGene

Angiogenesis
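
Superimposing expression levels onto a sub-network can be sketched as follows. This is an assumed, simplified representation: the network is a dictionary from gene pairs to co-citation counts, the expression data is a dictionary from gene to a measured value at one time point, and every name and number in the example is invented.

```python
def expression_subnetwork(edges, expression):
    """Keep links whose two genes were both measured; attach values to the nodes."""
    kept = {
        pair: weight
        for pair, weight in edges.items()
        if pair[0] in expression and pair[1] in expression
    }
    # Node attributes: the expression value of every gene that still has a link.
    nodes = {gene: expression[gene] for pair in kept for gene in pair}
    return kept, nodes

# Hypothetical literature links (co-citation counts) and 1-hour expression values.
edges = {
    ("Gene 1", "Gene 2"): 5,
    ("Gene 2", "Gene 3"): 1,
    ("Gene 3", "Gene 4"): 2,
}
expression = {"Gene 1": 2.1, "Gene 2": -0.4, "Gene 3": 1.3}

kept, nodes = expression_subnetwork(edges, expression)
print(kept)
print(nodes)
```

Genes without a measurement (Gene 4 here) drop out together with their links, so the remaining sub-network can be colored by expression value in a network browser.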

Page 28:

Rapid profiling of genes through the distribution of MeSH terms

6 hours, 1 hour

•MeSH indexing: the identification of strong associations between genes and biological processes
•Linking the literature network to MeSH terms
•‘angiogenesis’: 10/12 (highest fraction)

(from Science. 283, 83-87)

MeSH index
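
Profiling a gene cluster by MeSH terms amounts to ranking terms by the fraction of genes in the cluster associated with each one. The sketch below assumes a gene-to-term dictionary and uses invented genes; the toy data is arranged only to reproduce the 10/12 fraction quoted for 'angiogenesis', and the second term is a made-up filler.

```python
def term_fractions(cluster_genes, gene_terms):
    """Rank terms by the fraction of cluster genes associated with each term."""
    counts = {}
    for gene in cluster_genes:
        for term in gene_terms.get(gene, ()):
            counts[term] = counts.get(term, 0) + 1
    n_genes = len(cluster_genes)
    # Highest fraction first; ties are broken alphabetically.
    return sorted(
        ((term, count / n_genes) for term, count in counts.items()),
        key=lambda item: (-item[1], item[0]),
    )

# Hypothetical cluster of 12 genes; 10 of them carry the term 'Angiogenesis'.
cluster = [f"Gene {i}" for i in range(1, 13)]
gene_terms = {gene: {"Angiogenesis"} for gene in cluster[:10]}
gene_terms["Gene 1"].add("Cell Division")
gene_terms["Gene 11"] = {"Cell Division"}

ranked = term_fractions(cluster, gene_terms)
for term, fraction in ranked:
    print(f"{term}: {fraction:.2f}")
```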

Page 29:

Summary

With its indexing strategy (gene-gene and gene-term co-citation), rich and varied information content, and analytical flexibility, PubGene can incorporate more of the available biological knowledge into high-throughput gene expression analysis than any other analytical tool available.

The web-based solution and multiple-query support give end users a global, systematic view of the literature information relevant to their microarray data.