A literature network of human genes for high-throughput analysis of gene expression
Speaker: Shih-Te, Yang
Advisor: Ueng-Cheng, Yang
Institute of Biochemistry, NYMU
Bioinformatics Program and Core Lab
Tor-Kristian Jenssen, Astrid Laegreid, Jan Komorowski & Eivind Hovig
Nature Genetics, Volume 28, May 2001
Goals for systems biology
?
Cell, 100(1):57–70, Review, 2000.
PNAS, Vol. 95, 14863-14868
How to Find Biologically Significant Events Using Microarray Tech?
Fitting to current knowledge
Sifting out variations
Mapping Gene Expression Data to KEGG Pathways
Linking Molecular Information to Phenotypes Can Provide Insights into Biological Processes
Pathways: metabolic, signal transduction, etc.
Phenotype: angiogenesis, metastasis
Information Hidden in Literature
Molecular functions
•Protein-protein interactions
•Protein-DNA (RNA) interactions
Phenotypic information
•Physiological and pathological processes (e.g. angiogenesis, tumor metastasis)
•Drug and chemical response
No Efficient Way to Find Genes Related to Angiogenesis
http://www3.ncbi.nlm.nih.gov/htbin-post/Entrez/query?db=0&form=1&term=angiogenesis
Strategies of Literature Mining
Keyword indexing (a gene): protein annotation
Semantics (genes): protein binding and interaction
Keyword co-occurrence (terms and genes): biomedical terms vs. genes -> biological processes
Medicine and Related Subjects from MeSH Classified by NLM
http://wwwcf.nlm.nih.gov/class/schedule.html
Gene Ontology (GO) Can Provide Links between Biological Processes and Genes
Approach to construct the literature network (part one)
Step One: gene-to-term association through a common set of articles
[Diagram: articles are linked to genes through a gene index and to term annotations through a term index]
Term sources:
•MeSH
•Gene Ontology™
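Step One can be illustrated with a small sketch. It assumes two hypothetical inputs that are not part of the slides: a gene index and a term index, each mapping a name to the set of article IDs indexed under it. The function name and the toy data are illustrative only; this is not the PubGene implementation.

```python
from collections import defaultdict

def gene_term_links(gene_index, term_index):
    """Count, for every (gene, term) pair, the articles indexed under both."""
    # Invert the term index so each article points to its terms.
    terms_by_article = defaultdict(set)
    for term, articles in term_index.items():
        for article in articles:
            terms_by_article[article].add(term)

    links = defaultdict(int)
    for gene, articles in gene_index.items():
        for article in articles:
            # .get avoids creating empty entries for articles without terms.
            for term in terms_by_article.get(article, ()):
                links[(gene, term)] += 1
    return dict(links)

# Hypothetical toy indexes (article IDs are made up).
gene_index = {"GENE1": {1, 2, 3}, "GENE2": {3, 4}}
term_index = {"Angiogenesis": {1, 2}, "Metastasis": {3, 4}}
links = gene_term_links(gene_index, term_index)
```

Each count is the number of shared articles, so it can serve as a simple weight for the gene-term association.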
Approach to construct the literature network (part two)
Step Two: gene-to-gene co-citation (co-mention, co-occurrence)
[Diagram: Gene A and Gene B are each indexed to articles; a shared article suggests a biological relation]
Global approach
Network extension and expansion
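A minimal sketch of Step Two, assuming a hypothetical mapping from article ID to the gene symbols found in that article. Every pair of genes mentioned in the same article gets an edge, weighted by the number of shared articles. Names and data are illustrative, not taken from PubGene.

```python
from collections import Counter
from itertools import combinations

def cocitation_network(genes_by_article):
    """Return edge weights: number of articles mentioning both genes of a pair."""
    edges = Counter()
    for genes in genes_by_article.values():
        # Deduplicate and sort so each pair has one canonical orientation.
        for a, b in combinations(sorted(set(genes)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical toy input (article IDs and symbols are made up).
genes_by_article = {
    101: ["GENE_A", "GENE_B"],
    102: ["GENE_B", "GENE_A", "GENE_C"],
    103: ["GENE_C"],
}
network = cocitation_network(genes_by_article)
```

Articles that mention a single gene contribute no edge, which is why article 103 leaves the network unchanged.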
Linking gene-gene, gene-term, and term-term relations
[Diagram: a network connecting Gene 1, Gene 2, Gene 3, Gene 4, and Gene 5 with Term 1 (Angiogenesis) and Term 2 (Metastasis)]
Research design, step by step
Mapping/matching symbol to gene
Filtering procedure
Gene-articles index; Term-articles index
MeSH; Gene Ontology™
Gene-gene network; Gene-term network
PubGene Database
Gene network browser
Internet
PubGene™ Gene Database and Tools: http://www.pubgene.org/
Automated indexing of named human genes
Gene nomenclature database (13712)
HUGO (9722)
LocusLink (2729)
GENATLAS (1239)
GDB (358)
•Primary symbol
•Gene name
•Alternative symbol
63 63 352
14048
13570(142)
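As a rough illustration of automated indexing, the sketch below matches primary symbols, gene names, and alternative symbols against article text. The record layout, the function names, and the choice of case-insensitive whole-token matching are all assumptions made for this example; the slides do not describe the actual matching rules.

```python
import re

def build_lookup(records):
    """Map every primary symbol, gene name, and alternative symbol to the primary symbol."""
    lookup = {}
    for rec in records:
        for alias in [rec["symbol"], rec["name"], *rec["aliases"]]:
            lookup[alias.lower()] = rec["symbol"]
    return lookup

def find_genes(text, lookup):
    """Return primary symbols whose aliases occur as whole tokens in the text."""
    lowered = text.lower()
    found = set()
    for alias, symbol in lookup.items():
        # Assumption: an alias must not be embedded in a longer alphanumeric token.
        pattern = r"(?<![a-z0-9])" + re.escape(alias) + r"(?![a-z0-9])"
        if re.search(pattern, lowered):
            found.add(symbol)
    return found

# Illustrative nomenclature records (layout is hypothetical).
records = [
    {"symbol": "TP53", "name": "tumor protein p53", "aliases": ["p53"]},
    {"symbol": "VEGFA", "name": "vascular endothelial growth factor A", "aliases": ["VEGF"]},
]
lookup = build_lookup(records)
hits = find_genes("Expression of p53 and VEGF was measured.", lookup)
```

Scanning every alias against every article is slow at MEDLINE scale; a production system would more likely tokenize each article once and look tokens up in the dictionary.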
Contribution to the gene-to-article index over time
The total number of gene occurrences
MEDLINE records before 1975 do not contain abstracts
More articles from 1999 and 2000 were expected to be added to MEDLINE
Distribution of genes with respect to the number of articles found to be relevant
Distribution of genes with respect to the number of gene neighbors
•The histograms show ‘smoothed’ values.
•The distribution of genes by number of article references is almost exponentially decreasing.
Genes tended to be mentioned in triplets almost as much as
for the ref.
Types of gene relationships found in PubGene
To examine over-represented or incorrectly assigned relationships
(40%) (29%)
Symbols that belong to more than one gene
Very general symbols coinciding with general acronyms
Very short gene names
Comparison of PubGene with manually curated databases
To examine the under-represented gene pairs
DIP: C(171,2); OMIM: C(6404,2)? 8643?
•DIP: “Number of actual links” “Number of genes”
•OMIM: “Number of genes” “Number of actual links”
•“Number of actual links” “PubGene” “Number of actual links found in PubGene”
•“Number of possible links” “PubGene” “Number of all links found in PubGene”
(51%) (45%)
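The notation C(171,2) and C(6404,2) is read here as "n choose 2", that is, the number of possible gene pairs among n genes; this reading is an assumption. The sketch below computes those counts and shows how a coverage fraction could be derived. The coverage inputs are hypothetical and are not the figures from the slide.

```python
from math import comb

def possible_links(n_genes):
    """Number of distinct unordered gene pairs among n_genes genes."""
    return comb(n_genes, 2)

def coverage(found_links, actual_links):
    """Fraction of curated links that are also present in the literature network."""
    return found_links / actual_links

dip_pairs = possible_links(171)    # 171 * 170 / 2
omim_pairs = possible_links(6404)  # 6404 * 6403 / 2

# Hypothetical example: 3 of 4 curated links recovered.
example = coverage(3, 4)
```

The gap between possible and actual links is what makes the comparison meaningful: only a small share of all gene pairs are curated interactions, so recovering about half of them is far above chance.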
Reasons for under-representation of DIP-derived gene pairs
(a) insufficient synonym lists
(b) synonym case variation
(c) complex gene families with immature or complex naming conventions
Summary of the verification against DIP and OMIM
The numbers of interactions in DIP and OMIM contained in PubGene reflect that PubGene captures substantial amounts of the existing biological information on protein-protein interactions and on gene mapping and disease.
Linking relations to expression profiles (microarray, proteomics, etc.)
[Diagram: the network of Gene 1, Gene 2, Gene 3, Gene 4, and Gene 5 with Term 1 (Angiogenesis) and Term 2 (Metastasis)]
Time series, expression levels, patterns, etc.
Verify the applicability of the tools by analyzing two publicly available microarray data sets
Discrimination analysis: literature associations highlight background knowledge for signature genes in patient sample data.
Kinetic and mechanism study: detection of complex co-regulatory patterns between biologically related genes.
The “signature gene cluster” from unsupervised hierarchical clustering analysis
(Nature. 403, 503-511)
•Cell type
•Biological process
To explore the correlation between unsupervised clustering and the supervised PubGene approach
(Nature. 403, 503-511)
•4062 clones; 1032 symbols (PubGene); 50 (up/down regulated)
•(7+14)/50 = 42%
•6% (1302, 50) B-cell signature
•42/6 = 7 times higher than expected at random
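The arithmetic on this slide can be reproduced with a few lines. The numbers 7, 14, 50, and 6% come from the slide; reading 6% as the fraction expected for a random gene set is an assumption, and the function name is made up for illustration.

```python
def fold_enrichment(hits, selected, background_fraction):
    """Return the observed fraction and its ratio to the background fraction."""
    observed = hits / selected
    return observed, observed / background_fraction

# (7 + 14) signature genes among 50 regulated genes, against a 6% background.
observed, fold = fold_enrichment(7 + 14, 50, 0.06)
print(f"observed fraction: {observed:.0%}, fold over random: {fold:.1f}")
```

The ratio alone is not a significance test; a hypergeometric or Fisher test would be needed to attach a p-value to it.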
Network of the genes in the GC-B signature
•GC-B signature: 25 genes, of which only 20 map to the network, plus the most important neighbors
•Underlying biological relationships between these genes
•Link signature genes to disease MeSH terms: Fragile X, Angelman syndrome, lymphoma, leukaemia,…
•Link signature genes to Gene Ontology: transcriptional regulator
Translocation in lymphomas
Immunoglobulin recombination
To visualize complex co-regulatory patterns of gene expression and simultaneously highlight biological relationships
1 hour, 8 hour
(from Science. 283, 83-87)
Transcription factors
8613 clones; 517 clones; 340 genes, with 1-hour expression levels superimposed onto a sub-network of PubGene
Angiogenesis
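Superimposing expression levels onto a literature sub-network can be sketched as follows. The edge list, the expression values, and the rule of keeping only edges whose two genes were both measured are assumptions for illustration; the slide does not specify how the sub-network was selected.

```python
def expression_subnetwork(edges, expression):
    """Keep edges whose genes both have a measured value; attach values to nodes."""
    sub_edges = [(a, b) for a, b in edges if a in expression and b in expression]
    nodes = {gene: expression[gene] for edge in sub_edges for gene in edge}
    return nodes, sub_edges

# Hypothetical literature edges and 1-hour expression values (made up).
edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
expression_1h = {"G1": 2.0, "G2": -1.5, "G3": 0.4}
nodes, sub_edges = expression_subnetwork(edges, expression_1h)
```

The node values could then drive node colour in a network browser, so that co-regulated neighbours stand out visually.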
Rapid profiling of genes through the distribution of MeSH terms
6 hour, 1 hour
•MeSH indexing: the identification of strong associations between genes and biological processes
•Linking the literature network to MeSH terms
•‘angiogenesis’: 10/12 (highest fraction)
(from Science. 283, 83-87)
MeSH index
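Profiling a gene set by MeSH terms amounts to counting, for each term, the fraction of genes associated with it and ranking the terms. The sketch below assumes a hypothetical gene-to-terms mapping; the data are invented and do not reproduce the 10/12 result.

```python
from collections import Counter

def term_fractions(gene_terms, genes):
    """Rank terms by the fraction of the given genes associated with each term."""
    counts = Counter()
    for gene in genes:
        # set() ensures a term is counted once per gene.
        counts.update(set(gene_terms.get(gene, ())))
    total = len(genes)
    return sorted(
        ((term, n / total) for term, n in counts.items()),
        key=lambda item: (-item[1], item[0]),
    )

# Hypothetical gene-to-MeSH-term associations.
gene_terms = {
    "G1": ["Angiogenesis", "Apoptosis"],
    "G2": ["Angiogenesis"],
    "G3": ["Angiogenesis"],
    "G4": ["Apoptosis"],
}
ranked = term_fractions(gene_terms, ["G1", "G2", "G3", "G4"])
```

The first entry of the ranking is the term with the highest fraction, which corresponds to the slide's "highest fraction" note for ‘angiogenesis’.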
Summary
With its indexing strategy (gene-gene and gene-term co-citation), rich and varied information content, and analytical flexibility, PubGene can incorporate more of the available biological knowledge into high-throughput gene expression analysis than any other analytical tool available.
A web-based solution with multiple-gene queries gives end users literature information for microarray data from a global and systematic view.