gene expression introduction
DESCRIPTION
Microarrays, one of the recent biomedical technologies, produce high-dimensional data, which makes statistical analysis challenging. I presented an overview of microarray analysis, specifically its use in gene expression profiling, in a discussion.
TRANSCRIPT
![Page 1: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/1.jpg)
Analysis of Gene Expression: An Overview
Setia Pramana
![Page 2: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/2.jpg)
Outline
• Biological background
– Central Dogma
– DNA
– Genes
• Genomics
• Microarrays
• Gene expression data analysis pipeline
• What's next?
Gene expression analysis
![Page 3: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/3.jpg)
Central Dogma
http://compbio.pbworks.com
![Page 4: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/4.jpg)
DeoxyriboNucleic Acid (DNA)
• DNA is the organic molecule that carries the information used by a cell to build the proteins that carry out most of the biological processes in a cell.
• Double helix
• Base pairs: G ≡ C, A = T
• Example sequence: ATGCTGATCGATGCAGAATCGATC
• The length of human DNA is about 3 × 10^9 base pairs (bp)
• Between any two humans, DNA is about 99.9% the same
• Human DNA is about 99% the same as that of chimpanzees
wikipedia
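The pairing rule (G with C, A with T) fixes the second strand of the double helix once the first is known. A minimal sketch in Python, applied to the example sequence from the slide; the function names are illustrative choices, not part of the original talk.

```python
# Watson-Crick pairing: each base determines its partner on the other strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the paired bases, position by position."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq):
    """Return the opposite strand read in its own 5'-to-3' direction."""
    return complement(seq)[::-1]


example = "ATGCTGATCGATGCAGAATCGATC"  # example sequence from the slide
opposite_strand = reverse_complement(example)
```

Applying `reverse_complement` twice returns the original sequence, which is a quick sanity check on the pairing table.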
![Page 5: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/5.jpg)
Gene
• The full DNA sequence of an organism is called its genome.
• A gene is a segment of DNA that specifies the sequence of a protein.
• Length: 1,000-3,000 bases
• Approximately 20,000-25,000 genes in the human genome
http://www.dna-sequencing-service.com/dna-sequencing/gene-dna/attachment/gene-dna/
![Page 6: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/6.jpg)
Genetic Code
• The nucleotide sequence of an mRNA is translated into the amino acid sequence of the corresponding protein.
http://www.cs.tau.ac.il/~rshamir/
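To make the translation step concrete, here is a small Python sketch that builds the standard genetic code as a lookup table and translates an mRNA codon by codon. It assumes the example DNA sequence from the earlier slide is the coding strand and that reading starts at the first base; the function names are illustrative.

```python
# Standard genetic code, codons ordered U, C, A, G at each position.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna):
    """Coding-strand DNA to mRNA: same sequence with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate codon by codon from the first base until a stop codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGCTGATCGATGCAGAATCGATC")
protein = translate(mrna)
```

The result is a string of one-letter amino acid codes, one letter per codon.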
![Page 7: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/7.jpg)
Genomics
• Genomics is the study of all the genes of a cell or tissue, at:
– the DNA level (genotype), e.g., GWAS, SNP, CNV, etc.
– the mRNA level (transcriptomics), i.e., gene expression
– the protein level (proteomics)
• Functional genomics: the study of the functionality of specific genes, their relations to diseases, their associated proteins and their participation in biological processes.
![Page 8: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/8.jpg)
Gene Expression
• Different tissues in the same human may express different genes, according to their role in the human body.
• The same cell may express different genes under different circumstances (stress, nutrition, etc.).
• Cells express different genes during their lifetime (for instance, embryonic gene expression differs from adult gene expression).
• Technologies for measuring mRNA assume:
– The level of mRNA in the cell is an indication of the protein level in the cell, since most regulation acts on the transcription process and not on the translation process.
– Genes are expressed only when needed.
![Page 9: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/9.jpg)
Microarrays
![Page 10: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/10.jpg)
Microarray Technologies
• Two types of microarray technologies:
– Single channel
– Dual channel
• Platforms:
– Affymetrix
– Illumina
– Agilent
![Page 11: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/11.jpg)
Microarrays Applications
• Gene expression profiling (our focus)
• SNP arrays for studying single nucleotide polymorphisms (SNP) and copy number variations (CNV) such as deletions or insertions
• Others:
– ChIP-on-chip for investigating protein binding site occupancy
– Exon arrays to search for alternative splicing events
– Tiling arrays for identifying novel transcripts that are either coding or non-coding
![Page 12: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/12.jpg)
Microarrays Applications: MammaPrint
• The MammaPrint test can determine the likelihood of breast cancer returning within 10 years after treatment.
• First FDA-approved molecular test that is based on microarray technology.
• Predicts whether existing cancer will metastasize.
• Investigates the patterns and behavior of large numbers of genes.
• The recurrence of cancer is partly dependent on the activation and suppression of certain genes located in the tumor.
• MammaPrint measures the activity of those genes and uses it to predict patients' odds of the cancer spreading.
![Page 13: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/13.jpg)
The Pipeline
• Experiment design → Lab work → Image processing
• → Background correction
• → Normalization
• → Signal summarization (GCRMA, FARMS) (for the Affymetrix platform)
• → Data analysis:
– Differentially expressed genes
– Clustering
– Classification
– Etc.
• → Network / pathway analysis (GSEA, etc.)
• → Biological interpretation
![Page 14: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/14.jpg)
Image Processing
http://isda.ncsa.uiuc.edu/Microarrays/
![Page 15: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/15.jpg)
Log2 Intensity
• Response: log2 intensity. Why?
• Statistics: Log-transforming the data makes the intensity distribution more symmetric and bell-shaped, i.e., closer to a normal distribution.
• Biology: The biological processes in whole individuals presumably act in a multiplicative way. Log-transformation turns these multiplicative effects into additive ones, so equal fold changes up and down become equal differences on the log2 scale.
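A short Python sketch of the two arguments for the log2 scale. The intensity values are made up for illustration and do not come from any real array.

```python
from math import log2
from statistics import mean, median

# Hypothetical raw intensities: a few very bright probes skew the distribution.
raw = [120.0, 150.0, 200.0, 260.0, 400.0, 900.0, 6400.0]
logged = [log2(value) for value in raw]

# On the raw scale the mean sits far above the median (right skew);
# on the log2 scale the two are much closer together.
raw_gap = mean(raw) - median(raw)
log_gap = mean(logged) - median(logged)


def log2_ratio(treated, control):
    """Fold change expressed on the log2 scale."""
    return log2(treated / control)


up = log2_ratio(400.0, 100.0)    # 4-fold up
down = log2_ratio(100.0, 400.0)  # 4-fold down
```

A 4-fold increase and a 4-fold decrease give values of equal size and opposite sign, which is what makes up- and down-regulation directly comparable after the transformation.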
![Page 16: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/16.jpg)
Normalization
• Process to remove systematic errors, which can cause considerable biases.
• Systematic errors are due to:
– Different incorporation efficiencies of dyes.
– Different amounts of mRNA in the tested samples, causing different expression levels.
– Differences in experimenter or protocol (if data were gathered in different labs).
– Different scanning parameters.
– Differences between chips created in different production batches.
• Example: Quantile normalization
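Quantile normalization forces every array to share the same intensity distribution: each value is replaced by the average, across arrays, of the values that hold the same rank. The sketch below is a plain-Python illustration under two assumptions that are mine, not the slides': arrays are given as equal-length lists, and ties are broken by position rather than averaged.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (one list of intensities per array)."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(array) for array in arrays]
    # Reference distribution: mean of the r-th smallest value over all arrays.
    reference = [
        sum(array[rank] for array in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for array in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: array[i])
        result = [0.0] * n_probes
        for rank, probe in enumerate(order):
            result[probe] = reference[rank]
        normalized.append(result)
    return normalized


# Three hypothetical arrays measuring the same four probes.
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 3.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(arrays)
```

After normalization every array contains the same set of values, while the ordering of probes within each array is unchanged.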
![Page 17: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/17.jpg)
Normalization
![Page 18: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/18.jpg)
Microarrays, Data structure
http://www.ebi.ac.uk
![Page 19: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/19.jpg)
Microarrays, Applications
• Identify disease-related genes
• Classification, e.g., MammaPrint
• Cluster genes
• Cluster the samples (disease stages, tissues): class discovery
• Cluster genes and samples
• Pharmacogenomics:
– Personalized medicine: individualize therapies
– Target-based medicine: more effective drugs with fewer side effects
![Page 20: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/20.jpg)
Data Analysis Challenges
• The curse of high dimensionality:
– An obstacle in the solution of classification and clustering problems
• The multiple testing problem: the number of false positive results increases because the same hypothesis is tested many times (once per gene).
• Multiple testing correction:
– FWER (family-wise error rate): Bonferroni, Holm
– FDR (false discovery rate): BH (Benjamini-Hochberg), BY (Benjamini-Yekutieli)
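As a rough illustration of the two families of corrections, the sketch below computes Bonferroni (FWER) and Benjamini-Hochberg (FDR) adjusted p-values in plain Python. The p-values are invented for the example; Holm and BY are not shown.

```python
def bonferroni(pvalues):
    """FWER control: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """FDR control: BH adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical per-gene p-values.
pvalues = [0.01, 0.04, 0.03, 0.005]
fwer_adjusted = bonferroni(pvalues)
fdr_adjusted = benjamini_hochberg(pvalues)
```

The BH adjusted values are never larger than the Bonferroni ones, which reflects the usual trade-off: FDR control is less strict and therefore keeps more genes.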
![Page 21: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/21.jpg)
Identification of Differential Genes
• Discover genes with different expression in two or more different tissues/conditions.
• Fold change
• t-type tests:
– t-test
– Modified t-tests: Significance Analysis of Microarrays (SAM), moderated t (LIMMA)
• Linear Models for Microarray Data (LIMMA)
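A minimal Python sketch of the two simplest statistics on this slide, computed for one gene from log2 expression values in two groups. It uses the ordinary unequal-variance (Welch) t-statistic; it does not implement SAM or the LIMMA moderated t, which adjust the denominator. The numbers are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """Difference of group means; inputs are already on the log2 scale."""
    return mean(group_a) - mean(group_b)


def welch_t(group_a, group_b):
    """Two-sample t-statistic without assuming equal variances."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical log2 intensities of one gene in two conditions.
treated = [3.0, 4.0, 5.0]
control = [1.0, 2.0, 3.0]
lfc = log2_fold_change(treated, control)
t_stat = welch_t(treated, control)
```

With only a few replicates per group the variance estimate is unstable, which is the motivation for the modified t-tests listed above.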
![Page 22: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/22.jpg)
Clustering
• Clustering genes or conditions or both.
• Deducing functions of unknown genes from known genes with similar expression patterns.
• Identifying disease profiles: tissues with similar pathology should yield similar expression profiles.
• Co-expression of genes may imply co-regulation.
• Classification of biological conditions.
• Drug development
![Page 23: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/23.jpg)
Clustering
Statistical Methods: Hierarchical clustering, K-means, CLICK (CLuster Identification via Connectivity Kernels), Biclustering, etc. More: http://www.bioconductor.org/help/course-materials/2002/Seattle02/Cluster/cluster.pdf
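As one possible illustration of hierarchical clustering, the sketch below groups genes by co-expression in plain Python. The choices here are assumptions for the example, not something the slides specify: distance is 1 minus the Pearson correlation, linkage is average linkage, and merging stops at a requested number of clusters. Gene names and profiles are hypothetical, and profiles must not be constant.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two expression profiles of equal length."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def hierarchical_clusters(profiles, k):
    """Agglomerative clustering of genes until k clusters remain."""
    names = list(profiles)
    distance = {
        frozenset((g, h)): 1.0 - pearson(profiles[g], profiles[h])
        for g, h in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        return mean(distance[frozenset((g, h))] for g in c1 for h in c2)

    clusters = [[name] for name in names]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return [sorted(cluster) for cluster in clusters]


# Hypothetical profiles over four conditions: two rising, two falling genes.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.5, 2.0],
}
clusters = hierarchical_clusters(profiles, 2)
```

Correlation distance groups genes by the shape of their profile rather than by absolute level, which matches the idea that co-expressed genes may be co-regulated.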
![Page 24: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/24.jpg)
Classification
• Classification of tumor malignancies into known classes: supervised learning.
• Identification of marker genes that characterize the different tumor classes: feature selection.
Genes distinguishing ALL from AML (two types of leukemia).
![Page 25: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/25.jpg)
Classification
• Methods:
– Discriminant analysis: LDA, k-nearest neighbors
– Classification trees
– Logistic regression, penalized LR: LASSO
– Neural networks
– Support vector machines (SVM)
– Random forests, etc.
A survey of these methods:
http://www.ibiostat.be/publications/phd/suzyvansanden.pdf
http://www.stat.cmu.edu/~jiashun/Research/software/Data/papers/dudoit.pdf
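To show what supervised classification looks like in its simplest form, here is a k-nearest-neighbor sketch in plain Python. The two-gene profiles and the ALL/AML labels are invented for illustration; a real study would use many marker genes and cross-validation.

```python
from collections import Counter
from math import dist


def knn_predict(train_profiles, train_labels, query, k=3):
    """Predict the class of a sample by majority vote of its k nearest neighbors."""
    neighbors = sorted(
        zip(train_profiles, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical log2 expression of two marker genes per training sample.
train_profiles = [
    [1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1],
]
train_labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

prediction_a = knn_predict(train_profiles, train_labels, [1.1, 1.0])
prediction_b = knn_predict(train_profiles, train_labels, [5.0, 4.9])
```

The choice of k and of the distance measure are tuning decisions; k = 3 and Euclidean distance are used here only as defaults for the sketch.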
![Page 26: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/26.jpg)
Pathways Analysis
• We have discovered the DE genes; what's next?
• Identify which pathway terms (e.g., GO, KEGG) are most commonly associated with the DE genes.
• Methods: GEA, GSEA, NEA, etc.
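One common way to ask whether a pathway is associated with the DE genes is a simple over-representation test based on the hypergeometric distribution. The sketch below is that simpler test, not GSEA or NEA, and all counts are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(n_universe, n_pathway, n_de, n_overlap):
    """P(overlap >= n_overlap) when n_de genes are drawn at random.

    n_universe: genes measured on the array
    n_pathway:  genes annotated to the pathway
    n_de:       differentially expressed genes
    n_overlap:  DE genes that belong to the pathway
    """
    total = comb(n_universe, n_de)
    largest = min(n_pathway, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, largest + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes measured, 4 in the pathway, 3 DE, all 3 in the pathway.
p_enriched = overrepresentation_pvalue(10, 4, 3, 3)
p_trivial = overrepresentation_pvalue(10, 4, 3, 0)
```

In practice this test is run once per pathway term, so the resulting p-values need the same multiple testing correction as the per-gene tests.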
![Page 27: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/27.jpg)
What’s next
• Next-generation sequencing
+ No need to know the sequence of the transcript.
+ No artifacts due to cross-hybridization.
+ Better quantitation of low-abundance transcripts.
- New data types and huge data volumes.
- Quality
• Epigenetics
– The study of heritable changes in genome function that occur without a change in DNA sequence (http://epigenome.eu/en/1,1,0).
– DNA methylation
![Page 28: Gene expression introduction](https://reader034.vdocuments.site/reader034/viewer/2022052315/5552171ab4c90520548b49ce/html5/thumbnails/28.jpg)
Reference
• Göhlmann, H. and Talloen, W.: Gene Expression Studies Using Affymetrix Microarrays. Chapman & Hall/CRC Mathematical & Computational Biology, 2009.
• http://www.cs.tau.ac.il/~rshamir/ge/09/
Other useful books:
• Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S, editors: Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer Science, New York, 2005.
• Amaratunga D, Cabrera J: Exploration and Analysis of DNA Microarray and Protein Array Data. Wiley-Interscience, 2004.