Microarray
Mitesh Shrestha
Genes can be regulated at many levels
DNA → (TRANSCRIPTION) → RNA, the “transcriptome” → (TRANSLATION) → PROTEIN
• transcription
• post transcription (RNA stability)
• post transcription (translational control)
• post translation (not considered gene regulation)
Usually, when we speak of gene regulation, we are referring to transcriptional regulation. The complete set of transcripts from all genes being transcribed is referred to as the “transcriptome.”
In the last dozen years, it has become possible to look at
the entire transcriptome in a single experiment!
While there are a number of variations, there are
essentially two basic ways of doing this—using
sequencing-based methods and microarrays. These
have largely replaced older methods such as subtractive
hybridization and differential display.
Sequencing-based methods are very powerful but have
typically been prohibitively expensive. However, with
recent advances in low-cost, high-throughput next
generation sequencing, these methods—referred to as
“RNA-seq”—are becoming more common and may soon
be dominant.
Genomic analysis of gene expression
• Methods capable of giving a “snapshot” of RNA expression of all genes
• Can be used as diagnostic profile – Example: cancer diagnosis
• Can show how RNA levels change during development, after exposure to stimulus, during cell cycle, etc.
• Provides large amounts of data
• Can help us start to understand how whole systems function
Benfey and Protopapas, "Genomics" © 2005 Prentice Hall Inc. / A Pearson Education Company / Upper
Saddle River, New Jersey 07458
Although details of the methods vary, the concept behind
RNA-seq is simple:
• isolate all mRNA
• convert to cDNA using reverse transcriptase
• sequence the cDNA
• map sequences to the genome
The more times a given sequence is detected, the more
abundantly transcribed it is. If enough sequences are
generated, a comprehensive and quantitative view of the
entire transcriptome of an organism or tissue can be
obtained.
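The counting idea above can be sketched in a few lines of Python. This is only an illustration, assuming the reads have already been mapped and each read is represented by the name of the gene it mapped to; the gene names and the `counts_per_million` helper are hypothetical and do not come from any RNA-seq package.

```python
from collections import Counter

def counts_per_million(mapped_genes):
    """Count reads per gene and scale to counts per million mapped reads."""
    counts = Counter(mapped_genes)
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

# Hypothetical mapping result: one gene name per sequenced read.
reads = ["geneA", "geneB", "geneA", "geneC", "geneA", "geneB", "geneA", "geneA"]
cpm = counts_per_million(reads)

# Most abundantly detected genes first.
for gene, value in sorted(cpm.items(), key=lambda kv: -kv[1]):
    print(f"{gene}\t{value:.0f}")
```

Scaling by the total number of reads lets samples sequenced to different depths be compared; real pipelines usually also account for transcript length.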
RNA-seq
Nucleic acid hybridization
Gene expression assays
The main types of gene expression assays:
– Serial analysis of gene expression (SAGE);
– Short oligonucleotide arrays (Affymetrix);
– Long oligonucleotide arrays (Agilent);
– Fibre optic arrays (Illumina);
– cDNA arrays (Brown/Botstein)*.
Microarray life cycle (taken from Schena & Davis)
Biological Question → Sample Preparation → Microarray Reaction → Microarray Detection → Data Analysis & Modelling → back to Biological Question
Evolution & Industrialization
1989: First Affymetrix GeneChip prototype
1994: First commercial Affymetrix GeneChip
1994: First cDNA arrays developed at Stanford University
1994: First commercial scanner (Affymetrix)
1996: Commercialization of arrays
1997: Genome-wide expression monitoring in S. cerevisiae
Types of Microarrays
- Expression arrays
- Protein microarrays (proteomics)
- Resequencing arrays
- CGH arrays: comparative genomic hybridization
- SNP arrays
- Antibody arrays
- Exon arrays: alternative splice variant detection
- Tissue arrays
Microarrays may eventually be eclipsed by sequence-based methods, but
meanwhile have become incredibly popular since their inception in 1995
(Schena et al. (1995) Science 270:467-70).
Microarrays are based on the ability of complementary strands of DNA
(or DNA and RNA) to hybridize to one another in solution with high
specificity, so they can be used to measure DNA or RNA abundance on a
genomic scale in different types of cells.
There are now many variations. We’ll take a quick look at the two basic
types: Affymetrix (high density oligonucleotide) and glass slide (cDNA,
long oligo, etc). Both are conceptually similar, with differences in
manufacture and details of design and analysis.
DNA microarrays
Idea of Microarray
[Figure: cDNA from Cell A and Cell B is labeled and hybridized to the chip. Labeled cDNA from geneX binds the spot of geneX with the complementary sequence; the spot takes the color of the labeled cDNA, here red after scanning.]
Several Types of Arrays
• Spotted DNA arrays
– Developed by Pat Brown’s lab at Stanford
– PCR products of full-length genes (>100 nt)
• Affymetrix gene chips
– Photolithography technology from the computer industry allows building many 25-mers
• Ink-jet microarrays from Agilent
– 25-60-mers “printed” directly on glass slides
– Flexible, rapid, but expensive
Array Fabrication Spotting
• Use PCR to amplify DNA
• Robotic "pen" deposits DNA at defined coordinates
• approximately 1-10 ng per spot
• Experimentation with oligos (40, 70 bp)
This machine can make 48 microarrays simultaneously.
Array Fabrication Photolithography
• Light-activated synthesis
• Synthesize oligonucleotides on glass slides
• 10^7 copies per oligo in 24 x 24 µm square
• Use 20 pairs of different 25-mers per gene
• Perfect match and mismatch
Array Fabrication Photolithography
Affymetrix Microarrays
[Figure: raw image of an Affymetrix chip; scale labels 1.28 cm and 50 µm]
~10^7 oligonucleotides, half perfectly match mRNA (PM), half have one mismatch (MM)
Raw gene expression is the intensity difference: PM - MM
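As a rough sketch of the PM - MM idea, the snippet below averages the difference over the probe pairs of one gene. Averaging across pairs is an assumption made for illustration (the slide only states PM - MM), the intensities are made up, and `average_difference` is a hypothetical helper, not a vendor function.

```python
def average_difference(pm, mm):
    """Mean of PM - MM over the probe pairs of one gene."""
    if len(pm) != len(mm):
        raise ValueError("PM and MM need one value per probe pair")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)

# Made-up intensities for four probe pairs of a single gene.
pm = [1200.0, 950.0, 1100.0, 1010.0]
mm = [300.0, 280.0, 350.0, 310.0]

signal = average_difference(pm, mm)
print(signal)
```

Subtracting MM is meant to remove signal from non-specific hybridization, since the mismatch probe should bind the true target poorly.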
Agilent cDNA microarrays and oligonucleotide microarrays
• Agilent delivers printed 60-mer microarrays in addition to 25-mer formats.
• The inkjet process uses standard phosphoramidite chemistry to deliver extremely small volumes (picoliters) of the chemicals to be spotted.
Microarray Experiments
[Flowchart: Biological question (differentially expressed genes, sample class prediction, etc.) → Experimental design → Microarray experiment → 16-bit TIFF files → Image analysis → (Rfg, Rbg), (Gfg, Gbg) → Normalization → R, G → Estimation, Testing, Clustering, Discrimination → Biological verification and interpretation]
Experimental Design for Microarrays
There are a number of important experimental design
considerations for a microarray experiment:
•technical vs biological replicates
•amplification of RNA
•dye swaps
•reference samples
Experimental Design for Microarrays
Technical vs biological replicates
•technical replicates are repeat hybridizations using the
same RNA isolate
•biological replicates use RNA isolated from separate
experiments/experimental organisms
Although technical replicates can be useful for reducing
variation due to hybridization, imaging, etc., biological
replicates are necessary for a properly controlled experiment
Experimental Design for Microarrays
Amplification of RNA
• Linear amplification methods can be used to increase the
amount of RNA so that microarray experiments can be
performed using very small numbers of cells. It’s not clear
to what degree this affects results, especially with respect
to rare transcripts, but it seems to be generally OK if done
correctly.
Experimental Design for Microarrays
Dye swaps
When using 2-color arrays, it’s important to hybridize
replicates using a dye-swap strategy in which the
colors (labels) are reversed between the two
replicates. This is because there can be biases in
hybridization intensity due to which dye is used (even
when the sequence is the same).
[Figure: two arrays comparing S1 and S2, with the dye labels reversed on the second array]
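One common way to combine a dye-swap pair is sketched below. It assumes the dye bias adds the same amount to log2(Red/Green) on both arrays, so the bias cancels when the two log ratios are subtracted. The intensities are made up and `dye_swap_estimate` is a hypothetical helper.

```python
import math

def dye_swap_estimate(m_forward, m_swapped):
    """Combine log2(R/G) values from an array and its dye-swapped replicate.

    Assumes the dye bias adds the same amount to log2(R/G) on both arrays,
    so it cancels in the difference and shows up in the sum.
    """
    log_fold = (m_forward - m_swapped) / 2   # estimate of log2(S1/S2)
    dye_bias = (m_forward + m_swapped) / 2   # estimate of the dye effect
    return log_fold, dye_bias

# Made-up intensities for one gene.
# Forward array: S1 labeled red, S2 labeled green.
m_forward = math.log2(4000.0 / 1000.0)
# Swapped array: S2 labeled red, S1 labeled green.
m_swapped = math.log2(2000.0 / 2000.0)

log_fold, dye_bias = dye_swap_estimate(m_forward, m_swapped)
print(log_fold, dye_bias)
```

In this made-up case the forward array alone would suggest a four-fold change; after combining it with the swapped array, the estimate is a two-fold change plus a two-fold dye effect.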
Experimental Design for Microarrays
Reference samples
•one common strategy is to use a reference sample
in one channel on each array. This is usually
something that will hybridize to most of the
features (e.g., a complex RNA mixture). Using a
reference sample allows comparisons to be made
between different experimental conditions, as each
is compared to the common reference.
[Figure: reference design. S1, S2, and S3 are each hybridized against the reference R on separate arrays; compare S1/R vs. S2/R vs. S3/R]
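The indirect comparison through a common reference can be written out on the log scale: log2(S1/S2) = log2(S1/R) - log2(S2/R). The sketch below uses made-up, background-corrected intensities for a single gene; the dictionary layout and the `compare` helper are assumptions for illustration.

```python
import math

# Made-up intensities for one gene on three arrays:
# (sample channel, reference channel).
arrays = {
    "S1": (3200.0, 800.0),
    "S2": (1600.0, 1600.0),
    "S3": (500.0, 1000.0),
}

# Log ratio of each sample against the common reference R.
log_vs_ref = {name: math.log2(s / r) for name, (s, r) in arrays.items()}

def compare(a, b):
    """log2(a/b), estimated indirectly as log2(a/R) - log2(b/R)."""
    return log_vs_ref[a] - log_vs_ref[b]

print(compare("S1", "S2"))
print(compare("S1", "S3"))
```

Because every array shares R, any pair of samples can be compared even though they were never hybridized together.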
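The Workflow of Microarray
[Figure: two tracks that meet at hybridization. Array track: plate preparation → plate → array fabrication → array. Sample track: sample → RNA extraction → cDNA synthesis and labeling → labeled cDNA. Then: hybridization → hybridized array → scanning.]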
cDNA Synthesis and Direct Labeling
Cy3 and Cy5 cDNA Hybridization On To The Chip
Sample loading
1. Loading from the corner of the cover slip: time-consuming and prone to producing bubbles.
2. Loading the sample at the center of the array, then laying the slip down smoothly: faster, with a lower chance of bubbles than method 1.
3. Loading the sample at the side of the array, then putting the slip on: the solution attaches to the slip as soon as the slip touches it, and spreads with the slip as it is slowly lowered.
Samples: e.g. treatment / control, normal / tumor tissue
Scan
Green: down-regulated; Red: up-regulated; Yellow: equal level
RESULTS
The colors denote the degree of expression in the experimental versus the control cells.
• Gene not expressed in control or in experimental cells
• Only in control cells
• Mostly in control cells
• Only in experimental cells
• Mostly in experimental cells
• Same in both cells
Image analysis
• The raw data from a cDNA microarray experiment consist of pairs of image files, 16-bit TIFFs, one for each of the dyes.
• Image analysis is required to extract measures of the red and green fluorescence intensities for each spot on the array.
Steps in image analysis
1. Addressing. Estimate location of spot centers.
2. Segmentation. Classify pixels as foreground (signal) or background.
3. Information extraction. For each spot on the array and each dye:
• foreground intensities;
• background intensities;
• quality measures.
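A toy version of steps 2 and 3 is sketched below, assuming addressing has already produced the spot center. It uses fixed-circle segmentation, which is only one of several possible approaches; summarizing foreground by the mean and background by the median is also an assumption. The image patch is made up and `extract_spot` is a hypothetical helper.

```python
from statistics import mean, median

def extract_spot(image, center, radius):
    """Fixed-circle segmentation of one spot in one dye channel.

    Pixels within `radius` of `center` are foreground; the rest of the
    patch is background. Returns (foreground mean, background median).
    """
    cy, cx = center
    fg, bg = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            inside = (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2
            (fg if inside else bg).append(value)
    return mean(fg), median(bg)

# Made-up 5x5 patch: bright spot in the middle, dim background.
patch = [
    [100, 100, 100, 100, 100],
    [100, 100, 900, 100, 100],
    [100, 900, 900, 900, 100],
    [100, 100, 900, 100, 100],
    [100, 100, 100, 100, 100],
]
foreground, background = extract_spot(patch, center=(2, 2), radius=1)
print(foreground, background, foreground - background)
```

The same extraction would be run once per dye, giving (Rfg, Rbg) and (Gfg, Gbg) for each spot.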
Why do we calculate the background intensities?
• Motivation behind background adjustment: A spot’s measured fluorescence intensity includes a contribution that is not specifically due to hybridization of the target to the probe but to something else, e.g. the chemical treatment of the slide or autofluorescence. We want to estimate and remove this unwanted contribution.
Quantification of expression
For each spot on the slide we calculate
Red intensity = Rfg - Rbg
Green intensity = Gfg - Gbg
where fg = foreground and bg = background.
cDNA gene expression data
Data on p genes (rows) for n mRNA samples (columns):
Gene  sample1  sample2  sample3  sample4  sample5  …
1      0.46     0.30     0.80     1.51     0.00   ...
2     -0.10     0.49     0.24     0.06     0.46   ...
3      0.15     0.74     0.04     0.10     0.20   ...
4     -0.45    -1.03    -0.79    -0.56    -0.32   ...
5     -0.06     1.06     1.35     1.09    -1.09   ...
Each entry is log2(Red intensity / Green intensity); e.g. the gene expression level of gene 5 in mRNA sample 4 is 1.09.
Figure key: down-regulated gene, up-regulated gene, unchanged expression
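The quantification can be put together as a short sketch: subtract background in each channel, then take log2(Red / Green). The two-fold threshold used to label a gene is an arbitrary assumption for illustration, the intensities are made up, and `log_ratio` and `call` are hypothetical helpers.

```python
import math

def log_ratio(r_fg, r_bg, g_fg, g_bg):
    """Return log2(Red / Green) after background subtraction, or None."""
    red = r_fg - r_bg
    green = g_fg - g_bg
    if red <= 0 or green <= 0:
        return None  # log ratio undefined when a corrected intensity is <= 0
    return math.log2(red / green)

def call(m, threshold=1.0):
    """Label a log ratio; the two-fold threshold is an arbitrary assumption."""
    if m >= threshold:
        return "up"
    if m <= -threshold:
        return "down"
    return "unchanged"

m = log_ratio(r_fg=2100, r_bg=100, g_fg=600, g_bg=100)
print(m, call(m))

# Three values taken from the table: gene 4 (samples 1 and 2), gene 5 (sample 4).
print([call(x) for x in (-0.45, -1.03, 1.09)])
```

On the log2 scale a value of 1 is a two-fold increase and -1 a two-fold decrease, so up- and down-regulation are treated symmetrically.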
Homogeneity and Separation Principles
• Homogeneity: Elements within a cluster are close to each other
• Separation: Elements in different clusters are further apart from each other
• …clustering is not an easy task!
Given these points a clustering algorithm might make two distinct clusters as follows
Bad Clustering
This clustering violates both Homogeneity and Separation principles
Close distances from points in separate clusters
Far distances from points in the same cluster
Good Clustering
This clustering satisfies both Homogeneity and Separation principles
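One simple way to put numbers on the two principles is to compare the largest distance within any cluster to the smallest distance between clusters. This criterion is an assumption chosen for illustration, not the only possible definition; the points are made up and the helper is hypothetical.

```python
from itertools import combinations
from math import dist

def homogeneity_and_separation(clusters):
    """Return (largest within-cluster distance, smallest between-cluster distance)."""
    within = max(dist(a, b) for c in clusters for a, b in combinations(c, 2))
    between = min(
        dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )
    return within, between

# Two made-up groups of points, grouped sensibly and then badly.
good = [[(0, 0), (0, 1), (1, 0)], [(5, 5), (5, 6), (6, 5)]]
bad = [[(0, 0), (5, 5), (0, 1)], [(1, 0), (5, 6), (6, 5)]]

for name, clusters in (("good", good), ("bad", bad)):
    within, between = homogeneity_and_separation(clusters)
    print(name, round(within, 2), round(between, 2))
```

For the good grouping the within-cluster distance is much smaller than the between-cluster distance; for the bad grouping the relation is reversed.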
Clustering Techniques
• Agglomerative: Start with every element in its own cluster, and iteratively join clusters together
• Divisive: Start with one cluster and iteratively divide it into smaller clusters
• Hierarchical: Organize elements into a tree; leaves represent genes and the length of the paths between leaves represents the distances between genes. Similar genes lie within the same subtrees
Hierarchical Clustering
[Figure: nine numbered points (1-9) in the plane and the corresponding tree, with leaves ordered 7 9 8 4 5 1 2 3 6]
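A minimal agglomerative sketch is shown below. It uses single linkage (the distance between two clusters is the distance between their closest members), which is an assumption, since the slides do not name a linkage rule. The expression profiles are made up and `agglomerate` is a hypothetical helper.

```python
from itertools import combinations
from math import dist

def agglomerate(points, n_clusters):
    """Agglomerative clustering with single linkage.

    Starts with every point in its own cluster and repeatedly merges the
    two clusters whose closest members are nearest to each other.
    """
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                dist(a, b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        merged = clusters.pop(j)  # j > i, so index i is still valid
        clusters[i].extend(merged)
    return clusters

# Made-up log-ratio profiles of five genes across two samples.
genes = [(0.1, 0.2), (0.0, 0.3), (2.0, 2.1), (2.2, 1.9), (-1.5, -1.4)]
result = agglomerate(genes, n_clusters=3)
print(result)
```

Recording the order and distance of each merge, rather than stopping at a fixed number of clusters, is what yields the tree.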
Validation of data
There’s no way that all of your microarray data can
be validated.
It’s strongly recommended that any key findings
be verified by independent means.
Northern blots and quantitative RT-PCR are the
typical ways of doing this; real-time, quantitative
RT-PCR is generally the method of choice.
MIAME (Minimum Information About a Microarray Experiment)
When you publish a microarray experiment, you are expected to make available
the following minimal information. This allows others to evaluate your data and
compare it to other experimental results:
• EXPERIMENT DESIGN
type, factors, number of arrays, reference sample, QC, database
accession (ArrayExpress, GEO)
• SAMPLES USED, PREPARATION AND LABELING
• HYBRIDIZATION PROCEDURES AND PARAMETERS
• MEASUREMENT DATA AND SPECIFICATIONS
quantitations, hardware & software used for scanning and analysis,
raw measurements, data selection and transformation procedures, final
expression data
• ARRAY DESIGN
platform type, features and locations, manufacturing protocols or
commercial p/n
Repositories of Microarray Studies
• Because microarrays are so widely used, data repositories have flourished worldwide. Three of the largest databases of gene expression are:
1. The Gene Expression Omnibus (GEO), hosted by the National Center for Biotechnology Information (NCBI)
2. ArrayExpress
3. Stanford Microarray Database (SMD)
And for plants:
Plant Expression Database (PLEXdb)
Tiled microarrays
So-called tiled microarrays cover a genomic region (or the
whole genome!) at high coverage. Probes are designed to cover
virtually every basepair of the sequence, usually excluding
only simple sequence repeats. In this way, there is no bias
toward known transcribed regions.
[Figure: genomic sequence with tiled probes on the array]
Probe size and spacing determine the resolution of the array.
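How probe size and spacing set the resolution can be seen in a small sketch that lays probes across a region. The region length, probe length, and step are made-up values, and `tile_probes` is a hypothetical helper.

```python
def tile_probes(region_length, probe_length, step):
    """Probe coordinates (0-based start, exclusive end) tiled across a region."""
    return [
        (start, start + probe_length)
        for start in range(0, region_length - probe_length + 1, step)
    ]

# Made-up design: 25-mers every 10 bases across a 100-base region.
probes = tile_probes(region_length=100, probe_length=25, step=10)
print(len(probes), probes[0], probes[-1])
```

A step smaller than the probe length gives overlapping probes; a step larger than the probe length leaves gaps between neighboring probes.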
Expression Arrays
Most common type of microarray
Spotted glass, cartridge, and electronic
Involves extracting RNA from a sample and converting it to cDNA, priming off the poly(A) tail of mRNA for eukaryotes and using random hexamers for prokaryotes [WHY?]
Measures the amount and type of mRNA transcripts
Provides information on whether genes are up or down regulated in a specific condition
Can find novel changes in ESTs for specific conditions
Protein Microarrays
True protein microarrays are evolving very slowly and only a few exist.
The technology is not straightforward due to inherent characteristics of proteins [e.g. available ligands, folding, drying…].
Most are designed to detect antibodies or enzymes in a biological system.
The protein is on the microarray.
Some detect protein-protein interactions by surface plasmon resonance; others use a fluorescence-based approach.
Protein Microarrays
The Invitrogen Human Protein Microarray is a high-density microarray.
It contains thousands of unique human proteins [kinases, phosphatases, GPCRs, nuclear receptors, and proteases].
Antibody Arrays
- Assay hundreds of native proteins simultaneously
- Compare protein abundances in a variety of biological samples
- GenTel and BD Biosciences
- Antibody or ligand is on the microarray
Antibody Arrays-labeling scheme
Targets DNA not RNA like expression
Requires amplification of target DNA
Uses multiple probes sets to determine base change at a specific nucleotide position in the genomic DNA.
Uses thousands of oligos that “tile” or span the genomic DNA for characterization.
Provides sequence and genotyping data including LOH, Linkage analysis and single nucleotide polymorphisms
SNP, Genotyping, and DNA Mapping Arrays
Resequencing Arrays [Affy]
Enable the analysis of up to 300,000+ bases of double-stranded sequence (600,000 bases total) on a single Affy array
Used for large-scale resequencing of organism genomes and organelles
Faster and cheaper than sequencing, but limited to a few organisms and/or organelles
Large potential
Exon Arrays-Alternative splice variant detection
Probes are designed for hybridizing to individual exons of genomic DNA
Tissue or development specific splicing leads to normal or expected protein diversity
Defective splicing can lead to disease
CGH Arrays- Comparative Genomic Hybridization
Provides DNA and chromosomal information
DNA copy number and allele-specific information
Enables the identification of critical gene(s) that have altered copy number and may be responsible for the development and progression of a particular disease.
Determine regions of chromosomal deletion (LOH) or amplification
CGH Arrays- Comparative Genomic Hybridization
CGH (comparative genomic hybridization) looks at cytogenetic
abnormalities
•genomic DNA hybridized to array
•often uses large clones (e.g., BACs) as array features
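As a rough guide to reading CGH results, the expected log2 ratio of test to reference signal can be computed for a given copy number. This sketch assumes a diploid reference and a test sample made up entirely of cells carrying the change; real samples are mixtures and give smaller shifts. `expected_log_ratio` is a hypothetical helper.

```python
import math

def expected_log_ratio(test_copies, reference_copies=2):
    """Idealized log2(test / reference) for a region with the given copy number."""
    return math.log2(test_copies / reference_copies)

for copies in (1, 2, 3, 4):
    print(copies, round(expected_log_ratio(copies), 3))
```

Under these assumptions a single-copy deletion gives a log ratio of -1, while a single-copy gain gives a smaller shift in the positive direction.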
Tissue Arrays
Slide based “spotted” tissues (not really)
Applications of microarrays
• Measuring transcript abundance (cDNA arrays);
• Genotyping;
• Estimating DNA copy number (CGH);
• Determining identity by descent (GMS);
• Measuring mRNA decay rates;
• Identifying protein binding sites;
• Determining sub-cellular localization of gene products;
• Classification – there’s a lot of promise in medicine (especially cancer research) for this
Other types and uses of microarrays: ChIP-chip
Other types and uses of microarrays: RIP-chip
Similar to ChIP-chip, but for finding the RNAs bound by RNA-binding
proteins rather than the DNA bound by DNA-binding proteins
Other types and uses of microarrays: PBMs
Protein-binding microarrays can be used to identify transcription
factor binding sequences (motifs)
•double-stranded DNA probes used on array
•purified protein hybridized to array
•detected by antibody to protein or to epitope tag
•can use real genomic sequence or carefully designed
oligonucleotides
•possible to look at all possible 10-mer nucleotide sequences
on a single array!
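To get a feel for the scale of "all possible 10-mers", the sketch below counts k-mers directly. On double-stranded probes a sequence and its reverse complement describe the same site, so the second function counts each such pair once; that pairing is an illustrative assumption about how to count, not a description of the array design in the cited papers.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def distinct_duplex_kmers(k):
    """Count k-mers, treating a sequence and its reverse complement as one site."""
    seen = set()
    for letters in product("ACGT", repeat=k):
        seq = "".join(letters)
        seen.add(min(seq, reverse_complement(seq)))
    return len(seen)

print(4 ** 10)                   # all single-stranded 10-mers
print(distinct_duplex_kmers(4))  # small k, enumerated directly
```

The number of single-stranded k-mers grows as 4^k, which is why covering every 10-mer on one array requires careful probe design.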
Berger, M.F. and M.L. Bulyk. 2006. Methods Mol Biol 338: 245-260.
Berger, M.F., A.A. Philippakis, A.M. Qureshi, F.S. He, P.W. Estep, 3rd, and M.L. Bulyk. 2006. Nat
Biotechnol 24: 1429-1435.
Microarray Limitations
Cross-hybridization of sequences with high identity
Chip to chip variation
True measure of abundance?
Do mRNA levels reflect protein levels?
Generally, microarrays do not “prove” new biology; they simply suggest genes involved in a
process, a hypothesis that will require traditional experimental verification.
What fold change has biological relevance?
Need cloned EST or some sequence knowledge -- rare messages may be undetected
Expensive!! Not every lab can afford to repeat experiments.
The real limitation is Bioinformatics