Microarray Analysis

Jesse Mecham
CS 601R

Post on 21-Dec-2015



Microarray Analysis

It all comes down to:
- Experimental Design
- Preprocessing
- Data Analysis

Experimental Design

- Elimination of confounding factors
  - Same cell line, minimal exposure
  - Timing of sampling
- Technological considerations
  - Hybridization considerations
  - Chip/tag selection

Slide to Data

Gene             Value
D26528_at          193
D26561_cds1_at     -70
D26561_cds2_at     144
D26561_cds3_at      33
D26579_at          318
D26598_at         1764
D26599_at         1537
D26600_at         1204
D28114_at          707
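The gene/value listing above is the kind of flat text that has to be read into a data structure before any analysis. A minimal sketch, assuming one whitespace-separated "probe ID, value" pair per line; the function name `parse_expression_table` is hypothetical and not part of any microarray package:

```python
# Hypothetical helper: parse whitespace-separated "probe_id value" lines.
def parse_expression_table(text):
    """Return {probe_id: value} from lines of the form 'ID VALUE'."""
    values = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[0] == "Gene":
            continue  # skip the header row and malformed lines
        probe_id, raw = parts
        values[probe_id] = int(raw)
    return values


# First three rows of the listing on the slide.
sample = """Gene Value
D26528_at 193
D26561_cds1_at -70
D26561_cds2_at 144
"""
table = parse_expression_table(sample)
```

Real vendor files carry more columns and metadata, so this only illustrates the shape of the result.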

Preprocessing

- Data import
- Background adjustment
- Normalization
- Summarization of multiple probes per transcript
- Quality control

Data Import

- Incorporate various file formats into desired data formats
  - Different vendors have different representations
  - Sometimes desired data is not provided

Background Adjustment

- It all comes down to one word… noise
  - Optical distortion
  - Non-specific hybridization
  - Equipment damage
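One simple way to adjust for background noise is to subtract a per-spot background estimate from the foreground intensity. The slides do not say which method was used, so the sketch below is an assumption; the function name, the optional `floor` parameter, and the intensities are made up for illustration.

```python
def background_adjust(foreground, background, floor=None):
    """Subtract per-spot background; optionally clip results at `floor`."""
    adjusted = [f - b for f, b in zip(foreground, background)]
    if floor is not None:
        # Clipping keeps values positive so logs can be taken later.
        adjusted = [max(v, floor) for v in adjusted]
    return adjusted


# Made-up intensities for three spots.
fg = [1200.0, 310.0, 95.0]
bg = [100.0, 120.0, 165.0]
raw = background_adjust(fg, bg)                 # third spot goes negative
clipped = background_adjust(fg, bg, floor=1.0)  # third spot clipped to 1.0
```

Plain subtraction can produce negative values when the background estimate exceeds the signal, which is why a floor is sometimes applied before log transformation.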

M vs. A

- M represents the differential ratio
  M = log R - log G
- A represents the fluorescence intensity
  A = (log R + log G) / 2
- Desirable transformation would show uniform distribution of differential across intensities
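The M and A definitions above translate directly into code. The slide does not state the base of the logarithm; base 2 is assumed here, and the intensities are made up.

```python
import math


def ma_values(red, green):
    """Return (M, A) lists for paired red/green intensities, using log base 2."""
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(red, green)]
    return m, a


# Made-up intensities: one spot up in red, one unchanged, one down in red.
red = [1024.0, 256.0, 64.0]
green = [256.0, 256.0, 256.0]
m, a = ma_values(red, green)
```

Plotting `m` against `a` gives the M vs. A plot. A cloud centered on M = 0 at every intensity is the uniform behavior the slide describes as desirable.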

Normalization

- Normalization between samples needs to be established for a variety of reasons
  - Different reverse transcription efficiency levels
    - We are using PCR to amplify in separate plates
  - Hybridization inequalities
    - Variations in solution used in hybridization reaction
  - Spatial abnormalities between plates
    - Particularly apparent for in-house plates
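As a rough illustration of between-sample normalization, the sketch below shifts one array's log-ratios (M values) so that their median is zero. This global median centering is an assumed, deliberately simple choice. The slides do not name a method, and a single shift does not correct intensity-dependent or spatial effects.

```python
import statistics


def median_center(m_values):
    """Shift log-ratios so the array's median M value becomes zero."""
    center = statistics.median(m_values)
    return [m - center for m in m_values]


# Made-up M values: most genes share an offset of about 0.6,
# and one gene is strongly differential.
m_raw = [0.5, 0.7, 0.4, 2.5, 0.6]
m_norm = median_center(m_raw)
```

The median is used rather than the mean so that a few strongly differential genes do not pull the correction.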

Background Example

Possible Problem in Background?

Summarizing Data

- Process of reducing the various samples into an analysis
  - The crux of microarray analysis
- Can apply a linear or a non-linear model using any of the following techniques
  - Support Vector Machines (SVM)
  - Neural Networks
  - Empirical Bayes
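To make "reducing the samples into an analysis" concrete, here is a sketch of a simple per-gene, two-group comparison using a Welch t-statistic. This is a stand-in chosen for brevity, not one of the techniques listed above (SVM, neural networks, empirical Bayes). The gene names and log-ratios are made up.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch two-sample t-statistic for one gene (unequal variances allowed)."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / se


# Made-up log-ratios for two hypothetical genes, four arrays per group.
data = {
    "gene_1": ([-3.1, -2.9, -3.3, -2.7], [0.1, -0.2, 0.0, 0.1]),
    "gene_2": ([0.2, -0.1, 0.1, 0.0], [0.0, 0.1, -0.1, 0.2]),
}
scores = {gene: welch_t(grp_a, grp_b) for gene, (grp_a, grp_b) in data.items()}

# Rank genes by the size of the statistic, largest first.
ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
```

With thousands of genes and few arrays per group, per-gene variance estimates are unstable, which is one motivation for moderated approaches such as empirical Bayes.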

Quality Control

- Concerned with accuracy and reproducibility
  - Dr. Piatetsky-Shapiro (last week's colloquium) was primarily concerned with this area of microarray analysis
- Detection of errors (cross-validation)
- Isolation and validation of significant results
- Corrective behavior
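Cross-validation, mentioned above for error detection, repeatedly holds out part of the samples and checks results against them. A minimal sketch of the k-fold index split follows; the function name is hypothetical, and the sample count of 16 and k = 4 are arbitrary choices for illustration.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    # Assign sample i to fold i % k, so fold sizes differ by at most one.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((sorted(train_idx), test_idx))
    return splits


splits = k_fold_indices(n_samples=16, k=4)
```

Each sample appears in exactly one test fold. Any gene selection or model fitting should be redone inside each training fold, otherwise the held-out error is optimistic.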

Time for Fun

Dataset: ApoAI.RData

- The apolipoprotein AI (ApoAI) gene is known to play a pivotal role in high density lipoprotein (HDL) metabolism. Mice which have the ApoAI gene knocked out (KO) have very low HDL cholesterol levels.
- The purpose is to determine how ApoAI deficiency affects the action of other genes in the liver
- Help determine what molecular pathways ApoAI operates on

Markers

- All mRNA data from both knockout and wild-type were marked GREEN
- KO and WT are marked RED
  - Oftentimes, both populations are run on the same plate, with one being marked RED and the other marked GREEN

R
www.r-project.org

- "S"-like GNU project language and environment for statistical computing
- Great free package for linear and non-linear statistical modeling
- Also includes:
  - an effective data handling and storage facility,
  - a suite of operators for calculations on arrays, in particular matrices,
  - a large, coherent, integrated collection of intermediate tools for data analysis,
  - graphical facilities for data analysis and display either on-screen or on hardcopy, and
  - a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

Bioconductor
http://bioconductor.org

- Open source package for statistical analysis of genomic data
- Includes both statistical and graphical tools
- Active project with a constant influx of new packages
- Does not include more complex analysis tools at this time (SVMs, etc.)

With Controls

Controls Removed