Microarray Dataset: quick mining and gene profile analysis using online tools
Dr. Etienne Z. GNIMPIEBA
Sioux Falls, March 2013
Plan
• Gene expression measurement
• Microarray process
• Gene expression data stores
• Data mining / querying
• Data analysis
• Example: ATP13A2 profile in stress conditions
Gene expression measurement
Higher-plex techniques: SAGE, DNA microarray, tiling array, RNA-Seq, NGS
Low-to-mid-plex techniques: reporter gene, Northern blot, Western blot, fluorescent in situ hybridization, reverse transcription PCR
What is a Microarray?
“A DNA microarray is a multiplex technology consisting of thousands of oligonucleotide spots, each containing picomoles of a specific DNA sequence.”
Used to quantitate mRNA or DNA
Many applications:
◦ mRNA or DNA levels
◦ SNP identification
◦ ChIP-on-chip
Hypotheses
Microarrays are usually hypothesis-generating:
◦ They highlight specific genes or features that are particularly interesting for follow-up experiments
◦ There are many interesting exceptions: biomarkers, pathway analyses
This does not reduce the importance of experimental design
◦ The low statistical power of array studies makes good design even more important and very challenging
Microarray process (1/3)
• Image analysis (GenePix)
• Normalization (R)
• Pre-treatment
• Differential expression
• Clustering
• Data mining
• Annotation
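The normalization step can be illustrated with a short sketch. The slides name R as the tool and do not say which method is used; quantile normalization is assumed here only because it is a common choice, and it is written in plain Python for illustration. The function name and the intensity values are hypothetical.

```python
from statistics import mean

def quantile_normalize(columns):
    """Quantile-normalize a list of arrays (one list of intensities per array).

    Assumption: every array measures the same probes in the same order.
    Ties are broken by order of appearance, which is a simplification.
    """
    n_probes = len(columns[0])
    if any(len(col) != n_probes for col in columns):
        raise ValueError("all arrays must measure the same probes")

    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n_probes)]

    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest intensity in this array.
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized

# Made-up intensities for three arrays and four probes (illustration only).
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(arrays)
```

After this step every array shares the same set of values, while each probe keeps its rank within its own array.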
Microarray process (2/3)
Microarray process (3/3)
High-density filters (macroarrays)
• Size: 12 cm x 8 cm
• 2,400 clones per membrane
• Radioactive labelling
• 1 experimental condition per membrane
Glass slides (microarrays)
• Size: 5.4 cm x 0.9 cm
• 10,000 clones per slide
• Fluorescent labelling
• 2 experimental conditions per slide
Oligonucleotide chips
• Size: 1.28 cm x 1.28 cm
• 300,000 oligonucleotides per slide
• Fluorescent labelling
• 1 experimental condition per slide
Gene expression data management
Database                            Microarray experiment sets  Sample profiles  Date reported
ArrayExpress at EBI                 24,838                      708,914          October 28, 2011
ArrayTrack™                         1,622                       50,953           February 11, 2012
caArray at NCI                      41                          1,741            November 15, 2006
Gene Expression Omnibus - NCBI      25,859                      641,770          October 28, 2011
Genevestigator database             2,500                       65,000           January 2012
MUSC database                       ~45                         555              April 1, 2007
Stanford Microarray database        82,542                      Not reported     October 23, 2011
UNC Microarray database             ~31                         2,093            April 1, 2007
UNC modENCODE Microarray database   ~6                          180              July 17, 2009
UPenn RAD database                  ~100                        ~2,500           September 1, 2007
UPSC-BASE                           ~100                        Not reported     November 15, 2007
SAGE, GEO, GUDMAP (421), MGI, BioGPS
Data mining / querying
• Problem specification
• Query
• Extraction
• Storage / load
• Pretreat / prepare for analysis
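The query step can be sketched as building a search URL for NCBI's E-utilities, where "gds" is the Entrez database behind GEO DataSets. This is only one possible way to query GEO and is not taken from the slides; the helper name and the search term layout are assumptions. The sketch builds the URL and does not send the request.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_geo_query(gene, condition, max_results=20):
    """Return an esearch URL for GEO records mentioning a gene and a condition.

    Hypothetical helper: the simple "gene AND condition" term is an assumption;
    a real search would usually need organism and field filters as well.
    """
    term = f"{gene} AND {condition}"
    params = {
        "db": "gds",            # GEO DataSets
        "term": term,
        "retmax": max_results,  # maximum number of record IDs to return
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"

url = build_geo_query("ATP13A2", "stress")
print(url)
```

The returned record IDs would then feed the extraction and storage steps listed above.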
Data analysis (1/3)
Question-Answer
◦ Experimental condition profile: group comparison
◦ Annotation profile: biological systems involved
◦ Clustering profile: co-regulation
◦ Time course profile: time variation
◦ …
Descriptive
◦ Boxplot (SD, mean, median)
◦ Scatter plot
Predictive / inference (clustering)
Modeling (machine learning, simulation)
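The descriptive statistics behind a boxplot can be computed directly. The sketch below uses Python's standard library; the 1.5 x IQR fence for flagging outliers is the usual boxplot convention and is assumed here, and the sample values are made up.

```python
from statistics import mean, median, quantiles, stdev

def boxplot_summary(values):
    """Return the numbers a boxplot displays for one sample or one gene."""
    # Quartiles with the "inclusive" method (data treated as the full population).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "q1": q1,
        "q3": q3,
        # Points beyond the fences are drawn individually on a boxplot.
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Made-up log2 intensities (illustration only).
sample = [7.1, 7.4, 7.2, 7.3, 7.5, 11.0]
summary = boxplot_summary(sample)
print(summary)
```

Comparing these summaries across samples is one quick way to spot an array whose distribution differs from the rest.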
Data analysis (2/3)
3 Questions
◦ What is the right dataset (experimental condition)?
◦ Is the dataset ready for analysis (quality)?
◦ What is the expression profile for a given gene?
◦ Significant differential expression in group comparisons
Tools
◦ ArrayExpress (EBI)
◦ Boxplot
◦ GEO2R (LIMMA, profile graph)
◦ …
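A group comparison for differential expression can be sketched as follows. GEO2R relies on limma's moderated t-statistics; the sketch below swaps in an exact permutation test on the difference of group means, followed by a Benjamini-Hochberg adjustment, because both fit in a few lines of standard-library Python. It is not what GEO2R computes, and the gene names and values are made up.

```python
from itertools import combinations
from statistics import mean

def permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    hits = total = 0
    # Enumerate every way to relabel n_a of the pooled samples as group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so float rounding does not drop exact ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            hits += 1
    return hits / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up log2 expression values: gene -> (control samples, stress samples).
toy = {
    "geneA": ([7.0, 7.1, 6.9, 7.2], [8.9, 9.1, 9.0, 9.2]),
    "geneB": ([5.0, 5.4, 5.1, 5.3], [5.2, 5.0, 5.3, 5.1]),
}
genes = list(toy)
raw = [permutation_pvalue(*toy[g]) for g in genes]
adj = benjamini_hochberg(raw)
for gene, p, q in zip(genes, raw, adj):
    print(gene, round(p, 4), round(q, 4))
```

With only four samples per group the smallest achievable p-value is 2/70, which illustrates the low statistical power of small array studies.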
Data analysis (3/3)
Boxplot
Example: ATP13A2 profile in stress conditions
Specification: ATP13A2 profile in stress conditions
Data querying:
◦ GEO
◦ ArrayExpress
◦ Gene Atlas
Data analysis:
◦ Online: GEO2R, Genospace, …
◦ Desktop: R, ArrayTrack, …
Significant differential expression !!!
Kerry Bemis slides