Bioinformatics for Microarray Studies at IBS
Pei-Ing Hwang, Ph.D.
TIGP, Mar. 24, 2005


Different aspects for life science research

genomics

transcriptomics

proteomics

Building blocks for DNA or RNA

DNA: A, T, G, C
RNA: A, U, G, C

DNA: deoxyribonucleic acid

Double stranded
Antiparallel

Why microarray?

Gene expression
  To study multiple genes simultaneously
  To obtain an overview of gene expression at the transcriptional level under specific experimental conditions
  To study gene interaction networks from the transcriptional aspect
Genome
  SNP detection
  To find recombination sites in the chromosome/genome
  Hopefully, to discover the gene responsible for a genetic disease

Outline

Introduction to microarray experiments
Experiences at IBS for the cDNA arrays
  Data generated with microarray
  DNA annotation
  Data analysis
  Data management

About Microarray Technology-1

Up to hundreds of thousands of spots in a fixed area on a glass slide or a membrane
One species of DNA molecule per spot
  A spot is also called a "feature"
  DNA fixed on the chip or membrane is also called a "probe"
The sequence and/or function of each DNA species on a spot is known.

About Microarray Technology-2

Making use of the "hybridization method"
  A : T, U
  G : C
Image processing
Data analysis
Result interpretation from the biology aspect
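The pairing rule on this slide (A with T or U, G with C) can be illustrated with a minimal Python sketch. The function names are chosen here for illustration and are not from the talk. The sketch only checks for a perfect antiparallel match; real hybridization tolerates some mismatches and depends on temperature and salt conditions, which are ignored.

```python
# Watson-Crick pairing used in hybridization: A pairs with T (U in RNA), G pairs with C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (the strands are antiparallel)."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def hybridizes_perfectly(probe: str, target: str) -> bool:
    """True if the target (DNA or RNA) is the exact reverse complement of the DNA probe."""
    target_as_dna = target.upper().replace("U", "T")  # treat RNA U like DNA T
    return target_as_dna == reverse_complement(probe)


if __name__ == "__main__":
    print(reverse_complement("ATGC"))            # GCAT
    print(hybridizes_perfectly("ATGC", "GCAU"))  # True
```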

Types of Microarray

Types of DNA immobilized on the solid support
  cDNA vs. oligonucleotides
Manufacturing methods
  Printing vs. photolithography
Solid support
  Glass slides
  Membrane
Nucleotide labeling (slide scanning condition)
  One color vs. two colors

GeneChip® Array Manufacturing

Figure 1. Affymetrix uses a unique combination of photolithography and combinatorial chemistry to manufacture GeneChip® Arrays.

Microarray printing machine

http://arrayit.com/Products/MicroarrayI/NanoPrint/Nano-Print-new-600.jpg


Procedure for one-channel array


Experimental Procedure for 2-channel Microarray

Data Analyses

Feature intensity acquisition
  Image analyses
To identify differentially expressed genes
  Normalization (global, local, print-tip, between-array, etc.)
  Clustering or classification
Analyses from the biology aspect
  Significant genes
  Transcriptional regulation study
  Cellular pathway or network finding
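As an illustration of the "global" normalization named on this slide, the sketch below applies one common variant to two-channel data: it assumes most features are not differentially expressed and shifts all log2(red/green) ratios so that their median is zero. The function names, the example intensities, and the two-fold threshold are assumptions made here, not values from the talk; local, print-tip, and between-array methods are not shown.

```python
import math
from statistics import median


def global_median_normalize(red, green):
    """Return log2(red/green) ratios shifted so that their median is zero."""
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    center = median(log_ratios)
    return [m - center for m in log_ratios]


def flag_differential(norm_log_ratios, threshold=1.0):
    """Indices of features whose |normalized log2 ratio| meets the threshold.

    threshold=1.0 corresponds to a two-fold change (an illustrative cutoff).
    """
    return [i for i, m in enumerate(norm_log_ratios) if abs(m) >= threshold]


if __name__ == "__main__":
    # Made-up background-corrected intensities for five features.
    red = [200.0, 400.0, 1600.0, 100.0, 800.0]
    green = [100.0, 200.0, 100.0, 50.0, 400.0]
    normalized = global_median_normalize(red, green)
    print(normalized)
    print(flag_differential(normalized))
```

A fold-change cutoff alone is a crude selection rule; with replicate arrays a statistical test would normally replace it.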

Experiences at IBS for the cDNA arrays

About IBS tomato arrays

~13,000 spots/features per chip
1 clone per spot
cDNA clones from about a dozen different cDNA libraries
At least two different protocols were followed and six different vectors were used
More than ten technicians involved

Bioinformatics for Microarray at IBS (cont'd)

IBS tomato EST database construction
Installation, management and maintenance of data analysis software
Reference information searching
Batch submission of EST sequences

Bioinformatics Needs for Microarray Studies at IBS

Pre-arraying data management
  cDNA info collection, vector trimming, sequence annotation, EST submission, etc.
Array information management
  Gene set characterization, data storage, data retrieval
Post-hybridization data analysis and management
  Array data analyses, storage of the scanning result, biology-oriented bioinformatics analyses

Bioinformatics Service Work for Microarray Studies at IBS

Data pre-processing for the cDNAs
  Clone id assignment
  Sequence trimming
  Gene annotation
  Function classification
Data sheet preparation for commercial software to analyze microarray data
  GAL file preparation for GenePix Pro
  Master Gene List preparation for GeneSpring

[Workflow diagram. Labels: cDNA clones; sequencing; PCR; vector trimming; assembly; function annotation; database; GenePix; feature intensities; normalization; Spotfire, GeneSpring; data analysis: normalization, variance, clustering; biological meaning: pathway analysis, transcription network, gene-gene interaction]

Pre-array Bioinformatics

clones from labs

sequencing

Raw EST seq

1. Clone id generation

2. Vector Trimming

3. Sequence assembly

4. Seq annotation (BLAST)

5. EST submission to NCBI

6. Database construction

Data Processing and Management

Clone id generation

Data centralization following sequencing
Rules for re-arraying
  96-well plate to/from 384-well plate
  PCR from 96-well and spotting from 384-well
  Order of A1, A2, B1, B2

[Diagram: plate formats along the workflow. Labels: cDNA clones; sequencing; PCR; 96 or 384 well; 96 well; 96 well; 384 well]

96-well to 384-well plates

[Diagram: positions A1, A2, B1 and B2 on the 384-well plate]
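The re-arraying rule matters because every clone id has to be traceable from its 96-well position to its 384-well position. The sketch below assumes the common quadrant layout, in which four 96-well plates are interleaved into each 2x2 block of the 384-well plate and named A1, A2, B1, B2 after the top-left corner well they occupy. The talk does not spell out the exact IBS rule, so the offsets and the function name are assumptions.

```python
import string

# Offsets (row, column) of each 96-well plate inside a 2x2 block of the 384-well plate.
# Assumption: quadrant names follow the slide's A1, A2, B1, B2 labels.
QUADRANT_OFFSETS = {"A1": (0, 0), "A2": (0, 1), "B1": (1, 0), "B2": (1, 1)}


def map_96_to_384(well_96: str, quadrant: str) -> str:
    """Map a 96-well position such as 'B3' to its 384-well position for one quadrant."""
    row = string.ascii_uppercase.index(well_96[0].upper())  # A-H -> 0-7
    col = int(well_96[1:]) - 1                               # 1-12 -> 0-11
    if not (0 <= row < 8 and 0 <= col < 12):
        raise ValueError(f"not a 96-well position: {well_96}")
    row_off, col_off = QUADRANT_OFFSETS[quadrant]
    row_384 = 2 * row + row_off                              # 0-15 -> A-P
    col_384 = 2 * col + col_off                              # 0-23 -> 1-24
    return f"{string.ascii_uppercase[row_384]}{col_384 + 1}"


if __name__ == "__main__":
    for quadrant in ("A1", "A2", "B1", "B2"):
        print(quadrant, map_96_to_384("A1", quadrant))
```

Under this layout, well A1 of the four source plates lands in 384-well positions A1, A2, B1 and B2 respectively, which matches the order given on the slide.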

Data collection

Raw sequencing data obtained from the sequencing company
Both ABI and text files organized and stored by lab and by date
Clone info confirmed with each sequence contributor
Clone ids matched with raw sequences

Processing the sequencing data

cDNA library procedures confirmed with each lab
Vector/linker/primer trimming (SeqClean)
Function annotation
  BLAST against different databases
  Gene Ontology annotation
Sequence assembly (Phrap)

Procedure to generate cDNA clones

IBS tomato EST Database

Cloning information
Sequencing data
Vector/adaptor trimming information
EST assembly
Function annotation
Cross reference

The Tomato Database: Entity-Relationship model

ID MAP: 1. Seq id; 2. Clone_id; 3. Contig id; 4. Lab_id#1; 5. Lab_id#2; 6. NCBI_sbmt_id93; 7. NCBI_sbmt_id94; 8. dbEST_accn_no; 9. note
Untrimmed Sequence: 1. Seq id; 2. Trimmed Sequence
Trimmed Sequence: 1. Seq id; 2. Trimmed Sequence; 3. Method; 4. Trim set
Assembly Information: 1. Contig_id; 2. Contig Sequence; 3. BLAST Result; 4. Position; 5. Component seq id
TAIR Result: 1. Seq id; 2. At number; 3. E-Value; 4. Description; 5. Identity; 6. Other result
NCBI BLAST Result: 1. Seq id; 2. NCBI_id; 3. E-Value; 4. Description; 5. Identity; 6. Other result
TIGR Result: 1. Seq id; 2. TC number; 3. E-Value; 4. Description; 5. Identity; 6. Other result
Gene Ontology: 1. TC number; 2. EC number; 3. Process (GO_id, Description); 4. Function (GO_id, Description); 5. Component (GO_id, Description)
Lab info: 1. Seq id; 2. Comment; 3. Primer; 4. Biotech; 5. Sender; 6. Collect From
cDNA Library Information: 1. Clone_id(3)(4); 2. Name; 3. Date made; 4. Developmental stage; 5. Cloning sites; 6. Description; 7. Library; 8. Host; 9. Species; 10. Vector; 11. Antibiotic; 12. Authors; 13. Tissue; 14. Primer

[Diagram links: entities are joined on Seq_id, Clone_id and TC number, with cardinalities marked 1 and n. Other labels: TOM 3, TOM 4]
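To show how such an entity-relationship model turns into queryable tables, here is a minimal sketch using Python's built-in sqlite3 module with three of the entities (ID MAP, Trimmed Sequence, TAIR Result) joined on the sequence id. The column types, key choices, and all inserted values are placeholders assumed for illustration; the talk does not say which database engine or schema details the IBS database used.

```python
import sqlite3

# Three entities from the model; types and keys are assumptions.
SCHEMA = """
CREATE TABLE id_map (
    seq_id TEXT PRIMARY KEY,
    clone_id TEXT,
    contig_id TEXT,
    dbest_accn_no TEXT,
    note TEXT
);
CREATE TABLE trimmed_sequence (
    seq_id TEXT PRIMARY KEY REFERENCES id_map(seq_id),
    trimmed_sequence TEXT,
    method TEXT,
    trim_set TEXT
);
CREATE TABLE tair_result (
    seq_id TEXT REFERENCES id_map(seq_id),
    at_number TEXT,
    e_value REAL,
    description TEXT,
    identity REAL
);
"""


def build_demo_db():
    """Create an in-memory database holding one made-up record per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO id_map VALUES (?, ?, ?, ?, ?)",
                 ("seq001", "clone001", "contig01", None, None))
    conn.execute("INSERT INTO trimmed_sequence VALUES (?, ?, ?, ?)",
                 ("seq001", "ATGGCC", "SeqClean", "set1"))
    conn.execute("INSERT INTO tair_result VALUES (?, ?, ?, ?, ?)",
                 ("seq001", "AT1G00000", 1e-50, "placeholder description", 95.0))
    return conn


def annotation_for_clone(conn, clone_id):
    """Look up the TAIR hits of every sequence that belongs to one clone."""
    return conn.execute(
        "SELECT m.seq_id, m.contig_id, t.at_number, t.e_value "
        "FROM id_map AS m JOIN tair_result AS t ON t.seq_id = m.seq_id "
        "WHERE m.clone_id = ?",
        (clone_id,),
    ).fetchall()


if __name__ == "__main__":
    print(annotation_for_clone(build_demo_db(), "clone001"))
```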

Information to be further analyzed

Gene set characterization
  Number of unique genes on the array
  Number of known/unknown genes
Coordinates of each spotted sequence
Statistics about spotted cDNA
  grouped by function/pathway
  grouped by sequence similarity
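The gene set characterization listed here amounts to a few counts over the spotted clones. The sketch below assumes that each spot record carries a contig id standing for one unique gene (with singletons given their own id) and an annotation that is None when no database hit was found. The record layout and function name are hypothetical.

```python
from collections import Counter


def characterize_gene_set(spots):
    """Summarize an array from a list of spot records.

    Each record is a dict with 'clone_id', 'contig_id' and 'annotation'
    (None when the sequence has no annotated database hit).
    """
    unique_genes = {s["contig_id"] for s in spots}
    known = {s["contig_id"] for s in spots if s["annotation"]}
    by_function = Counter(s["annotation"] or "unknown" for s in spots)
    return {
        "spots": len(spots),
        "unique_genes": len(unique_genes),
        "known": len(known),
        "unknown": len(unique_genes - known),
        "by_function": dict(by_function),
    }


if __name__ == "__main__":
    # Made-up example records.
    demo = [
        {"clone_id": "c1", "contig_id": "contig01", "annotation": "kinase"},
        {"clone_id": "c2", "contig_id": "contig01", "annotation": "kinase"},
        {"clone_id": "c3", "contig_id": "contig02", "annotation": None},
        {"clone_id": "c4", "contig_id": "contig03", "annotation": "transporter"},
    ]
    print(characterize_gene_set(demo))
```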

Post-hybridization data analysis and management

Post-hybridization data analysis

Software for microarray analysis at IBS
  GenePix Pro 5.0 – image processing
  GeneSpring – microarray data analysis
  Spotfire – microarray data analysis and data storage
  TransPath – pathway searching

Image Processing

GenePix Pro 5.0
GAL (GenePix Array List) file

From multi-well plate to microarray

GAL online

GeneSpring at IBS

for microarray data analyses
standalone software
providing statistical methods for data analysis
some bioinformatics
providing visualization
licensed annually
rigid format requirement for input data
requiring installation of a master gene list (master table) prior to data analysis

Master table for GeneSpring

Master table contains information on
  Id
  Source of DNA
  Gene name
  Gene function annotation (from BLAST results)
  GO annotation
Each array needs its own master table
The format of the master table may vary with the version of the software.

To generate master table for GeneSpring

Batch BLAST against three sequence databases
Parsing BLAST results
Incorporating EC number, GO number and other related data from the best BLAST matches
Integrating all required data from various files and generating the master table
Checking
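The parsing and integration steps can be sketched as follows, assuming the BLAST results are in the standard 12-column tabular format (`-m 8` in legacy blastall, `-outfmt 6` in BLAST+). The best match per query is taken as the hit with the highest bit score. The output columns, the GO lookup dictionary, and the function names are hypothetical; this is not GeneSpring's required master table layout, which the talk notes varies with the software version.

```python
import csv
import io

# Standard 12 columns of NCBI BLAST tabular output.
BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_tabular_text):
    """Return {query id: best hit row}, choosing the highest bit score per query."""
    best = {}
    reader = csv.DictReader(io.StringIO(blast_tabular_text),
                            fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


def master_table_rows(clone_ids, hits, go_by_subject):
    """Build one row per clone: id, best-hit subject, e-value, GO term ('' when missing)."""
    rows = []
    for clone_id in clone_ids:
        hit = hits.get(clone_id)
        subject = hit["sseqid"] if hit else ""
        evalue = hit["evalue"] if hit else ""
        rows.append([clone_id, subject, evalue, go_by_subject.get(subject, "")])
    return rows


if __name__ == "__main__":
    # Made-up hits: two for clone c1, one for c2, none for c3.
    demo = ("c1\tsubjA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180\n"
            "c1\tsubjB\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t90\n"
            "c2\tsubjC\t85.0\t80\t12\t0\t1\t80\t5\t84\t1e-10\t60\n")
    hits = best_hits(demo)
    for row in master_table_rows(["c1", "c2", "c3"], hits, {"subjA": "GO:placeholder"}):
        print("\t".join(row))
```

Clones without any hit still get a row with empty fields, so the final "checking" step can spot them easily.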

Spotfire

for microarray data analyses
server-client software
linked to an Oracle database for data storage
providing various statistical methods for data analysis
capable of establishing links to more bioinformatics tools
can record the analysis procedure
more flexible format requirement for input data

One color array for Arabidopsis

Affymetrix ATH1 chip
Annotation information provided by the company and available on the internet


Bioinformatics support at Affymetrix

Projects for now and the near future

Infrastructure build-up
Microarray data management system
Platform for bioinformatics analyses
Plant Signaling Pathway Database

Team

Thank you!