Overview of Bioconductor

Aedín Culhane

[email protected]

http://bcb.dfci.harvard.edu/~aedin

http://www.hsph.harvard.edu/research/aedin-culhane


Bioconductor

Biannual release (normally April and October), timed to coincide with the R release.

Current: Bioconductor 2.9 (released alongside R 2.14)

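To install, run the script from the Bioconductor website:

source("http://www.bioconductor.org/biocLite.R")
biocLite()   # installs a default set of core packages

(biocLite() was later retired: from Bioconductor 3.8 onward, packages are installed with BiocManager::install().)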


Packages Overview

Bioconductor website

• BiocViews task view, with three top-level categories:

– Software

– Annotation Data

– Experiment Data


What packages do I need?

That depends on your data and analysis pipeline, but for examples see:

• Bioconductor Workshops

• Bioconductor Workflows


Main types of annotation packages

• Gene-centric AnnotationDbi packages:

– Organism: org.Mm.eg.db

– Technology/platform: hgu133plus2.db

– Gene sets and pathways (biology level): GO.db or KEGG.db

– .db packages are SQLite databases: they can be queried with SQL or accessed through the annotation package (toTable, get, mget)

• Genome-centric GenomicFeatures packages:

– Transcriptome level: TxDb.Hsapiens.UCSC.hg19.knownGene

– Generic features: can be generated with GenomicFeatures

• biomaRt:

– Query the web-based BioMart resource for genes, sequences, SNPs, etc.

• See http://www.bioconductor.org/help/course-materials/2011/BioC2011/LabStuff/AnnotationSlidesBioc2011.pdf
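The get()/mget() access style listed above treats an annotation map like an R environment: keys go in, mapped values come out. The sketch below uses a plain R environment as a stand-in for a package map such as hgu133plus2SYMBOL (an assumption for illustration); the probe IDs and gene symbols are made up, so no annotation package is needed to run it.

```r
# Stand-in for an annotation map: hypothetical probe IDs -> gene symbols.
# A real map (e.g. hgu133plus2SYMBOL) comes from the .db package; here a
# base R environment plays that role so the example runs on its own.
probe2symbol <- new.env()
assign("probe_A", "GENE1", envir = probe2symbol)
assign("probe_B", "GENE2", envir = probe2symbol)

# get(): look up a single key
sym <- get("probe_A", envir = probe2symbol)

# mget(): look up several keys at once; ifnotfound = NA returns NA for
# missing keys instead of stopping with an error
syms <- mget(c("probe_A", "probe_B", "probe_C"),
             envir = probe2symbol, ifnotfound = NA)
unlist(syms)
```

With the real package loaded, the same pattern is written as mget(probeIDs, hgu133plus2SYMBOL, ifnotfound = NA); toTable() instead returns the whole map as a data frame.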


Bioconductor resources

• Mailing list (sign up for the daily digest)

• Documentation and workshop/course material online

– Slides from talks, PDFs of tutorials, R code

• Help available for each software package

– Each package MUST contain a vignette (a how-to)

• Other resources: www.Rseek.org, www.r-bloggers.com


Vignette

• Tutorials that provide a worked example of the package

• Required in Bioconductor packages

• Written in Sweave (Leisch, 2002):

– LaTeX dynamic reports in which R code is embedded and executed

– All R code in a vignette is checked (and executed) by R CMD check

– http://www.bioconductor.org/docs/vignettes.html

library("Biobase")   # provides openVignette()
library("GOstats")   # load the package of interest
openVignette()       # choose a vignette from a menu and open it


S4 classes and ExpressionSet

• Within Bioconductor, you will find that packages are structured around the S4 object-oriented programming system proposed by John Chambers (developer of S)

• A class provides a software abstraction of a real world object.

• A method performs an action on an object of a class (think of a class as a noun and a method as a verb)
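To make the noun/verb idea concrete, here is a minimal sketch using the S4 functions from R's methods package. The class ArraySample and the generic summariseSample are hypothetical names invented for this example, not part of any Bioconductor package.

```r
library(methods)  # the S4 system (attached by default in interactive R)

# A class, the "noun": a toy description of one array sample
setClass("ArraySample",
         slots = c(id = "character", values = "numeric"))

# A generic, the "verb": which method runs depends on the argument's class
setGeneric("summariseSample",
           function(object) standardGeneric("summariseSample"))

# A method: what the verb does for objects of class ArraySample
setMethod("summariseSample", "ArraySample", function(object) {
  c(mean = mean(object@values), n = length(object@values))
})

# An instance of the class, and the method applied to it
s1 <- new("ArraySample", id = "sample1", values = c(2, 4, 6))
summariseSample(s1)
```

Bioconductor classes such as ExpressionSet follow the same pattern, only with more slots and many more methods.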


Object (S4)

• An object is an instance of a class.

• Its data are stored in named slots

• slotNames(ob1) lists all slots in an object; str(ob1) also shows their contents.

• To access slots:

– ob1@slotname

– slotname(ob1), where the package provides an accessor function, or

– slot(ob1, "slotname")
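The three ways of reaching a slot can be tried on a small object. This sketch defines a hypothetical class Probe with an accessor generic intensity(), mimicking how packages pair each slot with an accessor; neither name comes from a real package.

```r
library(methods)

# Hypothetical toy class with two slots
setClass("Probe", slots = c(name = "character", intensity = "numeric"))

# An accessor generic and method: the slotname(ob1) style used by packages
setGeneric("intensity", function(object) standardGeneric("intensity"))
setMethod("intensity", "Probe", function(object) object@intensity)

ob1 <- new("Probe", name = "probe_A", intensity = 7.2)

slotNames(ob1)          # names of all slots
ob1@name                # direct slot access with @
intensity(ob1)          # accessor function
slot(ob1, "intensity")  # access by slot name given as a string
str(ob1)                # overview of the object's structure
```

Accessors are usually preferred over @ in scripts, because a package can change a class's internal slots while keeping its accessor functions stable.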


Example: ExpressionSet

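library(ALL)      # experiment data package with the acute lymphoblastic leukemia data
data(ALL)         # load the ALL ExpressionSet
slotNames(ALL)    # list its slots
ALL@phenoData     # direct slot access
phenoData(ALL)    # the same slot, via its accessor function
class(ALL)
?ExpressionSet    # help page for the class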

> ALL
ExpressionSet (storageMode: lockedEnvironment)
assayData: 12625 features, 128 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: 01005 01010 ... LAL4 (128 total)
  varLabels: cod diagnosis ... date last seen (21 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds: 14684422 16243790
Annotation: hgu95av2
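The printout above summarises the pieces an ExpressionSet bundles together. A minimal sketch of building one by hand, assuming Biobase is installed; the 4 x 3 expression matrix and the group labels are made-up values, not real data.

```r
library(Biobase)

# Hypothetical toy data: 4 features (rows) x 3 samples (columns)
expr <- matrix(c(5.1, 6.2, 7.3,
                 8.0, 7.5, 6.9,
                 4.4, 4.8, 5.0,
                 9.1, 9.3, 8.7),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

# Sample annotation: one row per sample; row names must match colnames(expr)
pheno <- data.frame(group = c("control", "treated", "treated"),
                    row.names = colnames(expr))

eset <- ExpressionSet(assayData = expr,
                      phenoData = AnnotatedDataFrame(pheno))

eset            # printed summary, in the same layout as the ALL example
exprs(eset)     # the expression matrix
pData(eset)     # the sample annotation (phenoData as a data frame)

# Subsetting samples keeps exprs and pData aligned
treated <- eset[, eset$group == "treated"]
```

Keeping the matrix and the sample annotation in one object is the point of the class: subsetting or reordering samples cannot leave the two out of step.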


Methods that act on an S4 class

showMethods(class= "ExpressionSet")

getMethod("write.exprs", "ExpressionSet")

Or, to see how the package really works, download the source code and read it.
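The same introspection calls work on any S4 class. This sketch defines a hypothetical class Spot and generic background() (illustrative names only) so that showMethods(), existsMethod() and getMethod() have something to find.

```r
library(methods)

# Hypothetical toy class and generic for the introspection calls to inspect
setClass("Spot", slots = c(signal = "numeric"))
setGeneric("background", function(object) standardGeneric("background"))
setMethod("background", "Spot", function(object) min(object@signal))

showMethods("background")           # classes that have a method for this generic
showMethods(classes = "Spot")       # generics that have methods for this class
existsMethod("background", "Spot")  # is there a method defined for exactly this class?
getMethod("background", "Spot")     # print the method's code
```

existsMethod() only looks for a method defined for exactly that class; selectMethod() also follows inheritance, which matters for classes like ExpressionSet that extend a parent class (eSet).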


Getting Data into R & Bioconductor


Simple Excel spreadsheet data

• Simple table

– read.table()

– read.csv()

– scan()

• However, many data types need more specialized readers; see the Technology category in BiocViews.

– http://www.bioconductor.org/packages/release/BiocViews.html

• For large data files, also see http://www.revolutionanalytics.com
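For a simple table, the base R readers above are usually enough once the spreadsheet is saved as tab-delimited or CSV text. This sketch inlines a small made-up table through the text= argument so it runs without any input file; in practice you would pass a file path instead.

```r
# Hypothetical tab-delimited table, as exported from a spreadsheet
txt <- "probe\tsample1\tsample2
p1\t5.1\t6.2
p2\t8.0\t7.5
p3\t4.4\t4.8"

# read.table(): header = TRUE takes column names from the first line,
# row.names = 1 uses the probe column as row names
tab <- read.table(text = txt, header = TRUE, sep = "\t", row.names = 1)
m <- as.matrix(tab)   # numeric matrix: rows = probes, columns = samples

# read.csv(): same data written to a temporary CSV file and read back
csv <- tempfile(fileext = ".csv")
write.csv(tab, csv)
tab2 <- read.csv(csv, row.names = 1)

# scan(): reads a flat vector of values rather than a table
vals <- scan(text = "1.5 2.5 3.5", quiet = TRUE)
```

Converting to a matrix with probes as rows and samples as columns gives the layout expected for an ExpressionSet's exprs slot.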


Some common data types

• Microarray

• SNP

• NGS

May 2011 14


A Microarray Overview


Reading Affymetrix Data

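library(affy)   # or require(affy), which returns FALSE instead of stopping if affy is missing
affybatch <- ReadAffy(celfile.path = "[Location of your data]")   # read CEL files into an AffyBatch
eSet <- rma(affybatch)   # RMA-normalize to an ExpressionSet
# Alternative in one step, without keeping the AffyBatch:
# eSet <- justRMA(celfile.path = "[Location of your data]")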


Sample R code


ExpressionSet Class in R


Assessing Data Quality


Public Microarray Data

• ArrayExpress: 21,997 studies (622,617 profiles)

• GEO: 22,735 studies (558,074 profiles)

Statistics as of May 2011


R Code


More on GEOquery


require(GEOquery)

Let's load the GDS810 dataset, which contains data on Alzheimer's disease at various stages of severity:

GDS810 <- getGEO("GDS810")

The getGEO function returns an object of a GEOData class (for a GDS accession, the GDS subclass). For a description of these classes, run help("GEOData-class").

Meta(GDS810)          # header metadata
Columns(GDS810)       # descriptions of the sample columns
head(Table(GDS810))   # first rows of the data table

Page 23: Overview of Bioconductor

Affy SNP Arrays


Processing Affy SNP arrays (oligo package)


Other Arrays

• Illumina: lumi package

• Two-color spotted arrays: limma package

• Other arrays: http://www.bioconductor.org/help/workflows/oligo-arrays/


Next Generation Sequencing Data


R Code


Exercise

• Install the GEOquery package

• Download the dataset GSE1297 using getGEO

• The data are downloaded as a list of ExpressionSets (one per platform); use exprs() to see the expression data and pData() to see the phenoData

• Use arrayQualityMetrics to assess the quality of these data


R basics: Getting help

• To get help:

– ?mean

– help(mean)

• help.search("mean")

• apropos("mean")

• example(mean)

• http://www.bioconductor.org/help/


• With thanks to

• www.bioconductor.org/help/course.../Bioconductor-Introduction-lab.pdf
