Getting Data into R & Bioconductor


Post on 17-Jan-2016

Page 1: Getting Data into R & Bioconductor

Getting Data into R & Bioconductor

Aedín Culhane

[email protected]

http://www.hsph.harvard.edu/research/aedin-culhane/

Page 2: Getting Data into R & Bioconductor

Simple Excel spreadsheet data

• Already described
  – read.table()
  – read.csv()
  – scan()

• There are other formats, e.g. NetCDF

• However, data import in Bioconductor is more specialized by data type.
  – Look at Technologies on BiocViews.
  – http://www.bioconductor.org/packages/release/BiocViews.html
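The three base R readers named above can be sketched on a small tab-delimited file. The file contents and object names (tmp, tab, csv, vals) are invented for illustration and are not from the slides.

```r
# Write a tiny tab-delimited table to a temporary file (illustrative data only).
tmp <- tempfile(fileext = ".txt")
writeLines(c("gene\tsample1\tsample2",
             "geneA\t1.5\t2.5",
             "geneB\t3.0\t4.0"), tmp)

# read.table(): general reader; header and separator are stated explicitly.
tab <- read.table(tmp, header = TRUE, sep = "\t", row.names = 1)

# read.csv(): the same reader with comma-separated defaults, so sep is overridden here.
csv <- read.csv(tmp, sep = "\t", row.names = 1)

# scan(): low-level reader; with a list for 'what' it returns one vector per column.
vals <- scan(tmp, what = list("", 0, 0), skip = 1, sep = "\t", quiet = TRUE)

print(tab)
```

Note that R is case-sensitive, so Read.table() would fail where read.table() works.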

Page 3: Getting Data into R & Bioconductor

Some common data types

• Microarray

• SNP

• Increasingly, next-generation sequencing (NGS)

May 2011

Page 4: Getting Data into R & Bioconductor

A Microarray Overview

Page 5: Getting Data into R & Bioconductor

Reading Affymetrix Data

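library(affy)

require(affy) # Alternative to library(): returns FALSE with a warning if affy is not installed

affybatch <- ReadAffy(celfile.path = "[Location of your data]") # read every CEL file in that directory into an AffyBatch

eSet <- justRMA(celfile.path = "[Location of your data]") # read and RMA-process the same CEL files in one step; without celfile.path, justRMA() looks in the working directory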

Page 6: Getting Data into R & Bioconductor

Sample R code

Page 7: Getting Data into R & Bioconductor

ExpressionSet Class in R
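As a rough illustration of what an ExpressionSet holds, the sketch below builds one by hand with the Biobase package. The matrix values, probe and sample names, and the "group" column are all invented for this example; real objects usually come from functions such as justRMA() or getGEO().

```r
library(Biobase)

# Illustrative expression matrix: rows are features (probes), columns are samples.
expr_mat <- matrix(c(7.1, 8.2, 6.5,
                     5.0, 5.3, 4.9,
                     9.8, 9.1, 9.5,
                     3.2, 3.0, 3.6),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(paste0("probe", 1:4), paste0("sample", 1:3)))

# Sample annotation: row names must match the column names of the matrix.
pheno <- data.frame(group = c("control", "case", "case"),
                    row.names = colnames(expr_mat))

eset <- ExpressionSet(assayData = expr_mat,
                      phenoData = AnnotatedDataFrame(pheno))

exprs(eset)                                  # the expression matrix
pData(eset)                                  # the sample annotation
case_only <- eset[, eset$group == "case"]    # subsetting keeps both parts in sync
```

Keeping the expression values and the sample annotation in one object is the main design point: subsetting samples cannot leave the two out of step.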

Page 8: Getting Data into R & Bioconductor

Assessing Data Quality

Page 9: Getting Data into R & Bioconductor

Public Microarray Data

ArrayExpress • 21,997 studies (622,617 profiles)

GEO • 22,735 studies (558,074 profiles)

Statistics May 2011

Page 10: Getting Data into R & Bioconductor

>500,000 arrays × $500 per array = >$250,000,000

Cancer studies account for >14% of all studies in these databases…

Page 11: Getting Data into R & Bioconductor

R Code

Page 12: Getting Data into R & Bioconductor

More on GEOquery

require(GEOquery)

Let's try to load the GDS810 dataset, which contains data on Alzheimer's disease at various stages of severity.

GDS810 <- getGEO("GDS810")

The getGEO function returns an object of class GEOData. You can get a description of this class like this:

help("GEOData-class")

Meta(GDS810)

Columns(GDS810)

head(Table(GDS810))
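A GDS object can also be converted to an ExpressionSet with GEOquery's GDS2eSet(). The wrapper below is a minimal sketch: gds_to_eset is a hypothetical helper name, not part of GEOquery, and calling it needs the GEOquery package and network access.

```r
# Hypothetical helper (not part of GEOquery): download a GEO DataSet (GDS)
# and convert it to an ExpressionSet.
gds_to_eset <- function(accession, do.log2 = FALSE) {
  gds <- GEOquery::getGEO(accession)           # downloads and parses the GDS record
  GEOquery::GDS2eSet(gds, do.log2 = do.log2)   # optionally log2-transform the values
}
```

For example, eset <- gds_to_eset("GDS810") would download the dataset used above, after which exprs(eset) and pData(eset) from Biobase give the expression values and the sample annotation.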

Page 13: Getting Data into R & Bioconductor

Affy SNP Arrays

Page 14: Getting Data into R & Bioconductor

Process – Affy SNP Arrays (oligo package)

Page 15: Getting Data into R & Bioconductor

Other Arrays

• Illumina
  – lumi package

• Two-color spotted arrays
  – limma package

• Other arrays
  – http://www.bioconductor.org/help/workflows/oligo-arrays/

Page 16: Getting Data into R & Bioconductor

Next Generation Sequencing Data

Page 17: Getting Data into R & Bioconductor

R Code

Page 18: Getting Data into R & Bioconductor

Exercise

• Bring down a GSE (series) record from GEO

• Download the dataset GSE1297 using getGEO

• These data will be downloaded as an eSet (for a GSE accession, getGEO returns a list of ExpressionSet objects), so use exprs to see the expression data and pData to see the phenoData

• Use the arrayQualityMetrics package to assess the quality of these data

Page 19: Getting Data into R & Bioconductor

• With thanks to

• www.bioconductor.org/help/course.../Bioconductor-Introduction-lab.pdf

Page 20: Getting Data into R & Bioconductor

Quick Aside: Interpreting hierarchical clustering trees

[Figure: two dendrograms, labelled A and B]

Hierarchical analysis results are viewed using a dendrogram (tree)

• Distance between nodes is read from the scale

• Ordering of nodes is not important (like a baby mobile)

Trees A and B are equivalent
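The point that node order carries no information can be checked with base R. This is a small sketch on random data: the matrix x and the sample labels s1 to s6 are invented for illustration, and rev() stands in for any rotation of branches.

```r
set.seed(1)  # illustrative random data: 6 samples x 5 features
x <- matrix(rnorm(30), nrow = 6, dimnames = list(paste0("s", 1:6), NULL))

hc <- hclust(dist(x))            # default: complete linkage on Euclidean distance
tree_a <- as.dendrogram(hc)
tree_b <- rev(tree_a)            # mirror the leaf order, like spinning a mobile

# Cophenetic distance = height at which two leaves are first joined.
# Rows and columns are put in a common order before comparing.
labs <- rownames(x)
coph_a <- as.matrix(cophenetic(tree_a))[labs, labs]
coph_b <- as.matrix(cophenetic(tree_b))[labs, labs]

same_heights <- isTRUE(all.equal(coph_a, coph_b))           # merge heights compared
same_order   <- identical(labels(tree_a), labels(tree_b))   # leaf order compared

plot(tree_a, main = "Tree A")
plot(tree_b, main = "Tree B")
```

The two plots look different because the leaves appear in a different left-to-right order, but the heights at which samples join, which is what the scale measures, are the same in both trees.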