regioneR: an R/Bioconductor package for the management and comparison of genomic regions

Anna Díez, Bernat Gel, Roberto Malinverni


Post on 18-Jan-2018



Page 2:

regioneR aims

Practical to use. Easy to understand.

Generic and useful

Efficient

Customizable

Something we would like to use

Page 3:

regioneR

Basic management of genomic regions

Statistical evaluation

Helper functions making our lives easier

Page 4:

The Basics

Statistics

Customization

Helper Functions

Page 5:

THE BASICS

Page 6:

joinRegions

[Figure: region set A, annotated with min.dist]

joinRegions(A, min.dist)
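The idea behind this call can be sketched in base R. This is a minimal illustration, not regioneR's implementation: it assumes that regions on the same chromosome separated by fewer than min.dist bases are merged, and it represents a region set as a plain data.frame with hypothetical columns chr, start and end. The function name `join_regions_sketch` is invented for this example.

```r
# Base-R sketch of joining nearby regions (assumed semantics, see lead-in).
join_regions_sketch <- function(A, min.dist = 1) {
  A <- A[order(A$chr, A$start, A$end), ]
  out <- A[0, ]
  for (i in seq_len(nrow(A))) {
    n <- nrow(out)
    same.chr <- n > 0 && out$chr[n] == A$chr[i]
    # Gap = number of bases strictly between the previous region and this one
    if (same.chr && A$start[i] - out$end[n] - 1 < min.dist) {
      out$end[n] <- max(out$end[n], A$end[i])   # extend the previous region
    } else {
      out <- rbind(out, A[i, ])                 # start a new region
    }
  }
  rownames(out) <- NULL
  out
}

# Hypothetical example data
A <- data.frame(chr = "chr1",
                start = c(2000, 4100, 10000),
                end   = c(4000, 5500, 12000),
                stringsAsFactors = FALSE)
joined <- join_regions_sketch(A, min.dist = 500)
print(joined)
```

With these toy coordinates the first two regions are 99 bases apart and are merged, while the third stays separate.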

Page 7:

subtractRegions

[Figure: region sets A and B]

subtractRegions(A, B)
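As a rough illustration of region subtraction, the base-R sketch below removes from each region of A the bases covered by B. It is not regioneR code: the data.frame layout (chr, start, end), the closed 1-based coordinates and the name `subtract_regions_sketch` are assumptions made for the example.

```r
# Base-R sketch: keep only the parts of A not covered by any region of B.
subtract_regions_sketch <- function(A, B) {
  out <- A[0, ]
  for (i in seq_len(nrow(A))) {
    pieces <- A[i, ]                       # uncovered pieces of A[i, ] so far
    hits <- B[B$chr == A$chr[i], ]
    for (j in seq_len(nrow(hits))) {
      kept <- pieces[0, ]
      for (k in seq_len(nrow(pieces))) {
        p <- pieces[k, ]
        if (hits$end[j] < p$start || hits$start[j] > p$end) {
          kept <- rbind(kept, p)           # no overlap: keep the piece as is
        } else {
          if (hits$start[j] > p$start) {   # part to the left of B survives
            left <- p
            left$end <- hits$start[j] - 1
            kept <- rbind(kept, left)
          }
          if (hits$end[j] < p$end) {       # part to the right of B survives
            right <- p
            right$start <- hits$end[j] + 1
            kept <- rbind(kept, right)
          }
        }
      }
      pieces <- kept
    }
    out <- rbind(out, pieces)
  }
  rownames(out) <- NULL
  out
}

# Hypothetical example data
A <- data.frame(chr = "chr1", start = c(100, 300), end = c(200, 400),
                stringsAsFactors = FALSE)
B <- data.frame(chr = "chr1", start = 150, end = 320,
                stringsAsFactors = FALSE)
res <- subtract_regions_sketch(A, B)
print(res)
```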

Page 8:

splitRegions

[Figure: region sets A and B]

splitRegions(A, B, min.size=1, track.original=TRUE)

Page 9:

mergeRegions

commonRegions

extendRegions

Any others? flankingRegions? …

Page 10:

overlapRegions

[Figure: region sets A and B]

overlapRegions(A, B, colA, colB, type, min.bases, min.pctA, min.pctB, get.pctA, get.pctB, get.bases, only.boolean, only.count, ...)
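To make the role of a threshold such as min.bases concrete, here is a small base-R sketch that counts the regions of A sharing at least min.bases bases with some region of B. It only illustrates the concept and does not reproduce overlapRegions or its other arguments; the data.frame layout, the closed 1-based coordinates and the name `count_overlaps_sketch` are assumptions.

```r
# Base-R sketch: how many regions of A overlap B by at least min.bases bases?
count_overlaps_sketch <- function(A, B, min.bases = 1) {
  hit <- vapply(seq_len(nrow(A)), function(i) {
    same.chr <- B$chr == A$chr[i]
    # Shared bases with every region of B (zero or negative means no overlap)
    shared <- pmin(A$end[i], B$end) - pmax(A$start[i], B$start) + 1
    any(same.chr & shared >= min.bases)
  }, logical(1))
  sum(hit)
}

# Hypothetical example data
A <- data.frame(chr = "chr1",
                start = c(100, 300, 900), end = c(200, 400, 950),
                stringsAsFactors = FALSE)
B <- data.frame(chr = c("chr1", "chr1", "chr2"),
                start = c(150, 390, 900), end = c(160, 500, 950),
                stringsAsFactors = FALSE)
n.any    <- count_overlaps_sketch(A, B)                  # any shared base
n.strict <- count_overlaps_sketch(A, B, min.bases = 12)  # stricter threshold
print(c(n.any, n.strict))
```

In this toy data the two overlapping regions share 11 bases each, so raising min.bases to 12 drops both.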

Page 11:

overlapRegions

Page 12:

Example: annotateRegions

regAnnotation(regions, annot.tab, ann.names, strands, descr, peak.point, gap3, gap5)

Page 13:

STATISTICS

Page 14:

overlapPermTest

[Figure: region sets A and B]

Page 15:

overlapPermTest

[Figure: region sets A and B, randomized sets B′, and per-set numeric labels]

Page 16:

overlapPermTest

Page 17:

Example: TIs

TIs over: 81

TIs under: 66

SCNA gains: 60

SCNA losses: 53

Page 18:

Gains vs Overexpression

overlapPermTest(TIs_over, SCNA.gains, alternative="g", genome="hg19", ntimes=1000)

Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 81
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  37.00   47.00   50.00   50.43   53.00   67.00
Standard score: 6.8117
P-value: 0.000999000999000999 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

~800s (~13min)

Page 19:

Losses vs Underexpression

overlapPermTest(TIs_under, SCNA.losses, alternative="g", genome="hg19", ntimes=1000)

Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 66
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  37.00   48.00   50.00   50.18   53.00   60.00
Standard score: 4.4942
P-value: 0.000999000999000999 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Page 20:

Gains vs Underexpression

overlapPermTest(TIs_under, SCNA.gains, alternative="g", genome="hg19", ntimes=1000)

Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 25
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.00   39.00   41.00   41.25   44.00   52.00
Standard score: -4.2739
P-value: 1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

recomputePermTest(gains.under, alternative="l")

Page 21:

Gains vs Underexpression

overlapPermTest(TIs_under, SCNA.gains, alternative="l", genome="hg19", ntimes=1000)

Number of permutations: 1000
Alternative: less
Evaluation of the original region set: 25
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.00   39.00   41.00   41.25   44.00   52.00
Standard score: -4.2739
P-value: 0.000999000999000999 ***
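The two Gains vs Underexpression results share the same observed value (25) and the same permuted distribution; only the tail being counted changes. The printed P-values, 1 and 0.000999000999000999, are consistent with the estimate (k + 1) / (ntimes + 1), where k is the number of permuted evaluations at least as extreme as the observed one: 1001/1001 and 1/1001. The base-R sketch below works through that arithmetic under this assumption. `perm_summary_sketch` and the permuted values are hypothetical; they are neither regioneR code nor the data behind the slides.

```r
# Sketch: permutation P-value and standard score from stored permuted values.
perm_summary_sketch <- function(observed, permuted,
                                alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  # k = permuted evaluations at least as extreme as the observed one
  if (alternative == "greater") {
    k <- sum(permuted >= observed)
  } else {
    k <- sum(permuted <= observed)
  }
  list(pval   = (k + 1) / (length(permuted) + 1),
       zscore = (observed - mean(permuted)) / sd(permuted))
}

# Hypothetical permuted values lying within the printed range 30..52
permuted <- rep(c(30, 39, 41, 44, 52), each = 200)

greater <- perm_summary_sketch(25, permuted, "greater")
less    <- perm_summary_sketch(25, permuted, "less")
print(c(greater = greater$pval, less = less$pval))
```

Because every permuted value exceeds 25, the "greater" test counts all 1000 permutations and the "less" test counts none, while the standard score is the same in both cases.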

Page 22:

Random Region Sets

overlapPermTest(10KrandomA, 10KrandomB, alternative="g", genome="hg19", ntimes=1000)

Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 68
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  42.00   57.00   62.00   62.16   67.00   89.00
Standard score: 0.7488
P-value: 0.215784215784216
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

~1850s (~30min) (Single core)

~800s (~13min) (Parallel 4 cores)

Page 23:

permTest

overlapPermTest

overlap

randomRegions

distance

resampling

value of a function

Page 24:

permTest

permTest(A, ntimes=1000, randomize.function, evaluate.function, alternative, min.parallel=1000, force.parallel=NULL, ...)

overlapPermTest <- permTest(A, randomize.function=randomizeRegions, evaluate.function=countOverlaps)
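The signature above splits a permutation test into two pluggable steps: a function that randomizes the region set and a function that evaluates it. A minimal base-R sketch of that pattern follows. It is an illustration under assumptions, not regioneR's permTest: the toy "regions" are integer positions on a hypothetical 1..1000 coordinate line, and `perm_test_sketch`, `randomize_points` and `count_in_B` are names invented here.

```r
# Sketch of a generic permutation test with pluggable functions.
# Extra arguments in ... are forwarded to both functions, so both accept ...
perm_test_sketch <- function(A, ntimes, randomize.function, evaluate.function,
                             alternative = c("greater", "less"), ...) {
  alternative <- match.arg(alternative)
  observed <- evaluate.function(A, ...)
  permuted <- vapply(seq_len(ntimes), function(i) {
    evaluate.function(randomize.function(A, ...), ...)
  }, numeric(1))
  if (alternative == "greater") {
    k <- sum(permuted >= observed)
  } else {
    k <- sum(permuted <= observed)
  }
  list(observed = observed, permuted = permuted,
       pval = (k + 1) / (ntimes + 1))
}

# Toy randomization: draw new positions uniformly from the coordinate line
randomize_points <- function(A, genome.size, ...) {
  sample(genome.size, length(A))
}
# Toy evaluation: how many positions of A fall inside the feature set B
count_in_B <- function(A, B, ...) {
  sum(A %in% B)
}

B <- 1:100                       # hypothetical feature positions
A <- c(5, 17, 42, 63, 88, 500)   # hypothetical query positions

set.seed(42)
pt <- perm_test_sketch(A, ntimes = 200,
                       randomize.function = randomize_points,
                       evaluate.function = count_in_B,
                       alternative = "greater",
                       B = B, genome.size = 1000)
print(pt$observed)
print(pt$pval)
```

Swapping in a different evaluation or randomization function changes the question being tested without touching the test loop itself.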

Page 25:

Example: Genes & ALUs

1,175,329 ALUs

9,111 overexpressed genes

51,796 genes

Are overexpressed genes closer to ALUs than expected by chance?

Page 26:

Example: Genes & ALUs

Are overexpressed genes closer to ALUs than expected by chance?

Resampling

Mean Distance

permTest(A=expressed, B=alus, ntimes=1000, randomize.function=resampleRegions, universe=genes2, evaluate.function=meanDistance, alternative="less")
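The combination used in this call, resampling from a universe and evaluating a mean distance, can be tried on toy data in base R. The sketch below is an assumption-laden illustration: positions are single integers on one hypothetical chromosome, all data values are made up, and `mean_distance_sketch` and `resample_sketch` are stand-ins invented here rather than regioneR functions.

```r
# Mean distance from each position in A to its nearest position in B
mean_distance_sketch <- function(A, B, ...) {
  mean(vapply(A, function(a) min(abs(a - B)), numeric(1)))
}
# Draw length(A) elements from the universe, without replacement
resample_sketch <- function(A, universe, ...) {
  universe[sample(length(universe), length(A))]
}

alus      <- c(100, 200, 300, 400, 500)   # hypothetical ALU positions
universe  <- seq(10, 600, by = 10)        # hypothetical "all genes"
expressed <- c(100, 210, 290, 400)        # hypothetical subset of the universe

observed <- mean_distance_sketch(expressed, alus)

set.seed(7)
permuted <- replicate(
  500, mean_distance_sketch(resample_sketch(expressed, universe), alus))

# alternative = "less": count permutations with a mean distance this small
pval <- (sum(permuted <= observed) + 1) / (length(permuted) + 1)
print(observed)
print(pval)
```

Resampling from the universe of genes, rather than placing regions anywhere in the genome, keeps the null distribution restricted to positions where a gene could actually be.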

Page 27:

Example: Genes & ALUs

Are overexpressed genes closer to ALUs than expected by chance?

Number of permutations: 1000
Alternative: less
Evaluation of the original region set: 353.371858193393
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  912.1   992.8  1010.0  1011.0  1028.0  1095.0
Standard score: -25.0275
P-value: 0.000999000999000999 ***

Page 28:

CUSTOMIZATION

Page 29:

Available functions

Evaluation

countOverlaps

meanDistance

meanInRegions

GC content, TF binding sites, ENCODE classification, …

Randomization

randomizeRegions

resampleRegions

GC-aware randomization, …

Page 30:

Custom functions

Randomization

randomize.function(A,...)

resampleRegions <- function(A, universe, ...) {
  # Draw as many regions from the universe as there are in A
  resample <- universe[sample(1:length(universe), length(A))]
  return(resample)
}

Page 31:

Custom functions

Evaluation

evaluate.function(A,...)

meanDistance <- function(A, B, ...) {
  # Distance from each region in A to its nearest region in B
  d <- distanceToNearest(A, B, ...)
  return(mean(as.matrix(d@elementMetadata)[,1]))
}

Page 32:

HELPER FUNCTIONS

Page 33:

toGRanges & toDataframe

chr   start    end
chr1   2000   4000
chr1   5000   5500
chr1  10000  12000

GRanges with 3 ranges and 0 elementMetadata values
    seqnames         ranges strand |
       <Rle>      <IRanges>  <Rle> |
[1]     chr1 [ 2000,  4000]      * |
[2]     chr1 [ 5000,  5500]      * |
[3]     chr1 [10000, 12000]      * |

Seqlengths
 chr1
   NA

Page 34:

Genomes & Masks

getGenome(genome)

getMask(genome)

getGenomeAndMask(genome, mask)

characterToBSGenome(genome.id)

maskFromBSGenome(bsgenome)

emptyCache()

Page 35:

Randomization

randomizeRegions(A, genome="hg19", mask=NULL, non.overlapping=FALSE, per.chromosome=FALSE, ...)

createRandomRegions(nregions=100, length.mean=250, length.sd=20, genome="hg19", mask=NULL, non.overlapping=FALSE)

resampleRegions(A, universe, per.chromosome=FALSE, ...)

Page 36:

Aaaaaalmost finished: Anyone with experience in packaging for Bioconductor?

Suggestions? Requests? Improvements?

Beta Testers Wanted