regioneR: an R/Bioconductor package for the management and comparison of genomic regions
Anna Díez, Bernat Gel, Roberto Malinverni
regioneR aims
Practical to use. Easy to understand.
Generic and useful
Efficient
Customizable
Something we would like to use
regioneR
Basic management of genomic regions
Statistical evaluation
Helper functions making our lives easier
The Basics
Statistics
Customization
Helper Functions
The Basics Statistics Customization Helper Functions
THE BASICS
joinRegions
[Figure: region set A, with the distance threshold min.dist marked]
joinRegions(A, min.dist)
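A minimal base-R sketch of the idea behind joinRegions, assuming regions on a single chromosome stored as a data frame with start and end columns. The helper name join_regions_sketch is hypothetical and this is not the package's implementation; in particular, whether a gap exactly equal to min.dist is joined is an assumption made here.

```r
# Hypothetical helper: fuse neighbouring regions whose gap is smaller than min.dist.
# Assumes one chromosome and closed integer coordinates.
join_regions_sketch <- function(regions, min.dist = 1) {
  if (nrow(regions) == 0) return(regions)
  regions <- regions[order(regions$start, regions$end), ]
  starts <- regions$start[1]
  ends <- regions$end[1]
  for (i in seq_len(nrow(regions))[-1]) {
    k <- length(ends)
    gap <- regions$start[i] - ends[k] - 1   # bases strictly between the two regions
    if (gap < min.dist) {
      ends[k] <- max(ends[k], regions$end[i])   # extend the current joined region
    } else {
      starts <- c(starts, regions$start[i])     # open a new region
      ends <- c(ends, regions$end[i])
    }
  }
  data.frame(start = starts, end = ends)
}

A <- data.frame(start = c(100, 160, 500), end = c(150, 200, 600))
joined <- join_regions_sketch(A, min.dist = 20)
print(joined)
```

With these toy coordinates the first two regions are 9 bases apart and are fused, while the third is left alone.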
subtractRegions
[Figure: region sets A and B]
subtractRegions(A, B)
splitRegions
[Figure: region sets A and B]
splitRegions(A, B, min.size=1, track.original=TRUE)
mergeRegions
commonRegions
extendRegions
Any others? flankingRegions? …
overlapRegions
[Figure: region sets A and B]
overlapRegions(A, B, colA, colB, type, min.bases, min.pctA, min.pctB, get.pctA, get.pctB, get.bases, only.boolean, only.count, ...)
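As a rough illustration of what an overlap query with a min.bases threshold computes, here is a base-R sketch on data frames with start and end columns (one chromosome, closed integer coordinates). The helper count_overlaps_sketch is hypothetical and covers only a small part of the options in the signature above.

```r
# Hypothetical helper: for each region in A, count the regions in B that
# share at least min.bases bases with it.
count_overlaps_sketch <- function(A, B, min.bases = 1) {
  shared <- outer(seq_len(nrow(A)), seq_len(nrow(B)), function(i, j) {
    # number of shared bases; zero or negative when the regions do not overlap
    pmin(A$end[i], B$end[j]) - pmax(A$start[i], B$start[j]) + 1
  })
  rowSums(shared >= min.bases)
}

A <- data.frame(start = c(100, 300, 900), end = c(200, 400, 950))
B <- data.frame(start = c(150, 390, 395), end = c(250, 392, 500))

per_region <- count_overlaps_sketch(A, B)                  # overlaps per region of A
strict <- count_overlaps_sketch(A, B, min.bases = 5)       # require 5 shared bases
print(per_region)
print(strict)
print(sum(per_region > 0))                                 # regions of A with any overlap
```

Raising min.bases drops the 3-base overlap between the second region of A and the second region of B.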
Example: annotateRegions
regAnnotation(regions, annot.tab, ann.names, strands, descr, peak.point, gap3, gap5)
STATISTICS
overlapPermTest
[Figure: region sets A and B, with randomized copies B' and the number of overlaps counted for each]
Example: TIs
TIs over: 81
TIs under: 66
SCNA gains: 60
SCNA losses: 53
Gains vs Overexpression
overlapPermTest(TIs_over, SCNA.gains, alternative="g", genome="hg19", ntimes=1000)
Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 81
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  37.00   47.00   50.00   50.43   53.00   67.00
Standard score: 6.8117
P-value: 0.000999000999000999 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
~800s (~13min)
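The P-value reported for this test, 0.000999000999000999, is exactly 1/1001. That is the value obtained when the p-value is estimated as (number of permuted values at least as extreme as the observed one + 1) / (ntimes + 1) and no permuted value reaches the observation. The base-R sketch below reproduces that arithmetic, together with a standard score computed as the distance from the permuted mean in units of the permuted standard deviation. The helper name and the simulated counts are hypothetical; this is not the package's code.

```r
# Hypothetical helper: empirical p-value and standard score of an observed
# value against the values obtained from permuted region sets.
perm_summary_sketch <- function(observed, permuted,
                                alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  extreme <- if (alternative == "greater") {
    sum(permuted >= observed)
  } else {
    sum(permuted <= observed)
  }
  list(
    pval = (extreme + 1) / (length(permuted) + 1),
    zscore = (observed - mean(permuted)) / sd(permuted)
  )
}

set.seed(1)
# Simulated stand-in for 1000 permuted overlap counts, capped below 81.
permuted <- pmin(rpois(1000, lambda = 50), 70)
res <- perm_summary_sketch(81, permuted, alternative = "greater")
print(res$pval)     # no permuted value reaches 81, so this equals 1/1001
print(res$zscore)
```

One consequence of this estimate is that the smallest p-value 1000 permutations can report is 1/1001, so the "***" results here say "below what 1000 permutations can resolve" rather than giving a precise probability.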
Losses vs Underexpression
overlapPermTest(TIs_under, SCNA.losses, alternative="g", genome="hg19", ntimes=1000)
Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 66
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  37.00   48.00   50.00   50.18   53.00   60.00
Standard score: 4.4942
P-value: 0.000999000999000999 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Gains vs Underexpression
overlapPermTest(TIs_under, SCNA.gains, alternative="g", genome="hg19", ntimes=1000)
Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 25
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.00   39.00   41.00   41.25   44.00   52.00
Standard score: -4.2739
P-value: 1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
recomputePermTest(gains.under, alternative="l")
Gains vs Underexpression
overlapPermTest(TIs_under, SCNA.gains, alternative="l", genome="hg19", ntimes=1000)
Number of permutations: 1000
Alternative: less
Evaluation of the original region set: 25
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.00   39.00   41.00   41.25   44.00   52.00
Standard score: -4.2739
P-value: 0.000999000999000999 ***
Random Region Sets
overlapPermTest(10KrandomA, 10KrandomB, alternative="g", genome="hg19", ntimes=1000)
Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 68
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  42.00   57.00   62.00   62.16   67.00   89.00
Standard score: 0.7488
P-value: 0.215784215784216
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
~1850s (~30min) (single core)
~800s (~13min) (parallel, 4 cores)
permTest
overlapPermTest
overlap
randomRegions
distance
resampling
value of a function
permTest
permTest(A, ntimes=1000, randomize.function, evaluate.function, alternative, min.parallel=1000, force.parallel=NULL, ...)
overlapPermTest <- permTest(A, randomize.function=randomizeRegions, evaluate.function=countOverlaps)
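The signature above takes the randomization and the evaluation as functions, which is what makes the test generic. A minimal base-R sketch of that design follows, assuming region sets are plain data frames on one chromosome. The names perm_test_sketch, shuffle_positions and count_hits are hypothetical, and the sketch leaves out the parallel execution that min.parallel and force.parallel refer to.

```r
# Hypothetical sketch: a permutation test parameterized by two functions.
# Extra arguments in ... are forwarded to both of them.
perm_test_sketch <- function(A, ntimes, randomize.function, evaluate.function,
                             alternative = c("greater", "less"), ...) {
  alternative <- match.arg(alternative)
  observed <- evaluate.function(A, ...)
  permuted <- replicate(ntimes, evaluate.function(randomize.function(A, ...), ...))
  extreme <- if (alternative == "greater") {
    sum(permuted >= observed)
  } else {
    sum(permuted <= observed)
  }
  list(observed = observed, permuted = permuted,
       pval = (extreme + 1) / (ntimes + 1))
}

# Toy randomization: move every region to a uniform random position on a
# chromosome of length chr.len, keeping its width.
shuffle_positions <- function(A, chr.len, ...) {
  width <- A$end - A$start
  start <- floor(runif(nrow(A), min = 1, max = chr.len - width))
  data.frame(start = start, end = start + width)
}

# Toy evaluation: number of regions in A that overlap any region in B.
count_hits <- function(A, B, ...) {
  sum(vapply(seq_len(nrow(A)), function(i) {
    any(A$start[i] <= B$end & A$end[i] >= B$start)
  }, logical(1)))
}

set.seed(42)
B <- data.frame(start = seq(1000, 91000, by = 10000),
                end = seq(1500, 91500, by = 10000))
A <- data.frame(start = B$start + 100, end = B$start + 300)  # built to overlap B
pt <- perm_test_sketch(A, ntimes = 200,
                       randomize.function = shuffle_positions,
                       evaluate.function = count_hits,
                       alternative = "greater", B = B, chr.len = 100000)
print(pt$observed)
print(pt$pval)
```

Swapping count_hits for a distance-based function, or shuffle_positions for a resampling function, changes the question being tested without touching the test itself.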
Example: Genes & ALUs
1,175,329 ALUs
9,111 overexpressed genes
51,796 genes
Are overexpressed genes closer to ALUs than expected by chance?
Example: Genes & ALUs
Are overexpressed genes closer to ALUs than expected by chance?
Randomization: resampling
Evaluation: mean distance
permTest(A=expressed, B=alus, ntimes=1000, randomize.function=resampleRegions, universe=genes2, evaluate.function=meanDistance, alternative="less")
Example: Genes & ALUs
Are overexpressed genes closer to ALUs than expected by chance?
Number of permutations: 1000
Alternative: less
Evaluation of the original region set: 353.371858193393
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  912.1   992.8  1010.0  1011.0  1028.0  1095.0
Standard score: -25.0275
P-value: 0.000999000999000999 ***
CUSTOMIZATION
Available functions
Evaluation: countOverlaps, meanDistance, meanInRegions
Randomization: randomizeRegions, resampleRegions
Other possibilities for evaluation: GC content, TF binding sites, ENCODE classification, …
Other possibilities for randomization: GC aware randomization, …
Custom functions
Randomization
randomize.function(A, ...)
resampleRegions <- function(A, universe, ...) {
  # draw as many regions from the universe as there are in A, without replacement
  resample <- universe[sample(1:length(universe), length(A))]
  return(resample)
}
Custom functions
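Evaluation
evaluate.function(A, ...)
meanDistance <- function(A, B, ...) {
  # distance from each region in A to its nearest region in B
  d <- distanceToNearest(A, B, ...)
  # the distances are stored as a metadata column of the result
  return(mean(mcols(d)$distance))
}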
HELPER FUNCTIONS
toGRanges & toDataframe
chr   start  end
chr1  2000   4000
chr1  5000   5500
chr1  10000  12000

GRanges with 3 ranges and 0 elementMetadata values
    seqnames         ranges strand |
       <Rle>      <IRanges>  <Rle> |
[1]     chr1 [ 2000,  4000]      * |
[2]     chr1 [ 5000,  5500]      * |
[3]     chr1 [10000, 12000]      * |

Seqlengths
 chr1
   NA
Genomes & Masks
getGenome(genome)
getMask(genome)
getGenomeAndMask(genome, mask)
characterToBSGenome(genome.id)
maskFromBSGenome(bsgenome)
emptyCache()
Randomization
randomizeRegions(A, genome="hg19", mask=NULL, non.overlapping=FALSE, per.chromosome=FALSE, ...)
createRandomRegions(nregions=100, length.mean=250, length.sd=20, genome="hg19", mask=NULL, non.overlapping=FALSE)
resampleRegions(A, universe, per.chromosome=FALSE, ...)
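A minimal base-R sketch of what a call such as createRandomRegions(nregions=100, length.mean=250, length.sd=20) asks for: region lengths scattered around a mean, placed at random positions. It assumes a single chromosome of known length and ignores masks and the non.overlapping option. The helper name, the chr.len parameter and the use of a normal distribution for the lengths are assumptions, not the package's implementation.

```r
# Hypothetical helper: random regions on one chromosome of length chr.len.
random_regions_sketch <- function(nregions = 100, length.mean = 250,
                                  length.sd = 20, chr.len = 1e6) {
  # Draw lengths, rounding and forcing at least one base per region.
  len <- pmax(1, round(rnorm(nregions, mean = length.mean, sd = length.sd)))
  # Uniform start positions that keep every region inside the chromosome.
  start <- floor(runif(nregions, min = 1, max = chr.len - len + 2))
  data.frame(chr = "chr1", start = start, end = start + len - 1)
}

set.seed(7)
rr <- random_regions_sketch()
print(head(rr, 3))
print(summary(rr$end - rr$start + 1))   # region lengths, centred near length.mean
```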
Aaaaaalmost finished
Anyone with experience in packaging for Bioconductor?
Suggestions? Requests? Improvements?
Beta Testers Wanted