In this vignette, we demonstrate the unsegmented block bootstrap functionality implemented in nullranges. “Unsegmented” means that this implementation does not consider a segmentation of the genome when sampling blocks; see the segmented block bootstrap vignette for the alternative implementation.

Timing on DHS peaks

First we use the DNase hypersensitivity peaks in A549, downloaded from AnnotationHub and pre-processed as described in the nullrangesData package.

library(nullrangesData)
## snapshotDate(): 2021-10-18
dhs <- DHSA549Hg38()
## see ?nullrangesData and browseVignettes('nullrangesData') for documentation
## loading from cache

The following chunk of code times the different bootstrap/permutation schemes with microbenchmark, first within chromosome and then across chromosomes. The default type is bootstrap, and the default for withinChrom is FALSE (blocks can move across chromosomes).

library(nullranges) # provides bootRanges()
library(microbenchmark)
set.seed(5) # reproducibility
blockLength <- 5e5
microbenchmark(
  list=alist(
    p_within=bootRanges(dhs, blockLength=blockLength,
                        type="permute", withinChrom=TRUE),
    b_within=bootRanges(dhs, blockLength=blockLength,
                        type="bootstrap", withinChrom=TRUE),
    p_across=bootRanges(dhs, blockLength=blockLength,
                        type="permute", withinChrom=FALSE),
    b_across=bootRanges(dhs, blockLength=blockLength,
                        type="bootstrap", withinChrom=FALSE)
  ), times=10)
## Unit: milliseconds
##      expr       min        lq      mean    median        uq       max neval
##  p_within 1180.1950 1226.5619 1239.0088 1240.3942 1258.1794 1285.3867    10
##  b_within 1065.7010 1083.1175 1206.4929 1117.5276 1168.7585 1804.8513    10
##  p_across  245.3502  254.3676  351.6325  264.3601  297.7970 1000.5426    10
##  b_across  273.2062  275.6005  359.3636  289.8705  306.3977  990.5593    10
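To make the withinChrom distinction concrete, here is a minimal base R sketch of how source blocks could be chosen for each destination block. It is an illustration only and not the nullranges implementation: the fixed, non-overlapping tiles, the toy chromosome lengths, and the names `tiles`, `within_src`, and `across_src` are all assumptions made for this example.

```r
set.seed(5)

# Toy chromosome lengths and block length (hypothetical values)
seqlens <- c(chr1 = 300, chr2 = 450, chr3 = 200)
block_len <- 50

# Enumerate fixed tiles on every chromosome
tiles <- do.call(rbind, lapply(names(seqlens), function(chr) {
  n <- ceiling(seqlens[[chr]] / block_len)
  data.frame(chr = chr, block = seq_len(n))
}))

# Analogue of withinChrom=TRUE: each destination tile draws its source
# tile (with replacement) from the same chromosome only
within_src <- unlist(
  lapply(split(seq_len(nrow(tiles)), tiles$chr), function(idx) {
    idx[sample.int(length(idx), replace = TRUE)]
  }),
  use.names = FALSE
)

# Analogue of withinChrom=FALSE: source tiles are drawn from the pooled
# set of tiles, so a block may land on a different chromosome
across_src <- sample.int(nrow(tiles), replace = TRUE)

# Chromosome of origin for each destination tile under both schemes
data.frame(dest = tiles$chr,
           within = tiles$chr[within_src],
           across = tiles$chr[across_src])
```

In this sketch the `within` column always matches `dest`, while the `across` column can name any chromosome. Replacing `replace = TRUE` with `replace = FALSE` would give the permutation analogue.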

Visualize on synthetic data

We create some synthetic ranges in order to visualize the different options of the unsegmented bootstrap implemented in nullranges.

library(GenomicRanges)
seq_nms <- rep(c("chr1","chr2","chr3"),c(4,5,2))
gr <- GRanges(seqnames=seq_nms,
              IRanges(start=c(1,101,121,201,
                              101,201,216,231,401,
                              1,101),
                      width=c(20, 5, 5, 30,
                              20, 5, 5, 5, 30,
                              80, 40)),
              seqlengths=c(chr1=300,chr2=450,chr3=200),
              chr=factor(seq_nms))

The following helper function uses plotgardener to plot the ranges. Within the helper, the chr metadata column is used to color each range by its chromosome of origin.

suppressPackageStartupMessages(library(plotgardener))
plotGRanges <- function(gr) {
  pageCreate(width = 5, height = 2, xgrid = 0,
                ygrid = 0, showGuides = FALSE)
  for (i in seq_along(seqlevels(gr))) {
    chrom <- seqlevels(gr)[i]
    chromend <- seqlengths(gr)[[chrom]]
    suppressMessages({
      p <- pgParams(chromstart = 0, chromend = chromend,
                    x = 0.5, width = 4*chromend/500, height = 0.5,
                    at = seq(0, chromend, 50),
                    fill = colorby("chr", palette=palette.colors))
      prngs <- plotRanges(data = gr, params = p,
                          chrom = chrom,
                          y = 0.25 + (i-1)*.7,
                          just = c("left", "bottom"))
      annoGenomeLabel(plot = prngs, params = p, y = 0.30 + (i-1)*.7)
    })
  }
}
plotGRanges(gr)
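The layout arithmetic inside the helper can be checked by hand. The sketch below simply evaluates the `width` and `y` expressions from `plotGRanges` for the three synthetic chromosomes; the `layout` data frame and its column names are introduced here for illustration, and reading the values as inches assumes plotgardener's default page units.

```r
# Chromosome lengths of the synthetic data
seqlens <- c(chr1 = 300, chr2 = 450, chr3 = 200)
i <- seq_along(seqlens)

layout <- data.frame(
  chrom    = names(seqlens),
  width    = 4 * seqlens / 500,     # 500 bp would span the full 4 units
  y_ranges = 0.25 + (i - 1) * 0.7,  # y passed to plotRanges()
  y_label  = 0.30 + (i - 1) * 0.7   # y passed to annoGenomeLabel()
)
layout
```

Because every track starts at x = 0.5 and the longest chromosome (450 bp) is 3.6 units wide, all tracks fit horizontally on the 5-unit page, and track widths remain proportional to chromosome length.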

Within chromosome

Visualizing two permutations of blocks within chromosome:

for (i in 1:2) {
  gr_prime <- bootRanges(gr, blockLength=100, type="permute", withinChrom=TRUE)
  plotGRanges(gr_prime)
}

Visualizing two bootstraps within chromosome:

for (i in 1:2) {
  gr_prime <- bootRanges(gr, blockLength=100, withinChrom=TRUE)
  plotGRanges(gr_prime)
}
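The difference between permuting and bootstrapping blocks can be mimicked on the chr1 ranges of the synthetic data with a few lines of base R. This is a toy sketch and not the nullranges code: it assumes fixed tiles of 100 bp, ranges that do not cross tile boundaries (true for chr1 here), and a hypothetical helper `move_blocks`.

```r
set.seed(1)

chr_len <- 300
block_len <- 100

# chr1 ranges from the synthetic data
rng <- data.frame(start = c(1, 101, 121, 201),
                  width = c(20, 5, 5, 30))

# Assign each range to the tile containing it (1-100, 101-200, 201-300)
rng$block <- (rng$start - 1) %/% block_len + 1
n_blocks <- chr_len / block_len

# src_blocks[j] is the source tile copied into destination slot j;
# ranges keep their offset within the tile
move_blocks <- function(rng, src_blocks) {
  out <- lapply(seq_along(src_blocks), function(j) {
    hit <- rng[rng$block == src_blocks[j], , drop = FALSE]
    hit$start <- hit$start + (j - src_blocks[j]) * block_len
    hit$block <- rep(j, nrow(hit))
    hit
  })
  do.call(rbind, out)
}

# Permutation: every tile is used exactly once, in shuffled order
permuted <- move_blocks(rng, sample(n_blocks))

# Bootstrap: tiles are drawn with replacement, so a tile (and its
# ranges) may appear several times or not at all
bootstrapped <- move_blocks(rng, sample(n_blocks, replace = TRUE))
```

Under permutation the number of ranges is always preserved, whereas a bootstrap sample can contain more or fewer ranges than the original, depending on which tiles were drawn.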

Across chromosome

Visualizing two permutations of blocks across chromosome. Here we use larger blocks than previously.

for (i in 1:2) {
  gr_prime <- bootRanges(gr, blockLength=200, type="permute", withinChrom=FALSE)
  plotGRanges(gr_prime)
}

Visualizing two bootstraps across chromosome:

for (i in 1:2) {
  gr_prime <- bootRanges(gr, blockLength=200, withinChrom=FALSE)
  plotGRanges(gr_prime)
}

Session information

## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] plotgardener_0.99.16        microbenchmark_1.4-7       
##  [3] nullranges_0.99.19          nullrangesData_0.99.2      
##  [5] InteractionSet_1.21.1       SummarizedExperiment_1.23.5
##  [7] Biobase_2.53.0              MatrixGenerics_1.5.4       
##  [9] matrixStats_0.61.0          GenomicRanges_1.45.0       
## [11] GenomeInfoDb_1.29.10        IRanges_2.27.2             
## [13] S4Vectors_0.31.5            ExperimentHub_2.1.4        
## [15] AnnotationHub_3.1.7         BiocFileCache_2.1.1        
## [17] dbplyr_2.1.1                BiocGenerics_0.39.2        
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_2.0-2              rjson_0.2.20                 
##   [3] ellipsis_0.3.2                ggridges_0.5.3               
##   [5] mclust_5.4.7                  rprojroot_2.0.2              
##   [7] XVector_0.33.0                fs_1.5.0                     
##   [9] bit64_4.0.5                   interactiveDisplayBase_1.31.2
##  [11] AnnotationDbi_1.55.2          fansi_0.5.0                  
##  [13] mvtnorm_1.1-3                 cachem_1.0.6                 
##  [15] knitr_1.36                    jsonlite_1.7.2               
##  [17] speedglm_0.3-3                Rsamtools_2.9.1              
##  [19] png_0.1-7                     shiny_1.7.1                  
##  [21] BiocManager_1.30.16           compiler_4.1.1               
##  [23] httr_1.4.2                    assertthat_0.2.1             
##  [25] Matrix_1.3-4                  fastmap_1.1.0                
##  [27] later_1.3.0                   htmltools_0.5.2              
##  [29] tools_4.1.1                   gtable_0.3.0                 
##  [31] glue_1.4.2                    GenomeInfoDbData_1.2.7       
##  [33] dplyr_1.0.7                   rappdirs_0.3.3               
##  [35] Rcpp_1.0.7                    jquerylib_0.1.4              
##  [37] pkgdown_1.6.1                 vctrs_0.3.8                  
##  [39] Biostrings_2.61.2             strawr_0.0.9                 
##  [41] rtracklayer_1.53.1            xfun_0.27                    
##  [43] stringr_1.4.0                 plyranges_1.13.1             
##  [45] mime_0.12                     lifecycle_1.0.1              
##  [47] restfulr_0.0.13               XML_3.99-0.8                 
##  [49] zlibbioc_1.39.0               MASS_7.3-54                  
##  [51] scales_1.1.1                  ragg_1.1.3                   
##  [53] promises_1.2.0.1              parallel_4.1.1               
##  [55] RColorBrewer_1.1-2            yaml_2.2.1                   
##  [57] curl_4.3.2                    memoise_2.0.0                
##  [59] ggplot2_3.3.5                 yulab.utils_0.0.4            
##  [61] sass_0.4.0                    stringi_1.7.5                
##  [63] RSQLite_2.2.8                 highr_0.9                    
##  [65] BiocVersion_3.14.0            BiocIO_1.3.0                 
##  [67] desc_1.4.0                    filelock_1.0.2               
##  [69] BiocParallel_1.27.17          rlang_0.4.12                 
##  [71] pkgconfig_2.0.3               systemfonts_1.0.3            
##  [73] bitops_1.0-7                  pracma_2.3.3                 
##  [75] evaluate_0.14                 lattice_0.20-45              
##  [77] purrr_0.3.4                   GenomicAlignments_1.29.0     
##  [79] ks_1.13.2                     bit_4.0.4                    
##  [81] tidyselect_1.1.1              plyr_1.8.6                   
##  [83] magrittr_2.0.1                R6_2.5.1                     
##  [85] generics_0.1.0                DelayedArray_0.19.4          
##  [87] DBI_1.1.1                     pillar_1.6.4                 
##  [89] withr_2.4.2                   KEGGREST_1.33.0              
##  [91] RCurl_1.98-1.5                tibble_3.1.5                 
##  [93] crayon_1.4.1                  KernSmooth_2.23-20           
##  [95] utf8_1.2.2                    rmarkdown_2.11               
##  [97] grid_4.1.1                    data.table_1.14.2            
##  [99] blob_1.2.2                    digest_0.6.28                
## [101] xtable_1.8-4                  httpuv_1.6.3                 
## [103] gridGraphics_0.5-1            textshaping_0.3.6            
## [105] munsell_0.5.0                 ggplotify_0.1.0              
## [107] bslib_0.3.1