Source: R/AllGenerics.R, R/methods-matchRanges.R (matchRanges.Rd)
matchRanges() uses a propensity score-based method to
generate a covariate-matched control set of DataFrame,
GRanges, or GInteractions objects.
matchRanges(focal, pool, covar, method = "nearest", replace = TRUE, ...)
# S4 method for class 'DF_OR_df_OR_dt,DF_OR_df_OR_dt,formula,character_OR_missing,logical_OR_missing'
matchRanges(focal, pool, covar, method, replace)
# S4 method for class 'GRanges,GRanges,formula,character_OR_missing,logical_OR_missing'
matchRanges(focal, pool, covar, method, replace)
# S4 method for class 'GInteractions,GInteractions,formula,character_OR_missing,logical_OR_missing'
matchRanges(focal, pool, covar, method, replace)
focal: A DataFrame (or data.frame/data.table), GRanges, or GInteractions object containing the focal data to match.
pool: A DataFrame (or data.frame/data.table), GRanges, or GInteractions object containing the pool from which to select matches.
covar: A right-hand-side (one-sided) formula naming the covariates on which to match, e.g. ~feature2 + feature3.
method: A character string naming the matching method. Supported options are 'nearest', 'rejection', and 'stratified'.
replace: TRUE/FALSE indicating whether to select matches with or without replacement.
...: Additional arguments.
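Value: A covariate-matched control set of data, returned as a MatchedDataFrame, MatchedGRanges, or MatchedGInteractions object depending on the input class.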
Available inputs for focal and pool include data.frame,
data.table, DataFrame, GRanges, or GInteractions.
data.frame and data.table inputs are coerced to DataFrame
objects and returned as MatchedDataFrame while GRanges and
GInteractions objects are returned as MatchedGRanges or
MatchedGInteractions, respectively.
matchRanges uses propensity scores to perform subset selection on the pool
set such that the resulting matched set has covariate distributions similar
to those of the focal set.
A propensity score is the conditional probability of assigning an element
(in our case, a genomic range) to a particular outcome (Y) given a set of
covariates. Propensity scores are estimated with a logistic regression model
on the covariates in covar, where the outcome is Y = 1 for focal elements
and Y = 0 for pool elements.
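The estimation step can be sketched in base R with `glm()`. This is a minimal illustration on hypothetical toy data (the `toy` data frame, its `is_focal` column, and the `focal_ps`/`pool_ps` vectors are made up for this sketch), assuming a plain logistic regression of Y on the terms of `covar`; matchRanges' internal code may build the model differently. The fitted probabilities P(Y = 1 | covariates) serve as the propensity scores.

```r
# Hypothetical toy data: 20 focal and 80 pool elements with two covariates
set.seed(1)
toy <- data.frame(
  is_focal = rep(c(TRUE, FALSE), times = c(20, 80)),
  feature2 = c(rnorm(20, mean = 6), rnorm(80, mean = 4)),
  feature3 = sample(c("a", "b", "c"), 100, replace = TRUE)
)

# A right-hand-side formula, in the same form as the covar argument
covar <- ~feature2 + feature3

# Outcome Y = 1 for focal elements and Y = 0 for pool elements
toy$Y <- as.integer(toy$is_focal)

# Build Y ~ feature2 + feature3 from the covariate terms
model <- reformulate(attr(terms(covar), "term.labels"), response = "Y")

# Logistic regression; fitted probabilities are the propensity scores
fit <- glm(model, data = toy, family = binomial())
toy$ps <- fitted(fit)

# Split the scores back into focal and pool sets for matching
focal_ps <- toy$ps[toy$is_focal]
pool_ps  <- toy$ps[!toy$is_focal]
```

Because the model includes an intercept, the fitted scores sum to the number of focal elements, so on average they equal the focal fraction of the combined data.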
method = 'nearest': (Default) Nearest neighbor matching
with replacement. Finds the nearest neighbor by using a
rolling join with data.table. Matching without replacement
is not currently supported.
method = 'rejection': Rejection sampling
with or without replacement. Uses a probability-based approach
to select elements from the pool that match the focal distribution.
method = 'stratified': Iterative stratified sampling
with or without replacement. Bins focal and pool propensity
scores by value and selects matches within bins until all focal
items have a corresponding match in pool.
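The three strategies above can be illustrated on plain numeric propensity scores. The helpers `match_nearest()`, `match_rejection()`, and `match_stratified()` are hypothetical base R sketches written for this page, not the package's implementation: `match_nearest()` uses `findInterval()` on sorted pool scores in place of the data.table rolling join, `match_rejection()` accepts pool elements with probability proportional to a kernel-density ratio of focal to pool scores, and `match_stratified()` assumes equal-width bins whose number is halved each round for any still-unmatched focal items. Each helper returns indices into the pool scores.

```r
# Hypothetical helpers illustrating the three strategies on numeric
# propensity scores; simplified sketches, not the package's code.

# Nearest neighbor with replacement: findInterval() on sorted pool
# scores stands in for the data.table rolling join.
match_nearest <- function(focal_ps, pool_ps) {
  ord <- order(pool_ps)
  sorted <- pool_ps[ord]
  # Position of the largest pool score <= each focal score (0 if none)
  lo <- findInterval(focal_ps, sorted)
  lo_c <- pmax(lo, 1)
  hi_c <- pmin(lo + 1, length(sorted))
  # Keep whichever neighbor (below or above) is closer
  pick <- ifelse(abs(sorted[hi_c] - focal_ps) < abs(sorted[lo_c] - focal_ps),
                 hi_c, lo_c)
  ord[pick]  # indices into pool_ps
}

# Rejection sampling: propose a random pool element and accept it with
# probability proportional to the focal/pool density ratio at its score.
match_rejection <- function(focal_ps, pool_ps, replace = FALSE) {
  rng <- range(c(focal_ps, pool_ps))
  f <- density(focal_ps, from = rng[1], to = rng[2])
  g <- density(pool_ps, from = rng[1], to = rng[2])
  ratio <- approx(f$x, f$y, xout = pool_ps)$y /
    approx(g$x, g$y, xout = pool_ps)$y
  accept_p <- ratio / max(ratio)  # acceptance probabilities in [0, 1]
  available <- rep(TRUE, length(pool_ps))
  chosen <- integer(0)
  while (length(chosen) < length(focal_ps)) {
    candidates <- which(available & accept_p > 0)
    if (length(candidates) == 0) stop("not enough acceptable pool elements")
    i <- candidates[sample.int(length(candidates), 1)]
    if (runif(1) < accept_p[i]) {
      chosen <- c(chosen, i)
      if (!replace) available[i] <- FALSE  # remove from pool if no replacement
    }
  }
  chosen
}

# Stratified sampling: match within equal-width bins, then retry leftover
# focal elements with coarser bins (halving the bin count each round).
match_stratified <- function(focal_ps, pool_ps, replace = FALSE, n_bins = 10) {
  matched <- rep(NA_integer_, length(focal_ps))
  available <- rep(TRUE, length(pool_ps))
  rng <- range(c(focal_ps, pool_ps))
  while (anyNA(matched) && n_bins >= 1) {
    brks <- seq(rng[1], rng[2], length.out = n_bins + 1)
    fbin <- findInterval(focal_ps, brks, rightmost.closed = TRUE, all.inside = TRUE)
    pbin <- findInterval(pool_ps, brks, rightmost.closed = TRUE, all.inside = TRUE)
    for (b in unique(fbin[is.na(matched)])) {
      need <- which(is.na(matched) & fbin == b)
      have <- which(pbin == b & available)
      if (length(have) == 0) next
      k <- if (replace) length(need) else min(length(need), length(have))
      pick <- have[sample.int(length(have), k, replace = replace)]
      matched[need[seq_len(k)]] <- pick
      if (!replace) available[pick] <- FALSE
    }
    n_bins <- n_bins %/% 2
  }
  matched  # NA where no match could be found
}

# Usage on hypothetical scores
set.seed(42)
focal_ps <- rbeta(50, 3, 2)
pool_ps <- runif(500)
nn <- match_nearest(focal_ps, pool_ps)
rej <- match_rejection(focal_ps, pool_ps, replace = FALSE)
strat <- match_stratified(focal_ps, pool_ps, replace = FALSE)
```

`match_nearest()` can return the same pool index for several focal elements, which is why nearest neighbor matching is inherently with replacement; the rejection and stratified sketches reuse pool elements only when `replace = TRUE`.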
matchRanges manuscript:
Eric S. Davis, Wancen Mu, Stuart Lee, Mikhail G. Dozmorov, Michael I. Love, Douglas H. Phanstiel. 2023. "matchRanges: Generating null hypothesis genomic ranges via covariate-matched sampling." Bioinformatics. doi: 10.1093/bioinformatics/btad197
## Match with DataFrame
set.seed(123)
x <- makeExampleMatchedDataSet(type = 'DataFrame')
matchRanges(focal = x[x$feature1,],
pool = x[!x$feature1,],
covar = ~feature2 + feature3)
#> MatchedDataFrame with 500 rows and 3 columns
#> feature1 feature2 feature3
#> <logical> <numeric> <character>
#> 1 FALSE 2.87088 c
#> 2 FALSE 3.54290 c
#> 3 FALSE 7.11436 c
#> 4 FALSE 10.78965 b
#> 5 FALSE 4.25960 c
#> ... ... ... ...
#> 496 FALSE 0.173349 e
#> 497 FALSE 4.362421 a
#> 498 FALSE 3.182474 e
#> 499 FALSE 4.688994 d
#> 500 FALSE 5.068635 d
## Match with GRanges
set.seed(123)
x <- makeExampleMatchedDataSet(type = "GRanges")
matchRanges(focal = x[x$feature1,],
pool = x[!x$feature1,],
covar = ~feature2 + feature3)
#> MatchedGRanges object with 500 ranges and 3 metadata columns:
#> seqnames ranges strand | feature1 feature2 feature3
#> <Rle> <IRanges> <Rle> | <logical> <numeric> <character>
#> [1] chr1 8696-8795 * | FALSE 2.87088 c
#> [2] chr1 4386-4485 * | FALSE 3.54290 c
#> [3] chr1 1094-1193 * | FALSE 7.11436 c
#> [4] chr1 5705-5804 * | FALSE 10.78965 b
#> [5] chr1 1643-1742 * | FALSE 4.25960 c
#> ... ... ... ... . ... ... ...
#> [496] chr1 7288-7387 * | FALSE 0.173349 e
#> [497] chr1 5539-5638 * | FALSE 4.362421 a
#> [498] chr1 8499-8598 * | FALSE 3.182474 e
#> [499] chr1 6507-6606 * | FALSE 4.688994 d
#> [500] chr1 1860-1959 * | FALSE 5.068635 d
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Match with GInteractions
set.seed(123)
x <- makeExampleMatchedDataSet(type = "GInteractions")
matchRanges(focal = x[x$feature1,],
pool = x[!x$feature1,],
covar = ~feature2 + feature3)
#> MatchedGInteractions object with 500 interactions and 3 metadata columns:
#> seqnames1 ranges1 seqnames2 ranges2 | feature1 feature2
#> <Rle> <IRanges> <Rle> <IRanges> | <logical> <numeric>
#> [1] chr1 8696-8795 --- chr1 8696-8795 | FALSE 2.87088
#> [2] chr1 4386-4485 --- chr1 4386-4485 | FALSE 3.54290
#> [3] chr1 1094-1193 --- chr1 1094-1193 | FALSE 7.11436
#> [4] chr1 5705-5804 --- chr1 5705-5804 | FALSE 10.78965
#> [5] chr1 1643-1742 --- chr1 1643-1742 | FALSE 4.25960
#> ... ... ... ... ... ... . ... ...
#> [496] chr1 7288-7387 --- chr1 7288-7387 | FALSE 0.173349
#> [497] chr1 5539-5638 --- chr1 5539-5638 | FALSE 4.362421
#> [498] chr1 8499-8598 --- chr1 8499-8598 | FALSE 3.182474
#> [499] chr1 6507-6606 --- chr1 6507-6606 | FALSE 4.688994
#> [500] chr1 1860-1959 --- chr1 1860-1959 | FALSE 5.068635
#> feature3
#> <character>
#> [1] c
#> [2] c
#> [3] c
#> [4] b
#> [5] c
#> ... ...
#> [496] e
#> [497] a
#> [498] e
#> [499] d
#> [500] d
#> -------
#> regions: 10500 ranges and 0 metadata columns
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Nearest neighbor matching with replacement
set.seed(123)
x <- makeExampleMatchedDataSet(type = 'DataFrame')
matchRanges(focal = x[x$feature1,],
pool = x[!x$feature1,],
covar = ~feature2 + feature3,
method = 'nearest',
replace = TRUE)
#> MatchedDataFrame with 500 rows and 3 columns
#> feature1 feature2 feature3
#> <logical> <numeric> <character>
#> 1 FALSE 2.87088 c
#> 2 FALSE 3.54290 c
#> 3 FALSE 7.11436 c
#> 4 FALSE 10.78965 b
#> 5 FALSE 4.25960 c
#> ... ... ... ...
#> 496 FALSE 0.173349 e
#> 497 FALSE 4.362421 a
#> 498 FALSE 3.182474 e
#> 499 FALSE 4.688994 d
#> 500 FALSE 5.068635 d
## Rejection sampling without replacement
set.seed(123)
x <- makeExampleMatchedDataSet(type = 'DataFrame')
matchRanges(focal = x[x$feature1,],
pool = x[!x$feature1,],
covar = ~feature2 + feature3,
method = 'rejection',
replace = FALSE)
#> MatchedDataFrame with 500 rows and 3 columns
#> feature1 feature2 feature3
#> <logical> <numeric> <character>
#> 1 FALSE 5.27263 c
#> 2 FALSE 10.02427 c
#> 3 FALSE 8.19732 a
#> 4 FALSE 7.83727 c
#> 5 FALSE 4.69472 c
#> ... ... ... ...
#> 496 FALSE 1.95550 a
#> 497 FALSE 4.98042 c
#> 498 FALSE 2.62560 e
#> 499 FALSE 1.40300 c
#> 500 FALSE 3.89279 b
## Stratified sampling without replacement
set.seed(123)
x <- makeExampleMatchedDataSet(type = 'DataFrame')
matchRanges(focal = x[x$feature1,],
pool = x[!x$feature1,],
covar = ~feature2 + feature3,
method = 'stratified',
replace = FALSE)
#> MatchedDataFrame with 500 rows and 3 columns
#> feature1 feature2 feature3
#> <logical> <numeric> <character>
#> 1 FALSE 2.87088 c
#> 2 FALSE 3.54290 c
#> 3 FALSE 7.11436 c
#> 4 FALSE 10.78965 b
#> 5 FALSE 4.25960 c
#> ... ... ... ...
#> 496 FALSE 1.32077 b
#> 497 FALSE 10.96586 d
#> 498 FALSE 4.32895 b
#> 499 FALSE 4.68899 d
#> 500 FALSE 5.06864 d