R/AllGenerics.R
, R/methods-matchRanges.R
matchRanges.Rd
matchRanges()
uses a propensity score-based method to
generate a covariate-matched control set of DataFrame,
GRanges, or GInteractions objects.
matchRanges(focal, pool, covar, method = "nearest", replace = TRUE, ...)
# S4 method for DF_OR_df_OR_dt,DF_OR_df_OR_dt,formula,character_OR_missing,logical_OR_missing
matchRanges(focal, pool, covar, method, replace)
# S4 method for GRanges,GRanges,formula,character_OR_missing,logical_OR_missing
matchRanges(focal, pool, covar, method, replace)
# S4 method for GInteractions,GInteractions,formula,character_OR_missing,logical_OR_missing
matchRanges(focal, pool, covar, method, replace)
A DataFrame, GRanges, or GInteractions object containing the focal data to match.
A DataFrame, GRanges, or GInteractions object containing the pool from which to select matches.
A rhs formula with covariates on which to match.
A character describing which matching method to use. supported options are either 'nearest', 'rejection', or 'stratified'.
TRUE/FALSE describing whether to select matches with or without replacement.
Additional arguments.
A covariate-matched control set of data.
Available inputs for focal
and pool
include data.frame
,
data.table
, DataFrame
, GRanges
, or GInteractions
.
data.frame
and data.table
inputs are coerced to DataFrame
objects and returned as MatchedDataFrame
while GRanges
and
GInteractions
objects are returned as MatchedGRanges
or
MatchedGInteractions
, respectively.
matchRanges
uses
propensity scores
to perform subset selection on the pool
set such that the resulting matched
set contains similar distributions of covariates to that of the focal
set.
A propensity score is the conditional probability of assigning an element
(in our case, a genomic range) to a particular outcome (Y
) given a set of
covariates. Propensity scores are estimated using a logistic regression model
where the outcome Y=1
for focal
and Y=0
for pool
, over the provided
covariates covar
.
method = 'nearest'
: Nearest neighbor matching
with replacement. Finds the nearest neighbor by using a
rolling join with data.table
. Matching without replacement
is not currently supported.
method = 'rejection'
: (Default) Rejection sampling
with or without replacement. Uses a probability-based approach
to select options in the pool
that match the focal
distribition.
method = 'stratified'
: Iterative stratified sampling
with or without replacement. Bins focal
and pool
propensity
scores by value and selects matches within bins until all focal
items have a corresponding match in pool
.
matchRanges manuscript:
Eric S. Davis, Wancen Mu, Stuart Lee, Mikhail G. Dozmorov, Michael I. Love, Douglas H. Phanstiel. 2023. "matchRanges: Generating null hypothesis genomic ranges via covariate-matched sampling." Bioinformatics. doi: 10.1093/bioinformatics/btad197
## Match with DataFrame
set.seed(123)
x <- makeExampleMatchedDataSet(type = 'DataFrame')
matchRanges(focal = x[x$feature1,],
pool = x[!x$feature1,],
covar = ~feature2 + feature3)
#> MatchedDataFrame with 500 rows and 3 columns
#> feature1 feature2 feature3
#> <logical> <numeric> <character>
#> 1 FALSE 2.87088 c
#> 2 FALSE 3.54290 c
#> 3 FALSE 7.11436 c
#> 4 FALSE 10.78965 b
#> 5 FALSE 4.25960 c
#> ... ... ... ...
#> 496 FALSE 0.173349 e
#> 497 FALSE 4.362421 a
#> 498 FALSE 3.182474 e
#> 499 FALSE 4.688994 d
#> 500 FALSE 5.068635 d
## Match with GRanges
set.seed(123)
x <- makeExampleMatchedDataSet(type = "GRanges")
matchRanges(focal = x[x$feature1,],
pool = x[!x$feature1,],
covar = ~feature2 + feature3)
#> MatchedGRanges object with 500 ranges and 3 metadata columns:
#> seqnames ranges strand | feature1 feature2 feature3
#> <Rle> <IRanges> <Rle> | <logical> <numeric> <character>
#> [1] chr1 8696-8795 * | FALSE 2.87088 c
#> [2] chr1 4386-4485 * | FALSE 3.54290 c
#> [3] chr1 1094-1193 * | FALSE 7.11436 c
#> [4] chr1 5705-5804 * | FALSE 10.78965 b
#> [5] chr1 1643-1742 * | FALSE 4.25960 c
#> ... ... ... ... . ... ... ...
#> [496] chr1 7288-7387 * | FALSE 0.173349 e
#> [497] chr1 5539-5638 * | FALSE 4.362421 a
#> [498] chr1 8499-8598 * | FALSE 3.182474 e
#> [499] chr1 6507-6606 * | FALSE 4.688994 d
#> [500] chr1 1860-1959 * | FALSE 5.068635 d
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Match with GInteractions
set.seed(123)
x <- makeExampleMatchedDataSet(type = "GInteractions")
matchRanges(focal = x[x$feature1,],
pool = x[!x$feature1,],
covar = ~feature2 + feature3)
#> MatchedGInteractions object with 500 interactions and 3 metadata columns:
#> seqnames1 ranges1 seqnames2 ranges2 | feature1 feature2
#> <Rle> <IRanges> <Rle> <IRanges> | <logical> <numeric>
#> [1] chr1 8696-8795 --- chr1 8696-8795 | FALSE 2.87088
#> [2] chr1 4386-4485 --- chr1 4386-4485 | FALSE 3.54290
#> [3] chr1 1094-1193 --- chr1 1094-1193 | FALSE 7.11436
#> [4] chr1 5705-5804 --- chr1 5705-5804 | FALSE 10.78965
#> [5] chr1 1643-1742 --- chr1 1643-1742 | FALSE 4.25960
#> ... ... ... ... ... ... . ... ...
#> [496] chr1 7288-7387 --- chr1 7288-7387 | FALSE 0.173349
#> [497] chr1 5539-5638 --- chr1 5539-5638 | FALSE 4.362421
#> [498] chr1 8499-8598 --- chr1 8499-8598 | FALSE 3.182474
#> [499] chr1 6507-6606 --- chr1 6507-6606 | FALSE 4.688994
#> [500] chr1 1860-1959 --- chr1 1860-1959 | FALSE 5.068635
#> feature3
#> <character>
#> [1] c
#> [2] c
#> [3] c
#> [4] b
#> [5] c
#> ... ...
#> [496] e
#> [497] a
#> [498] e
#> [499] d
#> [500] d
#> -------
#> regions: 10500 ranges and 0 metadata columns
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Nearest neighbor matching with replacement
set.seed(123)
x <- makeExampleMatchedDataSet(type = 'DataFrame')
matchRanges(focal = x[x$feature1,],
pool = x[!x$feature1,],
covar = ~feature2 + feature3,
method = 'nearest',
replace = TRUE)
#> MatchedDataFrame with 500 rows and 3 columns
#> feature1 feature2 feature3
#> <logical> <numeric> <character>
#> 1 FALSE 2.87088 c
#> 2 FALSE 3.54290 c
#> 3 FALSE 7.11436 c
#> 4 FALSE 10.78965 b
#> 5 FALSE 4.25960 c
#> ... ... ... ...
#> 496 FALSE 0.173349 e
#> 497 FALSE 4.362421 a
#> 498 FALSE 3.182474 e
#> 499 FALSE 4.688994 d
#> 500 FALSE 5.068635 d
## Rejection sampling without replacement
set.seed(123)
x <- makeExampleMatchedDataSet(type = 'DataFrame')
matchRanges(focal = x[x$feature1,],
pool = x[!x$feature1,],
covar = ~feature2 + feature3,
method = 'rejection',
replace = FALSE)
#> MatchedDataFrame with 500 rows and 3 columns
#> feature1 feature2 feature3
#> <logical> <numeric> <character>
#> 1 FALSE 2.108420 c
#> 2 FALSE 5.991699 b
#> 3 FALSE 5.696525 b
#> 4 FALSE 0.409923 a
#> 5 FALSE 9.330107 b
#> ... ... ... ...
#> 496 FALSE 3.27301 b
#> 497 FALSE 8.05522 c
#> 498 FALSE 2.01576 a
#> 499 FALSE 6.60221 b
#> 500 FALSE 7.79926 c
## Stratified sampling without replacement
set.seed(123)
x <- makeExampleMatchedDataSet(type = 'DataFrame')
matchRanges(focal = x[x$feature1,],
pool = x[!x$feature1,],
covar = ~feature2 + feature3,
method = 'stratified',
replace = FALSE)
#> MatchedDataFrame with 500 rows and 3 columns
#> feature1 feature2 feature3
#> <logical> <numeric> <character>
#> 1 FALSE 2.87088 c
#> 2 FALSE 3.54290 c
#> 3 FALSE 7.11436 c
#> 4 FALSE 10.78965 b
#> 5 FALSE 4.25960 c
#> ... ... ... ...
#> 496 FALSE 1.32077 b
#> 497 FALSE 10.96586 d
#> 498 FALSE 4.32895 b
#> 499 FALSE 4.68899 d
#> 500 FALSE 5.06864 d