MI_missRanger.Rd
MI_missRanger performs multiple imputation with the missRanger method.
In missRanger_mod_draw, the prediction for a given cell is not the average of the predictions from the individual trees of the random forest. Instead, during the last iteration, one value is drawn from the empirical distribution formed by the per-tree predictions. The other steps of the imputation are identical to those of missRanger from the 'missRanger' package.
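The difference between averaging and drawing can be sketched in base R. This is a minimal illustration, not the package's code: `tree_preds` is a simulated matrix of per-tree predictions (one row per observation, one column per tree), of the kind that `predict(fit, data, predict.all = TRUE)$predictions` returns for a ranger regression forest, and the helper name `draw_from_trees` is hypothetical.

```r
# Hypothetical illustration: per-tree predictions for 4 observations and 50 trees.
set.seed(1)
tree_preds <- matrix(rnorm(4 * 50, mean = 10, sd = 2), nrow = 4, ncol = 50)

# Usual random forest prediction: average over the trees.
pred_mean <- rowMeans(tree_preds)

# Drawing variant: for each observation, take the prediction of one
# randomly chosen tree, i.e. one draw from the empirical distribution.
draw_from_trees <- function(preds) {
  tree_idx <- sample.int(ncol(preds), size = nrow(preds), replace = TRUE)
  preds[cbind(seq_len(nrow(preds)), tree_idx)]
}
pred_draw <- draw_from_trees(tree_preds)
```

Drawn values keep the between-tree variability that averaging removes, which is what makes repeated runs differ from one another.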
MI_missRanger takes all the imputation results from missRanger_mod_draw and combines them with Rubin's Rule to generate the final imputed data set.
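As a rough sketch of the pooling step for numeric columns, assume that the point-estimate part of Rubin's Rule (the mean over the imputations) is applied cell by cell. The objects `imputations` and `pooled` are hypothetical, and the handling of categorical columns is not shown.

```r
# Hypothetical example: three completed versions of the same numeric data set.
imputations <- list(
  data.frame(x = c(1, 2.0, 3), y = c(10, 20, 30)),
  data.frame(x = c(1, 2.6, 3), y = c(10, 20, 33)),
  data.frame(x = c(1, 2.3, 3), y = c(10, 20, 27))
)

# Cell-wise mean across imputations: sum the matrices, divide by their count.
pooled <- Reduce(`+`, lapply(imputations, as.matrix)) / length(imputations)
pooled <- as.data.frame(pooled)
```

Cells that were observed are identical in every imputation and stay unchanged, while imputed cells become the average of their draws.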
MI_missRanger(
data,
formula = . ~ .,
pmm.k = 0L,
maxiter = 10L,
seed = NULL,
verbose = 1,
returnOOB = FALSE,
case.weights = NULL,
col_cat = c(),
num_mi = 5,
...
)
A data.frame or tibble with missing values to impute.
A two-sided formula specifying variables to be imputed (left-hand side) and variables used to impute (right-hand side). Defaults to . ~ ., i.e. use all variables to impute all variables. If, for example, all variables (with missing values) should be imputed by all variables except variable "ID", use . ~ . - ID. Note that a "." is evaluated separately for each side of the formula. Further note that variables with missing values must appear on the left-hand side if they should be used on the right-hand side.
Number of candidate non-missing values to sample from in the predictive mean matching steps. 0 to avoid this step.
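The predictive mean matching step can be sketched as follows. This is an illustrative base R version under simple assumptions (absolute distance between predictions, uniform sampling among the k nearest candidates), not the implementation used by the package. The helper `pmm_sketch` and its argument names are hypothetical.

```r
# Hypothetical helper: predictive mean matching for one numeric variable.
# pred_mis: model predictions for the cells that are missing
# pred_obs: model predictions for the cells that are observed
# y_obs:    observed values belonging to pred_obs
# k:        number of nearest candidates to sample from
pmm_sketch <- function(pred_mis, pred_obs, y_obs, k = 3L) {
  k <- min(k, length(y_obs))
  vapply(pred_mis, function(p) {
    # Indices of the k observed cases whose predictions are closest to p.
    nearest <- order(abs(pred_obs - p))[seq_len(k)]
    # Return the observed value of one randomly chosen candidate.
    y_obs[nearest[sample.int(k, 1L)]]
  }, numeric(1))
}

set.seed(42)
y_obs    <- c(1.0, 2.0, 3.0, 4.0, 5.0)
pred_obs <- c(1.1, 2.2, 2.9, 4.1, 4.8)
imputed  <- pmm_sketch(pred_mis = c(2.5, 4.5), pred_obs, y_obs, k = 2L)
```

Because every imputed value is copied from an observed case, the result always lies within the set of values actually seen in the data.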
Maximum number of chaining iterations.
Integer seed to initialize the random generator.
Controls how much info is printed to screen. 0 to print nothing. 1 (default) to print a "." per iteration and variable, 2 to print the OOB prediction error per iteration and variable (1 minus R-squared for regression). Furthermore, if verbose is positive, the variables used for imputation are listed as well as the variables to be imputed (in the imputation order). This will be useful to detect if some variables are unexpectedly skipped.
Logical flag. If TRUE, the final average out-of-bag prediction error is added to the output as attribute "oob". This does not work in the special case when the variables are imputed univariately.
Vector with non-negative case weights.
Indices of categorical columns.
Number of multiple imputations.
Arguments passed to ranger(). If the data set is large, better use fewer trees (e.g. num.trees = 20) and/or a low value of sample.fraction.
The following arguments are, for example, incompatible with ranger: write.forest, probability, split.select.weights, dependent.variable.name, and classification.
ximp
Final imputed dataset.
ximp.disj
Final disjunctive imputed dataset.
ls_imputations
List of imputed datasets from multiple imputation.
ls_imputations.disj
List of disjunctive imputed datasets from multiple imputation.
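A possible call, based only on the usage shown above, might look like the following sketch. It assumes that the package providing MI_missRanger is already attached, that col_cat takes column positions, and that the result is a list whose elements are accessed with `$`. None of these is confirmed here, so the call is guarded and skipped when the function is not available.

```r
# Small data set with missing values; iris ships with base R.
set.seed(1)
dat <- iris
dat[sample(nrow(dat), 20), "Sepal.Length"] <- NA
dat[sample(nrow(dat), 20), "Species"] <- NA

# Run only if MI_missRanger is available in the current session.
if (exists("MI_missRanger")) {
  res <- MI_missRanger(
    dat,
    maxiter   = 5L,
    num_mi    = 5,
    col_cat   = 5,   # assumption: position of the categorical column Species
    num.trees = 20,  # passed on to ranger() via ...
    seed      = 1,
    verbose   = 0
  )
  head(res$ximp)              # assumed: final imputed data set
  length(res$ls_imputations)  # assumed: one element per imputation
}
```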