missForest is a modified version of the function missForest by Daniel Stekhoven. Detailed documentation of the original function is available in the missForest package; only the modifications are explained on this page. The original missForest function returns the final imputation result after convergence or after maxiter iterations, with the results for categorical columns returned as a vector. In this modified version, the last iteration returns not only the final result but also the one-hot probability for each category.
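To make the difference between a vector result and per-category probabilities concrete, the following base R sketch builds a small probability block for one categorical column. The probabilities and the level names "a", "b", "c" are made up for illustration, and taking the most probable category is only an assumption about how a single label could be derived; it is not taken from the function's source.

```r
# Hypothetical class probabilities for three rows of one categorical column
# with levels "a", "b", "c" (all values are made up for illustration).
prob <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.6, 0.3,
    0.2, 0.3, 0.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)

# One column per category; each row sums to 1.
stopifnot(all(abs(rowSums(prob) - 1) < 1e-8))

# A vector-style result keeps a single label per row. Here the most
# probable category is taken (an assumption made for this sketch).
labels <- factor(
  colnames(prob)[max.col(prob, ties.method = "first")],
  levels = colnames(prob)
)
labels
```

The probability block carries the uncertainty of each imputed entry, which the single label discards.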

missForest(
  xmis,
  maxiter = 10,
  ntree = 100,
  variablewise = FALSE,
  decreasing = FALSE,
  verbose = FALSE,
  mtry = floor(sqrt(ncol(xmis))),
  replace = TRUE,
  classwt = NULL,
  cutoff = NULL,
  strata = NULL,
  sampsize = NULL,
  nodesize = NULL,
  maxnodes = NULL,
  xtrue = NA,
  parallelize = c("no", "variables", "forests"),
  col_cat = c()
)

Arguments

xmis

data matrix with missing values.

maxiter

maximum number of iterations to perform (default = 10).

ntree

number of trees to grow in each forest (default = 100).

variablewise

(boolean) return OOB errors for each variable separately.

decreasing

(boolean) if TRUE the columns are sorted by decreasing amount of missing values.

verbose

(boolean) if TRUE then missForest returns error estimates, runtime and if available true error during iterations.

mtry

number of variables randomly sampled as candidates at each node (default = floor(sqrt(ncol(xmis)))).

replace

(boolean) if TRUE bootstrap sampling (with replacement) is performed, else subsampling (without replacement).

classwt

list of priors of the classes in the categorical variables.

cutoff

list of class cutoffs for each categorical variable.

strata

list of (factor) variables used for stratified sampling.

sampsize

list of size(s) of sample to draw.

nodesize

minimum size of terminal nodes, vector of length 2, with number for continuous variables in the first entry and number for categorical variables in the second entry.

maxnodes

maximum number of terminal nodes for individual trees.

xtrue

complete data matrix. If supplied, it is used to compute the true imputation error.

parallelize

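whether to run in parallel; one of "no" (the first listed value, hence the default), "variables" or "forests". See the missForest package documentation for details.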

col_cat

indices of the categorical columns in 'xmis' (default = c(), i.e. none).
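One way to obtain such column indices is to look up the positions of the factor columns in a data frame. The sketch below uses base R only; the data frame and its column names are made up, and whether the function expects 'xmis' as a data frame with factors or as a matrix with coded categories is not stated on this page, so treat this as an assumption.

```r
# Toy data frame; column names and values are illustrative only.
df <- data.frame(
  age   = c(23, NA, 41, 35),
  group = factor(c("a", "b", NA, "a")),
  score = c(1.5, 2.0, NA, 3.2),
  sex   = factor(c("f", NA, "m", "f"))
)

# Positions of the factor columns, as a plain integer vector.
col_cat <- unname(which(vapply(df, is.factor, logical(1))))
col_cat
```

The resulting vector could then be passed as the col_cat argument.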

Value

ximp

imputed data matrix of same type as 'xmis'.

ximp.disj

imputed data matrix of same type as 'xmis' for the numeric columns. For the categorical columns, the predicted probability of each category is given in the form of a one-hot vector.

OOBerror

estimated OOB imputation error. For the set of continuous variables in 'xmis' the NRMSE is returned, and for the set of categorical variables the proportion of falsely classified entries. See Details in the missForest package documentation for the exact definition of these error measures. If 'variablewise' is set to 'TRUE' then this will be a vector of length 'p', where 'p' is the number of variables, and the entries will be the OOB error for each variable separately.

error

true imputation error. This is only available if 'xtrue' was supplied. The error measures are the same as for 'OOBerror'.
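The two error measures can be sketched in base R as follows. This is a minimal illustration assuming the definitions used in the original missForest package as I understand them (NRMSE as the root of the mean squared error divided by the variance of the true values, both taken over the imputed entries only); the helper names nrmse and pfc, and all data, are hypothetical and not part of this function's interface.

```r
# Normalised root mean squared error over the imputed continuous entries.
# `mis` is a logical mask marking the entries that were missing.
nrmse <- function(ximp, xtrue, mis) {
  sqrt(mean((ximp[mis] - xtrue[mis])^2) / var(xtrue[mis]))
}

# Proportion of falsely classified entries over the imputed
# categorical entries.
pfc <- function(ximp, xtrue, mis) {
  mean(as.character(ximp[mis]) != as.character(xtrue[mis]))
}

# Toy continuous variable: true values, imputations, missingness mask.
x_true <- c(1, 2, 3, 4, 5)
x_imp  <- c(1, 2.5, 3, 3.5, 5)
mis_x  <- c(FALSE, TRUE, FALSE, TRUE, FALSE)
nrmse_x <- nrmse(x_imp, x_true, mis_x)

# Toy categorical variable: one of the three imputed entries is wrong.
c_true <- c("a", "b", "a", "c")
c_imp  <- c("a", "a", "a", "c")
mis_c  <- c(FALSE, TRUE, TRUE, TRUE)
pfc_c <- pfc(c_imp, c_true, mis_c)
```

Both measures are 0 for a perfect imputation, and lower values indicate better imputations.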