Generation of missing values in complete or incomplete data according to different missingness mechanisms and patterns

produce_NA(
  data,
  mechanism = "MCAR",
  perc.missing = 0.5,
  self.mask = NULL,
  idx.incomplete = NULL,
  idx.covariates = NULL,
  weights.covariates = NULL,
  by.patterns = FALSE,
  patterns = NULL,
  freq.patterns = NULL,
  weights.patterns = NULL,
  use.all = FALSE,
  logit.model = "RIGHT",
  seed = NULL
)

Arguments

data

[data.frame, matrix] data table (n x p), possibly with mixed variable types

mechanism

[string] one of "MCAR", "MAR", "MNAR"; default is "MCAR"

perc.missing

[positive double] proportion of missing values, between 0 and 1; default is 0.5
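A minimal base R sketch of the "MCAR" mechanism on complete data, assuming each entry is removed independently with probability perc.missing, so the realised proportion is only close to perc.missing rather than exact. The helper ampute_mcar is hypothetical and not part of produce_NA; it returns a list shaped like the one described under Value.

```r
# Hypothetical helper (not produce_NA): MCAR amputation in base R.
ampute_mcar <- function(data, perc.missing = 0.5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  data <- as.data.frame(data)
  # logical mask, TRUE where a new missing value is generated;
  # each entry is drawn independently of all data values (MCAR)
  mask <- matrix(runif(nrow(data) * ncol(data)) < perc.missing,
                 nrow = nrow(data), ncol = ncol(data),
                 dimnames = list(NULL, names(data)))
  data.incomp <- data
  data.incomp[mask] <- NA
  list(data.init = data, data.incomp = data.incomp,
       idx_newNA = as.data.frame(mask))
}

X <- data.frame(a = rnorm(1000), b = rnorm(1000))
res <- ampute_mcar(X, perc.missing = 0.3, seed = 42)
mean(is.na(res$data.incomp))  # close to 0.3
```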

self.mask

[string] either NULL or one of "sym", "upper", "lower"; default is NULL
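The options of self.mask are not defined on this page. Assuming they select which tail of a variable's own distribution is masked (a self-masked MNAR mechanism), a deterministic, quantile-based sketch could look as follows. The helper self_mask and this reading of "upper", "lower" and "sym" are assumptions; produce_NA may draw the missing values probabilistically instead.

```r
# Hypothetical self-masking sketch (MNAR): whether a value is missing
# depends on that value itself. The meaning of "upper", "lower" and "sym"
# used here is an assumption, not taken from produce_NA.
self_mask <- function(x, perc.missing = 0.5, side = c("upper", "lower", "sym")) {
  side <- match.arg(side)
  miss <- switch(side,
    upper = x > quantile(x, 1 - perc.missing),        # mask the largest values
    lower = x < quantile(x, perc.missing),            # mask the smallest values
    sym   = x < quantile(x, perc.missing / 2) |       # mask both tails
            x > quantile(x, 1 - perc.missing / 2)
  )
  x[miss] <- NA
  x
}

set.seed(1)
x <- rnorm(200)
x_up <- self_mask(x, perc.missing = 0.25, side = "upper")
x_sym <- self_mask(x, perc.missing = 0.25, side = "sym")
```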

idx.incomplete

[array] indices of the variables in which missing values are generated; if NULL, missing values can be generated in all variables; default is NULL
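A sketch of restricting amputation to selected columns, assuming idx.incomplete holds column indices as its description states. ampute_cols is a hypothetical helper that applies MCAR removal only to those columns and leaves the others fully observed.

```r
# Hypothetical helper: MCAR removal restricted to the columns in idx.incomplete.
ampute_cols <- function(data, idx.incomplete = NULL, perc.missing = 0.5) {
  data <- as.data.frame(data)
  # NULL means every column may receive missing values
  if (is.null(idx.incomplete)) idx.incomplete <- seq_len(ncol(data))
  for (j in idx.incomplete) {
    miss <- runif(nrow(data)) < perc.missing
    data[miss, j] <- NA
  }
  data
}

set.seed(2)
X <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
Y <- ampute_cols(X, idx.incomplete = c(1, 3), perc.missing = 0.4)
colSums(is.na(Y))  # column b has no missing values
```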

idx.covariates

[matrix] binary matrix such that entries in row i that are equal to 1 indicate the covariates that influence the missingness of variable i (sum(idx.incomplete) x p); if NULL, all covariates contribute; default is NULL

weights.covariates

[matrix] matrix of the same size as idx.covariates, where row i gives the weight of each covariate's contribution to the missingness model of variable i; if NULL, a (regularized) logistic model is fitted; default is NULL
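To illustrate how covariates and weights could drive a MAR mechanism for one incomplete variable, the sketch below uses a logistic model of standardized covariates with user-supplied weights (playing the role of one row of weights.covariates) and tunes an intercept with uniroot() so that the average missingness probability equals perc.missing. The helper mar_mask and the intercept calibration are assumptions for illustration, not produce_NA's code.

```r
# Hypothetical MAR sketch for one variable: its missingness probability is
# a logistic function of fully observed covariates with given weights.
mar_mask <- function(covariates, weights, perc.missing = 0.3) {
  Z <- scale(as.matrix(covariates))           # standardize covariates
  score <- drop(Z %*% weights)                # linear predictor without intercept
  # choose the intercept b so the mean probability equals perc.missing
  f <- function(b) mean(plogis(score + b)) - perc.missing
  b <- uniroot(f, c(-50, 50))$root
  runif(length(score)) < plogis(score + b)    # TRUE = missing
}

set.seed(3)
n <- 2000
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), y = rnorm(n))
# only x1 influences the missingness of y (weight 0 on x2)
miss_y <- mar_mask(X[, c("x1", "x2")], weights = c(2, 0), perc.missing = 0.3)
X$y[miss_y] <- NA
```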

by.patterns

[boolean] whether to generate missing values according to (pre-specified) patterns; default is FALSE

patterns

[matrix] binary matrix with 1=observed, 0=missing (n_pattern x p); default is NULL
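A small example of a patterns matrix for p = 3 variables, following the 1 = observed, 0 = missing convention, together with applying one pattern to chosen rows. The sanity checks and the row assignment are illustrative assumptions, not rules stated on this page.

```r
# Two patterns for p = 3 variables (1 = observed, 0 = missing):
# pattern 1 removes variable 1, pattern 2 removes variables 2 and 3.
patterns <- matrix(c(0, 1, 1,
                     1, 0, 0),
                   nrow = 2, byrow = TRUE)
# assumed sanity checks: binary entries, and each pattern keeps at least
# one variable observed and makes at least one missing
stopifnot(all(patterns %in% c(0, 1)),
          all(rowSums(patterns) > 0),
          all(rowSums(patterns) < ncol(patterns)))

X <- data.frame(a = 1:4, b = 5:8, c = 9:12)
# apply pattern 2 to rows 1 and 3 (hypothetical assignment)
X[c(1, 3), patterns[2, ] == 0] <- NA
```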

freq.patterns

[array] array of size n_pattern containing the desired proportion of each pattern; if NULL, mice::ampute.default.freq is called; default is NULL
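A sketch of how freq.patterns could split the rows among patterns, by drawing a pattern index for each row with probabilities freq.patterns. The helper assign_patterns is hypothetical; equal frequencies are a natural default, but this page does not state what mice::ampute.default.freq returns.

```r
# Hypothetical helper: assign each of n rows to one pattern, with
# relative group sizes given by freq.patterns.
assign_patterns <- function(n, freq.patterns) {
  stopifnot(abs(sum(freq.patterns) - 1) < 1e-8)  # proportions must sum to 1
  sample(seq_along(freq.patterns), n, replace = TRUE, prob = freq.patterns)
}

set.seed(4)
grp <- assign_patterns(1000, c(0.7, 0.3))
table(grp) / 1000  # roughly 0.7 and 0.3
```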

weights.patterns

[matrix] weights used to calculate the weighted sum scores (n_pattern x p); if NULL, mice::ampute.default.weights is called; default is NULL
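Weighted sum scores, as used by mice::ampute, combine standardized variables with one row of weights per pattern. The sketch assumes plain column standardization with scale(); weighted_sum_scores and the weight matrix are illustrative. Under MAR, non-zero weights would normally go only on variables that the corresponding pattern leaves observed.

```r
# Hypothetical helper: for pattern k, wss[i, k] = sum_j w[k, j] * z[i, j],
# where z holds the standardized variables.
weighted_sum_scores <- function(data, weights.patterns) {
  Z <- scale(as.matrix(data))   # centre and scale each column
  Z %*% t(weights.patterns)     # n x n_pattern matrix of scores
}

X <- data.frame(a = c(1, 2, 3, 4), b = c(10, 10, 20, 20))
# pattern 1 is scored on b only, pattern 2 on a only
w <- matrix(c(0, 1,
              1, 0), nrow = 2, byrow = TRUE)
wss <- weighted_sum_scores(X, w)
```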

use.all

[boolean] use all observations, including incomplete ones, for amputation when generating missing values by patterns (only relevant if the initial data are incomplete and by.patterns = TRUE); default is FALSE

logit.model

[string] one of "RIGHT", "LEFT", "MID", "TAIL"; default is "RIGHT"
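The four values of logit.model appear to mirror the type argument of mice::ampute. A sketch, assuming each type is a logistic curve of the centred score shifted by an intercept that fixes the mean probability at perc.missing: "RIGHT" favours high scores, "LEFT" low scores, "MID" central scores and "TAIL" both extremes. The exact functional forms used by produce_NA may differ; miss_prob is hypothetical.

```r
# Hypothetical sketch of the four missingness-probability shapes.
miss_prob <- function(score, logit.model = "RIGHT", perc.missing = 0.5) {
  s <- score - mean(score)
  lin <- switch(logit.model,
    RIGHT = s,          # high scores more likely missing
    LEFT  = -s,         # low scores more likely missing
    MID   = -abs(s),    # central scores more likely missing
    TAIL  = abs(s),     # extreme scores more likely missing
    stop("unknown logit.model"))
  # intercept chosen so that the mean probability equals perc.missing
  b <- uniroot(function(b) mean(plogis(lin + b)) - perc.missing,
               c(-50, 50))$root
  plogis(lin + b)
}

score <- seq(-3, 3, length.out = 61)
p_right <- miss_prob(score, "RIGHT", 0.3)
p_tail <- miss_prob(score, "TAIL", 0.3)
```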

seed

[natural integer] seed for the random number generator; default is NULL

Value

A list with the following elements:

data.init

original data.frame

data.incomp

data.frame with the newly generated missing values; observed entries keep their values from the initial data.frame

idx_newNA

a boolean data.frame indicating the positions of the newly generated missing values
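A sketch of sanity checks on a result list with the three elements above. Since produce_NA is not defined in this snippet, a small mock result is built by hand; check_amputation and the mock data are hypothetical.

```r
# Hypothetical checks on a list with data.init, data.incomp and idx_newNA.
check_amputation <- function(res) {
  init <- as.matrix(res$data.init)
  incomp <- as.matrix(res$data.incomp)
  new_na <- as.matrix(res$idx_newNA)
  kept <- !is.na(incomp)
  list(
    # observed entries must equal the original values
    observed_unchanged = all(incomp[kept] == init[kept]),
    # flagged entries are exactly those missing now but observed before
    flags_consistent = all(new_na == (is.na(incomp) & !is.na(init))),
    # realised proportion of newly generated missing values
    prop_new = mean(new_na)
  )
}

# mock result: b[2] was already missing, a[2] is newly missing
init <- data.frame(a = c(1, 2, 3, 4), b = c(5, NA, 7, 8))
incomp <- init
incomp$a[2] <- NA
res <- list(data.init = init, data.incomp = incomp,
            idx_newNA = data.frame(a = c(FALSE, TRUE, FALSE, FALSE),
                                   b = rep(FALSE, 4)))
check_amputation(res)
```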