The combine_boot function combines several imputed bootstrapped dataframes into the final imputed dataframe and provides the variance of each imputed value.

combine_boot(
  ls_df,
  col_con,
  col_dis = c(),
  col_cat = c(),
  num_row_origin,
  method = "onehot",
  dict_cat = NULL,
  var_cat = "unalike"
)

Arguments

ls_df

A list of imputed bootstrapped dataframes.

col_con

Indices of the continuous columns.

col_dis

Indices of the discrete columns.

col_cat

Indices of the categorical columns.

num_row_origin

Number of rows in the original incomplete dataframe before bootstrapping.
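Because bootstrapping samples rows with replacement, an original row can appear several times in one replicate and not at all in another. The sketch below only illustrates that effect on a toy dataframe; it is not the package's own bootstrap code. The `row_id` column is a hypothetical device for remembering where each sampled row came from, and how combine_boot itself maps bootstrapped rows back to the num_row_origin original rows is not stated here.

```r
# Hypothetical sketch: draw B bootstrap replicates of an incomplete
# dataframe and record the original row index of every sampled row.
set.seed(1)
df_incomplete <- data.frame(x = c(1.2, NA, 3.1, 4.0, NA),
                            y = c(NA, 2.5, 2.9, NA, 5.1))
num_row_origin <- nrow(df_incomplete)
B <- 3

boot_samples <- lapply(seq_len(B), function(b) {
  idx <- sample(num_row_origin, num_row_origin, replace = TRUE)
  out <- df_incomplete[idx, , drop = FALSE]
  out$row_id <- idx  # hypothetical column: original row index
  out
})

# Number of replicates in which each original row appears at least once
coverage <- sapply(seq_len(num_row_origin), function(i)
  sum(sapply(boot_samples, function(d) i %in% d$row_id)))
```

A row with a coverage of 0 receives no imputation from any replicate, which is one reason the combining step needs to know the original number of rows rather than infer it from the replicates.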

method

The encoding method of the categorical columns in the imputed dataframes. This function supports both "onehot" and "factor" encodings. When method = "onehot", combine_boot averages the probability vectors of the same observation over the $B$ imputed datasets, then chooses the category with the maximum averaged probability as the predicted category in the final result. When method = "factor", combine_boot chooses, for each observation, the mode of the imputed values over the $B$ imputed dataframes as the predicted category.
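The two combination rules can be illustrated for a single observation. This is a minimal sketch assuming toy values, not the package's implementation; in particular, the tie-breaking rule (first category or level wins) is an assumption.

```r
# Hypothetical sketch of the two combination rules for one observation.

# method = "onehot": B = 3 imputed probability vectors over categories a, b, c
probs <- rbind(c(0.6, 0.3, 0.1),
               c(0.2, 0.5, 0.3),
               c(0.5, 0.4, 0.1))
colnames(probs) <- c("a", "b", "c")
avg_prob <- colMeans(probs)                # average over the B imputations
pred_onehot <- names(which.max(avg_prob))  # category with maximum mean probability

# method = "factor": B = 5 imputed labels for the same observation
labels <- factor(c("b", "a", "b", "c", "b"), levels = c("a", "b", "c"))
counts <- table(labels)
pred_factor <- names(counts)[which.max(counts)]  # mode; ties go to the first level
```

Here the averaged probabilities are roughly 0.433, 0.400 and 0.167, so the "onehot" rule picks "a", while the "factor" rule picks "b", which occurs in three of the five imputations.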

dict_cat

A dictionary mapping each categorical column name to the names of its onehot columns, used when method = "onehot". For example: list("Y7"=c("Y7_1","Y7_2"), "Y8"=c("Y8_1","Y8_2","Y8_3")).
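To show how such a dictionary ties onehot columns back to their categorical variable, the sketch below collapses onehot probability columns into one factor column per variable by taking the column with the largest value in each row. The helper name `onehot_to_factor` is hypothetical, and using the onehot column names (e.g. "Y7_1") as factor levels is a simplifying assumption; the package may label categories differently.

```r
# Hypothetical sketch: collapse onehot columns into factor columns
# using a dict_cat-style list (variable name -> onehot column names).
dict_cat <- list("Y7" = c("Y7_1", "Y7_2"),
                 "Y8" = c("Y8_1", "Y8_2", "Y8_3"))

df_disj <- data.frame(Y7_1 = c(0.8, 0.3), Y7_2 = c(0.2, 0.7),
                      Y8_1 = c(0.1, 0.6), Y8_2 = c(0.7, 0.3), Y8_3 = c(0.2, 0.1))

onehot_to_factor <- function(df_disj, dict_cat) {
  out <- lapply(names(dict_cat), function(var) {
    cols <- dict_cat[[var]]
    # for each row, pick the onehot column holding the largest value
    m <- as.matrix(df_disj[, cols, drop = FALSE])
    winner <- cols[max.col(m, ties.method = "first")]
    factor(winner, levels = cols)
  })
  names(out) <- names(dict_cat)
  as.data.frame(out)
}

df_fact <- onehot_to_factor(df_disj, dict_cat)
```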

var_cat

The method used to compute the variance of the categorical columns. "unalike" computes the unalikeability, while "wilcox_va" computes Wilcox's index VarNC.
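Both measures can be written in terms of the category proportions $p_k$ among the $B$ imputed values of one cell, with $K$ categories. The sketch below assumes the common definitions, unalikeability $u = 1 - \sum_k p_k^2$ and Wilcox's $\mathrm{VarNC} = \frac{K}{K-1}\left(1 - \sum_k p_k^2\right)$; whether combine_boot uses exactly these forms is an assumption. The input is assumed to be a factor whose levels list all $K$ categories, with no missing values.

```r
# Assumed definitions (p_k = category proportions, K = number of categories):
#   unalikeability u = 1 - sum(p_k^2)
#   Wilcox VarNC     = K / (K - 1) * (1 - sum(p_k^2))
unalikeability <- function(x) {
  stopifnot(is.factor(x))
  p <- as.vector(table(x)) / length(x)
  1 - sum(p^2)
}

wilcox_varnc <- function(x) {
  stopifnot(is.factor(x), nlevels(x) > 1)
  K <- nlevels(x)  # counts unobserved levels too
  p <- as.vector(table(x)) / length(x)
  K / (K - 1) * (1 - sum(p^2))
}

# Five imputed values for one categorical cell
x <- factor(c("a", "b", "b", "c", "b"), levels = c("a", "b", "c"))
u <- unalikeability(x)
v <- wilcox_varnc(x)
```

With proportions 0.2, 0.6 and 0.2, $\sum_k p_k^2 = 0.44$, so $u = 0.56$ and $\mathrm{VarNC} = 1.5 \times 0.56 = 0.84$. VarNC rescales unalikeability so that a uniform spread over all $K$ categories gives 1.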

Value

A list with the following components:

df_result_disj

The final imputed dataframe, with the categorical columns in onehot form.

df_result_var_disj

The variance matrix of the final imputed dataframe, with the categorical columns in onehot form.

df_result

The final imputed dataframe, with the categorical columns in factor form.

df_result_var

The variance matrix of the final imputed dataframe, with the categorical columns in factor form.
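For a continuous column, a natural reading of "final imputed value plus variance" is the mean and sample variance of the values imputed for the same original row across the bootstrapped dataframes. The sketch below assumes exactly that, together with a hypothetical `row_id` column linking each bootstrapped row to its original row. Repeated draws of a row within one replicate are pooled with equal weight, which is also an assumption.

```r
# Hypothetical sketch: combine a continuous column across B = 3 imputed
# bootstrap replicates; `row_id` (hypothetical) is the original row index.
ls_df <- list(
  data.frame(row_id = c(1, 1, 3), x = c(2.0, 2.2, 5.0)),
  data.frame(row_id = c(2, 3, 3), x = c(4.1, 5.4, 5.2)),
  data.frame(row_id = c(1, 2, 2), x = c(1.8, 3.9, 4.0))
)
num_row_origin <- 3

stacked <- do.call(rbind, ls_df)
rows <- seq_len(num_row_origin)

# Per original row: mean of the imputed values and their sample variance
combined <- data.frame(
  row_id = rows,
  x      = sapply(rows, function(i) mean(stacked$x[stacked$row_id == i])),
  x_var  = sapply(rows, function(i) var(stacked$x[stacked$row_id == i]))
)
```

Row 1, for instance, collects 2.0, 2.2 and 1.8, giving a combined value of 2.0 and a variance of 0.04. A row drawn in no replicate would yield `NaN` for the mean and `NA` for the variance, so a real implementation has to handle that case explicitly.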

References

Statistical Analysis with Missing Data, by Little and Rubin, 2002