ls_F1 returns the F1 score of each dataset in a given list of imputed datasets. resample_method is needed because with the 'bootstrap' method the imputed datasets can contain repeated rows, and with both 'jackknife' and 'bootstrap' the imputed datasets might not cover every row of the complete dataset.
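The row-coverage issue can be illustrated with a minimal base R sketch. It does not call ls_F1; the names n, boot_idx and jack_idx are hypothetical, and a delete-one jackknife is assumed purely for illustration (the resampling scheme actually used to build the imputed datasets may differ).

```r
set.seed(1)  # any seed works; the exact rows drawn are not important
n <- 10      # number of rows in a toy complete dataset

# Bootstrap: draw n row indices with replacement, so rows may repeat
# and some rows may never be drawn.
boot_idx <- sample(n, size = n, replace = TRUE)

# Jackknife (delete-one assumed here): row 3 is left out of this sample.
jack_idx <- setdiff(seq_len(n), 3)

any(duplicated(boot_idx))       # TRUE when some row was drawn more than once
setdiff(seq_len(n), boot_idx)   # rows absent from this bootstrap sample
setdiff(seq_len(n), jack_idx)   # rows absent from this jackknife sample
```

A scoring function therefore needs to know the resampling method in order to align each imputed dataset with the matching rows of the complete dataset and of the mask.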

ls_F1(
  df_comp,
  ls_df_imp,
  mask,
  col_cat_comp,
  col_cat_imp,
  resample_method = "bootstrap",
  combine_method = "onehot",
  dict_cat = NULL
)

Arguments

df_comp

The original complete dataset.

ls_df_imp

List of imputed datasets.

mask

Mask of missingness (1 means missing value and 0 means observed value).

col_cat_comp

Indices of categorical columns in the complete dataset.

col_cat_imp

Indices of categorical columns in the imputed dataset.

resample_method

Resampling method associated with the imputed datasets. The default value is 'bootstrap'; it could also be 'jackknife' or 'none'.

combine_method

When resample_method = 'bootstrap', combine_method could be 'factor' or 'onehot' (the default). When combine_method = 'onehot', ls_F1 takes the average of the one-hot probability vectors for each observation, then chooses the position of the maximum probability as the predicted category. When combine_method = 'factor', for each observation, ls_F1 chooses the mode over the imputed dataframes as the predicted category.

dict_cat

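The dictionary of categorical column names, used if the "onehot" method is applied. Each element is named after a categorical variable and lists that variable's one-hot columns. For example, it could be list("Y7"=c("Y7_1","Y7_2"), "Y8"=c("Y8_1","Y8_2","Y8_3")).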
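The two combine methods described above can be sketched in base R as follows. This is an illustration of the idea only, not the internals of ls_F1: the toy data and the names ls_onehot, avg_prob, pred_onehot, ls_Y8, votes and pred_factor are all hypothetical, and dict_cat reuses the example mapping from the argument description.

```r
# Mapping from each categorical variable to its one-hot columns.
dict_cat <- list("Y7" = c("Y7_1", "Y7_2"), "Y8" = c("Y8_1", "Y8_2", "Y8_3"))

# Three toy imputed datasets (2 observations each) in one-hot probability form.
ls_onehot <- list(
  data.frame(Y7_1 = c(0.9, 0.4), Y7_2 = c(0.1, 0.6),
             Y8_1 = c(0.6, 0.2), Y8_2 = c(0.3, 0.5), Y8_3 = c(0.1, 0.3)),
  data.frame(Y7_1 = c(0.8, 0.3), Y7_2 = c(0.2, 0.7),
             Y8_1 = c(0.5, 0.1), Y8_2 = c(0.4, 0.3), Y8_3 = c(0.1, 0.6)),
  data.frame(Y7_1 = c(0.7, 0.6), Y7_2 = c(0.3, 0.4),
             Y8_1 = c(0.7, 0.2), Y8_2 = c(0.2, 0.2), Y8_3 = c(0.1, 0.6))
)

# "onehot": average the probabilities over the imputations, then take the
# position of the largest averaged probability within each variable's columns.
avg_prob <- Reduce(`+`, lapply(ls_onehot, as.matrix)) / length(ls_onehot)
pred_onehot <- sapply(dict_cat, function(cols) {
  unname(apply(avg_prob[, cols, drop = FALSE], 1, which.max))
})

# "factor": each imputation stores the category directly; take the mode per
# observation (ties go to the first level in sorted order in this sketch).
ls_Y8 <- list(c("1", "2"), c("1", "3"), c("2", "3"))
votes <- do.call(cbind, ls_Y8)  # rows = observations, columns = imputations
pred_factor <- apply(votes, 1, function(x) names(which.max(table(x))))

pred_onehot  # one column per variable in dict_cat, one row per observation
pred_factor  # one predicted category per observation
```

Averaging probabilities keeps information about how confident each imputation was, whereas the mode only counts votes, so the two methods can disagree when the imputations are split.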

Value

list_F1

List of F1 scores corresponding to the given list of imputed datasets.

Mean_F1

Mean value of F1.

Variance_F1

Variance of F1.
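As a rough illustration of how the three returned values relate, the sketch below scores two toy imputations of a single categorical variable on the masked entries only. It does not call ls_F1. The helper macro_f1 is hypothetical, and macro-averaging over classes is an assumption: the averaging scheme used by ls_F1 is not stated here.

```r
# Hypothetical helper: macro-averaged F1 (an assumption, see above).
macro_f1 <- function(truth, pred) {
  classes <- sort(unique(c(truth, pred)))
  f1 <- sapply(classes, function(k) {
    tp <- sum(pred == k & truth == k)  # true positives for class k
    fp <- sum(pred == k & truth != k)  # false positives for class k
    fn <- sum(pred != k & truth == k)  # false negatives for class k
    2 * tp / (2 * tp + fp + fn)
  })
  mean(f1)
}

truth <- c("a", "b", "a", "b")  # toy complete column
mask  <- c(1, 1, 0, 1)          # 1 = missing (imputed), 0 = observed
ls_imp <- list(
  c("a", "b", "a", "b"),        # imputation 1: all masked entries correct
  c("a", "a", "a", "b")         # imputation 2: one masked entry wrong
)

# Score each imputation on the masked entries only.
miss <- mask == 1
list_F1 <- sapply(ls_imp, function(imp) macro_f1(truth[miss], imp[miss]))

Mean_F1     <- mean(list_F1)
Variance_F1 <- var(list_F1)     # sample variance (denominator n - 1)
```

In this toy case the first imputation scores 1 and the second scores 2/3, so the mean is 5/6 and the sample variance is 1/18.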