This function performs reject inference using the Augmentation technique. In theory, this technique improves on the scorecard fitted to financed clients only when the financing mechanism is MAR (missing at random) and the model is misspecified.
augmentation(xf, xnf, yf)
Argument | Description |
---|---|
xf | The matrix of financed clients' characteristics to be used in the scorecard. |
xnf | The matrix of not financed clients' characteristics to be used in the scorecard (must be the same features in the same order as xf!). |
yf | The matrix of financed clients' labels. |
List containing the model using financed clients only and the model produced using the Augmentation method.
This function performs the Augmentation method on the data. When provided with labeled observations \((x^\ell,y)\), it first fits the logistic regression model \(p_\theta\) of \(y\) on \(x^\ell\). It then reweights the labeled observations according to their probability of being sampled: it calculates the predicted probabilities of \(p_\theta\) on all observations (labeled and unlabeled), defines score-bands and calculates, in each of these score-bands, the probability of having been accepted as the proportion of labeled samples in that score-band. Finally, it refits a logistic regression model \(p_\eta\) on the labeled samples using these weights.
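The steps above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name `augmentation_sketch`, the `n_bands` parameter, the use of quantile-based score-bands, the choice of weights as the inverse of the acceptance proportion, and the simulated data are all assumptions made for the example, and it uses `stats::glm` rather than `speedglm`.

```r
# Hypothetical helper illustrating the Augmentation steps (assumptions:
# quantile score-bands, inverse-acceptance-rate weights, plain glm).
augmentation_sketch <- function(xf, xnf, yf, n_bands = 10) {
  df_f <- data.frame(labels = yf, xf)

  # Step 1: logistic regression fitted on financed (labeled) clients only
  model_f <- glm(labels ~ ., data = df_f, family = binomial(link = "logit"))

  # Step 2: predicted probabilities for all clients, financed or not
  x_all <- rbind(data.frame(xf), data.frame(xnf))
  accepted <- c(rep(1, nrow(xf)), rep(0, nrow(xnf)))
  score <- predict(model_f, newdata = x_all, type = "response")

  # Step 3: score-bands built from quantiles of the predicted probabilities
  breaks <- unique(quantile(score, probs = seq(0, 1, length.out = n_bands + 1)))
  band <- cut(score, breaks = breaks, include.lowest = TRUE)

  # Step 4: acceptance probability per band = proportion of labeled samples
  p_accept <- tapply(accepted, band, mean)
  w <- as.numeric(1 / p_accept[as.character(band[accepted == 1])])

  # Step 5: weighted refit on labeled samples; non-integer weights make
  # glm warn about "non-integer #successes", which is silenced here
  model_aug <- suppressWarnings(
    glm(labels ~ ., data = df_f, family = binomial(link = "logit"), weights = w)
  )

  list(financed = model_f, augmented = model_aug)
}

# Simulated data with acceptance depending on x1 (a MAR mechanism)
set.seed(1)
n <- 200
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, 1, plogis(1 - 2 * x$x1))
financed <- runif(n) < plogis(1 + x$x1)

fit <- augmentation_sketch(x[financed, ], x[!financed, ], y[financed])
coef(fit$financed)
coef(fit$augmented)
```

Comparing the two coefficient vectors shows how much the reweighting shifts the fit; with a well-specified model the difference is expected to be small.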
Enea, M. (2015), speedglm: Fitting Linear and Generalized Linear Models to Large Data Sets, https://CRAN.R-project.org/package=speedglm
Ehrhardt, A., Biernacki, C., Vandewalle, V., Heinrich, P. and Beben, S. (2018), Reject Inference Methods in Credit Scoring: a rational review,
glm, speedglm
Adrien Ehrhardt
    # We simulate data from financed clients
    df <- generate_data(n = 100, d = 2)
    xf <- df[, -ncol(df)]
    yf <- df$y
    # We simulate data from not financed clients (MCAR mechanism)
    xnf <- generate_data(n = 100, d = 2)[, -ncol(df)]
    augmentation(xf, xnf, yf)
    #> Warning: non-integer #successes in a binomial glm!
    #> Generalized Linear Model of class 'speedglm':
    #>
    #> Call: speedglm::speedglm(formula = labels ~ ., data = df_augmente[, -which(names(df_augmente) %in% c("poidsfinal", "classe_SCORE"))][!df_augmente$poidsfinal == 0, ], family = stats::binomial(link = "logit"), weights = df_augmente$poidsfinal[!df_augmente$poidsfinal == 0])
    #>
    #> Coefficients:
    #> (Intercept)        x.x.1        x.x.2
    #>     0.76812     -1.71279     -0.00256
    #>