This function performs Reject Inference using the Parcelling technique. This technique is theoretically sound in the MNAR (missing not at random) framework, although its coefficients must be chosen a priori.

parcelling(
  xf,
  xnf,
  yf,
  probs = seq(0, 1, 0.25),
  alpha = rep(1, length(probs) - 1)
)

Arguments

xf

The matrix of financed clients' characteristics to be used in the scorecard.

xnf

The matrix of not financed clients' characteristics to be used in the scorecard (must have the same columns, in the same order, as xf).

yf

The matrix of financed clients' labels.

probs

The sequence of quantiles to use to make scorebands (see the vignette).

alpha

The user-defined coefficients to use with Parcelling (see the vignette).
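
As a rough illustration of how probs and alpha relate, the sketch below cuts a vector of hypothetical predicted probabilities (`scores`, simulated here and not produced by the package) into scorebands at the quantiles given by probs. It assumes one coefficient per band, which is why the default alpha has `length(probs) - 1` entries. The vignette remains the authoritative description.

```r
# Hypothetical predicted probabilities; in practice these would come from
# the model fitted on financed clients.
set.seed(42)
scores <- runif(20)

probs <- seq(0, 1, 0.25)            # default: quartiles
alpha <- rep(1, length(probs) - 1)  # default: one coefficient per scoreband

# Quantile breakpoints, then assignment of each score to a band.
breaks <- quantile(scores, probs = probs)
bands <- cut(scores, breaks = breaks, include.lowest = TRUE)

# Number of observations per scoreband.
table(bands)
```

With these defaults there are four scorebands, and leaving every coefficient at 1 keeps the observed bad rate of each band unchanged.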

Value

List containing the model using financed clients only and the model produced using the Parcelling method.

Details

This function performs the Parcelling method on the data. Given labeled observations \((x^\ell, y)\), it first fits a logistic regression model \(p_\theta\) of \(y\) on \(x^\ell\). It then labels the unlabeled samples \(x^{u}\) with the observed bad rate in user-defined classes of the probabilities predicted by \(p_\theta\), reweighted by the user-supplied coefficients, i.e. \(\hat{y}^{u} = \alpha_k T(k)\), where \(k\) denotes the group (which depends on \(p_\theta\)) and \(T(k)\) the observed bad rate of labeled observations in this group. Finally, it refits a logistic regression model \(p_\eta\) on the whole sample.
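
The procedure described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the toy data, the variable names, the coding of `y = 1` as a "bad" client, the pooling of financed and not financed scores to compute the quantiles, and the Bernoulli draw used to turn \(\alpha_k T(k)\) into a 0/1 label are all choices made here for the example. It uses `glm` rather than `speedglm` to stay dependency-free.

```r
set.seed(1)

# Hypothetical toy data: two features, logistic ground truth (y = 1 is "bad").
n <- 200
x <- matrix(rnorm(2 * n), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
y <- rbinom(n, 1, plogis(-0.5 + x[, 1] - x[, 2]))
financed <- seq_len(n) <= n / 2
df_f <- data.frame(y = y[financed], x[financed, ])  # labeled (financed)
df_nf <- data.frame(x[!financed, ])                 # unlabeled (not financed)

probs <- seq(0, 1, 0.25)
alpha <- rep(1, length(probs) - 1)

# Step 1: logistic regression p_theta fitted on financed clients only.
model_f <- glm(y ~ ., data = df_f, family = binomial(link = "logit"))
score_f <- predict(model_f, df_f, type = "response")
score_nf <- predict(model_f, df_nf, type = "response")

# Step 2: scorebands from quantiles of the predicted probabilities
# (pooled over both groups: an assumption of this sketch).
breaks <- quantile(c(score_f, score_nf), probs = probs)
breaks[1] <- 0
breaks[length(breaks)] <- 1
band_f <- cut(score_f, breaks = breaks, include.lowest = TRUE)
band_nf <- cut(score_nf, breaks = breaks, include.lowest = TRUE)

# Step 3: observed bad rate T(k) per band, reweighted by alpha_k.
bad_rate <- as.numeric(tapply(df_f$y, band_f, mean))
p_bad <- pmin(1, alpha * bad_rate)

# Step 4: label not financed clients (Bernoulli draw: an assumption).
df_nf$y <- rbinom(nrow(df_nf), 1, p_bad[as.integer(band_nf)])

# Step 5: refit the logistic regression p_eta on the whole sample.
df_all <- rbind(df_f, df_nf)
model_parcelling <- glm(y ~ ., data = df_all, family = binomial(link = "logit"))
coef(model_parcelling)
```

Raising a coefficient above 1 in a given band encodes the prior belief that not financed clients in that band are riskier than the financed ones with similar scores, which is where the MNAR assumption enters.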

References

Enea, M. (2015), speedglm: Fitting Linear and Generalized Linear Models to Large Data Sets, https://CRAN.R-project.org/package=speedglm

Ehrhardt, A., Biernacki, C., Vandewalle, V., Heinrich, P. and Beben, S. (2018), Reject Inference Methods in Credit Scoring: a rational review,

See also

glm, speedglm

Author

Adrien Ehrhardt

Examples

# We simulate data from financed clients
df <- generate_data(n = 100, d = 2)
xf <- df[, -ncol(df)]
yf <- df$y
# We simulate data from not financed clients (MCAR mechanism)
xnf <- generate_data(n = 100, d = 2)[, -ncol(df)]
parcelling(xf, xnf, yf)
#> Generalized Linear Model of class 'speedglm':
#>
#> Call:  speedglm::speedglm(formula = labels ~ ., data = df_parceling[, -which(names(df_parceling) %in% c("poids_final", "classe_SCORE", "acc"))], family = stats::binomial(link = "logit"))
#>
#> Coefficients:
#> (Intercept)        x.x.1        x.x.2
#>      -0.142       -1.398        1.915
#>