This function performs Reject Inference using the Reclassification technique. Note that this technique has no theoretical foundation, as it amounts to a single iteration of a CEM (Classification Expectation-Maximization) algorithm.
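To make the "one-step CEM" remark concrete, here is one pass of a CEM algorithm written in the notation of the Details section. The symbols \(n\) (number of financed clients), \(m\) (number of not financed clients), \(x^u_j\) (characteristics of not financed client \(j\)) and \(t_j\) are introduced here for illustration only, and the strict comparison with thresh is an assumption.

\[
\begin{aligned}
\hat\theta &= \arg\max_\theta \sum_{i=1}^{n} \log p_\theta(y_i \mid x_i^{\ell}) && \text{(fit on financed clients)}\\
t_j &= p_{\hat\theta}(1 \mid x_j^{u}), \quad j = 1,\dots,m && \text{(E-step)}\\
\hat y_j &= \mathbb{1}\{t_j > \texttt{thresh}\} && \text{(C-step)}\\
\hat\eta &= \arg\max_\eta \Big[\sum_{i=1}^{n} \log p_\eta(y_i \mid x_i^{\ell}) + \sum_{j=1}^{m} \log p_\eta(\hat y_j \mid x_j^{u})\Big] && \text{(M-step)}
\end{aligned}
\]

A full CEM algorithm would repeat the E, C and M steps, reclassifying the not financed clients with \(p_{\hat\eta}\), until the assigned labels stop changing; Reclassification stops after the first pass. With thresh = 0.5, the C-step assigns each not financed client its most probable class.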

Usage

```r
reclassification(xf, xnf, yf, thresh = 0.5)
```



Arguments

xf: The matrix of financed clients' characteristics to be used in the scorecard.

xnf: The matrix of not financed clients' characteristics to be used in the scorecard. It must contain the same columns as xf, in the same order.

yf: The vector of financed clients' labels (0 or 1).

thresh: The threshold used in the classification step, i.e. the predicted probability above which a not financed client is assigned label 1. Defaults to 0.5.
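As a small illustration of how thresh turns predicted probabilities into labels, the snippet below applies the rule to a few made-up probabilities. It assumes a strict comparison (a probability exactly equal to thresh gives label 0), which this page does not specify.

```r
# Hypothetical predicted probabilities of label 1 for four not financed clients
p_nf <- c(0.2, 0.5, 0.7, 0.9)
thresh <- 0.5

# Classification step, assuming a strict "greater than" comparison with thresh
y_nf <- as.integer(p_nf > thresh)
y_nf
#> [1] 0 0 1 1

# A lower threshold assigns label 1 to more not financed clients
y_nf_low <- as.integer(p_nf > 0.3)
y_nf_low
#> [1] 0 1 1 1
```

Lowering thresh therefore makes the reclassified sample more pessimistic about not financed clients.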


Value

A list containing the model fitted on financed clients only and the model produced by the Reclassification method.


Details

This function performs the Reclassification method on the data. Given labelled observations \((x^\ell, y)\) (the financed clients), it first fits a logistic regression model \(p_\theta\) of \(y\) on \(x^\ell\). It then assigns each unlabelled observation (a not financed client) the class predicted by \(p_\theta\), i.e. label 1 when its predicted probability of being 1 exceeds thresh and 0 otherwise; this amounts to a single iteration of a CEM algorithm. Finally, it refits a logistic regression model \(p_\eta\) on the whole sample, financed and reclassified not financed clients together.
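The three steps described above can be sketched in base R with stats::glm. This is a minimal illustration, not the package implementation (the Examples output shows that the package fits its models with speedglm); the function name reclassification_sketch, the column names x1, x2, ... and the names of the returned list elements are hypothetical, and the strict comparison with thresh is an assumption.

```r
# Hypothetical sketch of one Reclassification pass; not the package's own code.
reclassification_sketch <- function(xf, xnf, yf, thresh = 0.5) {
  # Give financed and not financed matrices the same column names
  vars <- paste0("x", seq_len(ncol(xf)))
  colnames(xf) <- vars
  colnames(xnf) <- vars
  df_f <- data.frame(xf, labels = yf)

  # Step 1: logistic regression p_theta of y on x, financed clients only
  model_f <- stats::glm(labels ~ ., data = df_f,
                        family = stats::binomial(link = "logit"))

  # Step 2: predicted probability of label 1 for each not financed client,
  # then hard assignment to class 1 when it exceeds thresh (assumed strict)
  p_nf <- stats::predict(model_f, newdata = data.frame(xnf), type = "response")
  y_nf <- as.integer(p_nf > thresh)

  # Step 3: refit p_eta on financed plus reclassified not financed clients
  df_all <- rbind(df_f, data.frame(xnf, labels = y_nf))
  model_all <- stats::glm(labels ~ ., data = df_all,
                          family = stats::binomial(link = "logit"))

  list(financed_model = model_f, reclassified_model = model_all, y_nf = y_nf)
}

# Usage on simulated data, mirroring the Examples section
set.seed(42)
xf <- matrix(runif(100 * 2), nrow = 100, ncol = 2)
yf <- rbinom(100, 1, plogis(as.vector(xf %*% c(2, -2))))
xnf <- matrix(runif(100 * 2), nrow = 100, ncol = 2)
res <- reclassification_sketch(xf, xnf, yf)
coef(res$reclassified_model)
```

Because the labels of not financed clients are assigned once and never revisited, the second model simply treats the first model's predictions as if they were observed outcomes.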


References

Enea, M. (2015). speedglm: Fitting Linear and Generalized Linear Models to Large Data Sets.

Ehrhardt, A., Biernacki, C., Vandewalle, V., Heinrich, P. and Beben, S. (2018). Reject Inference Methods in Credit Scoring: a rational review.

See also

glm, speedglm


Author

Adrien Ehrhardt


Examples

```r
# We simulate data from financed clients
xf <- matrix(runif(100 * 2), nrow = 100, ncol = 2)
theta <- c(2, -2)
log_odd <- apply(xf, 1, function(row) theta %*% row)
yf <- rbinom(100, 1, 1 / (1 + exp(-log_odd)))

# We simulate data from not financed clients (MCAR mechanism)
xnf <- matrix(runif(100 * 2), nrow = 100, ncol = 2)

reclassification(xf, xnf, yf)
#> Generalized Linear Model of class 'speedglm':
#>
#> Call: speedglm::speedglm(formula = labels ~ ., data = df[, -which(names(df) %in% c("acc"))], family = stats::binomial(link = "logit"))
#>
#> Coefficients:
#> (Intercept)          x.1          x.2
#>       -1.72         5.65        -2.88
#>
```