Information about CIFRE PhD theses (in French)
An account of my administrative setbacks and victories with the ANRT, plus a collection of useful information for anyone interested in the CIFRE scheme!
AI Assistant Professor @Polytechnique
PhD @Lille University & @Inria
in Machine Learning applied to Finance
Machine Learning Engineer @Crédit Agricole
And all the stuff I like
AI Assistant Professor (part-time "Chargé d'Enseignement" contractuel) at École Polytechnique (X) since 09/2019, and "qualifié" (no. 20226343936) for Associate Professor positions since 02/2020.
Machine Learning Engineer at Crédit Agricole S.A. since 12/2019.
PhD from Lille University and Inria since 09/2019, funded by Crédit Agricole Consumer Finance through the CIFRE scheme.
My PhD work focused on applying state-of-the-art statistical machine learning to retail banking and credit scoring (i.e. assessing a client's creditworthiness), in particular: reject inference theory, efficient discretization of continuous features, grouping of categorical features' levels, introducing pairwise interactions, profit-oriented scoring, feature selection for logistic regression in high dimension, ...
I was a member of the Modal team at Inria during my PhD. My supervisors were Christophe Biernacki, Vincent Vandewalle and Philippe Heinrich. I now work for Crédit Agricole S.A., in an operational research group, as an ML Engineer implementing various research ideas for different branches of Crédit Agricole.
For those not familiar with Credit Scoring, here is a short introduction necessary for understanding the other blog posts! In French!
Having previously rejected clients means that, when a new scorecard is to be developed, those clients' information is not included in the learning process. Is this a problem? If so, how can we adjust for this "bias"?
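As a toy illustration of the problem (synthetic data; the variable `U` stands for information the old scorecard used but the new one does not observe, and all thresholds are made up), here is a minimal sketch of how a model fitted on accepted clients only can drift away from one fitted on the whole population:

```python
# Toy sketch of the reject inference setting: the old financing decision
# used some information (U) that the new scorecard does not observe, so
# learning on accepted clients only gives a distorted model. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000
X = rng.normal(size=(n, 2))                        # features available today
U = rng.normal(size=n)                             # info used by the old scorecard only
logit = -2.0 + 1.0 * X[:, 0] + 1.0 * X[:, 1] + 1.5 * U
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))  # 1 = default

accepted = (X[:, 0] + U) < 0.5                     # old acceptance rule

oracle = LogisticRegression().fit(X, y)                      # whole population (unobservable in practice)
through_the_door = LogisticRegression().fit(X[accepted], y[accepted])  # what we can actually fit

print("coefficients, whole population:", oracle.coef_, oracle.intercept_)
print("coefficients, accepted only:   ", through_the_door.coef_, through_the_door.intercept_)
```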
Sometimes you have continuous features but want to use an algorithm that only deals with discrete values. Or you want to build a segmentation. Or, like me, you're ordered to discretize your features. What techniques are out there to do just that?
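Not the methods studied in the post, but as a quick baseline, here is a sketch (synthetic data, made-up cut-offs) of the two usual starting points: unsupervised quantile binning and supervised cut points taken from a shallow decision tree:

```python
# Two common discretization baselines: unsupervised equal-frequency binning
# and supervised cuts extracted from a shallow decision tree. Synthetic data.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
income = rng.lognormal(mean=10, sigma=0.5, size=1000)                  # continuous feature
default = rng.binomial(1, 1 / (1 + np.exp((income - 22000) / 5000)))   # toy binary target

# 1) Equal-frequency (quantile) binning: purely unsupervised.
income_quartile = pd.qcut(income, q=4, labels=False)

# 2) Supervised cuts: a depth-2 tree picks thresholds that separate the target.
tree = DecisionTreeClassifier(max_depth=2).fit(income.reshape(-1, 1), default)
thresholds = sorted(t for t in tree.tree_.threshold if t != -2)        # -2 marks leaf nodes
income_tree_bin = np.digitize(income, thresholds)
```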
The additivity assumption of logistic regression does not allow for "compound" effects of predictors: e.g. if a smoking habit and a fatty diet each double the risk of contracting a disease, the model implicitly assumes that this risk is quadrupled when both predictors are present, which is rarely the case. Effectively selecting pairs of predictors that "interact" is the subject of this post.
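A minimal sketch of what "adding a pairwise interaction" means in practice, on synthetic data (the `smoker`/`fatty_diet`/`disease` names are invented for the example; this is not the selection method discussed in the post):

```python
# Without an interaction term, logistic regression forces the joint effect
# of two binary predictors to be the product of their individual odds
# ratios; adding the smoker:fatty_diet term lets the data say otherwise.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 10_000
df = pd.DataFrame({
    "smoker": rng.binomial(1, 0.3, n),
    "fatty_diet": rng.binomial(1, 0.4, n),
})
# True log-odds with a non-additive joint effect.
logit = -2 + 0.7 * df.smoker + 0.7 * df.fatty_diet - 0.6 * df.smoker * df.fatty_diet
df["disease"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

additive = smf.logit("disease ~ smoker + fatty_diet", data=df).fit(disp=0)
with_pair = smf.logit("disease ~ smoker * fatty_diet", data=df).fit(disp=0)
print(additive.params)
print(with_pair.params)   # the smoker:fatty_diet coefficient captures the deviation from additivity
```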
In order to address different markets or partners, financial institutions often segment their clients in a tree-like structure. At its leaves, clients applying for a particular product might provide different or additional data, such as the brand of the car they intend to purchase, which we can use to assess their creditworthiness; hence all such clients get a dedicated scorecard. This segmentation is rarely reassessed, so it is likely suboptimal. In this post, we examine current solutions for building logistic regression trees.
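As a point of comparison, the current practice the post starts from can be caricatured as fitting one independent logistic regression per pre-defined segment. A hedged sketch (column names are invented):

```python
# Naive baseline for a segmented scorecard: one independent logistic
# regression per pre-defined segment (tree leaf). Column names are invented.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def fit_segmented_scorecards(df, segment_col, feature_cols, target_col):
    """Fit an independent logistic regression on each segment."""
    return {
        seg: LogisticRegression(max_iter=1000).fit(g[feature_cols], g[target_col])
        for seg, g in df.groupby(segment_col)
    }

def score(models, df, segment_col, feature_cols):
    """Route each client to the scorecard of their segment."""
    proba = pd.Series(index=df.index, dtype=float)
    for seg, g in df.groupby(segment_col):
        proba.loc[g.index] = models[seg].predict_proba(g[feature_cols])[:, 1]
    return proba
```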
Clusters are in fashion, mostly for good reasons. We'll set up a cluster of Raspberry Pis running Hadoop and Spark, train a machine learning model on distributed data, and deploy this model to production using Kubernetes.
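For a taste of what the cluster will run, here is a minimal PySpark training job (the HDFS paths and column names below are placeholders; the post covers the actual setup and the Kubernetes deployment):

```python
# Minimal PySpark job of the kind the cluster could run: read a CSV from
# HDFS, assemble features, fit a logistic regression, save the model.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("raspi-scoring").getOrCreate()

df = spark.read.csv("hdfs:///data/clients.csv", header=True, inferSchema=True)
assembler = VectorAssembler(inputCols=["age", "income", "seniority"],
                            outputCol="features")
model = LogisticRegression(labelCol="default", featuresCol="features") \
    .fit(assembler.transform(df))
model.save("hdfs:///models/scorecard")
spark.stop()
```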
Some material for the few courses I taught / teach at University of Lille (incl. DUT STID and Polytech' Lille).
My public software production is available through my Github profile.
Though I'm usually not very artistic, I have developed an interest in photography over the years (maybe for its technical aspects!). My favorite shots so far: