About me

And all the stuff I like

Part-time AI Assistant Professor ("Chargé d'Enseignement", contract position) at Ecole Polytechnique (X) since 09/2019, and qualified ("qualifié", no. 20226343936) for Associate Professor positions since 02/2020.

Machine Learning Engineer at Crédit Agricole S.A. since 12/2019.

PhD from Lille University and Inria (completed 09/2019), funded by Crédit Agricole Consumer Finance through a CIFRE grant.

My PhD work was mainly about applying state-of-the-art statistical machine learning to retail banking, in particular to credit scoring (i.e. assessing the creditworthiness of a client): reject inference theory, efficient discretization of continuous features, grouping of the levels of categorical features, introducing pairwise interactions, profit-oriented scoring, feature selection for logistic regression in high dimension, and more.

I was a member of the Modal team at Inria during my PhD. My supervisors were Christophe Biernacki, Vincent Vandewalle and Philippe Heinrich. I now work for Crédit Agricole S.A., in an operational research group, as an ML Engineer implementing various research ideas for different branches of Crédit Agricole.


Information about CIFRE PhDs (in French)

An account of my administrative setbacks and victories with the ANRT, and a collection of useful information for anyone interested in the CIFRE scheme!

Introduction to Credit Scoring

For those not familiar with credit scoring, here is a short introduction needed to understand the other blog posts. In French!

Reject Inference methods in Credit Scoring

When a new scorecard is developed, previously rejected clients are not included in the learning process, since their repayment outcome was never observed. Is this a problem? If so, how can we adjust for this "bias"?

Discretization methods

Sometimes you have continuous features but want to use an algorithm that only handles discrete values. Or you want to build a segmentation. Or, like me, you are required to discretize your features. What techniques are out there to do just that?
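As a point of comparison, the simplest unsupervised technique is equal-frequency (quantile) binning. The sketch below uses only the Python standard library (3.8+ for the `method` argument of `statistics.quantiles`); the helper names `equal_frequency_cuts` and `discretize` are hypothetical, and this baseline is not the supervised, multivariate discretization of my PhD work.

```python
import bisect
import statistics
from typing import List, Sequence

# Hypothetical helpers: a minimal unsupervised baseline, not a supervised method.


def equal_frequency_cuts(values: Sequence[float], n_bins: int) -> List[float]:
    """Return up to n_bins - 1 cut points so that bins hold roughly equal counts."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    # 'inclusive' treats the sample as the whole population.
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    # With heavily tied data several cut points can coincide; keeping duplicates
    # would leave some bin indices that can never be assigned.
    return sorted(set(cuts))


def discretize(x: float, cuts: Sequence[float]) -> int:
    """Map x to a bin index in [0, len(cuts)]; bin i covers [cuts[i-1], cuts[i])."""
    return bisect.bisect_right(cuts, x)


if __name__ == "__main__":
    ages = [18, 22, 25, 31, 38, 44, 52, 60, 71]
    cuts = equal_frequency_cuts(ages, n_bins=4)
    print(cuts)                               # [25.0, 38.0, 52.0]
    print([discretize(a, cuts) for a in ages])  # [0, 0, 1, 1, 2, 2, 3, 3, 3]
```

Here the nine ages are split at 25, 38 and 52, giving bins of 2, 2, 2 and 3 observations. Such bins ignore the target entirely, which is precisely what supervised discretization tries to improve on.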

Pairwise interactions for logistic regression

The additivity assumption of logistic regression does not allow for "compound" effects of predictors: e.g. if the odds of contracting a disease are doubled by a smoking habit or by a fatty diet, it is implicitly assumed that they are quadrupled when both are present, which is rarely the case. Efficiently selecting pairs of features that "interact" is the subject of this post.
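Because logistic regression is additive on the log-odds scale, the odds ratios of the predictors multiply. A worked version of the smoking / fatty-diet example, assuming both are binary indicators (x_smoke, x_fat) and taking the doubling as an illustrative value:

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_{\mathrm{smoke}} + \beta_2 x_{\mathrm{fat}},
\qquad e^{\beta_1} = e^{\beta_2} = 2
\quad\Longrightarrow\quad
\frac{\mathrm{odds}(x_{\mathrm{smoke}}=1,\, x_{\mathrm{fat}}=1)}{\mathrm{odds}(x_{\mathrm{smoke}}=0,\, x_{\mathrm{fat}}=0)}
= e^{\beta_1+\beta_2} = 2 \times 2 = 4

% With a pairwise interaction term, the joint effect is no longer forced to be 4:
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_{\mathrm{smoke}} + \beta_2 x_{\mathrm{fat}} + \beta_{12}\, x_{\mathrm{smoke}} x_{\mathrm{fat}}
\quad\Longrightarrow\quad
\frac{\mathrm{odds}(1,\,1)}{\mathrm{odds}(0,\,0)} = e^{\beta_1+\beta_2+\beta_{12}}
```

A negative β12 means the combined effect is smaller than the product of the individual ones. With d features there are d(d-1)/2 candidate pairs, which is why choosing which interaction terms to include is not trivial.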

Logistic regression trees

In order to address different markets or partners, financial institutions often segment their clients in a tree-like structure. At its leaves, clients of a particular product might give us different or additional data, such as the brand of the car they intend to purchase, which we can use to assess their creditworthiness; hence each leaf has its own scorecard. This segmentation is rarely re-assessed, so it is likely suboptimal. In this post, we examine current solutions for building logistic regression trees.

Raspberry Pi cluster for Machine Learning

Clusters are in fashion, mostly for good reasons. We'll set up a cluster of Raspberry Pis running Hadoop and Spark, train a machine learning model on distributed data and deploy this model in production using Kubernetes.


Some material for the few courses I have taught or still teach at the University of Lille (including DUT STID and Polytech' Lille).


  • Supervised multivariate discretization and levels merging for logistic regression, Adrien Ehrhardt, Christophe Biernacki, Vincent Vandewalle, Philippe Heinrich, in preparation.
  • Reject Inference Methods in Credit Scoring: a rational review, Adrien Ehrhardt, Christophe Biernacki, Vincent Vandewalle, Philippe Heinrich, in preparation.
  • Feature quantization for parsimonious and interpretable predictive models, Adrien Ehrhardt, Christophe Biernacki, Vincent Vandewalle, Philippe Heinrich, arXiv preprint, accompanying code.
  • Failure of a Mexican antivenom on recovery from snakebite-related coagulopathy in French Guiana, X. Heckmann, V. Lambert, G. Mion, A. Ehrhardt, C. Marty, F. Perotti, J.-F. Carod, A. Jolivet, D. Boels, I. Lehida Andi, S. Larréché, accepted for publication in Clinical Toxicology.




My public software production is available through my Github profile.



  • PhD manuscript, Adrien Ehrhardt, 2019.
  • PhD defense slides, Adrien Ehrhardt, 2019.
  • Work-study contract ("contrat de professionnalisation") report: Applying the Basel II regulation - Risk studies and implementation of analysis tools, BNP Paribas Personal Finance, supervised by Pierre Chainais, 2015, confidential.
  • Internship report: mathematical modelling of traffic, APRR, supervised by Augustin Mouze, 2014, confidential.
  • Research project: non-linear optimization - application to the construction of stock market indices, supervised by Frédéric Semet, 2014, Part 1, Part 2, Presentation.


Not usually very artistic, I have developed an interest in photography over the years (maybe for its technical aspects!). My favorite shots so far:

Tuscany, Italia

Budapest, Magyarország


Guyane, France

Wroclaw, Polska

Berlin, Deutschland

Guadeloupe, France

New York, USA

London, UK

Milano, Italia