Information about CIFRE PhD theses (in French)
An account of my administrative setbacks and victories with the ANRT, plus a collection of useful information for anyone interested in the CIFRE scheme!
AI Assistant Professor @Polytechnique
PhD @Lille University & @Inria
in Machine Learning applied to Finance
Machine Learning Engineer @Crédit Agricole
And all the stuff I like
AI Assistant Professor (part-time "Chargé d'Enseignement" contractuel) at École Polytechnique (X) since 09/2019, and "qualifié" (no. 20226343936) for Associate Professor positions since 02/2020.
Machine Learning Engineer at Crédit Agricole S.A. since 12/2019.
PhD from Lille University and Inria since 09/2019, funded by Crédit Agricole Consumer Finance through the CIFRE scheme.
My PhD work focused on applying state-of-the-art statistical machine learning to retail banking and credit scoring (i.e. assessing a client's creditworthiness), in particular: reject inference theory, efficient discretization of continuous features, grouping of categorical features' levels, introducing pairwise interactions, profit-oriented scoring, feature selection for logistic regression in high dimension, ...
I was a member of the Modal team at Inria during my PhD. My supervisors were Christophe Biernacki, Vincent Vandewalle and Philippe Heinrich. I now work for Crédit Agricole S.A., in an operational research group, as an ML Engineer implementing various research ideas for different branches of Crédit Agricole.
For those not familiar with Credit Scoring, here is a short introduction necessary for understanding the other blog posts! In French!
Having previously rejected clients means that, when a new scorecard is to be developed, those clients' information is not included in the learning process. Is this a problem? If so, how can we adjust for this "bias"?
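As a toy illustration of the problem (synthetic data; the variable `U` stands for information the old scorecard used but the new one does not observe, and all thresholds are made up), here is a minimal sketch of how a model fitted on accepted clients only can drift away from one fitted on the whole population:

```python
# Toy sketch of the reject inference setting: the old financing decision
# used some information (U) that the new scorecard does not observe, so
# learning on accepted clients only gives a distorted model. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000
X = rng.normal(size=(n, 2))                        # features available today
U = rng.normal(size=n)                             # info used by the old scorecard only
logit = -2.0 + 1.0 * X[:, 0] + 1.0 * X[:, 1] + 1.5 * U
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))  # 1 = default

accepted = (X[:, 0] + U) < 0.5                     # old acceptance rule

oracle = LogisticRegression().fit(X, y)                      # whole population (unobservable in practice)
through_the_door = LogisticRegression().fit(X[accepted], y[accepted])  # what we can actually fit

print("coefficients, whole population:", oracle.coef_, oracle.intercept_)
print("coefficients, accepted only:   ", through_the_door.coef_, through_the_door.intercept_)
```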
Sometimes you have continuous features but want to use an algorithm that only deals with discrete values. Or you want to build a segmentation. Or, like me, you're ordered to discretize your features. What techniques are out there to do just that?
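Not the methods studied in the post, but as a quick baseline, here is a sketch (synthetic data, made-up cut-offs) of the two usual starting points: unsupervised quantile binning and supervised cut points taken from a shallow decision tree:

```python
# Two common discretization baselines: unsupervised equal-frequency binning
# and supervised cuts extracted from a shallow decision tree. Synthetic data.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
income = rng.lognormal(mean=10, sigma=0.5, size=1000)                  # continuous feature
default = rng.binomial(1, 1 / (1 + np.exp((income - 22000) / 5000)))   # toy binary target

# 1) Equal-frequency (quantile) binning: purely unsupervised.
income_quartile = pd.qcut(income, q=4, labels=False)

# 2) Supervised cuts: a depth-2 tree picks thresholds that separate the target.
tree = DecisionTreeClassifier(max_depth=2).fit(income.reshape(-1, 1), default)
thresholds = sorted(t for t in tree.tree_.threshold if t != -2)        # -2 marks leaf nodes
income_tree_bin = np.digitize(income, thresholds)
```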
The additivity assumption of logistic regression does not allow for "compound" effects of predictors: e.g. if a smoking habit and a fatty diet each double the risk of contracting a disease, the model implicitly assumes that this risk is quadrupled when both predictors are present, which is rarely the case. Effectively selecting pairs of predictors that "interact" is the subject of this post.
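A minimal sketch of what "adding a pairwise interaction" means in practice, on synthetic data (the `smoker`/`fatty_diet`/`disease` names are invented for the example; this is not the selection method discussed in the post):

```python
# Without an interaction term, logistic regression forces the joint effect
# of two binary predictors to be the product of their individual odds
# ratios; adding the smoker:fatty_diet term lets the data say otherwise.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 10_000
df = pd.DataFrame({
    "smoker": rng.binomial(1, 0.3, n),
    "fatty_diet": rng.binomial(1, 0.4, n),
})
# True log-odds with a non-additive joint effect.
logit = -2 + 0.7 * df.smoker + 0.7 * df.fatty_diet - 0.6 * df.smoker * df.fatty_diet
df["disease"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

additive = smf.logit("disease ~ smoker + fatty_diet", data=df).fit(disp=0)
with_pair = smf.logit("disease ~ smoker * fatty_diet", data=df).fit(disp=0)
print(additive.params)
print(with_pair.params)   # the smoker:fatty_diet coefficient captures the deviation from additivity
```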
In order to address different markets or partners, financial institutions often segment their clients in a tree-like structure. At its leaves, clients applying for a particular product might provide different or additional data, such as the brand of the car they intend to purchase, which we can use to assess their creditworthiness; hence all such clients get a dedicated scorecard. This segmentation is rarely reassessed, so it is likely suboptimal. In this post, we examine current solutions for building logistic regression trees.
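As a point of comparison, the current practice the post starts from can be caricatured as fitting one independent logistic regression per pre-defined segment. A hedged sketch (column names are invented):

```python
# Naive baseline for a segmented scorecard: one independent logistic
# regression per pre-defined segment (tree leaf). Column names are invented.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def fit_segmented_scorecards(df, segment_col, feature_cols, target_col):
    """Fit an independent logistic regression on each segment."""
    return {
        seg: LogisticRegression(max_iter=1000).fit(g[feature_cols], g[target_col])
        for seg, g in df.groupby(segment_col)
    }

def score(models, df, segment_col, feature_cols):
    """Route each client to the scorecard of their segment."""
    proba = pd.Series(index=df.index, dtype=float)
    for seg, g in df.groupby(segment_col):
        proba.loc[g.index] = models[seg].predict_proba(g[feature_cols])[:, 1]
    return proba
```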
Clusters are in fashion, mostly for good reasons. We'll set up a cluster of Raspberry Pis running Hadoop and Spark, train a machine learning model on distributed data, and deploy this model to production using Kubernetes.
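For a taste of what the cluster will run, here is a minimal PySpark training job (the HDFS paths and column names below are placeholders; the post covers the actual setup and the Kubernetes deployment):

```python
# Minimal PySpark job of the kind the cluster could run: read a CSV from
# HDFS, assemble features, fit a logistic regression, save the model.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("raspi-scoring").getOrCreate()

df = spark.read.csv("hdfs:///data/clients.csv", header=True, inferSchema=True)
assembler = VectorAssembler(inputCols=["age", "income", "seniority"],
                            outputCol="features")
model = LogisticRegression(labelCol="default", featuresCol="features") \
    .fit(assembler.transform(df))
model.save("hdfs:///models/scorecard")
spark.stop()
```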
Some material for the few courses I taught / teach at University of Lille (incl. DUT STID and Polytech' Lille).
My public software production is available through my Github profile.
Though I'm usually not very artistic, I have developed an interest in photography over the years (maybe for its technical aspects!). My favorite shots so far: