lrtree.discretization
Implements data preprocessing (merging and discretization).
- class lrtree.discretization.Processing(target: str, discretize: bool = False, merge_threshold: float = 0.2)
Encapsulates information necessary to discretize / merge and reapply to some new data
Methods
- fit(X, categorical): Fits the preprocessing.
- fit_transform(X, categorical): Calls fit and then transform.
- transform(X): Preprocesses validation/test data in the same way as the previously seen training data.
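The fit / transform split listed above follows a common pattern: parameters are learned once on training data, then reapplied unchanged to new data. The sketch below illustrates that pattern only. `ToyDiscretizer`, its `n_bins` parameter and its quantile-based cut points are hypothetical stand-ins chosen for illustration; they are not lrtree's `Processing` class and do not reflect how it computes its cuts.

```python
from bisect import bisect_right
from statistics import quantiles


class ToyDiscretizer:
    """Hypothetical stand-in for illustration; NOT lrtree's Processing class."""

    def __init__(self, n_bins=4):
        self.n_bins = n_bins
        self.cuts_ = None  # learned in fit, reused in transform

    def fit(self, values):
        # Learn the cut points from the training data only
        # (assumption: quantile cuts; lrtree may use a different rule).
        self.cuts_ = quantiles(values, n=self.n_bins)
        return self

    def transform(self, values):
        # Reapply the stored cut points; nothing is re-learned here.
        if self.cuts_ is None:
            raise RuntimeError("call fit before transform")
        return [bisect_right(self.cuts_, v) for v in values]

    def fit_transform(self, values):
        # Same contract as described above: fit, then transform.
        return self.fit(values).transform(values)


train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
test = [0.5, 5.0, 100.0]

binner = ToyDiscretizer(n_bins=4)
train_bins = binner.fit_transform(train)
test_bins = binner.transform(test)  # uses the cuts learned on train
```

Values in `test` that fall outside the training range simply land in the first or last bin, because the cut points are fixed at fit time.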
Functions
- Computes the cut values on x that minimize the entropy of y.
- Discretizes the continuous variable X[var] against the target value var_predite, using the MDLP (Minimum Description Length Principle) criterion.
- Computes the entropy of the variable.
- Handles extreme values (e.g. NaN or unfilled entries). Optionally creates a column flagging which values were missing.
- Finds the best place to split x.
- GreenClust algorithm to group modalities.
- Chi2 independence algorithm to group modalities.
- Recursive function that returns the cuts_points.
- Decides whether target should be cut at cut_idx, given the entropy expected after the cut.
- Processes the data by handling extreme values and categorical variables, and by normalizing.
- Processes the data and the test data by handling extreme values and categorical variables, and by normalizing. Returns the processed data and the column labels.
- Processes the test data by handling extreme values and categorical variables, and by normalizing. Returns the processed data.
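Several of the functions summarized above revolve around entropy-based discretization: computing an entropy, finding the best place to split x, and deciding whether a cut is worth keeping. The sketch below shows one standard way to do this, using the Fayyad-Irani MDLP acceptance rule. The function names `entropy`, `best_split` and `mdlp_accepts` are assumptions made for this example and are not lrtree's functions; the module's own implementation may differ in details such as tie handling or the exact stopping rule.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(x, y):
    """Return (cut_value, weighted_entropy) for the best binary cut, or None."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    n = len(xs)
    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no cut is possible between two equal values
        left, right = ys[:i], ys[i:]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or h < best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, h)  # cut at the midpoint
    return best


def mdlp_accepts(y, left, right):
    """Fayyad-Irani MDLP rule: keep the cut only if the gain beats the cost."""
    n = len(y)
    h, h_l, h_r = entropy(y), entropy(left), entropy(right)
    gain = h - (len(left) * h_l + len(right) * h_r) / n
    k, k_l, k_r = len(set(y)), len(set(left)), len(set(right))
    delta = log2(3 ** k - 2) - (k * h - k_l * h_l - k_r * h_r)
    return gain > (log2(n - 1) + delta) / n


x = [1, 2, 3, 4, 10, 11, 12, 13]
y_clean = [0, 0, 0, 0, 1, 1, 1, 1]   # classes separate cleanly around x = 7
y_noisy = [0, 1, 0, 1, 0, 1, 0, 1]   # halves have the same class mix

cut, h_after = best_split(x, y_clean)
keep_clean = mdlp_accepts(y_clean, y_clean[:4], y_clean[4:])
keep_noisy = mdlp_accepts(y_noisy, y_noisy[:4], y_noisy[4:])
```

A recursive discretizer would typically call `best_split` on a segment, keep the cut if `mdlp_accepts` returns true, and then repeat on the two resulting halves.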
Classes
- Processing: Encapsulates information necessary to discretize / merge and reapply to some new data