lrtree
This module is dedicated to logistic regression trees.
- class lrtree.Lrtree(algo: str = 'SEM', test: bool = False, validation: bool = False, criterion: str = 'bic', ratios: tuple = (0.7,), class_num: int = 10, max_iter: int = 100, data_treatment: bool = False, discretization: bool = False, leaves_as_segment: bool = False, early_stopping=False, burn_in: int = 30)
The class implements a supervised method based on logistic regression trees. Its attributes:
- test
- Type: bool
Boolean (T/F) specifying whether a test set is required. If True, the provided data is split so that 20% of the observations form a test set, and the reported performance is the Gini index on the test set.
- validation
- Type: bool
Boolean (T/F) specifying whether a validation set is required. If True, the provided data is split so that 20% of the observations form a validation set, and the reported performance is the Gini index on the validation set (if test=False). The quality of the model at each step is then evaluated using the Gini index on the validation set, so criterion must be set to “gini”.
- criterion
- Type: str
The criterion used to assess the goodness of fit of the model: “bic” or “aic” if there is no validation set, otherwise “gini”.
- max_iter
- Type: int
Number of MCMC steps to perform. More steps generally give better results, but computation time can increase dramatically; running several separate MCMC chains may be a better use of the same budget.
- class_num
- Type: int
Number of initial segments.
- criterion_iter
- Type
The values of the optimized criterion over the iterations.
- best_link
The best decision tree.
- best_reglog
- Type: list
The list of the best logistic regressions, one per segment (found with best_link).
- ratios
- Type: tuple
The float ratios used to split the dataset into test and validation sets.
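The criteria named above can be illustrated with a minimal, library-independent sketch. It assumes the textbook definitions of AIC and BIC and the usual credit-scoring definition of the Gini index as 2 × AUC − 1; the function names are hypothetical, and the sign conventions used internally by lrtree may differ.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion (textbook form, lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (textbook form, lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count for one half.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    """Gini index, assumed here to be 2 * AUC - 1."""
    return 2 * auc(y_true, scores) - 1
```

AIC and BIC only need the fitted log-likelihood, which is why they can be used without a held-out set, whereas the Gini index is computed from scores on held-out observations.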
Methods
- fit(X, y[, solver, nb_init, tree_depth, ...]): Fits the Lrtree object.
- generate_data(n, d[, seed, theta]): Generates some toy continuous data that gets discretized, and a label is drawn from a logistic regression given the discretized features.
- precision(X_test, y_test): Scores the precision of the prediction on the test set.
- predict(X): Predicts the labels for new values using a previously fitted Lrtree object.
- predict_proba(X): Predicts the probability of the labels for new values using a previously fitted Lrtree object.
- fit(X, y, solver: str = 'lbfgs', nb_init: int = 1, tree_depth: int = 10, min_impurity_decrease: float = 0.0, optimal_size: bool = True, tol: float = 0.005, categorical=None)
Fits the Lrtree object.
- Parameters
X (numpy.ndarray) – array_like of shape (n_samples, n_features). Feature matrix to fit on, where n_samples is the number of samples and n_features is the number of features
y (numpy.ndarray) – Binary (0/1) labels of the observations, as a numeric numpy array of the same length as X
solver (str) – sklearn’s solver for LogisticRegression (default ‘lbfgs’)
nb_init (int) – Number of different random initializations
tree_depth (int) – Maximum depth of the tree used
min_impurity_decrease (float) – Minimum impurity decrease required to split a node of the decision tree
optimal_size (bool) – Whether to use the tree parameters, or to take the optimal tree (used only with a validation set)
tol (float) – Tolerance used to decide whether the criterion has improved; used for early stopping
categorical (list) – List of names of categorical features
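A minimal usage sketch, built only from the signatures documented on this page, might look as follows. The wrapper function fit_toy_model and the chosen argument values are hypothetical, and the sketch assumes that the arrays returned by generate_data can be passed directly to fit; it has not been checked against a particular lrtree release.

```python
def fit_toy_model(n=1000, d=3, seed=1):
    """Fit an Lrtree on toy data (hypothetical helper, not part of lrtree)."""
    # Imported here so that defining the helper does not require lrtree.
    from lrtree import Lrtree

    # generate_data is documented as a static method returning
    # the generated x and y, the coefficient theta and a BIC value.
    X, y, theta, bic = Lrtree.generate_data(n, d, seed=seed)

    # Default setup without test or validation set, so criterion is "bic".
    model = Lrtree(criterion="bic", class_num=4, max_iter=50)
    model.fit(X, y, tree_depth=2)

    labels = model.predict(X)
    probabilities = model.predict_proba(X)
    return model, labels, probabilities
```

Calling fit_toy_model() requires the lrtree package to be installed; with validation=True, the criterion would have to be “gini” instead, as described in the attributes.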
- static generate_data(n: int, d: int, seed=None, theta: Optional[ndarray] = None) → Tuple[ndarray, ndarray, ndarray, float]
Generates some toy continuous data that gets discretized, and a label is drawn from a logistic regression given the discretized features.
- Parameters
n (int) – Number of observations to draw.
d (int) – Number of features to draw.
theta (numpy.ndarray) – Logistic regression coefficients to use (if None, a built-in default is used).
seed (int) – numpy random seed
- Returns
Generated data x and y, coefficient theta, and BIC
- Return type
Tuple[ndarray, ndarray, ndarray, float]
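The generation scheme described for generate_data (continuous features, discretization, then a label drawn from a logistic regression on the discretized features) can be sketched in pure Python. This is an illustration under stated assumptions, not the library's implementation: the function generate_toy_data, the uniform features, the equal-width bins and the Gaussian coefficients are all hypothetical choices.

```python
import math
import random


def generate_toy_data(n, d, n_bins=3, seed=None):
    """Draw n observations with d features and a binary label (illustrative only)."""
    rng = random.Random(seed)
    # One coefficient per (feature, bin): a stand-in for theta.
    theta = [[rng.gauss(0.0, 1.0) for _ in range(n_bins)] for _ in range(d)]
    X, y = [], []
    for _ in range(n):
        # Continuous features, uniform on [0, 1).
        row = [rng.random() for _ in range(d)]
        # Equal-width discretization of each feature into n_bins levels.
        bins = [min(int(v * n_bins), n_bins - 1) for v in row]
        # Logistic regression on the discretized features.
        logit = sum(theta[j][b] for j, b in enumerate(bins))
        p = 1.0 / (1.0 + math.exp(-logit))
        X.append(row)
        # Bernoulli draw of the label with success probability p.
        y.append(1 if rng.random() < p else 0)
    return X, y, theta
```

Because the label depends on the features only through their bins, a model that recovers the right discretization can in principle fit the data with a plain logistic regression.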
- precision(X_test: ndarray, y_test: ndarray) → float
Scores the precision of the prediction on the test set
- Parameters
X_test (numpy.ndarray) – array_like of shape (n_samples, n_features). Features used to predict the values of y
y_test (numpy.ndarray) – array_like of shape (n_samples, 1). True values of y that the predictions are compared against
- Returns
precision
- Return type
float
- predict(X: ndarray) → ndarray
Predicts the labels for new values using a previously fitted Lrtree object
- Parameters
X (numpy.ndarray) – array_like of shape (n_samples, n_features) Vector to be scored, where n_samples is the number of samples and n_features is the number of features
- predict_proba(X: ndarray) → ndarray
Predicts the probability of the labels for new values using a previously fitted Lrtree object
- Parameters
X (numpy.ndarray) – array_like of shape (n_samples, n_features) Vector to be scored, where n_samples is the number of samples and n_features is the number of features
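Conceptually, prediction with a logistic regression tree routes each observation to a segment with the decision tree (best_link) and then applies that segment's logistic regression (best_reglog). The sketch below illustrates this idea with a hand-written one-split tree; the split, the coefficients, the 0.5 decision threshold and all names are hypothetical and do not reflect the library's internals.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted model: one split on feature 0 and
# one logistic regression per resulting segment (leaf).
SPLIT_FEATURE, SPLIT_THRESHOLD = 0, 0.5
LEAF_MODELS = {
    "left": {"intercept": -1.0, "coef": [2.0, 0.0]},
    "right": {"intercept": 0.5, "coef": [0.0, -1.5]},
}


def leaf_of(x):
    """Route an observation to its segment, as the decision tree would."""
    return "left" if x[SPLIT_FEATURE] <= SPLIT_THRESHOLD else "right"


def predict_proba_one(x):
    """Probability of label 1 from the logistic regression of x's segment."""
    model = LEAF_MODELS[leaf_of(x)]
    z = model["intercept"] + sum(c * v for c, v in zip(model["coef"], x))
    return sigmoid(z)


def predict_one(x, threshold=0.5):
    """Label obtained by thresholding the probability (assumed rule)."""
    return int(predict_proba_one(x) >= threshold)
```

For instance, predict_proba_one([0.5, 1.0]) falls in the left segment and uses only that segment's coefficients, while an observation with a first feature above 0.5 is scored by the right segment's regression.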
Classes
- Lrtree: The class implements a supervised method based on logistic regression trees.
Exceptions
- Exception class to raise if the estimator is used before fitting.
Modules
- implements data preprocessing (merging and discretization)
- fit module for the Lrtree class
- generate_data module for the Lrtree class: generating some data to test the algorithm on.
- implements segment-specific, possibly single-class, logistic regression
- Predict, predict_proba and precision methods for the Lrtree class