RFNodeTransformer
sknnr.transformers.RFNodeTransformer
RFNodeTransformer(n_estimators: int = 50, criterion_reg: Literal['squared_error', 'absolute_error', 'friedman_mse', 'poisson'] = 'squared_error', criterion_clf: Literal['gini', 'entropy', 'log_loss'] = 'gini', max_depth: int | None = None, min_samples_split: int | float = 2, min_samples_leaf: int | float = 5, min_weight_fraction_leaf: float = 0.0, max_features_reg: Literal['sqrt', 'log2'] | int | float | None = 1.0, max_features_clf: Literal['sqrt', 'log2'] | int | float | None = 'sqrt', max_leaf_nodes: int | None = None, min_impurity_decrease: float = 0.0, bootstrap: bool = True, oob_score: bool | Callable = False, n_jobs: int | None = None, random_state: int | RandomState | None = None, verbose: int = 0, warm_start: bool = False, class_weight_clf: Literal['balanced', 'balanced_subsample'] | dict[str, float] | list[dict[str, float]] | None = None, ccp_alpha: float = 0.0, max_samples: int | float | None = None, monotonic_cst: list[int] | None = None)
Bases: TreeNodeTransformer
Transformer to capture node indexes for samples across multiple random forests.
A random forest is fit to each y target in the training set using either
scikit-learn's RandomForestRegressor or RandomForestClassifier. The
transformation captures the node indexes for each tree in each forest for
each training or new sample.
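As a rough illustration of what capturing node indexes means, the sketch below fits one scikit-learn forest per target directly and stacks the leaf index that each sample reaches in every tree, using the forests' `apply` method. The toy data, `n_estimators=5`, and the `node_indexes` helper are assumptions for illustration. This is not sknnr's implementation, which may differ in details such as column ordering.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Toy data: 20 samples, 3 features, one numeric and one categorical target
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y_numeric = 2.0 * X[:, 0] + rng.normal(size=20)
y_categorical = np.where(X[:, 1] > 0, "high", "low")

# One forest per target, chosen by the target's type
forests = [
    RandomForestRegressor(n_estimators=5, random_state=0).fit(X, y_numeric),
    RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y_categorical),
]

def node_indexes(forests, X):
    """Hypothetical helper: leaf index reached in every tree of every forest."""
    # Each forest.apply(X) returns an (n_samples, n_estimators) array of leaf indexes
    return np.hstack([forest.apply(X) for forest in forests])

nodes = node_indexes(forests, X)
print(nodes.shape)  # (20, 10): 2 forests x 5 trees per sample
```

The same helper works for new samples because `apply` only routes samples through already-fitted trees.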
The particular random forest type used for each target is determined by the
data type of the target. If the target is numeric (e.g. int or float),
a RandomForestRegressor is used. If the target is categorical (e.g.
str or pd.Categorical), a RandomForestClassifier is used. Targets are
automatically promoted to the minimum numpy dtype that safely represents
all elements.
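The dtype-based dispatch can be sketched with a small hypothetical helper. It assumes that promotion behaves like `np.asarray` (which picks one common dtype for all elements) and that pandas categoricals are always treated as classification targets. Both are simplifying assumptions, not a description of sknnr's internal code.

```python
import numpy as np
import pandas as pd

def infer_forest_type(target) -> str:
    """Hypothetical helper: choose the forest type from a target's dtype."""
    # Assumption: pandas categoricals are treated as classification targets
    if isinstance(getattr(target, "dtype", None), pd.CategoricalDtype):
        return "classification"
    # np.asarray promotes all elements to a single common numpy dtype
    values = np.asarray(target)
    if np.issubdtype(values.dtype, np.number):
        return "regression"
    return "classification"

examples = {
    "ints": [1, 2, 3],
    "mixed numbers": [1, 2.5, 3],
    "strings": ["oak", "pine", "fir"],
    "categorical": pd.Categorical(["oak", "pine", "oak"]),
}
for name, target in examples.items():
    print(name, infer_forest_type(target))
```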
This transformer is intended to be used in conjunction with RFNNRegressor,
which captures the similarity between the node indexes of training and inference
data and makes predictions using nearest neighbors.
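A minimal sketch of the kind of similarity a downstream estimator can compute from these node indexes is shown below. It measures the fraction of trees in which a query sample and a reference sample land in the same node, then picks the most similar references. The helper names and the toy node arrays are hypothetical. This unweighted version is an illustration only and is not presented as RFNNRegressor's exact distance metric.

```python
import numpy as np

def node_similarity(query_nodes: np.ndarray, ref_nodes: np.ndarray) -> np.ndarray:
    """Fraction of trees in which each query sample shares a node with each reference."""
    # Broadcast to (n_query, n_ref, n_trees) and compare node indexes tree by tree
    matches = query_nodes[:, None, :] == ref_nodes[None, :, :]
    return matches.mean(axis=2)

def nearest_references(query_nodes: np.ndarray, ref_nodes: np.ndarray, k: int = 2) -> np.ndarray:
    """Indexes of the k most similar reference samples for each query sample."""
    sim = node_similarity(query_nodes, ref_nodes)
    # Sort by descending similarity; a stable sort keeps ties in reference order
    return np.argsort(-sim, axis=1, kind="stable")[:, :k]

# Hypothetical node indexes: 4 reference samples and 2 query samples over 3 trees
ref_nodes = np.array([[1, 4, 2], [1, 4, 3], [5, 6, 2], [5, 7, 8]])
query_nodes = np.array([[1, 4, 2], [5, 7, 2]])

print(node_similarity(query_nodes, ref_nodes))
print(nearest_references(query_nodes, ref_nodes, k=2))
```

Comparing column by column is what makes this valid: node numbers are only meaningful within a single tree.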
See sklearn.ensemble.RandomForestRegressor and
sklearn.ensemble.RandomForestClassifier for more detail on available
parameters. All parameters are passed through to these respective random
forest estimators for each random forest being built. Note that some
parameters (e.g. criterion and max_features) are specified separately
for regression and classification and have _reg and _clf suffixes.
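The suffix convention can be illustrated with a small routing function. The sketch below is a hypothetical example of how `_reg` and `_clf` parameters could be mapped onto the underlying scikit-learn keyword names, with unsuffixed parameters shared by both forest types. It is not sknnr's actual code.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

def split_forest_params(params):
    """Hypothetical helper: route _reg/_clf parameters to the matching forest type."""
    reg_params, clf_params = {}, {}
    for name, value in params.items():
        if name.endswith("_reg"):
            reg_params[name[: -len("_reg")]] = value
        elif name.endswith("_clf"):
            clf_params[name[: -len("_clf")]] = value
        else:
            # Unsuffixed parameters are shared by both forest types
            reg_params[name] = value
            clf_params[name] = value
    return reg_params, clf_params

params = {
    "n_estimators": 50,
    "criterion_reg": "squared_error",
    "criterion_clf": "gini",
    "max_features_reg": 1.0,
    "max_features_clf": "sqrt",
    "class_weight_clf": None,
}
reg_params, clf_params = split_forest_params(params)

# The stripped names are valid scikit-learn keyword arguments
regressor = RandomForestRegressor(**reg_params)
classifier = RandomForestClassifier(**clf_params)
```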
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_estimators` | `int` | The number of trees in each random forest. | `50` |
| `criterion_reg` | `{"squared_error", "absolute_error", "friedman_mse", "poisson"}` | The function to measure the quality of a split for `RandomForestRegressor` objects. | `"squared_error"` |
| `criterion_clf` | `{"gini", "entropy", "log_loss"}` | The function to measure the quality of a split for `RandomForestClassifier` objects. | `"gini"` |
| `max_depth` | `int` or `None` | The maximum depth of the tree. | `None` |
| `min_samples_split` | `int` or `float` | The minimum number of samples required to split an internal node. | `2` |
| `min_samples_leaf` | `int` or `float` | The minimum number of samples required to be at a leaf node. | `5` |
| `min_weight_fraction_leaf` | `float` | The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. | `0.0` |
| `max_features_reg` | `{"sqrt", "log2"}`, `int`, `float` or `None` | The number of features to consider when looking for the best split for `RandomForestRegressor` objects. | `1.0` |
| `max_features_clf` | `{"sqrt", "log2"}`, `int`, `float` or `None` | The number of features to consider when looking for the best split for `RandomForestClassifier` objects. | `"sqrt"` |
| `max_leaf_nodes` | `int` or `None` | Grow trees with `max_leaf_nodes` in best-first fashion. | `None` |
| `min_impurity_decrease` | `float` | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. | `0.0` |
| `bootstrap` | `bool` | Whether bootstrap samples are used when building trees. | `True` |
| `oob_score` | `bool` or callable | Whether to use out-of-bag samples to estimate the generalization score. | `False` |
| `n_jobs` | `int` or `None` | The number of jobs to run in parallel. | `None` |
| `random_state` | `int`, `RandomState` instance or `None` | Controls both the randomness of the bootstrapping of the samples used when building trees (if `bootstrap=True`) and the sampling of the features to consider when looking for the best split at each node (if `max_features < n_features`). | `None` |
| `verbose` | `int` | Controls the verbosity when fitting and predicting. | `0` |
| `warm_start` | `bool` | When set to `True`, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest. | `False` |
| `class_weight_clf` | `{"balanced", "balanced_subsample"}`, `dict`, `list` of `dict` or `None` | Weights associated with classes in the form `{class_label: weight}`. If not given, all classes are assumed to have weight one. | `None` |
| `ccp_alpha` | non-negative `float` | Complexity parameter used for Minimal Cost-Complexity Pruning. | `0.0` |
| `max_samples` | `int`, `float` or `None` | If `bootstrap` is `True`, the number of samples to draw from `X` to train each base estimator. | `None` |
| `monotonic_cst` | array-like of `int` of shape `(n_features,)` or `None` | Indicates the monotonicity constraint to enforce on each feature. | `None` |
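The two `max_features_*` defaults differ: by default, regression forests consider every feature at each split (`1.0`), while classification forests consider roughly the square root of the feature count (`"sqrt"`). The helper below is a re-implementation, for illustration only, of how scikit-learn's tree estimators convert a `max_features` value into a per-split feature count. The function name is hypothetical and it is not part of sknnr.

```python
import math
import numbers

def resolve_max_features(max_features, n_features: int) -> int:
    """Hypothetical helper: features examined per split, following scikit-learn's rules."""
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if max_features is None:
        return n_features
    if isinstance(max_features, numbers.Integral):
        return max_features
    # Floats are interpreted as a fraction of the available features
    return max(1, int(max_features * n_features))

n_features = 10
for setting in (1.0, "sqrt", "log2", None, 0.5, 4):
    print(repr(setting), "->", resolve_max_features(setting, n_features))
```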
Attributes:
| Name | Type | Description |
|---|---|---|
| `n_features_in_` | `int` | Number of features seen during `fit`. |
| `feature_names_in_` | `ndarray` of shape (`n_features_in_`,) | Names of features seen during `fit`. Defined only when `X` has feature names that are all strings. |
| `estimator_type_dict_` | `dict[str, str]` | Dictionary mapping target names to their random forest type (`"regression"` or `"classification"`). |
| `estimators_` | `list` of `RandomForestRegressor` or `RandomForestClassifier` | The random forests associated with each target in `y`. |
| `n_forests_` | `int` | The number of forests (i.e. targets) in the ensemble. Equal to `len(estimators_)`. |
| `n_trees_per_iteration_` | `list[int]` | The number of trees per iteration for each forest. Set to 1 for all random forest estimators. |
tree_weights_ |
list with length `n_forests_` of ndarrays of shape
|
( |