GNNRegressor
sknnr.GNNRegressor
GNNRegressor(n_neighbors: int = 5, *, n_components: int | None = None, weights: Literal['uniform', 'distance'] | Callable = 'uniform', algorithm: Literal['auto', 'ball_tree', 'kd_tree', 'brute'] = 'auto', leaf_size: int = 30, p: int = 2, metric: str | Callable | DistanceMetric = 'minkowski', metric_params: dict | None = None, n_jobs: int | None = None)
Bases: YFitMixin, OrdinationKNeighborsRegressor
Regression using Gradient Nearest Neighbor (GNN) imputation.
The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set, with distances calculated in transformed Canonical Correspondence Analysis (CCA) space.
See sklearn.neighbors.KNeighborsRegressor for more information on parameters and implementation.
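As a rough illustration of the "local interpolation" step described above, the sketch below predicts a target from the k nearest training points using either uniform or inverse-distance weights. It assumes the features have already been projected into CCA space and that distances are Euclidean; `knn_predict` and the sample data are hypothetical and are not part of sknnr.

```python
import math


def knn_predict(train_X, train_y, query, k=2, weights="uniform"):
    """Predict one target as a weighted mean of the k nearest targets."""
    # Euclidean distance from the query to every training point.
    dists = [math.dist(x, query) for x in train_X]
    # Positions of the k closest training points.
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    if weights == "uniform":
        w = [1.0] * len(nearest)
    elif any(dists[i] == 0 for i in nearest):
        # An exact match takes all of the weight.
        w = [1.0 if dists[i] == 0 else 0.0 for i in nearest]
    else:
        # Inverse-distance weights: closer neighbors count for more.
        w = [1.0 / dists[i] for i in nearest]
    return sum(wi * train_y[i] for wi, i in zip(w, nearest)) / sum(w)


# Hypothetical training data, already in the transformed space.
train_X = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
train_y = [10.0, 20.0, 40.0]

uniform_pred = knn_predict(train_X, train_y, (0.25, 0.0), k=2)
distance_pred = knn_predict(train_X, train_y, (0.25, 0.0), k=2, weights="distance")
```

With uniform weights the two nearest targets are simply averaged; with distance weights the prediction is pulled toward the closer neighbor.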
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_neighbors` | `int` | Number of neighbors to use by default for … | `5` |
| `n_components` | `int` | Number of components to keep during CCA transformation. If … | `None` |
| `weights` | `{'uniform', 'distance'}` | Weight function used in prediction. | `'uniform'` |
| `algorithm` | `{'auto', 'ball_tree', 'kd_tree', 'brute'}` | Algorithm used to compute the nearest neighbors. | `'auto'` |
| `leaf_size` | `int` | Leaf size passed to … | `30` |
| `p` | `int` | Power parameter for the Minkowski metric. | `2` |
| `metric` | `str` or callable | The distance metric to use for the tree, calculated in CCA space. | `'minkowski'` |
| `metric_params` | `dict` | Additional keyword arguments for the metric function. | `None` |
| `n_jobs` | `int` | The number of parallel jobs to run for neighbors search. | `None` |
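To make the role of `p` concrete, here is a minimal sketch of the Minkowski distance in plain Python. With `p=1` it reduces to the Manhattan distance and with `p=2` to the Euclidean distance. The `minkowski` helper is illustrative only and is not the routine the library calls.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski(a, b, p=1)  # sum of absolute coordinate differences
euclidean = minkowski(a, b, p=2)  # straight-line distance
```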
Attributes:
| Name | Type | Description |
|---|---|---|
| `independent_prediction_` | array-like of shape (n_samples, n_outputs) | The independent predictions for each sample in the training set, obtained by calculating … |
| `independent_score_` | `float` | The independent score (i.e. coefficient of determination or R²) for the model, obtained by calculating the average R² across all outputs. |
| `n_features_in_` | `int` | Number of features seen during … |
| `regressor_` | `RawKNNRegressor` | The underlying `RawKNNRegressor` instance. |
| `transformer_` | `CCATransformer` | The fitted CCA transformer used to transform feature data. |
| `y_fit_` | array-like of shape (n_samples, n_targets) | The target matrix seen during fit. Note that … |
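The description of `independent_score_` as an average R² across outputs can be sketched as follows. This is a minimal illustration in plain Python, assuming each output is scored separately and the scores are averaged with equal weight; `r2` and `mean_r2` are hypothetical helpers, not sknnr functions.

```python
def r2(y_true, y_pred):
    """Coefficient of determination for a single output."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_r2(Y_true, Y_pred):
    """Average R² over the columns (outputs) of two row-major matrices."""
    n_outputs = len(Y_true[0])
    scores = [
        r2([row[j] for row in Y_true], [row[j] for row in Y_pred])
        for j in range(n_outputs)
    ]
    return sum(scores) / n_outputs


# Two outputs: the first is predicted perfectly, the second is not.
Y_true = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
Y_pred = [[1.0, 10.0], [2.0, 20.0], [3.0, 20.0]]
score = mean_r2(Y_true, Y_pred)
```

Here the first output scores 1.0 and the second 0.5, so the averaged score is 0.75.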
References
Ohmann JL, Gregory MJ. 2002. Predictive Mapping of Forest Composition and Structure with Direct Gradient Analysis and Nearest Neighbor Imputation in Coastal Oregon, USA. Canadian Journal of Forest Research, 32, 725–741.
Source code in src/sknnr/_base.py
Functions
fit
Fit using transformed feature data. If y_fit is provided, it will be used to fit the transformer.
kneighbors
kneighbors(X: DataLike | None = None, n_neighbors: int | None = None, return_distance: bool = True, return_dataframe_index: bool = False, use_deterministic_ordering: bool = True) -> NDArray[int64] | tuple[NDArray[float64], NDArray[int64]]
Find the K-neighbors of a point or points of transformed feature data and optionally return dataframe indexes rather than array indices when the model was fitted with a dataframe.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | array-like of shape (n_queries, n_features) | The query point or points. Points are first transformed using the fitted transformer. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor. | `None` |
| `n_neighbors` | `int` | Number of neighbors required for each sample. The default is the value passed to the constructor. | `None` |
| `return_distance` | `bool` | Whether or not to return the distances. | `True` |
| `return_dataframe_index` | `bool` | Whether or not to return dataframe indexes instead of array indices. Only applicable if the model was fitted with a dataframe. | `False` |
| `use_deterministic_ordering` | `bool` | Whether to use deterministic ordering of neighbors when distances are nearly identical. If True, neighbors with nearly identical distances (up to DISTANCE_PRECISION_DECIMALS decimal places) are ordered lexicographically by: (1) their scaled and rounded distances, (2) the absolute difference between a query point's row index and the neighbor index (so that a sample, when present, is returned before other equally distant samples), and (3) the neighbor index itself. If False, use the default ordering from … | `True` |
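The three-part lexicographic ordering described for `use_deterministic_ordering` can be sketched with a tuple sort key. This is an illustration under stated assumptions, not the library's implementation: `order_neighbors` is a hypothetical helper, and `DECIMALS = 10` is a placeholder, since the actual value of `DISTANCE_PRECISION_DECIMALS` is not given here.

```python
DECIMALS = 10  # hypothetical stand-in for DISTANCE_PRECISION_DECIMALS


def order_neighbors(query_row, neighbor_idx, neighbor_dist, decimals=DECIMALS):
    """Sort one query's neighbors by the three-part lexicographic key."""

    def key(pair):
        idx, dist = pair
        return (
            round(dist * 10**decimals),  # (1) scaled and rounded distance
            abs(query_row - idx),        # (2) gap between query row and neighbor index
            idx,                         # (3) the neighbor index itself
        )

    ordered = sorted(zip(neighbor_idx, neighbor_dist), key=key)
    return [idx for idx, _ in ordered]


# Query at row 2: rows 5, 2 and 0 differ only by floating-point noise,
# so they tie on (1) and are separated by (2); row 7 is clearly farther.
result = order_neighbors(2, [5, 2, 0, 7], [1e-14, 0.0, 3e-15, 0.5])
```

Because row 2 has a zero index gap to itself, it is returned first among the tied neighbors, which matches the note that a sample, when present, precedes other equally distant samples.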
Returns:
| Name | Type | Description |
|---|---|---|
| `neigh_dist` | array-like of shape (n_queries, n_neighbors) | Array representing the lengths to points, only present if `return_distance=True`. |
| `neigh_ind` | array-like of shape (n_queries, n_neighbors) | Array indices or dataframe indexes of the nearest points in the population matrix. |
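The difference between array indices and dataframe indexes in `neigh_ind` can be pictured as a lookup from row positions to index labels. The sketch below is purely conceptual: `to_dataframe_index` and the plot labels are hypothetical, and plain lists stand in for the arrays and dataframe index involved.

```python
def to_dataframe_index(neigh_ind, index_labels):
    """Replace positional neighbor indices with the training index labels."""
    return [[index_labels[i] for i in row] for row in neigh_ind]


# Hypothetical index labels of the dataframe used for fitting.
index_labels = ["plot_a", "plot_b", "plot_c", "plot_d"]
# Positional neighbor indices for two query points, two neighbors each.
neigh_ind = [[2, 0], [1, 3]]
labels = to_dataframe_index(neigh_ind, index_labels)
```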