EuclideanKNNRegressor

sknnr.EuclideanKNNRegressor

EuclideanKNNRegressor(n_neighbors: int = 5, *, weights: Literal['uniform', 'distance'] | Callable = 'uniform', algorithm: Literal['auto', 'ball_tree', 'kd_tree', 'brute'] = 'auto', leaf_size: int = 30, p: int = 2, metric: str | Callable | DistanceMetric = 'minkowski', metric_params: dict | None = None, n_jobs: int | None = None)

Bases: TransformedKNeighborsRegressor

Nearest neighbor regression in an n-dimensional feature space where each feature has been standardized to zero mean and unit variance (mean=0, stdev=1) using a modified StandardScaler that uses N-1 degrees of freedom.

See sklearn.neighbors.KNeighborsRegressor for more information on available parameters for k-neighbors regression used in instantiation.
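The effect of the N-1 degrees of freedom can be illustrated with a small standard-library sketch. This is not the sknnr implementation: `standardize_columns` and its `ddof` argument are hypothetical names used only for illustration. For comparison, scikit-learn's stock `StandardScaler` divides by N (the population standard deviation), which corresponds to `ddof=0` here.

```python
import statistics


def standardize_columns(rows, ddof=1):
    """Center each column and divide it by its standard deviation.

    Hypothetical helper: ddof=1 divides the sum of squares by N-1 (sample
    standard deviation); ddof=0 divides by N (population standard deviation).
    """
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stdevs = [
        (sum((v - m) ** 2 for v in col) / (n - ddof)) ** 0.5
        for col, m in zip(columns, means)
    ]
    return [
        [(v - m) / s for v, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
X_std = standardize_columns(X, ddof=1)

# With ddof=1, each standardized column has mean 0 and sample stdev 1.
col0 = [row[0] for row in X_std]
col1 = [row[1] for row in X_std]
print(statistics.mean(col0), statistics.stdev(col0))
print(statistics.stdev(col1))
```

Because the divisor changes from N to N-1, every standardized value shrinks by the same factor within a column, so the choice matters mainly for small training sets.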

Parameters:

Name	Type	Description	Default
`n_neighbors`	`int`	Number of neighbors to use by default for `kneighbors` queries.	`5`
`weights`	`{'uniform', 'distance'} or callable`	Weight function used in prediction.	`'uniform'`
`algorithm`	`{'auto', 'ball_tree', 'kd_tree', 'brute'}`	Algorithm used to compute the nearest neighbors.	`'auto'`
`leaf_size`	`int`	Leaf size passed to `BallTree` or `KDTree`.	`30`
`p`	`int`	Power parameter for the Minkowski metric.	`2`
`metric`	`str or callable`	The distance metric to use for the tree, calculated in standardized Euclidean space.	`'minkowski'`
`metric_params`	`dict`	Additional keyword arguments for the metric function.	`None`
`n_jobs`	`int`	The number of parallel jobs to run for neighbors search. `None` means 1 unless in a `joblib.parallel_backend` context. `-1` means using all processors.	`None`
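To make the `p` and `weights` parameters concrete, here is a minimal standard-library sketch of the Minkowski distance and of the two built-in weighting schemes as they are defined for scikit-learn's k-neighbors regression. The helper names `minkowski` and `weighted_prediction` are hypothetical and are not part of sknnr; the real estimator works on standardized features, which this sketch skips.

```python
def minkowski(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def weighted_prediction(distances, targets, weights="uniform"):
    """Combine neighbor targets using uniform or inverse-distance weights."""
    if weights == "uniform":
        w = [1.0] * len(distances)
    elif any(d == 0 for d in distances):
        # Exact matches take all of the weight, avoiding division by zero.
        w = [1.0 if d == 0 else 0.0 for d in distances]
    else:
        w = [1.0 / d for d in distances]
    return sum(wi * t for wi, t in zip(w, targets)) / sum(w)


query = [0.0, 0.0]
neighbors = [[3.0, 4.0], [1.0, 0.0]]
targets = [10.0, 20.0]

dists = [minkowski(query, n, p=2) for n in neighbors]
uniform = weighted_prediction(dists, targets, weights="uniform")
inverse = weighted_prediction(dists, targets, weights="distance")
print(dists, uniform, inverse)
```

With `'distance'` weighting the closer neighbor dominates, so the prediction moves toward its target value.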

Attributes:

Name	Type	Description
`independent_prediction_`	`array-like of shape (n_samples, n_outputs)`	The independent predictions for each sample in the training set, obtained by calculating `kneighbors` on the training data itself and calculating predictions based on those neighbors.
`independent_score_`	`float`	The independent score (i.e. coefficient of determination or R²) for the model, obtained by calculating the average R² across all outputs.
`n_features_in_`	`int`	Number of features seen during `fit`.
`regressor_`	`RawKNNRegressor`	The underlying RawKNNRegressor instance.
`transformer_`	`StandardScalerWithDOF`	The fitted transformer used to standardize feature data.
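The two `independent_*` attributes can be pictured with the following standard-library sketch, which assumes uniform weights and unstandardized features for brevity. Each training sample is predicted from its nearest neighbors excluding itself, and R² is then averaged over the output columns. The function names are hypothetical and this is not the library's code.

```python
def independent_predictions(X, y, k):
    """Predict each training sample from its k nearest *other* samples."""
    preds = []
    for i, xi in enumerate(X):
        # Rank all other samples by squared Euclidean distance, then index.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: (sum((a - b) ** 2 for a, b in zip(xi, X[j])), j),
        )[:k]
        preds.append(
            [sum(y[j][c] for j in others) / k for c in range(len(y[0]))]
        )
    return preds


def mean_r2(y_true, y_pred):
    """Average the coefficient of determination over all output columns."""
    scores = []
    for c in range(len(y_true[0])):
        obs = [row[c] for row in y_true]
        est = [row[c] for row in y_pred]
        mean = sum(obs) / len(obs)
        ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
        ss_tot = sum((o - mean) ** 2 for o in obs)
        scores.append(1 - ss_res / ss_tot)
    return sum(scores) / len(scores)


X = [[0.0], [1.0], [2.0], [3.0]]
y = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]

preds = independent_predictions(X, y, k=1)
score = mean_r2(y, preds)
print(preds, score)
```

Excluding each sample from its own neighborhood keeps the score from being trivially perfect when `k=1`.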

Source code in src/sknnr/_base.py

def __init__(
    self,
    n_neighbors: int = 5,
    *,
    weights: Literal["uniform", "distance"] | Callable = "uniform",
    algorithm: Literal["auto", "ball_tree", "kd_tree", "brute"] = "auto",
    leaf_size: int = 30,
    p: int = 2,
    metric: str | Callable | DistanceMetric = "minkowski",
    metric_params: dict | None = None,
    n_jobs: int | None = None,
):
    # Store initialization parameters for the RawKNNRegressor, but do not
    # instantiate it yet.  It will be instantiated during `fit`, after the
    # transformer has been fitted.
    self.n_neighbors = n_neighbors
    self.weights = weights
    self.algorithm = algorithm
    self.leaf_size = leaf_size
    self.p = p
    self.metric = metric
    self.metric_params = metric_params
    self.n_jobs = n_jobs

Attributes

algorithm `instance-attribute`

algorithm = algorithm

leaf_size `instance-attribute`

leaf_size = leaf_size

metric `instance-attribute`

metric = metric

metric_params `instance-attribute`

metric_params = metric_params

n_jobs `instance-attribute`

n_jobs = n_jobs

n_neighbors `instance-attribute`

n_neighbors = n_neighbors

p `instance-attribute`

p = p

regressor_ `instance-attribute`

regressor_: RawKNNRegressor

transformer_ `instance-attribute`

transformer_: TransformerMixin

weights `instance-attribute`

weights = weights

Functions

fit

fit(X: DataLike, y: DataLike) -> Self

Fit using transformed feature data.

Source code in src/sknnr/_base.py

def fit(self, X: DataLike, y: DataLike) -> Self:
    """Fit using transformed feature data."""
    validate_data(self, X=X, y=y, ensure_all_finite=True, multi_output=True)

    # Set the fitted transformer and apply the transformation which serves
    # as input to the KNeighbors regressor.  If estimators derive any custom
    # parameters to pass to the regressor, they should be set as estimator
    # attributes during `_set_fitted_transformer`.
    self._set_fitted_transformer(X, y)

    X_transformed = self.transformer_.transform(X)

    # Initialize and fit the KNeighbors regressor using the transformed data.
    # Override any additional regressor init kwargs provided by subclasses.
    reg_init_kwargs = {
        "n_neighbors": self.n_neighbors,
        "weights": self.weights,
        "algorithm": self.algorithm,
        "leaf_size": self.leaf_size,
        "p": self.p,
        "metric": self.metric,
        "metric_params": self.metric_params,
        "n_jobs": self.n_jobs,
    }
    reg_init_kwargs.update(self._get_additional_regressor_init_kwargs())
    self.regressor_ = RawKNNRegressor(**reg_init_kwargs)
    self.regressor_.fit(X_transformed, y)

    # `X_transformed` is guaranteed to be array-like here, so we can set
    # dataframe indexes from `X` in the regressor if applicable.
    self.regressor_._set_dataframe_index_in(X)

    # Set the number of features to be equal to that of the transformed
    # features
    self.n_features_in_ = self.regressor_.n_features_in_

    # Copy over mixin attributes from the regressor
    self.independent_prediction_ = self.regressor_.independent_prediction_
    self.independent_score_ = self.regressor_.independent_score_
    if hasattr(self.regressor_, "dataframe_index_in_"):
        self.dataframe_index_in_ = self.regressor_.dataframe_index_in_

    return self
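The flow of `fit` (fit the transformer, transform the features, merge keyword arguments, fit the inner regressor) can be mimicked with a toy standard-library sketch. Every class here is a hypothetical stand-in: `RangeScaler` is not the library's scaler, `OneNearestNeighbor` is not `RawKNNRegressor`, and the `"euclidean"` override is an invented value, since the source does not show what subclasses actually return from `_get_additional_regressor_init_kwargs`.

```python
class RangeScaler:
    """Hypothetical transformer: divide each column by its range."""

    def fit(self, X):
        self.ranges_ = [max(col) - min(col) for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v / r for v, r in zip(row, self.ranges_)] for row in X]


class OneNearestNeighbor:
    """Hypothetical inner regressor: return the closest sample's target."""

    def __init__(self, metric="minkowski"):
        # Stored only to show which keyword arguments arrived; unused below.
        self.metric = metric

    def fit(self, X, y):
        self.X_, self.y_ = X, y
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))

        return [
            self.y_[min(range(len(self.X_)), key=lambda j: sq_dist(q, self.X_[j]))]
            for q in X
        ]


class TransformedSketch:
    def __init__(self, metric="minkowski"):
        self.metric = metric

    def _get_additional_regressor_init_kwargs(self):
        return {}

    def fit(self, X, y):
        self.transformer_ = RangeScaler().fit(X)
        X_transformed = self.transformer_.transform(X)
        kwargs = {"metric": self.metric}
        # Subclass-provided kwargs win over the constructor values.
        kwargs.update(self._get_additional_regressor_init_kwargs())
        self.regressor_ = OneNearestNeighbor(**kwargs).fit(X_transformed, y)
        return self

    def predict(self, X):
        # Queries must pass through the same fitted transformer.
        return self.regressor_.predict(self.transformer_.transform(X))


class OverridingSketch(TransformedSketch):
    def _get_additional_regressor_init_kwargs(self):
        return {"metric": "euclidean"}  # invented override for illustration


X = [[0.0, 0.0], [1.0, 1000.0], [10.0, 100.0]]
y = [0.0, 1.0, 2.0]
query = [[1.0, 100.0]]

raw = OneNearestNeighbor().fit(X, y).predict(query)
scaled = TransformedSketch().fit(X, y).predict(query)
print(raw, scaled, OverridingSketch().fit(X, y).regressor_.metric)
```

In this toy data the nearest neighbor changes once the features are rescaled, which is why the inner regressor only ever sees transformed data.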

kneighbors

kneighbors(X: DataLike | None = None, n_neighbors: int | None = None, return_distance: bool = True, return_dataframe_index: bool = False, use_deterministic_ordering: bool = True) -> NDArray[int64] | tuple[NDArray[float64], NDArray[int64]]

Find the K-neighbors of a point or points of transformed feature data and optionally return dataframe indexes rather than array indices when the model was fitted with a dataframe.

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_queries, n_features)`	The query point or points. Points are first transformed using the fitted transformer. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.	`None`
`n_neighbors`	`int`	Number of neighbors required for each sample. The default is the value passed to the constructor.	`None`
`return_distance`	`bool`	Whether or not to return the distances.	`True`
`return_dataframe_index`	`bool`	Whether or not to return dataframe indexes instead of array indices. Only applicable if the model was fitted with a dataframe.	`False`
`use_deterministic_ordering`	`bool`	Whether to use deterministic ordering of neighbors when distances are nearly identical. If True, neighbors with nearly identical distances (up to DISTANCE_PRECISION_DECIMALS decimal places) are ordered lexicographically by: (1) their scaled and rounded distances, (2) the absolute difference between a query point's row index and the neighbor index (so that a sample, when present, is returned before other equally distant samples), and (3) the neighbor index itself. If False, use the default ordering from `KNeighborsRegressor.kneighbors`. See the usage guide for more details.	`True`

Returns:

Name	Type	Description
`neigh_dist`	`array-like of shape (n_queries, n_neighbors)`	Array representing the lengths to points, only present if return_distance=True.
`neigh_ind`	`array-like of shape (n_queries, n_neighbors)`	Array indices or dataframe indexes of the nearest points in the population matrix.
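The three-level tie-breaking rule described for `use_deterministic_ordering` can be sketched as a lexicographic sort on tuples. This is an illustration only: `order_neighbors` is a hypothetical helper, plain rounding stands in for the library's "scaled and rounded" distances, and `decimals=10` is an assumed value, since the source does not state what `DISTANCE_PRECISION_DECIMALS` is.

```python
def order_neighbors(query_index, distances, indices, decimals=10):
    """Order neighbors by (rounded distance, index offset, index)."""
    keyed = [
        (round(d, decimals), abs(query_index - i), i)
        for d, i in zip(distances, indices)
    ]
    # Tuples sort lexicographically, so later fields only break ties.
    return [i for _, _, i in sorted(keyed)]


# Query row 4 has two duplicates (rows 2 and 6) and one distinct neighbor.
distances = [1e-13, 0.0, 0.3, 0.0]
indices = [2, 4, 9, 6]

ordered = order_neighbors(query_index=4, distances=distances, indices=indices)
print(ordered)
```

Row 4 comes first because its index offset is zero, rows 2 and 6 tie on both distance and offset and fall back to index order, and row 9 is last because its distance is genuinely larger.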

Source code in src/sknnr/_base.py

def kneighbors(
    self,
    X: DataLike | None = None,
    n_neighbors: int | None = None,
    return_distance: bool = True,
    return_dataframe_index: bool = False,
    use_deterministic_ordering: bool = True,
) -> NDArray[np.int64] | tuple[NDArray[np.float64], NDArray[np.int64]]:
    """
    Find the K-neighbors of a point or points of transformed feature data
    and optionally return dataframe indexes rather than array indices when
    the model was fitted with a dataframe.

    Parameters
    ----------
    X : array-like of shape (n_queries, n_features), default=None
        The query point or points. Points are first transformed using the
        fitted transformer. If not provided, neighbors of each indexed
        point are returned. In this case, the query point is not
        considered its own neighbor.
    n_neighbors : int, default=None
        Number of neighbors required for each sample. The default is the
        value passed to the constructor.
    return_distance : bool, default=True
        Whether or not to return the distances.
    return_dataframe_index : bool, default=False
        Whether or not to return dataframe indexes instead of array indices.
        Only applicable if the model was fitted with a dataframe.
    use_deterministic_ordering : bool, default=True
        Whether to use deterministic ordering of neighbors when distances
        are nearly identical.  If True, neighbors with nearly identical
        distances (up to DISTANCE_PRECISION_DECIMALS decimal places) are
        ordered lexicographically by:
        (1) their scaled and rounded distances,
        (2) the absolute difference between a query point's row index
            and the neighbor index (so that a sample, when present, is
            returned before other equally distant samples), and
        (3) the neighbor index itself.
        If False, use the default ordering from
        `KNeighborsRegressor.kneighbors`. See the
        [usage guide](`../../../usage/#deterministic-neighbor-ordering`)
        for more details.

    Returns
    -------
    neigh_dist : array-like of shape (n_queries, n_neighbors)
        Array representing the lengths to points, only present if
        return_distance=True.
    neigh_ind : array-like of shape (n_queries, n_neighbors)
        Array indices or dataframe indexes of the nearest points in the
        population matrix.
    """
    X_transformed = self._transform_X(X)
    return self.regressor_.kneighbors(
        X=X_transformed,
        n_neighbors=n_neighbors,
        return_distance=return_distance,
        return_dataframe_index=return_dataframe_index,
        use_deterministic_ordering=use_deterministic_ordering,
    )

predict

predict(X: DataLike) -> NDArray[float64]

Transform `X` using the fitted transformer, then predict with the underlying regressor.

Source code in src/sknnr/_base.py

def predict(self, X: DataLike) -> NDArray[np.float64]:
    X_transformed = self._transform_X(X)
    return self.regressor_.predict(X_transformed)

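score

score(X: DataLike, y: DataLike) -> float

Transform `X` using the fitted transformer, then return the underlying regressor's score for `X` and `y`.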

Source code in src/sknnr/_base.py

def score(self, X: DataLike, y: DataLike) -> float:
    X_transformed = self._transform_X(X)
    return self.regressor_.score(X_transformed, y)
