
RawKNNRegressor

sknnr.RawKNNRegressor

Bases: DFIndexCrosswalkMixin, IndependentPredictorMixin, KNeighborsRegressor

Subclass of sklearn.neighbors.KNeighborsRegressor that supports independent prediction and scoring, and that crosswalks array indices to dataframe indexes.

See sklearn.neighbors.KNeighborsRegressor for more information on available parameters for k-neighbors regression used in instantiation.

Parameters:

n_neighbors : int, default=5
    Number of neighbors to use by default for kneighbors queries.

weights : {'uniform', 'distance'}, default='uniform'
    Weight function used in prediction.

algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'
    Algorithm used to compute the nearest neighbors.

leaf_size : int, default=30
    Leaf size passed to BallTree or KDTree.

p : int, default=2
    Power parameter for the Minkowski metric.

metric : str or callable, default='minkowski'
    The distance metric to use for the tree. Distances are calculated on the raw (untransformed) feature values.

metric_params : dict, default=None
    Additional keyword arguments for the metric function.

n_jobs : int, default=None
    The number of parallel jobs to run for the neighbors search. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.

Attributes:

DISTANCE_PRECISION_DECIMALS : int, class attribute
    Number of decimal places used when rounding scaled distances to ensure deterministic neighbor ordering. Default is 10.

effective_metric_ : str
    The distance metric to use. It will be the same as the metric parameter or a synonym of it, e.g. 'euclidean' if the metric parameter is set to 'minkowski' and the p parameter is set to 2.

effective_metric_params_ : dict
    Additional keyword arguments for the metric function. For most metrics, this will be the same as the metric_params parameter, but it may also contain the p parameter value if the effective_metric_ attribute is set to 'minkowski'.

independent_prediction_ : array-like of shape (n_samples, n_outputs)
    The independent predictions for each sample in the training set. They are obtained by running kneighbors on the training data itself, so that no sample is counted as its own neighbor, and computing predictions from those neighbors.

independent_score_ : float
    The independent score (i.e., the coefficient of determination, R²) for the model, computed as the average R² across all outputs.

n_features_in_ : int
    Number of features seen during fit.

n_samples_fit_ : int
    Number of samples in the fitted data.
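The independent_prediction_ and independent_score_ attributes can be approximated with a short NumPy sketch. It assumes uniform weights, brute-force Euclidean distances, and a 2-D y of shape (n_samples, n_outputs). The helper names independent_predict and mean_r2 are hypothetical and are not part of sknnr. Excluding each sample from its own neighborhood mirrors the kneighbors behavior documented for X=None.

```python
import numpy as np


def independent_predict(X: np.ndarray, y: np.ndarray, k: int = 5) -> np.ndarray:
    """Hypothetical helper: leave-one-out kNN predictions with uniform weights."""
    # Brute-force pairwise Euclidean distances between all training samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    # A sample must never be its own neighbor, so push self-distances to infinity.
    np.fill_diagonal(dist, np.inf)
    # Positions of the k nearest *other* samples for every row (stable for ties).
    neigh_ind = np.argsort(dist, axis=1, kind="stable")[:, :k]
    # Uniform weighting: average the neighbors' targets, per output.
    return y[neigh_ind].mean(axis=1)


def mean_r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Hypothetical helper: R² computed for each output column, then averaged."""
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    # Assumes no output column is constant (ss_tot > 0).
    return float(np.mean(1.0 - ss_res / ss_tot))


X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([[0.0, 5.0], [1.0, 4.0], [2.0, 3.0], [10.0, 1.0]])
pred = independent_predict(X, y, k=1)
score = mean_r2(y, pred)
```

Because no prediction can use its own target value, the resulting score is typically lower than the score from predicting the training data with itself included. That gap is what makes it a useful estimate of out-of-sample performance.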

Attributes

DISTANCE_PRECISION_DECIMALS class-attribute instance-attribute

DISTANCE_PRECISION_DECIMALS = 10
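The sketch below reproduces the scaling step from kneighbors in isolation to show why the rounding is applied to scaled distances. The helper name rounded_scaled is hypothetical. Each row is divided by its largest distance, but never by less than 1.0. As a result, the 10-decimal tolerance is relative for large distances and absolute for small ones.

```python
import numpy as np

DISTANCE_PRECISION_DECIMALS = 10  # value documented for the class attribute


def rounded_scaled(
    neigh_dist: np.ndarray, decimals: int = DISTANCE_PRECISION_DECIMALS
) -> np.ndarray:
    """Hypothetical helper mirroring the scaling and rounding step in kneighbors."""
    # Divide each row by its maximum distance, but never by less than 1.0.
    row_scale = np.maximum(neigh_dist.max(axis=1, keepdims=True), 1.0)
    return np.round(neigh_dist / row_scale, decimals=decimals)


# Small distances: 0.1 + 0.2 differs from 0.3 only by floating-point error.
small = np.array([[0.3, 0.1 + 0.2]])
# Large distances: a 1e-5 difference is negligible relative to 1e6.
large = np.array([[1e6, 1e6 + 1e-5]])

small_rounded = rounded_scaled(small)
large_rounded = rounded_scaled(large)
```

Both pairs collapse to ties after scaling and rounding. Rounding the raw values of the large pair to 10 decimals would keep them distinct, which is why the scale is applied first.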

Functions

fit

fit(X: DataLike, y: DataLike) -> Self

Override fit to set attributes using mixins.

Source code in src/sknnr/_base.py
def fit(self, X: DataLike, y: DataLike) -> Self:
    """Override fit to set attributes using mixins."""
    self._set_dataframe_index_in(X)
    self = super().fit(X, y)
    self._set_independent_prediction_attributes(y)
    return self
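The order of the three calls in fit matters. The dataframe index has to be captured before the base class reduces X to plain array data, and the independent predictions need an already fitted neighbors model. The toy classes below are hypothetical stand-ins, not sknnr's real mixins. They only record the call order so that this sequence is visible.

```python
class ToyKNeighbors:
    """Hypothetical stand-in for KNeighborsRegressor."""

    def fit(self, X, y):
        self.calls.append("base fit")
        # Keep only the values, as scikit-learn's input validation does;
        # any index labels are not retained here.
        self._fit_X = [list(row) for row in X]
        return self


class ToyCrosswalkMixin:
    """Hypothetical stand-in for DFIndexCrosswalkMixin."""

    def _set_dataframe_index_in(self, X):
        self.calls.append("record index")
        # Only dataframe-like inputs (with columns and an index) carry labels.
        if hasattr(X, "columns"):
            self.dataframe_index_in_ = list(X.index)


class ToyIndependentMixin:
    """Hypothetical stand-in for IndependentPredictorMixin."""

    def _set_independent_prediction_attributes(self, y):
        # Independent predictions need the fitted neighbors model.
        if not hasattr(self, "_fit_X"):
            raise RuntimeError("neighbors must be fitted first")
        self.calls.append("independent prediction")


class ToyRegressor(ToyCrosswalkMixin, ToyIndependentMixin, ToyKNeighbors):
    def __init__(self):
        self.calls = []

    def fit(self, X, y):
        # Same sequence as the documented fit override.
        self._set_dataframe_index_in(X)
        super().fit(X, y)  # resolves to ToyKNeighbors.fit via the MRO
        self._set_independent_prediction_attributes(y)
        return self


model = ToyRegressor().fit([[0.0], [1.0]], [0.0, 1.0])
```

Listing the mixins before the base estimator means super().fit skips past them, because they define no fit of their own, and reaches the scikit-learn implementation.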

kneighbors

kneighbors(X: DataLike | None = None, n_neighbors: int | None = None, return_distance: bool = True, return_dataframe_index: bool = False, use_deterministic_ordering: bool = True) -> NDArray[int64] | tuple[NDArray[float64], NDArray[int64]]

Find the K-neighbors of a point or points in the dataset and optionally return dataframe indexes rather than array indices when the model was fitted with a dataframe.

Parameters:

X : array-like of shape (n_queries, n_features), default=None
    The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.

n_neighbors : int, default=None
    Number of neighbors required for each sample. The default is the value passed to the constructor.

return_distance : bool, default=True
    Whether or not to return the distances.

return_dataframe_index : bool, default=False
    Whether or not to return dataframe indexes instead of array indices. Only applicable if the model was fitted with a dataframe.

use_deterministic_ordering : bool, default=True
    Whether to use deterministic ordering of neighbors when distances are nearly identical. If True, neighbors with nearly identical distances (up to DISTANCE_PRECISION_DECIMALS decimal places) are ordered lexicographically by: (1) their scaled and rounded distances, (2) the absolute difference between a query point's row index and the neighbor index (so that a sample, when present, is returned before other equally distant samples), and (3) the neighbor index itself. If False, the default ordering from KNeighborsRegressor.kneighbors is used. See the usage guide for more details.
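The three-key ordering can be checked on a toy result. The sketch below applies the same scaling, rounding, and np.lexsort steps that appear in the kneighbors source to hand-made neigh_dist and neigh_ind arrays. The helper name deterministic_order and the input values are illustrative only.

```python
import numpy as np


def deterministic_order(neigh_dist, neigh_ind, decimals=10):
    """Hypothetical helper reproducing the tie-breaking steps of kneighbors."""
    # Scale each row by its max distance (at least 1.0), then round.
    row_scale = np.maximum(neigh_dist.max(axis=1, keepdims=True), 1.0)
    rounded = np.round(neigh_dist / row_scale, decimals=decimals)
    # Distance between each query's row number and each neighbor index.
    neigh_ind_diff = np.abs(neigh_ind - np.arange(len(neigh_ind))[:, None])
    # np.lexsort treats the LAST key as primary: rounded distance first,
    # then the index difference, then the neighbor index itself.
    order = np.lexsort((neigh_ind, neigh_ind_diff, rounded), axis=1)
    return (
        np.take_along_axis(neigh_dist, order, axis=1),
        np.take_along_axis(neigh_ind, order, axis=1),
    )


neigh_dist = np.array([[0.5, 0.5, 0.9], [2.0, 2.0, 2.0]])
neigh_ind = np.array([[4, 2, 0], [2, 0, 1]])
dist, ind = deterministic_order(neigh_dist, neigh_ind)
```

In the first row, neighbors 4 and 2 are tied on distance, so neighbor 2 comes first because it is closer to query row 0. In the second row (query row 1), neighbor 1 is placed first because its index matches the query's row index. Neighbors 2 and 0 are both one row away, so the smaller index, 0, comes next.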

Returns:

neigh_dist : array-like of shape (n_queries, n_neighbors)
    Array representing the lengths to points, only present if return_distance=True.

neigh_ind : array-like of shape (n_queries, n_neighbors)
    Array indices or dataframe indexes of the nearest points in the population matrix.
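When return_dataframe_index=True, the crosswalk is a single NumPy fancy-indexing step, as in the final lines of the kneighbors source. The sketch below assumes that dataframe_index_in_ holds the training DataFrame's index labels as a 1-D NumPy array. The labels and neighbor positions are made up.

```python
import numpy as np

# Assumed stand-in for the fitted attribute dataframe_index_in_:
# the training DataFrame's index labels as a 1-D NumPy array.
dataframe_index_in_ = np.array(["plot_a", "plot_b", "plot_c", "plot_d"])

# Positional neighbor indices, shape (n_queries, n_neighbors).
neigh_ind = np.array([[2, 0], [3, 1]])

# Fancy indexing replaces every position with its label and keeps the 2-D shape.
neigh_labels = dataframe_index_in_[neigh_ind]
```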

Source code in src/sknnr/_base.py
def kneighbors(
    self,
    X: DataLike | None = None,
    n_neighbors: int | None = None,
    return_distance: bool = True,
    return_dataframe_index: bool = False,
    use_deterministic_ordering: bool = True,
) -> NDArray[np.int64] | tuple[NDArray[np.float64], NDArray[np.int64]]:
    """
    Find the K-neighbors of a point or points in the dataset and optionally
    return dataframe indexes rather than array indices when the model was
    fitted with a dataframe.

    Parameters
    ----------
    X : array-like of shape (n_queries, n_features), default=None
        The query point or points. If not provided, neighbors of each
        indexed point are returned. In this case, the query point is not
        considered its own neighbor.
    n_neighbors : int, default=None
        Number of neighbors required for each sample. The default is the
        value passed to the constructor.
    return_distance : bool, default=True
        Whether or not to return the distances.
    return_dataframe_index : bool, default=False
        Whether or not to return dataframe indexes instead of array indices.
        Only applicable if the model was fitted with a dataframe.
    use_deterministic_ordering : bool, default=True
        Whether to use deterministic ordering of neighbors when distances
        are nearly identical. If True, neighbors with nearly identical
        distances (up to DISTANCE_PRECISION_DECIMALS decimal places) are
        ordered lexicographically by:
        (1) their scaled and rounded distances,
        (2) the absolute difference between a query point's row index
            and the neighbor index (so that a sample, when present, is
            returned before other equally distant samples), and
        (3) the neighbor index itself.
        If False, use the default ordering from
        `KNeighborsRegressor.kneighbors`. See the
        [usage guide](`../../../usage/#deterministic-neighbor-ordering`)
        for more details.

    Returns
    -------
    neigh_dist : array-like of shape (n_queries, n_neighbors)
        Array representing the lengths to points, only present if
        return_distance=True.
    neigh_ind : array-like of shape (n_queries, n_neighbors)
        Array indices or dataframe indexes of the nearest points in the
        population matrix.
    """
    neigh_dist, neigh_ind = super().kneighbors(
        X=X, n_neighbors=n_neighbors, return_distance=True
    )

    if use_deterministic_ordering:
        row_scale = np.maximum(neigh_dist.max(axis=1, keepdims=True), 1.0)
        rounded = np.round(
            neigh_dist / row_scale, decimals=self.DISTANCE_PRECISION_DECIMALS
        )
        neigh_ind_diff = np.abs(neigh_ind - np.arange(len(neigh_ind))[:, None])
        sorted_indices = np.lexsort((neigh_ind, neigh_ind_diff, rounded), axis=1)

        neigh_dist = np.take_along_axis(neigh_dist, sorted_indices, axis=1)
        neigh_ind = np.take_along_axis(neigh_ind, sorted_indices, axis=1)

    if return_dataframe_index:
        msg = "Dataframe indexes can only be returned when fitted with a dataframe."
        check_is_fitted(self, "dataframe_index_in_", msg=msg)
        neigh_ind = self.dataframe_index_in_[neigh_ind]

    return (neigh_dist, neigh_ind) if return_distance else neigh_ind