RawKNNRegressor
sknnr.RawKNNRegressor
Bases: DFIndexCrosswalkMixin, IndependentPredictorMixin, KNeighborsRegressor
Subclass of sklearn.neighbors.KNeighborsRegressor that supports independent
prediction and scoring and crosswalks array indices to dataframe indexes.
See sklearn.neighbors.KNeighborsRegressor for more information on
available parameters for k-neighbors regression used in instantiation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_neighbors` | int | Number of neighbors to use by default for | `5` |
| `weights` | ('uniform', 'distance') | Weight function used in prediction. | `'uniform'` |
| `algorithm` | ('auto', 'ball_tree', 'kd_tree', 'brute') | Algorithm used to compute the nearest neighbors. | `'auto'` |
| `leaf_size` | int | Leaf size passed to | `30` |
| `p` | int | Power parameter for the Minkowski metric. | `2` |
| `metric` | str or callable | The distance metric to use for the tree, calculated in standardized Euclidean space. | `'minkowski'` |
| `metric_params` | dict | Additional keyword arguments for the metric function. | `None` |
| `n_jobs` | int | The number of parallel jobs to run for neighbors search. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| `DISTANCE_PRECISION_DECIMALS` | int, class attribute | Number of decimal places used when rounding scaled distances to ensure deterministic neighbor ordering. Default is 10. |
| `effective_metric_` | str | The distance metric to use. It will be the same as the metric parameter or a synonym of it, e.g. 'euclidean' if the metric parameter is set to 'minkowski' and |
| `effective_metric_params_` | dict | Additional keyword arguments for the metric function. For most metrics will be same with |
| `independent_prediction_` | array-like of shape (n_samples, n_outputs) | The independent predictions for each sample in the training set, obtained by calculating |
| `independent_score_` | float | The independent score (i.e. coefficient of determination or R²) for the model, obtained by calculating the average R² across all outputs. |
| `n_features_in_` | int | Number of features seen during |
| `n_samples_fit_` | int | Number of samples in the fitted data. |
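The averaging described for independent_score_ can be illustrated with a short plain-Python sketch. This is not sknnr's implementation: the helper names `r2_score_1d` and `mean_r2` and the sample values are hypothetical, and the predictions are taken as given rather than derived from neighbors.

```python
def r2_score_1d(y_true, y_pred):
    """Coefficient of determination (R²) for a single output.

    Assumes y_true is not constant, otherwise the total sum of squares is zero.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_r2(y_true_rows, y_pred_rows):
    """Average the per-output R² over two (n_samples, n_outputs) tables."""
    true_cols = list(zip(*y_true_rows))  # transpose rows into per-output columns
    pred_cols = list(zip(*y_pred_rows))
    scores = [r2_score_1d(t, p) for t, p in zip(true_cols, pred_cols)]
    return sum(scores) / len(scores)


# Hypothetical observed and predicted values: 4 samples, 2 outputs.
y_true = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
y_pred = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 20.0]]

# First output is predicted perfectly (R² = 1), second has R² of about 0.2,
# so the averaged score is roughly 0.6.
score = mean_r2(y_true, y_pred)
print(score)
```

Averaging per-output scores gives every output equal weight regardless of its scale or variance.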
Functions
fit
Override fit to set attributes using mixins.
kneighbors
kneighbors(X: DataLike | None = None, n_neighbors: int | None = None, return_distance: bool = True, return_dataframe_index: bool = False, use_deterministic_ordering: bool = True) -> NDArray[int64] | tuple[NDArray[float64], NDArray[int64]]
Find the K-neighbors of a point or points in the dataset and optionally return dataframe indexes rather than array indices when the model was fitted with a dataframe.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | array-like of shape (n_queries, n_features) | The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor. | `None` |
| `n_neighbors` | int | Number of neighbors required for each sample. The default is the value passed to the constructor. | `None` |
| `return_distance` | bool | Whether or not to return the distances. | `True` |
| `return_dataframe_index` | bool | Whether or not to return dataframe indexes instead of array indices. Only applicable if the model was fitted with a dataframe. | `False` |
| `use_deterministic_ordering` | bool | Whether to use deterministic ordering of neighbors when distances are nearly identical. If True, neighbors with nearly identical distances (up to DISTANCE_PRECISION_DECIMALS decimal places) are ordered lexicographically by: (1) their scaled and rounded distances, (2) the absolute difference between a query point's row index and the neighbor index (so that a sample, when present, is returned before other equally distant samples), and (3) the neighbor index itself. If False, use the default ordering from | `True` |
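The three-part ordering described for use_deterministic_ordering can be sketched as a plain-Python sort key. This is a minimal illustration, not the library's code: the function name `order_neighbors` and the sample distances are hypothetical, and the distance scaling mentioned in the description is omitted because its details are not given here.

```python
DISTANCE_PRECISION_DECIMALS = 10  # mirrors the documented class attribute default


def order_neighbors(query_row, distances, indices,
                    decimals=DISTANCE_PRECISION_DECIMALS):
    """Return (distance, index) pairs sorted by the documented three-part key."""
    return sorted(
        zip(distances, indices),
        key=lambda pair: (
            round(pair[0], decimals),   # (1) rounded distance
            abs(query_row - pair[1]),   # (2) gap between query row and neighbor index
            pair[1],                    # (3) neighbor index as the final tie-breaker
        ),
    )


# Query row 2: neighbors 5, 2 and 0 are tied to within 1e-12; neighbor 7 is farther.
ordered = order_neighbors(
    query_row=2,
    distances=[0.5 + 1e-12, 0.5, 0.5 - 1e-12, 0.9],
    indices=[5, 2, 0, 7],
)
ordered_indices = [idx for _, idx in ordered]
print(ordered_indices)  # [2, 0, 5, 7]
```

Because the three tied distances round to the same value, the sample itself (index 2) comes first, followed by the tied neighbors in order of index proximity.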
Returns:
| Name | Type | Description |
|---|---|---|
| `neigh_dist` | array-like of shape (n_queries, n_neighbors) | Array representing the lengths to points, only present if return_distance=True. |
| `neigh_ind` | array-like of shape (n_queries, n_neighbors) | Array indices or dataframe indexes of the nearest points in the population matrix. |
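The difference between array indices and dataframe indexes in neigh_ind can be illustrated with a small plain-Python sketch. It assumes the crosswalk is a positional lookup into the index of the dataframe used for fitting. The function name `crosswalk_indices`, the plot IDs, and the neighbor positions are all hypothetical.

```python
def crosswalk_indices(neigh_ind, dataframe_index):
    """Map positional neighbor indices to labels from the fitted dataframe index."""
    return [[dataframe_index[i] for i in row] for row in neigh_ind]


# Hypothetical index labels of the dataframe used when fitting.
plot_ids = ["plot_101", "plot_205", "plot_310", "plot_412"]

# Hypothetical positional result for two query points with n_neighbors=2.
neigh_ind = [[2, 0], [1, 3]]

neigh_labels = crosswalk_indices(neigh_ind, plot_ids)
print(neigh_labels)  # [['plot_310', 'plot_101'], ['plot_205', 'plot_412']]
```

Returning labels instead of positions lets neighbor results be joined back to the original dataframe without tracking row order separately.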