Moscow Mountain / St. Joes

sknnr.datasets.load_moscow_stjoes

load_moscow_stjoes(return_X_y: Literal[False] = False, as_frame: Literal[False] = False) -> Dataset[NDArray[float64]]
load_moscow_stjoes(return_X_y: Literal[False] = ..., as_frame: Literal[True] = ...) -> Dataset[DataFrame]
load_moscow_stjoes(return_X_y: Literal[True] = ..., as_frame: Literal[False] = ...) -> tuple[NDArray[float64], NDArray[float64]]
load_moscow_stjoes(return_X_y: Literal[True] = ..., as_frame: Literal[True] = ...) -> tuple[DataFrame, DataFrame]
load_moscow_stjoes(return_X_y: bool = False, as_frame: bool = False) -> Dataset[NDArray[float64]] | Dataset[DataFrame] | tuple[NDArray[float64], NDArray[float64]] | tuple[DataFrame, DataFrame]

Load the Moscow Mountain / St. Joe's dataset (Hudak 2010[^1]).

The dataset contains 165 plots with environmental, LiDAR, and forest structure measurements. Structural measurements of basal area (BA) and tree density (TD) are separated by species.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| return_X_y | bool | If True, return the data and target as a tuple instead of a Dataset. The tuple holds NumPy arrays, or DataFrames when as_frame is also True. | False |
| as_frame | bool | If True, the data and target attributes of the returned Dataset will be DataFrames instead of NumPy arrays. The frame attribute will also be added as a DataFrame with the dataset index. Pandas must be installed for this option. | False |
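A minimal sketch of how the two flags combine, based on the overload signatures listed at the top of this page. It assumes sknnr is installed; the helper name `load_all_variants` and the `find_spec` guards are illustrative choices and not part of the library.

```python
from importlib.util import find_spec


def load_all_variants():
    """Call load_moscow_stjoes with each combination of its two flags."""
    # Imported lazily so this sketch can still be imported where sknnr is absent.
    from sknnr.datasets import load_moscow_stjoes

    dataset = load_moscow_stjoes()  # Dataset holding NumPy arrays
    X, y = load_moscow_stjoes(return_X_y=True)  # tuple of NumPy arrays
    results = {"dataset": dataset, "arrays": (X, y)}

    # as_frame=True requires pandas, so only try it when pandas is importable.
    if find_spec("pandas") is not None:
        results["frames"] = load_moscow_stjoes(return_X_y=True, as_frame=True)
    return results


if find_spec("sknnr") is not None:
    variants = load_all_variants()
    X, y = variants["arrays"]
    print(X.shape[0], y.shape[0])  # one row per plot in both arrays
```

Both arrays are expected to have one row per plot, so the two printed counts should agree with the 165 plots described above.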

Returns:

| Type | Description |
| --- | --- |
| Dataset or tuple of ndarray or DataFrame | A Dataset object containing the data, target, and feature names. If return_X_y is True, a tuple of data and target is returned instead, as NumPy arrays or, when as_frame is True, as DataFrames. |
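As a hedged illustration of the two return styles, the sketch below loads the dataset once as a Dataset and once as a tuple, then compares shapes. It assumes sknnr is installed and relies only on the `data` and `target` attributes named on this page; the helper name `dataset_matches_tuple` is hypothetical.

```python
from importlib.util import find_spec


def dataset_matches_tuple():
    """Check that both return styles expose the same data and target shapes."""
    # Imported lazily so this sketch can still be imported where sknnr is absent.
    from sknnr.datasets import load_moscow_stjoes

    dataset = load_moscow_stjoes()  # Dataset with `data` and `target` attributes
    X, y = load_moscow_stjoes(return_X_y=True)  # same content as a plain tuple
    return dataset.data.shape == X.shape and dataset.target.shape == y.shape


if find_spec("sknnr") is not None:
    print(dataset_matches_tuple())
```

The tuple form is convenient when passing data straight to an estimator, while the Dataset form keeps the feature names alongside the arrays.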

Notes

See Hudak 2010[^1] or https://cran.r-project.org/web/packages/yaImpute/yaImpute.pdf for more information on the dataset and feature names.

Reference

[^1] Hudak, A.T. (2010) Field plot measures and predictive maps for "Nearest neighbor imputation of species-level, plot-scale forest structure attributes from LiDAR data". Fort Collins, CO: U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station. https://www.fs.usda.gov/rds/archive/Catalog/RDS-2010-0012

Source code in src/sknnr/datasets/_base.py
def load_moscow_stjoes(
    return_X_y: bool = False, as_frame: bool = False
) -> (
    Dataset[NDArray[np.float64]]
    | Dataset[pd.DataFrame]
    | tuple[NDArray[np.float64], NDArray[np.float64]]
    | tuple[pd.DataFrame, pd.DataFrame]
):
    """Load the Moscow Mountain / St. Joe's dataset (Hudak 2010[^1]).

    The dataset contains 165 plots with environmental, LiDAR, and forest structure
    measurements. Structural measurements of basal area (BA) and tree density (TD)
    are separated by species.

    Parameters
    ----------
    return_X_y : bool, default=False
        If True, return the data and target as a tuple instead of a Dataset. The
        tuple holds NumPy arrays, or DataFrames when `as_frame` is also True.
    as_frame : bool, default=False
        If True, the `data` and `target` attributes of the returned Dataset will be
        DataFrames instead of NumPy arrays. The `frame` attribute will also be added as
        a DataFrame with the dataset index. Pandas must be installed for this
        option.

    Returns
    -------
    Dataset or tuple of ndarray or DataFrame
        A Dataset object containing the data, target, and feature names. If return_X_y
        is True, return a tuple of data and target instead, as NumPy arrays or, when
        `as_frame` is True, as DataFrames.

    Notes
    -----
    See Hudak 2010[^1] or https://cran.r-project.org/web/packages/yaImpute/yaImpute.pdf
    for more information on the dataset and feature names.

    Reference
    ---------
    [^1] Hudak, A.T. (2010) Field plot measures and predictive maps for "Nearest
    neighbor imputation of species-level, plot-scale forest structure attributes from
    LiDAR data". Fort Collins, CO: U.S. Department of Agriculture, Forest Service,
    Rocky Mountain Research Station.
    https://www.fs.usda.gov/rds/archive/Catalog/RDS-2010-0012
    """
    return load_dataset_from_csv_filenames(
        data_filename="moscow_env.csv",
        target_filename="moscow_spp.csv",
        return_X_y=return_X_y,
        as_frame=as_frame,
    )