k Nearest Neighbor Regression
k Nearest Neighbor (KNN) Regression enables you to predict the value of a new data point based on the known values of the points nearest to it. In KNN, we take a set of labeled points and use them to label other points.
Why to use
To predict the numeric value of a new data point from data whose values are already known.
When to use
When not to use
Any numerical data.
Predicted value of a new data point.
Statistical Methods used
k Nearest Neighbor Regression is located under rubiML in Regression, in the left task pane. Drag and drop the algorithm onto the canvas to use it. Click the algorithm to view and select different properties for analysis. Refer to Properties of k Nearest Neighbor Regression.
Figure: k Nearest Neighbor Regression
The k-nearest neighbor algorithm is a simple and easy-to-use supervised machine learning (ML) algorithm that can be applied to solve regression and classification problems. It assumes that similar things (for example, data points with similar values) exist in proximity. It measures this similarity by calculating the distance between data points.
The input consists of the k training samples that are closest to the new data point. The output depends on whether the algorithm is being used for regression or classification. In the case of regression, the mean of the k labels is returned, while in the case of classification, the mode of the k labels is returned.
Classification is done by a majority vote of the k nearest neighbors, and the new data point is assigned to the most common class among its k closest neighbors.
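The procedure described above can be illustrated with a minimal sketch in plain Python. It is not the rubiML implementation. The function names, the toy data, and the choice of Euclidean distance are assumptions made only for illustration.

```python
import math
from collections import Counter
from statistics import mean


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest_labels(train_X, train_y, query, k=5):
    """Return the labels of the k training points closest to the query."""
    order = sorted(range(len(train_X)),
                   key=lambda i: euclidean(train_X[i], query))
    return [train_y[i] for i in order[:k]]


def knn_regress(train_X, train_y, query, k=5):
    """Regression: the prediction is the mean of the k nearest labels."""
    return mean(k_nearest_labels(train_X, train_y, query, k))


def knn_classify(train_X, train_y, query, k=5):
    """Classification: the prediction is the mode of the k nearest labels."""
    labels = k_nearest_labels(train_X, train_y, query, k)
    return Counter(labels).most_common(1)[0][0]


# Hypothetical toy data: one feature per point.
train_X = [[1.0], [2.0], [3.0], [10.0], [11.0]]
numeric_y = [10.0, 20.0, 30.0, 100.0, 110.0]
class_y = ["low", "low", "low", "high", "high"]

# The three points nearest to 2.5 are 2.0, 3.0 and 1.0.
print(knn_regress(train_X, numeric_y, [2.5], k=3))   # 20.0
print(knn_classify(train_X, class_y, [2.5], k=3))    # low
```

Both functions share the same neighbor search. Only the final aggregation step (mean or mode) differs.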
Properties of k Nearest Neighbor Regression
The available properties of k Nearest Neighbor Regression are as shown in the figure given below.
Figure: Properties of k Nearest Neighbor Regression
The table given below describes the different fields present on the properties of k Nearest Neighbor Regression.
Table: Description of Fields present on the Properties of k Nearest Neighbor Regression
It is the name of the task selected on the workbook canvas.
You can click the text field to edit or modify the name of the task as required.
It allows you to select the dependent variable.
You can select only one variable, and it should be of numeric type.
It allows you to select Independent variables.
Number of Neighbors
It allows you to enter the number of neighboring data points to be checked.
The default value is 5.
It allows you to select the method to calculate the distance between two data points.
The available options are -
It allows you to select the dimensionality reduction method.
Example of k Nearest Neighbor Regression
Consider a Credit Card Balance dataset of people of different gender, age, education, and so on. A snippet of input data is shown in the figure given below.
Figure: Input Data Snippet
The table below describes the performance metrics on the result page.
Table: Description of Performance Metrics of KNN Regression
RMSE (Root Mean Squared Error)
It is the square root of the average of the squared differences between the actual values and the predicted values.
It is the most commonly used metric for regression tasks.
MAPE (Mean Absolute Percentage Error)
It is the average of absolute percentage errors.
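The two metrics described above can be computed as in the following sketch. The function names and sample values are hypothetical. Reporting MAPE as a percentage (multiplied by 100) is an assumption, since the result page may show it as a fraction instead.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean squared difference between actual and predicted."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean of absolute percentage errors, expressed as a percentage.

    Undefined when any actual value is zero, so that case raises an error.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Hypothetical actual and predicted values.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(round(rmse(actual, predicted), 2))  # 12.91
print(round(mape(actual, predicted), 2))  # 6.67
```

RMSE is expressed in the same units as the dependent variable, while MAPE is unit-free, which makes it easier to compare across datasets.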
As seen in the above figure, the values for different parameters are –