k Nearest Neighbor Regression  
Description  k Nearest Neighbor (KNN) Regression enables you to predict new data points based on the known classification of other points. In kNN, we take a bunch of labeled points and then learn how to label other points.  
Why to use  To predict the classification of a new data point using data with multiple classes.  
When to use 
 When not to use 

Prerequisites 
 
Input  Any numerical data.  Output  Predicted classification of a new data point. 
Statistical Methods used 
 Limitations 

k Nearest Neighbor Regression is located under rubiML ( ) in Regression, in the left task pane. Use the draganddrop method to use the algorithm in the canvas. Click the algorithm to view and select different properties for analysis. Refer to Properties of k Nearest Neighbor Regression.
Figure: k Nearest Neighbor Regression
The knearest neighbor is a simple and easytouse supervised machine learning (ML) algorithm that can be applied to solve regression and classification problems. It assumes that similar things (for example, data points with similar values) exist in proximity. It combines simple mathematical techniques with this similarity to determine the distance between different points on a graph.
The input consists of the k number of training samples that are closest to each other. The output, a class membership, depends on whether the algorithm is being used for regression or classification. In the case of regression, the mean of k labels is returned, while in the case of classification, the mode of k labels is returned.
Classification is done by a vote of majority of the k nearest neighbors, and the new data point is assigned to the class among its k closest neighbors.
Properties of k Nearest Neighbor Regression
The available properties of k Nearest Neighbor Regression are as shown in the figure given below.
Figure: Properties of k Nearest Neighbor Regression
The table given below describes the different fields present on the properties of Lasso Regression.
Table: Description of Fields present on the Properties of k Nearest Neighbor Regression
Field  Description  Remark  
Task Name  It is the name of the task selected on the workbook canvas.  You can click the text field to edit or modify the name of the task as required.  
Dependent Variable  It allows you to select the dependent variable.  You can select only one variable, and it should be of numeric type.  
Independent Variables  It allows you to select Independent variables. 
 
Advanced  Number of Neighbors  It allows you to enter the number of neighboring data points to be checked.  The default value is 5. 
Distance Method  It allows you to select the method to calculate the distance between two data points.  The available options are 
 
Dimensionality Reduction  It allows you to select the dimensionality reduction method. 

Example of k Nearest Neighbor Regression
Consider a Credit Card Balance dataset of people of different gender, age, education, and so on. A snippet of input data is shown in the figure given below.
Figure: Input Data Snippet
The table below describes the performance metrics on the result page.
Table: Description of Performance Metrics of KNN Regression
Performance Metric  Description  Remark 
RMSE (Root Mean Squared Error)  It is the square root of the averaged squared difference between the actual values and the predicted values.  It is the most commonly used metric for regression tasks. 
MAPE (Mean Absolute Percentage Error)  It is the average of absolute percentage errors.  — 
As seen in the above figure, the values for different parameters are –
Parameter  Value 
RMSE  12.9 
MAPE  3.0781 
Table of Contents