K-Nearest Neighbor

Overview [1]#

K-Nearest Neighbor (KNN) is a Supervised Learning, non-parametric method used in Machine Learning for classification and regression

K-Nearest Neighbor in both cases, the input consists of the k closest Training dataset in the feature space. The output depends on whether k-NN is used for classification or regression:

  • K-Nearest Neighbor classification, the output is a Classification membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
  • K-Nearest Neighbor regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
K-Nearest Neighbor is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The K-Nearest Neighbor algorithm is among the simplest of all Machine Learning Algorithms.

Both for classification and regression, a useful technique can be to assign weight to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

A peculiarity of the K-Nearest Neighbor algorithm is that it is sensitive to the local structure of the data.

More Information#

There might be more information for this subject on one of the following: