In statistics, the Mahalanobis distance is a distance measure introduced by P. C. Mahalanobis in 1936. It is based on the correlations between variables, through which different patterns can be identified and analysed. It is a useful way of determining the similarity of an unknown sample set to a known one. It differs from the Euclidean distance in that it takes into account the correlations of the data set.
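Formally, the Mahalanobis distance from a group of values with mean $\mu = (\mu_1, \mu_2, \dots, \mu_N)^T$ and covariance matrix $\Sigma$ for a multivariate vector $x = (x_1, x_2, \dots, x_N)^T$ is defined as:
$$D_M(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}$$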
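As an illustration of the definition, the following is a minimal sketch in plain Python. The helper names `cholesky` and `mahalanobis` and the sample numbers are assumptions made for this example, not part of any library. Instead of inverting the covariance matrix explicitly, the sketch factors it as $L L^T$ and solves a triangular system, which gives the same value for a symmetric positive-definite covariance matrix.

```python
import math


def cholesky(sigma):
    """Return lower-triangular L with L * L^T == sigma.

    Assumes sigma is a symmetric positive-definite matrix (list of rows).
    """
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low


def mahalanobis(x, mu, sigma):
    """Distance of x from a group with mean mu and covariance sigma."""
    low = cholesky(sigma)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution: solve low * z = diff.
    # Then z^T z equals diff^T sigma^{-1} diff.
    z = []
    for i, d in enumerate(diff):
        z.append((d - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return math.sqrt(sum(v * v for v in z))


# Hypothetical group: zero mean, positively correlated variables.
mu = [0.0, 0.0]
sigma = [[2.0, 1.0], [1.0, 2.0]]

# Both points are equally far from the mean in the Euclidean sense,
# but the first lies along the direction of correlation.
print(mahalanobis([1.0, 1.0], mu, sigma))   # about 0.816
print(mahalanobis([1.0, -1.0], mu, sigma))  # about 1.414
```

The point that runs against the correlation is assigned the larger distance, which is the behaviour that distinguishes this measure from the Euclidean distance.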
Mahalanobis distance can also be defined as a dissimilarity measure between two random vectors $x$ and $y$ of the same distribution with the covariance matrix $\Sigma$:
$$d(x, y) = \sqrt{(x - y)^T \Sigma^{-1} (x - y)}$$
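If the covariance matrix is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance. If the covariance matrix is diagonal, the resulting distance measure is called the normalized Euclidean distance:
$$d(x, y) = \sqrt{\sum_{i=1}^{N} \frac{(x_i - y_i)^2}{\sigma_i^2}}$$
where $x_i$ and $y_i$ are the $i$-th components of the two vectors and $\sigma_i$ is the standard deviation of $x_i$ over the sample set.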
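The two special cases can be checked numerically. The sketch below is only an illustration: the function names `euclidean` and `normalized_euclidean` and the sample values are assumptions chosen for this example.

```python
import math


def euclidean(x, y):
    """Ordinary Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def normalized_euclidean(x, y, std):
    """Mahalanobis distance for a diagonal covariance matrix.

    std[i] is the standard deviation of the i-th component.
    """
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, std)))


x = [3.0, 4.0]
y = [0.0, 0.0]

# Identity covariance: every standard deviation is 1,
# so the two distances agree.
assert math.isclose(normalized_euclidean(x, y, [1.0, 1.0]), euclidean(x, y))

# Diagonal covariance diag(9, 4): each coordinate difference is
# divided by its standard deviation (3 and 2) before summing.
print(euclidean(x, y))                         # 5.0
print(normalized_euclidean(x, y, [3.0, 2.0]))  # sqrt(5), about 2.236
```

Dividing by the standard deviations makes the result independent of the units in which each component is measured.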