Hamming distance
In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding elements differ. Put another way, it is the minimum number of substitutions required to change one string into the other. It is named after Richard Hamming.
The Hamming distance is used in telecommunication to count the number of flipped bits in a fixed-length binary word as an estimate of error, and so is sometimes called the signal distance. For binary words, it equals the weight (number of ones) of the XOR of the two words, or equivalently the Manhattan distance between the two corresponding vertices of an n-dimensional hypercube, where n is the length of the words.
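The XOR relationship can be illustrated with a minimal Python sketch. The function name hamming_distance_bits is hypothetical, and the sketch assumes each binary word is held as a non-negative integer.

```python
def hamming_distance_bits(a: int, b: int) -> int:
    """Count the bit positions in which two non-negative integers differ."""
    # XOR sets a bit exactly where a and b disagree, so the number of
    # ones in a ^ b (its weight) is the Hamming distance.
    return bin(a ^ b).count("1")


print(hamming_distance_bits(0b1011101, 0b1001001))  # 2
```

Storing words as integers pads the shorter one with leading zeros implicitly, so the caller is responsible for ensuring both values represent words of the same fixed length.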
Some examples:
- The Hamming distance between 1011101 and 1001001 is 2.
- The Hamming distance between 2143896 and 2233796 is 3.
- The Hamming distance between "toned" and "roses" is 3.
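The examples above can be checked with a short Python sketch that compares two sequences position by position. The function name hamming_distance is hypothetical, and raising an error on unequal lengths is one possible design choice rather than part of the definition.

```python
def hamming_distance(s, t):
    """Return the number of positions at which s and t differ."""
    # The distance is defined only for sequences of equal length.
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    # Pair up corresponding elements and count the mismatches.
    return sum(x != y for x, y in zip(s, t))


print(hamming_distance("1011101", "1001001"))  # 2
print(hamming_distance("2143896", "2233796"))  # 3
print(hamming_distance("toned", "roses"))      # 3
```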
For comparing strings of different lengths, or strings where insertions or deletions are expected in addition to substitutions, a more general metric such as the Levenshtein distance is more appropriate.
Adapted from Federal Standard 1037C.