In computer science, Huffman coding is an entropy encoding algorithm used for data compression.
It was developed by David A. Huffman and published 1952 A Method for the Construction of Minimum-Redundancy Codes.
The basic idea is borrowed from an older and slightly less efficient method called Shannon-Fano coding.
The text to be compressed is considered as a string of symbols. Symbols that are likely to be frequent are represented by a short sequence of bits, and symbols that are likely to be rare are represented by a longer sequences of bits.
Huffman coding uses a specific method for choosing the representations for each symbol, resulting in a prefix-free code (i.e. no bit string of any symbol is a prefix of the bit string of any other symbol). It has been proven that Huffman coding is the most effective compression method of this type. That is, no other mapping of source symbols to strings of bits will produce a smaller output when the actual symbol frequencies agree with those used to create the code. For a set of symbols whose cardinality is a power of two and a uniform probability distribution, Huffman coding is equivalent to simple binary block encoding.
Huffman coding is optimal when the frequencies of input characters are powers of two. Arithmetic coding produces slight gains over Huffman coding, but in practice these gains have not been large enough to offset arithmetic coding's higher computational complexity and patent royalties (as of November 2001, IBM owns patents on the core concepts of arithmetic coding in several jurisdictions).
Huffman works by creating a binary tree of symbols:
Extreme cases of Huffman codes are connected with Fibonacci numbers. For example, see " class="external">http://mathforum.org/discuss/sci.math/t/207334. Huffman coding today is often used as a "back-end" to some other compression method. DEFLATE (PKZIP's algorithm) and multimedia codecs such as JPEG and MP3 have a front-end model and quantization followed by Huffman coding.
n-ary Huffman algorithm uses the {0, 1, ..., n-1} alphabet to encode message. Built tree is n-ary one.
Huffman Template algorithm enables to use non-numerical weights (costs, frequences). For example, see http://alexvn.freeservers.com/s1/huffman_template_algorithm.html