Main Page | See live article | Alphabetical index

LZW compression algorithm

LZW (Lempel-Ziv-Welch) is a lossless data compression algorithm. Most of this method was invented and published by Lempel and Ziv in 1978; this is known as the LZ78 algorithm. A few details and improvements were later given by Welch in 1984.

Description of the algorithm

The key insight of the method is that it is possible to automatically build a dictionary of previously seen strings in the text being compressed. The dictionary does not have to be transmitted with the compressed text, since the decompressor can build it the same way the compressor does, and if coded correctly, will have exactly the same strings that the compressor dictionary had at the same point in the text.

The dictionary starts off with 256 entries, one for each possible single byte string. Every time a string not already in the dictionary is seen, a longer string consisting of that string with the single character following it, is stored in the dictionary.

The output consists of integer indices into the dictionary. These initially are 9 bits each, and as the dictionary grows, can increase to up to 16 bits. A special symbol is reserved for "flush the dictionary" which takes the dictionary back to the original 256 entries, and 9 bit indices. This is useful if compressing a text which has variable charactistics, so that a dictionary of early material is no longer much use later in the text.

This use of variable increasing index sizes is one of Welch's contributions. Another was to specify an efficient data structure to store the dictionary.

Uses

The method became moderately widely used in the program "compress" which became a more or less standard utility in Unix systems circa 1986. (It has since disappeared from many for both legal and technical reasons.) Several other popular compression utilities also used the method, or closely related ones.

It became very widely used after it became part of the GIF image format in 1987. It may also (optional) be used in TIFF-files.

LZW compression provided a better compression ratio, in most applications, than any well known method available up to that time. It became the first widely used general purpose data compression method on computers. On large English texts, it typically compressed to about half original size. Other kinds of data were also quite usefully compressed in many cases.

LZW compression has now been superseded for most purposes by DEFLATE and Burrows-Wheeler transform methods, partly because these newer methods compress better in most cases, and partly for legal reasons. However the GIF image format with LZW compression is still in widespread use in 2001.

Patent issues

Various patents have been issued in the USA and other countries for LZW and similar algorithms. LZ78 was covered by US patent 4,464,650 by Lempel, Ziv, Cohn, and Eastman, assigned to Sperry Corporation, later Unisys Corporation, filed on August 10, 1981, and presumably now expired. Two US patents were issued for the LZW algorithm: 4,814,746 by Victor S. Miller and Mark N. Wegman and assigned to IBM, originally filed on June 1, 1983, and 4,558,302 by Welch, assigned to Sperry Corporation, later Unisys Corporation, filed on June 20, 1983.

US patent 4,558,302 is the one that has caused the most controversy. Unisys at one time granted royalty-free patent licenses to developers of free software and freely available proprietary software; the company terminated those licenses in August 1999. Many legal experts have concluded that the patent does not cover devices that can only uncompress LZW data and cannot compress it; for this reason, the popular Gzip program can read .Z files but cannot write them.

It was reported in Debian Weekly News based on a comp.compression thread, that the Unisys patent in the USA expired on December 20, 2002 - 17 years and 10 days after it was granted. Most other sources claim the patent expired in June 2003, 20 years after it was filed, because 35 USC §154(c)(1) specifies that patents subsisting as of six months after the enactment of the Uruguay Round Agreements Act last for the greater of 17 years after grant and 20 years after filing.

Additional Resources