Previous Table of Contents Next


History

While the publication of the two papers in 1977 and 1978 may have had an immediate impact in the world of information theory, it was some time before programmers noticed the effects. In fact, it took the publication of another paper in 1984 to really get things moving.

The June 1984 issue of IEEE Computer had an article entitled “A Technique for High-Performance Data Compression” by Terry Welch. Welch described work performed at Sperry Research Center (now part of Unisys). His paper was a practical description of this implementation of the LZ78 algorithm, which he called LZW. It discussed the LZW compression algorithm in reference to its possible use in disk and tape-drive controllers, but it was clear that the same algorithm could easily be built into a general-purpose compression program.

Almost immediately after the article appeared, work began on the Unix compress program. compress is a C program developed initially on the DEC’s VAX. Ports to other machines, including the IBM PC, followed shortly. The public release of compress became available almost exactly a year after the publication of the IEEE article.

Compress was a very influential program for a number of reasons. The program was well written. It performed well, and it had a reasonable level of documentation. Many UNIX installations began actively using compress soon after its release. Manual pages distributed with UNIX systems are now routinely shipped in compressed form, and they are not decompressed until accessed for the first time by the man program. The code was in the public domain from its initial release, which made for wide distribution and study. Perhaps most importantly, the authors went out of their way to ensure that the code was portable so that it could be used on a wide variety of systems with no modifications.

While compress was becoming a standard in the UNIX community, desktop software was still struggling along with a rather inefficient order-0 Huffman coding program known as SQ. But in 1985, desktop power was increasing; more and more people were using modems to communicate; and hard-disk space was still relatively expensive. Conditions were ripe for an improvement in compression, and dictionary-based coding stepped in.

ARC: The Father of MS-DOS Dictionary Compression

In 1985, System Enhancement Associates released a general-purpose compression and cataloging program called ARC. ARC quickly took the MS-DOS desktop world by storm, becoming a de facto standard for PC users in a matter of months. Several factors helped ARC gain this position. First, it ordinarily used a close derivative of compress to compress files. At the time, this provided state-of-the-art compression and was essentially without peer. Second, ARC provided a cataloging or archiving function as an integral part of the program. UNIX users were accustomed to using the “tar” program to combine groups of files into a single archive, but PC users did not have a similar function as part of their operating system. ARC added that capability, vital for transferring groups of files by modem or even floppy diskette. Finally, ARC was distributed as shareware, which helped saturate the user base in a short time.

With compress reigning supreme in the UNIX world and ARC ruling the MS-DOS world, it seemed LZ78 would be the dominant compression method for years. Imitators such as PKWare’s PKARC only strengthened LZ78’s hold by providing performance improvements in both speed and compression ratios. But oddly enough, in recent years the field has taken a step back, if you consider moving from LZ78 to LZ77 a step backwards.

ARC lost its dominance of the desktop world to new contenders, most notably PKZIP, by PKWare; but also LHarc, by Haruyasu Yoshizaki; and ARJ, by Robert Jung. These programs are built on an LZ77 algorithm which uses a dictionary based on a sliding window that moves through the text. LZ77 was not a practical algorithm to implement until refinements were made in the mid 1980s. Now LZ77 has a legitimate position alongside LZ78 as co-ruler of the general-purpose compression world.

Most recently, a patent dispute between Unisys, which owns the patent for LZ78-derived algorithms (Terry Welch’s work), versus the rest of the computer industry, has resulted in a definite shift over to LZ77-derived algorithms. For example, the recently designed PNG format (discussed later in this book) is being promulgated as a replacement to Compuserve’s GIF format, in order to sidestep Unisys’ patent claims.


Previous Table of Contents Next