The Data Compression Book-:Introduction to Data Compression

The Structure

This book consists of thirteen chapters and a floppy disk. The organization roughly parallels the historical progression of data compression, starting in the “dawn age” around 1950 and working up to the present.

Chapter 2 is a reference chapter which attempts to establish the fundamental data-compression lexicon. It discusses the birth of information theory, and it introduces a series of concepts, terms, buzzwords, and theories used over and over in the rest of the book. Even if you are a data-compression novice, mastery of chapter 2 will bring you up to the “cocktail party” level of information, meaning that you will be able to carry on an intelligent-sounding conversation about data compression even if you don’t fully understand its intricacies.

Chapter 3 discusses the birth of data compression, starting with variable-length bit coding. The development of Shannon-Fano coding and Huffman coding represented the birth of both data compression and information theory. These coding methods are still in wide use today. In addition, chapter 3 discusses the difference between modeling and coding—the two faces of the data-compression coin.

Standard Huffman coding suffers from a significant problem when used for high-performance data compression. The compression program has to pass a complete copy of the Huffman coding statistics to the expansion program. As the compression program collects more statistics and tries to increase its compression ratio, the statistics take up more space and work against the increased compression. Chapter 4 discusses a way to solve this dilemma: adaptive Huffman coding. This is a relatively recent innovation, due to CPU and memory requirements. Adaptive coding greatly expands the horizons of Huffman coding, leading to vastly improved compression ratios.

Huffman coding has to use an integral number of bits for each code, which is usually slightly less than optimal. A more recent innovation, arithmetic coding, uses a fractional number of bits per code, allowing it to incrementally improve compression performance. Chapter 5 explains how this recent innovation works, and it shows how to integrate an arithmetic coder with a statistical model.

Chapter 6 discusses statistical modeling. Whether using Huffman coding, adaptive Huffman coding, or arithmetic coding, it is still necessary to have a statistical model to drive the coder. This chapter shows some of the interesting techniques used to implement powerful models using limited memory resources.

Dictionary compression methods take a completely different approach to compression from the techniques discussed in the previous four chapters. Chapter 7 provides an overview of these compression methods, which represent strings of characters with single codes. Dictionary methods have become the de facto standard for general-purpose data compression on small computers due to their high-performance compression combined with reasonable memory requirements.

The fathers of dictionary-based compression, Ziv and Lempel published a paper in 1977 proposing a sliding dictionary methods of data compression which has become very popular. Chapter 8 looks at recent adaptations of LZ77 compression used in popular archiving programs such as PKZIP.

Chapter 9 takes detailed look at one of the first widely popular dictionary-based compression methods: LZW compression. LZW is the compression method used in the UNIX COMPRESS program and in earlier versions of the MS-DOS ARC program. This chapter also takes a look at the foundation of LZW compression, published in 1978 by Ziv and Lempel.

All of the compression techniques discussed through chapter 9 are “lossless.” Lossy methods can be used on speech and graphics, and they are capable of achieving dramatically higher compression ratios. Chapter 10 shows how lossy compression can be used on digitized sound data which techniques like linear predictive coding and adaptive PCM.

Chapter 11 discusses lossy compression techniques applied to computer graphics. The industry is standardizing rapidly on the JPEG standard for compressing graphical images. The techniques used in the JPEG standard will be presented in this chapter.

Chapter 12 describes how to put it all together into an archive program. A general-purpose archiving program should be able to compress and decompress files while keeping track of files names, dates, attributes, compression ratios, and compression methods. An archive format should ideally be portable to different types of computers. A sample archive program is developed, which applies the techniques used in previous chapters to put together a complete program.

Chapter 13 is a detailed look at fractal compression techniques. The world of fractal compression offers some exciting methods of achieving maximum compression for your data.

Table of Contents