DCT Specifics

The actual formula for the two-dimensional DCT is shown in Figure 11.6, with its partner, the IDCT, shown immediately below in Figure 11.7. The DCT is performed on an N x N square matrix of pixel values, and it yields an N x N square matrix of frequency coefficients. The formula looks somewhat intimidating at first glance, but it can be done with a relatively straightforward piece of code. Figure 11.6  The Discrete Cosine Transform

To write code to implement this function, it first becomes clear that simple table lookups can replace many terms of the equation. The two cosine terms that have to be multiplied together only need to be calculated once at the beginning for the program, and they can be stored for later use. Likewise, the C(x) terms that fall outside the summation loops can also be replaced with table lookups. Once that is done, code to compute the N-by-N portion of a display looks somewhat like that shown below:

```  for ( i = 0 ; i < N ; i++ )
for ( j = 0 ; j < N ; j++ ) {
temp = 0.0;
for ( x = 0 ; x < N ; x++ )
for ( y = 0 ; y < N ; y++ ) {
temp += Cosines[ x ][ i ] *
Cosines[ y ][ j ] *
pixel[ x ][ y ];
}
temp *= sqrt( 2 * N ) * Coefficients[ i ][ h ];
DCT[ i ][ j ] = INT_ROUND( temp );
}
```

Why Bother?

While this code fragment looks as though it may be somewhat interesting to a mathematician, why anyone would want to use it on a graphical image is not immediately obvious. After we transform the pixels to frequency coefficients, we still have just as many points as before. It doesn’t seem as if that is a particularly good way to go about compressing data. It would be much more impressive if the DCT took an N-by-N matrix of data and transformed it to an N/2 by N/2 matrix.

However, Figure 11.5 provides a clue as to what the JPEG committee sees in this algorithm. Figure 11.5 shows that the spectral representation of the audio waveform takes all the information needed to describe the waveform and packs it into the three non-zero points on the graph. So in principle we could describe the 512 points that make up the input waveform with just three points of frequency data.

The DCT accomplishes something similar when it transforms data. In the N-by-N matrix, all the elements in row 0 have a frequency component of zero in one direction of the signal. All the elements in column 0 have a frequency component of zero in the other direction. As the rows and columns move away from origin, the coefficients in the transformed DCT matrix begin to represent higher frequencies, with the highest frequencies found at position N-1 of the matrix.

This is significant because most graphical images on our computer screens are composed of low-frequency information. As it turns out, the components found in row and column 0 (the DC components) carry more useful information about the image than the higher-frequency components. As we move farther away from the DC components in the picture, we find that the coefficients not only tend to have lower values, but they become far less important for describing the picture.

So the DCT transformation identifies pieces of information in the signal that can be effectively “thrown away” without seriously compromising the quality of the image. It is hard to imagine how we would do this with a picture that hadn’t been transformed. With the image still described in spatial terms, using pixels, a program would have a difficult time figuring out which pixels are important to the overall look of the picture and which aren’t.

After defining the DCT as the transformation to be used, the JPEG committee then tackled the truly difficult work: how to “throw away” the insignificant portions of the picture. Details on that come later in this chapter.

Implementing the DCT

One of the first things that shows up when examining the DCT algorithm is that the calculation time required for each element in the DCT is heavily dependent on the size of the matrix. Since a doubly nested loop is used, the number of calculations is O(N squared): as N goes up, the amount of time required to process each element in the DCT output array will go up dramatically.

One of the consequences of this is that it is virtually impossible to perform a DCT on an entire image. The amount of calculation needed to perform a DCT transformation on even a 256-by-256 grey-scale block is prohibitively large. To get around this, DCT implementations typically break the image down into smaller, more manageable blocks. The JPEG group selected an 8-by-8 block for the size of their DCT calculation.

While increasing the size of the DCT block would probably give better compression, it doesn’t take long to reach a point of diminishing returns. Research shows that the connections between pixels tend to diminish quickly, such that pixels even fifteen or twenty positions away are of very little use as predictors. This means that a DCT block of 64-by-64 might not compress much better than if we broke it down into four 16-by-16 blocks. And to make matters worse, the computation time would be much longer.

While there is probably a good argument for using 16-by-16 blocks as the basis for DCT computations, the JPEG committee elected to stick with 8-by-8. Much of this was motivated by a desire to allow for practical implementations that could be built using today’s technology. This type of compression is referred to as “block coding.”