#### Matrix Multiplication

The definition of the DCT shown above is a relatively straightforward, doubly nested loop. The inner element of the loop gets executed N*N times for every DCT element that is calculated. The inner line of the loop has two multiplication operations and a single addition operation.

A considerably more efficient form of the DCT can be calculated using matrix operations. To perform this operation, we first create an N-by-N matrix known as the Cosine Transform matrix, C. This matrix is defined by the equation shown in Figure 11.8.

Figure 11.8  The Cosine Tranform Matrix.

Once the Cosine Transform matrix has been built, we transpose it by rotating it around the main diagonal. This matrix is referred to in code as Ct, the Transposed Cosine Transform matrix. Building this matrix is done only once during program initialization. Both matrices can be built at the same time with a relatively short loop, shown below:

``` for ( j = 0 ; j < N ; j++ ) {
C[ 0 ][ j ] = 1.0 / sqrt( N );
Ct[ j ][ 0 ] = C[ 0 ][ j ];
}
for ( i = 1 ; i < N ; i++ ) {
for ( j = 0; j < N ; j ++ ) {
C[ i ][ j ] = sqrt( 2.0 / N ) *
cos( ( 2 * j + 1 ) * i * pi
/ ( 2.0 * N ) );
Ct[ j ][ i ] = C[ i ][ j ];
}
}
```

Once these two matrices have been built, we can take advantage of the alternative definition of the DCT function:

```DCT = C * Pixels * Ct
```

In this particular equation, the ‘*’ operator refers to matrix multiplication, not normal arithmetic multiplication. Each factor in the equation is an N-by-N matrix. In the case of the JPEG algorithm and the program used to illustrate this chapter, the matrices are 8 by 8.

When multiplying two square matrices together, the arithmetic cost of each element of the output matrix will be N multiplication operations and N addition operations. Since we perform two matrix multiplications to create the DCT matrix, each element in the transformed DCT matrix was created at the cost of 2N multiplications and additions, a considerable improvement over the nested loop definition of the DCT used earlier.

```/* MatrixMultiply( temp, input, Ct ); */
for ( i = 0 ; i < N ; i++ ) {
for ( j = 0 ; j < N ; j++ ) {
temp[ i ][ j ] = 0.0;
for ( k = 0 ; k < N ; k++ )
temp[ i ][ j ] += ( pixel[ i ][ k ] ) * Ct[ k ][ j ];
}
}

/* MatrixMultiply( output, C, temp ); */

for ( i = 0 ; i < N ; i++ ) {
for ( j = 0 ; j < N ; j++ ) {
temp1 = 0.0;
for ( k = 0 ; k < N ; k++ )
temp1 += C[ i ][ k ] * temp[ k ][ j ];
DCT[ i ][ j ] = temp1;
}
}
```

A sample piece of code that implements the DCT via matrix arithmetic is shown above. Note that the code is essentially nothing more than a set of two triply nested loops. The first set of loops multiplies the transposed Cosine Transform Matrix by the input set of pixels, creating a temporary matrix. The temporary matrix is then multiplied by the Cosine Transform matrix, which results in the output, the DCT matrix.

### Continued Improvements

The versions of the DCT presented here perform the same operations as those used in commercial implementations, but without several more optimization steps needed to produce JPEG compressors that operate in something approaching real time.

One improvement that can be made to the algorithm is to develop versions of the algorithm that only use integer arithmetic. To achieve the accuracy needed for faithful reproduction, the versions to the program tested in this chapter all stick with reliable floating point math. It is possible, however, to develop versions of the DCT that use scaled integer math, which is considerably faster on most platforms.

Since the DCT is related to the Discrete Fourier Transform, it shouldn’t be surprising that many of the techniques used to speed up the family of Fourier Transforms can also be applied to the DCT. In fact, people all over the world are working full time on applying Digital Signal Processing techniques to the DCT. Every cycle shaved off the time taken to perform the transform can be worth a small fortune, so there is good incentive for these research efforts.

#### Output of the DCT

Figure 11.9 shows a representative input block from a grey-scale image. As can be seen, the input consists of an 8-by-8 matrix of pixel values which are somewhat randomly spread around the 140 to 175 range. These integer values are fed to the DCT algorithm, creating the output matrix shown below it.

Figure 11.9  The DCT on a Block of Pixels from CHEETAH.GS

The output matrix shows the spectral compression characteristic the DCT is supposed to create. The “DC coefficient” is at position 0,0 in the upper left-hand corner of the matrix. This value represents an average of the overall magnitude of the input matrix, since it represents the DC component in both the X and the Y axis. Note that the DC coefficient is almost an order of magnitude greater than any of the other values in the DCT matrix. In addition, there is a general trend in the DCT matrix. As the elements move farther and farther from the DC coefficient, they tend to become lower and lower in magnitude.

This means that by performing the DCT on the input data, we have concentrated the representation of the image in the upper left coefficients of the output matrix, with the lower right coefficients of the DCT matrix containing less useful information. The next section discusses how this can help compress data.