

Chapter 10
Speech Compression

Manipulation of sound by computers is a relatively new development. It has been possible since the birth of digital computers, but only in the last five years or so has inexpensive hardware brought this to the average user’s desktop. Now the ability to play digitized sound is expected to be an integral part of the “multimedia revolution.”

The use of multimedia brings the issue of data compression into focus for most users. Computer graphics in particular quickly take up all available disk space. Digitized audio is far less voracious in its storage requirements, but even so it can quickly swallow up all free space on the average user’s hard disk.

Fortunately for computer users, the world of telephony has used digitized audio since the 1960s, and extensive research has been done on effective methods of encoding and compressing audio data. The world’s telecommunications companies were intensely aware of the cost of transmission bandwidth and made efforts to reduce expenses in this area. Computer users today benefit from much of this research.

This chapter looks first at some of the basic concepts involved in using digital audio, including the software and hardware in today’s generation of computers. Next, it looks at how well conventional lossless compression techniques work on digitized voice. Finally, it explores some lossy techniques.

Digital Audio Concepts

For modern computers to manipulate sound, they first have to convert it to a digital format. The sound samples can then be processed, transmitted, and converted back to analog format, where they can finally be received by the human ear.

Digitization of sound began in earnest in the early 1960s. As with much of our early computer technology, credit for its development lies with AT&T, which at that time had a regulated monopoly on long-distance service in the United States. In 1962, AT&T established the first commercial digital telephone link, a T1 interoffice trunk in Chicago.

In the short space of thirty years, we have seen the long-distance network in the United States convert almost entirely from analog to digital transmission. Virtually all new switching equipment installed by telephone companies today is digital. But analog switching is still found in older installations and in the smaller PBX and key systems installed in businesses. Of course, the final subscriber loop between the telephone company and the end user is still persistently analog.

Digital audio is now coming of age in the highly visible consumer electronics arena as well. The digital compact disk has nearly completed its displacement of analog LP records. It remains to be seen whether digital audio tape will do the same thing to analog cassette tape, but it seems likely that some day most recorded music will be distributed in digital format.

Fundamentals

While this book cannot give a complete course in digital signal processing, it certainly has room to cover a few basic concepts involved in digital sound. Figure 10.1 shows a typical audio waveform as it might be displayed on an oscilloscope. The X axis in this diagram represents time. The Y axis represents a voltage measured at an input device, typically a microphone. The microphone attempts to faithfully reproduce changes in air pressure caused by sound waves traveling through it.

Some human ears can hear sounds at frequencies as high as 20,000Hz and nearly as low as DC. The dynamic range of our hearing is so wide that we have to employ a logarithmic scale of measurement, the decibel, to reasonably accommodate it. This presents a unique set of requirements for digitization.
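To make the logarithmic scale concrete, here is a minimal sketch of the usual decibel conversion for an amplitude ratio (a ratio of two voltages or two sound pressures). The function name is an illustrative choice and does not come from the text.

```python
import math


def amplitude_ratio_to_db(ratio):
    """Convert an amplitude (voltage or pressure) ratio to decibels.

    Uses 20 * log10(ratio), the convention for amplitude quantities.
    Power ratios would use 10 * log10(ratio) instead.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 20.0 * math.log10(ratio)
```

With this conversion, a ten-to-one amplitude ratio is 20 decibels and a million-to-one ratio is 120 decibels, which shows how a very wide range of amplitudes collapses into a manageable range of numbers.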

A waveform like the one shown in Figure 10.1 is typical of an audio sample. It isn’t a nice, clean sine wave that has a regular period and can be described as a simple mathematical function. Instead, it is a combination of various frequencies at different amplitudes and phases. When these are combined, we see something that looks fairly irregular and is not easy to characterize.


Figure 10.1  A typical audio waveform.

This particular “snapshot” shows about 5 milliseconds (ms) of output. Notice that the largest recognizable components of the waveform appear to have a period of roughly two milliseconds. This corresponds to a frequency of about 500Hz, a fairly characteristic frequency found in speech or music.
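The arithmetic behind these figures is simply the reciprocal relationship between period and frequency. A small sketch, with function names chosen here for illustration:

```python
def frequency_hz(period_ms):
    """Return the frequency in hertz of a component with the given period in milliseconds."""
    if period_ms <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_ms


def cycles_in_window(window_ms, period_ms):
    """Return how many full or partial cycles of a component fit in a time window."""
    return window_ms / period_ms
```

A two-millisecond period gives 500Hz, and the five-millisecond snapshot holds two and a half cycles of that component.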

The first step in working with digital audio is “sampling.” Sampling consists of taking measurements of the input signal at regular times, converting them to an appropriate scale, and storing them. Figure 10.2 shows the same waveform sampled at an 8KHz rate. This means that 8,000 times per second a measurement is taken of the voltage level of the input signal. The measurement points are marked with an “x” on the waveform.


Figure 10.2  A typical audio waveform being sampled at 8KHz.
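The sampling step can be sketched in a few lines. The input signal below is a made-up stand-in for the waveform in the figure (a 500Hz tone plus a weaker 1500Hz component, in volts), and the names are assumptions made for this illustration only.

```python
import math

SAMPLE_RATE_HZ = 8000  # the 8KHz rate used in the text


def example_waveform(t):
    """Hypothetical input voltage at time t (in seconds); not the signal from the figure."""
    return (0.2 * math.sin(2 * math.pi * 500 * t)
            + 0.05 * math.sin(2 * math.pi * 1500 * t))


def sample(signal, duration_s, rate_hz=SAMPLE_RATE_HZ):
    """Measure signal(t) at regular intervals of 1/rate_hz seconds."""
    count = round(duration_s * rate_hz)
    return [signal(n / rate_hz) for n in range(count)]
```

Calling sample(example_waveform, 0.005) returns 40 measurements, one for each "x" that would be marked on a five-millisecond trace at this rate.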

In most computer systems, this first step of digitization is done with an analog-to-digital converter (ADC). The ADC takes a given voltage and scales it to an appropriate digital measurement. An eight-bit ADC, for example, might have a “full scale” input voltage of 500 millivolts (mv)—it would output an eight-bit value of 255 if the input voltage were 500mv and zero if the input voltage were zero. A voltage between these values would be scaled to fit in the linear range of zero to 255.

Since audio signals are AC in nature, the ranges are usually adjusted so that a zero voltage signal falls in the middle of the range. For the previous example, the range would be adjusted to between -250mv and +250mv. Outputs from the eight-bit ADC would range from -128 to +127.
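The scaling performed by such a converter can be sketched as follows. This is only one plausible mapping: real converters differ in how they round and in how they treat voltages outside the full-scale range, and the function name and the clipping behavior here are assumptions.

```python
def quantize_8bit(voltage_mv, full_scale_mv=250.0):
    """Map a voltage in [-full_scale_mv, +full_scale_mv] to a signed 8-bit code.

    The input is scaled linearly so that -full_scale_mv maps to -128.
    Results outside -128..+127 are clipped, so +full_scale_mv maps to +127.
    Python's round() rounds exact halves to the nearest even integer.
    """
    code = round(voltage_mv / full_scale_mv * 128)
    return max(-128, min(127, code))
```

With the ranges from the example, -250mv produces -128, zero volts produces 0, and +250mv produces +127. An input beyond full scale simply stays pinned at the nearest limit.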

The stored sample points then represent a series of voltages that were measured at the input of the ADC. Figure 10.3 shows the representation of those voltages overlaid with the input AC signal. Note that since many samples are taken within each period of the waveform, the digital samples themselves trace the analog signal very accurately.


Figure 10.3  Sample voltages overlaid with the input AC signal.
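One rough way to put a number on how closely the samples trace the signal is to join neighboring samples with straight lines and measure the largest gap between those lines and the true waveform. The sketch below does this for a pure sine wave of unit amplitude, which is a simplifying assumption, since the waveform in the figure is not a pure tone. The function name and the number of check points between samples are arbitrary choices.

```python
import math


def max_interpolation_error(signal_hz, sample_rate_hz, steps_between=20):
    """Largest gap between a unit sine wave and straight lines joining its samples.

    Checks one full period of the sine wave, testing steps_between + 1
    evenly spaced points in every interval between consecutive samples.
    """
    def wave(t):
        return math.sin(2 * math.pi * signal_hz * t)

    dt = 1.0 / sample_rate_hz
    samples_in_period = int(sample_rate_hz / signal_hz)
    worst = 0.0
    for n in range(samples_in_period):
        left, right = wave(n * dt), wave((n + 1) * dt)
        for k in range(steps_between + 1):
            frac = k / steps_between
            line = left + frac * (right - left)   # straight line between samples
            actual = wave((n + frac) * dt)        # true signal at the same instant
            worst = max(worst, abs(actual - line))
    return worst
```

For a 500Hz tone sampled at 8KHz, which gives 16 samples per period, the largest gap is just under 2 percent of the peak amplitude. Dropping the rate to 2KHz leaves only four samples per period, and the gap grows to more than 20 percent.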

Now that the sound has been digitized, it can be stored by the computer using any number of technologies, ranging from fast storage, such as main processor RAM, to slow off-line storage on magnetic tape. The actual speed of the storage medium is relatively unimportant with digital sound, since the bandwidth needed to accurately store the sound is low compared to that of most digital media.
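The bandwidth in question is easy to estimate from the sampling rate and the sample size. The following sketch uses the 8KHz, eight-bit figures from the earlier examples. The function names and the single-channel default are assumptions made for illustration.

```python
def bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Return the raw, uncompressed data rate of a sampled audio stream in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


def storage_bytes(seconds, sample_rate_hz, bits_per_sample, channels=1):
    """Return the storage needed for a recording of the given length in seconds."""
    return seconds * bytes_per_second(sample_rate_hz, bits_per_sample, channels)
```

Eight-bit samples taken 8,000 times per second amount to 8,000 bytes per second, so a full minute of such audio occupies 480,000 bytes before any compression is applied.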

