

Chapter 12
An Archiving Package

Programmers and users are perhaps most frequently exposed directly to data compression through the use of an archiving program. In the MS-DOS world, the use of archiving packages is ubiquitous, with the distribution of some packages fast approaching the saturation point. Programs such as PKZIP and ARJ that are distributed through non-commercial channels tend to blanket the world of “power users,” with new releases getting world-wide distribution in a matter of only days.

Because data compression is a competitive field, these programs tend to have very good performance characteristics, having both high throughput and tight compression ratios. They tend to do their jobs well.

But for the programmer, these data-compression programs are sorely lacking in one respect: their handling of source code. While it’s nice to be able to invoke PAK or ARC from the MS-DOS command line, that doesn’t help the programmer who wants to compress all of the on-line help screens in his new spreadsheet program. It would be somewhat impractical for his program to have to spawn a copy of PKUNZIP every time a new help screen needed to be accessed.

This chapter presents a solution to such dilemmas by showing you how to create a simple, stripped-down version of an archiving program. While space limitations in the book prevent this program from being a match for commercial programs, a good programmer armed with the techniques found in this book should be able to enhance this program to make it as useful as commercial equivalents.

CAR and CARMAN

This chapter deals with two topics: Compressed Archive files and the program used to maintain them. Compressed archive files conventionally have a file extension of “.CAR,” and will be referred to as CAR files. The CAR file Manager will be named CARMAN.

CARMAN is a stand-alone program designed to manipulate CAR files. It has a fairly simple set of commands and runs in command-line mode. CARMAN’s real strength lies in how readily it can be extended with more powerful compression techniques or more detailed file data, and in how easily portions of its code can be included in other programs.

The CARMAN Command Set

Running CARMAN with no arguments gives a brief help screen showing the usage of the program, as shown in Figure 12.1.


Figure 12.1  The CARMAN Help Screen

Every CARMAN operation has two basic requirements. First, it must have a single-letter command and, second, it must have the name of a CAR file. A brief synopsis of the commands follows.

Add files: This command is used to add new files to an archive, which may or may not already exist. Wild cards on the command line will be expanded under MS-DOS. Full path names can be used to specify input files, but CARMAN will strip the path components before storing the files. If the CAR file already exists, and a file to be added already exists in the archive, the new version will replace the old.
Xtract files: This command extracts files from the archive and stores them in the current directory. If no file names are listed on the command line all files are extracted from the archive.
Replace files: This command attempts to replace all of the named archive files with a new version from the current directory. If a specified file exists in the archive but not in the current directory, a warning message is printed.
Delete files: The named files are deleted from the CAR file.
Print files: The specified files are copied to stdout. If no files are named, all files in the archive are copied to stdout.
Test files: The specified files are tested to be sure they can be properly extracted, and that the resulting CRC value will be correct.
List files: The statistics for the specified files are listed on stdout. If no file names are specified, all files are listed. A typical listing is shown next.


Figure 12.2  CARMAN List Command Output

As can be seen from this listing, the compression method employed in CARMAN is LZSS, with the compression code being nearly identical to that shown in Chapter 8. Files that could not be compressed to less than their original size will instead be stored in uncompressed format.
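The fallback to uncompressed storage can be sketched as a simple size comparison. This is a minimal illustration, not CARMAN's actual code: the function name `choose_method` and the two method codes are hypothetical and were chosen only for this example.

```c
/* Hypothetical method codes, chosen for illustration only. */
enum { METHOD_STORED = 1, METHOD_LZSS = 2 };

/*
 * Pick the storage method for one file. The LZSS output is kept only
 * when it is strictly smaller than the original data; otherwise the
 * file is stored uncompressed.
 */
int choose_method( unsigned long original_size,
                   unsigned long compressed_size )
{
    if ( compressed_size < original_size )
        return METHOD_LZSS;
    return METHOD_STORED;
}
```

A caller would compress the file first, then pass both sizes to `choose_method` and rewrite the data in stored form if that is what comes back.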

While LZSS does not offer the tightest compression possible, it does provide adequate performance. In addition, it has one unique advantage: its expansion speed will meet or exceed that of nearly any compression program available. So applications that need to decompress frequently may find LZSS to be the algorithm of choice.

The CAR File

The structure of a CAR file is simple: it consists of a sequential list of file header blocks, each followed by its file data. This sequence repeats until a special header with a null file name is encountered. An illustration of this structure is shown in Figure 12.3.


Figure 12.3  The structure of a CAR file.

A sequential structure like this has both advantages and disadvantages. The sequential nature of the data means that both searches through and updates of the archive are not done using a random access method. Instead, linear searches and copies are used. Even worse, any time files in the archive are modified, it means the entire archive has to be copied from the original to a new version of the file. These disadvantages are outweighed by the simplicity this technique offers. Good reliable code can easily be written to support this storage method. In fact, most popular archiving programs use a nearly identical format.
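The linear search described above might look something like the following sketch. To stay self-contained it works on an archive image held in memory and uses a deliberately simplified toy layout, which is an assumption made for this example and not the actual CAR header format: each entry is a null-terminated file name, a 4-byte little-endian data length, and then the data, with an empty name marking the end of the archive. The name `toy_find_entry` is likewise hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Walk the toy archive from the start, one entry at a time, until the
 * wanted name or the end-of-archive marker is found. Returns a pointer
 * to the entry's data and stores its length in *length, or returns
 * NULL if the name is absent or the image is malformed.
 */
const unsigned char *toy_find_entry( const unsigned char *image,
                                     size_t image_size,
                                     const char *wanted,
                                     unsigned long *length )
{
    size_t pos = 0;

    for ( ; ; ) {
        const unsigned char *name = image + pos;
        const unsigned char *end;
        size_t name_len;
        unsigned long data_len;

        if ( pos >= image_size )
            return NULL;                    /* ran off the end */
        end = memchr( name, '\0', image_size - pos );
        if ( end == NULL )
            return NULL;                    /* unterminated name */
        name_len = (size_t) ( end - name );
        if ( name_len == 0 )
            return NULL;                    /* end-of-archive marker */
        pos += name_len + 1;

        if ( image_size - pos < 4 )
            return NULL;                    /* truncated length field */
        data_len = (unsigned long) image[ pos ]
                 | (unsigned long) image[ pos + 1 ] << 8
                 | (unsigned long) image[ pos + 2 ] << 16
                 | (unsigned long) image[ pos + 3 ] << 24;
        pos += 4;
        if ( data_len > image_size - pos )
            return NULL;                    /* truncated data */

        if ( strcmp( (const char *) name, wanted ) == 0 ) {
            *length = data_len;
            return image + pos;
        }
        pos += data_len;                    /* skip to the next header */
    }
}
```

Note that finding the last entry requires stepping over every entry before it, which is the cost of the sequential layout. The same walk, with each entry copied to a new file along the way, is the basis for updating an archive.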

The Header

In the CAR format, the header for each file contains everything we know about the file. Thus, selecting what goes in the header and what doesn’t is fairly important. CARMAN uses a fairly stripped-down set of information in the header, with a C structure as follows:

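   #include <stdio.h>                 /* defines FILENAME_MAX */

   typedef struct header {
    char file_name[ FILENAME_MAX ];   /* name of the stored file */
    char compression_method;          /* how the file data was stored */
    unsigned long original_size;      /* size before compression */
    unsigned long compressed_size;    /* size as stored in the archive */
    unsigned long original_crc;       /* CRC of the original file data */
    unsigned long header_crc;         /* CRC of the header itself */
   } HEADER;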

Most of the information in the header is self-explanatory, particularly in terms of how it is used in a C program. The place where the header information gets a little confusing is in the process of writing it to or reading it from a CAR file.
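As general background, and not necessarily the approach CARMAN takes, one common reason for this kind of difficulty is that the in-memory layout of a C structure (the size of `unsigned long`, byte order, padding) varies between compilers and machines. A frequently used remedy is to write each numeric field as a fixed number of bytes in a fixed order. The sketch below assumes 4-byte, least-significant-byte-first fields; the function names `pack_u32_le` and `unpack_u32_le` are hypothetical.

```c
/* Store the low 32 bits of value as 4 bytes, least significant first. */
void pack_u32_le( unsigned char *out, unsigned long value )
{
    int i;

    for ( i = 0; i < 4; i++ ) {
        out[ i ] = (unsigned char) ( value & 0xFF );
        value >>= 8;
    }
}

/* Rebuild a value from 4 bytes written by pack_u32_le(). */
unsigned long unpack_u32_le( const unsigned char *in )
{
    unsigned long value = 0;
    int i;

    for ( i = 3; i >= 0; i-- )
        value = ( value << 8 ) | in[ i ];
    return value;
}
```

Because the bytes are produced by shifting rather than by copying memory, the result is the same regardless of the byte order or word size of the machine that runs the code.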

