The Data Compression Book: An Archiving Package

The Header CRC

One of the header elements written out by WriteFileHeader() is called “header_crc.” The header CRC is a 32-bit number generated using the data in the header structure, and used as a checksum. The CRC is generated using the CCITT-32 formula, which is the same formula used by many other archiving programs, such as PKZIP and ARJ. It provides us with a reasonably high probability of detecting errors in the header.

The reason for creating a CRC checksum for the header data is to provide an additional check on the validity of a CAR file. If one of the data elements in the header were inadvertently modified, it could lead to disastrous results, either during decompression or later, when attempting to use erroneous file data.
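The header CRC described above can be sketched in a few lines of C. This is a minimal illustration, not the CARMAN source: it assumes the standard reflected CRC-32 polynomial (0xEDB88320) used by PKZIP-style archivers, computes it bit by bit instead of with a lookup table, and uses the hypothetical names crc_update() and crc_block(). The running value starts at CRC_MASK and is inverted at the end, so one checksum can cover several buffers in turn, such as a file name followed by the fixed header fields.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC_MASK 0xFFFFFFFFu

/* Fold count bytes from buffer into a running CRC-32 value.
 * Hypothetical helper; assumes the reflected polynomial 0xEDB88320. */
uint32_t crc_update( uint32_t crc, const unsigned char *buffer, size_t count )
{
    while ( count-- > 0 ) {
        crc ^= *buffer++;
        for ( int bit = 0 ; bit < 8 ; bit++ ) {
            if ( crc & 1 )
                crc = ( crc >> 1 ) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc;
}

/* Checksum of one complete block: start from CRC_MASK, invert at the end. */
uint32_t crc_block( const unsigned char *buffer, size_t count )
{
    return crc_update( CRC_MASK, buffer, count ) ^ CRC_MASK;
}
```

Calling crc_update() twice, first on one buffer and then on the next with the intermediate value passed in, gives the same result as a single pass over the concatenated data.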

int ReadFileHeader()
{

 unsigned char header_data[ 17 ];
 unsigned long header_crc;
 int i;
 int c;

 for ( i = 0 ; ; ) {
  c = getc( InputCarFile );
  Header.file_name[ i ] = (char) c;
  if ( c == '\0' )
   break;
  if ( ++i == FILENAME_MAX )
   FatalError( "File name exceeded maximum in header" );
 }
 if ( i == 0 )
  return( 0 );
 /* The CRC covers the file name, including its terminating null... */
 header_crc = CalculateBlockCRC32( i + 1, CRC_MASK, Header.file_name );
 fread( header_data, 1, 17, InputCarFile );

 Header.compression_method = (char)
               UnpackUnsignedData( 1, header_data + 0 );
 Header.original_size = UnpackUnsignedData( 4, header_data + 1 );
 Header.compressed_size = UnpackUnsignedData( 4, header_data + 5 );
 Header.original_crc = UnpackUnsignedData( 4, header_data + 9 );
 Header.header_crc = UnpackUnsignedData( 4, header_data + 13 );
 /* ...plus the first 13 bytes of header data, but not the stored CRC itself */
 header_crc = CalculateBlockCRC32( 13, header_crc, header_data );
 header_crc ^= CRC_MASK;
 if ( Header.header_crc != header_crc )
  FatalError( "Header checksum error for file %s", Header.file_name );
 return( 1 );
}

Reading the file header is essentially the reverse procedure of writing it out—with a couple of twists. During the process of reading in the file name, we need to check for a couple of different possibilities. First, if this is the last header in a CAR file, it will have a file name length of 0. If this is the case, we immediately return with a failure indication, so the calling routine will know that we have reached the end of the input CAR file.

A second possibility is that the file name may exceed the storage allocated for it in the header structure. In that case, a fatal error exit is taken.

After all of the header data has been read in, we perform one last validity check by comparing the CRC calculated for the header with the CRC that was stored in the CAR file. In case of a mismatch, we once again take the fatal error exit.
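The fixed part of the header is read as 17 raw bytes and then unpacked field by field. The sketch below shows one way such an unpacking routine could work. It is an illustration only: unpack_le() is a hypothetical stand-in for UnpackUnsignedData(), and it assumes the fields are stored least significant byte first. The offsets in the enum are the ones used in ReadFileHeader(); their names are invented here.

```c
#include <stdint.h>

/* Field offsets within the 17-byte fixed header, as used by ReadFileHeader().
 * The constant names are hypothetical. */
enum {
    OFF_METHOD          = 0,   /* 1 byte  */
    OFF_ORIGINAL_SIZE   = 1,   /* 4 bytes */
    OFF_COMPRESSED_SIZE = 5,   /* 4 bytes */
    OFF_ORIGINAL_CRC    = 9,   /* 4 bytes */
    OFF_HEADER_CRC      = 13,  /* 4 bytes */
    HEADER_DATA_SIZE    = 17
};

/* Assemble an unsigned value from number_of_bytes bytes (at most 4),
 * assuming the least significant byte comes first. */
uint32_t unpack_le( int number_of_bytes, const unsigned char *buffer )
{
    uint32_t result = 0;
    int shift = 0;

    while ( number_of_bytes-- > 0 ) {
        result |= (uint32_t) *buffer++ << shift;
        shift += 8;
    }
    return result;
}
```

Unpacking byte by byte in this way keeps the archive format independent of the byte order and structure padding of the machine that reads it.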

Command-Line Processing

Once we have the ability to read in a header from a CAR file, we have the capability to list the archive. A simple loop like this would be enough to do it:

  while ( ReadFileHeader() != 0 ) {
   ListCarFileEntry();
   fseek( InputCarFile, Header.compressed_size, SEEK_CUR );
  }

All that is needed to skip over all of the compressed data for a given file is the fseek() statement, since we know the size of the compressed data. This is the mechanism used to work our way through the CAR file when performing any type of processing. We start with the very first file, and work our way from header to header, processing each file as needed. At no time does CARMAN ever back up through an input archive, or try to seek ahead past the next file.
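The header-to-header walk can be tried out on a deliberately simplified archive layout. The sketch below is not the CAR format: it assumes each entry is a null-terminated name, a 4-byte size stored least significant byte first, and then that many bytes of data, with an empty name marking the end. The function names are hypothetical. It counts the entries without ever reading their data.

```c
#include <stdio.h>

/* Read one entry header from the simplified, hypothetical layout.
 * Returns 1 for an entry, 0 at the end marker, -1 on a damaged archive. */
static int read_entry_header( FILE *archive, char *name, size_t name_max,
                              unsigned long *data_size )
{
    size_t i = 0;
    int c;
    int shift;

    for ( ; ; ) {
        c = getc( archive );
        if ( c == EOF || i == name_max )
            return -1;                 /* truncated file or oversized name */
        name[ i ] = (char) c;
        if ( c == '\0' )
            break;
        i++;
    }
    if ( i == 0 )
        return 0;                      /* empty name: end of archive */
    *data_size = 0;
    for ( shift = 0 ; shift < 32 ; shift += 8 ) {
        c = getc( archive );
        if ( c == EOF )
            return -1;
        *data_size |= (unsigned long) c << shift;
    }
    return 1;
}

/* Count entries by seeking past each data block, always moving forward. */
int count_entries( FILE *archive )
{
    char name[ 64 ];
    unsigned long data_size;
    int status;
    int count = 0;

    while ( ( status = read_entry_header( archive, name, sizeof name,
                                          &data_size ) ) == 1 ) {
        if ( fseek( archive, (long) data_size, SEEK_CUR ) != 0 )
            return -1;
        count++;
    }
    return status == 0 ? count : -1;
}
```

Because each header records the size of the data that follows it, the loop only ever needs a relative forward seek, which matches the single-pass processing described above.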

Now that we have the ability to start doing something with the CAR file, it is time to start putting the other pieces of the program together. The next logical step is to start adding the ability to handle the command line.

There are three components to the CARMAN command line. First is the command, one of the seven single letters discussed previously. Second is the name of the CAR file. Last comes the optional list of file names. An initial call to ParseArguments() checks the validity of the command letter and performs some checking on the other two components.

int ParseArguments( argc, argv )
int argc;
char *argv[];
{
  int command;

  if ( argc < 3 || strlen( argv[ 1 ] ) > 1 )
   UsageExit();
  switch( command = toupper( argv[ 1 ][ 0 ] ) ) {
  case 'X' :
   fprintf( stderr, "Extracting files\n");
   break;
  case 'R' :
   fprintf( stderr, "Replacing files\n" );
   break;
  case 'P' :
   fprintf( stderr, "Print files to stdout\n" );
   break;
  case 'T' :
   fprintf( stderr, "Testing integrity of files\n" );
   break;
  case 'L' :
   fprintf( stderr, "Listing archive contents\n" );
   break;
  case 'A' :
   if ( argc <= 3)
    UsageExit();
   fprintf( stderr, "Adding/replacing files to archive\n" );
   break;
  case 'D' :
   if ( argc <= 3 )
    UsageExit();
   fprintf( stderr, "Deleting files from archive\n" );
   break;
  default :
   UsageExit();
  }
  return( command );
}

The first step in parsing the command line is to make sure that there are at least three arguments on the command line: the command name (CARMAN), a single-letter command, and a CAR file name. The next step is to check the command letter for validity, to be sure it is one of the legally defined CARMAN commands. As the command letter is determined, a short message is printed to indicate that CARMAN has acknowledged the command. Finally, for two particular cases, CARMAN insists that specific file names be included on the command line. For most of the CARMAN commands, specifying no file names on the command line is defined as the equivalent of using the wildcard argument “*” (or “*.*”, the MS-DOS equivalent). That means that “CARMAN L backup.car” will list all the files in the backup.car archive.

For the Add and Delete commands, this default mode of operation is probably a little too dangerous, so it results in an error message. If the user wants to add every file in the current directory to a CAR file, it will be necessary to specify “*” or “*.*” on the command line, which should not be too much of an inconvenience.
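The defaulting rule for file names can be expressed as a small helper. The sketch below is one possible formulation, not CARMAN's own code: requires_file_names() and select_patterns() are hypothetical names, and the argv layout assumed is the one checked by ParseArguments() (program name, command letter, CAR file name, then optional file names).

```c
#include <ctype.h>

/* Returns 1 for the commands that insist on explicit file names,
 * mirroring the 'A' and 'D' cases in ParseArguments(). Hypothetical helper. */
int requires_file_names( int command )
{
    command = toupper( (unsigned char) command );
    return command == 'A' || command == 'D';
}

/* Stores the list of file patterns in *patterns and returns how many there
 * are. Returns -1 when names are required but none were given. */
int select_patterns( int argc, char *argv[], int command, char ***patterns )
{
    static char *match_all[] = { "*" };

    if ( argc > 3 ) {                  /* names were given: use them as is */
        *patterns = argv + 3;
        return argc - 3;
    }
    if ( requires_file_names( command ) )
        return -1;                     /* Add and Delete refuse to default */
    *patterns = match_all;             /* everything else defaults to "*" */
    return 1;
}
```

With this arrangement a listing command with no file names is handled exactly like one given “*”, while Add and Delete report an error to the caller instead of silently touching every file.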
