The Data Compression Book-:A Significant Improvement: Adaptive Huffman Coding

Table of Contents

The code needed to add the node is listed below. The first step is to find the lowest-weight node, which in the example was A. The new_node variable is the node which A will occupy, and the 0_weight_node is where E will go. Since these are two new nodes, they are set to be the next_free_node and next_free_node+1. After this is done, next_free_node is incremented by two so it will be ready for the next operation.

lightest_node = tree->next_free_node - 1;
new_node = tree->next_free_node;
zero_weight_node = tree->next_free_node + 1;
tree->next_free_node += 2;

tree->nodes[ new_node ] = tree->nodes[ lightest_node ];
tree->nodes[ new_node ].parent = lightest_node;
tree->leaf[ tree->nodes[ new_node ].child] = new_node;
tree->nodes[ lightest_node ].child = new_node;
tree->nodes[ lightest_node ].child_is_leaf = FALSE;

tree->nodes[ zero_weight_node ].child          = c;
tree->nodes[ zero_weight_node ].child_is_leaf  = TRUE;
tree->nodes[ zero_weight_node ].weight         = 0;
tree->nodes[ zero_weight_node ].parent         = lightest_node;
tree->leaf[ c ] = zero_weight_node;

After the two new nodes are created, a few more bookkeeping details are needed to link the two new children to their parent, to make sure their weights are correct, and to make sure the leaf array points to the correct nodes. After that, the routine is done, and the tree is ready to be updated.

Updating the Tree

The most complicated part of the Adaptive Huffman program is in the routines called to update the tree. Updating the model is simply a matter of incrementing a symbol weight by one and taking care of all the side effects of that action. Taking care of the side effects, however, involves some strenuous effort.

if ( tree->nodes[ ROOT_NODE].weight == MAX_WEIGHT )
   RebuildTree( tree );
current_node = tree->leaf[ c ];
while ( current_node != -1 ) {
   tree->nodes[ current_node ].weight++;
   for ( new_node = current_node ; new_node > ROOT_NODE ;
          new_node—– )
       if ( tree->nodes[ new_node - 1 ].weight >=
             tree->nodes[ current_node ].weight )
            break;
   if ( current_node != new_node ) {
      swap_nodes( tree, current_node, new_node );
      current_node = new_node;
   }
   current_node = tree->nodes[ current_node ].parent;
}

This code performs all the work needed to update the tree. The first thing it checks for is to see if the tree has reached its maximum weight. If it has, the routine invokes the RebuildTree routine to scale down all the counts.

After getting past the tree-rebuilding step, the loop that increments the node weights is entered. The first node to be incremented is the leaf node associated with the symbol. The loop increments the weight of that symbol, then moves up to the parent to increment that node. This process continues till the root is reached, at which point the update is done. The single symbol added to statistical base forming the tree has been accounted for.

There is one additional step inside the loop, however, that takes place after every node has its weight incremented. The loop immediately following the increment operation works its way back through the list of nodes to see if the increased weight of the current node means it has to move up in the list. After the loop exits, new_node has the proper new location for the current node in the node list. If new_node is the same as current_node, the incremented node is fine where it is, and we can move on to the parent node. But if new_node and current_node aren’t the same, they have to be swapped.

The process of swapping nodes involves lifting two entire subtrees out of their present positions in the list and exchanging them. The use of a tree data structure makes this easier than it may first appear. The swapping process is straightforward, complicated only by the fact that each node being swapped has links to both its parent and child, and the parent and child each have links to the node. This takes no great conceptual leaps. The swapping code is illustrated:

struct node temp;

     if ( tree->nodes[ i ].child_is_leaf )
        tree->leaf[ tree->nodes [ i ].child ] = j;
     else {
        tree->nodes[ tree->nodes [ i ].child ].parent = j;
        tree->nodes[ tree->nodes [ i ].child + 1 ].parent = j;
     }
     if ( tree->nodes[ j ].child_is_leaf )
        tree->leaf[ tree->nodes [ j ].child ] = i;
     else {
        tree->nodes[ tree->nodes[ j ].child ].parent = i;
        tree->nodes[ tree->nodes[ j ].child + 1 ].parent = i;
     }
     temp = tree->nodes[ i ];
     tree->nodes[ i ] = tree->nodes[ j ];
     tree->nodes[ i ].parent = temp.parent;
     temp.parent = tree->nodes[ j ].parent;
     tree->nodes[ j ] = temp;

An update can also force the rebuilding of the tree. Rebuilding takes a fair amount of work, unfortunately, since it amounts to building a new Huffman tree.

The rebuilding process proceeds in three discrete steps. The first step is to collect all the leaf nodes, throw away all the internal nodes, and divide the leaf-node weights by 2. The node list is compacted so that the new leaf nodes are all at the start of the list.

j = tree->next_free_node - 1;
for ( i = j ; i >= ROOT_NODE ; i–– ) {
    if ( tree->nodes[ i ].child_is_leaf ) {
      tree->nodes[ j ] = tree->nodes[ i ];
      tree->nodes[ j ].weight = (tree->nodes[ j ].weight + 1 ) / 2;
      j––;
    }
}

The code to do this is shown above. Note that in this implementation, none of the nodes scale down to zero. This is accomplished by adding 1 to the node before dividing it by two. It may be desirable to throw away infrequently seen symbols by reducing their counts to zero and deleting them from the list, but we don’t do that here.

What we end up with in the above code is a list of leaf nodes that are at the start of the list, terminating at the next_free_node index. The internal nodes which start at 0 and end at the current value of j, will now be rebuilt in the next step.

In Chapter 3, we built a Huffman tree by repeatedly scanning the node list and finding the two nodes with the lowest weight. Those two nodes would be combined to form a new internal node. The tree-rebuilding phase here takes a different approach based on the sibling property.

Table of Contents