Chapter 36

Adding New MIME Types to HotJava

by David Baker




Files on the Internet come in various formats, each of which is used to convey specific information. There are different image file formats, sound clips, video information, and HTML pages. When these documents are transmitted on the Web with the HTTP protocol, a particular MIME content type is used to identify how that file should be interpreted.

New document formats are constantly being introduced to the World Wide Web. However, before you can use these new formats, your browser or other applications must understand how to interpret them. Extensibility is part of the nature of the Java execution environment and the HotJava browser. To manage new MIME types, Java and HotJava can be extended through content handlers.

Content handlers are Java's way of dealing with various data formats, such as text files, images, and sounds. By creating new content handlers, additional data types can be processed and rendered. They empower you to add new functionality to your Web browser and quickly develop applications to utilize new file formats.

Writing Content Handlers

Documents on the Web are transmitted with a MIME content type identifier indicating to the receiving agent how the data is formatted. The client must understand how to decode and render that data. A content handler is a Java class that is called by either a URL or a URLConnection object. The content handler obtains an input stream from the calling object and reads data from that stream. It then processes the data and returns an object that contains the result.

Java and HotJava provide a core set of content handlers to manage commonly used types. You can write your own handlers to deal with new content types. This empowers you to extend your Java applications or your HotJava browser to understand new document formats.

The process of creating new content handlers is quite similar to creating protocol handlers. If you have read the previous chapter, some of these instructions will seem quite familiar. As an example, this chapter demonstrates a content handler that processes plain text documents, overriding the existing handling.

Note
The example in this chapter provides the somewhat frivolous task of making incoming text files appear as though spoken by a famous bald cartoon character inclined towards hunting rabbits.

The process of creating protocol handlers is very similar to that of content handlers, and is described in the section "Writing a Protocol Handler," in Chapter 35, "Adding Additional Protocols to HotJava."

Step One: Decide upon a Package Name

Like protocol handlers, content handlers must reside within a specific package. This package must end with content.type, where type is the MIME type of the data. For instance, the type of text/plain documents is text, while for image/gif, it is image. As in the previous chapter, I prepend ORG.netspace.dwb to indicate the distribution source and author, which yields the following:

ORG.netspace.dwb.content.text
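
The naming rule can be sketched in a few lines of Java. The HandlerNames class and its methods are hypothetical helpers written for illustration; they are not part of the JDK or HotJava.

```java
// Hypothetical helper that illustrates the naming rule; not a JDK or HotJava API.
final class HandlerNames {
    private HandlerNames() {}

    /** Returns prefix + ".content." + the MIME type (the part before the slash). */
    static String packageFor(String prefix, String mimeType) {
        String type = mimeType.substring(0, slashIndex(mimeType));
        return prefix + ".content." + type;
    }

    /** Returns the MIME subtype (the part after the slash). */
    static String classFor(String mimeType) {
        return mimeType.substring(slashIndex(mimeType) + 1);
    }

    /** Locates the slash and rejects strings that are not a type/subtype pair. */
    private static int slashIndex(String mimeType) {
        int slash = mimeType.indexOf('/');
        if (slash <= 0 || slash == mimeType.length() - 1) {
            throw new IllegalArgumentException("Not a type/subtype pair: " + mimeType);
        }
        return slash;
    }

    public static void main(String[] args) {
        // Prints ORG.netspace.dwb.content.text
        System.out.println(packageFor("ORG.netspace.dwb", "text/plain"));
        // Prints plain
        System.out.println(classFor("text/plain"));
    }
}
```

The subtype returned by classFor() matters in Step Four, where it becomes the name of the handler class.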

See "Step One: Decide on a Package Name" in Chapter 35, "Adding Additional Protocols to HotJava," to see the corresponding process for protocol handlers.

Step Two: Create the Directories

Caution
Java is case-sensitive. Even if your system doesn't treat upper- and lowercase characters within directory names differently, use the case of the letters as shown within these instructions.

The content handler class must be placed into a directory that corresponds to the package name. Such directories usually reside within a directory called classes in your home directory. For Windows NT and Windows 95 users, the following sequence of commands accomplishes this at the command prompt:

Note
If you have previously installed other content handlers, protocol handlers, or personal Java classes, you may have already created some of the following directories.

%HOMEDRIVE%
cd %HOMEPATH%
mkdir classes
mkdir classes\ORG
mkdir classes\ORG\netspace
mkdir classes\ORG\netspace\dwb
mkdir classes\ORG\netspace\dwb\content
mkdir classes\ORG\netspace\dwb\content\text

For UNIX users, the analogous commands are:

cd ~
mkdir classes
mkdir classes/ORG
mkdir classes/ORG/netspace
mkdir classes/ORG/netspace/dwb
mkdir classes/ORG/netspace/dwb/content
mkdir classes/ORG/netspace/dwb/content/text
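
If you would rather not type the commands by hand, the same directory tree can be created from Java on either platform. This is a minimal sketch, assuming a JDK that includes the java.nio.file API (which is newer than the JDK this chapter was written against); the PackageDirs class is a hypothetical helper.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; mirrors the mkdir sequences shown for Windows and UNIX.
final class PackageDirs {
    private PackageDirs() {}

    /** Creates one directory per package component under classesRoot. */
    static Path create(Path classesRoot, String packageName) {
        Path dir = classesRoot;
        for (String part : packageName.split("\\.")) {
            dir = dir.resolve(part);   // keeps the case of each component
        }
        try {
            Files.createDirectories(dir);   // no error if some levels already exist
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dir;
    }

    public static void main(String[] args) {
        Path root = Paths.get(System.getProperty("user.home"), "classes");
        System.out.println("Created " + create(root, "ORG.netspace.dwb.content.text"));
    }
}
```

Because createDirectories() tolerates directories that already exist, the sketch is safe to run even if you installed other handlers earlier.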

Step Three: Set Your CLASSPATH

The CLASSPATH environment variable tells the Java compiler and interpreter where to find Java classes, enabling the dynamic linking feature of the Java execution environment. When installing the JDK, HotJava, or a Java-aware browser, you might have set the CLASSPATH environment variable. If so, it is critical that you avoid overwriting that data. Follow these steps:

  1. Find out what your current CLASSPATH setting is. Under Windows NT/95, type the following command at the command prompt and look for the CLASSPATH value in its output:
    SET
     Under UNIX systems, you can display the CLASSPATH value with this command:
    echo $CLASSPATH
  2. Reset your CLASSPATH, including the previous data, if any. Under Windows 95, if your CLASSPATH was
    .;C:\JAVA\LIB\CLASSES.ZIP
     you can add the following line to your AUTOEXEC.BAT and reboot:
    SET CLASSPATH=.;%HOMEDRIVE%%HOMEPATH%\CLASSES;C:\JAVA\LIB\CLASSES.ZIP
     Under Windows NT, presuming that the CLASSPATH value was the same as in the Windows 95 example, you would use the System Control Panel to add a CLASSPATH environment variable with the value:
    .;%HOMEDRIVE%%HOMEPATH%\CLASSES;C:\JAVA\LIB\CLASSES.ZIP
     Under UNIX, assume that your old CLASSPATH was .:/usr/java/lib and that your home directory is /home/myid. If you are using the C shell, place the following into your ~/.cshrc file:
    setenv CLASSPATH .:/home/myid/classes:/usr/java/lib
     If you are on a UNIX system using the Korn shell or another POSIX-compliant shell, add the following line to the file to which your ENV environment variable points. If ENV is unset, you could add it to your ~/.profile file instead:
    CLASSPATH=.:/home/myid/classes:/usr/java/lib; export CLASSPATH

Note
The CLASSPATH can indicate as many Java libraries as you have installed. The base directory of each library should be contained within the CLASSPATH, each separated by a ':' under UNIX systems or a ';' under Windows 95 and Windows NT systems. The part of the CLASSPATH which is only a period indicates that the current working directory should be searched for appropriate class files, making developing new Java classes more convenient.
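
To check what the Java runtime actually ends up searching, you can print the java.class.path system property, which holds the effective class path. The sketch below splits it on the platform's separator and flags entries that do not exist on disk; ClassPathEntries is a hypothetical name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic; lists each class path entry on its own line.
final class ClassPathEntries {
    private ClassPathEntries() {}

    /** Splits a class path string on the given separator, skipping empty entries. */
    static List<String> split(String classPath, String separator) {
        List<String> entries = new ArrayList<>();
        for (String entry : classPath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        String classPath = System.getProperty("java.class.path", "");
        // File.pathSeparator is ":" under UNIX and ";" under Windows.
        for (String entry : split(classPath, File.pathSeparator)) {
            boolean exists = new File(entry).exists();
            System.out.println(entry + (exists ? "" : "  (missing)"));
        }
    }
}
```

If your classes directory is missing from the output, or is flagged as missing, the handler compiled later in this chapter will not be found.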

Step Four: Write the Content Handler

The content handler must be a class that extends java.net.ContentHandler. It must also have the same name as the subtype of the MIME content-type it processes. That is, for image/gif, the class should be called gif, while my example, which overrides the normal text/plain handler, should be named plain.

The class must have a getContent() method that takes a URLConnection as an argument and returns a generic Object. For now, HotJava supports the following returned Object instances:

The code for the example used in this chapter is shown in Listing 36.1. This content handler has only one method, getContent(). It obtains an InputStream from the URLConnection object and then enters an infinite loop. Within the loop, it reads the incoming characters and makes a number of substitutions, altering the text to appear as though spoken by our cartoon friend.

The filtered characters are placed into a StringBuffer object. Once the last character has been read, the read() method returns -1, and the content handler breaks from the loop. It closes the InputStream and then returns the buffer's contents as a String object.

Note
If there is an exception, the method returns a String providing information about the problem.


Listing 36.1  plain.java
// This is the package identified for this content handler.
package ORG.netspace.dwb.content.text;

import java.lang.*;  // Import the package names used.
import java.net.*;
import java.io.*;

/**
 * This is a text/plain content handler which "fuddifies"
 * the text it receives.
 * @author David W. Baker
 * @version 1.1
 * @see java.net.ContentHandler
 */
public class plain extends ContentHandler {
     // Stream to receive text/plain file from.
     private InputStream input;
     // Some standard replacement strings.
     private static final String QUIET = "(be vewy quiet, ";
     private static final String HEH = ", eheheheh.";
     private static final String SCREWY = "? Awe you scwewy?";
     private static final String RASCAL = ", you wascal!";
     private static final String MISCREANT =
          ", you miscweant:";
     /**
     * This method returns an Object containing the
     * processed content from the given URLConnection.
     * @param contentConn Connection used to obtain the content.
     * @return The content.
     * @see java.net.ContentHandler#getContent
     */
     public Object getContent(URLConnection contentConn) {
          // Create a buffer to store the filtered data.
          StringBuffer fuddBuff = new StringBuffer();
          int intChar;     // An int representation of a char.
          char nextChar; // A char.

          try {
               // Get the input.
               input  = contentConn.getInputStream();
               // Loop infinitely.
               filter: while(true) {
                    // Read in next character.
                    intChar = input.read();
                    // Make sure we aren't at the end.
                    if (intChar == -1) {
                         break filter;  // Break if end.
                    }
                    // Convert it to a char.
                    nextChar = (char)intChar;
                    // Replace "(" with QUIET.
                    if (nextChar == '(') fuddBuff.append(QUIET);
                    // Replace "L" with "W".
                    else if (nextChar == 'L') fuddBuff.append('W');
                    // Replace "l" with "w".
                    else if (nextChar == 'l') fuddBuff.append('w');
                    // Replace "R" with "W".
                    else if (nextChar == 'R') fuddBuff.append('W');
                    // Replace "r" with "w".
                    else if (nextChar == 'r') fuddBuff.append('w');
                    // For a period at the end of the file or a period
                    // followed by a space, substitute HEH.
                    else if (nextChar == '.') {
                         intChar = input.read();
                         if (intChar == -1) {
                              fuddBuff.append(HEH);
                              break filter;  // Break if end.
                         }
                         nextChar = (char)intChar;
                         if (nextChar == ' ') 
                              fuddBuff.append(HEH + " ");
                         else fuddBuff.append("." + nextChar);
                    }
                    // For ? at the end of the file or ?
                    // followed by a space, substitute SCREWY.
                    else if (nextChar == '?') {
                         intChar = input.read();
                         if (intChar == -1) {
                              fuddBuff.append(SCREWY);
                              break filter;  // Break if end.
                         }
                         nextChar = (char)intChar;
                         if (nextChar == ' ') 
                              fuddBuff.append(SCREWY + " ");
                         else fuddBuff.append("?" + nextChar);
                    }
                    // For ! at the end of the file or !
                    // followed by a space, substitute RASCAL.
                    else if (nextChar == '!') {
                         intChar = input.read();
                         if (intChar == -1) {
                              fuddBuff.append(RASCAL);
                              break filter;  // Break if end.
                         }
                         nextChar = (char)intChar;
                         if (nextChar == ' ') 
                              fuddBuff.append(RASCAL + " ");
                         else fuddBuff.append("!" + nextChar);
                    }
                    // For : at the end of the file or :
                    // followed by a space, substitute MISCREANT.
                    else if (nextChar == ':') {
                         intChar = input.read();
                         if (intChar == -1) {
                              fuddBuff.append(MISCREANT);
                              break filter;  // Break if end.
                         }
                         nextChar = (char)intChar;
                         if (nextChar == ' ') 
                              fuddBuff.append(MISCREANT + " ");
                         else fuddBuff.append(":" + nextChar);
                    }
                    else fuddBuff.append(nextChar);
               }
               input.close();
          } catch(IOException excpt) {
               return "Unable to load document: " 
                                   + contentConn.getURL();
          }
          return fuddBuff.toString();
     }
}
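
Before installing a handler, it can be convenient to exercise getContent() without a network connection. The sketch below is one way to do that, not part of the original example: InMemoryConnection is a hypothetical test double that serves fixed bytes, and MiniFudd is a cut-down stand-in for the listing above that performs only the l/r substitutions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.ContentHandler;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

/** Runs the stand-in handler against an in-memory "document". */
final class MiniFuddDemo {
    static String fuddify(String text) {
        try {
            return (String) new MiniFudd().getContent(new InMemoryConnection(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fuddify("Hello, rabbit"));   // prints "Hewwo, wabbit"
    }
}

/** Stand-in handler (an assumption for brevity): only L/R and l/r are replaced. */
final class MiniFudd extends ContentHandler {
    @Override
    public Object getContent(URLConnection conn) throws IOException {
        StringBuilder out = new StringBuilder();
        try (InputStream in = conn.getInputStream()) {
            int c;
            while ((c = in.read()) != -1) {   // read() returns -1 at end of stream
                switch (c) {
                    case 'L': case 'R': out.append('W'); break;
                    case 'l': case 'r': out.append('w'); break;
                    default: out.append((char) c);
                }
            }
        }
        return out.toString();
    }
}

/** Hypothetical test double: a URLConnection that never opens a socket. */
final class InMemoryConnection extends URLConnection {
    private final byte[] data;

    InMemoryConnection(String text) {
        super(null);   // no real URL is needed for this test
        this.data = text.getBytes(StandardCharsets.ISO_8859_1);
    }

    @Override
    public void connect() {
        connected = true;
    }

    @Override
    public InputStream getInputStream() {
        return new ByteArrayInputStream(data);
    }
}
```

Substituting the real handler class for MiniFudd lets you check its output against known input before touching HotJava's configuration.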

Step Five: Compile the Source

Use javac to compile the content handler, and leave the compiled class within the directory created in Step Two (that is, classes\ORG\netspace\dwb\content\text for NT/95 or classes/ORG/netspace/dwb/content/text for UNIX). Thus, if you created plain.java within that directory, you would merely change to that directory and then issue this command:

javac plain.java

Be sure to leave the .class file within the bottom "text" directory.

Tip
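If you choose to create the plain.java file somewhere else, you can use the -d option of the Java compiler to place the .class file into the proper place automatically. Give -d the root classes directory rather than the package directory; the compiler places the .class file in the subdirectory that matches the package statement. For example, under UNIX:
javac -d ~/classes plain.java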

Using Content Handlers with HotJava

As with protocol handlers, HotJava's goal is to eventually support dynamically downloaded content handlers. For now, only manually installed handlers are supported, created as described in the earlier section "Writing Content Handlers." In addition, at the time of this writing, HotJava supports only content handlers that extend existing MIME types. That is, the example can override the handling of text/plain, but HotJava does not support one that handles a new content-type like text/fuddify.

HotJava also needs to deal with the conflict between MIME type names and Java class names. MIME content-types can, and under certain circumstances should, contain hyphens. However, hyphens are not allowed in Java class identifiers. Because the class of the content handler must be the same as the MIME content subtype, this presents an obvious problem.
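
The documentation for java.net.URLConnection.getContent() describes one resolution: when the handler class name is derived from the content type, each slash becomes a period and every other non-alphanumeric character becomes an underscore. Whether the HotJava release discussed here applies exactly this rule is an assumption. A minimal sketch of the mapping, with MimeClassNames as a hypothetical helper:

```java
// Hypothetical helper illustrating the documented mapping rule.
final class MimeClassNames {
    private MimeClassNames() {}

    /** Converts a content type such as "application/x-tar" to "application.x_tar". */
    static String toClassSuffix(String contentType) {
        StringBuilder name = new StringBuilder(contentType.length());
        for (int i = 0; i < contentType.length(); i++) {
            char c = contentType.charAt(i);
            if (c == '/') {
                name.append('.');            // type/subtype separator
            } else if (Character.isLetterOrDigit(c)) {
                name.append(c);              // legal in an identifier as-is
            } else {
                name.append('_');            // hyphens, plus signs, and so on
            }
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(toClassSuffix("application/x-tar"));   // prints "application.x_tar"
    }
}
```

Under that rule, a handler for application/x-tar would be a class named x_tar in a package ending with content.application.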

The following steps illustrate how to use the new content handler, as created in the previous section, with the HotJava browser.

Note
JavaSoft makes the HotJava browser and instructions for its installation available at <URL:http://www.javasoft.com/java.sun.com/HotJava/CurrentRelease/installation.html>.

Step One: Disable Special MIME Handling

On certain systems, a file called mailcap may have been created to indicate that a special helper application should be used for an incoming MIME type, regardless of which browser is loading the data. If such a file exists, ensure that any line indicating special processing is removed for the content-type you want your handler to process. Thus, remove any entry for text/plain for this example.

Step Two: Update the properties File

HotJava stores per-user customizations in a file called properties. This file is located within a directory named .hotjava that resides within your home directory. Edit this file to set the java.content.handler.pkgs property to everything up to and including the content token in the content handler's package name. When HotJava searches for a content handler appropriate to a specific MIME type, it appends the MIME type to this value and looks for a Java package of that name; it then looks for a Java class within that package that has the same name as the MIME subtype. If this property has not been set, add the following line to use the example handler:

java.content.handler.pkgs=ORG.netspace.dwb.content

If that property has already been set, append a pipe character (|) and ORG.netspace.dwb.content. For example:

java.content.handler.pkgs=COM.company.content|ORG.netspace.dwb.content

Note
When editing the HotJava properties file, be sure to use a text editor or, if you are using a word processor, save the file as text.
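
The lookup described in this step can be approximated as follows. This is a sketch of the search order implied by the text, not HotJava's actual code; HandlerSearch and its candidates() method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of how the pipe-separated property is consulted.
final class HandlerSearch {
    private HandlerSearch() {}

    /** Builds the fully qualified class names to try, in order, for a MIME type. */
    static List<String> candidates(String pkgsProperty, String mimeType) {
        String suffix = mimeType.replace('/', '.');   // text/plain -> text.plain
        List<String> names = new ArrayList<>();
        for (String prefix : pkgsProperty.split("\\|")) {
            prefix = prefix.trim();
            if (!prefix.isEmpty()) {
                names.add(prefix + "." + suffix);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String pkgs = "COM.company.content|ORG.netspace.dwb.content";
        for (String name : candidates(pkgs, "text/plain")) {
            String status;
            try {
                Class.forName(name);
                status = "found";
            } catch (ClassNotFoundException | LinkageError e) {
                status = "not on the class path";
            }
            System.out.println(name + ": " + status);
        }
    }
}
```

Running it with your own property value shows which class names must exist on the CLASSPATH for the handler to be picked up.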

Step Three: Run HotJava

Execute HotJava and load up a text file to see the "fuddified" information. Figure 36.1 demonstrates this effect upon the HTML RFC. To view this page yourself, go under the File menu, select Open Page, and then enter the following:

ftp://ds.internic.net/rfc/rfc1866.txt

Figure 36.1: When HotJava uses the Fuddify content handler, the HTML spec looks slightly more interesting.

Using Content Handlers with Your Own Applications

Content handlers can be used by your own applications, in addition to their usefulness with HotJava. Content handlers use a concept similar to protocol handlers for registering a new handler, that of a factory. The FetchFuddify application, shown in Listing 36.2, demonstrates this functionality.


Listing 36.2  FetchFuddify.java
import java.net.*;     // Import package names used.
import java.io.*;

/**
 * This is an application which utilizes the new
 * text/plain content handler which "fuddifies"
 * the text.
 * @author David W. Baker
 * @version 1.1
 */
public class FetchFuddify {
     /**
      * This method starts the application.
      * @param args The program arguments - should be URL.
      */
     public static void main (String args[]) {
          // Check the arguments.
          if (args.length != 1) {
               System.err.println("usage: " +
                    "java FetchFuddify <url of Fudd document>");
               System.exit(1);
          }
          // Create an instance of FetchFuddify to do its stuff.
          FetchFuddify app = new FetchFuddify(args[0]);
     }

     /**
      * This constructor does all of the work of obtaining
      * the data with the appropriate content handler and
      * sending it to standard output.
      * @param url The URL to obtain.
      */
     public FetchFuddify(String url) {
          URL fuddURL;                    // URL object to resource.
          URLConnection fuddConn; // Connection to resource.
          Object fuddObject;          // Object returned.

          // Register the content handler with our own
          // factory.
          URLConnection.setContentHandlerFactory(
               new fuddifyCHFactory());
          try {
               // Create the URL object with the command line
               // argument used.
               fuddURL = new URL(url);
               // Open the connection.
               fuddConn = fuddURL.openConnection();
               // Get the content.
               fuddObject = fuddConn.getContent();
               // Convert the content to a String and print it.
               System.out.println(fuddObject.toString());
          } catch(MalformedURLException excpt) {
               System.err.println("Malformed URL: " + excpt);
          } catch(IOException excpt) {
               System.err.println("Failed I/O: " + excpt);
          }
     }
}

/**
 * This class implements the ContentHandlerFactory
 * interface to register our own content handler.
 * @see java.net.ContentHandlerFactory
 */
class fuddifyCHFactory implements ContentHandlerFactory {
     /**
      * This method returns our own custom content
      * handler when given a "text/plain" content type.
      * @param contenttype MIME type - should be "text/plain".
      * @return The content handler to use.
      * @see java.net.ContentHandlerFactory#createContentHandler
      */
     public ContentHandler 
          createContentHandler(String contenttype) {
          // Ensure the content type is "text/plain".
          if (contenttype.equalsIgnoreCase("text/plain")) {
               // Create an instance of our content handler.
               return new ORG.netspace.dwb.content.text.plain();
          }
          // Otherwise, print an error message and return null.
          System.err.println("Unknown data type: " 
               + contenttype);
          return null;
     }
}

Start FetchFuddify

The main() method checks to see that the program was invoked with a single argument, which corresponds to the URL of a text file to filter. Then it creates a FetchFuddify object, passing it the String command line argument.

The constructor performs the essential task in using a new content handler: invoking setContentHandlerFactory(), a static method of the URLConnection class. Factories should be a familiar concept; this time, the factory allows the URLConnection class to choose an appropriate content handler. The setContentHandlerFactory() method takes an object that implements the java.net.ContentHandlerFactory interface. This example's implementation, fuddifyCHFactory, is described next in "The ContentHandlerFactory Implementation."

The constructor then creates a URL object and opens a connection to the resource. It calls the getContent() method of the URLConnection class, which causes the code of the content handler to be invoked. getContent() returns an Object, which the constructor converts to a String with the toString() method and prints to standard output.

The ContentHandlerFactory Implementation

This interface enables you to register new content handlers with the URLConnection class. A class that implements this interface must have a createContentHandler() method. This method takes a String instance containing the value of the MIME content-type of the resource being accessed. This method returns a ContentHandler object.

The example first checks to see that the contenttype argument is text/plain. It then creates an instance of the content handler and returns it. If the method is called with a contenttype other than text/plain, it returns null.
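
When an application needs more than one custom handler, the chain of if statements can be replaced by a lookup table. The following is a sketch of that variation, not the chapter's implementation: TableFactory and EchoHandler are hypothetical names, and the lambda-style registration assumes a JDK with java.util.function.

```java
import java.net.ContentHandler;
import java.net.ContentHandlerFactory;
import java.net.URLConnection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical table-driven factory; maps content types to handler constructors. */
final class TableFactory implements ContentHandlerFactory {
    private final Map<String, Supplier<ContentHandler>> handlers = new HashMap<>();

    /** Registers a handler constructor for one content type (case-insensitive). */
    TableFactory register(String contentType, Supplier<ContentHandler> maker) {
        handlers.put(contentType.toLowerCase(Locale.ROOT), maker);
        return this;
    }

    @Override
    public ContentHandler createContentHandler(String contentType) {
        Supplier<ContentHandler> maker =
            handlers.get(contentType.toLowerCase(Locale.ROOT));
        // Returning null signals that this factory has no handler for the type.
        return maker == null ? null : maker.get();
    }

    public static void main(String[] args) {
        TableFactory factory =
            new TableFactory().register("text/plain", EchoHandler::new);
        System.out.println(factory.createContentHandler("TEXT/PLAIN") != null);   // true
        System.out.println(factory.createContentHandler("image/gif") == null);    // true
        // URLConnection.setContentHandlerFactory(factory) would install it;
        // that method may be called at most once per application.
    }
}

/** Placeholder standing in for ORG.netspace.dwb.content.text.plain. */
final class EchoHandler extends ContentHandler {
    @Override
    public Object getContent(URLConnection conn) {
        return "placeholder content";
    }
}
```

Each call to createContentHandler() builds a fresh handler instance, which matches the behavior of the example factory in Listing 36.2.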

Running the Application

First, make sure you've already installed the appropriate content handler, as described in "Writing Content Handlers." Compile the FetchFuddify application and then invoke it with the URL of a text file available somewhere on the Web. For instance, the following will "fuddify" a release notes document from JavaSoft:

java FetchFuddify http://chatsubo.javasoft.com/current/doc/rmi/release-notes.txt

This generates:

Wemote Method Invocation (be vewy quiet, WMI) notes fow wewease Awpha2.

- WMI is suppowted fow Java appwications and in the AppwetViewew.

- WMI wequiwes wocaw instawwation of the wmi package appwopwiate fow
  Sowawis ow Win95/NT.

- Any appwication that expowts wemote objects must be awwowed
  by the SecuwityManagew to use SewvewSockets to wisten fow and accept
  incoming socket connections, eheheheh.

- Appwets may not expowt wemote objects since the SecuwityManagew
  pwevents using SewvewSocket, eheheheh.  This wiww be suppowted in a futuwe
  update.

It is promised that with the 1.0 release version of HotJava, dynamically downloaded content handlers will be supported. Once realized, this will allow HotJava to be extended on demand with little effort from the end user. When you encounter a new document type, HotJava will automatically download and install the new content handler necessary to render the data.