High Performance NT4 Optimization and Tuning: Clustering Technologies

Chapter 10
Clustering Technologies

• Understanding Clustering Technologies

• Choosing A Clustering Implementation

In Chapter 9, “Implementing Redundant Systems,” you learned how to use redundant disk systems and fault-tolerant systems to ensure the accessibility of your file system resources. A mirror set and a duplexed mirror set are both redundant systems, while a stripe set with parity is a fault-tolerant system. (A plain stripe set offers no protection against a disk failure.) The difference between redundant and fault-tolerant technologies is important to consider. A redundant system rolls over to a shadow (or slave) resource if a failure is detected on the original (or master) resource. A fault-tolerant system continues to provide its service if a recoverable error (such as a single disk failure) is detected. Clustering technologies also rely on redundant systems, fault tolerance, or both, depending on the software you use to implement your clustering solution.
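The fault-tolerant case can be illustrated with a minimal Python sketch of the XOR parity idea behind a stripe set with parity. The block contents, the three-data-disk layout, and the helper name xor_blocks are assumptions made purely for illustration; this is not how the Windows NT disk driver is implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per disk in the stripe.
data_disks = [b"NT4!", b"disk", b"sets"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_disks)

# Simulate losing the second disk: XOR the survivors with the parity block
# to regenerate the missing block, so reads can continue without a rollover.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, any single missing block can be regenerated from the remaining blocks plus parity. Losing two blocks at once cannot be recovered, which is why the text describes a single disk failure as the recoverable case.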

The Difference Between A Multiprocessor Server And A Cluster
Since its introduction into the OS market, Windows NT has supported multiple processors in a single computer. Initially, this support was provided as a performance enhancement and allowed Windows NT to be scaled up to handle bigger jobs. For the most part, multiprocessing servers were used as high-performance SQL Server platforms. But this has changed over the years to include additional platforms. Today’s multiprocessing computers run applications such as Microsoft Exchange to handle extraordinarily large numbers of mail clients. Multiprocessing computers also run SQL Server and Systems Management Server to provide automated onsite inventory or software installations. In other cases, multiprocessing servers support terminal mode clients (both those using console and graphical interfaces) with a modified version of Windows NT Server.

In all these cases, the multiple processors are used to provide more CPU horsepower to a server. To improve the I/O subsystem’s performance, most of these multiprocessor computers use mirror sets, duplexed mirror sets, or stripe sets with parity. Top-of-the-line multiprocessor computers use dedicated hardware RAID 0 (stripe set), RAID 1 (mirror set), or RAID 5 (stripe set with parity). Performance is the goal for a multiprocessor computer, so hardware RAID is used most often because it almost always outperforms a software implementation.

In the past, these high-end multiprocessing computers were also used exclusively for mission-critical jobs: jobs where the company required that the service it provided never fail. To prevent a system-wide failure, these multiprocessing servers utilized mirror sets and stripe sets with parity to provide data redundancy and fault tolerance for the I/O subsystem. They even utilized multiple processors to supply fault tolerance for the processor subsystem (most high-end implementations support processor fault tolerance at the hardware level). Unfortunately, these setups just weren’t good enough for mission-critical jobs. After all, they did not provide protection if the entire server failed (unless you had a backup server with an identical copy of the data).

As an example, with an SQL Server database, the data could be replicated from one SQL Server database to another. This offered a form of server redundancy, because, if the primary server failed, you could continue to operate using the secondary server with the replicated database. But using a secondary server and replicated database required manual intervention. Clients had to manually connect to the new server and log on to the new SQL Server database to access their data. And if a failure occurred before the data was replicated, any work in progress was lost.

This is where clustering came into play in the marketplace. Depending on the clustering implementation, a cluster provides server redundancy, fault tolerance, or both, and it also supports an automatic switchover to the redundant server. The whole idea behind clustering is to switch automatically from a failed server to a backup server, with no manual intervention. With clustering, clients might not even notice when a failure occurs because the data on the primary server is automatically replicated to the backup server as the events occur on the primary server.
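The switchover behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the mechanism of any particular clustering product: the Node and TwoNodeCluster classes, the in-memory dictionary standing in for server data, and the alive flag standing in for failure detection are all hypothetical.

```python
class Node:
    """A hypothetical server holding an in-memory copy of the data."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class TwoNodeCluster:
    """Toy primary/backup pair with replication and automatic switchover."""

    def __init__(self, primary, backup):
        self.active = primary
        self.standby = backup

    def _failover_if_needed(self):
        # Stand-in for failure detection: a real cluster would use a
        # heartbeat or similar health check rather than a simple flag.
        if not self.active.alive:
            if not self.standby.alive:
                raise RuntimeError("no surviving node in the cluster")
            self.active, self.standby = self.standby, self.active

    def write(self, key, value):
        self._failover_if_needed()
        self.active.data[key] = value
        if self.standby.alive:
            # Replicate each change to the backup as it occurs.
            self.standby.data[key] = value

    def read(self, key):
        self._failover_if_needed()
        return self.active.data[key]


cluster = TwoNodeCluster(Node("primary"), Node("backup"))
cluster.write("order-1", "pending")
cluster.active.alive = False          # simulate failure of the primary server
value_after_failure = cluster.read("order-1")
```

The client keeps calling the same cluster object before and after the failure. That is the point of the contrast with manual SQL Server replication, where clients had to reconnect to the secondary server themselves.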

Clustering technology can also take advantage of software-based mirror sets, duplexed mirror sets, or stripe sets with parity to provide additional data redundancy or fault tolerance on a per-server basis. For improved performance, a server in the cluster may use hardware-based data redundancy or fault-tolerant solutions. A cluster can even implement a virtual server and a load-balancing algorithm so that both servers can be accessed simultaneously. You can think of this type of clustering implementation as a super high-end multiprocessing computer with built-in fault tolerance capabilities.
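As a rough illustration of the virtual server idea, the following Python sketch hands out requests to two servers in round-robin order and skips any server marked as failed. Round-robin is only one possible load-balancing algorithm and is chosen here as an assumption; the VirtualServer class and the server names are hypothetical and do not correspond to any product discussed in this chapter.

```python
from itertools import cycle


class VirtualServer:
    """Toy front end that presents several servers as one address."""

    def __init__(self, nodes):
        self.health = {name: True for name in nodes}
        self._ring = cycle(nodes)  # endless round-robin over the node names

    def mark_failed(self, name):
        self.health[name] = False

    def route(self):
        """Return the next healthy server, skipping failed ones."""
        for _ in range(len(self.health)):
            name = next(self._ring)
            if self.health[name]:
                return name
        raise RuntimeError("no healthy server available")


virtual = VirtualServer(["SERVER_A", "SERVER_B"])
before_failure = [virtual.route() for _ in range(3)]
virtual.mark_failed("SERVER_A")
after_failure = [virtual.route() for _ in range(2)]
```

While both servers are healthy, requests alternate between them, so both carry load. Once one fails, every request lands on the survivor, which combines load balancing with the switchover behavior of a cluster.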

Clustering technology is an outgrowth of multiprocessing and redundant technology carried one step further. While a multiprocessor computer is housed in a single box and is susceptible to a single point of failure (the entire server failing), a cluster is not. A cluster extends the concepts of redundancy and fault tolerance by using multiple servers. Should a single server fail, the other server will continue operating and the required services will continue to be available to your clients. This is the primary difference between a multiprocessor computer and a cluster.

In this chapter, our focus will be on clustering technologies from Microsoft, Digital Equipment Corporation, and Qualix Group. These products take the concept of redundant systems a bit further than those discussed in the previous chapter. Rather than limiting redundancy to disk systems, these products apply redundancy to entire servers. Using these products, if a server fails, another server automatically takes over the current workload. Your users probably won’t even notice if a server fails.

The Microsoft product is currently in beta and is code-named Wolfpack. The retail version of Wolfpack is expected to ship soon. You can find additional availability information on the Windows NT Server Web site at www.microsoft.com/ntserver. Wolfpack, while developed by Microsoft, incorporates algorithms and expertise provided by Microsoft’s core partners in the Wolfpack initiative. These core partners include Compaq Computer Corporation, Digital Equipment Corporation, Hewlett-Packard Company, IBM Corporation, Intel Corporation, NCR Corporation, and Tandem Computers, Inc. As this list shows, clustering technology has a wide support base and involves some heavy hitters in the computer industry.
