High Performance NT4 Optimization and Tuning: Clustering Technologies


Digital Equipment Corporation’s product is called Clusters For Windows NT version 1.1. Digital Equipment Corporation has quite a bit of experience in clustering technologies. My first experience with clustering occurred in the early 1980s using VAX computers. By connecting multiple VAX computers into a single logical entity, Digital was able not only to improve the performance of its systems but also to provide redundancy for mission-critical jobs. At the time, these jobs were based on large relational databases. The database software, working in conjunction with the VAX operating system and clustering implementation, provided an automatic rollover to a secondary VAX computer if the first one failed. Digital has now brought its knowledge and experience in clustering technologies to the Windows NT market. You can find out more about Digital Equipment Corporation’s clustering solutions for Windows NT Server at www.digital.com.

Qualix Group has a product called Octopus HA+. Its implementation differs from both Microsoft’s and Digital Equipment Corporation’s versions in that the servers in the cluster do not have to reside in the same physical location. You can find out more about the Qualix Group’s clustering solutions at www.octopustech.com, where you can even download a trial version of the software to determine whether it meets your requirements.

The solutions offered by the Qualix Group and Digital Equipment Corporation run on both Windows NT Server 3.51 and Windows NT Server 4. The Microsoft solution is geared toward Windows NT Server 4 only. But enough about the products; let’s look at how the technology actually works and what it can do for you.

Understanding Clustering Technology

The basic idea behind clustering technology is to provide full redundancy of a network server. In a redundant disk system, you have a master disk that serves as the primary disk used by the system and a slave disk that maintains an active copy of the master disk. If the master disk fails, the slave disk is used for all future access. Together, these two disks are called a mirror set (or a duplexed mirror set if you use two disk controllers). Clustering technology operates on similar principles. A cluster incorporates a master server and a slave server; if one server fails, the other server is used for further access. When these servers are combined into a single unit, the unit is referred to as a node, as shown in Figure 10.1. Some implementations refer to this combined unit as a virtual server. The solutions we will be discussing are based on the node concept, with a slight variation for the Qualix Group’s clustering solution.
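
To make the master/slave arrangement concrete, here is a minimal Python sketch that models two servers presented to clients as one logical node. The class names, server names, and the simple "healthy" flag are hypothetical illustrations, not part of any of the products discussed; a real cluster detects failures through its own monitoring rather than a boolean.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """One physical server in the pair (hypothetical model)."""
    name: str
    healthy: bool = True


@dataclass
class Node:
    """Two servers presented to clients as a single logical unit.

    The master handles all requests while it is healthy; the slave is used
    only after the master fails, much like the slave disk in a mirror set.
    """
    master: Server
    slave: Server

    def active_server(self) -> Server:
        # Prefer the master; fall back to the slave if the master is down.
        if self.master.healthy:
            return self.master
        if self.slave.healthy:
            return self.slave
        raise RuntimeError("both servers in the node have failed")


node = Node(Server("ServerA"), Server("ServerB"))
print(node.active_server().name)  # ServerA
node.master.healthy = False       # simulate a failure of the master
print(node.active_server().name)  # ServerB
```

Clients only ever ask the node for its active server, which is the point of the design: the failover decision stays inside the node instead of being pushed out to every client.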

If you look closely at Figure 10.1, you can see that the two servers are connected to each other as well as individually connected to the network. The connection from Server A to Server B is called the node connection and is used to replicate information between the two servers. The node connection is usually a high-speed link (typically 100Mbps), but the actual speed depends on the clustering software. The Microsoft and Digital Equipment Corporation clustering implementations are designed to provide redundancy for servers on a local area network (LAN). The Qualix Group clustering implementation, however, is also designed to operate over a wide area network (WAN). Although you could probably use the Digital Equipment Corporation clustering solution this way, I would not expect it to perform as well, because it is not designed to operate over a slower connection.
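
A quick back-of-the-envelope calculation shows why the speed of the node connection matters. The sketch below estimates how long it takes to replicate a batch of changed data across a 100Mbps (Fast Ethernet) node connection versus a T1 WAN link (1.544Mbps). The 100MB batch size is a hypothetical figure, and the calculation ignores protocol overhead and latency, so real transfers would take longer.

```python
def transfer_seconds(megabytes: float, link_mbps: float) -> float:
    """Time to send `megabytes` of data over a link of `link_mbps` megabits/s.

    Ignores protocol overhead and latency, so this is a best-case estimate.
    """
    megabits = megabytes * 8  # 1 byte = 8 bits
    return megabits / link_mbps


changed_mb = 100  # hypothetical amount of data to replicate
for label, mbps in [("100Mbps node connection", 100.0), ("T1 WAN link", 1.544)]:
    print(f"{label}: {transfer_seconds(changed_mb, mbps):.1f} s")
# 100Mbps node connection: 8.0 s
# T1 WAN link: 518.1 s
```

The same data takes roughly 65 times longer over the T1 link, which is why a clustering product has to be designed specifically for WAN operation to keep its replicated copy reasonably current.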

Figure 10.1 An example of two servers combined into a single node.

The node connection is used to transmit data and system-related messages without impacting the performance of your network clients. Your clients access the server over the regular network connection, through a hub that is connected to the server’s network adapter. If the master server fails, your network clients use the slave server for all future access. The method of switching from a failed server to the operational server varies from implementation to implementation. In the Digital Equipment Corporation implementation, the slave server actually assumes the master server’s IP address if the master fails, and most clustering implementations use a similar method to redirect clients to the redundant server.
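
One common way for the slave to notice that the master has failed is a heartbeat: the master periodically sends a short "I’m alive" message over the node connection, and the slave takes over if those messages stop. The Python sketch below assumes that scheme; the class, the 5-second timeout, and the IP addresses are hypothetical, and "taking over" the address is only recorded in a set, whereas a real cluster would reconfigure the network adapter.

```python
import time
from typing import Optional


class SlaveMonitor:
    """Slave-side watchdog for the node connection (hypothetical sketch).

    Assumes the master sends periodic heartbeats over the node connection.
    If none arrives within `timeout` seconds, the slave claims the master's
    IP address so clients keep using the same address after failover.
    """

    def __init__(self, master_ip: str, own_ip: str, timeout: float = 5.0):
        self.master_ip = master_ip
        self.addresses = {own_ip}  # IP addresses this server answers on
        self.timeout = timeout
        self.last_heartbeat = time.monotonic()

    def on_heartbeat(self, now: Optional[float] = None) -> None:
        # Called whenever a heartbeat message arrives from the master.
        self.last_heartbeat = time.monotonic() if now is None else now

    def check(self, now: Optional[float] = None) -> bool:
        """Return True if the slave has taken over the master's address."""
        now = time.monotonic() if now is None else now
        if now - self.last_heartbeat > self.timeout:
            # Placeholder for the real takeover step (adapter reconfiguration).
            self.addresses.add(self.master_ip)
        return self.master_ip in self.addresses


mon = SlaveMonitor("10.0.0.1", "10.0.0.2", timeout=5.0)
mon.on_heartbeat(now=100.0)
print(mon.check(now=103.0))  # False: the last heartbeat is only 3 s old
print(mon.check(now=106.0))  # True: 6 s of silence exceeds the 5 s timeout
```

The optional `now` argument exists only so the example can be run deterministically; in normal use the monitor reads the clock itself.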

Of course, you might be wondering what happens if a single service (such as SQL Server) fails on the master server rather than the entire server. In this case, the slave server can either take the place of the master server in its entirety or provide only the missing functionality of the failed service. What happens depends on the clustering implementation and how you configure it. This is one point where researching clustering implementations pays off: decide what you want beforehand, and then see which solution fits your requirements. Let me illustrate this point further by discussing how your choice of a clustering system can affect your ability to mirror an entire server (redundancy), mirror a portion of a server (fault tolerance), or improve overall system performance.
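
The choice between whole-server and single-service failover can be thought of as a per-service policy that you decide on before deploying a cluster. The Python sketch below models that decision; the mode names, the policy dictionary, and the default of whole-server failover for unlisted services are illustrative assumptions, not configuration settings of any actual clustering product.

```python
from enum import Enum
from typing import Dict


class FailoverMode(Enum):
    WHOLE_SERVER = "whole_server"  # slave replaces the master entirely
    SERVICE_ONLY = "service_only"  # slave runs just the failed service


def plan_failover(failed_service: str, policy: Dict[str, FailoverMode]) -> str:
    """Describe what the slave should do when one service on the master fails.

    Services missing from `policy` default to a whole-server failover
    (an assumption made for this sketch).
    """
    mode = policy.get(failed_service, FailoverMode.WHOLE_SERVER)
    if mode is FailoverMode.SERVICE_ONLY:
        return f"start {failed_service} on slave; master keeps its other services"
    return "slave assumes the master's role for all services"


policy = {"SQL Server": FailoverMode.SERVICE_ONLY}
print(plan_failover("SQL Server", policy))
# start SQL Server on slave; master keeps its other services
print(plan_failover("IIS", policy))
# slave assumes the master's role for all services
```

Writing the policy down this explicitly, even on paper, makes it easier to compare against what each clustering product actually supports.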
