High Performance NT4 Optimization and Tuning: Overview Of Windows NT Architecture

Fault Tolerant Capabilities

The primary purpose of a server is to provide the ability to share resources. This can include CPU, file, and print resources. But what happens if you have a hardware failure? In many cases, this means your entire server and all its resources go down. And, if you can’t get your server back online quickly enough, your job may go down with it. This is where the fault tolerant capabilities come into play to protect both your data and your job.

In the previous section, we noted how an NTFS partition can provide data integrity at the file system level, but this may not be enough for mission-critical data. I define mission-critical data as any data your business needs in order to keep operating. This could be an SQL Server database, the source code to build the applications you sell, or just everyday applications, such as Microsoft Word for Windows, that everyone in the office uses to perform their day-to-day jobs. Windows NT Server provides three options that can be used either separately or in conjunction with each other to safeguard your data. These options are:

• Disk Mirroring: This method works at the partition level to make a duplicate copy of your data. For every write to the primary partition, a second write is made to the secondary partition. If the fault tolerant driver detects a device failure while accessing the primary partition, it can automatically switch to the secondary partition and continue providing access to your data.

• Disk Duplexing: This method also works at the partition level, but it additionally protects against disk controller failures. It does this by using two separate disk controllers, each with its own disk subsystem. If an attempt to access data on the primary partition fails, or the primary disk controller itself fails, access to your data is maintained through the redundant copy on the secondary controller and disk subsystem.

• Disk Striping With Parity: This option works by combining equally sized disk partitions on separate physical drives (a minimum of 3 and a maximum of 32 disk drives) to create one logical partition. Data is written to the disks in discrete blocks. Whenever data blocks are written, an error correction code (ECC) block is written as well. Taken together, a set of data blocks and its ECC block is referred to as a stripe. In the case of a single disk failure, the surviving data blocks can be combined with the ECC block to rebuild the missing data block.
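The rebuild step in a stripe set with parity can be illustrated with a small sketch. It assumes the ECC block is a plain byte-wise XOR of the data blocks, which is the usual parity scheme for this kind of stripe set (RAID 5); the function names are hypothetical and this is not the code of the NT fault tolerant driver. For simplicity the parity block always sits at the end of the stripe, whereas a real implementation typically rotates it across the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return one stripe: the data blocks followed by their parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_block(stripe, failed_index):
    """Recompute the block on the failed drive from the surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


# Three "drives": two hold data, one holds parity (an illustrative layout).
stripe = make_stripe([b"NTFS", b"DATA"])

# Pretend the second drive failed and rebuild its block from the other two.
recovered = rebuild_block(stripe, failed_index=1)
print(recovered == b"DATA")
```

Because XOR is its own inverse, the same routine rebuilds any single missing block, including the parity block. If two blocks of the same stripe are lost, there is not enough information left to recover either one, which is why this scheme tolerates only a single drive failure.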

TIP: The best types of disk subsystems to use are SCSI based, because these subsystems support several important features. First, most SCSI controllers are bus masters (which means they have their own dedicated controller to transfer data between the disk and system memory). Second, most SCSI drives support command queuing, which is used to issue several commands to multiple drives. Finally, bus detachment allows one command to be processed by a peripheral while another peripheral is possibly transferring data.
Most SCSI controllers also include a dedicated CPU, so your system CPU does not have to watch over each byte of data transferred from the controller (as it must with most IDE disk controllers). This way, your CPU can continue processing data, which improves system throughput. One other important feature is that SCSI drives have spare sectors that can be mapped in place of a failing sector, and the NTFS file system driver can take advantage of this. (IDE drives have a similar feature, but do not support a specific command to replace a failing sector.)

The question is: when do you use each of these techniques? Each technique provides increased data integrity, but at a cost. For instance, disk mirroring can be used on a single disk that has been split into two partitions. One partition contains the primary data, while the second partition contains a copy (i.e., mirror) of the data in the primary partition. However, in order to create this copy, your disk controller must make two physical writes, one to each partition. And if the entire disk becomes defective, all your data is lost. To offset this problem, you might use two separate disk drives, but if the disk controller becomes defective, you again lose access to your data. Table 1.1 summarizes the abilities of the fault tolerant drivers.

**Table 1.1** Summary of the fault tolerant driver capabilities.
| Fault Tolerant Driver | Pro | Con |
| --- | --- | --- |
| Disk Mirroring | May be used on a single physical disk with at least two equally sized partitions. Can also be used to mirror separate physical disks on a single disk controller. | Does not protect against disk failure if a single disk is used. Does not protect against controller errors. Limits disk storage to half of system capacity. Performance hit for disk writes. |
| Disk Duplexing | Used to mirror separate physical disks on separate physical disk controllers. Protects against disk failures and disk controller errors. Performance increase on disk reads, since the driver can use both physical disks simultaneously (on disk subsystems, such as SCSI, that support simultaneous access). | Requires twice the hardware (two disk drives, two disk controllers) to provide half the storage capacity. |
| Stripe Set With Parity | Provides fault tolerance for a single drive failure. Increases disk read performance. | Does not provide protection from disk controller errors. Does not protect against multiple disk failures. Requires a portion of each disk partition to hold ECC blocks. Slight CPU performance degradation on disk writes to calculate the ECC blocks. |
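The storage costs listed in Table 1.1 can be made concrete with a little arithmetic. The sketch below is only an illustration: the function and scheme names are hypothetical, and it assumes that a mirror or duplex set always consists of two equally sized partitions and that a stripe set with parity gives up the equivalent of one partition to ECC data, as is usual for this scheme.

```python
def usable_capacity_mb(scheme, members, partition_mb):
    """Return usable storage in MB for a set of equally sized partitions.

    scheme       -- "mirror", "duplex", or "stripe_with_parity" (hypothetical labels)
    members      -- number of partitions in the set
    partition_mb -- size of each partition in MB
    """
    if scheme in ("mirror", "duplex"):
        if members != 2:
            raise ValueError("a mirror or duplex set uses exactly two partitions")
        return partition_mb  # the second partition only holds the copy
    if scheme == "stripe_with_parity":
        if not 3 <= members <= 32:
            raise ValueError("a stripe set with parity uses 3 to 32 drives")
        return (members - 1) * partition_mb  # one partition's worth holds ECC data
    raise ValueError(f"unknown scheme: {scheme}")


# Compare the share of raw space that remains usable for a few layouts.
for scheme, members in [("mirror", 2), ("stripe_with_parity", 3), ("stripe_with_parity", 8)]:
    raw = members * 1000
    usable = usable_capacity_mb(scheme, members, 1000)
    print(f"{scheme:20s} {members:2d} x 1000 MB -> {usable} MB usable ({usable / raw:.0%})")
```

Under these assumptions, mirroring and duplexing always cost half of the raw space, while the parity overhead of a stripe set shrinks as more drives are added.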
