NT Disc System Reliability

By Dmitry Mikhailov

This article is a sequel to "NT File System". In it we consider several issues that were described only briefly, or not at all, in the previous article. Note that the NT disc system is very complex, so we cannot describe all of its features in detail. This article is only an attempt to answer the questions that followed the first publication.

Part 4. The journaling NT file system

When we mentioned last time that NTFS is a journaling file system, many fans of other file systems and operating systems were indignant. Many of the letters I received claimed that NTFS was only quasi-journaling or not journaling at all, and described numerous fatal NTFS failures with data loss. In this article we will try to explain the philosophy of journaling and the facilities that protect against failures, and show the causes of fatal failures. We will try to justify Microsoft's approach, or at least show the reasons behind the technical solutions chosen by the NTFS developers.

Journaled operations

First, let us describe which operations can be journaled. Clearly, a full undo of file contents is impractical, although it would be desirable. Suppose 3 MBytes are overwritten in the middle of a file: first we would save the new data in the log file, then we would save the current 3 MBytes of the file there as well, and only then would we write the real data. This approach guarantees the safety of the data, but it has one drawback: loss of speed. The data are written three times instead of once, so the operation takes much longer. Full journaling is used in database systems, where data safety must be guaranteed. In our opinion, full journaling in an operating system's file system is not rational for home PCs and servers.
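The cost of full data journaling can be made concrete with a toy model. This is a hypothetical in-memory sketch, not how NTFS works (NTFS does not journal file data); the names `FullDataJournal` and `overwrite` are invented for illustration. It counts the bytes physically written when 3 MBytes are overwritten using the three steps described above.

```python
MB = 1024 * 1024

class FullDataJournal:
    """Hypothetical model of full data journaling (not NTFS's actual design)."""

    def __init__(self, size):
        self.file = bytearray(size)   # the file's data area on "disc"
        self.log = []                 # the log file: a list of records
        self.bytes_written = 0        # total bytes physically written

    def overwrite(self, offset, new_data):
        end = offset + len(new_data)
        # 1. Save the new data in the log (redo information).
        self.log.append(("redo", offset, bytes(new_data)))
        self.bytes_written += len(new_data)
        # 2. Save the current contents of the region in the log (undo information).
        self.log.append(("undo", offset, bytes(self.file[offset:end])))
        self.bytes_written += len(new_data)
        # 3. Only now overwrite the real data in place.
        self.file[offset:end] = new_data
        self.bytes_written += len(new_data)

journal = FullDataJournal(10 * MB)
journal.overwrite(4 * MB, b"\xff" * (3 * MB))
print(journal.bytes_written // MB)  # 9: three passes over 3 MBytes instead of one
```

A plain in-place write would cost 3 MBytes; the journaled version costs 9, which is the speed loss the text refers to.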

When creating NTFS, the developers put processing speed ahead of reliability: journaling must not noticeably slow down the file system. NTFS therefore journals its logical structures rather than user data. Because file data are not journaled, they can be lost in a failure. The journaled operations in NTFS are those that change the structure of the file system itself: creating, adding, renaming, moving and deleting files and folders, and defragmentation. In other words, all logical operations are journaled.

Lazy write and journaling checkpoints

Every modern system uses caching to speed up file operations. Lazy write is a caching principle in which data are kept in the cache for some time and are physically written to disc later, when the system is not busy. Lazy write increases the efficiency of disc operations because it lets the disc serve more urgent requests first, such as reads. How can lazy write be synchronized with journaling? This is a difficult question, since lazy write makes data loss possible: data still in the cache are lost in a failure, and different pieces of data may reach the disc at unsynchronized times.
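A minimal sketch of a lazy-write (write-back) cache shows both sides of the trade-off: writes return immediately, but a failure before the next flush loses cached data and leaves the disc in a mixed state. The class `LazyWriteCache` and the dictionary standing in for the disc are hypothetical illustrations, not the Windows cache manager.

```python
class LazyWriteCache:
    """Hypothetical write-back (lazy write) cache in front of a block device."""

    def __init__(self, disc):
        self.disc = disc      # dict: block number -> data, stands in for the physical disc
        self.dirty = {}       # blocks written by applications but not yet on disc

    def write(self, block, data):
        # The write completes immediately; the disc is not touched yet.
        self.dirty[block] = data

    def read(self, block):
        # Reads see the newest data, whether still in the cache or already on disc.
        if block in self.dirty:
            return self.dirty[block]
        return self.disc.get(block)

    def flush(self):
        # Called "when the system is not busy": push every dirty block to disc.
        for block, data in sorted(self.dirty.items()):
            self.disc[block] = data
        self.dirty.clear()

    def crash(self):
        # A failure before flush() loses everything that was only in the cache.
        self.dirty.clear()

disc = {}
cache = LazyWriteCache(disc)
cache.write(1, "A")
cache.write(2, "B")
cache.flush()          # blocks 1 and 2 reach the disc
cache.write(2, "B2")
cache.write(3, "C")
cache.crash()          # the failure happens before the next flush
print(disc)            # {1: 'A', 2: 'B'}: block 2 is stale, block 3 never arrived
```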

NTFS solves this problem by closely integrating lazy write with journaling. When a journaled operation starts, the intention is written to the log file at once. The only deferred record is the one confirming the successful completion of previous transactions, the so-called checkpoint. At regular intervals the system writes all held operations to disc. After that, a checkpoint is recorded, meaning that all previous operations have completed correctly, both physically and logically.

This mode of operation does not slow the system down: a checkpoint is written in a single immediate operation, and writing the start of an operation to the journal costs no more than one write without lazy caching. The physical writing of the changes themselves happens later and does not hurt system performance.
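The scheme above (intentions logged at once, changes held in the cache, periodic checkpoints) can be sketched as follows. This is a simplified, hypothetical model: the class `CheckpointedJournal` and its record format are invented, and real NTFS log records carry far more information. Recovery here only identifies which operations were logged after the last checkpoint and therefore must be treated as unfinished.

```python
class CheckpointedJournal:
    """Hypothetical sketch of intent logging with periodic checkpoints."""

    def __init__(self):
        self.log = []        # written immediately (no lazy caching)
        self.pending = []    # structural changes held in the lazy-write cache
        self.disc = {}       # on-disc metadata: name -> state

    def start(self, op, name, new_state):
        # The intention goes to the log at once...
        self.log.append(("intent", op, name))
        # ...while the structural change itself is only cached.
        self.pending.append((name, new_state))

    def checkpoint(self):
        # At intervals, write all held changes to disc, then record a checkpoint:
        # everything logged before it is complete physically and logically.
        for name, state in self.pending:
            self.disc[name] = state
        self.pending.clear()
        self.log.append(("checkpoint",))

    def operations_to_recover(self):
        # After a failure: intents logged after the last checkpoint are unfinished.
        last = max((i for i, rec in enumerate(self.log) if rec[0] == "checkpoint"), default=-1)
        return [rec[1:] for rec in self.log[last + 1:] if rec[0] == "intent"]

j = CheckpointedJournal()
j.start("create", "a.txt", "exists")
j.checkpoint()
j.start("delete", "a.txt", "deleted")
j.pending.clear()                 # simulate a crash: cached changes never reached the disc
print(j.operations_to_recover())  # [('delete', 'a.txt')]
print(j.disc)                     # {'a.txt': 'exists'}
```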

Lazy journaling problem: the concept of data doubling

The theory described above looks sound; let's take a closer look at some of its details.

Consider the following situation. The journal receives the record "file N is being deleted". The cache then marks the space occupied by the file as free, and after that the file is deleted from the physical MFT structure. Suppose the disc is busy and the freed space is taken by another file. Then a failure occurs. The system examines the journal and finds the unfinished operation "file N is being deleted", with no checkpoint after it. The file must therefore be undeleted. However, its physical space already contains other data.

To prevent such situations, NTFS uses the principle of "temporarily occupied space": freed space does not become free until all operations on the logical structure have completed physically. In NTFS this mechanism is not synchronized with checkpoint marking.
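The "temporarily occupied space" principle can be illustrated with a small allocator. This is a hypothetical sketch: `ClusterAllocator`, `release` and `deletion_completed` are invented names, and the release rule simply follows the description above (space becomes reusable only after the deletion has completed physically) rather than any documented NTFS internals.

```python
class ClusterAllocator:
    """Hypothetical allocator illustrating "temporarily occupied" space."""

    def __init__(self, clusters):
        self.free = set(range(clusters))
        self.temporarily_occupied = set()   # freed logically, not yet reusable

    def allocate(self, count):
        if len(self.free) < count:
            raise OSError("disc full")
        chosen = sorted(self.free)[:count]
        self.free.difference_update(chosen)
        return chosen

    def release(self, clusters):
        # Deleting a file does NOT make its clusters free right away.
        self.temporarily_occupied.update(clusters)

    def deletion_completed(self):
        # Once the deletion has completed physically, the space really becomes free.
        self.free.update(self.temporarily_occupied)
        self.temporarily_occupied.clear()

alloc = ClusterAllocator(4)
file_n = alloc.allocate(2)   # file N occupies clusters 0 and 1
alloc.release(file_n)        # "file N is being deleted"
other = alloc.allocate(2)    # another file must use clusters 2 and 3
print(other)                 # [2, 3]: file N's clusters were not reused
```

If a failure occurred at this point, file N's clusters would still hold its data, so undeleting it remains possible.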

Requirements that ensure reliability

Why can NTFS still fail?

In normal operation, the hard drive must write exactly the data the OS sent, to exactly the location the OS specified. This may not happen if the system has unreliable cables, CPU, RAM or controllers, which is the most common cause of NTFS failures. A CPU that is not overclocked, high-quality memory, a good motherboard and the UDMA protocol will help you.
In case of damage, power loss or a "fault" signal from the controller, the hard drive must finish its current work correctly: a sector must never be left in an intermediate state. Modern HDDs handle this.
The hard drive must write data flagged "do not cache" immediately. Modern HDDs support lazy (cached) writing, but NTFS metafiles are updated in "write immediately" mode, and the controller and the HDD must honor it.
The hard drive must return exactly the data that were written; otherwise, it must report a "fault". All modern HDDs can do this.
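The last two requirements (write immediately, then return exactly what was written) can be imitated at the application level. The sketch below uses Python's portable `os.fsync` as a stand-in for the "write immediately" flag that NTFS uses internally; it is not the NTFS mechanism, and whether the drive's own cache honors the flush depends on the OS and the drive. The function names are hypothetical.

```python
import os
import tempfile

def write_immediately(path, offset, data):
    """Write data and ask the OS to commit it to the device before returning."""
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)  # O_BINARY exists only on Windows
    fd = os.open(path, flags)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        os.write(fd, data)
        os.fsync(fd)   # do not leave the data in the lazy-write cache
    finally:
        os.close(fd)

def read_back_and_verify(path, offset, expected):
    """Check that the device returns exactly the data that were written."""
    with open(path, "rb") as f:
        f.seek(offset)
        actual = f.read(len(expected))
    if actual != expected:
        raise IOError("fault: data read back differ from data written")
    return True

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metafile.bin")
    write_immediately(path, 512, b"MFT record")
    print(read_back_and_verify(path, 512, b"MFT record"))  # True
```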

Meeting these requirements ensures that NTFS works reliably. Unfortunately, in most cases failures are caused by hardware. Of course, absolute reliability is impossible. Microsoft has chosen a division of labor: the company does not take responsibility for the reliability of the hardware. Unfortunately, most PCs have defective components, and many users overclock their CPUs. All of this damages your NTFS.

Part 5. Software RAID

As we have already mentioned, NTFS journaling does not protect user data from loss in a failure. However, NT offers several configurations that give much stronger guarantees: several discs can be used together to provide reliability and speed. This is the topic we discuss next.

RAID stands for Redundant Array of Inexpensive Disks. The technology uses several discs simultaneously to increase the reliability and/or the speed of the system.

Windows NT supports three RAID levels. Their main characteristics are summarized in the table.

Level	Performance compared with a single disc	Reliability	Total disc space
RAID 0 (parallel discs): higher performance by spreading data across discs	Read/write speed increases in proportion to the number of discs (in theory). In practice the gain is smaller, about 50-90% of this figure.	Lower than a single disc: the failure of any one disc destroys the whole array.	Equals the sum of the disc sizes.
RAID 1 (mirror discs): higher reliability by keeping two copies of the data	Read speed increases in proportion to the number of discs (in theory), though less in practice. Write speed goes down.	Data are lost only if all discs fail.	Equals the size of one disc.
RAID 5 (parallel discs with parity): combines striping, as in RAID 0, with redundancy, the goal of RAID 1	Read speed increases as in RAID 0, but effectively with one disc fewer (N - 1). Writes are slower than reads, because every write must also update the parity block.	Data are lost only if two discs fail. The failure of one disc slows down the whole array.	Equals the sum of the disc sizes minus one disc.

Now, we will describe each type of RAID in detail.

RAID 0 (Parallel discs)

This strategy aims to increase performance. The volume's data are spread across several discs and can be assembled into a whole only when all the discs are available.

The simplest implementation of RAID 0 uses two discs: every odd-numbered sector of the volume (first, third, ...) is on physical disc A, and every even-numbered one is on disc B. The speed of reading/writing increases in proportion to the number of discs.
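The round-robin layout just described can be written as an address mapping. This is a minimal sketch; the function name and the `stripe_sectors` parameter are hypothetical, and real implementations stripe in units larger than one sector. With two discs and one sector per unit it reproduces the A/B alternation from the text.

```python
def stripe_location(logical_sector, discs, stripe_sectors=1):
    """Map a volume sector to (disc index, sector on that disc) for RAID 0."""
    stripe = logical_sector // stripe_sectors        # which stripe unit
    within = logical_sector % stripe_sectors         # position inside the unit
    disc = stripe % discs                            # units go round-robin over the discs
    sector_on_disc = (stripe // discs) * stripe_sectors + within
    return disc, sector_on_disc

# Two discs, one sector per stripe unit: the scheme described in the text.
for s in range(4):
    disc, sector = stripe_location(s, 2)
    print(s, "AB"[disc], sector)
# 0 A 0
# 1 B 0
# 2 A 1
# 3 B 1
```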

For random access, performance depends on whether the disc holding the requested data is free and ready to serve the request. For example, take a RAID 0 array of two discs. The first disc is busy when a new command arrives; the probability that the command addresses the free disc is 50%. This corresponds to a performance increase of 1.5 times.
Sequential operations (reading/writing) are N times faster than on a single physical disc, where N is the number of discs, because the next operation always goes to a free disc (the probability is 100%).
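The two-disc arithmetic above can be reproduced exactly. The sketch extends the same reasoning to N discs; that generalization is ours and keeps the text's simplifying assumptions: one disc is busy, and the new request is equally likely to target any disc. Function names are hypothetical.

```python
from fractions import Fraction

def chance_new_request_hits_free_disc(discs):
    """One disc is busy; a new random request picks any disc with equal probability."""
    return Fraction(discs - 1, discs)

def estimated_speedup(discs):
    # The busy disc keeps working, and the new request is served in parallel
    # with probability p, so on average 1 + p discs are doing useful work.
    return 1 + chance_new_request_hits_free_disc(discs)

print(estimated_speedup(2))   # 3/2, the 1.5x figure from the text
print(estimated_speedup(4))   # 7/4
```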

As the number of discs grows, RAID 0 greatly increases the performance of both sequential and random operations. To use the disc system effectively, one or several controllers must be able to handle several requests at once. For the IDE interface, a bus-mastering driver is mandatory. Windows 2000 includes such drivers; NT4 may need additional ones.

RAID 0 reliability is quite low: the fatal failure of any one disc destroys the data on the whole array. The more discs you use, the higher the probability that one of them fails.
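The last point can be quantified under a simple assumption of our own: discs fail independently, each with the same probability over some period. The 5% figure below is purely illustrative, not a measured failure rate.

```python
def array_failure_probability(p_disc, discs):
    """Probability that a RAID 0 array loses its data, assuming independent discs."""
    # The array survives only if every single disc survives.
    return 1 - (1 - p_disc) ** discs

for n in (1, 2, 4):
    print(n, round(array_failure_probability(0.05, n), 4))
# 1 0.05
# 2 0.0975
# 4 0.1855
```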

RAID 1 (mirror discs)

The simplest way to ensure data safety is to keep a copy. Every write goes to both discs (which slows writing down), and each read is served by whichever disc is free.

Problems can occur when the system is not sure that the two discs are identical: comparing them after a failure can take a long time. If you choose this approach, you had better buy a hardware RAID controller that can replace a broken disc without stopping the system.

Because the discs mirror each other, the complete failure of one disc does not cause data loss.
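The mirror behavior described in this section (write to both discs, read from a free one, survive the loss of one disc) fits in a few lines. This is a hypothetical in-memory sketch; `Mirror` and its fields are invented, and the `busy` counters are simply inputs the caller would maintain.

```python
class Mirror:
    """Hypothetical two-disc mirror (RAID 1) over in-memory "discs"."""

    def __init__(self):
        self.discs = [{}, {}]
        self.failed = [False, False]
        self.busy = [0, 0]       # outstanding requests per disc (maintained by the caller)

    def write(self, block, data):
        # Every write goes to all working discs, which is why writing is slower.
        for i, disc in enumerate(self.discs):
            if not self.failed[i]:
                disc[block] = data

    def read(self, block):
        # Read from the least busy working disc.
        working = [i for i in range(2) if not self.failed[i]]
        if not working:
            raise IOError("all discs failed: data lost")
        i = min(working, key=lambda d: self.busy[d])
        return self.discs[i][block]

m = Mirror()
m.write(7, "payroll")
m.failed[0] = True      # complete failure of one disc
print(m.read(7))        # payroll: the other disc still has the data
```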

RAID 5 (Parallel discs with parity)

This is the most successful and efficient RAID scheme; it requires three or more discs. Parity data supplement the user data, and each parity block is stored on a different disc from the data blocks it protects.

Here is an example of the parity concept. We have five bits {0, 1, 1, 0, 1}. We add a sixth bit, the parity bit: if the number of 1s is even, the parity bit is 1; otherwise, it is 0. Our set has three 1s, an odd number, so we get six bits {0, 1, 1, 0, 1, <0>}.

Suppose we have lost one bit: {0, X, 1, 0, 1, <0>}. Knowing the parity bit, we can recover the whole set: the remaining data bits contain two 1s, while a parity bit of 0 means the number of 1s is odd, so X must be 1.
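The example can be checked mechanically. The sketch follows the article's rule exactly (parity 1 when the count of 1s is even); RAID implementations usually use plain XOR, which is the inverse convention, but recovery works the same way. The function names are hypothetical.

```python
def parity_bit(bits):
    """Parity rule from the example: 1 if the number of 1s is even, else 0."""
    return 1 if sum(bits) % 2 == 0 else 0

def recover_missing(bits_with_gap, parity):
    """Recover the single unknown bit (marked None) from the others and the parity bit."""
    known = [b for b in bits_with_gap if b is not None]
    for candidate in (0, 1):
        # Exactly one candidate reproduces the stored parity bit.
        if parity_bit(known + [candidate]) == parity:
            return [candidate if b is None else b for b in bits_with_gap]

data = [0, 1, 1, 0, 1]
p = parity_bit(data)
print(p)                                       # 0
print(recover_missing([0, None, 1, 0, 1], p))  # [0, 1, 1, 0, 1]
```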

Parity operations can be applied not only to single bits but to whole blocks of data. This method is used in data recovery algorithms.

Back to RAID 5:

The figure shows an array of five discs. Each disc contains four blocks of real data and one parity block. A parity block can restore any one lost fragment that it protects; conversely, the data fragments can regenerate the parity block. Because of the RAID structure, the data needed to recover a whole column are always on other discs. When no disc has failed, data are read without touching the parity blocks (as in RAID 0), so performance in this case is comparable to that of RAID 0.

When a disc fails, the performance of the array drops sharply. For example, if block D4 cannot be read, it has to be reconstructed from the other blocks: in our case, parity block 4 and blocks B4, C4 and E4.
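The reconstruction of D4 can be sketched with byte-wise XOR parity, the usual block-level form of the parity idea. The block contents and the name "P4" for parity block 4 are hypothetical, since the original figure is not available. Note that rebuilding one block requires reading every other block of the stripe, which is why a degraded array is so much slower.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Stripe 4 of a hypothetical five-disc array: four data blocks plus their parity.
stripe = {"B4": b"\x10\x20", "C4": b"\x0f\x01", "D4": b"\xaa\x55", "E4": b"\x03\x30"}
stripe["P4"] = xor_blocks(list(stripe.values()))   # parity written along with the data

# Disc D fails: D4 is rebuilt from every other block of the same stripe.
survivors = [data for name, data in stripe.items() if name != "D4"]
rebuilt = xor_blocks(survivors)
print(rebuilt == stripe["D4"])   # True
```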

Requirements that ensure reliability

RAID does not give an absolute guarantee. It can fail on an unreliable PC as well as on an unreliable disc system. RAID won't help you in the following cases:

Correct writing of incorrect data, or writing to the wrong location. Bad memory, CPU, cable or controller can cause this.
The disc fails to report a read error.

RAID is intended to minimize damage when a hard drive (or controller) fails. Failures of other hardware are not taken into account.

For more details about software RAID in Windows NT, refer to the help for Disk Administrator, the program that creates such volumes. Note that workstation editions have limited facilities for creating and using RAID (for example, NT4 Workstation supports only RAID 0).

Part 6. NTFS volume recovery strategy

A computer with NTFS won't boot. What should you do? How can you recover the data? There are two scenarios.

1. The first case: the system is on the same NTFS disc and has simply stopped booting. In 90% of cases it is not NTFS but NT that needs to be recovered. We will describe how to install NT into the same NTFS partition; the new system will be able to read your data.

NT4 users can install the system by running the setup program.

You will need a CD containing a proper NT4 distribution, with NT located in the i386 folder in the root directory. Running winnt /? in this directory shows the switches for creating three boot diskettes, which will let you install NT4 onto the NTFS disc. Choose a different directory for the installation, and then try to repair your own NT installation. The newly installed system will add itself to the boot list correctly and won't harm your old NT4.

If you do not have a CD in the appropriate format, you will have to install NT into another partition, because an NTFS disc cannot be accessed from systems other than NT.

NT4 cannot be installed onto an NTFS volume that has been converted to the new Windows 2000 format. NT4 can read such NTFS volumes only with Service Pack 4 or later installed.

Note that it is impossible to repair NT without an emergency repair disk (created in NT4 with the command rdisk /s, and in Windows 2000 with Backup). Even so, this work is only for an expert.

2. The system works, but the disc is not accessible: Disk Administrator shows the type of your partition as Unknown. This means the boot sector has been overwritten. NT stores a copy of the boot sector at the end of the partition; if you copy it back to the right place, the disc will be recognized as NTFS again.

Calculating the correct addresses is rather difficult, and we will not describe it here. For more information, see Microsoft Knowledge Base article Q153973 on the MSDN site. After you complete all the instructions, the volume will be recognized as NTFS. The chkdsk command, the NT disc system repair utility, will also help.
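Before following the Knowledge Base procedure, it helps to confirm that the backup copy is actually there. The read-only sketch below checks a raw disc image for the NTFS OEM ID ("NTFS" followed by four spaces at offset 3) and the 0x55AA signature at the end of the sector. It assumes 512-byte sectors and that the backup is the last sector of the partition, as the text says "at the end of the partition"; the function names are hypothetical, and the exact location rules and copy procedure are in Q153973.

```python
SECTOR = 512  # assumption: 512-byte sectors

def looks_like_ntfs_boot_sector(sector):
    """Check the OEM ID and the 0x55AA end-of-sector signature of an NTFS boot sector."""
    return (len(sector) == SECTOR
            and sector[3:11] == b"NTFS    "
            and sector[510:512] == b"\x55\xaa")

def check_boot_sectors(image, start, count):
    """Report whether the first and last sectors of a partition look like NTFS boot sectors.

    image: raw disc contents (bytes); start, count: partition position and length in sectors.
    Read-only: this sketch never writes anything.
    """
    def sector(n):
        return image[n * SECTOR:(n + 1) * SECTOR]
    return {
        "primary": looks_like_ntfs_boot_sector(sector(start)),
        "backup_at_end": looks_like_ntfs_boot_sector(sector(start + count - 1)),
    }

# A synthetic 8-sector partition whose primary boot sector was overwritten.
boot = bytearray(SECTOR)
boot[3:11] = b"NTFS    "
boot[510:512] = b"\x55\xaa"
image = bytes(SECTOR) * 7 + bytes(boot)
print(check_boot_sectors(image, 0, 8))  # {'primary': False, 'backup_at_end': True}
```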
