Chapter 18 - Choosing a File System
This chapter provides an overview of the capabilities and limitations of the FAT and NTFS file systems. Chapter 17, "Disk and File System Basics," describes the structure of FAT and NTFS volumes from the perspective of how each file system organizes the data on the disk. This chapter focuses on the user aspects of each file system.
You can use either or both of these file systems on a computer. Which one you choose for your computer, or for an individual volume on your computer, depends on such things as:
NTFS provides several advantages when compared to FAT, which are described in this chapter. But, if you will be using another operating system in addition to Windows NT, you can access files on NTFS volumes only from Windows NT. You must use a file system other that NTFS for the system and boot partitions for the other operating systems.
For more information about file systems, especially NTFS, see Inside the Windows NT File System, by Helen Custer (Microsoft Press 1994, ISBN 1-55615-660-X), which documents the NTFS file system design.
Comparing FAT and NTFS File Systems
This section presents a high-level overview of why you might decide to use each of the file systems.
The FAT and NTFS file systems both support long filenames (up to 255 characters), so naming conventions do not matter when it comes to choosing the file system.
FAT File System Advantages
The FAT file system can be used with operating systems other than Windows NT, such as Windows 95, Windows for Workgroups, MS-DOS, and OS/2.
The FAT file system is best used on smaller volumes, because the FAT file system starts out with very little overhead. It works well up to about a 500 MB volume, but is very inefficient for volumes larger than 1 gigabyte (GB).
The FAT file system is a better choice than NTFS for volumes that are smaller than approximately 400-500 MB, because of the amount of disk space overhead involved in NTFS. This overhead is in the form of NTFS system files and the log file, which can use several percent of the total disk space on a small volume.
NTFS File System Advantages
The NTFS file system is best for use on volumes of about 400 MB or more, because performance does not degrade as much on larger NTFS volumes as compared to larger FAT volumes. However, if you want to use features that are available only on NTFS, such as file security or compression, you can use NTFS on smaller volumes.
There are several features that are provided only by the NTFS file system:
There is more information about each of these features presented later in this chapter.
Which is Faster, FAT or NTFS?
There are no simple answers to this question.
For small folders, the FAT file system might provide faster access to files, because:
The NTFS file system uses a B-tree structure for all folders. This structure minimizes the number of disk accesses required to find a file, which makes access to the file faster than for a file on the FAT file system. In addition, if a folder is small enough to fit into the MFT record, you read the entire folder when you read its MFT record.
A FAT folder entry contains an index into the file allocation table, which identifies the cluster number for the first cluster of the folder. When you want to view a file, the FAT file system has to walk the folder structure to get to the file. For example, to start Disk Administrator, which is C:\Winnt40\System32\Windisk.exe, Windows NT finds the file on a FAT primary partition or logical drive by:
In comparing performance on large folders having both long and short filenames, the speed of a FAT operation depends on the operation itself and the size of the folder. Creating files on a FAT folder might be faster. Opening a file might be faster on a FAT folder if the file is at the front of the folder. If the file doesn't exist, FAT has to search the entire folder to find this out. The search takes longer on FAT than on the B-tree structure used by the NTFS file system. Folder enumeration might be faster on a FAT folder. In mathematical terms, the average time to find a file on a FAT folder is a function of N/2, where N is the number of files. On an NTFS folder, the average time is a function of Log N. For small values of N, N/2 might actually be less than or equal to LogN. For larger values of N, LogN is less than N/2.
Several factors unique to the two file systems affect the speed with which Windows NT reads or writes a file.
The maximum file size and volume size for the FAT file system is 232 bytes (4 GB). FAT can support a maximum of 65,535 clusters per volume.
The NTFS architecture is designed to use numbers up to 264 bytes (16 exabytes, or 18,446,744,073,709,551,616 bytes). In theory, it is possible to have a volume 264 bytes in size. Even if there were hardware available to supply a logical volume of that capacity, there are other limitations to the maximum size of a volume.
Partition Tables are limited by industry standards to 232 sectors. (For more information about Partition Tables, see the sections titled "Partition Table" and "Using Hard Disks With More Than 1024 Cylinders (x86-based Computers)" in Chapter 17, "Disk and File System Basics."
The sector size is a function of hardware and industry standards, and is normally 512 bytes. While sector sizes might increase in the future, the current size puts a limit on a single volume of 2 terabytes (232 * 512 bytes, or 241 bytes).
Windows NT provides the capability to combine discontiguous disk areas when creating volume sets, and stripe sets, but these volumes have the same limitations. Even if there were physical disks with 2 terabyte (TB) capacity, they could not be combined to create larger volumes without increasing the physical sector size.
There is one case where you can combine disks that exceed the 2 TB limit. In Disk Administrator, there is an option to extend a volume set. You do this by selecting an existing primary partition, logical drive, or volume set that is already formatted as NTFS. You can then combine it with one or more areas of free space and select Extend Volume Set from the Partition menu. You end up with a larger NTFS volume set. However, if anything goes wrong with a volume set larger than 2 TB, you cannot just reformat it. You must again start with a smaller volume set (<=2 TB) and use the Extend Volume Set selection to extend the volume set to larger that 2 TB.
For the time being, 2 TB should be considered the practical limit for both physical and logical volumes using NTFS.
With the NTFS file system, you can have a maximum of 232 clusters per file. This is an implementation limitation, not a file size restriction.
There is no specific limit to the number of files on an NTFS volume. However, you will not be able to create any new files when your volume is so full that Windows NT cannot allocate another entry in the MFT.
Which File System to Use on Which Volumes
This is a general recommendation for determining which file systems to use for which volumes:
This arrangement gives you the following benefits:
Here is some additional information to help you choose your file system(s):
These additional guidelines might affect your decision as to which file system(s) to use on your computers:
Controlling Access to Files and Folders
On NTFS volumes, you can set permissions on files and folders that specify which groups and users have access to them, and what level of access is permitted. NTFS file and folder permissions apply both to users working at the computer where the file is stored and to users accessing the file over the network when the file is in a shared folder. With NTFS you can also set share permissions, which operate on shared folders in combination with file and folder permissions.
Note To preserve permissions when you copy or move files between NTFS folders, use the Scopy program on the Windows NT Workstation Resource Kit CD.
Although the NTFS file system provides access controls to individual files and folders, users can perform certain actions to files or folders even if permissions are set on a file or folder to prevent access to users.
For example, you have a folder (Dir1) containing a file (File1), and you grant Full Control to a user for the folder Dir1. If you specify that the user have No Access to File1, the user can still delete File1. This is because the user's Full Control rights in the folder allow the user to delete contents (or children) of the folder.
To prevent files from being deleted, you must set permissions on the file itself, and you must set permissions for the folder containing the file. Anyone who has Full Control in a folder can delete files from the folder.
Similarly, anyone who has List, Read, or greater permissions in a folder can view file properties on any file in the folder, even if they are prevented by file permissions from seeing the contents of the file.
With FAT volumes, you cannot set any permissions on the individual files and folders. The only security is share permissions that are set on the entire share, affect all files and folders on that share, and only function over the network. Once a folder is shared, you can protect the shared folder by specifying one set of share permissions that applies to users for all files and subfolders of the shared folder. Share permissions are set in very much the same way as file and folder permissions are set in NTFS. But because share permissions apply globally to all files and folders in the share, they are significantly less versatile than the file and folder permissions used for NTFS volumes.
Share permissions apply equally to NTFS and FAT volumes. They are enforced by Windows NT, not the individual file system.
POSIX compliance permits UNIX programs to be ported to Windows NT. Windows NT is fully compliant with the Institute of Electrical and Electronic Engineers (IEEE) standard 1003.1, which is a standard for file naming and identification.
The following POSIX-compliant features are included in the NTFS file system:
Caution You must use POSIX-based programs to manage filenames that differ only in case.
Using POSIX-based programs, you can create case-sensitive filenames, where two or more filenames can differ only in case (for example, annm.doc and AnnM.Doc).
While the Windows NT file systems support both case-preservation and case-sensitivity, you cannot use standard commands to manage filenames that differ only in case. (Standard commands include those used at the command-line — such as copy, del, and move — and their My Computer and Windows NT Explorer equivalents.) For example, if you type the following at the command prompt:
both annm.doc and AnnM.Doc are deleted.
For more information about POSIX, see Chapter 29, "POSIX Compatibility."
Features Unique to NTFS
This section describes NTFS functionality that is not supported by the FAT file system.
Multiple Data Streams
The NTFS file system supports multiple data streams. The stream name identifies a new data attribute on the file. Streams have separate opportunistic locks, file locks, allocation sizes, and file sizes, but sharing is per file.
The following is an example of an alternate stream:
This feature permits related data to be managed as a single unit. For example, Macintosh computers use this type of structure to manage resource and data forks. Or, a company might create a program to keep a list of changes to the file in an alternate stream, thus keeping archive information with the current version of the file.
As another example, a library of files might exist where the files are defined as alternate streams, as in the following example:
library:file1 :file2 :file3
You could use a smart compiler to create a file structure like the following example:
program:source_file :doc_file :object_file :executable_file
You can use the Win32 API CreateFile to create an alternate data stream. Or, from the command prompt, you can enter commands such as:
echo text>program:source_file more <program:source_file
Note Because the NTFS file system is not supported on floppy disks, when you copy an NTFS file to a floppy disk, data streams and other attributes not supported by the FAT file system are lost.
Windows NT supports compression on an individual file basis for NTFS volumes. Files that are compressed on an NTFS volume can be read and written without first being uncompressed by another program. Uncompression happens automatically during the read of the file. The file is compressed again when it is closed or explicitly saved.
Compressed files and folders have an attribute of C when viewed in My Computer or Windows NT Explorer. You can select an alternate color for compressed files and folders by:
Only the NTFS file system can read the compressed form of the data. When an program like Microsoft Word for Windows or an operating system command like Copy requests access to the file, the NTFS file system uncompresses the file before making it available. For example, if you copy a compressed file from a another Windows NT computer to a compressed folder on your hard disk, the file will be uncompressed, copied, and recompressed.
This compression algorithm is similar to that provided by the MS-DOS 6.0 DoubleSpaceŽ and MS-DOS 6.22 DriveSpaceŽ compression, with one important difference — the MS-DOS functionality compresses the entire primary partition or logical drive, while the NTFS file system enables the user to compress individual files and folders in the NTFS volume.
NTFS compression is not supported for cluster sizes larger than 4K, because
Therefore, you should not use compression and large clusters at the same time. In other words, compression performs reasonably well on sizes up to 4K, but beyond that size, the savings in disk space is not worth the decrease in performance. When the cluster size is >4K on an NTFS volume, none of the NTFS compression functions are available.
For information about creating and formatting volumes, see "Creating and Formatting Volumes," in Chapter 17, "Disk and File System Basics."
Note Windows NT does not support DoubleSpace or DriveSpace compression.
Compressing and Uncompressing Folders and Files
Each file and folder on an NTFS volume has a compression state, which is either Compressed or Uncompressed. The compression state of a folder does not reflect the compression state of the files in that folder. For instance, a folder can have a compression state of Compressed, yet all of the files in that folder could be Uncompressed. Or some of the files in an Uncompressed folder could be Compressed.
You can set the compression state of folders, and compress or uncompress files by using My Computer, Windows NT Explorer, or a command-line program called Compact. When using My Computer or Windows NT Explorer, you can set the compression state of an NTFS folder without changing the compression state of existing files in that folder.
You can set My Computer and Windows NT Explorer to display compressed files and folders with a different color than uncompressed files and folders. On the View menu, click Options. On the View tab, check Display compressed files and folders with alternate color.
Note Compressing the page file does not accomplish much. Windows NT does not compress an open page file, so the user sees an error message box. When trying to compress a closed paging file, the file is compressed without warning (as with any other file). When you restart Windows NT, the page file automatically reverts to the uncompressed state. For information about the page file, look at topics for virtual or virtual memory in Windows NT Help.
Using My Computer or Windows NT Explorer
Each of these programs provide the same compression functionality on NTFS volumes. With My Computer or Windows NT Explorer, you can:
To set the compression state of the folder:
My Computer or Windows NT Explorer pop up a dialog box asking whether all the files and subfolders in the folder should be compressed or uncompressed. Existing files or subfolders in the NTFS folders retain their compression state unless you select Yes in this dialog box.
To work with individual files:
You can also use the Properties tab to view the compressed size and compression ratio of the selected file.
Using the Compact Program
The Compact program is the command line version of the compression functionality in My Computer or Windows NT Explorer. The Compact command displays and alters the compression of folders and files on NTFS volumes. It also displays the compression state of folders.
There are reasons why you would want to use this program instead of My Computer or Windows NT Explorer:
Note Unlike My Computer or Windows NT Explorer, the Compact program does not prompt you on whether you want to compress or uncompress files and subfolders when you set the compression state of a folder. It automatically does the compression or uncompression of any files that are not already in the compression state you just set for the folder.
For more information about this program, type compact /? at the command prompt, or see "File System Utilities" in Chapter 22, "Disk, File System, and Backup Utilities."
Effects of Moving and Copying Files
When you replace an existing file in an NTFS folder, the file might retain its compression state regardless of the compression state of the folder and the compression state of the source file. Thus, if you copy a compressed NTFS file to a compressed NTFS folder, but it replaces an uncompressed file, the resulting file will probably be uncompressed.
For example, you have a file named Testthis in two folders, To and From. The To folder has a compression state of compressed, but the file To\Testthis is uncompressed. The From folder also has a compression state of compressed, and the file From\Testthis is compressed. When you use Windows NT Explorer to copy From\Testthis to To\Testthis, the file To\Testthis will be uncompressed.
Note The effects described in this section are generally true when using Microsoft programs. Third-party utilities might function differently.
Moving and Copying Files Within NTFS Volumes
On an NTFS volume, when you move a file or folder from one folder to another, the compression attribute, like any other attribute, is retained regardless of the compression of either the target or source folder. For example, if you move an uncompressed file to a compressed folder, the file remains uncompressed after the move.
However, if you copy a file from one folder to another, the compression attribute of the file is changed to that of the target folder. For example, if you copy a compressed file to an uncompressed folder, the file is automatically uncompressed when it is copied to the folder.
Moving and Copying Files Between NTFS Volumes
When you move or copy a file or folder from one NTFS volume to another NTFS volume, it inherits the compression attributes of the target folder.
Moving and Copying Files Between FAT and NTFS Volumes
Like files copied between NTFS folders, files moved or copied from a FAT folder to an NTFS folder always inherit the compression attribute of the target folder. Moving a file from a FAT folder to an NTFS folder causes a copy of the file, followed by a delete.
Since Windows NT supports compression only for NTFS files, any compressed NTFS files moved or copied to a FAT volume are automatically uncompressed. Similarly, compressed NTFS files copied or moved to a floppy disk are automatically uncompressed.
Adding Files to an Almost Full NTFS Volume
When adding files to an NTFS volume that is almost full, you can get unexpected error messages. The philosophy behind these errors is that the NTFS file system wants to make sure it has enough disk space to write the entire file if it cannot be compressed, regardless of the degree of compression in the file when it is opened. For instance, it is possible to get a read error when you are trying to open a compressed file.
If you copy files to a compressed NTFS folder that doesn't have enough room for all of the files in their uncompressed state, you might get an error that says "...there is not enough space on the disk" even though they will all fit when compressed. Because NTFS allocates space based upon the uncompressed size of the file, you can get this error even if the files are already compressed.
This situation occurs because compression is handled asynchronously, and Windows NT uses lazy write (see Inside the Windows NT File System for information about lazy writes). NTFS does not wait for the compression and write of one file to complete before it begins work on subsequent files, and the system doesn't get the unused space back from compression until after the buffer is compressed.
When you are running a program and save files to a compressed folder on a volume that is almost full, the save might or might not be successful — it depends on how much the file compresses, whether the beginning of the file compresses well, and numerous other factors.
If you can't delete any files or don't have any files that you can compress, you can usually copy all of the files if you copy the largest and/or the ones that compress best first. You can also try copying them in smaller groups rather than all at one time.
The book Inside the Windows NT File System provides a high-level description of NTFS compression.
Basically, the NTFS file system provides real-time access to a compressed file, uncompressing the file when it is opened and only compressing when it is closed.
When writing a compressed file, the system reserves disk space for the uncompressed size. The system gets back unused space as each individual compression buffer gets compressed.
Note Some programs do not allocate space before beginning the save, and only pop up an error message when they run out of disk space.
NTFS compression uses a 3-byte minimum search rather that the two-byte minimum used by DoubleSpace. This type of search enables a much faster compression and uncompression (roughly two times faster), while only sacrificing two percent compression for the average text file.
Each NTFS data stream contains information that indicates if any part of the stream is compressed. Individual compressed buffers are identified by "holes" following them in the information stored for that stream. If there is a hole, then the NTFS file system knows to uncompress the preceding buffer to fill the hole.
NTFS Compression Compared to Other Methods
Other compression utilities are available to compress files on computers running Windows NT. These utilities differ from NTFS compression in the following ways:
The Windows NT Workstation Resource Kit includes a compress program and two expand utilities. The compress program can only be run from the command line. There are two versions of the expand program: one runs from the command line, and the other is a Windows NT–based program. For more information, see "File System Utilities" in Chapter 22, "Disk, File System, and Backup Utilities."
As described earlier, the DoubleSpace and DriveSpace compression features in MS-DOS cannot be used when running Windows NT.
NTFS Compression Issues
The only check for whether a user is allowed to change the compression state on a file is whether they have read or write permission. If they have write permission, they can change the compression state locally or across the network.
The two ways to measure the performance of NTFS data compression are size and speed.
You can tell how well compression works by comparing the uncompressed and compressed file and folder sizes. The earlier section of this chapter titled "Using My Computer or Windows NT Explorer" describes using the Properties tab to view the compressed size and compression ratio of a selected file. The section "File System Utilities" in Chapter 22, "Disk, File System, and Backup Utilities," describes using the DirUse program to see the compressed size of folders.
Using NTFS compression might cause performance degradation. One of the reasons this might happen is that, even when copied inside the same computer, a compressed NTFS file is uncompressed, copied, and then recompressed as a new file. Similarly, on network transfers, the file is uncompressed, which affects bandwidth as well as speed.
With data compression, the question is "How will it affect performance on a computer running Windows NT Workstation or Windows NT Server?"
The current implementation of NTFS compression is definitely oriented toward Windows NT Workstation. Compression on a computer running Windows NT Workstation does not seem to produce a substantial performance degradation. No one who has started using NTFS data compression at Microsoft has lodged any complaints about performance degradation.
A computer running Windows NT Server can be another story. Some normal production servers at Microsoft (source servers and binary release servers) have been converted to use NTFS data compression, without any complaints about performance. However, using a tough server benchmark, like NetBench, shows a performance degradation in excess of 50%. Read-only or read-mostly servers, or any lightly loaded servers, may not see a severe performance penalty. Heavily loaded servers with lots of write traffic are poor candidates for data compression, as shown by NetBench.
You really need to measure the effects of data compression in your own environment.
This section briefly discusses file system technology and describes how the NTFS file system implements data recoverability.
Until now, there were two types of file systems — careful-write file systems and lazy-write file systems. The NTFS file system introduces a third type — a recoverable file system.
A careful-write file system is designed around the idea that it is important to keep the volume structure consistent. Disk writes for each update are ordered so that if the system failed between two disk writes, the volume would be left in an understandable state, but with the possibility of an inconsistency. You seldom need to run utilities such as Chkdsk on a careful-write file system. An example of a careful-write file system is the FAT file system on MS-DOS.
The disadvantage of careful-write file systems is that serialized writes can be slow, because each disk write must be completed before the next disk write can begin.
A second kind of file system, such as the FAT file system on Windows NT and most UNIX file systems, is called a lazy-write file system. This type was designed to speed up disk accesses. A lazy-write file system uses an intelligent cache-management strategy and provides a way to recover data (such as the Chkdsk program) if there is an error when writing to the disk. All data are accessed via the file cache. While the user searches folders or reads files, data to be written to disk accumulates in the file cache. If the same data are modified several times, all those modifications are captured in the file cache. The result is that the file system needs to write to disk only once to update the data.
Chapter 5, "Windows NT 4.0 Workstation Architecture," and Chapter 15, "Detecting Cache Bottlenecks," contain more information about the file cache.
The disadvantage of a lazy-write file system is that, in the event of a disk crash, recovery is slower, because a program such as Chkdsk must then scan the disk to check that what should have been written to disk matches what was written.
NTFS is a third kind of file system — a recoverable file system. It combines the speed of a lazy-write file system with virtually instant recovery.
NTFS guarantees the consistency of the volume by using standard transaction logging and recovery techniques, although it does not guarantee the protection of user data. It includes a lazy-write technique plus a volume-recovery technique that takes typically only a second or two to insure the integrity of all NTFS volumes each time the computer is restarted. The transaction logging, which allows NTFS to recover quickly, requires a very small amount of overhead compared with careful-write file systems.
NTFS also uses a technique called bad-cluster remapping to minimize the effects of a bad sector on an NTFS volume. For more information, see the section "Cluster Remapping," presented later in this chapter.
It is possible for the Master Boot Record or Partition Boot Sector to be corrupted due to a disk error or system crash. When either of these sectors is corrupted, you might not be able to access any data on the volume. Recovery from errors with the Master Boot Record or the Partition Boot Sector are discussed in Chapter 20, "Preparing for and Performing Recovery."
Lazy commit is an important feature of NTFS. It allows NTFS to minimize the cost of logging to maintain high performance.
Lazy commit is similar to lazy write. Instead of using system resources to mark a transaction as successfully completed as soon as it is performed, the commitment information is cached and written to the log as a background process. If the power source or computer system should fail before the commit is logged, NTFS rechecks the transaction to see whether it was successfully completed. If NTFS cannot guarantee that the transaction was completed successfully, it backs out the transaction. No incomplete modifications to the volume are allowed.
Every few seconds, NTFS checks the cache to determine the status of the lazy writer and marks the status as a checkpoint in the log. If the system crashes subsequent to that checkpoint, the system backs up to that checkpoint for recovery. This method makes recovery faster by minimizing the number of queries that are required during recovery.
Transaction Logging and Recovery
NTFS considers an I/O operation to be a transaction, in the sense that it is an integral operation. Once a disk operation is started, all of its subcomponents must be completed. If the operation fails, there must be a way to rollback the I/O request to a previously known state.
To insure that an I/O operation can be completed or rolled back, NTFS uses the Log File Service to log all redo and undo information for the transaction. Redo is the information that NTFS uses to repeat the transaction, and undo enables NTFS to roll back the transaction that was incomplete or that had an error.
If a transaction completes successfully, the file update is committed. If the transaction is incomplete, NTFS ends or rolls back the transaction by following instructions in the undo information. If NTFS detects an error in the transaction, the transaction is also rolled back.
File system recovery is straightforward with NTFS. If the system crashes, NTFS performs three passes — an analysis pass, a redo pass, and an undo pass. During the analysis pass, NTFS determines exactly which clusters must now be updated, based on the information in the log file. The redo pass performs all transaction steps logged from the last checkpoint. The undo pass backs out any incomplete (uncommitted) transactions.
Effects of File and Disk Cache on Recoverability
The file cache is just an area of RAM that contains the data. When you write data to disk, Windows NT's lazy-write technique says that the data are written, when, in fact, they are still in RAM.
There can also be cache memory on the disk controller or on the disk itself. If you are running MS-DOS or Windows 3.1x, using the disk or controller caches can really help performance. The lazy-write technique in Windows NT improves performance to the point where you might not see much more improvement by using the cache on the disk or the controller.
This information should help you decide whether you want to enable the disk or controller cache:
With transaction logging and recovery, NTFS guarantees that the volume structure will not be corrupted, so all files remain accessible after a system crash. However, user data can be lost because of a system crash or a bad sector.
The NTFS file system implements a recovery technique called cluster remapping. When Windows NT returns a bad sector error to the NTFS file system, NTFS dynamically replaces the cluster containing the bad sector and allocates a new cluster for the data. If the error occurred during a read, NTFS returns a read error to the calling program, and the data are lost. When the error occurs during a write, NTFS writes the data to the new cluster, and no data are lost.
NTFS puts the address of the cluster containing the bad sector in its Bad Cluster File so the bad sector will not be reused.